哈希表 (Hash Table)
哈希表 (Hash Tables)
哈希表是一种通过哈希函数将键映射到值的数据结构,提供近似O(1)的查找、插入和删除操作。
基础概念
哈希表原理
键值对存储:
key1 ----→ hash(key1) = 4 ----→ value1
↓
key2 ----→ hash(key2) = 2 ----→ value2
↓
key3 ----→ hash(key3) = 4 ----→ value3 (链式解决冲突)
哈希函数
常见的哈希函数设计:
1. 除法哈希:
hash(key) = key % table_size
2. 乘法哈希:
hash(key) = floor(table_size * (key * A % 1))
其中A为常数(≈0.618)
3. 字符串哈希:
hash = 0
for char in string:
hash = (hash * 31 + char) % table_size
冲突解决
- 链式法(Chaining)
[0] -> null
[1] -> key1 -> key5
[2] -> key2
[3] -> key3 -> key6 -> key7
[4] -> key4
- 开放寻址法(Open Addressing)
线性探测:
如果位置i被占用,尝试i+1, i+2, ...
二次探测:
如果位置i被占用,尝试i+1², i+2², ...
双重哈希:
如果位置h1(key)被占用,尝试h1(key) + i*h2(key)
经典问题
1. 两数之和
使用哈希表优化查找:
目标和: 9
数组: [2, 7, 11, 15]
哈希表构建过程:
步骤1: {2: 0} 寻找 9-2=7
步骤2: {2: 0, 7: 1} 找到 7,返回 [0,1]
def twoSum(nums, target):
seen = {}
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return [seen[complement], i]
seen[num] = i
return []
2. LRU缓存
哈希表+双向链表实现:
哈希表: {key: ListNode}
双向链表: head <-> node1 <-> node2 <-> tail
get(key2): put(key4):
before: before:
key1 <-> key2 <-> key3 key1 <-> key2 <-> key3
after: after:
key1 <-> key3 <-> key2 key2 <-> key3 <-> key4
class Node:
def __init__(self, key=0, val=0):
self.key = key
self.val = val
self.prev = None
self.next = None
class LRUCache:
def __init__(self, capacity):
self.cache = {}
self.capacity = capacity
self.head = Node()
self.tail = Node()
self.head.next = self.tail
self.tail.prev = self.head
def get(self, key):
if key in self.cache:
node = self.cache[key]
self._remove(node)
self._add(node)
return node.val
return -1
def put(self, key, value):
if key in self.cache:
self._remove(self.cache[key])
node = Node(key, value)
self._add(node)
self.cache[key] = node
if len(self.cache) > self.capacity:
lru = self.head.next
self._remove(lru)
del self.cache[lru.key]
def _remove(self, node):
node.prev.next = node.next
node.next.prev = node.prev
def _add(self, node):
prev = self.tail.prev
prev.next = node
self.tail.prev = node
node.prev = prev
node.next = self.tail
3. 字母异位词分组
使用排序字符串作为哈希键:
输入: ["eat","tea","tan","ate","nat","bat"]
哈希表构建:
"aet": ["eat", "tea", "ate"]
"ant": ["tan", "nat"]
"abt": ["bat"]
输出: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
def groupAnagrams(strs):
groups = {}
for s in strs:
key = ''.join(sorted(s))
groups.setdefault(key, []).append(s)
return list(groups.values())
4. 最长连续序列
LeetCode 128 - Longest Consecutive Sequence
使用哈希集合查找连续数字:
输入: [100,4,200,1,3,2]
哈希集合: {1,2,3,4,100,200}
查找过程:
1 -> 2 -> 3 -> 4 (长度=4)
100 (长度=1)
200 (长度=1)
返回: 4
def longestConsecutive(nums):
num_set = set(nums)
max_length = 0
for num in num_set:
if num - 1 not in num_set: # 序列的起点
current = num
current_length = 1
while current + 1 in num_set:
current += 1
current_length += 1
max_length = max(max_length, current_length)
return max_length
高级应用
1. 布隆过滤器
布隆过滤器结构:
key --→ hash1(key) --→ bit array
--→ hash2(key) --→ [0 1 1 0 1 0 1]
--→ hash3(key) --→
用途:
- 大规模数据去重
- 缓存穿透预防
- 网页URL去重
2. 一致性哈希
哈希环:
server1
↑
key1 ←-----→ server2
↓
server3
应用:
- 分布式缓存
- 负载均衡
- 数据分片
3. 计数器
Count-Min Sketch:
key --→ hash1(key) --→ [5 2 8 1 3]
--→ hash2(key) --→ [1 4 2 6 3]
--→ hash3(key) --→ [7 2 4 1 9]
应用:
- 流量统计
- 频率计数
- 热点检测
常见技巧
- 使用哈希表优化查找
- 设计合适的哈希键
- 处理哈希冲突
- 选择合适的哈希函数
- 权衡空间和时间
相关题目
- LeetCode 3 - Longest Substring Without Repeating Characters
- LeetCode 136 - Single Number
- LeetCode 187 - Repeated DNA Sequences
- LeetCode 560 - Subarray Sum Equals K
- LeetCode 706 - Design HashMap