Redis数据结构

字典

dictht 是一个散列表结构，使用拉链法解决哈希冲突。

Redis 的字典 dict 中包含两个哈希表 dictht，这是为了方便进行 rehash 操作。在扩容时，将其中一个 dictht 上的键值对 rehash 到另一个 dictht 上面，完成之后释放空间并交换两个 dictht 的角色。

类比于 JVM 新生代的 S0 和 S1区（和 Eden 区共同构成新生代）。

字典扩容

为什么需要扩容？
- 字典扩容需要同时满足如下两个条件：
  - 1、当前没有子进程在执行aof文件重写或者生成RDB文件；
  - 2、或者保存的节点数与哈希表大小的比例超过了安全阈值（默认值为5）
扩容方式：渐进式扩容（incremental rehashing）
- 优点：rehash 操作不是一次性完成，而是采用渐进方式，这是为了避免一次性执行过多的 rehash 操作给服务器带来过大的负担。

rehash

/* Performs N steps of incremental rehashing. Returns 1 if there are still
 * keys to move from the old to the new hash table, otherwise 0 is returned.
 *
 * Note that a rehashing step consists in moving a bucket (that may have more
 * than one key as we use chaining) from the old to the new hash table, however
 * since part of the hash table may be composed of empty spaces, it is not
 * guaranteed that this function will rehash even a single bucket, since it
 * will visit at max N*10 empty buckets in total, otherwise the amount of
 * work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;

    while(n-- && d->ht[0].used != 0) {
        dictEntry *de, *nextde;

        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(d->ht[0].size > (unsigned long)d->rehashidx);
        while(d->ht[0].table[d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht[0].table[d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;

            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & d->ht[1].sizemask;
            de->next = d->ht[1].table[h];
            d->ht[1].table[h] = de;
            d->ht[0].used--;
            d->ht[1].used++;
            de = nextde;
        }
        d->ht[0].table[d->rehashidx] = NULL;
        d->rehashidx++;
    }

    /* Check if we already rehashed the whole table... */
    if (d->ht[0].used == 0) {
        zfree(d->ht[0].table);
        d->ht[0] = d->ht[1];
        _dictReset(&d->ht[1]);
        d->rehashidx = -1;
        return 0;
    }

    /* More to rehash... */
    return 1;
}

rehash是以 bucket(桶) 为基本单位进行渐进式的数据迁移的，每步完成一个bucket的迁移，直至所有数据迁移完毕。一个bucket对应哈希表数组中的一条entry链表。新版本的dictRehash()还加入了一个最大访问空桶数(empty_visits)的限制来进一步减小可能引起阻塞的时间。

渐进式哈希过程：

渐进式 rehash 通过记录 dict 的 rehashidx 完成，它从 0 开始，然后每执行一次 rehash 都会递增。例如在一次
rehash 中，要把 dict[0] rehash 到 dict[1]，这一次会把 dict[0] 上 table[rehashidx] 的键值对 rehash 到 dict[1]
上，dict[0] 的 table[rehashidx] 指向 null，并令 rehashidx++。

渐进式哈希的精髓在于：数据的迁移不是一次性完成的，而是可以通过 dictRehash() 这个函数分步规划的，并且调用方可以及时知道是否需要继续进行渐进式哈希操作。如果dict数据结构中存储了海量的数据，那么一次性迁移势必带来redis性能的下降，别忘了redis是单线程模型，在实时性要求高的场景下这可能是致命的。而渐进式哈希则将这种代价可控地分摊了，调用方可以在dict做插入、删除、查找或更新的时候执行dictRehash()，最小化数据迁移的代价。

在迁移的过程中，数据是在新表还是旧表中并不是一个非常急迫的需求，迁移的过程并不会丢失数据，在旧表中找不到再到新表中寻找就是了。

补充：哈希冲突及解决办法

什么是哈希冲突？
- 哈希是通过对数据进行再压缩，提高效率的一种解决方法。但由于通过哈希函数产生的哈希值是有限的，而 数据可能比较多，导致经过哈希函数处理后仍然有不同的数据（键值）对应相同的哈希值。这时候就产生了哈希冲突。
解决哈希冲突的办法
- 1、拉链法
- 2、开放地址法
- 3、再散列法
- 4、建立公共溢出区

拉链法实现步骤：

得到一个 key
计算 key 的 hashValue
根据 hashValue 值定位到 data[hashValue] 。( data[hashValue] 是一条链表)
若 data[hashValue] 为空则直接插入。
不然则添加到链表末尾。

拉链法示意图

开放地址法有个非常关键的特征，就是所有输入的元素全部存放在哈希表里，也就是说，位桶的实现是不需要任何的链表来实现的，换句话说，也就是这个哈希表的装载因子不会超过1。它的实现是在插入一个元素的时候，先通过哈希函数进行判断，若是发生哈希冲突，就以当前地址为基准，根据再寻址的方法（探查序列），去寻找下一个地址，若发生冲突再去寻找，直至找到一个为空的地址为止。

再散列法其实很简单，就是再使用哈希函数去散列一个输入的时候，输出是同一个位置就再次散列，直至不发生冲突位置.

缺点：每次冲突都要重新散列，计算时间增加。

再散列法