HashMap 1.8 源码简读

HashMap 1.8 源码简读

初始大小:默认为16

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

如果指定了大小，则会在初始化的时候将初始容量大小设置为大于等于指定初始值、且最小的2的整数次幂。

原理： ①在HashMap的构造方法HashMap(int initialCapacity, float loadFactor)或HashMap(int initialCapacity)中，会通过tableSizeFor方法将threshold的值置为2的整数次幂。 ②在第一次调用put(K key, V value)方法添加元素时，会调用resize()方法来初始化table，而在resize()方法中，此时初始容量会被threshold的值代替。

关键代码如下所示：

public HashMap(int initialCapacity, float loadFactor) {
    ...
    this.threshold = tableSizeFor(initialCapacity);
}

请看resize()方法中的这行注释：initial capacity was placed in threshold

final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        ...
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        ...
    }
    ...
}

如上所示，HashMap在指定了初始化容量之后，初始容器的大小会变为大于等于指定初始值、且最小的2的整数次幂。

tableSizeFor方法如下所示：

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

返回大于等于输入参数且最近的2的整数次幂的数. 测试结果如下：

tableSizeFor(0):1
tableSizeFor(1):1
tableSizeFor(2):2
tableSizeFor(3):4
tableSizeFor(4):4
tableSizeFor(5):8
tableSizeFor(6):8
tableSizeFor(7):8
tableSizeFor(8):8
tableSizeFor(9):16

装载因子和动态扩容：

装载因子：默认为0.75，可在HashMap初始化时设置。

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

扩容规则是新的容量是原来容量的两倍。 HashMap中table数组的初始化和动态扩容的方法是resize()，关于扩容容量的代码如下：注意这行代码：newCap = oldCap << 1

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        ...
    else { // zero initial threshold signifies using defaults
        ...
    }
    ...
}

触发扩容操作的阈值：threshold

计算规则：threshold = capacity * loadFactor 变量定义如下：

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

扩容触发条件：数据数量大于阈值时。代码在resize()方法中：

if (++size > threshold)
    resize();

触发扩容操作以后，会在resize()方法中进行数据迁移。是一次性数据迁移，细节请看源码。

散列冲突解决方法：链表法加红黑树

链表转化成红黑树，红黑树转化成链表。 1、链表转化成红黑树： HashMap#putVal方法中，当在该角标下，原有存储的链表结点个数大于等于7的时候，就将链表转化为红黑树。具体方法代码如下：

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);

具体的转换过程请看treeifyBin方法。

2、红黑树转化成链表：当冲突以红黑树形态情况下，进行扩充时，将树转成两棵树，若树的的结点数小于等于UNTREEIFY_THRESHOLD，则转为链表形式。细节请看split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)方法。

HashMap中的散列函数

hash方法如下：

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

在插入或查找的时候，计算key被映射到的位置

int index = hash(key) & (capacity - 1)

JDK HashMap中hash函数的设计，确实很巧妙：

首先hashcode本身是个32位整型值，这个值对于不同的对象必须保证唯一（Java规范），这也是大家常说的，重写equals必须重写hashcode的重要原因。

获取对象的hashcode以后，先进行移位运算，然后再和自己做异或运算，及hashcode ^ (hashcode »> 16)，这一步甚是巧妙，是将高16位移到低16位，这样计算出来的整型值将“具有”高位和低位的性质。

最后，用hash表当前的容量减去一，再和刚刚计算出来的整型值做位与运算。进行位与运算，很好理解，是为了计算出数组中的位置。但这里有个问题：为什么要用容量减去一？ 因为 A % B = A & (B - 1)，所以(h ^ (h >>> 16)) & (capitity -1) = (h ^ (h >>> 16)) % capitity，可以看出这里本质上是使用了[除留余数法]。

综上，可以看出，hashcode的随机性，加上移位异或运算，得到一个非常随机的hash值，再通过[除留余数法]，得到index，整体的设计过程和“散列函数”设计原则非常吻合。

参考：

https://time.geekbang.org/column/article/64586 的第一条评论

Java8 HashMap之tableSizeFor

散列表——维基百科

Java8-HashMap 详解

一日一钱，千日一千。绳锯木断，水滴石穿。

目录