一日一钱,千日一千。绳锯木断,水滴石穿。

HashMap 1.8 源码简读

Posted on By TinyVampirePudge

HashMap 1.8 源码简读

初始大小:默认为16

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

如果指定了大小,则会在初始化的时候将初始容量大小设置为大于等于指定初始值、且最小的2的整数次幂。

原理: ①在HashMap的构造方法HashMap(int initialCapacity, float loadFactor)HashMap(int initialCapacity)中,会通过tableSizeFor方法将threshold的值置为2的整数次幂。 ②在第一次调用put(K key, V value)方法添加元素时,会调用resize()方法来初始化table,而在resize()方法中,此时初始容量会被threshold的值代替。

关键代码如下所示:

public HashMap(int initialCapacity, float loadFactor) {
    ...
    this.threshold = tableSizeFor(initialCapacity);
}

请看resize()方法中的这行注释:initial capacity was placed in threshold

final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        ...
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        ...
    }
    ...
}

如上所示,HashMap在指定了初始化容量之后,初始容器的大小会变为大于等于指定初始值、且最小的2的整数次幂。

tableSizeFor方法如下所示:

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

返回大于等于输入参数且最近的2的整数次幂的数. 测试结果如下:

tableSizeFor(0):1
tableSizeFor(1):1
tableSizeFor(2):2
tableSizeFor(3):4
tableSizeFor(4):4
tableSizeFor(5):8
tableSizeFor(6):8
tableSizeFor(7):8
tableSizeFor(8):8
tableSizeFor(9):16

装载因子和动态扩容:

装载因子:默认为0.75,可在HashMap初始化时设置。

/**
 * The load factor used when none specified in constructor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

扩容规则是新的容量是原来容量的两倍。 HashMap中table数组的初始化和动态扩容的方法是resize(),关于扩容容量的代码如下: 注意这行代码:newCap = oldCap << 1

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    ...
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        ...
    else { // zero initial threshold signifies using defaults
        ...
    }
    ...
}
触发扩容操作的阈值:threshold

计算规则:threshold = capacity * loadFactor 变量定义如下:

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

扩容触发条件:数据数量大于阈值时。 代码在resize()方法中:

if (++size > threshold)
    resize();

触发扩容操作以后,会在resize()方法中进行数据迁移。是一次性数据迁移,细节请看源码。

散列冲突解决方法:链表法加红黑树

链表转化成红黑树,红黑树转化成链表。 1、链表转化成红黑树: HashMap#putVal方法中,当在该角标下,原有存储的链表结点个数大于等于7的时候,就将链表转化为红黑树。具体方法代码如下:

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);

具体的转换过程请看treeifyBin方法。

2、红黑树转化成链表: 当冲突以红黑树形态情况下,进行扩充时,将树转成两棵树,若树的的结点数小于等于UNTREEIFY_THRESHOLD,则转为链表形式。 细节请看split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit)方法。

HashMap中的散列函数

hash方法如下:

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

在插入或查找的时候,计算key被映射到的位置

int index = hash(key) & (capacity - 1)

JDK HashMap中hash函数的设计,确实很巧妙:

首先hashcode本身是个32位整型值,这个值对于不同的对象必须保证唯一(Java规范),这也是大家常说的,重写equals必须重写hashcode的重要原因。

获取对象的hashcode以后,先进行移位运算,然后再和自己做异或运算,及hashcode ^ (hashcode »> 16),这一步甚是巧妙,是将高16位移到低16位,这样计算出来的整型值将“具有”高位和低位的性质。

最后,用hash表当前的容量减去一,再和刚刚计算出来的整型值做位与运算。进行位与运算,很好理解,是为了计算出数组中的位置。但这里有个问题:为什么要用容量减去一? 因为 A % B = A & (B - 1),所以(h ^ (h >>> 16)) & (capitity -1) = (h ^ (h >>> 16)) % capitity,可以看出这里本质上是使用了[除留余数法]。

综上,可以看出,hashcode的随机性,加上移位异或运算,得到一个非常随机的hash值,再通过[除留余数法],得到index,整体的设计过程和“散列函数”设计原则非常吻合。

参考:

https://time.geekbang.org/column/article/64586 的第一条评论

Java8 HashMap之tableSizeFor

散列表——维基百科

Java8-HashMap 详解