Colliding PHP arrays

PHP Arrays. Credit  Loulou13

PHP arrays are a unicorn, as in they’re actually hash maps behind the scenes.
The fact that hash map implementations are vulnerable to collision denial of service attacks is old news and it was a common problem for multiple programming languages. PHP even added configuration to mitigate the problem back in the PHP 5.3.9 days.

However, after seeing the following entry in logs again and again:

PHP Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini

I wanted to get a better understanding of what the ‘would be’ attackers are doing, so todays topic is PHP arrays and how exactly are they implemented.

Hash maps

We know from the above picture that PHP arrays are implemented as hash maps. How do they work? Hash maps are a data structure that maps keys to values.

Here’s a high level diagram:

We start with a string key, apply a hash function and get the position of the element in an array of “buckets” where we store the actual value. A good hash function should map inputs as evenly as possible over its outputs to avoid collisions.

Collisions

However, as the number of inputs is infinite while the backing array of buckets size is finite, collisions will happen. In the above diagram both “a” and “b” keys are mapped to bucket index 1. Hash map implementations then need to have a mechanism for collision resolution. Several procedures for collision resolution are known, but the primary ones are open addressing and separate chaining, each with specific trade-offs.

Open addressing inserts the element at a different index if collisions occur while separate chaining stores a linked list of elements that map to the same index. For reasons that are out of scope for the current article, PHP chose the later approach:

This overly simplifies the actual PHP implementations, which optimises for returning elements in insertion order and easier deletion, but the point still stands. Elements are inserted in some sort of linked list, and if a collision happens the additional elements for a specific index are appended to the list.

The complexity of inserting an element into a PHP array is for the best case O(1). Given that the hash function should distribute keys evenly, the average complexity is also O(1). This makes inserting into PHP arrays fast.

However, what about the worst case? Based on the diagram, the worst case would be if all string keys are mapped to the same bucket index. That would mean that insertion, instead of being a O(1) operation now has O(n) complexity as we actually have to iterate through the linked list and insert the element.

At this point you might wonder why O(n) ? Why do we need to iterate through the list and not just store the pointer to the last element? The reason is that for each element where a key collision happens, we store both its original key and value. When setting $a["b"] we need to make sure we didn’t store a value for “b” before, and so we go through the whole linked list and compare its original key with each element in the list.

To further explain this, let’s take the diagram above. We inserted the value for string key “a” in the linked list at index 1. When inserting the value for string key “b”, the hash function maps it to index 1, and we can find the linked list there. We now need to make sure we didn’t already store a value for string key “b”, and so we go through the list and compare the original keys of the elements in the list with the key “b”. We did not find any element so we append “b” at the end of the list.

Triggering collision in practice

With the boring stuff out of the way, let’s talk about what this means in practice. How exactly can we cause full collisions, and how much do they impact performance?

We said that collisions happen when the hash function ends up mapping multiple inputs to the same output index. For the purpose of our attack we’d like to make the hash function map as many inputs to the same index, to increase the time complexity as much as possible.

It turns out PHP arrays actually use 2 different hash functions depending on the array key type.

Integer keys

If the key is an integer, PHP will try to use it directly, to preserve space. However, given that the backing array size is finite, PHP will mask the key to make sure it will fits into the backing array.

PHP arrays have a size that’s always a power of 2, to keep thing reasonably efficient in time and space. The size of an array starts at a default of 8 elements and when full it doubles up in size.
If you store 9 elements, the size of the array is 16, if you store 31 elements the size is 32 and so on.
When inserting an integer key into the array, the mask applied depends on the size:

mask = arraySize – 1

If we take 32 which is 100000 in binary, the mask then becomes 011111.
Because the size is always a power of 2, this means that as long as:

insertedKey % arraySize

is constant, we can cause all elements to collide and constantly trigger the worst case scenario.

<?php

$size = pow(2,15);
$start = microtime(true);
$key = 123;
for($i=0; $i<($size);$i++) {
    $ints[$key] = 0;
    $key += $size;
}
$end = microtime(true);
echo "Inserting $size evil elements took " . number_format($end - $start, 10) . " seconds \n";

$start = microtime(true);
$key = 123;
for($i=0; $i<($size);$i++) {
    $ints2[$key] = 0;
    $key += $size+1;
}
$end = microtime(true);
echo "Inserting $size good elements took " . number_format($end - $start, 10) . " seconds \n";

Let’s discuss the above code sample a bit. We’re building an array of 32768 elements ( 2 ^ 15). In the first part of the example we start with a random key (123) and always increment it by the size of the array to keep key % size constant. Just as a control for our test, in the second part of the example we also start with a random key and increment it by size +1 which in turn makes key % size not constant anymore. Here are some results on my machine:

That is quite the difference. As the array size increases (let’s say 2^ 17), because the number of elements to traverse in the list is quadratic, the results get worse and worse:

20 seconds!

Cool! it seems we can collide integer keys! How about strings?

String keys

It turns out, we can also cause collisions for string keys. The hash function PHP uses for string keys is DJBX33A, one of the simplest and fastest string hash functions. However, it turns out to be quite easy to collide. If we take a 2 characters string, increment the first character by one and decrement the second by 33 we should be able to cause a collision.

Testing this with a crude PHP implementation of DJBX33A :

<?php

function DJBX33A($key) {
    $hash = 5381;
    for($i=0; $i<strlen($key); $i++) {
        $hash = $hash * 33 + ord($key[$i]);
    }
    
    return $hash;
}

echo DJBX33A("ab") . "\n"; // 5863208
echo DJBX33A("Ex") . "\n"; // 5862306
echo DJBX33A("FW") . "\n"; // 5862306 -> collision

We can easily find pairs of 2 characters that collide. Another interesting property of DJBX33A is that if we start with 2 colliding strings and concatenate them in every possible way, the results will still collide. How would this look in PHP?

<?php

$size = pow(2, $length = 15);
$initial_colliding = [
    'Ex',
    'FW',
];
$initial_size = count($initial_colliding);
$start = microtime(true);

for($i=0; $i<($size);$i++) {
     $s = str_pad(base_convert($i, 10, $initial_size), $length, '0', STR_PAD_LEFT);
     $strings1[strtr($s, $initial_colliding)] = 0;
}
$end = microtime(true);
echo "Inserting $size evil elements took " . number_format($end - $start, 10) . " seconds \n";

$start = microtime(true);
$key = 'aaaaaaaaaaaaaaaaaaaaaaaaa';
for($i=0; $i<($size);$i++) {
    $strings2[$key] = 0;
    $key++;
}
$end = microtime(true);
echo "Inserting $size good elements took " . number_format($end - $start, 10) . " seconds \n";

The trick used to generate the elements is as follows: we first generate the binary representation of $i and pad it to the required size. We then use strtr to convert every 0 to Ex and every 1 to FW. This will give us concatenated strings which should collide to the same key.

Running the code above on my machine:

Looks like it works! Let’s bump up the number of elements in the array to 2^17:

Yikes!

That kept a processor core at 100% for close to 120 seconds! I can see how this turns into a denial of service attack.

What does this have to do with the log message we started with? Given that we can generate the problematic keys, we can also send them via GET or POST to an unsuspecting PHP based website. $_POST, $_GET and $_COOKIE are just arrays, and so the server will be kept busy for quite a while trying to load up the data that we send. In practice the web server will intervene and cut-off the request, however we can just send more requests :).

To protect against such an attack PHP introduced the max_input_vars config. Having a default value of 1000 should be more than enough for most applications while also protecting against this DoS attack as the impact of 1000 collisions is quite low.

max_input_vars however does not cover json_decode so as a take-away from this post, be careful when decoding user provided JSON.

One question remains: Why do we need max_input_vars. Can’t we just fix arrays? We just need to replace the hash function with something that’s not easy to collide right?

Unfortunately, it’s not that simple. As PHP arrays are the most used data structure of the language, the performance impact of switching the hash function was deemed too high.

Closing words

The goal of the article is not to expose a new vulnerability, this issue is known for many many years. However, in the wild, people are still trying to attack websites using this vulnerability and I was curious as to what exactly they are doing.

There are many better articles out there on this topic, such as this one from nikic, but hopefully I was able to explain the reasoning behind the problem and you’re now more aware of a potential issue.

Follow me on Twitter @sdnunca for more content.

4 thoughts on “Colliding PHP arrays

    1. Just ran a comparison locally for the integer version for 7.4 and 8.0. But I see no meaningful difference.
      Ran on a Ryzen 3700x on default clock speed.

      $ ./colliding-php
      Version: 7.4.15
      Inserting 262144 evil elements took 33.701657056808 seconds
      Inserting 262144 good elements took 0.0043520927429199 seconds
      $ ./colliding-php
      Version: 8.0.2
      Inserting 262144 evil elements took 34.048852205276 seconds
      Inserting 262144 good elements took 0.0044040679931641 seconds

      Like

Leave a comment