Colliding PHP arrays

PHP Arrays. Credit  Loulou13

PHP arrays are a unicorn, as in they’re actually hash maps behind the scenes.
The fact that hash map implementations are vulnerable to collision denial of service attacks is old news and it was a common problem for multiple programming languages. PHP even added configuration to mitigate the problem back in the PHP 5.3.9 days.

However, after seeing the following entry in logs again and again:

PHP Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini

I wanted to get a better understanding of what the ‘would be’ attackers are doing, so todays topic is PHP arrays and how exactly are they implemented.

Hash maps

We know from the above picture that PHP arrays are implemented as hash maps. How do they work? Hash maps are a data structure that maps keys to values.

Here’s a high level diagram:

We start with a string key, apply a hash function and get the position of the element in an array of “buckets” where we store the actual value. A good hash function should map inputs as evenly as possible over its outputs to avoid collisions.

Collisions

However, as the number of inputs is infinite while the backing array of buckets size is finite, collisions will happen. In the above diagram both “a” and “b” keys are mapped to bucket index 1. Hash map implementations then need to have a mechanism for collision resolution. Several procedures for collision resolution are known, but the primary ones are open addressing and separate chaining, each with specific trade-offs.

Open addressing inserts the element at a different index if collisions occur while separate chaining stores a linked list of elements that map to the same index. For reasons that are out of scope for the current article, PHP chose the later approach:

This overly simplifies the actual PHP implementations, which optimises for returning elements in insertion order and easier deletion, but the point still stands. Elements are inserted in some sort of linked list, and if a collision happens the additional elements for a specific index are appended to the list.

The complexity of inserting an element into a PHP array is for the best case O(1). Given that the hash function should distribute keys evenly, the average complexity is also O(1). This makes inserting into PHP arrays fast.

However, what about the worst case? Based on the diagram, the worst case would be if all string keys are mapped to the same bucket index. That would mean that insertion, instead of being a O(1) operation now has O(n) complexity as we actually have to iterate through the linked list and insert the element.

At this point you might wonder why O(n) ? Why do we need to iterate through the list and not just store the pointer to the last element? The reason is that for each element where a key collision happens, we store both its original key and value. When setting $a["b"] we need to make sure we didn’t store a value for “b” before, and so we go through the whole linked list and compare its original key with each element in the list.

To further explain this, let’s take the diagram above. We inserted the value for string key “a” in the linked list at index 1. When inserting the value for string key “b”, the hash function maps it to index 1, and we can find the linked list there. We now need to make sure we didn’t already store a value for string key “b”, and so we go through the list and compare the original keys of the elements in the list with the key “b”. We did not find any element so we append “b” at the end of the list.

Triggering collision in practice

With the boring stuff out of the way, let’s talk about what this means in practice. How exactly can we cause full collisions, and how much do they impact performance?

We said that collisions happen when the hash function ends up mapping multiple inputs to the same output index. For the purpose of our attack we’d like to make the hash function map as many inputs to the same index, to increase the time complexity as much as possible.

It turns out PHP arrays actually use 2 different hash functions depending on the array key type.

Integer keys

If the key is an integer, PHP will try to use it directly, to preserve space. However, given that the backing array size is finite, PHP will mask the key to make sure it will fits into the backing array.

PHP arrays have a size that’s always a power of 2, to keep thing reasonably efficient in time and space. The size of an array starts at a default of 8 elements and when full it doubles up in size.
If you store 9 elements, the size of the array is 16, if you store 31 elements the size is 32 and so on.
When inserting an integer key into the array, the mask applied depends on the size:

mask = arraySize – 1

If we take 32 which is 100000 in binary, the mask then becomes 011111.
Because the size is always a power of 2, this means that as long as:

insertedKey % arraySize

is constant, we can cause all elements to collide and constantly trigger the worst case scenario.

<?php

$size = pow(2,15);
$start = microtime(true);
$key = 123;
for($i=0; $i<($size);$i++) {
    $ints[$key] = 0;
    $key += $size;
}
$end = microtime(true);
echo "Inserting $size evil elements took " . number_format($end - $start, 10) . " seconds \n";

$start = microtime(true);
$key = 123;
for($i=0; $i<($size);$i++) {
    $ints2[$key] = 0;
    $key += $size+1;
}
$end = microtime(true);
echo "Inserting $size good elements took " . number_format($end - $start, 10) . " seconds \n";

Let’s discuss the above code sample a bit. We’re building an array of 32768 elements ( 2 ^ 15). In the first part of the example we start with a random key (123) and always increment it by the size of the array to keep key % size constant. Just as a control for our test, in the second part of the example we also start with a random key and increment it by size +1 which in turn makes key % size not constant anymore. Here are some results on my machine:

That is quite the difference. As the array size increases (let’s say 2^ 17), because the number of elements to traverse in the list is quadratic, the results get worse and worse:

20 seconds!

Cool! it seems we can collide integer keys! How about strings?

String keys

It turns out, we can also cause collisions for string keys. The hash function PHP uses for string keys is DJBX33A, one of the simplest and fastest string hash functions. However, it turns out to be quite easy to collide. If we take a 2 characters string, increment the first character by one and decrement the second by 33 we should be able to cause a collision.

Testing this with a crude PHP implementation of DJBX33A :

<?php

function DJBX33A($key) {
    $hash = 5381;
    for($i=0; $i<strlen($key); $i++) {
        $hash = $hash * 33 + ord($key[$i]);
    }
    
    return $hash;
}

echo DJBX33A("ab") . "\n"; // 5863208
echo DJBX33A("Ex") . "\n"; // 5862306
echo DJBX33A("FW") . "\n"; // 5862306 -> collision

We can easily find pairs of 2 characters that collide. Another interesting property of DJBX33A is that if we start with 2 colliding strings and concatenate them in every possible way, the results will still collide. How would this look in PHP?

<?php

$size = pow(2, $length = 15);
$initial_colliding = [
    'Ex',
    'FW',
];
$initial_size = count($initial_colliding);
$start = microtime(true);

for($i=0; $i<($size);$i++) {
     $s = str_pad(base_convert($i, 10, $initial_size), $length, '0', STR_PAD_LEFT);
     $strings1[strtr($s, $initial_colliding)] = 0;
}
$end = microtime(true);
echo "Inserting $size evil elements took " . number_format($end - $start, 10) . " seconds \n";

$start = microtime(true);
$key = 'aaaaaaaaaaaaaaaaaaaaaaaaa';
for($i=0; $i<($size);$i++) {
    $strings2[$key] = 0;
    $key++;
}
$end = microtime(true);
echo "Inserting $size good elements took " . number_format($end - $start, 10) . " seconds \n";

The trick used to generate the elements is as follows: we first generate the binary representation of $i and pad it to the required size. We then use strtr to convert every 0 to Ex and every 1 to FW. This will give us concatenated strings which should collide to the same key.

Running the code above on my machine:

Looks like it works! Let’s bump up the number of elements in the array to 2^17:

Yikes!

That kept a processor core at 100% for close to 120 seconds! I can see how this turns into a denial of service attack.

What does this have to do with the log message we started with? Given that we can generate the problematic keys, we can also send them via GET or POST to an unsuspecting PHP based website. $_POST, $_GET and $_COOKIE are just arrays, and so the server will be kept busy for quite a while trying to load up the data that we send. In practice the web server will intervene and cut-off the request, however we can just send more requests :).

To protect against such an attack PHP introduced the max_input_vars config. Having a default value of 1000 should be more than enough for most applications while also protecting against this DoS attack as the impact of 1000 collisions is quite low.

max_input_vars however does not cover json_decode so as a take-away from this post, be careful when decoding user provided JSON.

One question remains: Why do we need max_input_vars. Can’t we just fix arrays? We just need to replace the hash function with something that’s not easy to collide right?

Unfortunately, it’s not that simple. As PHP arrays are the most used data structure of the language, the performance impact of switching the hash function was deemed too high.

Closing words

The goal of the article is not to expose a new vulnerability, this issue is known for many many years. However, in the wild, people are still trying to attack websites using this vulnerability and I was curious as to what exactly they are doing.

There are many better articles out there on this topic, such as this one from nikic, but hopefully I was able to explain the reasoning behind the problem and you’re now more aware of a potential issue.

Follow me on Twitter @sdnunca for more content.

Redux in 30 lines of PHP

The official website calls Redux “a predictable state container for JS Apps”. The main benefit of Redux is that it shines a light on your global application state, allowing you to trace when, where, why, and how your application’s state changed.

Mutable global state is indeed bad, mmmkay

The idea that global state is bad is drilled into programmers since the beginning of their programming career. One thing that isn’t clearly explained however is why.

There are several issues with global state, but the most important problem is that not only can anyone change the application state, but also that there’s no way you can tell who changed the state.

If your first contact with Redux comes in the context of a React Redux application, the Redux side of things will feel like magic.

As I usually try to understand the tools I’m using, the following tries to be a toy implementation of Redux in PHP, in the hopes of gaining a deeper understanding of the concepts behind Redux.

The most important realization was that Redux is mostly convention, with a bit of library code to tie it all together. As you could probably tell from the title, the language we’re gonna use is PHP, so consider yourself warned.

Great. Let’s go ahead and start with the simple state we’re going to manage:

$initialState = [
    'count' => [
        'count' => 1
    ],
];

Simple enough. As per the Redux convention, we don’t modify the state directly:

$initialState['count']['count'] +=1

But we can interact with the state using actions:

const INCREMENT_ACTION = 'INCREMENT';
const DECREMENT_ACTION = 'DECREMENT';
$actions = [
    'increment' => [
        'type' =>  INCREMENT_ACTION
    ],
    'decrement' => [
        'type' => DECREMENT_ACTION
    ],
];

The state and actions are plain PHP arrays, no magic about them.

We also need a reducer, a function that takes our state, processes the action and generates the new state:

function countReducer(array $state, $action) {
    switch($action['type']) {
        case INCREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] + 1]);
        case DECREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] - 1]);
        default:
            return $state;
    }
}

We’re just generating a new state array by incrementing / decrementing the previous value. Reducers receive part of the initial state array, based on their key, but more on that later when we’re going to introduce combineReducers. For now, let’s imagine countReducer receives the  value in $initialState['count']:

[
	'count' => 1
]

We still haven’t written any kind of library code yet, so far it’s all been convention.

If we take a look at the Redux API, we start by using the createStore method. We can then use the dispatch and subscribe methods to dispatch actions and subscribe to state change updates.

Translating that to PHP, let’s create a Store class. How would that look:

class Store 
{
    protected array $state;
    protected Closure $reducer;
    public function __construct(callable $reducer, array $initialState)
    {
        $this->state = $initialState;
        $this->reducer = Closure::fromCallable($reducer);
    }
}

The class now accepts an initial state and a reducer function. In Redux you can combine multiple reducers into one and pass that combined result to the createStore method, later on we’ll discuss how we can achieve that.

Great, how about dispatch and subscribe? Let’s start by adding a method to fetch the store’s state and then the subscribe method:

protected array $listeners = [];
public function getState()
{
	return $this->state;
}
public function subscribe(callable $listener)
{
	$this->listeners[] = Closure::fromCallable($listener);
}

Subscribing to the store means adding a callback function which will be called when the state updates. How would we use the subscribe method?

$store->subscribe(function($state) {
    print_r($state);
});

How about dispatch:

public function dispatch(array $action)
{
    $this->state = ($this->reducer)($this->getState(), $action);
    foreach($this->listeners as $listener) {
    	$listener($this->getState());
    }
}

In order to dispatch an action, we need to generate the new state by calling the reducer and then notifying all the subscribed listeners.

Awesome. Do we need anything else? No, that’s pretty much it. We now have a basic Redux implementation in PHP. Here’s how you’d use this:

<?php
class Store
{
    protected array $state;
    protected Closure $reducer;
    protected array $listeners = [];
    public function __construct( callable $reducer, array $initialState)
    {
        $this->state = $initialState;
        $this->reducer = Closure::fromCallable($reducer);
    }
    public function getState()
    {
        return $this->state;
    }
    public function subscribe(callable $listener)
    {
        $this->listeners[] = Closure::fromCallable($listener);
    }
    public function dispatch(array $action)
    {
        $this->state = ($this->reducer)($this->getState(), $action);
        foreach($this->listeners as $listener) {
            $listener($this->getState());
        }
    }
}
$initialState = [
    'count' => [
        'count' => 1
    ],
];
const INCREMENT_ACTION = 'INCREMENT';
const DECREMENT_ACTION = 'DECREMENT';
$actions = [
    'increment' => [
        'type' =>  INCREMENT_ACTION
    ],
    'decrement' => [
        'type' => DECREMENT_ACTION
    ],
];
function countReducer(array $state, $action) {
    switch($action['type']) {
        case INCREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] + 1]);
        case DECREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] - 1]);
        default:
            return $state;
    }
}
$store = new Store('countReducer', $initialState['count']);
$store->subscribe(function($state) {
    print_r($state);
});
$store->dispatch($actions['increment']);
$store->dispatch($actions['increment']);
$store->dispatch($actions['increment']);
$store->dispatch($actions['decrement']);

The output of the above code is:

Array
(
    [count] => 2
)
Array
(
    [count] => 3
)
Array
(
    [count] => 4
)
Array
(
    [count] => 3
)

You can then add custom data on the action and modify the reducer to take the custom data into account:

const INCREMENT_BY_ACTION = 'INCREMENT_BY';
function countReducer(array $state, $action) {
    switch($action['type']) {
        case INCREMENT_BY_ACTION;
            $value = $action['value'] ?? 0;
            return array_replace([], $state, ['count' => $state['count'] + $value]);
        case INCREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] + 1]);
        case DECREMENT_ACTION:
            return array_replace([], $state, ['count' => $state['count'] - 1]);
        default:
            return $state;
    }
}
$store->dispatch(['type' => INCREMENT_BY_ACTION, 'value' => 5]);

Awesome. The only problem now is that our store accepts only one reducer. That’s also the case for the createStore Redux function. The way they solve this problem is by providing a combineReducers method:

rootReducer = combineReducers({potato: potatoReducer, tomato: tomatoReducer});

So the method takes in a list of reducers and ends up providing one reducer that combines them. How would that look in PHP?

function combineReducers(array $reducers) {
    return function(array $state, $action) use ($reducers) {
        $newState = $state;
        foreach($reducers as $stateKey => $reducer) {
            if(!isset($newState[$stateKey])) {
                $newState[$stateKey] = [];
            }
            $newState[$stateKey] = $reducer($newState[$stateKey], $action);
        }
        return $newState;
    };
}

We start by accepting an array of reducers and we need to produce a callback that sticks to the reducer signature (takes in state and an action, and produces the resulting state).

Inside that callback we then go through all of the registered reducers, call them one by one with their appropriate piece of the state (based on the reducer key) and return the resulting state.

We would then call this as follows:

$initialState = [
    'count' => [
        'count' => 1
    ],
    'sum' => 0
];
function sumReducer($state, $action) {
    switch($action['type']) {
        case ADD_SUM:
            return $state + 1;
        default:
            return $state;
    }
    
$store = new Store(combineReducers([
    'count' => 'countReducer',
    'sum' => 'sumReducer'
]), $initialState);

We can now add multiple reducers to the store, to be able to split up the task of generating new state.

We’ve now wrapped up our toy implementation of Redux. I hope the article above, even if written in PHP, helped shed some light on how Redux works behind the scenes.

Follow me on Twitter @sdnunca for more content.