I’ve bought Tinkerwell a while back but I could never seem to find the time to set it up and start using it.
During a weekend with more free time than usual I finally decided to set it up.
Writing a custom driver
As I needed custom support for the project I’m currently working on, I’ve quickly started writing a custom driver for it. Writing a driver is a quick and painless process, however it wasn’t long until I ran into an issue.
The way drivers are designed in Tinkerwell, there’s a method called bootstrap where you can load all your application specific start-up code.
The issue I ran into is around global variables. The PHP documentation mentions that:
When a file is included, the code it contains inherits the variable scope of the line on which the include occurs. Any variables available at that line in the calling file will be available within the called file, from that point forward.
This means that all variables defined in the included files are available inside the bootstrap function scope. However, trying to use any of those variables using global ends up resulting in those variables not being found, as they don’t exist in the global scope.
Variable scope and include
There is an work-around available, if you can extract all the variables that are used globally, you can declare them as global before including the bootstrapping files. This way assigning values to those variables ends up updating them in the global scope and further global calls work as expected. That is however tedious, as I couldn’t find an easy way to extract the variables used globally in the project.
As the PHP code that Tinkerwell runs is available in .phar format, I’ve ended up modifying the code to have the driver provide a list of files for inclusion and directly include them in the global scope. I’ve also created an issue to hopefully add support for this use-case in Tinkerwell.
Hopefully this provides a bit of insight into how variable scoping works with including files as this puzzled me for a second or 5 when I ran into it.
PHP arrays are a unicorn, as in they’re actually hash maps behind the scenes. The fact that hash map implementations are vulnerable to collision denial of service attacks is old news and it was a common problem for multiple programming languages. PHP even added configuration to mitigate the problem back in the PHP 5.3.9 days.
However, after seeing the following entry in logs again and again:
PHP Warning: Unknown: Input variables exceeded 1000. To increase the limit change max_input_vars in php.ini
I wanted to get a better understanding of what the ‘would be’ attackers are doing, so todays topic is PHP arrays and how exactly are they implemented.
Hash maps
We know from the above picture that PHP arrays are implemented as hash maps. How do they work? Hash maps are a data structure that maps keys to values.
Here’s a high level diagram:
We start with a string key, apply a hash function and get the position of the element in an array of “buckets” where we store the actual value. A good hash function should map inputs as evenly as possible over its outputs to avoid collisions.
Collisions
However, as the number of inputs is infinite while the backing array of buckets size is finite, collisions will happen. In the above diagram both “a” and “b” keys are mapped to bucket index 1. Hash map implementations then need to have a mechanism for collision resolution. Several procedures for collision resolution are known, but the primary ones are open addressing and separate chaining, each with specific trade-offs.
Open addressing inserts the element at a different index if collisions occur while separate chaining stores a linked list of elements that map to the same index. For reasons that are out of scope for the current article, PHP chose the later approach:
This overly simplifies the actual PHP implementations, which optimises for returning elements in insertion order and easier deletion, but the point still stands. Elements are inserted in some sort of linked list, and if a collision happens the additional elements for a specific index are appended to the list.
The complexity of inserting an element into a PHP array is for the best case O(1). Given that the hash function should distribute keys evenly, the average complexity is also O(1). This makes inserting into PHP arrays fast.
However, what about the worst case? Based on the diagram, the worst case would be if all string keys are mapped to the same bucket index. That would mean that insertion, instead of being a O(1) operation now has O(n) complexity as we actually have to iterate through the linked list and insert the element.
At this point you might wonder why O(n) ? Why do we need to iterate through the list and not just store the pointer to the last element? The reason is that for each element where a key collision happens, we store both its original key and value. When setting $a["b"] we need to make sure we didn’t store a value for “b” before, and so we go through the whole linked list and compare its original key with each element in the list.
To further explain this, let’s take the diagram above. We inserted the value for string key “a” in the linked list at index 1. When inserting the value for string key “b”, the hash function maps it to index 1, and we can find the linked list there. We now need to make sure we didn’t already store a value for string key “b”, and so we go through the list and compare the original keys of the elements in the list with the key “b”. We did not find any element so we append “b” at the end of the list.
Triggering collision in practice
With the boring stuff out of the way, let’s talk about what this means in practice. How exactly can we cause full collisions, and how much do they impact performance?
We said that collisions happen when the hash function ends up mapping multiple inputs to the same output index. For the purpose of our attack we’d like to make the hash function map as many inputs to the same index, to increase the time complexity as much as possible.
It turns out PHP arrays actually use 2 different hash functions depending on the array key type.
Integer keys
If the key is an integer, PHP will try to use it directly, to preserve space. However, given that the backing array size is finite, PHP will mask the key to make sure it will fits into the backing array.
PHP arrays have a size that’s always a power of 2, to keep thing reasonably efficient in time and space. The size of an array starts at a default of 8 elements and when full it doubles up in size. If you store 9 elements, the size of the array is 16, if you store 31 elements the size is 32 and so on. When inserting an integer key into the array, the mask applied depends on the size:
mask = arraySize – 1
If we take 32 which is 100000 in binary, the mask then becomes 011111. Because the size is always a power of 2, this means that as long as:
insertedKey % arraySize
is constant, we can cause all elements to collide and constantly trigger the worst case scenario.
Let’s discuss the above code sample a bit. We’re building an array of 32768 elements ( 2 ^ 15). In the first part of the example we start with a random key (123) and always increment it by the size of the array to keep key % size constant. Just as a control for our test, in the second part of the example we also start with a random key and increment it by size +1 which in turn makes key % size not constant anymore. Here are some results on my machine:
That is quite the difference. As the array size increases (let’s say 2^ 17), because the number of elements to traverse in the list is quadratic, the results get worse and worse:
20 seconds!
Cool! it seems we can collide integer keys! How about strings?
String keys
It turns out, we can also cause collisions for string keys. The hash function PHP uses for string keys is DJBX33A, one of the simplest and fastest string hash functions. However, it turns out to be quite easy to collide. If we take a 2 characters string, increment the first character by one and decrement the second by 33 we should be able to cause a collision.
Testing this with a crude PHP implementation of DJBX33A :
We can easily find pairs of 2 characters that collide. Another interesting property of DJBX33A is that if we start with 2 colliding strings and concatenate them in every possible way, the results will still collide. How would this look in PHP?
The trick used to generate the elements is as follows: we first generate the binary representation of $i and pad it to the required size. We then use strtr to convert every 0 to Ex and every 1 to FW. This will give us concatenated strings which should collide to the same key.
Running the code above on my machine:
Looks like it works! Let’s bump up the number of elements in the array to 2^17:
Yikes!
That kept a processor core at 100% for close to 120 seconds! I can see how this turns into a denial of service attack.
What does this have to do with the log message we started with? Given that we can generate the problematic keys, we can also send them via GET or POST to an unsuspecting PHP based website. $_POST, $_GET and $_COOKIE are just arrays, and so the server will be kept busy for quite a while trying to load up the data that we send. In practice the web server will intervene and cut-off the request, however we can just send more requests :).
To protect against such an attack PHP introduced the max_input_vars config. Having a default value of 1000 should be more than enough for most applications while also protecting against this DoS attack as the impact of 1000 collisions is quite low.
max_input_vars however does not cover json_decode so as a take-away from this post, be careful when decoding user provided JSON.
One question remains: Why do we need max_input_vars. Can’t we just fix arrays? We just need to replace the hash function with something that’s not easy to collide right?
The goal of the article is not to expose a new vulnerability, this issue is known for many many years. However, in the wild, people are still trying to attack websites using this vulnerability and I was curious as to what exactly they are doing.
There are many better articles out there on this topic, such as this one from nikic, but hopefully I was able to explain the reasoning behind the problem and you’re now more aware of a potential issue.
The official website calls Redux “a predictable state container for JS Apps”. The main benefit of Redux is that it shines a light on your global application state, allowing you to trace when, where, why, and how your application’s state changed.
Mutable global state is indeed bad, mmmkay
The idea that global state is bad is drilled into programmers since the beginning of their programming career. One thing that isn’t clearly explained however is why.
There are several issues with global state, but the most important problem is that not only can anyone change the application state, but also that there’s no way you can tell who changed the state.
If your first contact with Redux comes in the context of a React Redux application, the Redux side of things will feel like magic.
As I usually try to understand the tools I’m using, the following tries to be a toy implementation of Redux in PHP, in the hopes of gaining a deeper understanding of the concepts behind Redux.
The most important realization was that Redux is mostly convention, with a bit of library code to tie it all together. As you could probably tell from the title, the language we’re gonna use is PHP, so consider yourself warned.
Great. Let’s go ahead and start with the simple state we’re going to manage:
$initialState = [
'count' => [
'count' => 1
],
];
Simple enough. As per the Redux convention, we don’t modify the state directly:
We’re just generating a new state array by incrementing / decrementing the previous value. Reducers receive part of the initial state array, based on their key, but more on that later when we’re going to introduce combineReducers. For now, let’s imagine countReducer receives the value in $initialState['count']:
[
'count' => 1
]
We still haven’t written any kind of library code yet, so far it’s all been convention.
If we take a look at the Redux API, we start by using the createStore method. We can then use the dispatch and subscribe methods to dispatch actions and subscribe to state change updates.
Translating that to PHP, let’s create a Store class. How would that look:
class Store
{
protected array $state;
protected Closure $reducer;
public function __construct(callable $reducer, array $initialState)
{
$this->state = $initialState;
$this->reducer = Closure::fromCallable($reducer);
}
}
The class now accepts an initial state and a reducer function. In Redux you can combine multiple reducers into one and pass that combined result to the createStore method, later on we’ll discuss how we can achieve that.
Great, how about dispatch and subscribe? Let’s start by adding a method to fetch the store’s state and then the subscribe method:
protected array $listeners = [];
public function getState()
{
return $this->state;
}
public function subscribe(callable $listener)
{
$this->listeners[] = Closure::fromCallable($listener);
}
Subscribing to the store means adding a callback function which will be called when the state updates. How would we use the subscribe method?
Awesome. The only problem now is that our store accepts only one reducer. That’s also the case for the createStore Redux function. The way they solve this problem is by providing a combineReducers method:
We start by accepting an array of reducers and we need to produce a callback that sticks to the reducer signature (takes in state and an action, and produces the resulting state).
Inside that callback we then go through all of the registered reducers, call them one by one with their appropriate piece of the state (based on the reducer key) and return the resulting state.
We can now add multiple reducers to the store, to be able to split up the task of generating new state.
We’ve now wrapped up our toy implementation of Redux. I hope the article above, even if written in PHP, helped shed some light on how Redux works behind the scenes.