i am Chris Smith

Protocol Engineer, Security Researcher, Software Consultant

Ruby Hashes... what are they good for?

Hashes are a very powerful and useful tool in Ruby, so I wanted to dig a little more into them.

One of the most commonly used data storage structures in my programming so far has been Arrays. Arrays are great: they are lists of things that you might need later. You can access their elements fairly easily using the index (e.g. my_awesome_array[0]). In Ruby, the powerful built-in Enumerable methods let you look into and manipulate arrays. Arrays, however, are not great at relating data to other data, so they can be cumbersome for some types of data or flows.
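As a quick illustration of index access and a few Enumerable methods, here is a minimal sketch. The contents of my_awesome_array are made up for the example.

```ruby
# Hypothetical sample data for illustration.
my_awesome_array = [3, 1, 4, 1, 5]

# Positional access: you need to know where the element lives.
first = my_awesome_array[0]    # 3
last  = my_awesome_array[-1]   # 5

# A few Enumerable-style methods for inspecting and transforming.
doubled = my_awesome_array.map { |n| n * 2 }   # [6, 2, 8, 2, 10]
odds    = my_awesome_array.select(&:odd?)      # [3, 1, 1, 5]
total   = my_awesome_array.sum                 # 14
```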

Other programming languages (PHP and JavaScript, in my experience) also have something like hashes, though there they are called associative arrays. Personally, I find that name a little clearer, since they operate similarly to arrays and what a Hash does is associate pieces of data with each other.

So what does that mean? Basically, if we have the array [1,2,3,4], there is no relationship between the pieces of data. Unless you know the order of the array, you cannot get from 1 to 2. In a hash like {a: 1, b: 2}, however, you only need to know a to get back 1, since the two are associated with each other as a key/value pair.
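Here is a small sketch of that difference, reusing the same array and hash. The variable names are just for illustration.

```ruby
numbers = [1, 2, 3, 4]

# With an array you either know the position...
by_position = numbers[1]                     # 2
# ...or you scan element by element until something matches.
by_scan = numbers.find { |n| n == 3 }        # 3

letters = { a: 1, b: 2 }

# With a hash, knowing the key is enough to get the value back.
value_for_a = letters[:a]                    # 1
# A missing key returns nil, or a default when you use fetch.
missing      = letters[:z]                   # nil
with_default = letters.fetch(:z, 0)          # 0
```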

This can make working with data faster and easier. It becomes faster because you don’t have to iterate through an entire array to find the value you are looking for. It is easier because the data starts to have meaning beyond its order. In one Stack Overflow post, the author's tests showed a difference of more than 2 seconds between searching an array and looking up a key in a hash with 10,000 entries.
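If you want to try a comparison like this yourself, here is a rough sketch using Ruby's standard Benchmark module. This is not the test from the Stack Overflow post: the key names, the number of lookups, and the choice to search for the last entry are all my own assumptions, and the timings will vary by machine and Ruby version.

```ruby
require "benchmark"

ENTRY_COUNT = 10_000   # same size as mentioned above
LOOKUPS     = 200      # assumed number of repeated lookups

# Build the same key/value data in both shapes.
entries  = (0...ENTRY_COUNT).map { |i| ["key#{i}", i] }
as_array = entries        # array of [key, value] pairs
as_hash  = entries.to_h   # hash of key => value

# Worst case for the array: the entry we want is at the very end.
target = "key#{ENTRY_COUNT - 1}"

array_result = nil
hash_result  = nil

array_time = Benchmark.realtime do
  LOOKUPS.times do
    # Walks the pairs one by one until the key matches.
    array_result = as_array.find { |key, _value| key == target }&.last
  end
end

hash_time = Benchmark.realtime do
  # Jumps straight to the value using the key.
  LOOKUPS.times { hash_result = as_hash[target] }
end

puts format("array scan:  %.4f s", array_time)
puts format("hash lookup: %.4f s", hash_time)
```

Both approaches return the same value; only the time it takes to get there differs.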

So how can a hash be so fast compared to an array? As I am learning with Ruby, EVERYTHING is an object. After a bit of digging, I found out that a Hash 1) is an object and 2) stores its information in bins. These bins are actually… drumroll please… arrays!

So here is what I understand is happening: when you create a new Hash, Ruby instantiates a new Hash object. In the implementation I read about, this object starts with 11 bins (arrays). When the bins get too crowded (more than about 5 elements per bin), Ruby reorganizes things and increases the number of bins. This process can be a bit slow, but it only happens occasionally as the hash grows. When you look something up in the Hash, Ruby calculates the key's bin again and only looks inside that one array. This is how a hash table works, and it is why we go from O(n) for array searches to O(1) on average for hash lookups: the key itself tells Ruby where to look. These details describe older versions of Ruby; since Ruby 2.4 the internal layout is different, but the idea of hashing the key to find its location is the same. (More on search trees and linked lists in a later post.)
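To make the bin idea concrete, here is a toy sketch of a hash built out of arrays. This is not Ruby's actual implementation: ToyHash, its constants, and its growth rule (double the bins plus one whenever a single bin holds more than 5 pairs) are simplifications I made up, based only on the description above.

```ruby
# A toy hash table: an array of bins, where each bin is an array of [key, value] pairs.
class ToyHash
  INITIAL_BINS = 11   # starting bin count, per the description above
  MAX_PER_BIN  = 5    # assumed growth trigger for this sketch

  attr_reader :bin_count

  def initialize
    @bin_count = INITIAL_BINS
    @bins = Array.new(@bin_count) { [] }
  end

  # Store a value: find the key's bin, then update or append the pair.
  def []=(key, value)
    bin  = bin_for(key)
    pair = bin.find { |k, _v| k.eql?(key) }
    if pair
      pair[1] = value
    else
      bin << [key, value]
      grow if bin.size > MAX_PER_BIN
    end
  end

  # Look up a value: only the one bin the key maps to is searched.
  def [](key)
    pair = bin_for(key).find { |k, _v| k.eql?(key) }
    pair && pair[1]
  end

  private

  # Every Ruby object responds to #hash; modulo picks one of the bins.
  def bin_for(key)
    @bins[key.hash % @bin_count]
  end

  # Create more bins and redistribute every existing pair (the slow part).
  def grow
    old_pairs  = @bins.flatten(1)
    @bin_count = @bin_count * 2 + 1
    @bins      = Array.new(@bin_count) { [] }
    old_pairs.each { |k, v| bin_for(k) << [k, v] }
  end
end

toy = ToyHash.new
100.times { |i| toy["key#{i}"] = i }
puts toy["key42"]     # prints 42
puts toy.bin_count    # larger than 11 after growing
```

However many pairs are stored, a lookup only ever scans one small bin, which is where the roughly constant lookup time comes from.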

Further Reading: