Sam Trenholme's webpage
Support this website

More on SipHash

 

October 9 2013

This blog attempts to better explain why SipHash was created.

==SipHash in Javascript?==

There is, believe it or not, a Javascript implementation of SipHash. While an interesting thought experiment, SipHash really isn’t an algorithm for running in a high-level scripting language. It’s an algorithm for low-level languages like C, LLVM, or assembler.

What should be using SipHash is not a script, but the underlying engine running a script.

==The type of code that would use SipHash==

Most high-level scripting language have a data type called either an “associative array”, a “hash”, or a “dictionary”. This data type allows a collection of data to be indexed by a string in addition to being indexed by number.

Here is how a dictionary is used in Python:

last_name = {}
last_name["John"] = ["Smith"]
last_name["Debby"] = ["Harry"]
last_name["Miley"] = ["Cyrus"]
last_name["Tom"] = ["Hanks", "Cruise"]

Here, “last_name” is a collection of last names, indexed by the person’s first name. It lets us know that, in our circle of friends, there are two Toms: Tom Hanks and Tom Cruise.

==Under the hood==

Computers, at a low level, do not index data by string, but instead index data by number. Under the hood, Python has special code that converts a string in to a number, and then puts the data in a “hash bucket” with that number.

==The problem SipHash solves==

The underlying code for converting a string in to a number is designed to not have a lot of strings converted in to the same number. If this assumption is broken, then certain types of attacks are possible.

Until SipHash came out last year, programmers had to resort to using ad-hoc methods to convert strings in to numbers in a way that was hard for attackers to guess. Deadwood, for example, uses a method that is secure assuming that the attacker does not know the numbers strings are being converted in to.

The reason Deadwood uses an ad-hoc method instead of a real cryptographic primitive was because there was not, when I designed Deadwood in 2007, a way of very quickly generating a small random number from a string using a cryptographic primitive. Indeed, I observed, while designing Deadwood in 2007, that “the quick and dirty hash [that Deadwood uses] is between 750 and 1400 times faster than Radio Gatun” for hashing short strings.

SipHash solves the very particular problem of implementing a hash data type in a low-level language where it’s important the “hash compression” function (the code that converts a string in to a number) is one where an attacker can not guess what number a given string will generate, even if they somehow know the numbers generated by other strings.

==See also==

My SipHash implementation SipHash

To post a comment about this blog entry, go to the forum (self-signed https). New accounts may post once I approve the account.