Compile-time string hashing

A forum to store posts deemed exceptionally wise and useful
Post Reply
Radikalizm
Posts: 1215
Joined: Tue Jan 09, 2007 7:03 pm
Location: Leuven, Belgium

Compile-time string hashing

Post by Radikalizm »

Note: If you are experienced with string comparisons and the problems associated with it, you can skip the wall of text below and go right to the link provided.

Let's imagine a scenario: Say that you've written this awesome game, and you've come up with a nice event/messaging system to update your scenes or to notify some node that something in your scene has happened, take an explosion for example.
When you implemented this system you thought about how to handle messages/events: Do you hardcode all different kinds of events into your game/engine using an enum? No, that would be quite limiting if you ever wanted to expand your list of possible events, and would be quite hard to manage after a while, so instead you chose to use strings to identify your events.
All fine and dandy, you might say, you can just send out events or messages using a string to identify the type of the event and the possible arguments you give to said event.
After a while your game is sending out tons of events and everything is handled correctly, but when you profile your application you start to see a bottleneck: String comparison

Let's look at string comparison for a while:
To compare 2 strings to eachother (maybe using a strcmp or a loop) you always have to traverse each character in said strings - after all, C-style strings are nothing more than null-terminated char arrays - so you can compare each and every one of them to see whether they match in both strings. This creates a linear complexity (ie. O(n) ), which means that your string comparison gets more complex, and therefore takes more time to complete, with a constant amount for every character you add.
In general programming a linear complexity is not all that bad, and it isn't that bad either in more high-performance programming (like games) when this kind of complexity doesn't appear all that much,
but we're in a situation here where we do a lot of these string comparisons which just keep on piling up, eating your game's performance.

What could we possibly do about this problem?
Of course it would be a good idea to try and reduce the amount of string comparisons in your game to a bare minimum, but obviously this is easier said than done and will never get rid of the problem completely.
What if I told you there's a way to compare strings in constant time? (ie. O(1) ) That would solve all of our string-related problems wouldn't it? Well, there is such a way, it may not solve all string-related problems, but it does solve the string comparison problem.
Meet String Hashing.

The process of string hashing is simple, for each unique string that needs to be compared you generate a unique hash (using your favorite hashing algorithm). This hash will be an integer, and because of the nature of hashing will be identical to a hash generated by an identical base string.
But hang on a minute, this doesn't solve our problem at all! To generate a hash from a string one would still need to traverse each character in said string, which brings us right back to our original problem!
Luckily there's a quite impressive solution to this problem: we can use our compiler to generate these hashes at compile-time! Better yet, we can make this work in a completely non-intrusive way, so no need to mess with the internals of your game or engine.
This completely removes the complexity of comparing strings or generating hashes from your application, and puts the burden on the compiler through some nice preprocessor programming or template metaprogramming. (whichever you like best)

I could explain the process right here, but I could never do it quite as well as they do on this page:


http://www.gamasutra.com/view/news/3819 ... ashing.php


They present 2 techniques for creating a quite ingenious non-intrusive compile-time string hashing implementation, which could also be used perfectly at run-time.

I'd really recommend any performance-oriented developer to check this out, as string comparisons can pose a major problem in many games.
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Compile-time string hashing

Post by hendu »

Do you hardcode all different kinds of events into your game/engine using an enum? No, that would be quite limiting
compile-time
I don't see the point? It's still compile-time.
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: Compile-time string hashing

Post by mongoose7 »

That's the point. It is only for static strings. The string is not stored anywhere and it is referenced only as a number. It's a bit like snake oil.
Radikalizm
Posts: 1215
Joined: Tue Jan 09, 2007 7:03 pm
Location: Leuven, Belgium

Re: Compile-time string hashing

Post by Radikalizm »

hendu wrote:
Do you hardcode all different kinds of events into your game/engine using an enum? No, that would be quite limiting
compile-time
I don't see the point? It's still compile-time.
There's a difference between manually defining enum values and being able to use strings in your application which automatically get hashed by the compiler. An algorithm being executed at compile-time is absolutely not the same as hardcoding values.
Say you have a central enum in your game engine which defines types of events. Managing this enum becomes a huge mess when your application requires a good amount of event types (which is pretty common for games) and manually adding values to this enum whenever you need a new type of event is absolutely not flexible. Using strings for this makes this all much easier since you're not tied to one central 'table' for defining event types. If you need a new type of event you can just enter a string (ie. "Explosion") as event type in your event structure and you're good to go.

This string-hashing implementation does not require you to do adjust your programming behaviour, you can just use strings like you're used to and the compiler will create hashes for them when a comparison is needed.

Let me give you another example:

Say you have a resource loader which loads in resources based on their filename, and which stores these resources in a map somewhere with the resource filename as a key. Let's say you also have a function which can return resources based on their names by looking through the resource map.

The code you'd write for this would look something like this:

Code: Select all

 
loadResource("someTexture.jpg");
Texture* someTexture = getResource("someTexture.jpg");
 
This would work fine, but it would need to use an expensive string comparison

With string hashing, you could use a string hash as a key in your resource map, and the exact same code as written above could be used to get the exact same functionality while eliminating the string comparison from the getResource call, since the "someTexture.jpg" string is converted into a string hash by the compiler

String hashing is a proven technique, and is used a lot especially in the game development industry.
I know you like to get into discussions with me hendu, but there's no reason to be skeptical about this one ;)

@mongoose7:

It's perfectly possible to adapt the implementation to store the string and the hash together, but I was just talking about the hashing part here
As stated, this algorithm also works perfectly at run-time, so it's completely possible to define a string and hash it. Of course re-hashing would be required whenever the string were to be modified, so yes it's only for static strings.
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: Compile-time string hashing

Post by mongoose7 »

So instead of writing loadResource(E_SomeTexture) you prefer loadResource("someTexture"). I think that is the only difference. The enum table handles itself because the code won't compile until it is defined. Your code won't compile until you add the string to your "central table".

I can't see the difference myself.

Oops, there is a difference - you can add E_SomeTexture to a switch statement.
Radikalizm
Posts: 1215
Joined: Tue Jan 09, 2007 7:03 pm
Location: Leuven, Belgium

Re: Compile-time string hashing

Post by Radikalizm »

mongoose7 wrote:So instead of writing loadResource(E_SomeTexture) you prefer loadResource("someTexture"). I think that is the only difference. The enum table handles itself because the code won't compile until it is defined. Your code won't compile until you add the string to your "central table".

I can't see the difference myself.

Oops, there is a difference - you can add E_SomeTexture to a switch statement.
Why wouldn't you be able to add string hashes to a switch statement?
hendu
Posts: 2600
Joined: Sat Dec 18, 2010 12:53 pm

Re: Compile-time string hashing

Post by hendu »

So instead of one centrally defined enum, this style would have "Explosion" in X different places. Maybe one of them typos it as "Exlposion". Maybe it's merely inconvenient having to change it in X places.

I still see strings, even with this hack, as much messier.
serengeor
Posts: 1712
Joined: Tue Jan 13, 2009 7:34 pm
Location: Lithuania

Re: Compile-time string hashing

Post by serengeor »

I've used this (well not exactly this one, I found some piece of code that does it somewhere else) for string based searching. I had my entities named using strings but also calculated hashes for those names so when I would search for an entity I would calculate the hash for search string and compare it with the hashes in the list. Should give a bit of speed I think.
Working on game: Marrbles (Currently stopped).
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: Compile-time string hashing

Post by mongoose7 »

For the specific case of, say, looking up textures, you could have findTex(char *tex) and, inside that function, hash the name and look it up that way. That is a valid technique an ddoes not affect the interface. But the discussion here is concering a method of hashing which is explicit and applies across all the code. To take the example of findTex(), all code accessing that function would have to be aware of the hashing.
serengeor
Posts: 1712
Joined: Tue Jan 13, 2009 7:34 pm
Location: Lithuania

Re: Compile-time string hashing

Post by serengeor »

mongoose7 wrote:To take the example of findTex(), all code accessing that function would have to be aware of the hashing.
Why is that so? (seriously I can't find the reason, please share your thoughts)
Working on game: Marrbles (Currently stopped).
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: Compile-time string hashing

Post by mongoose7 »

We *are* talking about the method outlined by Radikalism, yes?

When we call findTex(specialHash("texname")), "specialHash" is basically a compiler directive, so you would need the header that specifies it. Also, "texname" has to be a literal string.

Perhaps you are thinking that you could handle the hashing inside findTex?

Code: Select all

int findTex(char *texname)
{
  int hash = specialHash(texname);       <==== compile error
 
..
}
Remember, the "specialHash" is calculated at compile time.
serengeor
Posts: 1712
Joined: Tue Jan 13, 2009 7:34 pm
Location: Lithuania

Re: Compile-time string hashing

Post by serengeor »

Oh right, I mistakenly thought about the implementation I used. It wasn't compile time as this is, though maybe for things like this it would be more aplicable :roll:
Working on game: Marrbles (Currently stopped).
REDDemon
Developer
Posts: 1044
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Re: Compile-time string hashing

Post by REDDemon »

what about 32 bit IDs? :-/
Junior Irrlicht Developer.
Real value in social networks is not about "increasing" number of followers, but about getting in touch with Amazing people.
- by Me
Post Reply