Compile-time string hashing
Posted: Wed Jan 18, 2012 10:11 am
Note: If you are experienced with string comparisons and the problems associated with it, you can skip the wall of text below and go right to the link provided.
Let's imagine a scenario: Say that you've written this awesome game, and you've come up with a nice event/messaging system to update your scenes or to notify some node that something in your scene has happened, take an explosion for example.
When you implemented this system you thought about how to handle messages/events: Do you hardcode all different kinds of events into your game/engine using an enum? No, that would be quite limiting if you ever wanted to expand your list of possible events, and would be quite hard to manage after a while, so instead you chose to use strings to identify your events.
All fine and dandy, you might say, you can just send out events or messages using a string to identify the type of the event and the possible arguments you give to said event.
After a while your game is sending out tons of events and everything is handled correctly, but when you profile your application you start to see a bottleneck: String comparison
Let's look at string comparison for a while:
To compare 2 strings to eachother (maybe using a strcmp or a loop) you always have to traverse each character in said strings - after all, C-style strings are nothing more than null-terminated char arrays - so you can compare each and every one of them to see whether they match in both strings. This creates a linear complexity (ie. O(n) ), which means that your string comparison gets more complex, and therefore takes more time to complete, with a constant amount for every character you add.
In general programming a linear complexity is not all that bad, and it isn't that bad either in more high-performance programming (like games) when this kind of complexity doesn't appear all that much,
but we're in a situation here where we do a lot of these string comparisons which just keep on piling up, eating your game's performance.
What could we possibly do about this problem?
Of course it would be a good idea to try and reduce the amount of string comparisons in your game to a bare minimum, but obviously this is easier said than done and will never get rid of the problem completely.
What if I told you there's a way to compare strings in constant time? (ie. O(1) ) That would solve all of our string-related problems wouldn't it? Well, there is such a way, it may not solve all string-related problems, but it does solve the string comparison problem.
Meet String Hashing.
The process of string hashing is simple, for each unique string that needs to be compared you generate a unique hash (using your favorite hashing algorithm). This hash will be an integer, and because of the nature of hashing will be identical to a hash generated by an identical base string.
But hang on a minute, this doesn't solve our problem at all! To generate a hash from a string one would still need to traverse each character in said string, which brings us right back to our original problem!
Luckily there's a quite impressive solution to this problem: we can use our compiler to generate these hashes at compile-time! Better yet, we can make this work in a completely non-intrusive way, so no need to mess with the internals of your game or engine.
This completely removes the complexity of comparing strings or generating hashes from your application, and puts the burden on the compiler through some nice preprocessor programming or template metaprogramming. (whichever you like best)
I could explain the process right here, but I could never do it quite as well as they do on this page:
http://www.gamasutra.com/view/news/3819 ... ashing.php
They present 2 techniques for creating a quite ingenious non-intrusive compile-time string hashing implementation, which could also be used perfectly at run-time.
I'd really recommend any performance-oriented developer to check this out, as string comparisons can pose a major problem in many games.
Let's imagine a scenario: Say that you've written this awesome game, and you've come up with a nice event/messaging system to update your scenes or to notify some node that something in your scene has happened, take an explosion for example.
When you implemented this system you thought about how to handle messages/events: Do you hardcode all different kinds of events into your game/engine using an enum? No, that would be quite limiting if you ever wanted to expand your list of possible events, and would be quite hard to manage after a while, so instead you chose to use strings to identify your events.
All fine and dandy, you might say, you can just send out events or messages using a string to identify the type of the event and the possible arguments you give to said event.
After a while your game is sending out tons of events and everything is handled correctly, but when you profile your application you start to see a bottleneck: String comparison
Let's look at string comparison for a while:
To compare 2 strings to eachother (maybe using a strcmp or a loop) you always have to traverse each character in said strings - after all, C-style strings are nothing more than null-terminated char arrays - so you can compare each and every one of them to see whether they match in both strings. This creates a linear complexity (ie. O(n) ), which means that your string comparison gets more complex, and therefore takes more time to complete, with a constant amount for every character you add.
In general programming a linear complexity is not all that bad, and it isn't that bad either in more high-performance programming (like games) when this kind of complexity doesn't appear all that much,
but we're in a situation here where we do a lot of these string comparisons which just keep on piling up, eating your game's performance.
What could we possibly do about this problem?
Of course it would be a good idea to try and reduce the amount of string comparisons in your game to a bare minimum, but obviously this is easier said than done and will never get rid of the problem completely.
What if I told you there's a way to compare strings in constant time? (ie. O(1) ) That would solve all of our string-related problems wouldn't it? Well, there is such a way, it may not solve all string-related problems, but it does solve the string comparison problem.
Meet String Hashing.
The process of string hashing is simple, for each unique string that needs to be compared you generate a unique hash (using your favorite hashing algorithm). This hash will be an integer, and because of the nature of hashing will be identical to a hash generated by an identical base string.
But hang on a minute, this doesn't solve our problem at all! To generate a hash from a string one would still need to traverse each character in said string, which brings us right back to our original problem!
Luckily there's a quite impressive solution to this problem: we can use our compiler to generate these hashes at compile-time! Better yet, we can make this work in a completely non-intrusive way, so no need to mess with the internals of your game or engine.
This completely removes the complexity of comparing strings or generating hashes from your application, and puts the burden on the compiler through some nice preprocessor programming or template metaprogramming. (whichever you like best)
I could explain the process right here, but I could never do it quite as well as they do on this page:
http://www.gamasutra.com/view/news/3819 ... ashing.php
They present 2 techniques for creating a quite ingenious non-intrusive compile-time string hashing implementation, which could also be used perfectly at run-time.
I'd really recommend any performance-oriented developer to check this out, as string comparisons can pose a major problem in many games.