LZ77 Very Slow

Discussion about everything. New games, 3d math, development tips...
Alpha Omega
Posts: 288
Joined: Wed Oct 29, 2008 12:07 pm

LZ77 Very Slow

Post by Alpha Omega »

I made an implementation of the LZ77 algorithm to learn the ins and outs of file compression. I compared the compression ratios to that of ZIP and for files with heavy repetition they were not as good but not terrible. However the code is soooo slow. It works quick for small files (of course) but large files >=5 MB can take minutes!!! I am using a 4096 byte moving window varient for the sliding window. I know the searching is costly for possible searching 4096 times per byte in the file, my question is how do ZIP and other compression programs execute so damn fast!? How do you profile your code for speed? I am trying to find the bottle neck in the code but am having trouble :(.
CuteAlien
Admin
Posts: 9670
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: LZ77 Very Slow

Post by CuteAlien »

Use a profiler. On Linux there is gprof and on Windows a free profiler is "very sleepy" (really, that's it's name ^_^). Or add timer-code into your functions (for example like this: http://irrlicht.sourceforge.net/forum/v ... highlight=).
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
Alpha Omega
Posts: 288
Joined: Wed Oct 29, 2008 12:07 pm

Re: LZ77 Very Slow

Post by Alpha Omega »

Thanks, that program is very good. I replaced the brute force search method I was using with a 256 x 256 index table. I am now compressing 50 MB @ 26.66 seconds and decompressing @ 10.00 seconds. Right now I am getting > 50% compression on most text files. I am going to implement a huffman code algorithm before the LZ77 (I think 7zip does this?) to hopefully achieve a good final compression.

I know Irrlicht uses zlib for compression right? For games and file compression, I know the decompression speed is most desired so you can decompress on the fly and put the data right into the game, right? I am hoping to get the final version to about 10 seconds compression and 5 seconds decompression by the time I am finished. I just don't know if the compression % will be as good, we will see.
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: LZ77 Very Slow

Post by mongoose7 »

Huffman is not good. Its only distinction is that, given a particular example, it will compress it better than any other method, but it does not then compress other data well. That is, you can set it up to compress just one piece of data.
Alpha Omega
Posts: 288
Joined: Wed Oct 29, 2008 12:07 pm

Re: LZ77 Very Slow

Post by Alpha Omega »

Huffman will only not work if the file contains all 256 different byte combinations that are of equal probability. So prolly not good with image files but text files/log files can achieved great compression. Also using huffman algorithm will not eliminate the redundancy in the bit stream, as I see it replacing an 8 bit combination with a 3 bit combination will still result in the same encoded pattern giving further compression and faster LZ77 execution.

But the best compression is with arithmetic coding models, however the speed is terrible. This algorithm will be for very quick execution to possibly encode information through transmissions.

I have a problem with Huffman algorithm when it comes to transmitting the table used during compression. So maybe I will choose 2-3 predetermined look up tables to do the encoding, prolly more inefficient but will keep the speed cost low.
mongoose7
Posts: 1227
Joined: Wed Apr 06, 2011 12:13 pm

Re: LZ77 Very Slow

Post by mongoose7 »

Alpha Omega wrote:Huffman will only not work if the file contains all 256 different byte combinations that are of equal probability. So prolly not good with image files but text files/log files can achieved great compression. Also using huffman algorithm will not eliminate the redundancy in the bit stream, as I see it replacing an 8 bit combination with a 3 bit combination will still result in the same encoded pattern giving further compression and faster LZ77 execution.
Well, I was going to walk on by, and maybe I should, but:

2^3 is 8. So you are compressing files with only 8 distinct characters in them?
Alpha Omega
Posts: 288
Joined: Wed Oct 29, 2008 12:07 pm

Re: LZ77 Very Slow

Post by Alpha Omega »

No I was just using it as an example. Obviously you have use to use more bits for less probable characters. Anyway your right, huffman code is not very efficient compared to arithmetic coding, and I think I can make an implementation that is pretty quick, combining a LZW style dictionary into the arithmetic coding algorithm.
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Re: LZ77 Very Slow

Post by hybrid »

Maybe try bzip or lzma
REDDemon
Developer
Posts: 1044
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Re: LZ77 Very Slow

Post by REDDemon »

The fastest should be gzip. lzma has very good compression but is slower (not to much). Every compression algorithm has pros and contros (for games i think that zlib and gzip are very good.. other formats are good for server/clients wich needs to minimize bandwith when downloading). I never tried to use directly libraries like zlib or libpng. I spent some time on data crypt/decrypt streams. but not on data compression.
Junior Irrlicht Developer.
Real value in social networks is not about "increasing" number of followers, but about getting in touch with Amazing people.
- by Me
Virion
Competition winner
Posts: 2148
Joined: Mon Dec 18, 2006 5:04 am

Re: LZ77 Very Slow

Post by Virion »

idsoft used zlib (I think) for doom3 and the compression is pretty good.
The assets for Doom 3 are 3.6GB uncompressed, and only 1.44GB after being packed up.
http://www.iddevnet.com/doom3/packfiles.php
CuteAlien
Admin
Posts: 9670
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: LZ77 Very Slow

Post by CuteAlien »

Jake03 a bot or not - that is the question. Typical bot pattern (the kind that copy-pastes texts which sound similar from elsewhere and adds spam a few days later), but could also be uninformed coder who asks for code-help without showing code...
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
hybrid
Admin
Posts: 14143
Joined: Wed Apr 19, 2006 9:20 pm
Location: Oldenburg(Oldb), Germany
Contact:

Re: LZ77 Very Slow

Post by hybrid »

Yep, actually the mentioned code is missing. But assuming this was just a copy-paste error, this post might make sense. But well, looking at the two other posts, it's either bot or troll, so maybe ready for deletion anyway...
CuteAlien
Admin
Posts: 9670
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: LZ77 Very Slow

Post by CuteAlien »

Ok, Jake03 turned out to be a bot (he added spam now). Grr.. really, really hate that combination of delayed spam + bots copying text that sounds similar to thread-text from somewhere.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
REDDemon
Developer
Posts: 1044
Joined: Tue Aug 31, 2010 8:06 pm
Location: Genova (Italy)

Re: LZ77 Very Slow

Post by REDDemon »

What about adding extra antibot questions for users wich have less than X messages? So before posting any message on the forum if you have less than 30 messages a question will be asked.
Junior Irrlicht Developer.
Real value in social networks is not about "increasing" number of followers, but about getting in touch with Amazing people.
- by Me
CuteAlien
Admin
Posts: 9670
Joined: Mon Mar 06, 2006 2:25 pm
Location: Tübingen, Germany
Contact:

Re: LZ77 Very Slow

Post by CuteAlien »

We're not even sure if that's bots anymore - or just people in some poor country which are payed for spamming. But the idea is sound - I'll ask Yoran if something like that could be done.
IRC: #irrlicht on irc.libera.chat
Code snippet repository: https://github.com/mzeilfelder/irr-playground-micha
Free racer made with Irrlicht: http://www.irrgheist.com/hcraftsource.htm
Post Reply