Page 1 of 2

LZ77 Very Slow

Posted: Sun Oct 16, 2011 1:15 pm
by Alpha Omega
I made an implementation of the LZ77 algorithm to learn the ins and outs of file compression. I compared the compression ratios to that of ZIP and for files with heavy repetition they were not as good but not terrible. However the code is soooo slow. It works quick for small files (of course) but large files >=5 MB can take minutes!!! I am using a 4096 byte moving window varient for the sliding window. I know the searching is costly for possible searching 4096 times per byte in the file, my question is how do ZIP and other compression programs execute so damn fast!? How do you profile your code for speed? I am trying to find the bottle neck in the code but am having trouble :(.

Re: LZ77 Very Slow

Posted: Sun Oct 16, 2011 7:07 pm
by CuteAlien
Use a profiler. On Linux there is gprof and on Windows a free profiler is "very sleepy" (really, that's it's name ^_^). Or add timer-code into your functions (for example like this: http://irrlicht.sourceforge.net/forum/v ... highlight=).

Re: LZ77 Very Slow

Posted: Thu Oct 20, 2011 1:18 am
by Alpha Omega
Thanks, that program is very good. I replaced the brute force search method I was using with a 256 x 256 index table. I am now compressing 50 MB @ 26.66 seconds and decompressing @ 10.00 seconds. Right now I am getting > 50% compression on most text files. I am going to implement a huffman code algorithm before the LZ77 (I think 7zip does this?) to hopefully achieve a good final compression.

I know Irrlicht uses zlib for compression right? For games and file compression, I know the decompression speed is most desired so you can decompress on the fly and put the data right into the game, right? I am hoping to get the final version to about 10 seconds compression and 5 seconds decompression by the time I am finished. I just don't know if the compression % will be as good, we will see.

Re: LZ77 Very Slow

Posted: Thu Oct 20, 2011 1:46 am
by mongoose7
Huffman is not good. Its only distinction is that, given a particular example, it will compress it better than any other method, but it does not then compress other data well. That is, you can set it up to compress just one piece of data.

Re: LZ77 Very Slow

Posted: Thu Oct 20, 2011 2:14 am
by Alpha Omega
Huffman will only not work if the file contains all 256 different byte combinations that are of equal probability. So prolly not good with image files but text files/log files can achieved great compression. Also using huffman algorithm will not eliminate the redundancy in the bit stream, as I see it replacing an 8 bit combination with a 3 bit combination will still result in the same encoded pattern giving further compression and faster LZ77 execution.

But the best compression is with arithmetic coding models, however the speed is terrible. This algorithm will be for very quick execution to possibly encode information through transmissions.

I have a problem with Huffman algorithm when it comes to transmitting the table used during compression. So maybe I will choose 2-3 predetermined look up tables to do the encoding, prolly more inefficient but will keep the speed cost low.

Re: LZ77 Very Slow

Posted: Thu Oct 20, 2011 9:34 am
by mongoose7
Alpha Omega wrote:Huffman will only not work if the file contains all 256 different byte combinations that are of equal probability. So prolly not good with image files but text files/log files can achieved great compression. Also using huffman algorithm will not eliminate the redundancy in the bit stream, as I see it replacing an 8 bit combination with a 3 bit combination will still result in the same encoded pattern giving further compression and faster LZ77 execution.
Well, I was going to walk on by, and maybe I should, but:

2^3 is 8. So you are compressing files with only 8 distinct characters in them?

Re: LZ77 Very Slow

Posted: Thu Oct 20, 2011 10:19 pm
by Alpha Omega
No I was just using it as an example. Obviously you have use to use more bits for less probable characters. Anyway your right, huffman code is not very efficient compared to arithmetic coding, and I think I can make an implementation that is pretty quick, combining a LZW style dictionary into the arithmetic coding algorithm.

Re: LZ77 Very Slow

Posted: Fri Oct 21, 2011 11:04 am
by hybrid
Maybe try bzip or lzma

Re: LZ77 Very Slow

Posted: Tue Nov 22, 2011 1:06 am
by REDDemon
The fastest should be gzip. lzma has very good compression but is slower (not to much). Every compression algorithm has pros and contros (for games i think that zlib and gzip are very good.. other formats are good for server/clients wich needs to minimize bandwith when downloading). I never tried to use directly libraries like zlib or libpng. I spent some time on data crypt/decrypt streams. but not on data compression.

Re: LZ77 Very Slow

Posted: Tue Nov 22, 2011 3:49 am
by Virion
idsoft used zlib (I think) for doom3 and the compression is pretty good.
The assets for Doom 3 are 3.6GB uncompressed, and only 1.44GB after being packed up.
http://www.iddevnet.com/doom3/packfiles.php

Re: LZ77 Very Slow

Posted: Fri Dec 02, 2011 9:44 am
by CuteAlien
Jake03 a bot or not - that is the question. Typical bot pattern (the kind that copy-pastes texts which sound similar from elsewhere and adds spam a few days later), but could also be uninformed coder who asks for code-help without showing code...

Re: LZ77 Very Slow

Posted: Fri Dec 02, 2011 12:27 pm
by hybrid
Yep, actually the mentioned code is missing. But assuming this was just a copy-paste error, this post might make sense. But well, looking at the two other posts, it's either bot or troll, so maybe ready for deletion anyway...

Re: LZ77 Very Slow

Posted: Tue Dec 06, 2011 10:17 am
by CuteAlien
Ok, Jake03 turned out to be a bot (he added spam now). Grr.. really, really hate that combination of delayed spam + bots copying text that sounds similar to thread-text from somewhere.

Re: LZ77 Very Slow

Posted: Tue Dec 06, 2011 8:33 pm
by REDDemon
What about adding extra antibot questions for users wich have less than X messages? So before posting any message on the forum if you have less than 30 messages a question will be asked.

Re: LZ77 Very Slow

Posted: Tue Dec 06, 2011 9:11 pm
by CuteAlien
We're not even sure if that's bots anymore - or just people in some poor country which are payed for spamming. But the idea is sound - I'll ask Yoran if something like that could be done.