I just put together salza 0.5. Everything should be around two times faster. I learned a lot about LispWorks, Allegro, and SBCMUCL in the process. Well, mostly LispWorks.
Salza implements the DEFLATE data format; this produces a stream of literal values and length, distance pairs. It also uses a naive version the compression algorithm suggested in the RFC: keep a hash table of the positions of three-byte combinations in the input stream, and use the table to continually look for past matches to incoming data. (There are wrinkles, but that's the gist.)
Statistical profilers are very handy tools. Both Allegro and SBCL make it very obvious where to look for bottlenecks.
Lisp hash tables were one of the most obvious bottlenecks. Since the keys and values are all in the range of 0 to #xFFFFFF, I wrote a specialized hash table that is somewhat faster than the default implementation. Then I found out that #xFFFFFF is a bignum on LispWorks, and cried a little inside.
Another obvious bottleneck was REPLACE. Salza used REPLACE to copy data from buffer to buffer, and it was a significant bottleneck on SBCL, after hash tables. Writing a version of REPLACE with lots of octet-vector declarations was a performance win.
SBCL and CMUCL use type inference to generate very fast code. They can also automagically use unboxed mod-32 arithmetic. I changed my implementation of the CRC32 algorithm from one that worked on 32 bit values (which didn't cons at all in SBCL) to one that worked on the high and low 16 bits instead to avoid a lot of overhead in Allegro and LispWorks.
(LispWorks has a special int32 type, but then you're forced to use a separate, non-standard interface for working with objects of that type. Ugh.)
(simple-array foo) is not equivalent to (simple-array foo (*)); getting it right improves things a lot.
Anything not inlined can be a big loser. I managed to eliminate many non-inline calls to AREF, ASH, and bitwise logical ops, but there is still room for improvement.
In fact, there is a lot of room for improvement everywhere. This has been a real brain-stretching exercise; it's my first time working with LispWorks and Allegro, my first hash table implementation, my first time staring at disassembly, my first time trying to make accurate type declarations to squeeze out performance. It's been a lot of fun, and I hope I can make salza faster and less ugly as time goes by.