Class LZ77Compressor
Most LZ77 derived algorithms split input data into blocks of uncompressed data (called literal blocks) and back-references (pairs of offsets and lengths)
that state "add length
bytes that are the same as those already written starting offset
bytes before the current position. The details of how
those blocks and back-references are encoded are quite different between the algorithms and some algorithms perform additional steps (Huffman encoding in the
case of DEFLATE for example).
This class attempts to extract the core logic - finding back-references - so it can be re-used. It follows the algorithm explained in section 4 of RFC 1951 (DEFLATE) and currently doesn't implement the "lazy match" optimization. The three-byte hash function used in this class is the same as the one used by zlib and InfoZIP's ZIP implementation of DEFLATE. The whole class is strongly inspired by InfoZIP's implementation.
LZ77 is used vaguely here (as well as many other places that talk about it :-), LZSS would likely be closer to the truth but LZ77 has become the synonym for a whole family of algorithms.
The API consists of a compressor that is fed byte
s and emits LZ77Compressor.Block
s to a registered callback where the blocks represent either
literal blocks
, back-references
or end of data markers
. In order to ensure the callback receives all
information, the #finish
method must be used once all data has been fed into the compressor.
Several parameters influence the outcome of the "compression":
windowSize
- the size of the sliding window, must be a power of two - this determines the maximum offset a back-reference can take. The compressor maintains a buffer
of twice of
windowSize
- real world values are in the area of 32k. minBackReferenceLength
- Minimal length of a back-reference found. A true minimum of 3 is hard-coded inside of this implementation but bigger lengths can be configured.
maxBackReferenceLength
- Maximal length of a back-reference found.
maxOffset
- Maximal offset of a back-reference.
maxLiteralLength
- Maximal length of a literal block.
- Since:
- 1.14
- See Also:
-
- "https://tools.ietf.org/html/rfc1951#section-4"
- This class is not thread-safe
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic final class
Represents a back-reference.static class
Base class representing blocks the compressor may emit.static interface
Callback invoked while the compressor processes data.static final class
A simple "we are done" marker.static final class
Represents a literal block of data. -
Constructor Summary
ConstructorDescriptionLZ77Compressor
(Parameters params, LZ77Compressor.Callback callback) Initializes a compressor with parameters and a callback. -
Method Summary
Modifier and TypeMethodDescriptionvoid
compress
(byte[] data) Feeds bytes into the compressor which in turn may emit zero or more blocks to the callback during the execution of this method.void
compress
(byte[] data, int off, int len) Feeds bytes into the compressor which in turn may emit zero or more blocks to the callback during the execution of this method.void
finish()
Tells the compressor to process all remaining data and signal end of data to the callback.void
prefill
(byte[] data) Adds some initial data to fill the window with.
-
Constructor Details
-
LZ77Compressor
Initializes a compressor with parameters and a callback.- Parameters:
params
- the parameterscallback
- the callback- Throws:
NullPointerException
- if either parameter isnull
-
-
Method Details
-
compress
Feeds bytes into the compressor which in turn may emit zero or more blocks to the callback during the execution of this method.- Parameters:
data
- the data to compress - must not be null- Throws:
IOException
- if the callback throws an exception
-
compress
Feeds bytes into the compressor which in turn may emit zero or more blocks to the callback during the execution of this method.- Parameters:
data
- the data to compress - must not be nulloff
- the start offset of the datalen
- the number of bytes to compress- Throws:
IOException
- if the callback throws an exception
-
finish
Tells the compressor to process all remaining data and signal end of data to the callback.The compressor will in turn emit at least one block (
LZ77Compressor.EOD
) but potentially multiple blocks to the callback during the execution of this method.- Throws:
IOException
- if the callback throws an exception
-
prefill
Adds some initial data to fill the window with.This is used if the stream has been cut into blocks and back-references of one block may refer to data of the previous block(s). One such example is the LZ4 frame format using block dependency.
- Parameters:
data
- the data to fill the window with.- Throws:
IllegalStateException
- if the compressor has already started to accept data
-