WriterOptions configures Writer. LGWin is the base 2 logarithm of the sliding window size.
Range is 10 to 24. 0 indicates automatic configuration based on Quality. Quality controls the compression-speed vs compression-density trade-offs.
The higher the quality, the slower the compression. Range is 0 to 11.
func NewWriterOptions(dst io.Writer, options WriterOptions) *Writer
A (forgetful) hash table to the data seen by the compressor, to
help create backward references to previous data.
Hashes are stored in chains which are bucketed to groups. Group of chains
share a storage "bank". When more than "bank size" chain nodes are added,
oldest nodes are replaced; this way several chains may share a tail.addr[]uint32bankBitsuintbanks[][]slotbucketBitsuintfree_slot_idx[]uint16hasherCommonhasherCommonhasherCommon.dict_num_lookupsuinthasherCommon.dict_num_matchesuinthasherCommon.is_prepared_boolhasherCommon.paramshasherParamshead[]uint16max_hopsuintnumBanksuintnumLastDistancesToCheckinttiny_hash[65536]byte(*hashForgetfulChain) Common() *hasherCommon Find a longest backward match of &data[cur_ix] up to the length of
max_length and stores the position cur_ix in the hash table.
REQUIRES: PrepareDistanceCachehashForgetfulChain must be invoked for current distance cache
values; if this method is invoked repeatedly with the same distance
cache values, it is enough to invoke PrepareDistanceCachehashForgetfulChain once.
Does not look for matches longer than max_length.
Does not look for matches further away than max_backward.
Writes the best match into |out|.
|out|->score is updated only if a better match is found. HashBytes is the function that chooses the bucket to place the address in.(*hashForgetfulChain) HashTypeLength() uint(*hashForgetfulChain) Initialize(params *encoderParams)(*hashForgetfulChain) Prepare(one_shot bool, input_size uint, data []byte)(*hashForgetfulChain) PrepareDistanceCache(distance_cache []int)(*hashForgetfulChain) StitchToPreviousBlock(num_bytes uint, position uint, ringbuffer []byte, ring_buffer_mask uint) Look at 4 bytes at &data[ix & mask]. Compute a hash from these, and prepend
node to corresponding chain; also update tiny_hash for current position.(*hashForgetfulChain) StoreLookahead() uint(*hashForgetfulChain) StoreRange(data []byte, mask uint, ix_start uint, ix_end uint)
*hashForgetfulChain : hasherHandle
A (forgetful) hash table to the data seen by the compressor, to
help create backward references to previous data.
This is a hash map of fixed size (1 << 16). Starting from the
given index, 1 buckets are used to store values of a key.bucketBitsuintbucketSweepintbuckets[]uint32hasherCommonhasherCommonhasherCommon.dict_num_lookupsuinthasherCommon.dict_num_matchesuinthasherCommon.is_prepared_boolhasherCommon.paramshasherParamshashLenuintuseDictionarybool(*hashLongestMatchQuickly) Common() *hasherCommon Find a longest backward match of &data[cur_ix & ring_buffer_mask]
up to the length of max_length and stores the position cur_ix in the
hash table.
Does not look for matches longer than max_length.
Does not look for matches further away than max_backward.
Writes the best match into |out|.
|out|->score is updated only if a better match is found. HashBytes is the function that chooses the bucket to place
the address in. The HashLongestMatch and hashLongestMatchQuickly
classes have separate, different implementations of hashing.(*hashLongestMatchQuickly) HashTypeLength() uint(*hashLongestMatchQuickly) Initialize(params *encoderParams)(*hashLongestMatchQuickly) Prepare(one_shot bool, input_size uint, data []byte)(*hashLongestMatchQuickly) PrepareDistanceCache(distance_cache []int)(*hashLongestMatchQuickly) StitchToPreviousBlock(num_bytes uint, position uint, ringbuffer []byte, ringbuffer_mask uint) Look at 5 bytes at &data[ix & mask].
Compute a hash from these, and store the value somewhere within
[ix .. ix+3].(*hashLongestMatchQuickly) StoreLookahead() uint(*hashLongestMatchQuickly) StoreRange(data []byte, mask uint, ix_start uint, ix_end uint)
*hashLongestMatchQuickly : hasherHandle
Package-Level Functions (total 454, in which 6 are exported)
HTTPCompressor chooses a compression method (brotli, gzip, or none) based on
the Accept-Encoding header, sets the Content-Encoding header, and returns a
WriteCloser that implements that compression. The Close method must be called
before the current HTTP handler returns.
NewReader creates a new Reader reading the given reader.
Writes to the returned writer are compressed and written to dst.
It is the caller's responsibility to call Close on the Writer when done.
Writes may be buffered and not flushed until Close.
NewWriterLevel is like NewWriter but specifies the compression level instead
of assuming DefaultCompression.
The compression level can be DefaultCompression or any integer value between
BestSpeed and BestCompression inclusive.
NewWriterOptions is like NewWriter but specifies WriterOptions
NewWriterV2 is like NewWriterLevel, but it uses the new implementation
based on the matchfinder package. It currently supports up to level 9;
if a higher level is specified, level 9 will be used.
Usually, we always choose the longest backward reference. This function
allows for the exception of that rule.
If we choose a backward reference that is further away, it will
usually be coded with more bits. We approximate this by assuming
log2(distance). If the distance can be expressed in terms of the
last four distances, we use some heuristic constants to estimate
the bits cost. For the first up to four literals we use the bit
cost of the literals from the literal cost model, after that we
use the average bit cost of the cost model.
This function is used to sometimes discard a longer backward reference
when it is not much longer and the bit cost for encoding it is more
than the saved literals.
backward_reference_offset MUST be positive.
Ensures that accumulator is not empty.
May consume up to sizeof(brotli_reg_t) - 1 bytes of input.
Returns false if data is required but there is no input available.
For BROTLI_ALIGNED_READ this function also prepares bit reader for aligned
reading.
Adds the next symbol to the current histogram. When the current histogram
reaches the target size, decides on merging the block.
Adds the next symbol to the current histogram. When the current histogram
reaches the target size, decides on merging the block.
Adds the next symbol to the current histogram. When the current histogram
reaches the target size, decides on merging the block.
Does either of three things:
(1) emits the current block with a new block type;
(2) emits the current block with the type of the second last block;
(3) merges the current block with the last block.
Does either of three things:
(1) emits the current block with a new block type;
(2) emits the current block with the type of the second last block;
(3) merges the current block with the last block.
Does either of three things:
(1) emits the current block with a new block type;
(2) emits the current block with the type of the second last block;
(3) merges the current block with the last block.
Builds a literal prefix code into "depths" and "bits" based on the statistics
of the "input" string and stores it into the bit stream.
Note that the prefix code here is built from the pre-LZ77 input, therefore
we can only approximate the statistics of the actual literal stream.
Moreover, for long inputs we build a histogram from a sample of the input
and thus have to assign a non-zero depth for each literal.
Returns estimated compression ratio millibytes/char for encoding given input
with generated code.
Calculates the smallest feasible ring buffer.
If we know the data size is small, do not allocate more ring buffer
size than needed to reduce memory usage.
When this method is called, metablock size and flags MUST be decoded.
Computes the bit cost reduction by combining out[idx1] and out[idx2] and if
it is below a threshold, stores the pair (idx1, idx2) in the *pairs queue.
Computes the bit cost reduction by combining out[idx1] and out[idx2] and if
it is below a threshold, stores the pair (idx1, idx2) in the *pairs queue.
Computes the bit cost reduction by combining out[idx1] and out[idx2] and if
it is below a threshold, stores the pair (idx1, idx2) in the *pairs queue.
Compresses "input" string to the "*storage" buffer as one or more complete
meta-blocks, and updates the "*storage_ix" bit position.
If "is_last" is 1, emits an additional empty last meta-block.
"cmd_depth" and "cmd_bits" contain the command and distance prefix codes
(see comment in encode.h) used for the encoding of this input fragment.
If "is_last" is 0, they are updated to reflect the statistics
of this input fragment, to be used for the encoding of the next fragment.
"*cmd_code_numbits" is the number of bits of the compressed representation
of the command and distance prefix codes, and "cmd_code" is an array of
at least "(*cmd_code_numbits + 7) >> 3" size that contains the compressed
command and distance prefix codes. If "is_last" is 0, these are also
updated to represent the updated "cmd_depth" and "cmd_bits".
REQUIRES: "input_size" is greater than zero, or "is_last" is 1.
REQUIRES: "input_size" is less or equal to maximal metablock size (1 << 24).
REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.
REQUIRES: "table_size" is an odd (9, 11, 13, 15) power of two
OUTPUT: maximal copy distance <= |input_size|
OUTPUT: maximal copy distance <= BROTLI_MAX_BACKWARD_LIMIT(18)
Compresses "input" string to the "*storage" buffer as one or more complete
meta-blocks, and updates the "*storage_ix" bit position.
If "is_last" is 1, emits an additional empty last meta-block.
REQUIRES: "input_size" is greater than zero, or "is_last" is 1.
REQUIRES: "input_size" is less or equal to maximal metablock size (1 << 24).
REQUIRES: "command_buf" and "literal_buf" point to at least
kCompressFragmentTwoPassBlockSize long arrays.
REQUIRES: All elements in "table[0..table_size-1]" are initialized to zero.
REQUIRES: "table_size" is a power of two
OUTPUT: maximal copy distance <= |input_size|
OUTPUT: maximal copy distance <= BROTLI_MAX_BACKWARD_LIMIT(18)
Fills in dist_cache[0..3] with the last four distances (as defined by
Section 4. of the Spec) that would be used at (block_start + pos) if we
used the shortest path of commands from block_start, computed from
nodes[0..pos]. The last four distances at block_start are in
starting_dist_cache[0..3].
REQUIRES: nodes[pos].cost < kInfinity
REQUIRES: nodes[0..pos] satisfies that "ZopfliNode array invariant".
Returns the minimum possible copy length that can improve the cost of any
future position.
Returns log2 of the size of main ring buffer area.
Allocate at least lgwin + 1 bits for the ring buffer so that the newly
added block fits there completely and we still get lgwin bits and at least
read_block_size_bits + 1 bits because the copy tail length needs to be
smaller than ring-buffer size.
Adds the next symbol to the current block type and context. When the
current block reaches the target size, decides on merging the block.
Does either of three things:
(1) emits the current block with a new block type;
(2) emits the current block with the type of the second last block;
(3) merges the current block with the last block.
Get the actual bit values for a tree of bit depths.
Copies remaining input bytes stored in the bit reader to the output. Value
|num| may not be larger than BrotliGetRemainingBytes. The bit reader must be
warmed up again after this.
Copies the given input data to the internal ring buffer of the compressor.
No processing of the data occurs at this time and this function can be
called multiple times before calling WriteBrotliData() to process the
accumulated input. At most input_block_size() bytes of input data can be
copied to the ring buffer, otherwise the next WriteBrotliData() will fail.
This function will create a Huffman tree.
The catch here is that the tree cannot be arbitrarily deep.
Brotli specifies a maximum depth of 15 bits for "code trees"
and 7 bits for "code length code trees."
count_limit is the value that is to be faked as the minimum value
and this minimum value is raised until the tree matches the
maximum length requirement.
This algorithm is not of excellent performance for very long data blocks,
especially when population counts are longer than 2**tree_limit, but
we are not planning to use this with extremely long blocks.
See http://en.wikipedia.org/wiki/Huffman_coding
Block switch for insert/copy length.
Reads 3..54 bits.
Decodes a context map.
Decoding is done in 4 phases:
1) Read auxiliary information (6..16 bits) and allocate memory.
In case of trivial context map, decoding is finished at this phase.
2) Decode Huffman table using ReadHuffmanCode function.
This table will be used for reading context map items.
3) Read context map items; "0" values could be run-length encoded.
4) Optionally, apply InverseMoveToFront transform to the resulting map.
Decodes the block type and updates the state for literal context.
Reads 3..54 bits.
Decodes a metablock length and flags by reading 2 - 31 bits.
Invariant: input stream is never overconsumed:
- invalid input implies that the whole stream is invalid -> any amount of
input could be read and discarded
- when result is "needs more input", then at least one more byte is REQUIRED
to complete decoding; all input data MUST be consumed by decoder, so
client could swap the input buffer
- when result is "needs more output" decoder MUST ensure that it doesn't
hold more than 7 bits in bit reader; this saves client from swapping input
buffer ahead of time
- when result is "success" decoder MUST return all unused data back to input
buffer; this is possible because the invariant is held on enter
Decodes the Huffman code.
This method doesn't read data from the bit reader, BUT drops the amount of
bits that correspond to the decoded symbol.
bits MUST contain at least 15 (BROTLI_HUFFMAN_MAX_CODE_LENGTH) valid bits.
Decodes a number in the range [0..255], by reading 1 - 11 bits.
Decodes WBITS by reading 1 - 7 bits, or 0x11 for "Large Window Brotli".
Precondition: bit-reader accumulator has at least 8 bits.
Processes the accumulated input data and writes
the new output meta-block to s.dest, if one has been
created (otherwise the processed input data is buffered internally).
If |is_last| or |force_flush| is true, an output meta-block is
always created. However, until |is_last| is true encoder may retain up
to 7 bits of the last byte of output. To force encoder to dump the remaining
bits use WriteMetadata() to append an empty meta-data block.
Returns false if the size of the input data is larger than
input_block_size().
|nibblesbits| represents the 2 bits to encode MNIBBLES (0-3)
REQUIRES: length > 0
REQUIRES: length <= (1 << 24)
Allocates ring-buffer.
s->ringbuffer_size MUST be updated by BrotliCalculateRingBufferSize before
this function is called.
Last two bytes of ring-buffer are initialized to 0, so context calculation
could be done uniformly for the first two and all other positions.
Faster logarithm for small integers, with the property of log2(0) == 0.
Guarantees that there are at least |n_bits| + 1 bits in accumulator.
Precondition: accumulator contains at least 1 bit.
|n_bits| should be in the range [1..24] for regular build. For portable
non-64-bit little-endian build only 16 bits are safe to request.
Mostly like BrotliFillBitWindow, but guarantees only 16 bits and reads no
more than BROTLI_SHORT_FILL_BIT_WINDOW_READ bytes of input.
Finds all backward matches of &data[cur_ix & ring_buffer_mask] up to the
length of max_length and stores the position cur_ix in the hash table.
Sets *num_matches to the number of matches found, and stores the found
matches in matches[0] to matches[*num_matches - 1]. The matches will be
sorted by strictly increasing length and (non-strictly) increasing
distance.
Assigns a block id from the range [0, num_histograms) to each data element
in data[0..length) and fills in block_id[0..length) with the assigned values.
Returns the number of blocks, i.e. one plus the number of block switches.
Assigns a block id from the range [0, num_histograms) to each data element
in data[0..length) and fills in block_id[0..length) with the assigned values.
Returns the number of blocks, i.e. one plus the number of block switches.
Assigns a block id from the range [0, num_histograms) to each data element
in data[0..length) and fills in block_id[0..length) with the assigned values.
Returns the number of blocks, i.e. one plus the number of block switches.
Function to find maximal matching prefixes of strings.
Copyright 2013 Google Inc. All Rights Reserved.
Distributed under MIT license.
See file LICENSE for detail or copy at https://opensource.org/licenses/MIT
Find the best 'out' histogram for each of the 'in' histograms.
When called, clusters[0..num_clusters) contains the unique values from
symbols[0..in_size), but this property is not preserved in this function.
Note: we assume that out[]->bit_cost_ is already up-to-date.
Find the best 'out' histogram for each of the 'in' histograms.
When called, clusters[0..num_clusters) contains the unique values from
symbols[0..in_size), but this property is not preserved in this function.
Note: we assume that out[]->bit_cost_ is already up-to-date.
Decodes a series of Huffman table using ReadHuffmanCode function.
Transform:
1) initialize list L with values 0, 1,... 255
2) For each input element X:
2.1) let Y = L[X]
2.2) remove X-th element from L
2.3) prepend Y to L
2.4) append Y to output
In most cases max(Y) <= 7, so most of L remains intact.
To reduce the cost of initialization, we reuse L, remember the upper bound
of Y values, and reinitialize only first elements in L.
Most of input values are 0 and 1. To reduce number of branches, we replace
inner for loop with do-while.
When searching for backward references and have not seen matches for a long
time, we can skip some match lookups. Unsuccessful match lookups are very
expensive and this kind of a heuristic speeds up compression quite a lot.
At first 8 byte strides are taken and every second byte is put to hasher.
After 4x more literals stride by 16 bytes, every put 4-th byte to hasher.
Applied only to qualities 2 to 9.
negotiateContentEncoding returns the best offered content encoding for the
request's Accept-Encoding header. If two offers match with equal weight and
then the offer earlier in the list is preferred. If no offers are
acceptable, then "" is returned.
Returns the table width of the next 2nd level table. |count| is the histogram
of bit lengths for the remaining symbols, |len| is the code length of the
next processed symbol.
Change the population counts in a way that the consequent
Huffman tree compression, especially its RLE-part will be more
likely to compress this data more efficiently.
length contains the size of the histogram.
counts contains the population counts.
good_for_rle is a buffer of at least length size
parseAccept parses Accept* headers.
Returns 1 if at least min_fraction of the bytes between pos and
pos + length in the (data, mask) ring-buffer is UTF8-encoded, otherwise
returns 0.
Process repeated symbol code length.
A) Check if it is the extension of previous repeat sequence; if the decoded
value is not BROTLI_REPEAT_PREVIOUS_CODE_LENGTH, then it is a new
symbol-skip
B) Update repeat variable
C) Check if operation is feasible (fits alphabet)
D) For each symbol do the same operations as in ProcessSingleCodeLength
PRECONDITION: code_len == BROTLI_REPEAT_PREVIOUS_CODE_LENGTH or
code_len == BROTLI_REPEAT_ZERO_CODE_LENGTH
Process single decoded symbol code length:
A) reset the repeat variable
B) remember code length (if it is not 0)
C) extend corresponding index-chain
D) reduce the Huffman space
E) update the histogram
Tries to pull one byte of input to accumulator.
Returns false if there is no input available.
Decodes the Huffman tables.
There are 2 scenarios:
A) Huffman code contains only few symbols (1..4). Those symbols are read
directly; their code lengths are defined by the number of symbols.
For this scenario 4 - 49 bits will be read.
B) 2-phase decoding:
B.1) Small Huffman table is decoded; it is specified with code lengths
encoded with predefined entropy code. 32 - 74 bits are used.
B.2) Decoded table is used to decode code lengths of symbols in resulting
Huffman table. In worst case 3520 bits are read.
Decodes the next Huffman code using data prepared by PreloadSymbol.
Reads 0 - 15 bits. Also peeks 8 following bits.
Reads (s->symbol + 1) symbols.
Totally 1..4 symbols are read, 1..11 bits each.
The list of symbols MUST NOT contain duplicates.
Reads and decodes the next Huffman code from bit-stream.
This method peeks 16 bits of input and drops 0 - 15 of them.
Returns reverse(num >> BROTLI_REVERSE_BITS_BASE, BROTLI_REVERSE_BITS_MAX),
where reverse(value, len) is the bit-wise reversal of the len least
significant bits of value.
Finds runs of zeros in v[0..in_size) and replaces them with a prefix code of
the run length plus extra bits (lower 9 bits is the prefix code and the rest
are the extra bits). Non-zero values in v[] are shifted by
*max_length_prefix. Will not create prefix codes bigger than the initial
value of *max_run_length_prefix. The prefix code of run length L is simply
Log2Floor(L) and the number of extra bits is the same as the prefix code.
Stores the hash of the next 4 bytes and in a single tree-traversal, the
hash bucket's binary tree is searched for matches and is re-rooted at the
current position.
If less than 128 data is available, the hash bucket of the
current position is searched for matches, but the state of the hash table
is not changed, since we can not know the final sorting order of the
current (incomplete) sequence.
This function must be called with increasing cur_ix positions.
Stores the block switch command with index block_ix to the bit stream.
This function writes bits into bytes in increasing addresses, and within
a byte least-significant-bit first.
The function can write up to 56 bits in one go with WriteBits
Example: let's assume that 3 bits (Rs below) have been written already:
BYTE-0 BYTE+1 BYTE+2
0000 0RRR 0000 0000 0000 0000
Now, we could write 5 or less bits in MSB by just sifting by 3
and OR'ing to BYTE-0.
For n bits, we take the last 5 bits, OR that with high bits in BYTE-0,
and locate the rest in BYTE+1, BYTE+2, etc.
Write a Huffman tree from bit depths into the bit-stream representation
of a Huffman tree. The generated Huffman tree is to be compressed once
more using a Huffman tree
Dumps remaining output bits and metadata header to |header|.
Returns number of produced bytes.
REQUIRED: |header| should be 8-byte aligned and at least 16 bytes long.
REQUIRED: |block_size| <= (1 << 24).
Dumps output.
Returns BROTLI_DECODER_NEEDS_MORE_OUTPUT only if there is more output to push
and either ring-buffer is as big as window size, or |force| is true.
Computes the shortest path of commands from position to at most
position + num_bytes.
On return, path->size() is the number of commands found and path[i] is the
length of the i-th command (copy length plus insert length).
Note that the sum of the lengths of all commands can be less than num_bytes.
On return, the nodes[0..num_bytes] array will have the following
"ZopfliNode array invariant":
For each i in [1..num_bytes], if nodes[i].cost < kInfinity, then
(1) nodes[i].copy_length() >= 2
(2) nodes[i].command_length() <= i and
(3) nodes[i - nodes[i].command_length()].cost < kInfinity
REQUIRES: nodes != nil and len(nodes) >= num_bytes + 1
A lookup table for small values of log2(int) to be used in entropy
computation.
", ".join(["%.16ff" % x for x in [0.0]+[log2(x) for x in range(1, 256)]])
Maximum possible Huffman table size for an alphabet size of (index * 32),
max code length 15 and root table bits 8.
kHashMul32 multiplier has these properties:
* The multiplier must be odd. Otherwise we may lose the highest bit.
* No long streaks of ones or zeros.
* There is no effort to ensure that it is a prime, the oddity is enough
for this use.
* The number has been tuned heuristically against compression benchmarks.
We need the slack region for the following reasons:
- doing up to two 16-byte copies for fast backward copying
- inserting transformed dictionary word (5 prefix + 24 base + 8 suffix)
* Operations that can be performed by streaming encoder.
* Operations that can be performed by streaming encoder.
* Operations that can be performed by streaming encoder.
* Operations that can be performed by streaming encoder.
readBufSize is a "good" buffer size that avoids excessive round-trips
between C and Go but doesn't waste too much memory on buffering.
It is arbitrarily chosen to be equal to the constant used in io.Copy.
The pages are generated with Goldsv0.8.4. (GOOS=linux GOARCH=amd64)
Golds is a Go 101 project developed by Tapir Liu.
PR and bug reports are welcome and can be submitted to the issue list.
Please follow @zigo_101 (reachable from the left QR code) to get the latest news of Golds.