Involved Source Files: decode.go, decode_other.go, encode.go, encode_other.go
Package snapref implements the Snappy compression format. It aims for very
high speeds and reasonable compression.
There are actually two Snappy formats: block and stream. They are related,
but different: trying to decompress block-compressed data as a Snappy stream
will fail, and vice versa. The block format is the Decode and Encode
functions and the stream format is the Reader and Writer types.
The block format, the more common case, is used when the complete size (the
number of bytes) of the original data is known upfront, at the time
compression starts. The stream format, also known as the framing format, is
for when that isn't always true.
The canonical, C++ implementation is at https://github.com/google/snappy and
it only implements the block format.
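Because the two formats are not interchangeable, it can help to check which one a byte slice holds before choosing Decode or Reader. The sketch below relies on the fact that, per framing_format.txt, every stream starts with a stream identifier chunk (type 0xff, length 6, magic "sNaPpY"); the helper name looksLikeStream is hypothetical and not part of this package.

```go
package main

import (
	"bytes"
	"fmt"
)

// streamIdentifier is the chunk that starts every Snappy stream:
// chunk type 0xff, a 3-byte little-endian length of 6, then "sNaPpY".
var streamIdentifier = []byte("\xff\x06\x00\x00sNaPpY")

// looksLikeStream is a hypothetical helper reporting whether data begins with
// the stream identifier. Block-format data has no magic header, so a false
// result does not prove that data is a valid block.
func looksLikeStream(data []byte) bool {
	return bytes.HasPrefix(data, streamIdentifier)
}

func main() {
	fmt.Println(looksLikeStream([]byte("\xff\x06\x00\x00sNaPpY..."))) // true
	// A block: varint length 3, literal tag for 3 bytes (2<<2), then "abc".
	fmt.Println(looksLikeStream([]byte{0x03, 0x08, 'a', 'b', 'c'})) // false
}
```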
Package-Level Type Names (total 2, both are exported)
Reader is an io.Reader that can read Snappy-compressed bytes.
Reader handles the Snappy stream format, not the Snappy block format.
Fields (unexported):
- buf []byte
- decoded []byte
- err error
- i, j int: decoded[i:j] contains decoded bytes that have not yet been passed on.
- r io.Reader
- readHeader bool
Methods:
- Read satisfies the io.Reader interface.
- ReadByte satisfies the io.ByteReader interface.
- Reset discards any buffered data, resets all state, and switches the Snappy reader to read from r. This permits reusing a Reader rather than allocating a new one.
- (*Reader) fill() error
- (*Reader) readFull(p []byte, allowEOF bool) (ok bool)
*Reader : github.com/klauspost/compress/flate.Reader
*Reader : compress/flate.Reader
*Reader : io.ByteReader
*Reader : io.Reader
func NewReader(r io.Reader) *Reader
Writer is an io.Writer that can write Snappy-compressed bytes.
Writer handles the Snappy stream format, not the Snappy block format.
Fields (unexported):
- err error
- ibuf []byte: a buffer for the incoming (uncompressed) bytes. Its use is optional. For backwards compatibility, Writers created by the NewWriter function have ibuf == nil, do not buffer incoming bytes, and therefore do not need to be flushed or closed.
- obuf []byte: a buffer for the outgoing (compressed) bytes.
- w io.Writer
- wroteStreamHeader bool: records whether the stream header has been written.
Methods:
- Close calls Flush and then closes the Writer.
- Flush flushes the Writer to its underlying io.Writer.
- Reset discards the writer's state and switches the Snappy writer to write to w. This permits reusing a Writer rather than allocating a new one.
- Write satisfies the io.Writer interface.
- (*Writer) write(p []byte) (nRet int, errRet error)
*Writer : internal/bisect.Writer
*Writer : io.Closer
*Writer : io.WriteCloser
*Writer : io.Writer
*Writer : github.com/refraction-networking/utls.transcriptHash
*Writer : crypto/tls.transcriptHash
func NewBufferedWriter(w io.Writer) *Writer
func NewWriter(w io.Writer) *Writer
Package-Level Functions (total 17, in which 8 are exported)
Decode returns the decoded form of src. The returned slice may be a sub-
slice of dst if dst was large enough to hold the entire decoded block.
Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
Decode handles the Snappy block format, not the Snappy stream format.
DecodedLen returns the length of the decoded block.
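The decoded length is stored as a uvarint at the very start of each block, so it can be read without decoding anything else. A minimal sketch using the standard library's binary.Uvarint, assuming a 64-bit int; decodedLenSketch and errCorrupt are hypothetical names, not this package's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errCorrupt is a hypothetical error value for this sketch.
var errCorrupt = errors.New("snappy sketch: corrupt input")

// decodedLenSketch reads the uvarint header at the start of a Snappy block and
// returns the decoded length and the number of bytes the header occupied.
func decodedLenSketch(src []byte) (blockLen, headerLen int, err error) {
	v, n := binary.Uvarint(src)
	// n == 0: truncated varint; n < 0: value overflowed 64 bits.
	// Snappy block lengths must also fit in 32 bits.
	if n <= 0 || v > 0xffffffff {
		return 0, 0, errCorrupt
	}
	return int(v), n, nil
}

func main() {
	// 300 encodes as the two-byte uvarint 0xac 0x02.
	l, h, err := decodedLenSketch([]byte{0xac, 0x02, 0x00})
	fmt.Println(l, h, err) // 300 2 <nil>
}
```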
Encode returns the encoded form of src. The returned slice may be a sub-
slice of dst if dst was large enough to hold the entire encoded block.
Otherwise, a newly allocated slice will be returned.
The dst and src must not overlap. It is valid to pass a nil dst.
Encode handles the Snappy block format, not the Snappy stream format.
EncodeBlockInto exposes encodeBlock but checks dst size.
MaxEncodedLen returns the maximum length of a snappy block, given its
uncompressed length.
It will return a negative value if srcLen is too large to encode.
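As a worked example of the bound, the sketch below uses the worst-case formula 32 + n + n/6 from the reference Go Snappy encoder; treat the exact formula as an assumption if your version differs. With n = 65536 (the maximum block size) it gives 32 + 65536 + 10922 = 76490. maxEncodedLenSketch is a hypothetical name.

```go
package main

import "fmt"

// maxEncodedLenSketch returns a worst-case compressed size for srcLen input
// bytes, or -1 if either the input length or the bound exceeds 32 bits.
func maxEncodedLenSketch(srcLen int) int {
	n := uint64(srcLen) // a negative srcLen becomes huge and is rejected below
	if n > 0xffffffff {
		return -1
	}
	n = 32 + n + n/6
	if n > 0xffffffff {
		return -1
	}
	return int(n)
}

func main() {
	fmt.Println(maxEncodedLenSketch(65536)) // 76490
	fmt.Println(maxEncodedLenSketch(0))     // 32
}
```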
NewBufferedWriter returns a new Writer that compresses to w, using the
framing format described at
https://github.com/google/snappy/blob/master/framing_format.txt
The Writer returned buffers writes. Users must call Close to guarantee all
data has been forwarded to the underlying io.Writer. They may also call
Flush zero or more times before calling Close.
NewReader returns a new Reader that decompresses from r, using the framing
format described at
https://github.com/google/snappy/blob/master/framing_format.txt
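A stream in the framing format is a sequence of chunks, each starting with a 4-byte header: a 1-byte chunk type followed by a 3-byte little-endian body length (for example 0x00 for compressed data, 0x01 for uncompressed data, 0xff for the stream identifier). The sketch below parses just that header; chunkHeader and parseChunkHeader are hypothetical names, not Reader internals.

```go
package main

import (
	"errors"
	"fmt"
)

// chunkHeader describes one framing-format chunk header.
type chunkHeader struct {
	Type   byte // chunk type, e.g. 0x00 compressed, 0x01 uncompressed, 0xff stream identifier
	Length int  // length of the chunk body that follows the 4-byte header
}

// parseChunkHeader decodes a 1-byte chunk type and a 3-byte little-endian length.
func parseChunkHeader(b []byte) (chunkHeader, error) {
	if len(b) < 4 {
		return chunkHeader{}, errors.New("short chunk header")
	}
	n := int(b[1]) | int(b[2])<<8 | int(b[3])<<16
	return chunkHeader{Type: b[0], Length: n}, nil
}

func main() {
	h, err := parseChunkHeader([]byte("\xff\x06\x00\x00sNaPpY"))
	fmt.Println(h.Type == 0xff, h.Length, err) // true 6 <nil>
}
```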
NewWriter returns a new Writer that compresses to w.
The Writer returned does not buffer writes. There is no need to Flush or
Close such a Writer.
Deprecated: the Writer returned is not suitable for many small writes, only
for few large writes. Use NewBufferedWriter instead, which is efficient
regardless of the frequency and shape of the writes, and remember to Close
that Writer when done.
crc implements the checksum specified in section 3 of
https://github.com/google/snappy/blob/master/framing_format.txt
decode writes the decoding of src to dst. It assumes that the varint-encoded
length of the decompressed bytes has already been read, and that len(dst)
equals that length.
It returns 0 on success or a decodeErrCodeXxx error code on failure.
decodedLen returns the length of the decoded block and the number of bytes
that the length header occupied.
emitCopy writes a copy chunk and returns the number of bytes written.
It assumes that:
- dst is long enough to hold the encoded bytes
- 1 <= offset && offset <= 65535
- 4 <= length && length <= 65535
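Since a single copy tag encodes at most 64 bytes, longer copies must be split into several chunks. The sketch below, under the same assumptions as emitCopy, shows one way to do that using the l == 1 and l == 2 layouts from the format description. It is modeled on the approach of the reference Go encoder, but emitCopySketch is a hypothetical name and may differ from the code in encode_other.go.

```go
package main

import "fmt"

const (
	tagCopy1 = 0x01 // l == 1: 2-byte copy, length 4..11, offset < 2048
	tagCopy2 = 0x02 // l == 2: 3-byte copy, length 1..64, offset < 65536
)

// emitCopySketch writes copy chunks for (offset, length) into dst and returns
// the number of bytes written.
func emitCopySketch(dst []byte, offset, length int) int {
	i := 0
	// Emit 64-byte copies while at least 68 remain, so the rest stays >= 4.
	for length >= 68 {
		dst[i+0] = 63<<2 | tagCopy2
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		i += 3
		length -= 64
	}
	// For 65..67, emit 60 first so the remainder (5..7) can use the 2-byte form.
	if length > 64 {
		dst[i+0] = 59<<2 | tagCopy2
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		i += 3
		length -= 60
	}
	if length >= 12 || offset >= 2048 {
		// 3-byte form: m = length-1, then a 2-byte little-endian offset.
		dst[i+0] = uint8(length-1)<<2 | tagCopy2
		dst[i+1] = uint8(offset)
		dst[i+2] = uint8(offset >> 8)
		return i + 3
	}
	// 2-byte form: offset bits 8-10 in the high 3 bits of m, length-4 in the low 3.
	dst[i+0] = uint8(offset>>8)<<5 | uint8(length-4)<<2 | tagCopy1
	dst[i+1] = uint8(offset)
	return i + 2
}

func main() {
	dst := make([]byte, 16)
	n := emitCopySketch(dst, 10, 4)
	fmt.Printf("% x\n", dst[:n]) // 01 0a
}
```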
emitLiteral writes a literal chunk and returns the number of bytes written.
It assumes that:
- dst is long enough to hold the encoded bytes
- 1 <= len(lit) && len(lit) <= 65536
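A literal chunk stores len(lit)-1 either directly in m (values below 60) or in 1 or 2 extra little-endian bytes signaled by m == 60 or m == 61; two bytes suffice because len(lit) <= 65536. A minimal sketch under the same assumptions as emitLiteral; emitLiteralSketch is a hypothetical name.

```go
package main

import "fmt"

const tagLiteral = 0x00 // l == 0

// emitLiteralSketch writes lit as a single literal chunk into dst and returns
// the number of bytes written (tag bytes plus the literal itself).
func emitLiteralSketch(dst, lit []byte) int {
	i, n := 0, uint(len(lit)-1)
	switch {
	case n < 60:
		// Length fits in m itself.
		dst[0] = uint8(n)<<2 | tagLiteral
		i = 1
	case n < 1<<8:
		// m == 60: one extra length byte follows.
		dst[0] = 60<<2 | tagLiteral
		dst[1] = uint8(n)
		i = 2
	default:
		// m == 61: two extra little-endian length bytes follow.
		dst[0] = 61<<2 | tagLiteral
		dst[1] = uint8(n)
		dst[2] = uint8(n >> 8)
		i = 3
	}
	return i + copy(dst[i:], lit)
}

func main() {
	dst := make([]byte, 8)
	n := emitLiteralSketch(dst, []byte("abc"))
	fmt.Printf("% x\n", dst[:n]) // 08 61 62 63
}
```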
encodeBlock encodes a non-empty src to a guaranteed-large-enough dst. It
assumes that the varint-encoded length of the decompressed bytes has already
been written.
It also assumes that:
len(dst) >= MaxEncodedLen(len(src)) &&
minNonLiteralBlockSize <= len(src) && len(src) <= maxBlockSize
Package-Level Constants
inputMargin is the minimum number of extra input bytes to keep, inside
encodeBlock's inner loop. On some architectures, this margin lets us
implement a fast path for emitLiteral, where the copy of short (<= 16 byte)
literals can be implemented as a single load to and store from a 16-byte
register. That literal's actual length can be as short as 1 byte, so this
can copy up to 15 bytes too much, but that's OK as subsequent iterations of
the encoding loop will fix up the copy overrun, and this inputMargin ensures
that we don't overrun the dst and src buffers.
maxBlockSize is the maximum size of the input to encodeBlock. It is not
part of the wire format per se, but some parts of the encoder assume
that an offset fits into a uint16.
Also, for the framing format (Writer type instead of Encode function),
https://github.com/google/snappy/blob/master/framing_format.txt says
that "the uncompressed data in a chunk must be no longer than 65536
bytes".
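Because of this limit, a stream writer has to cut arbitrary input into pieces of at most maxBlockSize bytes before encoding each one, which also keeps every copy offset within a uint16. A minimal sketch of that splitting; splitBlocks is a hypothetical helper, and the package's Writer performs its own splitting internally.

```go
package main

import "fmt"

// maxBlockSize mirrors the 65536-byte chunk limit from framing_format.txt.
const maxBlockSize = 65536

// splitBlocks returns consecutive sub-slices of p, each at most maxBlockSize
// bytes long. The sub-slices share p's backing array; nothing is copied.
func splitBlocks(p []byte) [][]byte {
	var blocks [][]byte
	for len(p) > 0 {
		n := len(p)
		if n > maxBlockSize {
			n = maxBlockSize
		}
		blocks = append(blocks, p[:n])
		p = p[n:]
	}
	return blocks
}

func main() {
	for _, b := range splitBlocks(make([]byte, 150000)) {
		fmt.Println(len(b)) // 65536, 65536, 18928
	}
}
```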
maxEncodedLenOfMaxBlockSize equals MaxEncodedLen(maxBlockSize), but is
hard coded to be a const instead of a variable, so that obufLen can also
be a const. Their equivalence is confirmed by
TestMaxEncodedLenOfMaxBlockSize.
minNonLiteralBlockSize is the minimum size of the input to encodeBlock that
could be encoded with a copy tag. This is the minimum with respect to the
algorithm used by encodeBlock, not a minimum enforced by the file format.
The encoded output must start with at least a 1 byte literal, as there are
no previous bytes to copy. A minimal (1 byte) copy after that, generated
from an emitCopy call in encodeBlock's main loop, would require at least
another inputMargin bytes, for the reason above: we want any emitLiteral
calls inside encodeBlock's main loop to use the fast path if possible, which
requires being able to overrun by inputMargin bytes. Thus,
minNonLiteralBlockSize equals 1 + 1 + inputMargin.
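Working the arithmetic through: a 16-byte load and store can overrun by up to 15 bytes, so taking inputMargin = 16 - 1 (an assumption inferred from the explanation above) gives minNonLiteralBlockSize = 1 + 1 + 15 = 17. Inputs shorter than that can only be emitted as a single literal. The helper usesCopySearch is hypothetical and only illustrates the threshold test.

```go
package main

import "fmt"

const (
	// Assumed value: a 16-byte copy may write up to 15 bytes too many.
	inputMargin = 16 - 1
	// One literal byte, one minimal copy, plus room for the overrun.
	minNonLiteralBlockSize = 1 + 1 + inputMargin
)

// usesCopySearch reports whether an input of n bytes is long enough for the
// main encoding loop to look for copies at all.
func usesCopySearch(n int) bool {
	return n >= minNonLiteralBlockSize
}

func main() {
	fmt.Println(minNonLiteralBlockSize)                  // 17
	fmt.Println(usesCopySearch(16), usesCopySearch(17)) // false true
}
```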
The C++ code doesn't use this exact threshold, but it could, as discussed at
https://groups.google.com/d/topic/snappy-compression/oGbhsdIJSJ8/discussion
The difference between Go (2+inputMargin) and C++ (inputMargin) is purely an
optimization. It should not affect the encoded form. This is tested by
TestSameEncodingAsCppShortCopies.
Each encoded block begins with the varint-encoded length of the decoded data,
followed by a sequence of chunks. Chunks begin and end on byte boundaries. The
first byte of each chunk is broken into its 2 least and 6 most significant bits
called l and m: l ranges in [0, 4) and m ranges in [0, 64). l is the chunk tag.
Zero means a literal tag. All other values mean a copy tag.
For literal tags:
- If m < 60, the next 1 + m bytes are literal bytes.
- Otherwise, let n be the little-endian unsigned integer denoted by the next
m - 59 bytes. The next 1 + n bytes after that are literal bytes.
For copy tags, length bytes are copied from offset bytes ago, in the style of
Lempel-Ziv compression algorithms. In particular:
- For l == 1, the offset ranges in [0, 1<<11) and the length in [4, 12).
The length is 4 + the low 3 bits of m. The high 3 bits of m form bits 8-10
of the offset. The next byte is bits 0-7 of the offset.
- For l == 2, the offset ranges in [0, 1<<16) and the length in [1, 65).
The length is 1 + m. The offset is the little-endian unsigned integer
denoted by the next 2 bytes.
- For l == 3, this tag is a legacy format that is no longer issued by most
encoders. Nonetheless, the offset ranges in [0, 1<<32) and the length in
[1, 65). The length is 1 + m. The offset is the little-endian unsigned
integer denoted by the next 4 bytes.
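The chunk rules above are enough to write a straightforward decoder. The sketch below favors readability over speed, assumes a 64-bit int, and always allocates its output; decodeBlockSketch and errCorrupt are hypothetical names, not the optimized decode in decode_other.go. Note that the copy loop goes byte by byte so that overlapping copies (offset < length) repeat recently written bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errCorrupt = errors.New("snappy sketch: corrupt block")

// decodeBlockSketch decodes one Snappy block (varint length, then chunks).
func decodeBlockSketch(src []byte) ([]byte, error) {
	dLen, n := binary.Uvarint(src) // varint-encoded decoded length
	if n <= 0 || dLen > 0xffffffff {
		return nil, errCorrupt
	}
	src = src[n:]
	dst := []byte{}
	for len(src) > 0 {
		l := src[0] & 0x03    // chunk tag: the 2 least significant bits
		m := int(src[0] >> 2) // the 6 most significant bits
		var length, offset int
		switch l {
		case 0: // literal
			src = src[1:]
			length = m + 1
			if m >= 60 {
				k := m - 59 // 1 to 4 little-endian length bytes follow
				if len(src) < k {
					return nil, errCorrupt
				}
				x := 0
				for i := 0; i < k; i++ {
					x |= int(src[i]) << (8 * i)
				}
				src = src[k:]
				length = x + 1
			}
			if length > len(src) || uint64(len(dst)+length) > dLen {
				return nil, errCorrupt
			}
			dst = append(dst, src[:length]...)
			src = src[length:]
			continue
		case 1: // 2-byte copy: length 4..11, 11-bit offset
			if len(src) < 2 {
				return nil, errCorrupt
			}
			length = 4 + m&0x07
			offset = (m>>3)<<8 | int(src[1])
			src = src[2:]
		case 2: // 3-byte copy: length 1..64, 16-bit offset
			if len(src) < 3 {
				return nil, errCorrupt
			}
			length = 1 + m
			offset = int(binary.LittleEndian.Uint16(src[1:3]))
			src = src[3:]
		default: // l == 3: legacy 5-byte copy, 32-bit offset
			if len(src) < 5 {
				return nil, errCorrupt
			}
			length = 1 + m
			offset = int(binary.LittleEndian.Uint32(src[1:5]))
			src = src[5:]
		}
		// Offset 0 is representable but meaningless; reject it, and reject
		// offsets reaching before the start of the output.
		if offset == 0 || offset > len(dst) || uint64(len(dst)+length) > dLen {
			return nil, errCorrupt
		}
		start := len(dst) - offset
		for i := 0; i < length; i++ {
			dst = append(dst, dst[start+i])
		}
	}
	if uint64(len(dst)) != dLen {
		return nil, errCorrupt
	}
	return dst, nil
}

func main() {
	// Length 10; literal "abc"; then a 2-byte copy of length 7 from offset 3.
	src := []byte{0x0a, 0x08, 'a', 'b', 'c', 0x0d, 0x03}
	out, err := decodeBlockSketch(src)
	fmt.Println(string(out), err) // abcabcabca <nil>
}
```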
The pages are generated with Golds v0.8.4 (GOOS=linux GOARCH=amd64).