Package: github.com/klauspost/compress/internal/snapref

Import Path: github.com/klauspost/compress/internal/snapref

Dependency Relation: imports 4 packages, and imported by one package

Involved Source Files: decode.go, decode_other.go, encode.go, encode_other.go, snappy.go

Package snapref implements the Snappy compression format. It aims for very high speeds and reasonable compression.

There are actually two Snappy formats: block and stream. They are related, but different: trying to decompress block-compressed data as a Snappy stream will fail, and vice versa. The block format is handled by the Decode and Encode functions, and the stream format by the Reader and Writer types.

The block format, the more common case, is used when the complete size (the number of bytes) of the original data is known upfront, at the time compression starts. The stream format, also known as the framing format, is for when that isn't always true.

The canonical C++ implementation is at https://github.com/google/snappy and it only implements the block format.

Package-Level Type Names (total 2, both are exported)


type Reader (struct)
    Reader is an io.Reader that can read Snappy-compressed bytes. Reader handles the Snappy stream format, not the Snappy block format.
    Fields (total 7, none are exported)
        buf []byte
        decoded []byte
        err error
        i int
            decoded[i:j] contains decoded bytes that have not yet been passed on.
        j int
            decoded[i:j] contains decoded bytes that have not yet been passed on.
        r io.Reader
        readHeader bool
    Methods (total 5, in which 3 are exported)
        (*Reader) Read(p []byte) (int, error)
            Read satisfies the io.Reader interface.
        (*Reader) ReadByte() (byte, error)
            ReadByte satisfies the io.ByteReader interface.
        (*Reader) Reset(reader io.Reader)
            Reset discards any buffered data, resets all state, and switches the Snappy reader to read from the given reader. This permits reusing a Reader rather than allocating a new one.
        /* 2 unexporteds: */
        (*Reader) fill() error
        (*Reader) readFull(p []byte, allowEOF bool) (ok bool)
    Implements (at least 4, all are exported)
        *Reader : github.com/klauspost/compress/flate.Reader
        *Reader : compress/flate.Reader
        *Reader : io.ByteReader
        *Reader : io.Reader
    As Outputs Of (at least one exported)
        func NewReader(r io.Reader) *Reader

type Writer (struct)
    Writer is an io.Writer that can write Snappy-compressed bytes. Writer handles the Snappy stream format, not the Snappy block format.
    Fields (total 5, none are exported)
        err error
        ibuf []byte
            ibuf is a buffer for the incoming (uncompressed) bytes. Its use is optional. For backwards compatibility, Writers created by the NewWriter function have ibuf == nil, do not buffer incoming bytes, and therefore do not need to be Flush'ed or Close'd.
        obuf []byte
            obuf is a buffer for the outgoing (compressed) bytes.
        w io.Writer
        wroteStreamHeader bool
            wroteStreamHeader is whether we have written the stream header.
    Methods (total 5, in which 4 are exported)
        (*Writer) Close() error
            Close calls Flush and then closes the Writer.
        (*Writer) Flush() error
            Flush flushes the Writer to its underlying io.Writer.
        (*Writer) Reset(writer io.Writer)
            Reset discards the writer's state and switches the Snappy writer to write to the given writer. This permits reusing a Writer rather than allocating a new one.
        (*Writer) Write(p []byte) (nRet int, errRet error)
            Write satisfies the io.Writer interface.
        /* one unexported: */
        (*Writer) write(p []byte) (nRet int, errRet error)
    Implements (at least 6, in which 4 are exported)
        *Writer : internal/bisect.Writer
        *Writer : io.Closer
        *Writer : io.WriteCloser
        *Writer : io.Writer
        /* 2+ unexporteds: */
        *Writer : github.com/refraction-networking/utls.transcriptHash
        *Writer : crypto/tls.transcriptHash
    As Outputs Of (at least 2, both are exported)
        func NewBufferedWriter(w io.Writer) *Writer
        func NewWriter(w io.Writer) *Writer

Package-Level Functions (total 17, in which 8 are exported)

func Decode(dst, src []byte) ([]byte, error) Decode returns the decoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire decoded block. Otherwise, a newly allocated slice will be returned. The dst and src must not overlap. It is valid to pass a nil dst. Decode handles the Snappy block format, not the Snappy stream format.

func DecodedLen(src []byte) (int, error) DecodedLen returns the length of the decoded block.
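The length that DecodedLen reports comes from the varint header that starts every encoded block. A minimal sketch of reading that header with the standard library follows; blockLen and errCorrupt are hypothetical stand-ins for illustration, and the real function may apply additional size checks that are not shown here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errCorrupt is a stand-in for the package's ErrCorrupt (assumption).
var errCorrupt = errors.New("corrupt input")

// blockLen reads the uvarint length header of a Snappy block. It returns
// the decoded length and the number of bytes the header occupied.
func blockLen(src []byte) (n uint64, headerLen int, err error) {
	n, headerLen = binary.Uvarint(src)
	if headerLen <= 0 {
		// Uvarint signals a short buffer with 0 and an overflow with a
		// negative count; both are treated as corrupt input here.
		return 0, 0, errCorrupt
	}
	return n, headerLen, nil
}

func main() {
	// 0x96 0x01 is the uvarint encoding of 150; the rest is chunk data.
	n, h, err := blockLen([]byte{0x96, 0x01, 0x00})
	fmt.Println(n, h, err) // 150 2 <nil>
}
```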

func Encode(dst, src []byte) []byte Encode returns the encoded form of src. The returned slice may be a sub-slice of dst if dst was large enough to hold the entire encoded block. Otherwise, a newly allocated slice will be returned. The dst and src must not overlap. It is valid to pass a nil dst. Encode handles the Snappy block format, not the Snappy stream format.

func EncodeBlockInto(dst, src []byte) (d int) EncodeBlockInto exposes encodeBlock but checks dst size.

func MaxEncodedLen(srcLen int) int MaxEncodedLen returns the maximum length of a snappy block, given its uncompressed length. It will return a negative value if srcLen is too large to encode.
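The listing does not state the bound itself. The sketch below assumes the worst-case formula 32 + n + n/6 used by the upstream Snappy implementations; that assumption is at least consistent with the constant maxEncodedLenOfMaxBlockSize = 76490 listed for a 65536-byte block. The function name maxEncodedLen is a hypothetical stand-in.

```go
package main

import "fmt"

// maxEncodedLen is an illustrative stand-in for MaxEncodedLen. The
// 32 + n + n/6 bound is an assumption taken from upstream Snappy.
func maxEncodedLen(srcLen int) int {
	n := uint64(srcLen) // a negative srcLen wraps to a huge value
	if n > 0xffffffff {
		return -1
	}
	n = 32 + n + n/6
	if n > 0xffffffff {
		return -1
	}
	return int(n)
}

func main() {
	fmt.Println(maxEncodedLen(65536)) // 76490
	fmt.Println(maxEncodedLen(-1))    // -1
}
```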

func NewBufferedWriter(w io.Writer) *Writer NewBufferedWriter returns a new Writer that compresses to w, using the framing format described at https://github.com/google/snappy/blob/master/framing_format.txt The Writer returned buffers writes. Users must call Close to guarantee all data has been forwarded to the underlying io.Writer. They may also call Flush zero or more times before calling Close.

func NewReader(r io.Reader) *Reader NewReader returns a new Reader that decompresses from r, using the framing format described at https://github.com/google/snappy/blob/master/framing_format.txt

func NewWriter(w io.Writer) *Writer NewWriter returns a new Writer that compresses to w. The Writer returned does not buffer writes. There is no need to Flush or Close such a Writer. Deprecated: the Writer returned is not suitable for many small writes, only for few large writes. Use NewBufferedWriter instead, which is efficient regardless of the frequency and shape of the writes, and remember to Close that Writer when done.

/* 9 unexporteds: */

func crc(b []byte) uint32 crc implements the checksum specified in section 3 of https://github.com/google/snappy/blob/master/framing_format.txt
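Section 3 of the framing format specifies a CRC-32C (Castagnoli) checksum that is then masked by rotating it right by 15 bits and adding 0xa282ead8. A sketch of that computation with hash/crc32 follows; maskedCRC and unmask are hypothetical names, and unmask exists only to make the example checkable against the well-known CRC-32C check value.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// maskedCRC computes the CRC-32C of b and applies the framing-format mask:
// rotate right by 15 bits, then add the constant 0xa282ead8.
func maskedCRC(b []byte) uint32 {
	c := crc32.Checksum(b, castagnoli)
	return (c>>15 | c<<17) + 0xa282ead8
}

// unmask reverses the mask. It is a helper for this illustration only.
func unmask(m uint32) uint32 {
	r := m - 0xa282ead8
	return r<<15 | r>>17
}

func main() {
	m := maskedCRC([]byte("123456789"))
	// Unmasking recovers the plain CRC-32C of the input.
	fmt.Printf("%#x\n", unmask(m)) // 0xe3069283
}
```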

func decode(dst, src []byte) int decode writes the decoding of src to dst. It assumes that the varint-encoded length of the decompressed bytes has already been read, and that len(dst) equals that length. It returns 0 on success or a decodeErrCodeXxx error code on failure.

func decodedLen(src []byte) (blockLen, headerLen int, err error) decodedLen returns the length of the decoded block and the number of bytes that the length header occupied.

func emitCopy(dst []byte, offset, length int) int emitCopy writes a copy chunk and returns the number of bytes written. It assumes that:
    - dst is long enough to hold the encoded bytes
    - 1 <= offset && offset <= 65535
    - 4 <= length && length <= 65535

func emitLiteral(dst, lit []byte) int emitLiteral writes a literal chunk and returns the number of bytes written. It assumes that:
    - dst is long enough to hold the encoded bytes
    - 1 <= len(lit) && len(lit) <= 65536
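Following the literal-tag rules documented with the tag constants, a literal chunk for up to 65536 bytes needs a tag byte plus zero, one, or two extra length bytes. The sketch below writes such a chunk under the same assumptions as emitLiteral; putLiteral is a hypothetical name and this is not the package's optimized code.

```go
package main

import "fmt"

const tagLiteral = 0x00

// putLiteral writes a literal chunk for lit into dst and returns the number
// of bytes written. It assumes 1 <= len(lit) <= 65536 and that dst has room
// for len(lit)+3 bytes.
func putLiteral(dst, lit []byte) int {
	i, n := 0, uint(len(lit)-1)
	switch {
	case n < 60:
		// Length-1 fits directly into the 6-bit m field.
		dst[0] = uint8(n)<<2 | tagLiteral
		i = 1
	case n < 1<<8:
		// m == 60: one extra byte holds length-1.
		dst[0] = 60<<2 | tagLiteral
		dst[1] = uint8(n)
		i = 2
	default:
		// m == 61: two extra little-endian bytes hold length-1.
		dst[0] = 61<<2 | tagLiteral
		dst[1] = uint8(n)
		dst[2] = uint8(n >> 8)
		i = 3
	}
	return i + copy(dst[i:], lit)
}

func main() {
	dst := make([]byte, 8)
	n := putLiteral(dst, []byte("abc"))
	fmt.Printf("% x\n", dst[:n]) // 08 61 62 63
}
```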

func encodeBlock(dst, src []byte) (d int) encodeBlock encodes a non-empty src to a guaranteed-large-enough dst. It assumes that the varint-encoded length of the decompressed bytes has already been written. It also assumes that:
    len(dst) >= MaxEncodedLen(len(src)) && minNonLiteralBlockSize <= len(src) && len(src) <= maxBlockSize

func hash(u, shift uint32) uint32

func load32(b []byte, i int) uint32

func load64(b []byte, i int) uint64

Package-Level Variables (total 6, in which 3 are exported)

var ErrCorrupt error ErrCorrupt reports that the input is invalid.

var ErrTooLarge error ErrTooLarge reports that the uncompressed length is too large.

var ErrUnsupported error ErrUnsupported reports that the input isn't supported.

/* 3 unexporteds: */

var crcTable *crc32.Table

var errClosed error

var errUnsupportedLiteralLength error

Package-Level Constants (total 20, none are exported)

const checksumSize = 4

const chunkHeaderSize = 4

const chunkTypeCompressedData = 0

const chunkTypePadding = 254

const chunkTypeStreamIdentifier = 255

const chunkTypeUncompressedData = 1

const decodeErrCodeCorrupt = 1

const decodeErrCodeUnsupportedLiteralLength = 2

const inputMargin = 15 inputMargin is the minimum number of extra input bytes to keep, inside encodeBlock's inner loop. On some architectures, this margin lets us implement a fast path for emitLiteral, where the copy of short (<= 16 byte) literals can be implemented as a single load to and store from a 16-byte register. That literal's actual length can be as short as 1 byte, so this can copy up to 15 bytes too much, but that's OK as subsequent iterations of the encoding loop will fix up the copy overrun, and this inputMargin ensures that we don't overrun the dst and src buffers.

const magicBody = "sNaPpY"

const magicChunk = "\xff\x06\x00\x00sNaPpY"

const maxBlockSize = 65536 maxBlockSize is the maximum size of the input to encodeBlock. It is not part of the wire format per se, but some parts of the encoder assume that an offset fits into a uint16. Also, for the framing format (Writer type instead of Encode function), https://github.com/google/snappy/blob/master/framing_format.txt says that "the uncompressed data in a chunk must be no longer than 65536 bytes".

const maxEncodedLenOfMaxBlockSize = 76490 maxEncodedLenOfMaxBlockSize equals MaxEncodedLen(maxBlockSize), but is hard coded to be a const instead of a variable, so that obufLen can also be a const. Their equivalence is confirmed by TestMaxEncodedLenOfMaxBlockSize.

const minNonLiteralBlockSize = 17 minNonLiteralBlockSize is the minimum size of the input to encodeBlock that could be encoded with a copy tag. This is the minimum with respect to the algorithm used by encodeBlock, not a minimum enforced by the file format. The encoded output must start with at least a 1 byte literal, as there are no previous bytes to copy. A minimal (1 byte) copy after that, generated from an emitCopy call in encodeBlock's main loop, would require at least another inputMargin bytes, for the reason above: we want any emitLiteral calls inside encodeBlock's main loop to use the fast path if possible, which requires being able to overrun by inputMargin bytes. Thus, minNonLiteralBlockSize equals 1 + 1 + inputMargin. The C++ code doesn't use this exact threshold, but it could, as discussed at https://groups.google.com/d/topic/snappy-compression/oGbhsdIJSJ8/discussion The difference between Go (2+inputMargin) and C++ (inputMargin) is purely an optimization. It should not affect the encoded form. This is tested by TestSameEncodingAsCppShortCopies.

const obufHeaderLen int = 18

const obufLen int = 76508
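The two buffer constants appear to follow from values listed elsewhere on this page. The sketch below assumes that obufHeaderLen covers the stream identifier plus one chunk header and checksum, and that obufLen adds the worst-case encoded block on top; only the resulting numbers (18 and 76508) are confirmed by the listing, the derivation itself is an assumption.

```go
package main

import "fmt"

const (
	magicChunk                  = "\xff\x06\x00\x00sNaPpY"
	checksumSize                = 4
	chunkHeaderSize             = 4
	maxEncodedLenOfMaxBlockSize = 76490

	// Assumed derivations; the listing shows only the final values.
	obufHeaderLen = len(magicChunk) + checksumSize + chunkHeaderSize
	obufLen       = obufHeaderLen + maxEncodedLenOfMaxBlockSize
)

func main() {
	fmt.Println(obufHeaderLen, obufLen) // 18 76508
}
```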

const tagCopy1 = 1
    Each encoded block begins with the varint-encoded length of the decoded data, followed by a sequence of chunks. Chunks begin and end on byte boundaries. The first byte of each chunk is broken into its 2 least and 6 most significant bits called l and m: l ranges in [0, 4) and m ranges in [0, 64). l is the chunk tag. Zero means a literal tag. All other values mean a copy tag.
    For literal tags:
    - If m < 60, the next 1 + m bytes are literal bytes.
    - Otherwise, let n be the little-endian unsigned integer denoted by the next m - 59 bytes. The next 1 + n bytes after that are literal bytes.
    For copy tags, length bytes are copied from offset bytes ago, in the style of Lempel-Ziv compression algorithms. In particular:
    - For l == 1, the offset ranges in [0, 1<<11) and the length in [4, 12). The length is 4 + the low 3 bits of m. The high 3 bits of m form bits 8-10 of the offset. The next byte is bits 0-7 of the offset.
    - For l == 2, the offset ranges in [0, 1<<16) and the length in [1, 65). The length is 1 + m. The offset is the little-endian unsigned integer denoted by the next 2 bytes.
    - For l == 3, this tag is a legacy format that is no longer issued by most encoders. Nonetheless, the offset ranges in [0, 1<<32) and the length in [1, 65). The length is 1 + m. The offset is the little-endian unsigned integer denoted by the next 4 bytes.

const tagCopy2 = 2

const tagCopy4 = 3

const tagLiteral = 0
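The chunk layout documented for the tag constants can be exercised with a small, unoptimized block decoder. This is a sketch written directly from that description, not the package's decode function: decodeBlock and errCorrupt are hypothetical names, and the 1<<24 size cap is an arbitrary limit chosen for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errCorrupt = errors.New("corrupt input")

// decodeBlock decodes one Snappy block: a uvarint length header followed by
// literal and copy chunks.
func decodeBlock(src []byte) ([]byte, error) {
	want, s := binary.Uvarint(src)
	if s <= 0 || want > 1<<24 { // arbitrary cap for this sketch
		return nil, errCorrupt
	}
	dst := make([]byte, 0, want)
	for s < len(src) {
		// l is the 2-bit tag, m the remaining 6 bits of the first byte.
		l, m := src[s]&0x03, int(src[s]>>2)
		s++
		if l == 0 { // literal chunk: n holds length-1
			n := m
			if m >= 60 {
				k := m - 59 // number of extra little-endian length bytes
				if k > len(src)-s {
					return nil, errCorrupt
				}
				n = 0
				for i := 0; i < k; i++ {
					n |= int(src[s+i]) << (8 * i)
				}
				s += k
			}
			if n < 0 || n >= len(src)-s {
				return nil, errCorrupt
			}
			dst = append(dst, src[s:s+n+1]...)
			s += n + 1
			continue
		}
		width := [4]int{0, 1, 2, 4}[l] // offset bytes following the tag
		if width > len(src)-s {
			return nil, errCorrupt
		}
		var offset, length int
		switch l {
		case 1:
			length = 4 + (m & 0x07)
			offset = (m>>3)<<8 | int(src[s])
		case 2:
			length = 1 + m
			offset = int(binary.LittleEndian.Uint16(src[s:]))
		case 3:
			length = 1 + m
			offset = int(binary.LittleEndian.Uint32(src[s:]))
		}
		s += width
		if offset <= 0 || offset > len(dst) || uint64(len(dst)+length) > want {
			return nil, errCorrupt
		}
		// Copy byte by byte: the ranges overlap when offset < length.
		for i := 0; i < length; i++ {
			dst = append(dst, dst[len(dst)-offset])
		}
	}
	if uint64(len(dst)) != want {
		return nil, errCorrupt
	}
	return dst, nil
}

func main() {
	// Header 8, literal "abcd" (tag 0x0c), copy1 of length 4 at offset 4.
	out, err := decodeBlock([]byte{0x08, 0x0c, 'a', 'b', 'c', 'd', 0x01, 0x04})
	fmt.Printf("%s %v\n", out, err) // abcdabcd <nil>
}
```

The byte-by-byte copy is deliberate: a copy whose offset is smaller than its length repeats recently written bytes, which is how a run such as "aaaaa" is encoded as one literal byte plus one copy chunk.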