0.9.8.10
Hypertable::BlockCompressionCodecBmz Class Reference

Block compressor that uses the BMZ algorithm.

#include <BlockCompressionCodecBmz.h>

[Inheritance diagram for Hypertable::BlockCompressionCodecBmz]
[Collaboration diagram for Hypertable::BlockCompressionCodecBmz]

Public Member Functions

 BlockCompressionCodecBmz (const Args &args)
 Constructor.

virtual ~BlockCompressionCodecBmz ()
 Destructor.

virtual void deflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header, size_t reserve=0)
 Compresses a buffer using the BMZ algorithm.

virtual void inflate (const DynamicBuffer &input, DynamicBuffer &output, BlockHeader &header)
 Decompresses a buffer compressed with the BMZ algorithm.

virtual void set_args (const Args &args)
 Sets arguments to control compression behavior.

virtual int get_type ()
 Returns enum value representing compression type BMZ.

- Public Member Functions inherited from Hypertable::BlockCompressionCodec
virtual ~BlockCompressionCodec ()
 Destructor.

Private Attributes

DynamicBuffer m_workmem
 Working memory buffer used by deflate() and inflate().

size_t m_offset
 Starting offset of fingerprints.

size_t m_fp_len
 Fingerprint length.

Additional Inherited Members

- Public Types inherited from Hypertable::BlockCompressionCodec
enum  Type {
  UNKNOWN =-1, NONE =0, BMZ =1, ZLIB =2,
  LZO =3, QUICKLZ =4, SNAPPY =5, COMPRESSION_TYPE_LIMIT =6
}
 Enumeration for compression type.

typedef std::vector< String > Args
 Compression codec argument vector.

- Static Public Member Functions inherited from Hypertable::BlockCompressionCodec
static const char * get_compressor_name (uint16_t algo)
 Returns string mnemonic for compression type.
 

Detailed Description

Block compressor that uses the BMZ algorithm.

This class compresses and decompresses blocks of data using the BMZ algorithm, which is based on the algorithm described in the paper "Data Compression Using Long Common Strings" (Bentley & McIlroy, 1999). The algorithm generally works well for data that contains long repeated strings. The paper "Bigtable: A Distributed Storage System for Structured Data" (Chang et al., 2006) describes it as the compression algorithm used for the "content" column of the authors' crawler database, a column that stores multiple copies of each crawled page's content.

Definition at line 49 of file BlockCompressionCodecBmz.h.
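
The idea behind long-common-string compression can be illustrated with a small, self-contained sketch. It is not the bmz library's code: the names sketch_encode, sketch_decode, and Token are hypothetical, std::hash stands in for the rolling fingerprint used by Bentley and McIlroy, and the block size b only loosely corresponds to the fingerprint length. The sketch indexes every b-byte aligned block already seen, and replaces later occurrences with a copy token that refers back to the earlier offset.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical token: a literal run, or a copy of bytes seen earlier.
struct Token {
  bool is_copy;
  std::string literal;  // payload when !is_copy
  size_t src;           // when is_copy: start offset in the original input
  size_t len;           // when is_copy: number of bytes to copy
};

// Encode data by replacing repeats of at least b bytes with copy tokens.
std::vector<Token> sketch_encode(const std::string &data, size_t b) {
  if (b == 0) throw std::invalid_argument("block size must be positive");
  std::vector<Token> out;
  std::unordered_map<size_t, size_t> table;  // fingerprint -> block offset
  std::hash<std::string> fp;                 // stand-in for a rolling hash
  std::string pending;                       // literal bytes not yet emitted
  size_t next_block = 0;                     // next aligned block to index
  size_t i = 0;
  while (i < data.size()) {
    // Index every complete aligned block that ends at or before i.
    while (next_block + b <= i) {
      table.emplace(fp(data.substr(next_block, b)), next_block);
      next_block += b;
    }
    bool matched = false;
    if (i + b <= data.size()) {
      auto it = table.find(fp(data.substr(i, b)));
      // Verify the bytes, since different blocks can share a fingerprint.
      if (it != table.end() && data.compare(it->second, b, data, i, b) == 0) {
        size_t src = it->second, len = b;
        // Extend the match forward as far as the bytes agree.
        while (i + len < data.size() && data[src + len] == data[i + len]) ++len;
        if (!pending.empty()) {
          out.push_back({false, pending, 0, 0});
          pending.clear();
        }
        out.push_back({true, "", src, len});
        i += len;
        matched = true;
      }
    }
    if (!matched) pending += data[i++];
  }
  if (!pending.empty()) out.push_back({false, pending, 0, 0});
  return out;
}

// Rebuild the original input from the token stream.
std::string sketch_decode(const std::vector<Token> &tokens) {
  std::string out;
  for (const Token &t : tokens) {
    if (!t.is_copy) { out += t.literal; continue; }
    // Copy byte by byte so a copy may overlap the bytes it produces.
    for (size_t k = 0; k < t.len; ++k) out += out[t.src + k];
  }
  return out;
}
```

Feeding the sketch two copies of the same page separated by a few bytes yields one literal token followed by one copy token that covers the whole second page, which is the situation the description above refers to. Unlike this sketch, a practical implementation would extend matches backward as well and would update the fingerprint incrementally.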

Constructor & Destructor Documentation

BlockCompressionCodecBmz::BlockCompressionCodecBmz ( const Args & args )

Constructor.

Initializes members as follows: m_workmem(0), m_offset=0, m_fp_len=19. It then calls bmz_init() and passes args to set_args().

Parameters
args    Arguments to control compression behavior
See also
set_args

Definition at line 40 of file BlockCompressionCodecBmz.cc.

BlockCompressionCodecBmz::~BlockCompressionCodecBmz ( )
virtual

Destructor.

Definition at line 46 of file BlockCompressionCodecBmz.cc.

Member Function Documentation

void BlockCompressionCodecBmz::deflate ( const DynamicBuffer & input,
DynamicBuffer & output,
BlockHeader & header,
size_t reserve = 0
)
virtual

Compresses a buffer using the BMZ algorithm.

This method reserves enough space in output to hold the serialized header, followed by the compressed input, followed by reserve bytes. If the compressed data is larger than the input buffer, the input buffer is copied directly to the output buffer and the compression type is set to BlockCompressionCodec::NONE. Before serializing header, the method sets its data_length, data_zlength, data_checksum, and compression_type fields appropriately. The output buffer is formatted as follows:

header | compressed data | reserve

Parameters
input      Input buffer
output     Output buffer
header     Block header populated by function
reserve    Additional space to reserve at end of output buffer

Implements Hypertable::BlockCompressionCodec.

Definition at line 69 of file BlockCompressionCodecBmz.cc.
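
The fallback rule described for deflate() can be sketched as follows. This is an illustration under stated assumptions, not the Hypertable implementation: SketchHeader, pack_block, Bytes, Compressor, and the 10-byte little-endian header layout are all hypothetical, std::vector stands in for DynamicBuffer, the compressor is passed in as a callback, and the data_checksum field is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<uint8_t>;
using Compressor = std::function<Bytes(const Bytes &)>;

constexpr size_t kHeaderSize = 10;  // assumed fixed encoded header size

// Hypothetical header; field names follow the text, the layout is assumed.
struct SketchHeader {
  uint32_t data_length = 0;       // size of the uncompressed data
  uint32_t data_zlength = 0;      // size of the payload actually stored
  uint16_t compression_type = 0;  // 0 = NONE, 1 = BMZ (values from Type)

  // Serialize the three fields in little-endian byte order.
  void encode(uint8_t *dst) const {
    for (int i = 0; i < 4; ++i)
      dst[i] = static_cast<uint8_t>(data_length >> (8 * i));
    for (int i = 0; i < 4; ++i)
      dst[4 + i] = static_cast<uint8_t>(data_zlength >> (8 * i));
    for (int i = 0; i < 2; ++i)
      dst[8 + i] = static_cast<uint8_t>(compression_type >> (8 * i));
  }
};

// Build "header | payload" and keep capacity for `reserve` trailing bytes.
Bytes pack_block(const Bytes &input, const Compressor &compress,
                 SketchHeader &header, size_t reserve = 0) {
  assert(input.size() <= UINT32_MAX);
  const Bytes compressed = compress(input);
  // Store the input verbatim when compression made the data larger.
  const bool store_raw = compressed.size() > input.size();
  const Bytes &payload = store_raw ? input : compressed;

  // Populate the header before serializing it.
  header.data_length = static_cast<uint32_t>(input.size());
  header.data_zlength = static_cast<uint32_t>(payload.size());
  header.compression_type = store_raw ? 0 : 1;

  Bytes out;
  out.reserve(kHeaderSize + payload.size() + reserve);
  out.resize(kHeaderSize);
  header.encode(out.data());
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}
```

Because the header records which path was taken, a reader of the block can check compression_type and skip decompression entirely when it is NONE.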

virtual int Hypertable::BlockCompressionCodecBmz::get_type ( )
inline virtual

Returns the enum value representing compression type BMZ.

See also
BlockCompressionCodec::BMZ
Returns
Compression type (BMZ)

Implements Hypertable::BlockCompressionCodec.

Definition at line 116 of file BlockCompressionCodecBmz.h.

void BlockCompressionCodecBmz::inflate ( const DynamicBuffer & input,
DynamicBuffer & output,
BlockHeader & header
)
virtual

Decompresses a buffer compressed with the BMZ algorithm.

See also
deflate() for description of input buffer format
Parameters
input     Input buffer
output    Output buffer
header    Block header

Implements Hypertable::BlockCompressionCodec.

Definition at line 104 of file BlockCompressionCodecBmz.cc.

void BlockCompressionCodecBmz::set_args ( const Args & args )
virtual

Sets arguments to control compression behavior.

The arguments accepted by this method are described in the following table.

Argument      Description
--fp-len n    Fingerprint length
--offset n    Starting offset of fingerprints

Parameters
args    Vector of arguments

Reimplemented from Hypertable::BlockCompressionCodec.

Definition at line 56 of file BlockCompressionCodecBmz.cc.
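
A minimal sketch of how an argument vector of this shape might be interpreted is shown below. The names BmzOptions and parse_bmz_args are hypothetical, Args is assumed to behave like a vector of std::string, and the real set_args() may validate or report errors differently. The defaults (offset 0, fingerprint length 19) are the ones the constructor documentation lists.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the two documented settings.
struct BmzOptions {
  size_t offset = 0;   // starting offset of fingerprints
  size_t fp_len = 19;  // fingerprint length
};

// Walk the vector, treating each flag as taking one numeric value.
BmzOptions parse_bmz_args(const std::vector<std::string> &args) {
  BmzOptions opts;
  for (size_t i = 0; i < args.size(); ++i) {
    const std::string &flag = args[i];
    if (flag != "--fp-len" && flag != "--offset")
      throw std::invalid_argument("unknown BMZ argument: " + flag);
    if (i + 1 >= args.size())
      throw std::invalid_argument("missing value for " + flag);
    // std::stoul throws std::invalid_argument on non-numeric input.
    const size_t value = std::stoul(args[++i]);
    if (flag == "--fp-len")
      opts.fp_len = value;
    else
      opts.offset = value;
  }
  return opts;
}
```

With this sketch, an empty vector leaves both defaults in place, and the vector {"--fp-len", "32", "--offset", "4"} yields a fingerprint length of 32 and an offset of 4.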

Member Data Documentation

size_t Hypertable::BlockCompressionCodecBmz::m_fp_len
private

Fingerprint length.

Definition at line 127 of file BlockCompressionCodecBmz.h.

size_t Hypertable::BlockCompressionCodecBmz::m_offset
private

Starting offset of fingerprints.

Definition at line 124 of file BlockCompressionCodecBmz.h.

DynamicBuffer Hypertable::BlockCompressionCodecBmz::m_workmem
private

Working memory buffer used by deflate() and inflate()

Definition at line 121 of file BlockCompressionCodecBmz.h.


The documentation for this class was generated from the following files: