Using RocksDB SST as a file format
https://github.com/facebook/rocksdb/wiki/A-Tutorial-of-RocksDB-SST-formats#block-based-table
In a block-based table, data is chunked into (almost) fixed-size blocks (the default block size is 4 KB). Each block, in turn, holds a number of entries.
When storing data, we can compress and/or encode it efficiently within a block, which often results in a much smaller size than the raw data.
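To make the size effect concrete, here is a minimal sketch of chunking entries into roughly 4 KB blocks and compressing each block independently. It is a toy model, not the real SST layout: the length-prefixed entry encoding, the use of `zlib`, and the names `encode_entry` and `build_blocks` are all assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # target block size in bytes, mirroring the 4 KB default


def encode_entry(key: bytes, value: bytes) -> bytes:
    """Toy entry encoding (assumption): two 4-byte lengths, then key and value."""
    return len(key).to_bytes(4, "big") + len(value).to_bytes(4, "big") + key + value


def build_blocks(entries, block_size=BLOCK_SIZE):
    """Group (key, value) pairs into blocks of roughly block_size bytes."""
    blocks, current = [], bytearray()
    for key, value in entries:
        current += encode_entry(key, value)
        if len(current) >= block_size:  # block is full, so close it
            blocks.append(bytes(current))
            current = bytearray()
    if current:  # flush the final, possibly smaller, block
        blocks.append(bytes(current))
    return blocks


# Hypothetical sample data: many similar keys and values, which compress well.
entries = [(f"user:{i:08d}".encode(), b"status=active;region=eu") for i in range(10_000)]
blocks = build_blocks(entries)

raw_size = sum(len(b) for b in blocks)
compressed_size = sum(len(zlib.compress(b)) for b in blocks)
print(f"{len(blocks)} blocks, raw={raw_size} bytes, compressed={compressed_size} bytes")
```

Blocks come out "almost" fixed-size because an entry is never split: a block is closed as soon as it reaches the target size, so it can overshoot slightly.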
As for record retrieval, we first locate the block where the target record may reside, then read that block into memory, and finally search for the record within the block. To avoid repeatedly reading the same block, a block cache keeps loaded blocks in memory.
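The three retrieval steps, plus the block cache, can be sketched as follows. This is an in-memory stand-in under stated assumptions, not RocksDB's implementation: `ToyTable`, the tab/newline record encoding (so keys and values must not contain those bytes), the fixed number of entries per block, and the `lru_cache`-based cache are all hypothetical choices for illustration.

```python
import bisect
import zlib
from functools import lru_cache


class ToyTable:
    """In-memory stand-in for a block-based table (not the real SST layout)."""

    def __init__(self, entries, entries_per_block=100, cache_blocks=8):
        self.index_keys = []  # last key of each block, in sorted order
        self.blocks = []      # compressed block payloads
        self.block_reads = 0  # how many times a block was actually decoded
        entries = sorted(entries)
        for start in range(0, len(entries), entries_per_block):
            chunk = entries[start:start + entries_per_block]
            payload = b"\n".join(k + b"\t" + v for k, v in chunk)
            self.blocks.append(zlib.compress(payload))
            self.index_keys.append(chunk[-1][0])
        # Per-table LRU cache standing in for the block cache.
        self._load_block = lru_cache(maxsize=cache_blocks)(self._load_block_uncached)

    def _load_block_uncached(self, block_no):
        """Decompress and decode one block; only runs on a cache miss."""
        self.block_reads += 1
        payload = zlib.decompress(self.blocks[block_no])
        return [tuple(line.split(b"\t", 1)) for line in payload.split(b"\n")]

    def get(self, key):
        # Step 1: locate the first block whose last key is >= the target key.
        block_no = bisect.bisect_left(self.index_keys, key)
        if block_no == len(self.blocks):
            return None  # key is larger than every key in the table
        # Step 2: read the block into memory (served from the cache if present).
        records = self._load_block(block_no)
        # Step 3: binary-search for the record within the block.
        keys = [k for k, _ in records]
        pos = bisect.bisect_left(keys, key)
        if pos < len(records) and records[pos][0] == key:
            return records[pos][1]
        return None


table = ToyTable((f"key{i:05d}".encode(), f"value{i}".encode()) for i in range(1000))
print(table.get(b"key00042"))  # first access decodes the block
print(table.get(b"key00043"))  # same block, served from the cache
print(table.block_reads)
```

Keeping only the last key of each block in the index is enough to route a lookup to a single candidate block, so a miss costs at most one block read.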
https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format
- https://github.com/facebook/rocksdb/blob/master/include/rocksdb/sst_file_reader.h
- https://github.com/facebook/rocksdb/blob/master/include/rocksdb/sst_file_writer.h
RocksDB provides APIs for creating SST files that can be ingested later. This is useful when a use case needs to load data quickly but the data itself can be prepared offline.
The Python package for RocksDB does not support reading or writing SST files.
Migrated from T3066 (view on Phabricator)