Skip to content

Persistent readonly perfect hash table

A python module implemented in C that ingests key/value pairs, writes them on file and creates a hash table to be used as an index, in the same file. The file is a Shard in the Read Storage.

Format

  • Format version
  • An index which is a hash table
  • The content of the objects

Let's assume the shard stores the following objects:

  • foo (len=3, sha256=2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae)
  • bar (len=3, sha256=fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9)
  • baz (len=3, sha256=baa5a0964d3320fbc0c6a922140453c8513ea24ab8fd0577034804a967248096)
  • quux (len=4, sha256=053057fda9a935f2d4fa8c7bc62a411a26926e00b491c07c1b2ec1909078a0a2)
  • (the empty object, len=0, sha256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)

The content of the file would look like this:

header       | 00 |
             +----+                                                  
        0000 | 01 |
             +----+                                                  

index        |         sha256         |      offset       |        size       |
             | 00   01   ..   30   31 | 32   ..   38   39 | 40   ..   46   47 |
             +----+----+----+----+----+----+----+----+----+----+----+----+----+
        0001 | 2c | 26 | .. | e7 | ae | 00 | .. | 00 | f1 | 00 | .. | 00 | 03 |
        0031 | fc | de | .. | 8f | b9 | 00 | .. | 00 | f4 | 00 | .. | 00 | 03 |
        0061 | ba | a5 | .. | 09 | 96 | 00 | .. | 00 | f7 | 00 | .. | 00 | 03 |
        0091 | 05 | 30 | .. | a0 | a2 | 00 | .. | 00 | fa | 00 | .. | 00 | 04 |
        00c1 | e3 | b0 | .. | b8 | 55 | 00 | .. | 00 | 00 | 00 | .. | 00 | 00 |
             +----+----+----+----+----+----+----+----+----+----+----+----+----+
			 
payload      +----+----+----+
        00f1 | 66 | 6f | 6f |
        00f4 | 62 | 61 | 72 |  
        00f7 | 62 | 61 | 7a |
        00fa | 71 | 75 | 75 | 78 |
             +----+----+----+----+

Writing

It is assumed writing is done in batch, sequentially

Reading

  • HASH(SHA256) in the index
  • Seek to the object content to stream it to the caller in chunks of a given size

Migrated from T3104 (view on Phabricator)

Edited by Phabricator Migration user
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information