A Python module implemented in C that ingests key/value pairs, writes them to a file and creates a hash table, stored in the same file, to be used as an index. The file is a Shard in the Read Storage.
Format:
- Format version
- An index which is a hash table
- The content of the objects
Let's assume the shard stores the following objects:
- (the empty object, len=0, sha256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855)

What would:
- the write operation for all objects
- the resulting on-disk format
- a read operation for an object of a given hash (let's say 053057fda9a935f2d4fa8c7bc62a411a26926e00b491c07c1b2ec1909078a0a2)

look like?
My naive proposal would have been to store the fixed-length (sha256, offset, length) tuples, sorted, at the beginning of the file, and to do the read of an index entry as a bisection of this list of tuples. This would mean that the write operation can stream all the objects, but needs to store the list of hash/offset/length to sort it and write it to the file eventually.
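For what it's worth, a minimal sketch of what that bisection read could look like, assuming 48-byte (sha256, offset, length) entries sorted at the beginning of the file; the entry layout, the `index_start` parameter and the function name are illustrative, not an actual format:

```python
import struct

# Layout assumed for this sketch only: a sorted array of fixed-size
# (sha256, offset, length) entries at the beginning of the file.
ENTRY = struct.Struct("<32sQQ")  # 32-byte sha256, 8-byte offset, 8-byte length

def naive_lookup(f, key, n_entries, index_start=0):
    """Bisect the sorted index: O(log N) seeks and reads per lookup."""
    lo, hi = 0, n_entries
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(index_start + mid * ENTRY.size)
        k, offset, length = ENTRY.unpack(f.read(ENTRY.size))
        if k == key:
            f.seek(offset)
            return f.read(length)      # the object's content
        elif k < key:
            lo = mid + 1
        else:
            hi = mid
    return None                        # key not in this shard
```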
Hopefully that clarifies how the file is created, but I can go into details if that's still unclear.
Reading fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9 is:

- hash(fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9) => 1 # entry one
- 1 * size of an entry in the hash table (32 + 8 + 8) + size of the header == I1
- seek I1
- read the offset O1 and the length L1 from I1
- seek O1
- read SHA256, read L1 bytes
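For illustration, a minimal sketch of that read path in Python, assuming a 16-byte header and 48-byte entries; the hash(key) step is left out and its result is passed in as `position`:

```python
import struct

HEADER_SIZE = 16                  # assumed header size, for illustration only
ENTRY = struct.Struct("<32sQQ")   # sha256 (32) + offset (8) + length (8) == 48 bytes

def read_object(f, key, position):
    """Read one object, `position` being the value returned by hash(key)."""
    i1 = HEADER_SIZE + position * ENTRY.size   # I1
    f.seek(i1)                                 # seek I1
    stored_key, o1, l1 = ENTRY.unpack(f.read(ENTRY.size))
    assert stored_key == key                   # sanity check
    f.seek(o1)                                 # seek O1
    return f.read(l1)                          # read L1 bytes
```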
My naive proposal would have been to store the fixed-length (sha256, offset, length) tuples, ...
Which the index is.
do the read of an index entry as a bisection of this list of tuples.
This would be O(log(N)) whereas the proposed format is O(1), and that will have a significant impact as the storage grows. Reading the object will require only one additional read in the index instead of several.
Then for the reading, I don't understand this item "hash(fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9) => 1"
What is the hash() function in this snippet? Is it something else than a lookup in the index part of the file?
what are "the parameters to the perfect hash functions"? what are the possible formats?
Given a set of keys, calculating the perfect hash function that will avoid any collision yields a set of parameters. The lookup function needs these parameters in addition to the key to figure out the position of the entry in the hash table. That is a few bits per entry (see https://homepages.dcc.ufmg.br/~nivio/papers/cikm07.pdf for instance). The possible formats depend on the actual function being implemented.
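To make "a set of parameters" concrete, here is a toy hash-and-displace construction in the spirit of CHD (this is not the cmph implementation, just an illustration): the per-bucket displacements are the parameters the lookup function needs in addition to the key.

```python
import hashlib

def _h(key, seed, n):
    """Deterministic hash of `key` (bytes) with an integer `seed`, into [0, n)."""
    d = hashlib.sha256(seed.to_bytes(8, "big") + key).digest()
    return int.from_bytes(d[:8], "big") % n

def build(keys):
    """Return the parameters: one displacement per bucket, plus the sizes.
    Keys must be distinct (duplicates would make the search loop forever)."""
    n = len(keys)                       # table size == number of keys
    n_buckets = max(1, n // 4)
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        buckets[_h(k, 0, n_buckets)].append(k)
    displacements = [0] * n_buckets
    occupied = [False] * n
    # Place the largest buckets first; for each, try displacements until
    # every key in the bucket lands on a free, distinct slot.
    for b in sorted(range(n_buckets), key=lambda b: -len(buckets[b])):
        d = 1
        while True:
            slots = [_h(k, d, n) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                break
            d += 1
        displacements[b] = d
        for s in slots:
            occupied[s] = True
    return displacements, n, n_buckets

def lookup(key, displacements, n, n_buckets):
    """Collision-free position of `key`, given the stored parameters."""
    return _h(key, displacements[_h(key, 0, n_buckets)], n)

# Example: the positions of 1000 distinct keys are all distinct.
keys = [i.to_bytes(8, "big") for i in range(1000)]
params = build(keys)
assert len({lookup(k, *params) for k in keys}) == len(keys)
```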
what are O, l and L in this example? are they positions of actual objects?
Let's forget about that and use @olasd's example and your very nice table drawn from it. I thought it would be useful to have some redundancy of the SHA256 so that both the index and the content are self-contained, but it is a detail.
What is the hash() function in this snippet? Is it something else than a lookup in the index part of the file?
It is the hash function that, given the SHA256 of an object, will return the position in the index of the entry that has the offset of the corresponding object.
I just realized that since a perfect hash function needs parameters that may require additional sequential reads at the beginning of the file, it would actually make more sense to have a regular hash function with a format that allows for collisions. Even if collisions are relatively frequent, the colliding entries may be stored adjacent to each other and will not require an additional read: they are likely to be in the same block most of the time. That would save the trouble of implementing a perfect hash function.
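A rough sketch of what the read path could look like with such a format, assuming open addressing with linear probing so that a probe chain is stored contiguously; the entry layout, header size and empty-slot marker are assumptions:

```python
import struct

ENTRY = struct.Struct("<32sQQ")   # sha256 (32), offset (8), length (8)
EMPTY = b"\x00" * 32              # assumed marker for an unused slot
BLOCK = 4096                      # typical filesystem block size
HEADER_SIZE = 16                  # assumed header size

def lookup(f, key, bucket):
    """Scan the probe chain starting at `bucket`: colliding entries are
    stored in the slots that follow, so the whole chain usually fits in
    the single block read below (wrap-around at the end of the table is
    omitted for brevity)."""
    pos = HEADER_SIZE + bucket * ENTRY.size
    while True:
        f.seek(pos)
        block = f.read(BLOCK)                    # one I/O covers many entries
        if len(block) < ENTRY.size:
            return None                          # end of index
        for i in range(0, len(block) - ENTRY.size + 1, ENTRY.size):
            k, offset, length = ENTRY.unpack_from(block, i)
            if k == key:
                return offset, length
            if k == EMPTY:
                return None                      # free slot: key is absent
        pos += (len(block) // ENTRY.size) * ENTRY.size  # rare: chain spills over
```

The write side would have to place colliding entries in the next free slots (classic open addressing) for the "same block most of the time" property to hold.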
```
$ dd if=/dev/urandom bs=32 count=25000000 | base64 --wrap 32 > /tmp/r
$ ls -lh /tmp/r
-rw-rw-r-- 1 loic loic 1,1G juil. 19 19:11 /tmp/r
$ sudo apt install cmph-tools
$ cmph -a chd_ph -t 2 -g /tmp/r
space lower bound is 0.000 bits per key
Starting mapping step for mph creation of 33333334 keys with 33333347 bins
Starting ordering step
Starting searching step
Starting compressing step
Successfully generated minimal perfect hash function

real	0m16,081s
user	0m15,256s
sys	0m0,816s

$ ls -lh /tmp/r.mph
-rw-rw-r-- 1 loic loic 3,0M juil. 19 20:39 /tmp/r.mph
```
Pre-loading all of them in memory would cost at most ~4MB for each 100GB Shard, i.e. 40GB RAM for 1PB (10,000 shards × 4MB). It may not even be necessary to explicitly cache them since they will be read frequently.
In the global read index, I would consider storing, for each object, alongside the shard id, the length and offset of the object (which are comparatively cheap to store). This way, the per-shard index only gets used for standalone operation, which would (probably?) be an edge case. As I don't really grasp the specifics of the hashing algorithm, I have a hard time understanding what the actual performance will look like. log2(number objects in bucket) can still be a fairly small number of reads compared to 2 reads and a potentially expensive hashing algorithm?
In the global read index, I would consider storing, for each object, alongside the shard id, the length and offset of the object (which are comparatively cheap to store)
This is tempting, I agree. The only problem is that the global index does not scale out. There may be a solution involving a different method to store the global index in the Write Storage (Postgres) and the Read Storage (hash table). Or a mix of both. The goal would be to make it so the probability that looking up an object in the Read Storage requires a lookup in the global index of the Write Storage decreases as the size of the Read Storage increases.
log2(number objects in bucket) can still be a fairly small number of reads compared to 2 reads and a potentially expensive hashing algorithm
The CPU cost of the hash function is marginal for lookups, and the build is quite fast (seconds within a global process that takes minutes). Even a single additional read to retrieve one object is expensive because I/O is the bottleneck. The index of a 100GB Shard is about 1GB, and 2+ lookups in the index will require 2+ I/Os: they are unlikely to be in cache.
! In #3104 (closed), @douardda wrote:
Ideally, since the perfecthash feature will be needed only for a specific objstorage backend, it should be an optional dependency.
Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?
Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?
It would, but who would maintain it in the long run?
Humm not sure what the deal is there:
Yeah, there was some kind of split over five years ago, which is why I was not very keen on re-using these. Moreover, they include the sources of cmph and it seems easier to link against the packaged version.
! In #3104 (closed), @dachary wrote:
Wouldn't it make sense to put the cffi-based cmph wrapper in a dedicated python module/project (not necessarily under the swh namespace)?
It would, but who would maintain it in the long run?
SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.
Humm not sure what the deal is there:
Yeah, there was some kind of split over five years ago, which is why I was not very keen on re-using these. Moreover, they include the sources of cmph and it seems easier to link against the packaged version.
So does it make sense to use this package instead of reimplementing one? What's the catch?
SWH I guess: I don't see the difference whether it's embedded in swh-objstorage, winery or a dedicated package.
Ok.
So does it make sense to use this package instead of reimplementing one? What's the catch?
I was concerned about adding a dependency on a 5+ year old unmaintained PyPI binary package. I did not even try to use it. If your recommendation is to go for it, that's what I'll do.
So does it make sense to use this package instead of reimplementing one? What's the catch?
In addition to being unmaintained, the current packages do not implement in C everything SWH needs, and they expose interfaces that are not useful. The goal is to use the CHD algorithm and no other: there is no need to expose the others. Even if the decision is made to use another one at a later time, it will be a change internal to the module and not exposed via the API. The API of the module will be something similar to the existing cmph modules but with different semantics: the list of keys must not be read from the shard, converted to Python objects, provided as arguments to the generate_hash function, only to be converted back to C objects to compute the hash function. Instead, the generate_hash function should be given a file descriptor to read the ~30 million keys and a file descriptor to write the hash function.
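Something along these lines, to sketch the intent (names and signatures are hypothetical, not a final API):

```python
# Hypothetical interface of the C-backed module, for illustration only.

def build_hash(keys_fd: int, mph_fd: int) -> int:
    """Read the fixed-size keys directly from `keys_fd`, build the CHD
    function entirely in C (no round trip through Python objects), write
    its parameters to `mph_fd` and return the number of keys."""
    raise NotImplementedError  # would be implemented in C

def load_hash(mph_fd: int) -> object:
    """Load the hash function parameters (a few MB per shard) once."""
    raise NotImplementedError  # would be implemented in C

def hash_position(h: object, key: bytes) -> int:
    """Position in the shard's hash table of the entry for `key`."""
    raise NotImplementedError  # would be implemented in C
```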
So does it make sense to use this package instead of reimplementing one? What's the catch?
In addition to being unmaintained,
this could be addressed by asking authors to be in charge of the package
the current packages do not implement in C everything SWH needs, and they expose interfaces that are not useful. The goal is to use the CHD algorithm and no other: there is no need to expose the others. Even if the decision is made to use another one at a later time, it will be a change internal to the module and not exposed via the API. The API of the module will be something similar to the existing cmph modules but with different semantics: the list of keys must not be read from the shard, converted to Python objects, provided as arguments to the generate_hash function, only to be converted back to C objects to compute the hash function. Instead, the generate_hash function should be given a file descriptor to read the ~30 million keys and a file descriptor to write the hash function.
That I did not know, so indeed, if we need a specific wrapper for our needs, then it makes sense to create a dedicated swh-perfecthash package. In any case, I would prefer this package to be standalone rather than embedded in swh-objstorage (at least for packaging/distribution reasons).
In addition to being unmaintained,
this could be addressed by asking authors to be in charge of the package
I'm not sure how to do that? Do you mean, for instance, asking Greg Bowyer, author of https://github.com/GregBowyer/cmph-cffi to be in charge of which package? I'm probably slow today :-)
That I did not know, so indeed, if we need a specific wrapper for our needs, ...
I should have clarified that even if there was a well-maintained cmph module, there would still be a need for a Software Heritage specific C package. The goal is to optimize the creation of the perfect hash table. During benchmarking I noticed that creating, writing and reading millions of binary entries from a file in pure Python takes time, which is why I decided to go for a C implementation.