Persistent readonly perfect hash table
A python module implemented in C that ingests key/value pairs, writes them on file and creates a hash table to be used as an index, in the same file. The file is a Shard in the Read Storage.
Format
- Format version
- An index which is a hash table
- The content of the objects
Let's assume the shard stores the following objects:
-
foo
(len=3, sha256=2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
) -
bar
(len=3, sha256=fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
) -
baz
(len=3, sha256=baa5a0964d3320fbc0c6a922140453c8513ea24ab8fd0577034804a967248096
) -
quux
(len=4, sha256=053057fda9a935f2d4fa8c7bc62a411a26926e00b491c07c1b2ec1909078a0a2
) -
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
)
The content of the file would look like this:
header | 00 |
+----+
0000 | 01 |
+----+
index | sha256 | offset | size |
| 00 01 .. 30 31 | 32 .. 38 39 | 40 .. 46 47 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+
0001 | 2c | 26 | .. | e7 | ae | 00 | .. | 00 | f1 | 00 | .. | 00 | 03 |
0031 | fc | de | .. | 8f | b9 | 00 | .. | 00 | f4 | 00 | .. | 00 | 03 |
0061 | ba | a5 | .. | 09 | 96 | 00 | .. | 00 | f7 | 00 | .. | 00 | 03 |
0091 | 05 | 30 | .. | a0 | a2 | 00 | .. | 00 | fa | 00 | .. | 00 | 04 |
00c1 | e3 | b0 | .. | b8 | 55 | 00 | .. | 00 | 00 | 00 | .. | 00 | 00 |
+----+----+----+----+----+----+----+----+----+----+----+----+----+
payload +----+----+----+
00f1 | 66 | 6f | 6f |
00f4 | 62 | 61 | 72 |
00f7 | 62 | 61 | 7a |
00fa | 71 | 75 | 75 | 78 |
+----+----+----+----+
Writing
It is assumed writing is done in batch, sequentially
Reading
- HASH(SHA256) in the index
- Seek to the object content to stream it to the caller in chunks of a given size
Migrated from T3104 (view on Phabricator)
Edited by Phabricator Migration user