Rename the project as swh-shard, migrate to pybind11 and add cli tools

Open David Douard requested to merge douardda/swh-perfecthash:pybind11 into master
28 files changed: +1288, -548
@@ -8,3 +8,58 @@ The Read Shard has the following structure:
* bytes \[``objects_position``, ``index_position``\[: ``objects_count`` entries, each made of the size of the object (``u_int64_t``) followed by the content of the object
* bytes \[``index_position``, ``hash_position``\[: An array of index entries. The size of the array is provided by ``cmph_size`` after building the hash function. An index entry is made of the key (of ``SHARD_KEY_LEN`` bytes) and the object position (``u_int64_t``) in the range \[``objects_position``, ``index_position``\[. If the object position is ``UINT64_MAX``, this means the object has been deleted.
* bytes \[``hash_position``, ...\[: The hash function, as written by ``cmph_dump``
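
As a rough illustration of the objects region described above, the following Python sketch walks a buffer of size-prefixed records. It is not the project's reader: the helper name ``iter_objects`` is hypothetical, and big-endian encoding of the ``u_int64_t`` size is an assumption, since the byte order is not stated here.

```python
import struct

# Assumption: sizes are stored big-endian; the text above does not give the byte order.
U64 = struct.Struct(">Q")


def iter_objects(buf, objects_position, index_position, objects_count):
    """Yield (position, data) for each size-prefixed object found in the
    region [objects_position, index_position[ of buf (hypothetical helper)."""
    pos = objects_position
    for _ in range(objects_count):
        if pos + U64.size > index_position:
            raise ValueError("object size field runs past the objects region")
        (size,) = U64.unpack_from(buf, pos)
        start = pos + U64.size
        end = start + size
        if end > index_position:
            raise ValueError("object data runs past the objects region")
        yield pos, bytes(buf[start:end])
        pos = end


# Build a tiny in-memory objects region holding two objects, then read it back.
payloads = [b"hello", b"shard!"]
region = b"".join(U64.pack(len(p)) + p for p in payloads)
objects = list(iter_objects(region, 0, len(region), len(payloads)))
```

The bounds checks are there because a truncated or corrupted size field would otherwise let a reader run into the index region.
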
In more detail:

+--------------------------+------+----------------------------+
| Section                  | pos  | description (length)       |
+==========================+======+============================+
| **SHARD_MAGIC**          | 0    | SHARD_OFFSET_MAGIC (32)    |
+--------------------------+------+----------------------------+
| **header**               | 32   | Header (56)                |
+--------------------------+------+----------------------------+
| ``version``              |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``objects_count``        |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``objects_position`` <op>|      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``objects_size``         |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``index_position`` <ip>  |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``index_size``           |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``hash_position`` <hp>   |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| **Objects**              | <op> |                            |
+--------------------------+------+----------------------------+
| ``object0 size``         |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``object0 data``         |      | bytes (<object0 size>)     |
+--------------------------+------+----------------------------+
| ``object1 size``         |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ``object1 data``         |      | bytes (<object1 size>)     |
+--------------------------+------+----------------------------+
| ...                      |      |                            |
+--------------------------+------+----------------------------+
| **Index**                | <ip> |                            |
+--------------------------+------+----------------------------+
| ``object0 key``          |      | SHARD_KEY_LEN (32)         |
+--------------------------+------+----------------------------+
| ``object0 offset``       |      | uint64_t (8)               |
+--------------------------+------+----------------------------+
| ...                      |      |                            |
+--------------------------+------+----------------------------+
| **Hash map**             | <hp> |                            |
+--------------------------+------+----------------------------+
| ``hash function``        |      | <as written by cmph_dump>  |
+--------------------------+------+----------------------------+

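
The header layout in the table can be decoded in a few lines of Python, which may help when inspecting a shard by hand. This is only a sketch: ``parse_header`` is a hypothetical helper, big-endian encoding is an assumption (the table does not give the byte order), and the field values in the synthetic example are made up for illustration, including the choice to start the objects right after the header.

```python
import struct

SHARD_MAGIC_LEN = 32  # per the table, the magic occupies bytes [0, 32[
# Seven uint64_t fields, 56 bytes in total. Big-endian is an assumption.
HEADER = struct.Struct(">7Q")
HEADER_FIELDS = (
    "version",
    "objects_count",
    "objects_position",
    "objects_size",
    "index_position",
    "index_size",
    "hash_position",
)


def parse_header(buf):
    """Return the header fields as a dict, in table order (hypothetical helper)."""
    values = HEADER.unpack_from(buf, SHARD_MAGIC_LEN)
    return dict(zip(HEADER_FIELDS, values))


# Synthetic header with illustrative values only: one 5-byte object and one
# 40-byte index entry, laid out back to back right after the header.
objects_position = SHARD_MAGIC_LEN + HEADER.size  # 32 + 56 = 88
objects_size = 8 + 5
index_position = objects_position + objects_size
index_size = 32 + 8
hash_position = index_position + index_size
raw = b"SWHShard".ljust(SHARD_MAGIC_LEN, b"\x00") + HEADER.pack(
    1, 1, objects_position, objects_size, index_position, index_size, hash_position
)
header = parse_header(raw)
```

Using ``struct.Struct(">7Q")`` keeps the 56-byte header length from the table checkable through ``HEADER.size``.
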
``SHARD_MAGIC`` is the constant ``SWHShard``, padded with ``\x00`` to 32
bytes.

Index entries for deleted content use the special value
``{key=\x00...\x00, offset=2**64-1}``.
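
The two constants above can be written out concretely. The sketch below builds the padded magic and the deleted-entry marker and tests an index entry against it. The names ``INDEX_ENTRY`` and ``is_deleted`` are hypothetical, and the big-endian offset is an assumption (the all-ones sentinel itself encodes identically in either byte order).

```python
import struct

SHARD_KEY_LEN = 32
SHARD_MAGIC = b"SWHShard".ljust(32, b"\x00")  # constant padded with NUL bytes
UINT64_MAX = 2**64 - 1

# An index entry is the key followed by the object offset.
# Assumption: the offset is big-endian; ">" also disables alignment padding.
INDEX_ENTRY = struct.Struct(f">{SHARD_KEY_LEN}sQ")

# Special value used for deleted content: all-zero key, offset 2**64-1.
DELETED_ENTRY = INDEX_ENTRY.pack(b"\x00" * SHARD_KEY_LEN, UINT64_MAX)


def is_deleted(entry):
    """Return True if a raw 40-byte index entry marks a deleted object."""
    _key, offset = INDEX_ENTRY.unpack(entry)
    return offset == UINT64_MAX


live_entry = INDEX_ENTRY.pack(b"\x01" * SHARD_KEY_LEN, 88)
```
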