Rename the project as swh-shard, migrate to pybind11 and add cli tools
It does not make much sense to call it perfecthash, since the aim of this package is creating, reading and manipulating shard files (which do use cmph to speed up extracting content objects from the shard file, but this is really an implementation detail).
Use pybind11 instead of cffi to wrap the cmph and shard manipulation code; it makes it a bit easier to add (C/C++) features in the extension.
Add a cli tool to manipulate shard files. Currently, it can:
- read the header of the shard file
- list entries in the shard file (as a list of {key: length})
- get an object from a shard file
- create a shard file from a list of files
- delete one or more entries from a shard
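The `create` command reports the number of entries left "after deduplication". Below is a minimal sketch of what such a step could look like, assuming (as the example session in this merge request suggests) that each object is keyed by the SHA256 of its content. The helper name `dedup_by_sha256` is hypothetical and is not part of swh.shard; the actual tool may proceed differently.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def dedup_by_sha256(paths: Iterable[Path]) -> Dict[bytes, bytes]:
    """Map SHA256 digest -> content, keeping one entry per distinct content.

    Hypothetical helper: it only illustrates deduplication by content hash.
    """
    objects: Dict[bytes, bytes] = {}
    for path in paths:
        data = Path(path).read_bytes()
        key = hashlib.sha256(data).digest()
        # Files with identical content share a key, so only the first is kept.
        objects.setdefault(key, data)
    return objects
```

Two files with the same bytes yield a single entry, so the entry count can be lower than the number of files given on the command line.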
The extension source files have been moved to src/_shard, and the Python source files for the swh.shard package to src/swh/shard, departing from the layout used by all other swh packages. This is required because a local 'swh' directory in the developer's working directory ends up in sys.path (by default) and breaks the dark magic involved in loading the package when it is installed in editable mode (in other words, the move keeps pytest working when it is run directly from the source tree with the package installed in editable mode).
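The general mechanism behind this problem can be sketched as follows: whichever sys.path entry comes first wins when a top-level package name is resolved. This is only an illustration of that rule, not a reproduction of the editable-install case, which is more involved. The package name `demo_shadow_pkg` and the helper `origin_with_path_first` are made up for the example.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Optional


def origin_with_path_first(package: str, entry: Path) -> Optional[str]:
    """Return the file `import package` would load if `entry` led sys.path."""
    saved = list(sys.path)
    sys.path.insert(0, str(entry))
    try:
        importlib.invalidate_caches()
        spec = importlib.util.find_spec(package)
        return spec.origin if spec is not None else None
    finally:
        # Restore sys.path so the demonstration has no lasting effect.
        sys.path[:] = saved
        importlib.invalidate_caches()


# Simulate a source checkout that has a package directory at its top level.
with tempfile.TemporaryDirectory() as checkout:
    root = Path(checkout)
    package_dir = root / "demo_shadow_pkg"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    shadow_origin = origin_with_path_first("demo_shadow_pkg", root)

# The resolved file lives inside the simulated checkout.
print(shadow_origin)
```

With the sources under src/, no top-level directory of the checkout carries the package name, so running tools from the repository root cannot pick it up this way.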
Activity
Jenkins job DOPH/gitlab-builds #68 failed in 49 sec.
See Console Output, Blue Ocean and Coverage Report for more details.
This is a proposal for improvement of the shard file manipulation library.
It should be 100% compatible with existing usage in winery (winery tests are OK with it).
It provides a few cli tools to manipulate shard files:
    $ swh-shard info toto.shard
    Shard toto.shard
    ├─version: 1
    ├─objects: 1996
    │ ├─position: 512
    │ └─size: 18843173
    ├─index
    │ ├─position: 18843685
    │ └─size: 80680
    └─hash
      └─position: 18924365

    $ swh-shard ls toto.shard
    1641c1829c716fefe077aaf51639cd85f30ecc0518c97a17289e9a6e28df7055: 132 bytes
    c62ef6626212fa5123c8ad773cdff7aa4186b61038c86a0ad02eaa5b29f21eb1: 682 bytes
    05f43789f9bb3464e5287a98bdace0696a037fa183462a83c0071d3209d9d1ca: 3243 bytes
    a1dec2c7e652871138a9eefa41d9d302c42c5202886bd48b3a8a0ebc7404ff20: 2876 bytes
    [...]

    $ swh-shard get toto.shard 1641c1829c716fefe077aaf51639cd85f30ecc0518c97a17289e9a6e28df7055 | sha256sum
    1641c1829c716fefe077aaf51639cd85f30ecc0518c97a17289e9a6e28df7055  -

    $ swh-shard create tutu.shard src/swh/shard/*.py
    There are 3 entries
    after deduplication: 3 entries
    Done
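The positions and sizes reported by `swh-shard info` in this example are consistent with the three sections being stored back to back: objects, then index, then hash. A small check of that arithmetic, using only the numbers printed for toto.shard (the contiguous layout is an inference from these values, not a statement of the format specification):

```python
# Values copied from the `swh-shard info toto.shard` output in this description.
OBJECTS_POSITION = 512
OBJECTS_SIZE = 18843173
INDEX_POSITION = 18843685
INDEX_SIZE = 80680
HASH_POSITION = 18924365


def end_of(position: int, size: int) -> int:
    """Return the offset of the first byte after a section."""
    return position + size


objects_end = end_of(OBJECTS_POSITION, OBJECTS_SIZE)
index_end = end_of(INDEX_POSITION, INDEX_SIZE)

# Each section starts exactly where the previous one ends.
print(objects_end == INDEX_POSITION)  # True
print(index_end == HASH_POSITION)     # True
```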
Edited by David Douard
- Resolved by David Douard
This definitely goes in the right direction, thanks!
I'm uncomfortable with having this all bundled into one commit. At least the changes are mostly done in separate files, so it should be fairly easy to split them into logical commits.
Generally, I think a bit too much of the serialization format of the shard index is leaking into the Python binding module. This was probably already the case with the cffi binding, but it might be worth cleaning this up now? I've left a few comments to that effect.
mentioned in merge request swh-docs!473 (merged)
added 9 commits
- d8ba4d01 - Do not use -std=c++17 when compiling C code for test_hash
- 3e8a30c3 - extension: Replace %ld by %lu in string format for unsigned long
- 353b0e5a - extension: initialize index entries as "deleted" entries
- b4efc6bc - docs: give more details on the shard file format
- 2591bbc8 - extension: rename extension and C files hash.{ch} as shard.{ch}
- 9b0fc2b9 - Rename the package as swh.shard
- b61b266c - Migrate to pybind11 and restructure the source code directory
- 0785cc82 - Add a cli tool to manipulate shard files
- 1d4938ea - Update the README file with a "Quick Start" section
Jenkins job DOPH/gitlab-builds #69 failed in 48 sec.
See Console Output, Blue Ocean and Coverage Report for more details.
Jenkins job DOPH/gitlab-builds #70 failed in 45 sec.
See Console Output, Blue Ocean and Coverage Report for more details.
added 1 commit
- 4462d74e - Update the README file with a "Quick Start" section
Jenkins job DOPH/gitlab-builds #71 succeeded in 43 sec.
See Console Output, Blue Ocean and Coverage Report for more details.