shard format v2
This issue is dedicated to discussing ideas for updating the shard file format.
Ideas:
- move the index and the MPH (minimal perfect hash) function right after the header rather than having them at the end of the file
  - rationale: makes a partial shard (e.g. a partially copied one) somewhat recoverable
  - should not require a new shard format per se: all the offsets are stored in the header, so no version bump should be needed
- store the file size in the index, in addition to the main payload area
  - rationale:
  - requires a version bump
- add some form of CRC, at least for the header
  - rationale: detect bitrot in the header, which, if corrupted, can make the whole shard unusable
  - requires a version bump
- use an x MB alignment for all the sections of the shard file format
  - rationale: XXX
  - does not require a version bump
- think about (assess) making the shard file format usable on cloud storage (S3)
  - rationale: XXX
- make the primary hash a "parameter" of the shard file (e.g. advertised in the header)
- allow the shard file to store several hash indexes/MPH functions
  - rationale: would that be useful?
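Several of the ideas above can be combined into one header layout: the index and MPH function placed right after the header, aligned sections, the primary hash advertised in the header, and a header CRC. The Python sketch below shows one possible combination. Everything in it is hypothetical and does not describe the current shard format. That includes the field order, the `SWHShard` magic, the hash identifier, the 2 MiB alignment (the issue leaves "x" open), the use of `zlib.crc32` as the checksum, and the helper names `align_up`, `layout`, `pack_header` and `unpack_header`.

```python
import struct
import zlib

# Hypothetical values: the issue leaves the alignment ("x MB") and the
# header fields open, so these are illustrative choices only.
ALIGN = 2 * 1024 * 1024        # 2 MiB section alignment (assumption)
MAGIC = b"SWHShard"            # 8-byte magic (hypothetical)
HASH_SHA256 = 1                # hypothetical id advertising the primary hash

# Header body: magic, version, hash id, then (position, size) pairs for the
# index, the MPH function and the objects area; all little-endian uint64.
HEADER_FMT = "<8sQQ" + "QQ" * 3
HEADER_BODY_SIZE = struct.calcsize(HEADER_FMT)  # 72 bytes
HEADER_SIZE = HEADER_BODY_SIZE + 4              # + CRC32 of the body


def align_up(offset: int, alignment: int = ALIGN) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(index_size: int, mph_size: int, objects_size: int,
           alignment: int = ALIGN) -> dict:
    """Place index and MPH function right after the header, objects last."""
    index_pos = align_up(HEADER_SIZE, alignment)
    mph_pos = align_up(index_pos + index_size, alignment)
    objects_pos = align_up(mph_pos + mph_size, alignment)
    return {
        "index": (index_pos, index_size),
        "mph": (mph_pos, mph_size),
        "objects": (objects_pos, objects_size),
    }


def pack_header(version: int, hash_id: int, sections: dict) -> bytes:
    """Serialize the header body and append a CRC32 computed over it."""
    body = struct.pack(HEADER_FMT, MAGIC, version, hash_id,
                       *sections["index"], *sections["mph"],
                       *sections["objects"])
    return body + struct.pack("<I", zlib.crc32(body))


def unpack_header(data: bytes) -> dict:
    """Parse a header, rejecting it if the CRC or the magic does not match."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated header")
    body = data[:HEADER_BODY_SIZE]
    (crc,) = struct.unpack("<I", data[HEADER_BODY_SIZE:HEADER_SIZE])
    if zlib.crc32(body) != crc:
        raise ValueError("header CRC mismatch (possible bitrot)")
    magic, version, hash_id, *fields = struct.unpack(HEADER_FMT, body)
    if magic != MAGIC:
        raise ValueError("bad magic")
    return {
        "version": version,
        "hash_id": hash_id,
        "sections": {
            "index": tuple(fields[0:2]),
            "mph": tuple(fields[2:4]),
            "objects": tuple(fields[4:6]),
        },
    }


if __name__ == "__main__":
    sections = layout(index_size=100, mph_size=50, objects_size=1000)
    header = pack_header(2, HASH_SHA256, sections)
    print(unpack_header(header)["sections"])
```

In this sketch the objects area comes last. A truncated copy that still holds the first `objects_pos` bytes therefore keeps a verifiable header, the index and the MPH function, and only the objects past the truncation point are lost. Flipping any byte of the header body makes `unpack_header` raise instead of returning garbage offsets.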