- Sep 23, 2021
-
-
vlorentz authored
For consistency, as the classes are now in swhids.py
-
vlorentz authored
1. Add a warning
2. Move identifier/manifest documentation to git_objects.py
3. Remove all imports of that module.
Motivation:
* SWHID classes were moved to swhids.py
* manifest computation functions were moved to git_objects.py
* Only reexports and trivial wrappers of model.py remain
-
vlorentz authored
-
vlorentz authored
They are not used anywhere.
-
vlorentz authored
Refactor identifiers & model to make *_git_object() functions work on model classes instead of dicts.
Since we now use these classes everywhere, computing identifiers required calling to_dict() first, which can be a performance bottleneck in code computing many checksums.
-
vlorentz authored
A future commit will make identifier computation use the attrs classes, which are strict about what they accept.
-
- Jul 23, 2021
-
-
Nicolas Dandrimont authored
This allows distinguishing multiple potential versions of the mapping between external objects and their counterparts archived in Software Heritage, for instance when a loader has a backwards-incompatible change that should result in objects being loaded again. The field defaults to zero, in which case it's backwards-compatible with the previous implementation in terms of identifier computation.
-
- Jun 15, 2021
-
-
David Douard authored
The problem occurred for datetimes before the epoch: the timestamp is negative, but because it is a float that includes the microseconds, the computed (int) timestamp was off by one whenever both conditions held (datetime < epoch and microsecond > 0). Add dedicated tests for this.
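The pitfall described above can be reproduced with the standard library alone. This is a minimal sketch, not the swh.model code: the helper name `split_timestamp` is hypothetical, and it assumes the goal is a (seconds, microseconds) pair in which microseconds is never negative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_timestamp(dt):
    """Hypothetical helper: return (seconds, microseconds) since the epoch,
    with 0 <= microseconds < 1_000_000 even for dates before the epoch."""
    delta = dt - EPOCH
    # timedelta normalizes itself so that .seconds and .microseconds are
    # non-negative; only .days carries the sign.
    seconds = delta.days * 86400 + delta.seconds
    return seconds, delta.microseconds


# Half a second before the epoch: 1969-12-31T23:59:59.5Z
before_epoch = EPOCH - timedelta(microseconds=500_000)

# int() truncates toward zero, so the float -0.5 becomes 0 instead of -1.
naive_seconds = int(before_epoch.timestamp())

seconds, microseconds = split_timestamp(before_epoch)
```

Here `naive_seconds` is 0 while the floored pair is (-1, 500000); pairing the truncated seconds with the positive microsecond count would place the date after the epoch.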
-
- May 11, 2021
-
-
vlorentz authored
The git_object is what will be actually useful to the vault. It's also easier to test, because test_identifier.py has the entire git_object in its test data.
-
- Apr 13, 2021
-
-
Antoine Lambert authored
According to the SWHID specification, a qualifier value is allowed to contain a '=' character (for instance in an origin URL). Update the parsing code to handle that special case.
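One way to handle this case is to split each qualifier on the first '=' only. The following is a minimal sketch under that assumption; `parse_qualifiers` is a hypothetical name, not the function used in swh.model, and it does not perform any percent-decoding.

```python
def parse_qualifiers(qualifiers_part):
    """Hypothetical helper: parse 'key=value;key=value' into a dict.

    Only the first '=' of each item separates key from value, so values
    such as URLs with query strings keep their own '=' characters.
    """
    qualifiers = {}
    for item in qualifiers_part.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {item!r}")
        qualifiers[key] = value
    return qualifiers


parsed = parse_qualifiers("origin=https://example.org/repo?branch=main;lines=1-5")
```

Using `str.partition` rather than `str.split("=")` avoids an unpacking error when the value itself contains the separator.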
-
Antoine Lambert authored
Some ValidationError exceptions could not be serialized to string due to these format errors. Related to T3234
-
- Mar 04, 2021
-
-
vlorentz authored
The rounding algorithm wasn't specified
-
vlorentz authored
Serializing as ISO8601 makes the hash brittle, because the database may change the timezone silently and/or lose precision in the microseconds. As we do not need precise timestamps, using an integer is good enough, and is consistent with the git format. The manifest also does not need to contain a timezone, as it only represents the timezone of the system that fetched this metadata, which is useless data.
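The brittleness can be illustrated with two datetime values that denote the same fetch time after a round trip through storage. This is only a sketch: `manifest_timestamp` is a hypothetical name, and rounding down with `math.floor` is an assumption made here, not necessarily the rounding rule used by swh.model.

```python
import math
from datetime import datetime, timedelta, timezone


def manifest_timestamp(dt):
    """Hypothetical helper: integer seconds since the epoch, rounded down.
    (The rounding rule is an assumption of this sketch.)"""
    return math.floor(dt.timestamp())


original = datetime(2021, 3, 4, 12, 0, 0, 123456, tzinfo=timezone.utc)

# Simulate a storage round trip that shifts the timezone and drops
# the microseconds.
stored = original.astimezone(timezone(timedelta(hours=1))).replace(microsecond=0)

iso_differs = original.isoformat() != stored.isoformat()
int_matches = manifest_timestamp(original) == manifest_timestamp(stored)
```

The two ISO8601 strings differ, so a hash over them would differ too, while the integer form is identical for both values.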
-
vlorentz authored
This will be used to compute an intrinsic identifier for RawExtrinsicMetadata, which can be used for deduplication and for referring to it like any other sha1_git instead of needing to use a tuple of its fields.
- Mar 03, 2021
-
- Mar 01, 2021
-
-
vlorentz authored
It can be handy as a shortcut to build SWHID objects.
- Feb 23, 2021
-
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
* Quote/unquote path
* Fix line parsing and serializing to properly handle None
* Fix error raised by check_visit/check_anchor
-
vlorentz authored
They were all very similar and only differed in which 'edge' cases they accepted.
-
vlorentz authored
Following the discussion on T3034, we decided to replace SWHID with two or three classes:
* QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
* CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
* ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
This commit adds the last one. It also removes "ori" as a valid object type for CoreSWHID and QualifiedSWHID, as it now only belongs in ExtendedSWHID.
- Feb 19, 2021
-
-
vlorentz authored
And store their parsed values (CoreSWHID, tuple of ints, etc.) instead of string.
-
vlorentz authored
Following the discussion on T3034, we decided to replace SWHID with two or three classes:
* QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
* CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
* ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
This commit adds the second one.
-
vlorentz authored
Following the discussion on T3034, we decided to replace SWHID with two or three classes:
* QualifiedSWHID to replace the existing SWHID (standard types + qualifiers)
* CoreSWHID, for "core SWHID" only (standard types + no qualifiers)
* ExtendedSWHID for internal use in Software Heritage (extra types + no qualifiers)
Since migrating from SWHID will break existing code, this commit uses the opportunity to modernize it a little, i.e.:
* `keyword`-only constructor, to get rid of the hacky default values for `object_type` and `object_id`
* enum instead of strings for the object type
* `bytes` instead of a hex string for the object id
* rename `metadata` to `qualifiers`
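The modernized interface listed above can be pictured with a small stand-alone class. This is a sketch for illustration only: `SketchSWHID` and the `ObjectType` enum defined here are hypothetical stand-ins written with the standard library, not the attrs-based classes from swh.model, and qualifier values are not escaped.

```python
import enum


class ObjectType(enum.Enum):
    """Object types as an enum rather than free-form strings."""

    CONTENT = "cnt"
    DIRECTORY = "dir"
    REVISION = "rev"
    RELEASE = "rel"
    SNAPSHOT = "snp"


class SketchSWHID:
    """Hypothetical stand-in for the real class, for illustration only."""

    # The bare '*' makes every argument keyword-only, so no placeholder
    # defaults are needed for object_type and object_id.
    def __init__(self, *, object_type, object_id, qualifiers=None):
        if not isinstance(object_type, ObjectType):
            raise TypeError("object_type must be an ObjectType")
        if not isinstance(object_id, bytes) or len(object_id) != 20:
            raise ValueError("object_id must be 20 bytes")
        self.object_type = object_type
        self.object_id = object_id
        self.qualifiers = dict(qualifiers or {})

    def __str__(self):
        core = f"swh:1:{self.object_type.value}:{self.object_id.hex()}"
        quals = "".join(f";{k}={v}" for k, v in self.qualifiers.items())
        return core + quals


swhid = SketchSWHID(object_type=ObjectType.REVISION, object_id=bytes(20))
```

Calling `SketchSWHID(ObjectType.REVISION, bytes(20))` with positional arguments raises a `TypeError`, which is the point of the keyword-only signature.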
-
- Jan 12, 2021
-
-
vlorentz authored
They were mixed in with snapshot tests.
-
vlorentz authored
test_identifiers: Make sure that {directory,revision,release,snapshot}_identifier() doesn't just return a value from the dict.
For example, before this commit, you could replace the code of revision_identifier() with this:
    def release_identifier(release):
        return release.get("id", b"")
and all tests would still pass.
-
- Dec 30, 2020
-
-
Stefano Zacchiroli authored
Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation implementation in parse_swhid() was done by hand.
With this change the coarse-grained validation done by parse_swhid() is now delegated to a regex. The semantic validation of SWHIDs is left to attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (we have several other modules doing that already, and they are using slightly different hand-made regexes, which isn't great).
As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of the SWHID as arguments, and make error messages uniform.
This change also brings some speed up in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one ~2:50 minutes, or a ~9% speedup.
Closes T2788
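A coarse syntactic check of this kind can be sketched as follows. The pattern below is an illustration written for this example, not the regex exposed by the module: the name `SWHID_SKETCH_RE`, the set of accepted object types, and the qualifier grammar are all assumptions.

```python
import re

# Hypothetical pattern: scheme, version 1, a three-letter object type,
# a 40-character lowercase hex id, then optional ';key=value' qualifiers.
SWHID_SKETCH_RE = re.compile(
    r"swh:(?P<version>1)"
    r":(?P<object_type>cnt|dir|rel|rev|snp)"
    r":(?P<object_id>[0-9a-f]{40})"
    r"(?P<qualifiers>(?:;[^;=]+=[^;]*)*)"
)


def is_syntactically_valid(swhid):
    """Return True if the string has the overall shape of a SWHID.

    Only the syntax is checked; whether the qualifiers make sense
    together is left to a separate semantic validation step.
    """
    return SWHID_SKETCH_RE.fullmatch(swhid) is not None


core = "swh:1:cnt:" + "a" * 40
checks = {
    "core": is_syntactically_valid(core),
    "qualified": is_syntactically_valid(core + ";origin=https://example.org/?a=b"),
    "short_id": is_syntactically_valid("swh:1:cnt:abc"),
    "bad_version": is_syntactically_valid("swh:2:cnt:" + "a" * 40),
}
```

Using `fullmatch` rather than `match` rejects trailing garbage without needing explicit anchors in the pattern.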
-
- Nov 12, 2020
-
-
Antoine R. Dumont authored
So parse_swhid raises a ValidationError when that is detected. Related to T2769
-
Antoine R. Dumont authored
Related to T2769
-
- Nov 10, 2020
-
-
Antoine R. Dumont authored
Related to T2769
-
- Sep 29, 2020
-
-
vlorentz authored
I created one in the wrong directory and didn't see the existing one.
-
- Sep 17, 2020
-
-
Antoine Lambert authored
Related to T2610
-
- Jul 08, 2020
-
-
Antoine Lambert authored
-
- Jul 06, 2020
-
-
David Douard authored
Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only accept (bytes, bytes) as extra_header. Add a post init hook to Revision to initialize this new attribute from given metadata, if any, for backward compatibility. Also amend the revision_d hypothesis strategy to generate extra_headers.
-
- Jul 03, 2020
-
-
Antoine Lambert authored
When Software Heritage persistent identifiers were introduced, they were not yet abbreviated as SWHIDs. Now that the abbreviation is gaining adoption, rename some functions and types in swh.model.identifiers for consistency:
- PersistentId -> SWHID
- persistent_identifier -> swhid
- parse_persistent_identifier -> parse_swhid
Backward compatibility with previous naming is maintained but deprecation warnings are introduced to encourage the use of the new names. Numerous variables in swh.model codebase have also been renamed accordingly. Also rework and improve documentation.
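A common way to keep an old name working while steering callers to the new one is a thin wrapper that emits a DeprecationWarning. The sketch below shows that general pattern; `deprecated_alias` is a hypothetical helper and the body of `parse_swhid` is a placeholder, so neither reflects the actual swh.model implementation.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Hypothetical helper: wrap new_func so that calling it under its
    old name still works but emits a DeprecationWarning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return new_func(*args, **kwargs)

    return wrapper


def parse_swhid(swhid):
    # Placeholder body for the sketch: split the core fields only.
    return tuple(swhid.split(":"))


# The old name stays importable and delegates to the new one.
parse_persistent_identifier = deprecated_alias(
    parse_swhid, "parse_persistent_identifier"
)
```

Callers of `parse_persistent_identifier` get the same result as before, plus a warning naming the replacement.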
-
- Jun 15, 2020
-
-
David Douard authored
thus in TimestampWithTimezone.from_dict(). This is needed to help consuming existing (invalid) messages from kafka. Warning: tests added in this revision do not cover the whole normalize_timestamp() function.
-
- Apr 08, 2020
-
-
David Douard authored
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
-