- Mar 18, 2021
-
-
Nicolas Dandrimont authored
This truncation is already enshrined at the identifier level. Truncate the object itself as well, to reduce the possibility of multiple different metadata objects sharing the same identifier.
-
- Mar 12, 2021
-
- Mar 08, 2021
-
-
David Douard authored
was modifying the dict given as argument.
-
- Mar 04, 2021
-
-
vlorentz authored
Serializing as ISO8601 makes the hash brittle, because the database may silently change the timezone and/or lose precision in the microseconds. As we do not need precise timestamps, using an integer is good enough, and is consistent with the git format. The manifest also does not need to contain a timezone, as it only represents the timezone of the system that fetched this metadata, which is useless data.
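The brittleness described above can be illustrated with a minimal sketch. The helper names and the manifest layout are hypothetical and are not the library's actual serialization; the point is only that a timezone shift or a loss of microseconds changes an ISO8601 string but not an integer number of seconds.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def manifest_iso(dt: datetime) -> bytes:
    # Hypothetical manifest line using the ISO8601 form of the date.
    return f"date {dt.isoformat()}".encode()


def manifest_int(dt: datetime) -> bytes:
    # Hypothetical manifest line using whole seconds since the Unix epoch.
    return f"date {int(dt.timestamp())}".encode()


original = datetime(2021, 3, 4, 12, 0, 0, 123456, tzinfo=timezone.utc)
# What a database might hand back: another timezone, microseconds dropped.
roundtripped = original.astimezone(timezone(timedelta(hours=1))).replace(microsecond=0)

iso_stable = (
    hashlib.sha1(manifest_iso(original)).digest()
    == hashlib.sha1(manifest_iso(roundtripped)).digest()
)
int_stable = (
    hashlib.sha1(manifest_int(original)).digest()
    == hashlib.sha1(manifest_int(roundtripped)).digest()
)
print(iso_stable, int_stable)
```

Only the integer form yields the same hash before and after the simulated round trip.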
-
vlorentz authored
So that they can be properly deduplicated and referenced.
- Mar 01, 2021
-
-
vlorentz authored
SWHID is deprecated; and CoreSWHID does not support qualifiers at all, so RawExtrinsicMetadata no longer needs to check there are no qualifiers.
-
vlorentz authored
ExtendedSWHID can identify either a software artifact or an origin, so we no longer need Union[SWHID, str]. Therefore, we no longer need the 'type' attribute, as it was only used to tell when the target is a SWHID and when it's an origin URL.
-
vlorentz authored
It can be handy as a shortcut to build SWHID objects.
-
- Dec 30, 2020
-
-
Stefano Zacchiroli authored
Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation implementation in parse_swhid() was done by hand. With this change the coarse-grained validation done by parse_swhid() is now delegated to a regex. The semantic validation of SWHIDs is left to attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (we have several other modules doing that already, and they are using slightly different hand-made regexes, which isn't great). As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of the SWHID as arguments, and make error messages uniform. This change also brings some speedup in SWHID parsing. On a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one ~2:50 minutes, or a ~9% speedup. Closes T2788
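As a rough illustration of regex-based syntactic validation, here is a minimal sketch. The pattern and the names `SWHID_RE_SKETCH` and `looks_like_swhid` are assumptions made for this example; they are not the regex or attribute exposed by the module, and the qualifier syntax is deliberately simplified.

```python
import re

# Hypothetical pattern for illustration only; not the regex shipped by the library.
SWHID_RE_SKETCH = re.compile(
    r"swh:(?P<version>1)"
    r":(?P<object_type>cnt|dir|rel|rev|snp)"
    r":(?P<object_id>[0-9a-f]{40})"
    r"(?P<qualifiers>(?:;[a-z]+=[^;]+)*)"
)


def looks_like_swhid(value: str) -> bool:
    """Coarse syntactic check; semantic checks would happen elsewhere."""
    return SWHID_RE_SKETCH.fullmatch(value) is not None


print(looks_like_swhid("swh:1:cnt:" + "a" * 40))
print(looks_like_swhid("swh:1:xyz:" + "a" * 40))
```

Client code that only needs a syntactic check can call such a function without building any model object.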
-
- Nov 16, 2020
-
-
Nicolas Dandrimont authored
All reverse dependencies have been updated to avoid using it now, so it can now be removed, paving the way to recycle it into an intrinsic identifier.
-
- Oct 26, 2020
-
-
Nicolas Dandrimont authored
This backwards-compatible change prepares the transition to give RawExtrinsicMetadata an `id` field that is computed intrinsically from its contents (using the HashableObject mixin).
-
- Oct 08, 2020
-
-
vlorentz authored
that returns a value suitable for uniqueness constraints.
Motivation:
* this is somewhat more of a model concern than a journal/kafka concern IMO
* this is one step toward adding support for non-model objects in KafkaJournalWriter
Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`.
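A minimal sketch of the idea, assuming simplified stand-in classes (plain dataclasses rather than the library's attrs-based models) and hypothetical key shapes: each object reports the value that identifies it, so generic code does not need to know which fields make an object unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionSketch:
    id: bytes
    message: bytes

    def unique_key(self):
        # Objects with an intrinsic identifier can simply return it.
        return self.id


@dataclass(frozen=True)
class OriginVisitSketch:
    origin: str
    visit: int
    type: str

    def unique_key(self):
        # Objects without an intrinsic identifier return a composite key.
        return {"origin": self.origin, "visit": self.visit}


keys = [
    RevisionSketch(id=b"\x01" * 20, message=b"hello").unique_key(),
    OriginVisitSketch(origin="https://example.org/repo", visit=3, type="git").unique_key(),
]
print(keys)
```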
-
- Sep 17, 2020
-
-
Antoine Lambert authored
Related to T2610
-
- Aug 14, 2020
-
-
vlorentz authored
We may unknowingly pass naive datetimes to the storage through them, causing the underlying DB to assign them a timezone that might not match the actual one. It already happens in swh.model and swh.loader.package tests.
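One way to guard against this is to reject naive datetimes when the object is built. The sketch below uses a hypothetical `VisitSketch` dataclass as a stand-in; the actual model classes and their validators may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VisitSketch:
    date: datetime

    def __post_init__(self):
        # A datetime is naive when it has no tzinfo or no usable UTC offset.
        if self.date.tzinfo is None or self.date.utcoffset() is None:
            raise ValueError("date must be a timezone-aware datetime")


aware = VisitSketch(date=datetime(2020, 8, 14, tzinfo=timezone.utc))

try:
    VisitSketch(date=datetime(2020, 8, 14))
    naive_rejected = False
except ValueError:
    naive_rejected = True
print(naive_rejected)
```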
-
- Jul 29, 2020
-
-
David Douard authored
-
- Jul 07, 2020
- Jul 06, 2020
-
-
David Douard authored
Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only accept (bytes, bytes) as extra_header. Add a post init hook to Revision to initialize this new attribute from given metadata, if any, for backward compatibility. Also amend the revision_d hypothesis strategy to generate extra_headers.
-
- Jun 24, 2020
-
-
David Douard authored
this aims at preventing constant usage of isinstance() based dispatch code when writing generic code handling model entities. For example, the "object_type" argument of JournalWriter.write_addition() has become superfluous now that we only pass model entities, etc. This idea comes from olasd's reading of the mypy doc: https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions This comes with a refactoring of from_dict.DiskBackedContent to make it *not* inherit from model.Content: object_type being Final, it cannot be overloaded.
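The tagged-union pattern can be sketched as follows. The classes are simplified stand-ins and `group_by_type` is a hypothetical helper; the point is that generic code reads the `object_type` tag instead of chaining isinstance() checks.

```python
from typing import Dict, Final, List, Union


class ContentSketch:
    object_type: Final = "content"


class DirectorySketch:
    object_type: Final = "directory"


ModelObject = Union[ContentSketch, DirectorySketch]


def group_by_type(objects: List[ModelObject]) -> Dict[str, List[ModelObject]]:
    """Group objects by their tag, without any isinstance() dispatch."""
    groups: Dict[str, List[ModelObject]] = {}
    for obj in objects:
        groups.setdefault(obj.object_type, []).append(obj)
    return groups


groups = group_by_type([ContentSketch(), DirectorySketch(), ContentSketch()])
print({tag: len(items) for tag, items in groups.items()})
```

Because the attribute is declared `Final`, a type checker such as mypy flags any subclass that tries to override it, which is why a subclass cannot carry a different tag.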
-
- May 20, 2020
-
-
David Douard authored
Simply add a BaseModel.anonymize() method. The default implementation returns None, meaning the object is not anonymizable. For Person, the method returns a Person with a hashed fullname (and unset name and email). For Revision and Release, the method returns an anonymized version of the object, i.e. with instances of Person replaced by anonymized ones.
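A minimal sketch of this behaviour, using plain dataclasses as stand-ins for the model classes. The choice of SHA-256 as the hash is an assumption made for the example, not something stated in the commit message.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonSketch:
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "PersonSketch":
        # Assumption: SHA-256 of the fullname; name and email are unset.
        hashed = hashlib.sha256(self.fullname).digest()
        return PersonSketch(fullname=hashed, name=None, email=None)


@dataclass(frozen=True)
class RevisionSketch:
    message: bytes
    author: PersonSketch
    committer: PersonSketch

    def anonymize(self) -> "RevisionSketch":
        # Same revision, with every person replaced by its anonymized form.
        return replace(
            self,
            author=self.author.anonymize(),
            committer=self.committer.anonymize(),
        )


jane = PersonSketch(b"Jane Doe <jane@example.org>", b"Jane Doe", b"jane@example.org")
anonymous = RevisionSketch(b"fix typo", author=jane, committer=jane).anonymize()
print(anonymous.author.name, anonymous.author.email)
```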
-
- Apr 10, 2020
-
-
Antoine R. Dumont authored
This also adapts the hypothesis strategies, using the plural form origin_visit_statuses. That plural form is acceptable because in our context, the statuses are countable. Related to T2310
-
- Apr 08, 2020
-
-
David Douard authored
- blackify all the python files, - enable black in pre-commit, - add a black tox environment.
-
- Apr 01, 2020
-
-
David Douard authored
With support for the str representation of dates. Mostly for testing purposes.
-
David Douard authored
instead of a reference to an Origin entity.
-
David Douard authored
- add a validator for negative_utc (can be True iff offset is 0), - update the timestamps_with_timezone hypothesis strategy, - add low-level tests for it.
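The negative_utc constraint can be sketched as a construction-time check. The class below is a simplified stand-in (an integer timestamp and an offset in minutes are assumptions of this example), not the library's actual definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampWithTimezoneSketch:
    timestamp: int      # seconds since the Unix epoch (simplified)
    offset: int         # offset from UTC, in minutes
    negative_utc: bool  # only meaningful for a zero offset

    def __post_init__(self):
        # negative_utc may be True only when the offset is exactly 0.
        if self.negative_utc and self.offset != 0:
            raise ValueError("negative_utc can only be True if offset is 0")


ok = TimestampWithTimezoneSketch(timestamp=1585699200, offset=0, negative_utc=True)

try:
    TimestampWithTimezoneSketch(timestamp=1585699200, offset=60, negative_utc=True)
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
print(invalid_rejected)
```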
-
David Douard authored
-
- Mar 31, 2020
-
-
Antoine R. Dumont authored
(pairing with @vlorentz) Related to T2310
-
- Mar 11, 2020
-
-
David Douard authored
this does not work in the general case since there is no (recursive) conversion of objects used for model object initialization. We can only check when using the from_dict() factory.
-
David Douard authored
for better clarity on the code author's intention.
-
- Mar 04, 2020
-
- Mar 02, 2020
-
-
Nicolas Dandrimont authored
This lets us generate Content objects directly from a bytestring, with the proper set of hashes auto-generated from the contents.
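As an illustration of deriving hashes from a bytestring, here is a minimal sketch using only hashlib. The function name and the exact set of hashes are assumptions of this example; the git-style hash prepends the usual `blob <length>\0` header before hashing.

```python
import hashlib
from typing import Dict, Union


def hashes_from_data(data: bytes) -> Dict[str, Union[bytes, int]]:
    """Compute a set of checksums for a blob (hypothetical helper)."""
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
        "length": len(data),
    }


empty = hashes_from_data(b"")
print(empty["sha1_git"].hex())
```

For empty input, the `sha1_git` value matches the well-known git hash of the empty blob, which is a quick sanity check for the header format.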
-
- Feb 27, 2020
-
-
vlorentz authored
Will be used by loaders.
-
- Feb 24, 2020
-
-
vlorentz authored
They will be used by loaders, so they can deal only with model objects, instead of having to do the same conversion themselves. This removes the `data` and `save_path` arguments of `from_file` and `from_disk`, as data loading is always deferred from now on. To access it, users are now expected to either open the data files themselves, or use `.to_model().with_data()`.
-
- Feb 14, 2020
-
-
vlorentz authored
Can be useful to deduplicate code in swh-storage.
-
- Jan 30, 2020
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related to P589
-
- Nov 29, 2019
-
-
Antoine Lambert authored
Add support to automatically compute identifier in the following object models: Directory, Release, Revision, Snapshot. If the identifier is not provided as parameter, it will be computed when the model is initialized.
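The "compute the identifier if it is not provided" behaviour can be sketched as below. The class is a stand-in and `compute_id` hashes a made-up manifest; it is not the identifier algorithm used by the library.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SnapshotSketch:
    branches: Tuple[Tuple[bytes, bytes], ...]
    id: Optional[bytes] = None

    def __post_init__(self):
        if self.id is None:
            # Frozen dataclass: bypass the frozen __setattr__ during init.
            object.__setattr__(self, "id", self.compute_id())

    def compute_id(self) -> bytes:
        # Hypothetical manifest: one "name target" line per branch, sorted.
        manifest = b"".join(
            name + b" " + target + b"\n" for name, target in sorted(self.branches)
        )
        return hashlib.sha1(manifest).digest()


computed = SnapshotSketch(branches=((b"main", b"abc"),))
explicit = SnapshotSketch(branches=((b"main", b"abc"),), id=b"\x00" * 20)
print(computed.id.hex())
```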
-
- Oct 30, 2019
-
- Oct 29, 2019
-
-
David Douard authored
we do not really need them to be mutable, and as a bonus their instances become hashable, so we can add them to a set(), for example.
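The benefit of immutability can be shown with a small sketch. It uses a frozen dataclass as a stand-in for the attrs-based model classes; the behaviour illustrated (frozen instances are hashable and compare by value) is the same idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonSketch:
    fullname: bytes


# Equal frozen instances hash identically, so a set deduplicates them.
people = {PersonSketch(b"alice"), PersonSketch(b"alice"), PersonSketch(b"bob")}
print(len(people))

try:
    PersonSketch(b"alice").fullname = b"mallory"
    mutation_blocked = False
except AttributeError:
    # dataclasses.FrozenInstanceError is a subclass of AttributeError.
    mutation_blocked = True
print(mutation_blocked)
```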
-