- Jan 07, 2022
- Dec 22, 2021
vlorentz authored
1. Most objects do not need it, so it's a waste of space.
2. This means we just extend the existing format (some objects will have that key in their dict) instead of changing it (retroactively adding it to all objects).
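A minimal sketch of the "only present when needed" serialization described above. It uses a stdlib dataclass as a stand-in for the attrs-based model classes; the class name `GitObjectSketch` and its `to_dict()` body are hypothetical, not swh.model's actual code.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class GitObjectSketch:
    """Hypothetical stand-in for a model object that may carry a raw manifest."""

    id: bytes
    raw_manifest: Optional[bytes] = None

    def to_dict(self) -> dict:
        d = asdict(self)
        # Ordinary objects keep their existing serialized format: the key is
        # only emitted for 'weird' objects that actually have a raw manifest.
        if d["raw_manifest"] is None:
            del d["raw_manifest"]
        return d
```

With this shape, existing serialized objects stay byte-for-byte identical, which is the "extend rather than change the format" property the commit is after.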
vlorentz authored
This will be used to store the original manifest of 'weird' git objects, when we cannot reasonably represent them otherwise.
- Dec 15, 2021
vlorentz authored
Using .now() produces data that differs between xdist processes, as files are imported after forking, and xdist requires consistent data across processes.
- Dec 08, 2021
vlorentz authored
It calls attr.validate() (which runs the validators) and recomputes the hash of HashableObject instances. A future commit will also make it check the raw_manifest attribute when relevant.
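A sketch of what such a check can look like for an object whose id is intrinsic. `HashableSketch` is hypothetical: it uses git's blob hashing (SHA-1 over `b"blob <len>\0"` plus the data) as the intrinsic hash, and a plain type check stands in for `attr.validate()`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class HashableSketch:
    """Hypothetical object whose id is the git-blob SHA-1 of its data."""

    data: bytes
    id: bytes

    @classmethod
    def from_data(cls, data: bytes) -> "HashableSketch":
        return cls(data=data, id=cls.compute_hash(data))

    @staticmethod
    def compute_hash(data: bytes) -> bytes:
        # git hashes a blob as sha1(b"blob <length>\0" + data)
        return hashlib.sha1(b"blob %d\x00" % len(data) + data).digest()

    def check(self) -> None:
        # 1. Run the validators (a simple type check stands in for attr.validate()).
        if not isinstance(self.data, bytes) or not isinstance(self.id, bytes):
            raise TypeError("data and id must be bytes")
        # 2. Recompute the intrinsic hash and compare it with the stored id.
        if self.compute_hash(self.data) != self.id:
            raise ValueError("id does not match the recomputed hash")
```

Constructing objects stays cheap (no hashing on every instantiation); the expensive recomputation only happens when a caller explicitly asks for `check()`.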
vlorentz authored
For the sake of completeness (a future commit may depend on it).
vlorentz authored
For now it is filled from 'offset' and 'negative_utc', but it will replace them in a future commit. The goal is to simplify the model and to support more of the 'weird' offsets that we do not currently handle.
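A sketch of how an offset in git's `±HHMM` byte form can be derived from the `offset` (minutes east of UTC) and `negative_utc` pair. The helper name `format_offset` is hypothetical; storing the bytes directly would presumably also let the field hold offsets that this integer-plus-flag pair cannot express.

```python
def format_offset(offset: int, negative_utc: bool) -> bytes:
    """Hypothetical helper: render an offset in minutes as git-style b'+HHMM'.

    negative_utc distinguishes git's b'-0000' from b'+0000', which an
    integer offset of 0 alone cannot express.
    """
    if offset < 0 or (offset == 0 and negative_utc):
        sign = "-"
    else:
        sign = "+"
    hours, minutes = divmod(abs(offset), 60)
    return f"{sign}{hours:02d}{minutes:02d}".encode()
```

For example, `format_offset(-330, False)` gives `b"-0530"` and `format_offset(0, True)` gives `b"-0000"`.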
- Dec 01, 2021
vlorentz authored
I don't know any instance of these, but there is no harm in checking them.
- Nov 05, 2021
vlorentz authored
1. hashes are now repr()ed as `hash_to_bytes("1234...")` instead of `b"\x12\x34..."`
2. SWHID objects are now repr()ed as `CoreSWHID.from_string('swh:1:...:1234...')` instead of `CoreSWHID(scheme='swh', version='1', object_type=..., object_id=b'\x12\x34')`
3. enums are now repr()ed as `MyEnum.NAME` instead of `<MyEnum.NAME: 'value'>`

Thanks to these three changes, using repr() on a model object now prints a string that can be pasted directly into a `.py` file to write a new test case.
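A sketch of the "paste-able repr" idea for hashes and enums. `hash_to_bytes` here is a local stand-in for the helper named in the commit message, and `hash_repr` and `ObjectType` are hypothetical names chosen for illustration.

```python
import enum


def hash_to_bytes(hexstr: str) -> bytes:
    # Local stand-in for the hash_to_bytes helper mentioned above.
    return bytes.fromhex(hexstr)


def hash_repr(h: bytes) -> str:
    """Render a hash the way it would be written by hand in a test file."""
    return f'hash_to_bytes("{h.hex()}")'


class ObjectType(enum.Enum):
    CONTENT = "cnt"
    DIRECTORY = "dir"

    def __repr__(self) -> str:
        # The default "<ObjectType.CONTENT: 'cnt'>" is not valid Python source.
        return f"{type(self).__name__}.{self.name}"
```

The property that matters is round-tripping: `eval(repr(x)) == x` in a module that has the relevant names imported.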
- Oct 01, 2021
vlorentz authored
The previous commit replaced attrs-strict's type validator with our own, stricter and faster, validator. However, the strictness can be a burden in other packages; for example, swh-storage tests rely on inserting dummy data that raises an exception when accessed, which would be hard to do while using the exact expected type. This commit reverts the strict behavior but keeps the performance optimization: it always checks type equality first, and when that fails (which would have raised an error before this commit), it gives the value a 'second chance' by trying isinstance. This means that, outside tests, isinstance should never be called, or only very rarely.
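A minimal sketch of the "type equality first, isinstance as a second chance" validator. It loosely mirrors the `(instance, attribute, value)` shape of attrs validators, but with the attribute simplified to its name; `type_validator` here is hypothetical and only handles plain classes, not generic types like `Optional[...]`.

```python
from typing import Any, Callable


def type_validator(expected: type) -> Callable[[Any, str, Any], None]:
    """Hypothetical validator: fast exact-type check, isinstance() as fallback."""

    def validate(instance: Any, name: str, value: Any) -> None:
        # Fast path: exact type equality covers almost every real-world value.
        if type(value) is expected:
            return
        # Second chance: accept subclasses (e.g. test doubles) via isinstance().
        if isinstance(value, expected):
            return
        raise TypeError(
            f"{name} must be {expected.__name__}, not {type(value).__name__}"
        )

    return validate
```

A value of exactly the right type costs a single identity comparison; only subclasses, which should mostly appear in tests, pay for the slower `isinstance()` call.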
- Sep 28, 2021
vlorentz authored
This reimplements attrs_strict.type_validator(), using type equality instead of isinstance. This makes my checksum validation script (which mostly just instantiates model objects, computes a checksum, then discards them) run twice as fast.
- Sep 23, 2021
vlorentz authored
1. Add a warning.
2. Move identifier/manifest documentation to git_objects.py.
3. Remove all imports of that module.

Motivation:

* SWHID classes were moved to swhids.py
* manifest computation functions were moved to git_objects.py
* Only reexports and trivial wrappers of model.py remain
- Jun 15, 2021
David Douard authored
- Mar 18, 2021
Nicolas Dandrimont authored
This truncation is already enshrined at the identifier level. Truncate the object itself as well, to reduce the possibility of multiple different metadata objects having the same identifier.
- Mar 12, 2021
- Mar 08, 2021
David Douard authored
was modifying the dict given as argument.
- Mar 04, 2021
vlorentz authored
Serializing as ISO8601 makes the hash brittle, because the database may silently change the timezone and/or lose precision in the microseconds. As we do not need precise timestamps, using an integer is good enough, and is consistent with the git format. The manifest also does not need to contain a timezone, as it only represents the timezone of the system that fetched this metadata, which is useless data.
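A sketch of why an integer timestamp is robust here: the same instant, seen in a different timezone or with different microseconds, yields the same manifest bytes. `discovery_date_to_manifest` is a hypothetical name, and rejecting naive datetimes is an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def discovery_date_to_manifest(dt: datetime) -> bytes:
    """Hypothetical: encode a tz-aware datetime as integer seconds since the epoch.

    Microseconds are dropped and no timezone is recorded, so the result only
    depends on the instant (to the second), not on how the DB returns it.
    """
    if dt.tzinfo is None:
        raise ValueError("naive datetime: its timezone is unknown")
    # Floor division of two timedeltas gives an exact integer number of seconds.
    seconds = (dt - EPOCH) // timedelta(seconds=1)
    return str(seconds).encode()
```

Using timedelta arithmetic rather than `int(dt.timestamp())` avoids float rounding and floors consistently for dates before the epoch.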
vlorentz authored
So that they can be properly deduplicated and referenced.
- Mar 01, 2021
vlorentz authored
SWHID is deprecated, and CoreSWHID does not support qualifiers at all, so RawExtrinsicMetadata no longer needs to check that there are no qualifiers.
vlorentz authored
ExtendedSWHID can identify either a software artifact or an origin, so we no longer need Union[SWHID, str]. Therefore, we no longer need the 'type' attribute, as it was only used to tell whether the target is a SWHID or an origin URL.
vlorentz authored
It can be handy as a shortcut to build SWHID objects.
- Dec 30, 2020
Stefano Zacchiroli authored
Before this change there was a lot of overlap between parse_swhid() and the attrs-based validators in the SWHID class. Also, the validation in parse_swhid() was implemented by hand.

With this change, the coarse-grained validation done by parse_swhid() is delegated to a regex, and the semantic validation of SWHIDs is left to the attrs validators. The regex is also exposed as a module attribute, to be used by client code that wants to syntactically validate SWHIDs without necessarily instantiating SWHID classes (several other modules already do that, using slightly different hand-made regexes, which isn't great).

As part of this change we also clean up the use of ValidationError exceptions, systematically passing the problematic parts of the SWHID as arguments and making error messages uniform.

This change also speeds up SWHID parsing: on a benchmark parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 minutes and the new one ~2:50 minutes, a ~9% speedup.

Closes T2788
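A sketch of the split between coarse syntactic matching (a regex) and later semantic validation. `SWHID_RE` and `parse_swhid_parts` are hypothetical and deliberately simplified: core object types only, lowercase hex ids, and qualifiers as `;key=value` pairs. This is not the regex the module actually exposes.

```python
import re

# Simplified, hypothetical pattern: "swh:1:<type>:<40 hex digits>",
# optionally followed by ";key=value" qualifiers.
SWHID_RE = re.compile(
    r"swh:(?P<version>1):(?P<object_type>cnt|dir|rev|rel|snp)"
    r":(?P<object_id>[0-9a-f]{40})"
    r"(?P<qualifiers>(;[a-z_]+=[^;]+)*)"
)


def parse_swhid_parts(swhid: str) -> dict:
    """Coarse-grained syntactic split; semantic checks are left to the caller."""
    m = SWHID_RE.fullmatch(swhid)
    if m is None:
        raise ValueError(f"invalid SWHID syntax: {swhid!r}")
    qualifiers = dict(
        q.split("=", 1) for q in m.group("qualifiers").split(";") if q
    )
    return {
        "object_type": m.group("object_type"),
        "object_id": m.group("object_id"),
        "qualifiers": qualifiers,
    }
```

Client code that only needs a yes/no syntactic answer can call `SWHID_RE.fullmatch(s)` directly, without building any SWHID object.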
- Nov 16, 2020
Nicolas Dandrimont authored
All reverse dependencies have been updated to stop using it, so it can now be removed, paving the way to recycle it into an intrinsic identifier.
- Oct 26, 2020
Nicolas Dandrimont authored
This backwards-compatible change prepares the transition to give RawExtrinsicMetadata an `id` field that is computed intrinsically from its contents (using the HashableObject mixin).
- Oct 08, 2020
vlorentz authored
that returns a value suitable for unicity constraints.

Motivation:

* this is somewhat more of a model concern than a journal/kafka concern IMO
* this is one step toward adding support for non-model objects in KafkaJournalWriter

Implementation of the unique_key methods comes from `swh.journal.serializers.object_key`.
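A sketch of why a per-class `unique_key()` method helps generic writers: they can ask any object for its key without dispatching on its type. Both classes and the key shapes below are hypothetical; the real keys come from `swh.journal.serializers.object_key` and may differ.

```python
from dataclasses import dataclass
from typing import Dict, Union

KeyType = Union[bytes, Dict[str, Union[str, int]]]


@dataclass(frozen=True)
class RevisionKeySketch:
    id: bytes

    def unique_key(self) -> KeyType:
        # Objects with an intrinsic identifier are unique by that identifier.
        return self.id


@dataclass(frozen=True)
class OriginVisitKeySketch:
    origin: str
    visit: int

    def unique_key(self) -> KeyType:
        # Objects without one are unique by a combination of their fields.
        return {"origin": self.origin, "visit": self.visit}
```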
- Sep 17, 2020
Antoine Lambert authored
Related to T2610
- Aug 14, 2020
vlorentz authored
We may unknowingly pass naive datetimes to the storage through them, causing the underlying DB to assign them a timezone that might not match the actual one. It already happens in swh.model and swh.loader.package tests.
- Jul 29, 2020
David Douard authored
- Jul 07, 2020
- Jul 06, 2020
David Douard authored
Add a new extra_headers attribute on Revision and use it for computing the revision's id instead of extracting it from the metadata field. Only (bytes, bytes) pairs are accepted as extra headers. Add a post-init hook to Revision to initialize this new attribute from the given metadata, if any, for backwards compatibility. Also amend the revision_d hypothesis strategy to generate extra_headers.
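A sketch of the post-init backwards-compatibility hook, using a frozen stdlib dataclass (`__post_init__`) as a stand-in for the attrs class. `RevisionHeadersSketch` is hypothetical, and so is the assumption that legacy metadata stored the headers under an `"extra_headers"` key as a list of `[key, value]` pairs.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class RevisionHeadersSketch:
    """Hypothetical stand-in for Revision; only the relevant fields are shown."""

    message: bytes
    metadata: Optional[Dict[str, Any]] = None
    extra_headers: Tuple[Tuple[bytes, bytes], ...] = ()

    def __post_init__(self) -> None:
        headers = self.extra_headers
        # Backwards compatibility: fall back to headers stored in metadata.
        if not headers and self.metadata and "extra_headers" in self.metadata:
            headers = tuple(tuple(pair) for pair in self.metadata["extra_headers"])
            # Frozen dataclasses need object.__setattr__ to set a field here.
            object.__setattr__(self, "extra_headers", headers)
        # Only (bytes, bytes) pairs are accepted.
        for header in headers:
            if not (
                isinstance(header, tuple)
                and len(header) == 2
                and all(isinstance(part, bytes) for part in header)
            ):
                raise TypeError(f"invalid extra header: {header!r}")
```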
- Jun 24, 2020
David Douard authored
This aims at preventing constant use of isinstance()-based dispatch code when writing generic code handling model entities. For example, the "object_type" argument of JournalWriter.write_addition() has become superfluous now that we only pass model entities. This idea comes from olasd's reading of the mypy docs: https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

This comes with a refactoring of from_dict.DiskBackedContent to make it *not* inherit from model.Content: object_type being Final, it cannot be overridden.
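A sketch of tag-based dispatch on a `Final` class attribute, in the spirit of the mypy "tagged unions" pattern linked above. The classes and `describe()` are hypothetical. At runtime this is just string comparison; the intent is that a type checker can narrow `obj` in each branch, though how well that narrowing works depends on the checker.

```python
from typing import Final, Union


class ContentSketch:
    object_type: Final = "content"

    def __init__(self, length: int) -> None:
        self.length = length


class RevisionSketch:
    object_type: Final = "revision"

    def __init__(self, message: bytes) -> None:
        self.message = message


ModelObject = Union[ContentSketch, RevisionSketch]


def describe(obj: ModelObject) -> str:
    # Dispatch on the tag instead of calling isinstance() on every class.
    if obj.object_type == "content":
        return f"content of {obj.length} bytes"
    else:
        return f"revision: {obj.message.decode()}"
```

Because `object_type` is declared `Final`, a checker rejects subclasses that try to override it. That is why a class needing a different behavior, like DiskBackedContent, must stop inheriting from Content.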
- May 20, 2020
David Douard authored
Simply add a BaseModel.anonymize() method. The default implementation returns None, meaning the object is not anonymizable. For Person, the method returns a Person with a hashed fullname (and unset name and email). For Revision and Release, the method returns an anonymized version of the object, i.e. with instances of Person replaced by anonymized ones.
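A sketch of the anonymization scheme described above. The classes are hypothetical stand-ins, and SHA-256 as the hash function is an assumption of this sketch (the commit message only says "hashed").

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonSketch:
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "PersonSketch":
        # Assumption: SHA-256 of the fullname; name and email are dropped.
        return PersonSketch(
            fullname=hashlib.sha256(self.fullname).digest(), name=None, email=None
        )


@dataclass(frozen=True)
class ReleaseSketch:
    name: bytes
    author: Optional[PersonSketch]

    def anonymize(self) -> "ReleaseSketch":
        # Same object, with every Person replaced by its anonymized version.
        return replace(self, author=self.author.anonymize() if self.author else None)
```

Hashing rather than blanking the fullname keeps distinct authors distinguishable (and the same author recognizable across objects) without exposing who they are.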
- Apr 10, 2020
Antoine R. Dumont authored
This also adapts the hypothesis strategies, using the plural form origin_visit_statuses. That plural form is acceptable because in our context, the statuses are countable. Related to T2310
- Apr 08, 2020
David Douard authored
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
- Apr 01, 2020
David Douard authored
With support for a str representation of the date. Mostly for testing purposes.
David Douard authored
instead of a reference to an Origin entity.
David Douard authored
- add a validator for negative_utc (can be True iff offset is 0),
- update the timestamps_with_timezone hypothesis strategy,
- add low-level tests for it.
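A sketch of the negative_utc constraint as a construction-time check. `TimestampWithTimezoneSketch` is a hypothetical stdlib stand-in for the attrs class; `offset` is assumed to be in minutes east of UTC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimestampWithTimezoneSketch:
    seconds: int
    offset: int  # minutes east of UTC (assumption of this sketch)
    negative_utc: bool = False

    def __post_init__(self) -> None:
        # negative_utc encodes git's "-0000" offset, so it is only
        # meaningful when the offset itself is zero.
        if self.negative_utc and self.offset != 0:
            raise ValueError("negative_utc can only be True if offset is 0")
```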