Commits · 5f5d4f941568801804855c57e1974e1c93837876 · Renaud Boyer / swh-model

Mar 18, 2021

Truncate RawExtrinsicMetadata.discovery_date to a second · af5e4614

This truncation is already enshrined at the identifier level. Truncate
the object itself as well, to reduce the possibility of multiple different
metadata objects sharing the same identifier.

af5e4614
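
As a rough illustration of the truncation described in af5e4614, the sketch below drops the sub-second part of a timezone-aware datetime. The helper name `truncate_to_second` is hypothetical and is not taken from swh-model.

```python
from datetime import datetime, timezone


def truncate_to_second(dt: datetime) -> datetime:
    """Drop the sub-second part of a datetime, keeping its timezone."""
    return dt.replace(microsecond=0)


# Two discovery dates that differ only in their microseconds...
a = datetime(2021, 3, 18, 12, 0, 5, 123456, tzinfo=timezone.utc)
b = datetime(2021, 3, 18, 12, 0, 5, 987654, tzinfo=timezone.utc)

# ...become equal once truncated, so the object matches what the
# second-precision identifier already assumed.
same = truncate_to_second(a) == truncate_to_second(b)
```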

Mar 12, 2021
- model: Add a swhid() method to RawExtrinsicMetadata. · 975e9892
  vlorentz authored 4 years ago
  
  All other hashable objects but ExtId have one. It will be used by swh-deposit.
  v2.2.0
  
  975e9892
Mar 08, 2021
- Fix MetadataAuthority.from_dict() · fca36585
  David Douard authored 4 years ago
  
  was modifying the dict given as argument.
  fca36585
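
The kind of bug fixed in fca36585 can be sketched as follows, assuming a `from_dict()` that pops keys from its input. The `Authority` class here is a hypothetical stand-in, not the real `MetadataAuthority`.

```python
from typing import Any, Dict


class Authority:
    """Hypothetical stand-in for a model class built from a dict."""

    def __init__(self, type: str, url: str) -> None:
        self.type = type
        self.url = url

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Authority":
        # Work on a shallow copy so that pop() below does not
        # remove keys from the caller's dict.
        d = dict(d)
        type_ = d.pop("type")
        return cls(type=type_, **d)


source = {"type": "forge", "url": "https://example.org"}
authority = Authority.from_dict(source)
```

After the call, `source` still contains its `"type"` key.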
Mar 04, 2021

model: Remove override of RawExtrinsicMetadata.unique_key(), so it now returns the hash. · 2185f930
vlorentz authored 4 years ago

v2.0.0

2185f930

identifiers: Change the manifest format of raw_extrinsic_metadata to use integer instead of ISO8601 · 3ce41250

vlorentz authored 4 years ago

Serializing as ISO8601 makes the hash brittle, because the database may
change the timezone silently and/or lose precision in the microseconds.

As we do not need a precise timestamp, using an integer is good enough,
and is consistent with the git format.

The manifest also does not need to contain a timezone, as it only
represents the timezone of the system that fetched this metadata,
which is useless data.

3ce41250
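
The brittleness argument in 3ce41250 can be illustrated with a small sketch. The function name `manifest_timestamp` is hypothetical, and the actual manifest layout is not shown here; the example only demonstrates why an integer number of seconds survives a timezone change or a loss of microseconds while an ISO8601 string does not.

```python
from datetime import datetime, timedelta, timezone


def manifest_timestamp(dt: datetime) -> int:
    """Return whole seconds since the Unix epoch for an aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


# The same instant, as originally recorded...
utc = datetime(2021, 3, 4, 12, 0, 0, 250000, tzinfo=timezone.utc)
# ...and as it might come back from a database that changed the
# timezone and dropped the microseconds.
paris = utc.astimezone(timezone(timedelta(hours=1))).replace(microsecond=0)

iso_differs = utc.isoformat() != paris.isoformat()
int_matches = manifest_timestamp(utc) == manifest_timestamp(paris)
```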

model: Add 'id' field to RawExtrinsicMetadata · fc808e1f
vlorentz authored 4 years ago
So that they can be properly deduplicated and referenced.
fc808e1f

Mar 01, 2021

RawExtrinsicMetadata: Use CoreSWHID instead of SWHID for contexts · 31a8a0f2

vlorentz authored 4 years ago

SWHID is deprecated; and CoreSWHID does not support qualifiers at all,
so RawExtrinsicMetadata no longer needs to check there are no qualifiers.

31a8a0f2

RawExtrinsicMetadata: Use ExtendedSWHID as target and remove type · 752fb81d

vlorentz authored 4 years ago

ExtendedSWHID can identify either a software artifact or an origin,
so we no longer need Union[SWHID, str].

Therefore, we no longer need the 'type' attribute, as it was only
used to tell when the target is a SWHID and when it's an origin URL.

752fb81d

Add a swhid() method to all hashable objects. · 256bca2c
vlorentz authored 4 years ago
It can be handy as a shortcut to build SWHID objects.
256bca2c

Dec 30, 2020

SWHID parsing: simplify and deduplicate validation logic · 57468505

Stefano Zacchiroli authored 4 years ago

Before this change there was a lot of overlap between parse_swhid() and the
attrs-based validators in the SWHID class. Also, the validation implementation
in parse_swhid() was done by hand.

With this change the coarse-grained validation done by parse_swhid() is now
delegated to a regex. The semantic validation of SWHIDs is left to attrs
validators. The regex is also exposed as a module attribute, to be used by
client code that wants to syntactically validate SWHIDs without necessarily
instantiating SWHID classes (we have several other modules doing that already,
and they are using slightly different hand-made regexes, which isn't great).

As part of this change we also clean up the use of ValidationError exceptions,
systematically passing the problematic parts of the SWHID as arguments, and
make error messages uniform.

This change also brings some speed up in SWHID parsing. On a benchmark parsing
~30 M valid SWHIDs, the previous implementation took ~3:06 minutes, the new one
~2:50 minutes, or a ~9% speedup.

Closes T2788

57468505
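
A minimal sketch of regex-based syntactic validation, in the spirit of 57468505. The pattern below is a simplified assumption that covers only core SWHIDs without qualifiers; it is not the regex exposed by the module, and the names `CORE_SWHID_RE` and `is_core_swhid` are hypothetical.

```python
import re

# Simplified pattern: scheme, version 1, object type, 40 lowercase hex digits.
CORE_SWHID_RE = re.compile(
    r"swh:(?P<version>1):"
    r"(?P<object_type>cnt|dir|rel|rev|snp):"
    r"(?P<object_id>[0-9a-f]{40})"
)


def is_core_swhid(text: str) -> bool:
    """Check the syntax of a core SWHID without building any object."""
    return CORE_SWHID_RE.fullmatch(text) is not None


valid = is_core_swhid("swh:1:cnt:" + "a" * 40)
bad_type = is_core_swhid("swh:1:foo:" + "a" * 40)
bad_hash = is_core_swhid("swh:1:cnt:" + "A" * 40)
```

Client code that only needs a syntax check can use such a pattern directly and leave semantic checks to the class validators.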

Nov 16, 2020

Drop backwards-compatibility support for RawExtrinsicMetadata.id · a3b6a644

Nicolas Dandrimont authored 4 years ago

All reverse dependencies have been updated to stop using it, so it can now
be removed, paving the way to recycle it into an intrinsic identifier.

a3b6a644

Oct 26, 2020

Rename the RawExtrinsicMetadata id field to target · 9da17a51

Nicolas Dandrimont authored 4 years ago

This backwards-compatible change prepares the transition to give
RawExtrinsicMetadata an `id` field that is computed intrinsically from its
contents (using the HashableObject mixin).

9da17a51

Oct 08, 2020

Add a 'unique_key' method on model objects · a251df2e

vlorentz authored 4 years ago

that returns a value suitable for uniqueness constraints.

Motivation:

* this is somewhat more of a model concern than a journal/kafka
  concern IMO
* this is one step toward adding support for non-model objects in
  KafkaJournalWriter

Implementation of the unique_key methods comes from
`swh.journal.serializers.object_key`.

a251df2e

Sep 17, 2020
- python: Reorder imports with isort · a2737185
  Antoine Lambert authored 4 years ago
  
  Related to T2610
  a2737185
Aug 14, 2020

model: Raise error on naive datetimes. · 6dd6acec

vlorentz authored 4 years ago

We may unknowingly pass naive datetimes to the storage through them,
causing the underlying DB to assign them a timezone that might not match
the actual one.

It already happens in swh.model and swh.loader.package tests.

6dd6acec
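
The check introduced by 6dd6acec could look roughly like the sketch below. The function name `check_aware` is hypothetical; the test for awareness follows the definition in the Python `datetime` documentation.

```python
from datetime import datetime, timezone


def check_aware(dt: datetime) -> datetime:
    """Return dt unchanged, or raise if it carries no usable timezone."""
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        raise ValueError("naive datetimes are not accepted: %r" % (dt,))
    return dt


aware = check_aware(datetime(2020, 8, 14, 10, 0, tzinfo=timezone.utc))

try:
    check_aware(datetime(2020, 8, 14, 10, 0))  # no tzinfo
    rejected = False
except ValueError:
    rejected = True
```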

Jul 29, 2020
- Fix incorrectly typed null constants in extra_headers byte strings · b58d901b
  David Douard authored 4 years ago
  
  b58d901b
Jul 07, 2020
- Implement to_dict and from_dict for metadata-related classes. · 9e475a70
  vlorentz authored 4 years ago
  
  9e475a70
- Add raw metadata to the model. · 78fc5f7c
  vlorentz authored 4 years ago
  
  This will allow swh-storage to have a signature for *_metadata_add that is consistent with other *_add endpoints.
  78fc5f7c
Jul 06, 2020

Extract the extra_headers from metadata on the Revision model class · a7d9aca2

David Douard authored 4 years ago

Add a new extra_headers attribute on Revision and use it for computing
the revision's id instead of extracting it from the metadata field.

Only accept (bytes, bytes) as extra_header.

Add a post init hook to Revision to initialize this new attribute from
given metadata, if any, for bw compat.

Also amend the revision_d hypothesis strategy to generate extra_headers.

a7d9aca2

Jun 24, 2020

Tag model entities with their "object_type" · e632abed

David Douard authored 4 years ago

this aims at preventing constant usage of isinstance() based dispatch
code when writing generic code handling model entities.

For example, the "object_type" argument of JournalWriter.write_addition() has
become superfluous now that we only pass model entities, etc.

This idea comes from olasd's reading of the mypy doc:

  https://mypy.readthedocs.io/en/latest/literal_types.html#tagged-unions

This comes with a refactoring of from_disk.DiskBackedContent to make
it *not* inherit from model.Content: object_type being Final, it cannot
be overridden.

e632abed
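
The tagging idea from e632abed can be sketched with plain classes. The two classes and the `count_by_type` helper are hypothetical simplifications; the point is only that generic code can read `object_type` instead of chaining `isinstance()` checks.

```python
from typing import Dict, Final, Iterable, Union


class Origin:
    object_type: Final = "origin"

    def __init__(self, url: str) -> None:
        self.url = url


class Snapshot:
    object_type: Final = "snapshot"

    def __init__(self, id: bytes) -> None:
        self.id = id


def count_by_type(objects: Iterable[Union[Origin, Snapshot]]) -> Dict[str, int]:
    """Group objects by their tag, with no isinstance() dispatch."""
    counts: Dict[str, int] = {}
    for obj in objects:
        counts[obj.object_type] = counts.get(obj.object_type, 0) + 1
    return counts


counts = count_by_type(
    [Origin("https://example.org"), Snapshot(b"\x00" * 20), Origin("https://example.com")]
)
```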

May 20, 2020

Add support for model object anonymization · 29312dff

David Douard authored 4 years ago

Simply add a BaseModel.anonymize() method. Default implementation returns
None, meaning the object is not anonymizable.

For Person, the method returns a Person with hashed fullname (and unset
name and email).

For Revision and Release, the method returns an anonymized version of
the object, i.e. with instances of Person replaced by anonymized ones.

29312dff
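
A minimal sketch of the Person case from 29312dff. The dataclass below is a hypothetical simplification, and the choice of SHA-256 is an assumption, since the commit message only says the fullname is hashed.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "Person":
        """Return a copy with a hashed fullname and no name or email."""
        return Person(
            fullname=hashlib.sha256(self.fullname).digest(),  # assumed hash
            name=None,
            email=None,
        )


person = Person(b"Jane Doe <jane@example.org>", b"Jane Doe", b"jane@example.org")
anonymous = person.anonymize()
```

Because the hash is deterministic, two occurrences of the same person still anonymize to equal objects.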

Apr 10, 2020

model: Rename OriginVisitUpdate to OriginVisitStatus · 401bc17d

Antoine R. Dumont authored 4 years ago

This also adapts the hypothesis strategies, using the plural form
origin_visit_statuses. That plural form is acceptable because in our context,
the statuses are countable.

Related to T2310

Verified

401bc17d

Apr 08, 2020

Enable black · bf3f1cec

David Douard authored 4 years ago

- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.

bf3f1cec

Apr 01, 2020
- model: add support for ctime in [Skipped]Content.from_[data,dict]() · ca0f6a1e
  David Douard authored 5 years ago
  
  With support for str representation of date. Mostly for testing purposes.
  ca0f6a1e
- model: fix SkippedContent origin to be a str · 6ce0f714
  David Douard authored 5 years ago
  
  instead of a reference to an Origin entity.
  6ce0f714
- model: improve a bit the TimestampWithTimezone model · 10b06992
  David Douard authored 5 years ago
  
  - add a validator for negative_utc (can be True iff offset is 0),
  - update the timestamps_with_timezone hypothesis strategy,
  - add low-level tests for it.
  10b06992
- tests: add low level tests for the Timestamp model entity · ac9d4c84
  David Douard authored 5 years ago
  
  ac9d4c84
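
The negative_utc rule from 10b06992 above can be sketched as a constructor-time check. The class `TzStamp` is a hypothetical, stripped-down stand-in for TimestampWithTimezone, and it uses a stdlib dataclass rather than the project's own validators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TzStamp:
    seconds: int
    offset: int  # UTC offset in minutes
    negative_utc: bool = False

    def __post_init__(self) -> None:
        # negative_utc marks a "-0000" offset, so it only makes
        # sense when the numeric offset is zero.
        if self.negative_utc and self.offset != 0:
            raise ValueError("negative_utc can only be True when offset is 0")


ok = TzStamp(seconds=1585699200, offset=0, negative_utc=True)

try:
    TzStamp(seconds=1585699200, offset=60, negative_utc=True)
    rejected = False
except ValueError:
    rejected = True
```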
Mar 31, 2020
- model: Add new OriginVisitUpdate model object + test strategy · e9a4c751
  Antoine R. Dumont authored 5 years ago
  
  (pairing with @vlorentz) Related to T2310
  v0.0.63 Verified
  
  e9a4c751
Mar 11, 2020

test/model: do not test direct instantiation of model objects · 56ae59c5

David Douard authored 5 years ago

this does not work in the general case since there is no (recursive)
conversion of objects used for model object initialization.

We can only check when using the from_dict() factory.

56ae59c5

tests/models: use d.copy() instead of dict(d) · c7469603
David Douard authored 5 years ago
for better clarity on the code author's intention.
c7469603

Mar 04, 2020
- Add classmethod Person.from_address, to parse from 'name <email>' strings. · a5a9f57c
  vlorentz authored 5 years ago
  
  This will allow deduplicating code across loaders.
  v0.0.60
  
  a5a9f57c
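
Parsing a 'name <email>' string, as described in a5a9f57c, might look like the sketch below. The function `parse_address` is hypothetical and works on `str` for brevity; it is not the implementation of `Person.from_address`.

```python
from typing import Optional, Tuple


def parse_address(address: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'name <email>' into (name, email); missing parts become None."""
    name, sep, rest = address.partition("<")
    if not sep:
        # No '<' at all: treat the whole string as a name.
        return (address.strip() or None, None)
    email = rest.split(">", 1)[0].strip()
    return (name.strip() or None, email or None)


full = parse_address("Jane Doe <jane@example.org>")
name_only = parse_address("Jane Doe")
email_only = parse_address("<jane@example.org>")
```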
Mar 02, 2020

Add a method to generate Content/SkippedContent from binary data · ded150d6

Nicolas Dandrimont authored 5 years ago

This lets us generate Content objects directly from a bytestring, with the
proper set of hashes auto-generated from the contents.

ded150d6
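
A sketch of deriving hashes from a bytestring, as in ded150d6. The exact set of hashes is an assumption made for illustration, and `hashes_from_data` is a hypothetical helper; the git-style hash uses the standard `blob <length>\0` header.

```python
import hashlib
from typing import Dict


def hashes_from_data(data: bytes) -> Dict[str, bytes]:
    """Compute several digests of the same content in one place."""
    git_header = b"blob %d\0" % len(data)  # header git prepends to blobs
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
    }


empty = hashes_from_data(b"")
```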

Feb 27, 2020
- Add from_datetime and from_iso8601 constructors for TimestampWithTimezone. · 750d1471
  vlorentz authored 5 years ago
  
  Will be used by loaders.
  750d1471
Feb 24, 2020

Add to_model() method to from_disk.{Content,Directory}, to convert to canonical model objects. · 6da524cb

vlorentz authored 5 years ago

They will be used by loaders, so they can deal only with
model objects, instead of having to do the same conversion themselves.

This removes the `data` and `save_path` arguments of `from_file` and
`from_disk`, as data loading is always deferred from now on.
To access it, users are now expected to either open the data files
themselves, or use `.to_model().with_data()`.

6da524cb

Feb 14, 2020
- Add method BaseModel.hashes(). · 2c1e02b8
  vlorentz authored 5 years ago
  
  Can be useful to deduplicate code in swh-storage.
  2c1e02b8
Jan 30, 2020
- test_model: Simplify and align model checks · 4b779e1e
  Antoine R. Dumont authored 5 years ago
  
  v0.0.53 Verified
  
  4b779e1e
- model: Update revision date types to be optional · b54adf79
  Antoine R. Dumont authored 5 years ago
  
  Related to P589
  Verified
  
  b54adf79
Nov 29, 2019

model: Add automatic object identifier computation support · 4e4c4ff2

Antoine Lambert authored 5 years ago

Add support to automatically compute identifier in the following object models:
Directory, Release, Revision, Snapshot.

If the identifier is not provided as parameter, it will be computed when the model
is initialized.

4e4c4ff2
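
The behaviour described in 4e4c4ff2 (compute the identifier at initialization when none is given) can be sketched as follows. The `Blob` class is hypothetical, and a plain SHA-1 of the payload stands in for the real identifier computation, which is not shown here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Blob:
    payload: bytes
    id: Optional[bytes] = None

    def __post_init__(self) -> None:
        if self.id is None:
            # Frozen dataclasses block normal assignment, so set the
            # computed identifier through object.__setattr__.
            computed = hashlib.sha1(self.payload).digest()  # stand-in hash
            object.__setattr__(self, "id", computed)


auto = Blob(b"hello")
explicit = Blob(b"hello", id=b"given")
```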

Oct 30, 2019
- Make OriginVisit.origin a string instead of a dict. · 0b9c5be2
  vlorentz authored 5 years ago
  
  v0.0.51
  
  0b9c5be2
Oct 29, 2019

model: make model entities frozen · 75645964

David Douard authored 5 years ago

we do not really need them to be mutable, and freezing them makes their
instances hashable, so we can add them to a set(), for example.

75645964
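
The benefit named in 75645964 can be shown with a small sketch. It uses a stdlib frozen dataclass as a stand-in for the project's attrs-based classes, and the `Origin` class here is a hypothetical simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    url: str


# Frozen instances are hashable, so equal objects collapse in a set.
seen = {Origin("https://example.org"), Origin("https://example.org")}

try:
    Origin("https://example.org").url = "https://example.com"
    mutable = True
except AttributeError:  # FrozenInstanceError is a subclass of AttributeError
    mutable = False
```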