Commits · 34f61010062469418db8d115c2d993bde1377cb2 · Renaud Boyer / swh-model

Jun 11, 2024

from_disk: Do not recurse in ignored directories · 34f61010

vlorentz authored 9 months ago

Using os.walk() does not make much sense when we want to control which
directories to recurse into.

Additionally, this calls os.scandir directly, which lets us sort symlinks
and files apart from directories (whereas os.walk groups symlinks with
directories) without two extra system calls.
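
A minimal sketch of a scandir-based walk that prunes ignored directories before descending into them. `walk_filtered` and its `ignore_dir` callback are hypothetical names for illustration, not the swh-model API. `DirEntry.is_dir(follow_symlinks=False)` can usually answer from the directory listing itself, without an extra `stat` call per entry.

```python
import os
from typing import Callable, Iterator, List, Tuple


def walk_filtered(
    root: str, ignore_dir: Callable[[os.DirEntry], bool]
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (path, dir_names, other_names) top-down, like os.walk(), but
    never descend into directories for which ignore_dir() returns True."""
    dirs: List[os.DirEntry] = []
    others: List[str] = []
    with os.scandir(root) as it:
        for entry in it:
            # follow_symlinks=False: a symlink (even one pointing to a
            # directory) lands in `others`, whereas os.walk() lists symlinks
            # to directories among dirnames.
            if entry.is_dir(follow_symlinks=False):
                if not ignore_dir(entry):
                    dirs.append(entry)
            else:
                others.append(entry.name)
    dirs.sort(key=lambda e: e.name)
    yield root, [e.name for e in dirs], sorted(others)
    for entry in dirs:
        # Ignored directories were never added to `dirs`, so they are
        # never opened at all.
        yield from walk_filtered(entry.path, ignore_dir)
```

The scandir handle is closed before recursing, so deep trees do not accumulate open directory descriptors.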

Add a mk_tree() helper function for tests · 8689b0c0

David Douard authored 9 months ago

This creates a tree structure from an easy-to-read textual
representation of that structure.
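
The commit does not show the textual format `mk_tree()` accepts, so the following is only a sketch under an assumed format: two spaces per nesting level, a trailing `/` marks a directory, and any other line is an empty file. The signature `mk_tree(root, spec)` is hypothetical.

```python
import os
import textwrap


def mk_tree(root: str, spec: str) -> None:
    """Create directories and empty files under `root` from an indented spec.

    Assumed format (hypothetical): two spaces per nesting level, a trailing
    "/" marks a directory, any other line is an empty file.
    """
    parents = [root]  # parents[d] is the directory holding entries at depth d
    for line in textwrap.dedent(spec).splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = (len(line) - len(line.lstrip(" "))) // 2
        name = line.strip()
        del parents[depth + 1 :]  # climb back up to this entry's parent
        path = os.path.join(parents[depth], name.rstrip("/"))
        if name.endswith("/"):
            os.makedirs(path, exist_ok=True)
            parents.append(path)
        else:
            open(path, "w").close()
```

A malformed spec that jumps more than one level deeper raises `IndexError`, which is arguably acceptable for a test helper.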

Rework from_disk.Directory.from_disk() implementation · f0f21b4d

David Douard authored 9 months ago

It used to traverse the whole directory and filter elements of the tree
only afterwards; this version should be a bit smarter and avoid going
too deep into directories that should be ignored.

We cannot just use the subtree eviction mechanism of
'os.walk(topdown=True)' because the filtering callback needs some
context about the subdirectory content (typically to be able to evict
empty directories).

This version of the code is a bit more complex but should do the trick.
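
Why `os.walk(topdown=True)` pruning is not enough can be sketched with two hypothetical callbacks: `skip_dir` decides from the name alone, before descending, while `keep_dir` needs the already-built subtree (for example to evict empty directories) and can only run after recursion. This is an illustration, not the actual `Directory.from_disk()` code.

```python
import os
from typing import Callable, Dict, Optional

# A tree maps entry names to None (non-directory) or to a nested tree.
Tree = Dict[str, Optional["Tree"]]


def build_tree(
    path: str,
    skip_dir: Callable[[str], bool],
    keep_dir: Callable[[str, Tree], bool],
) -> Tree:
    tree: Tree = {}
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            if skip_dir(entry.name):
                continue  # decided from the name alone: never descend
            subtree = build_tree(entry.path, skip_dir, keep_dir)
            # This decision needs the subdirectory content, which only
            # exists after recursing; top-down pruning must decide earlier.
            if keep_dir(entry.name, subtree):
                tree[entry.name] = subtree
        else:
            tree[entry.name] = None
    return tree
```

With `keep_dir=lambda name, sub: bool(sub)`, a directory whose only child is an empty directory is evicted as well, because the inner one is dropped first.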

Jun 10, 2024
- Remove (deprecated) dir_filter arg from Directory.from_disk() · a3470344
  David Douard authored 9 months ago
May 30, 2024
- Use functools.partial instead of manually currifying · 7816ec91
  vlorentz authored 9 months ago
- deprecation: also fix some deprecation in test_identifiers · c1797a2b
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
  
  git_objects.snapshot_git_object expects a Snapshot argument, so let's move things to Snapshot objects.
- enum-cleanup: also migrate "object_dicts" to use ModelObjectType · 58a7f781
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
- enum-cleanup: use ModelObject type for "objects" strategy in test_hypothesis_strategies · 326e6df6
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
  
  object_dicts still need to be migrated.
- enum-cleanup: use ModelObjectType in `test_ensure_visit_status_snapshot_consistency` · 780b98bb
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
- enum-cleanup: use ModelObjectType in `test_ensure_visit_status_date_consistency` · 05b8e7e0
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
- enum-cleanup: use ModelObjectType in `test_swh_model_data` · e3f7a03b
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
- enum-cleanup: move TEST_OBJECTS keys to ModelObjectType, · 20d4ee14
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
  
  And automatically generate it while we are at it.
- enum-cleanup: stop using string value in `test_anonymization` · 97e47f07
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
- enum-cleanup: stop using string value in `test_todict_inverse_fromdict` · de319040
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
  
  We also take this as an opportunity to use the blacklist_types feature in that test.
- enum-cleanup: use ModelObjectType in strategies.objects · 6d8ddab9
  Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago
  
  We also typed the function, which turned out to be more painful than anticipated.
May 29, 2024

enum-cleanup: use ReleaseTargetType instead of ObjectType · 861fd30c
Pierre-Yves David authored 9 months ago and Pierre-Yves David committed 9 months ago

test: properly scope file write in a test · 19732c8d

Pierre-Yves David authored 9 months ago and vlorentz committed 9 months ago

This test works with CPython because, as we do not assign the open file
to a variable, reference counting garbage-collects it right after the
`write` call. Deleting the file object means the write is flushed.

On an interpreter without reference counting, such as PyPy, the file
object might be garbage-collected too late, breaking the test.

So we give the file object a clear life cycle.
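
A minimal illustration of the fix described above, using a hypothetical helper: a `with` block closes the file, and therefore flushes its buffer, at a well-defined point on any interpreter.

```python
def write_and_read_back(path: str, data: str) -> str:
    # Fragile form: relies on CPython's reference counting to close (and
    # flush) the file immediately after the call:
    #     open(path, "w").write(data)
    # The with-block closes the file when the block exits, whatever
    # garbage collector the interpreter uses.
    with open(path, "w") as f:
        f.write(data)
    with open(path) as f:
        return f.read()
```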

May 28, 2024

enum-deprecation: stop wrongly using deprecated · 34113bc0

Pierre-Yves David authored 9 months ago and vlorentz committed 9 months ago

As in the previous commit, applying `deprecated` actually changes the
source class, causing every use of it to raise a deprecation warning. So
we have to remove the `deprecated` call and silently keep compatibility
for a short while.

DiskBackedContent: use a different approach for deprecation · c8ef4083

Pierre-Yves David authored 9 months ago and vlorentz committed 9 months ago

It turns out that calling @deprecated on Content alters the class, so any
instantiation of "Content" would be wrongly marked as deprecated… So use
a different approach to preserve compatibility (and we won't keep it
long).
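
The pitfall can be reproduced with a toy class decorator that, like the behavior described above, patches the class in place and returns the same object; this is not the exact mechanism of the `deprecated` package. The second half shows one possible alternative, a module-level `__getattr__` (PEP 562) that warns only when the old name is looked up. The commit does not say which approach swh-model adopted; `mutating_deprecated`, `make_content_class` and the `compat` module are hypothetical.

```python
import types
import warnings


def mutating_deprecated(cls):
    """Toy class decorator that patches `cls` in place and returns it."""
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        warnings.warn(f"{cls.__name__} is deprecated", DeprecationWarning, stacklevel=2)
        original_init(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls  # the *same* class object, so every alias is affected


def make_content_class():
    class Content:
        def __init__(self, data: bytes = b""):
            self.data = data

    return Content


def warns_on_init(cls) -> bool:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        cls()
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


# Pitfall: deprecating an alias also deprecates the class it points to.
BrokenContent = make_content_class()
BrokenDiskBackedContent = mutating_deprecated(BrokenContent)

# Alternative: leave the class untouched and warn only on lookup of the
# old name, through a module-level __getattr__ (PEP 562).
Content = make_content_class()
compat = types.ModuleType("compat")  # stands in for a real module
compat.Content = Content


def _compat_getattr(name: str):
    if name == "DiskBackedContent":
        warnings.warn("DiskBackedContent is deprecated, use Content", DeprecationWarning, stacklevel=2)
        return Content
    raise AttributeError(f"module 'compat' has no attribute {name!r}")


compat.__getattr__ = _compat_getattr
```

Here `BrokenContent()` warns even though only the alias was meant to be deprecated, while `Content()` stays silent and `compat.DiskBackedContent` still resolves to `Content`.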

May 23, 2024
- model: deprecate comparing object_type's enums with string · 2c63e4cf
  Pierre-Yves David authored 10 months ago
  
  It does work, but it is now discouraged.
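
One way to keep string comparison working while discouraging it is a `str`-valued enum whose `__eq__` warns on plain strings. The class name is borrowed from the commits above, but its members and this mechanism are illustrative assumptions, not swh-model's implementation.

```python
import enum
import warnings


class ModelObjectType(str, enum.Enum):
    """Illustrative str-valued enum: comparing a member with a plain string
    still works, but emits a DeprecationWarning."""

    CONTENT = "content"
    DIRECTORY = "directory"

    def __eq__(self, other):
        if isinstance(other, str) and not isinstance(other, enum.Enum):
            warnings.warn(
                f"comparing {type(self).__name__} with a string is deprecated",
                DeprecationWarning,
                stacklevel=2,
            )
            return self.value == other
        return super().__eq__(other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ would otherwise set __hash__ to None.
    __hash__ = enum.Enum.__hash__
```

Because members subclass `str`, Python also calls the member's `__eq__` for `"content" == ModelObjectType.CONTENT`, so both operand orders warn.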
May 17, 2024
- model: Fix black formatting · ca6b3e0e
  Antoine Lambert authored 10 months ago
May 16, 2024
- model: rename TargetType to SnapshotTargetType · 0833fa75
  Pierre-Yves David authored 10 months ago and vlorentz committed 10 months ago
  
  TargetType is far too generic and introduces confusion with ReleaseTargetType. So we rename it to a clearer name.
  v6.13.0
- model: rename ObjectType to ReleaseTargetType · 72a0f1a7
  Pierre-Yves David authored 10 months ago and vlorentz committed 10 months ago
  
  ObjectType is far too generic and introduces confusion with swhids.ObjectType. So we rename it to a clearer name.
- model: Improve typing syntax with __future__.annotations · 80ff8e06
  Antoine Lambert authored 10 months ago
- model: Fix regression in Content.to_dict · 5e9cc05b
  Antoine Lambert authored 10 months ago
  
  A Content object with no attached data or get_data function could no longer be converted to a dict, as a MissingData exception was raised.
May 15, 2024

QualifiedSWHID: Fix (de)serialization of 'origin' qualifier · 9cf7ad9d

vlorentz authored 10 months ago and Antoine Lambert committed 10 months ago

Having the escaped URL in `swhid.origin` is inconsistent with self.path
(which is always unescaped) and never what we want, because it is only
useful while serializing, which is already handled by `__str__`.

This led to swh-indexer#4738, where swh-deposit parsed a qualified
SWHID, then used `.origin` to get an origin URL.

Additionally, as serialization always escapes the `origin` qualifier,
this means that deserializing then re-serializing a qualified SWHID
would double-escape it.

Finally, fixing this made the tests reveal that `%` was not escaped
while serializing, whereas `;` was, leading to incorrect (and ambiguous)
escaped URLs.
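
The ambiguity can be sketched with hypothetical helpers restricted to the two characters named above; the real qualifier escaping in swh-model may cover more characters. Escaping `%` first is what makes the encoding reversible, while the buggy variant maps two distinct URLs to the same string.

```python
from urllib.parse import unquote


def escape_origin(url: str) -> str:
    # Escape "%" first, so the "%" introduced by escaping ";" is not
    # escaped a second time.
    return url.replace("%", "%25").replace(";", "%3B")


def unescape_origin(value: str) -> str:
    # unquote() decodes every %XX sequence; since escape_origin() escaped
    # every literal "%", this exactly inverts it.
    return unquote(value)


def buggy_escape_origin(url: str) -> str:
    # Escapes ";" but not "%": "a;b" and a literal "a%3Bb" collide.
    return url.replace(";", "%3B")
```

Keeping the unescaped URL in the object and escaping only in `__str__` also avoids double escaping: calling `escape_origin()` on an already-escaped value turns `%3B` into `%253B`.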

DiskBackedContent: add a small temporary compatibility layer · f1f62388

Pierre-Yves David authored 10 months ago

There are two other packages using DiskBackedContent: "swh-loader-svn"
and "swh-loader-cvs". Both use it to check "DiskBackedContent.object_type"
at the same time as "model.Content.object_type".

So we add this small hack to avoid breaking these other modules until
they migrate.

from_disk: introduce a ModelObjectType enum · 8b29444a

Pierre-Yves David authored 10 months ago

This puts the pieces in place to finally clean up the confusion between
the various object_type attributes. They now have different types, so we
should be able to start detecting errors at some point.

As with FromDiskType, we keep compatibility with string values for now.
This avoids breaking existing code.

DiskBackedContent: remove the class in favor of a simpler composition approach · d65a844a

Pierre-Yves David authored 10 months ago

Instead of having multiple classes and `object_type` values, we just add
a few lines to the main `model.Content` class to retrieve data on
demand. The `with_data` logic already existed there anyway.

This avoids having from_disk extend the model from the outside.
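
The composition approach can be sketched as a single class with an optional data callback. The names `get_data`, `with_data` and `MissingData` appear in these commit messages, but the simplified fields and signatures below are assumptions; the real `model.Content` has more attributes and is not a dataclass.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


class MissingData(Exception):
    """Raised when content data is neither present nor retrievable."""


@dataclass(frozen=True)
class Content:
    # Hypothetical, simplified model for illustration only.
    length: int
    data: Optional[bytes] = None
    get_data: Optional[Callable[[], bytes]] = None

    def with_data(self) -> "Content":
        """Return a copy with `data` loaded, calling get_data() if needed."""
        if self.data is not None:
            return self
        if self.get_data is None:
            raise MissingData("no data and no get_data callback")
        return replace(self, data=self.get_data())
```

A loader reading from disk would pass something like `get_data=lambda: open(path, "rb").read()` (with proper file handling), so no separate disk-backed subclass is needed.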

May 14, 2024
- from_disk: introduce a FromDiskType enum · 02e79499
  Pierre-Yves David authored 1 year ago
  
  This is part of a wider effort to differentiate the various types of "object_type" attributes across the model code base.
- model: adds type annotation for iter_directory · f47cc1b7
  Pierre-Yves David authored 10 months ago
  
  This required cleaning up various items along the way, which is probably an added benefit. In particular, mypy now considers BaseModel to hold an object_type attribute.
- test: fix bogus "tzfile" representation on the fly in test_repr · 39296ace
  Pierre-Yves David authored 10 months ago
  
  See the inline comment for details.
Apr 24, 2024

Add size limit to origin URLs · 906e5093

vlorentz authored 10 months ago

Currently the only limit is "enforced" by PostgreSQL.

This makes sure that origins created after we switch to Cassandra as the
primary storage remain compatible with a PostgreSQL-based storage.

Mar 29, 2024
- Apply swh-py-template v0.2.0 · 7ae2a4f3
  David Douard authored 11 months ago
Mar 26, 2024
- Add 'evolve' method to BaseModel objects · dcb7a1ed
  vlorentz authored 11 months ago
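
swh-model's model classes are built with attrs, which provides `attr.evolve()` for copying a frozen instance with some fields changed. The sketch below uses the stdlib equivalent, `dataclasses.replace`, to stay self-contained; the `Origin` class here is a simplified stand-in, and the actual `evolve` method may differ.

```python
from dataclasses import dataclass, replace
from typing import TypeVar

ModelT = TypeVar("ModelT", bound="BaseModel")


@dataclass(frozen=True)
class BaseModel:
    def evolve(self: ModelT, **changes) -> ModelT:
        """Return a copy of this immutable object with `changes` applied."""
        return replace(self, **changes)


@dataclass(frozen=True)
class Origin(BaseModel):
    url: str
```

The original object is left untouched, which matters for immutable model objects that may be shared or hashed.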
Feb 29, 2024

from_disk: Add optional progress callback · cebe917a

Franck Bret authored 1 year ago

Add an optional progress callback to the `from_disk` method. It can
report the number of computed entries for each top-level entry
traversed. This is useful for the CLI, in particular to display progress
information in the SWH Scanner.
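
The callback pattern can be sketched with a hypothetical `count_entries` helper that reports, after each top-level entry, how many entries it covered. This is not the `from_disk` signature, only an illustration of the idea.

```python
import os
from typing import Callable, Optional


def count_entries(
    path: str, progress_callback: Optional[Callable[[int], None]] = None
) -> int:
    """Count all entries below `path`, reporting progress after each
    top-level entry (hypothetical helper, not the from_disk signature)."""
    total = 0
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        count = 1
        if entry.is_dir(follow_symlinks=False):
            # Nested levels are counted without reporting, so the callback
            # fires once per top-level entry.
            count += count_entries(entry.path)
        total += count
        if progress_callback is not None:
            progress_callback(count)
    return total
```

A CLI could accumulate the reported counts to drive a progress bar.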

Feb 20, 2024
- model: Add payload to ExtID class · f396177e
  Timothy Sample authored 2 years ago and vlorentz committed 1 year ago
  
  v6.12.0
Feb 05, 2024
- tox: Bump mypy to 1.8.0 · 68c4cc7d
  Antoine Lambert authored 1 year ago
  
  Related to swh/meta#5075.
Jan 09, 2024

discovery: support optional callback for information · e54151a4

Pierre-Yves David authored 1 year ago

Right now, the discovery process offered by `filter_known_objects`
returns all results after the discovery is complete. The new callback
provides a way to get information "in real time", which is useful for at
least a couple of planned use cases in the SWH scanner:
- displaying progress information while processing
- updating a graphical UI in real time.

This simple callback fits this need without too much trouble.

For some reason, mypy complained about the existing type hints in this
file, so I fixed them.
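
A simplified sketch of the callback idea, assuming a hypothetical `is_known_batch` function that asks the archive about one batch of identifiers at a time. `filter_unknown` and `on_result` are illustrative names; the real `filter_known_objects` discovery strategy is not reproduced here.

```python
from typing import Callable, Iterable, List, Optional, Set


def filter_unknown(
    objects: Iterable[str],
    is_known_batch: Callable[[List[str]], Set[str]],
    batch_size: int = 2,
    on_result: Optional[Callable[[str, bool], None]] = None,
) -> List[str]:
    """Return the objects the archive does not know, querying in batches.

    If given, on_result(obj, known) is called as soon as each batch is
    answered, instead of waiting for the whole discovery to finish.
    """
    pending = list(objects)
    unknown: List[str] = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start : start + batch_size]
        known = is_known_batch(batch)
        for obj in batch:
            is_known = obj in known
            if on_result is not None:
                on_result(obj, is_known)
            if not is_known:
                unknown.append(obj)
    return unknown
```

The return value is unchanged by the callback, so existing callers that only want the final result are unaffected.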

policy: drop async usage that is now unnecessary · 1a59d42f

Pierre-Yves David authored 1 year ago

The web client no longer uses async, so we no longer need it.