- Jun 11, 2024
-
-
vlorentz authored
Using os.walk() does not make much sense when we want to control what directories to recurse into. Additionally, this uses os.scandir directly, which allows us to directly sort symlinks and files apart from directories (while os.walk groups symlinks with directories) without two extra system calls.
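A minimal sketch of the idea, not the actual code from this commit: `split_entries` is a hypothetical helper name. It relies on `os.scandir` yielding `DirEntry` objects, whose `is_dir(follow_symlinks=False)` can usually answer from information already returned by the directory listing, so symlinks are separated from real directories without an extra `stat` call per entry.

```python
import os


def split_entries(path):
    """Return (directories, others) for `path`, each sorted by name.

    Hypothetical helper for illustration only. Symlinks, including
    symlinks pointing to directories, end up in `others`, so a caller
    that recurses into `directories` never follows them.
    """
    dirs, others = [], []
    with os.scandir(path) as it:
        for entry in it:
            # follow_symlinks=False: a symlink to a directory is not a directory
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                others.append(entry.name)
    return sorted(dirs), sorted(others)
```

The caller decides which of the returned directories to recurse into, which is the control that `os.walk` does not offer as directly.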
-
David Douard authored
This creates a tree structure from an easy-to-read textual representation of said structure.
-
David Douard authored
It used to traverse the whole directory tree and only filter its elements afterwards; this version should be a bit smarter and avoid descending into directories that should be ignored. We cannot just use the subtree eviction mechanism of 'os.walk(topdown=True)' because the filtering callback needs some context about the subdirectory content (typically to be able to evict empty directories). This version of the code is a bit more complex but should do the trick.
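The following sketch illustrates the two-stage filtering described here, under stated assumptions: `walk_filtered`, `ignore_name` and `keep_dir` are hypothetical names, and the nested-dict result stands in for the real tree objects. Names are checked before descending, while the content-aware check runs once the children of a directory are known.

```python
import os


def walk_filtered(path, ignore_name, keep_dir):
    """Build a nested dict for `path`, pruning as early as possible.

    Hypothetical illustration, not the project's implementation.
    - ignore_name(name) -> bool is checked BEFORE descending, so ignored
      directories are never traversed.
    - keep_dir(name, children) -> bool is checked AFTER the children are
      known, which is what allows evicting empty directories.
    """
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)
    tree = {}
    for entry in entries:
        if ignore_name(entry.name):
            continue  # pruned without looking inside
        if entry.is_dir(follow_symlinks=False):
            children = walk_filtered(entry.path, ignore_name, keep_dir)
            if keep_dir(entry.name, children):
                tree[entry.name] = children
        else:
            tree[entry.name] = None  # files and symlinks are leaves
    return tree
```

For example, `walk_filtered(root, lambda n: n == ".git", lambda n, c: bool(c))` skips `.git` entirely and drops directories that end up empty.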
-
- Jun 10, 2024
-
-
David Douard authored
-
- May 30, 2024
-
-
vlorentz authored
-
git_objects.snapshot_git_object expects a Snapshot argument, so let's move things to the Snapshot object.
-
-
object_dicts still need to be migrated.
-
-
-
-
And automatically generate it while we are at it.
-
-
We also take this as an opportunity to use the blacklist_types feature in that test.
-
We also typed the function, which turned out more painful than anticipated.
-
- May 29, 2024
-
-
-
This test works with CPython because, as we do not assign the open file to a variable, reference counting garbage collects it right after the `write` call. Deleting the file object means the write is flushed. On an interpreter without reference counting, such as PyPy, the file object might be garbage collected too late, breaking the test. So we give the file object a clear life cycle.
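A minimal sketch of what "a clear life cycle" can look like; `write_then_read` is a hypothetical helper, not the test from this commit. The `with` block closes the file, and therefore flushes it, at a well-defined point on any interpreter, whereas `open(path, "wb").write(data)` only flushes when the file object happens to be collected.

```python
def write_then_read(path, data):
    """Write `data` to `path`, then read it back.

    Hypothetical helper for illustration. Each `with` block closes its
    file on exit, so the write is flushed before the read starts,
    regardless of the garbage collection strategy in use.
    """
    with open(path, "wb") as f:
        f.write(data)
    with open(path, "rb") as f:
        return f.read()
```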
-
- May 28, 2024
-
-
As for the previous commit, using @deprecated actually changes the source class, leading all usages to raise a deprecation warning. So we have to remove the @deprecated and silently keep the compatibility for a short while.
-
It turns out that calling @deprecated on Content alters the class, and any instantiation of "Content" will be wrongly marked as deprecated… So use a different approach to preserve compatibility (and we won't keep it long).
-
- May 23, 2024
-
-
Pierre-Yves David authored
It does work, but it is now discouraged.
-
- May 17, 2024
-
-
Antoine Lambert authored
-
- May 16, 2024
-
-
TargetType is far too generic and introduces confusion with ReleaseTargetType. So we rename it to a clearer name.
-
ObjectType is far too generic and introduces confusion with swhids.ObjectType. So we rename it to a clearer name.
-
Antoine Lambert authored
-
Antoine Lambert authored
A Content object with no attached data or get_data function could no longer be converted to a dict because a MissingData exception was raised.
-
- May 15, 2024
-
-
Having the escaped URL in `swhid.origin` is inconsistent with self.path (which is always escaped) and never what we want, because it is only useful while serializing, which is already handled by `__str__`. This led to swh-indexer#4738 where swh-deposit parsed a qualified SWHID, then used `.origin` to get an origin URL. Additionally, as serialization always escapes the `origin` qualifier, this means that deserializing then re-serializing a qualified SWHID would double-escape it. Finally, fixing this made the test uncover that `%` was not escaped while serializing, while `;` was, leading to incorrect (and ambiguous) escaped URLs.
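To make the double-escaping problem concrete, here is a small sketch. It is not the swh-model implementation: `escape_qualifier` and `unescape_qualifier` are hypothetical names, and the sketch assumes only `%` and `;` need escaping, using uppercase hex.

```python
def escape_qualifier(value: str) -> str:
    """Escape a qualifier value for serialization (illustrative only).

    '%' is escaped first; otherwise the '%' introduced by '%3B' would
    itself be escaped again.
    """
    return value.replace("%", "%25").replace(";", "%3B")


def unescape_qualifier(value: str) -> str:
    """Inverse of escape_qualifier (assumes uppercase hex escapes)."""
    return value.replace("%3B", ";").replace("%25", "%")


url = "https://example.org/a;b%20c"
escaped = escape_qualifier(url)

# Round trip is lossless when the attribute holds the UNescaped URL.
assert unescape_qualifier(escaped) == url

# If the attribute held the escaped form, serializing it again would
# escape it a second time and produce a different string.
assert escape_qualifier(escaped) != escaped
```

If `%` were left unescaped while `;` was escaped, a URL that literally contains `%3B` and one that contains `;` would serialize to the same string, which is the ambiguity mentioned above.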
-
Pierre-Yves David authored
There are two other packages using DiskBackedContent: "swh-loader-svn" and "swh-loader-cvs". Both use it to check "DiskBackedContent.object_type" at the same time as "model.Content.object_type", so we do this small hack to avoid breaking these other modules until they migrate.
-
Pierre-Yves David authored
This sets the pieces in place to finally clean up the confusion between the various object_type attributes. They now have different types, so we should be able to start detecting errors at some point. As for FromDiskType, we keep compatibility with string values for now. This avoids breaking existing code.
-
Pierre-Yves David authored
Instead of having multiple classes and `object_type` values, we just add a few lines in the main `model.Content` class to retrieve data on demand. The `with_data` logic already existed there anyway. This will avoid having from_disk extend the model from the outside.
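A rough sketch of on-demand data retrieval in a single class. `LazyContent` is a hypothetical stand-in and its fields are assumptions; the real `model.Content` has more attributes and may implement `with_data` differently. `MissingData` is redefined here only to keep the example self-contained.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


class MissingData(Exception):
    """Raised when data is requested but cannot be retrieved."""


@dataclass(frozen=True)
class LazyContent:
    """Hypothetical stand-in for a content object with optional data."""

    length: int
    data: Optional[bytes] = None
    # Optional loader, e.g. a function reading the bytes from disk.
    get_data: Optional[Callable[[], bytes]] = None

    def with_data(self) -> "LazyContent":
        """Return a copy holding the data, fetching it if needed."""
        if self.data is not None:
            return self
        if self.get_data is None:
            raise MissingData("no data attached and no way to fetch it")
        return replace(self, data=self.get_data())
```

With this shape, a disk-backed object is just an instance whose `get_data` reads a file, so no separate subclass is needed.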
-
- May 14, 2024
-
-
Pierre-Yves David authored
This is part of a wider effort to differentiate the various types of "object_type" attributes around the model code base.
-
Pierre-Yves David authored
This requires cleaning up various items along the way, which is probably an added benefit. In particular, mypy now considers BaseModel to hold an object_type attribute.
-
Pierre-Yves David authored
Check inline comment for details.
-
- Apr 24, 2024
-
-
vlorentz authored
Currently the only limit is "enforced" by PostgreSQL. This makes sure that origins created after we switch to Cassandra as the primary storage remain compatible with a PostgreSQL-based storage.
-
- Mar 29, 2024
-
-
David Douard authored
-
- Mar 26, 2024
-
-
vlorentz authored
-
- Feb 29, 2024
-
-
Franck Bret authored
Add an optional progress callback to the `from_disk` method. It can return the number of computed entries for each top-level entry traversed. This is useful for CLIs, in particular to display progress information for SWH Scanner.
-
- Feb 20, 2024
- Feb 05, 2024
-
-
Antoine Lambert authored
Related to swh/meta#5075.
-
- Jan 09, 2024
-
-
Pierre-Yves David authored
Right now, the discovery process offered by `filter_known_objects` returns all results after the discovery is complete. The new callback provides a way to get information "in real time", which is useful for at least a couple of planned use cases in the SWH scanner: displaying progress information while processing, and updating a graphical UI in real time. This simple callback fits this need without too much trouble. For some unclear reason, mypy complained about the existing type hints in this file, so I fixed them.
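The callback pattern can be sketched as follows. This is not the signature of `filter_known_objects`: `filter_unknown`, `is_known` and `on_result` are hypothetical names chosen for the illustration.

```python
from typing import Callable, Iterable, List, Optional


def filter_unknown(
    items: Iterable[str],
    is_known: Callable[[str], bool],
    on_result: Optional[Callable[[str, bool], None]] = None,
) -> List[str]:
    """Return the items that are not known, reporting each result as it comes.

    Hypothetical illustration: `on_result(item, known)` is invoked as soon
    as the status of an item is determined, so a caller can update a
    progress display before the full result list is returned.
    """
    unknown = []
    for item in items:
        known = is_known(item)
        if on_result is not None:
            on_result(item, known)
        if not known:
            unknown.append(item)
    return unknown
```

A caller that does not need live updates simply omits `on_result` and gets the same final result.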
-
Pierre-Yves David authored
The web client no longer uses async, so we no longer need it.
-