Commits on Source (84)
-
Antoine Lambert authored
When using attr < 21.3.0, adding a field transformer breaks the attrs integration with hypothesis, because attributes transformed with such a function are not cast to the generated AttrsClass but remain a plain list of attributes. This makes hypothesis raise an AttributeError. As we use attr 21.2.0 in production and when building the Debian buster package, add a workaround for that issue as explained here: https://github.com/python-attrs/attrs/issues/821.
dd3bab81 -
David Douard authored
da5c23bd
-
David Douard authored
pre-commit from 4.1.0 to 4.3.0, codespell from 2.2.1 to 2.2.2, black from 22.3.0 to 22.10.0 and flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to GitHub (the GitLab mirror being outdated).
fe8d5558 -
Antoine Lambert authored
The from_disk.Content object created for a symlink was missing path info, so add it for consistency with the from_disk.Content object created for a regular file.
818ad826 -
Antoine Lambert authored
Two issues were preventing browsing of some SWHIDs given as examples in that documentation. First, some sphinx links were broken in rDMODe1c3fe80731226618616117dfd67a95f3d365645. Second, a SWHID with ';' in its path qualifier was correctly percent-escaped, but when it is used as a URL argument an extra level of percent escaping is required, as the HTTP server will unescape URL arguments and thus break the SWHID percent escaping. Closes T4721
f883e224 -
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in the toc when building the full swh documentation, include module indices only when building standalone package documentation. Also include them the proper sphinx way. Related to T4496
-
Antoine Lambert authored
This fixes Python 3.7 support, which broke because poetry, a dependency of isort, removed support for that Python version in a recent release.
-
Antoine Lambert authored
Previously, when looking up data by key in an ImmutableDict, the inner tuple storing keys and values was iterated until the requested key was found. This is not efficient when the ImmutableDict contains a lot of entries, typically for an origin snapshot containing a lot of branches. So use an inner dictionary to speed up lookup-by-key operations and improve loader performance.
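The idea can be illustrated with a minimal sketch. `IndexedImmutableMapping` is a hypothetical stand-in, not the actual ImmutableDict code: it assumes the tuple of pairs is kept for ordering and hashing, while key lookups go through an inner dictionary built once at construction.

```python
from typing import Dict, Generic, Hashable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class IndexedImmutableMapping(Generic[K, V]):
    """Hypothetical sketch: tuple of pairs plus a dict index for lookups."""

    def __init__(self, items: Iterable[Tuple[K, V]] = ()) -> None:
        # The tuple preserves order and makes the object hashable.
        self._items: Tuple[Tuple[K, V], ...] = tuple(items)
        # The dict answers key lookups without scanning the tuple.
        self._index: Dict[K, V] = dict(self._items)

    def __getitem__(self, key: K) -> V:
        return self._index[key]

    def __iter__(self) -> Iterator[K]:
        return (key for key, _ in self._items)

    def __len__(self) -> int:
        return len(self._items)

    def __hash__(self) -> int:
        return hash(self._items)


# A snapshot-like mapping with many branches (made-up data).
branches = IndexedImmutableMapping(
    (f"refs/heads/branch-{i}".encode(), i) for i in range(10_000)
)
last = branches[b"refs/heads/branch-9999"]
```

The lookup of the last branch no longer depends on its position in the tuple, at the cost of holding the keys and values in two containers.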
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
Antoine Lambert authored
Better to use the latest mypy release.
-
Jérémy Bobbio (Lunar) authored
This adds several helper methods returning SWHIDs to model objects, namely: SkippedContent.swhid(), DirectoryEntry.swhid(), SnapshotBranch.swhid(), Release.target_swhid(), Revision.directory_swhid() and Release.parent_swhids(), OriginVisitStatus.origin_swhid() and OriginVisitStatus.snapshot_swhid().
-
Nicolas Dandrimont authored
When parsing the configuration, tox would complain about the unfollowed line continuation (which is what happens when the testenv was qualified with neither full nor minimal). Moving {posargs} to be unqualified allows the line continuation character to always have something behind it.
b6446c1e -
Nicolas Dandrimont authored
Instead of going back to py3, pass through the environment name, so that it can be called with an arbitrary interpreter version.
4413c4b0 -
Nicolas Dandrimont authored
This allows using the "system" tox, if it's recent enough, instead of always provisioning an internal .tox venv with tox 4.
-
Nicolas Dandrimont authored
This separate package was introduced recently and is needed for our CLIs to pass type checking.
-
-
As with other fields containing sha1_git values, display hexadecimal representation of parent revision ids.
-
Antoine Lambert authored
Use a list instead of a tuple to keep mypy happy with the latest hypothesis version.
-
Raphaël Gomès authored
`dir_filter` only filters directories. `swh-scanner` needs to accurately filter out ignored files before making expensive requests to the web API. We introduce a more general `path_filter` that allows us to differentiate between files and folders. `dir_filter` is now deprecated and will be removed once the remaining users in other packages are migrated over to the new API. `accept_all_directories` is also deprecated, because it only implies accepting *directories* even though its behavior also accepts non-directory entries when used with `path_filter`.
1286c8a4 -
Nicolas Dandrimont authored
-
vlorentz authored
instead of a mix-in class. A future commit will add a method implemented by both with different signatures that mypy cannot unify yet.
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
David Douard authored
ad6a532a
-
David Douard authored
-
Antoine Lambert authored
When building package documentation outside tox by calling make in the docs directory, the include of Makefile.sphinx inside the docs Makefile was failing because its relative path was invalid. So adapt this relative path according to whether the SWH_PACKAGE_DOC_TOX_BUILD environment variable is set.
-
David Douard authored
-
David Douard authored
5ff7c5b5
-
David Douard authored
-
David Douard authored
-
Pierre-Yves David authored
The web client no longer uses async, so we no longer need it.
1a59d42f -
Pierre-Yves David authored
Right now, the discovery process offered by `filter_known_objects` returns all results after the discovery is complete. The new callback provides a way to get information "in real time", which is useful for at least a couple of planned use cases in the SWH scanner: displaying progress information while processing, and updating a graphical UI in real time. This simple callback fits this need without too much trouble. For some unclear reason, mypy complained about the existing type hints in this file, so I fixed them.
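A callback of this kind might be wired in roughly as follows. This is only a sketch: `discover_known`, `is_known` and `update_info` are hypothetical names chosen for illustration and do not reflect the actual signature of `filter_known_objects`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def discover_known(
    object_ids: Iterable[str],
    is_known: Callable[[str], bool],
    update_info: Optional[Callable[[str, bool], None]] = None,
) -> List[str]:
    """Return the known ids, reporting each result as soon as it is available."""
    known = []
    for object_id in object_ids:
        found = is_known(object_id)
        if update_info is not None:
            # Report immediately instead of waiting for the end of the loop.
            update_info(object_id, found)
        if found:
            known.append(object_id)
    return known


# Made-up data: a set stands in for the archive lookup.
archive = {"a", "c"}
seen: List[Tuple[str, bool]] = []
result = discover_known(
    ["a", "b", "c"],
    archive.__contains__,
    update_info=lambda object_id, found: seen.append((object_id, found)),
)
```

Here the callback only records what it receives; a CLI could use the same hook to advance a progress bar.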
-
Antoine Lambert authored
Related to swh/meta#5075.
-
-
Franck Bret authored
Add an optional progress callback to the `from_disk` method. It can return the number of computed entries for each top-level entry traversed. This is useful for CLIs, in particular to display progress information for SWH Scanner.
-
vlorentz authored
-
David Douard authored
-
vlorentz authored
Currently the only limit is "enforced" by PostgreSQL. This makes sure that origins created after we switch to Cassandra as the primary storage remain compatible with a PostgreSQL-based storage.
-
Pierre-Yves David authored
Check inline comment for details.
39296ace -
Pierre-Yves David authored
This requires cleaning up various items along the way, which is probably an added benefit. In particular, mypy now considers BaseModel to hold an object_type attribute.
f47cc1b7 -
Pierre-Yves David authored
This is part of a wider effort to differentiate the various types of "object_type" attribute around the model code base.
02e79499 -
Pierre-Yves David authored
Instead of having multiple classes and `object_type` values, we just add a few lines in the main `model.Content` class to retrieve data on demand. The `with_data` logic already existed there anyway. This avoids having from_disk extend the model from the outside.
d65a844a -
Pierre-Yves David authored
This puts the pieces in place to finally clean up the confusion between the various object_type attributes. They now have different types, so we should be able to start detecting errors at some point. As for FromDiskType, we keep compatibility with string values for now. This avoids breaking existing code.
8b29444a -
Pierre-Yves David authored
Two other packages use DiskBackedContent: "swh-loader-svn" and "swh-loader-cvs". Both use it to check "DiskBackedContent.object_type" at the same time as "model.Content.object_type", so we do this small hack to avoid breaking these other modules until they migrate.
-
Having the escaped URL in `swhid.origin` is inconsistent with `self.path` (which is always unescaped) and never what we want, because the escaped form is only useful while serializing, which is already handled by `__str__`. This led to swh/devel/swh-indexer#4738, where swh-deposit parsed a qualified SWHID, then used `.origin` to get an origin URL. Additionally, as serialization always escapes the `origin` qualifier, deserializing then re-serializing a qualified SWHID would double-escape it. Finally, fixing this made the tests uncover that `%` was not escaped while serializing, while `;` was, leading to incorrect (and ambiguous) escaped URLs.
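The ambiguity mentioned at the end can be shown with a small sketch. `escape_qualifier` and `unescape_qualifier` are hypothetical helpers, not the functions used in the package, and the sketch assumes that only `%` and `;` need escaping in a qualifier value.

```python
from urllib.parse import unquote


def escape_qualifier(value: str) -> str:
    """Escape "%" first, so the "%" introduced for ";" is not escaped twice."""
    return value.replace("%", "%25").replace(";", "%3B")


def unescape_qualifier(value: str) -> str:
    """Reverse escape_qualifier by decoding percent sequences once."""
    return unquote(value)


# Two different origin URLs (made-up examples).
with_semicolon = "https://example.org/a;b"
with_literal_percent = "https://example.org/a%3Bb"

escaped_a = escape_qualifier(with_semicolon)
escaped_b = escape_qualifier(with_literal_percent)
```

If `%` were left unescaped, both URLs would serialize to the same string and could not be told apart when parsing; with both characters escaped, each one round-trips to itself.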
-
Antoine Lambert authored
A Content object with no attached data or get_data function could no longer be converted to a dict, as a MissingData exception was raised.
5e9cc05b -
Antoine Lambert authored
-
ObjectType is far too generic and introduces confusion with swhids.ObjectType. So we rename it to a clearer name.
72a0f1a7 -
TargetType is far too generic and introduces confusion with ReleaseTargetType. So we rename it to a clearer name.
-
Antoine Lambert authored
-
Pierre-Yves David authored
It does work, but it is now discouraged.
-
It turns out that calling @deprecated on Content alters the class, and any instantiation of "Content" will be wrongly marked as deprecated… So use a different approach to preserve compatibility (and we won't keep it long).
c8ef4083 -
As in the previous commit, using deprecated actually changes the source class, leading all usages to raise a deprecation warning. So we have to remove the deprecated decorator and keep the compatibility silently for a short while.
-
This test works with CPython because, as we do not assign the open file to a variable, reference counting garbage-collects it right after the `write` call. Deleting the file object means the write is flushed. On an interpreter without reference counting, such as PyPy, the file object might be garbage-collected too late, breaking the test. So we give the file object a clear life cycle.
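One common way to give a file object an explicit life cycle is a `with` block, sketched below; the helper name and the file name are made up for illustration, and the actual test may do this differently.

```python
import os
import tempfile


def write_fixture(path: str, data: bytes) -> None:
    """Write data and close the file deterministically."""
    # Leaving the block closes the file, which flushes the write,
    # regardless of when the garbage collector runs.
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "fixture.bin")
    write_fixture(target, b"hello")
    with open(target, "rb") as f:
        content = f.read()
```

By contrast, a bare `open(path, "wb").write(data)` relies on the interpreter destroying the temporary file object promptly, which is the behavior the commit stops depending on.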
-
861fd30c
-
We also typed the function, which turned out more painful than anticipated.
6d8ddab9 -
We also take this as an opportunity to use the blacklist_types feature in that test.
de319040 -
97e47f07
-
And automatically generate it while we are at it.
20d4ee14 -
e3f7a03b
-
05b8e7e0
-
780b98bb
-
object_dicts still need to be migrated.
326e6df6 -
58a7f781
-
git_objects.snapshot_git_object wants a Snapshot argument, so let's move things to the Snapshot object.
-
vlorentz authored
-
David Douard authored
-
David Douard authored
It used to traverse the whole directory tree and filter elements of said tree only afterwards; this version should be a bit smarter and not go too deep into directories that should be ignored. We cannot just use the subtree eviction mechanism of 'os.walk(topdown=True)' because the filtering callback takes some context of the subdirectory content (typically to be able to evict empty directories). This version of the code is a bit more complex but should do the trick.
f0f21b4d -
David Douard authored
This creates a tree structure from an easy to read textual representation of said structure.
-
vlorentz authored
Using os.walk() does not make much sense when we want to control what directories to recurse into. Additionally, this uses os.scandir directly, which allows us to directly sort symlinks and files apart from directories (while os.walk groups symlinks with directories) without two extra system calls.
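The distinction can be sketched with `os.scandir` directly. `split_entries` is a hypothetical helper written for illustration, not the function from the package; it relies on `DirEntry.is_dir(follow_symlinks=False)`, which reports a symlink to a directory as a non-directory.

```python
import os
import tempfile
from typing import List, Tuple


def split_entries(path: str) -> Tuple[List[str], List[str]]:
    """Return (directories, other entries) found directly under path."""
    dirs: List[str] = []
    others: List[str] = []
    with os.scandir(path) as entries:
        for entry in entries:
            # With follow_symlinks=False, a symlink pointing at a directory
            # is not counted as a directory.
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                others.append(entry.name)
    return sorted(dirs), sorted(others)


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "subdir"))
    with open(os.path.join(root, "file.txt"), "w") as f:
        f.write("data")
    os.symlink(os.path.join(root, "subdir"), os.path.join(root, "link"))
    dirs, others = split_entries(root)
```

Only entries placed in `dirs` would be candidates for recursion, so the caller decides which directories to descend into.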
-
David Douard authored
91e4eb6e
-
David Douard authored
-
Pierre-Yves David authored
Before this patch, giving `from_disk` a directory path with trailing '/'s would confuse the exclusion code, as it would assume it needed to exclude one extra character to get the basename. This would swallow the first character of the directory to exclude, leading to a rightful crash. This would only affect directories excluded in the first directory, as subdirectories would be called without a '/'. We fix the path at the top of the function, as trailing '/'s could confuse other code too in the future.
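The off-by-one effect can be reproduced in a few lines. `relative_name` and `normalize_root` are hypothetical helpers for illustration only; the actual exclusion code is not shown on this page.

```python
def relative_name(root: bytes, child: bytes) -> bytes:
    """Strip the root and one separator, assuming root has no trailing "/"."""
    return child[len(root) + 1:]


def normalize_root(root: bytes) -> bytes:
    """Drop trailing separators, keeping the filesystem root itself intact."""
    return root.rstrip(b"/") or b"/"


child = b"/tmp/project/build"

# With a trailing "/", the extra character skipped is the first letter
# of the directory name.
broken = relative_name(b"/tmp/project/", child)

# Normalizing the path first restores the expected basename.
fixed = relative_name(normalize_root(b"/tmp/project/"), child)
```

Here `broken` loses the leading `b` of `build`, which is the kind of mismatch that would make an exclusion pattern miss its target.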
-
Pierre-Yves David authored
This will be useful to reuse them in other tests.
6e1b7272 -
Pierre-Yves David authored
Before, there was no clean way to feed a qswhid to a json encoder.
-
Hélène Jonin authored
-
vlorentz authored
It was deprecated in 2021, and is not used anywhere anymore.
-
vlorentz authored
-
vlorentz authored
-
Antoine Lambert authored
Bump development tools: mypy, codespell, isort, ... Move all tool configuration into pyproject.toml. Remove no longer needed mypy overrides.
Verified 0a5814d8 -
Antoine Lambert authored
-
Antoine Lambert authored
This was used at the time we were building Debian packages for swh components, but we no longer do that.
-
vlorentz authored
This will allow the Git loader to catch this particular exception, in order to replace the overflowing timestamp with a placeholder value instead of failing to load a whole repository.
Showing
- .copier-answers.yml: 11 additions, 0 deletions
- .git-blame-ignore-revs: 2 additions, 3 deletions
- .gitignore: 12 additions, 11 deletions
- .pre-commit-config.yaml: 30 additions, 16 deletions
- CODE_OF_CONDUCT.md: 1 addition, 1 deletion
- MANIFEST.in: 0 additions, 6 deletions
- README.rst: 6 additions, 5 deletions
- bin/swh-hashtree: 2 additions, 3 deletions
- docs/Makefile: 1 addition, 1 deletion
- docs/data-model.rst: 8 additions, 1 deletion
- docs/index.rst: 10 additions, 5 deletions
- docs/persistent-identifiers.rst: 9 additions, 9 deletions
- mypy.ini: 0 additions, 26 deletions
- pyproject.toml: 87 additions, 1 deletion
- pytest.ini: 0 additions, 8 deletions
- requirements-test.txt: 4 additions, 1 deletion
- setup.cfg: 0 additions, 8 deletions
- setup.py: 0 additions, 76 deletions
- swh/__init__.py: 0 additions, 3 deletions
- swh/model/cli.py: 21 additions, 11 deletions
.copier-answers.yml: 0 → 100644
MANIFEST.in: deleted, 100644 → 0
mypy.ini: deleted, 100644 → 0
pytest.ini: deleted, 100644 → 0
setup.cfg: deleted, 100644 → 0
setup.py: deleted, 100755 → 0
swh/__init__.py: deleted, 100644 → 0