- Nov 16, 2023
-
-
David Douard authored
Convert README from markdown to ReST to make it embeddable in docs/index.rst
-
- Nov 15, 2023
-
-
vlorentz authored
instead of a mix-in class. A future commit will add a method implemented by both with different signatures that mypy cannot unify yet.
-
- Nov 14, 2023
-
-
Nicolas Dandrimont authored
-
Raphaël Gomès authored
`dir_filter` only filters directories. `swh-scanner` needs to accurately filter out ignored files before making expensive requests to the web API. We introduce a more general `path_filter` that allows us to differentiate between files and folders. `dir_filter` is now deprecated and will be removed once the remaining users in other packages are migrated over to the new API. `accept_all_directories` is also deprecated, because its name implies accepting only *directories* even though it also accepts non-directory entries when used with `path_filter`.
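The difference between filtering directories only and filtering any path can be sketched as follows. This is a minimal illustration, not the swh.model API: the `(path, name, entries)` signature, `ignore_files_named` and `walk` are all assumptions made for the example, with `entries` set to `None` for non-directories.

```python
import os
from typing import Callable, Iterable, List, Optional

# Assumed, illustrative signature: (path, name, entries) -> keep?
# `entries` is None for files and a listing for directories.
PathFilter = Callable[[bytes, bytes, Optional[Iterable[bytes]]], bool]


def ignore_files_named(*ignored: bytes) -> PathFilter:
    """Build a filter rejecting files (not directories) with the given names."""
    ignored_names = set(ignored)

    def path_filter(path, name, entries):
        is_directory = entries is not None
        return is_directory or name not in ignored_names

    return path_filter


def walk(root: bytes, keep: PathFilter) -> List[bytes]:
    """Return the paths under `root` accepted by `keep`, in traversal order."""
    accepted = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        entries = os.listdir(path) if os.path.isdir(path) else None
        if not keep(path, name, entries):
            continue  # rejected files are skipped before any costly processing
        accepted.append(path)
        if entries is not None:
            accepted.extend(walk(path, keep))
    return accepted
```

A directory-only filter could not express "skip this file", which is the case a scanner wants to handle before querying a remote API.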
-
- Sep 25, 2023
-
-
Antoine Lambert authored
Use a list instead of a tuple to keep mypy happy with latest hypothesis version.
-
- Aug 29, 2023
-
-
As with other fields containing sha1_git values, display hexadecimal representation of parent revision ids.
-
- Aug 21, 2023
- Jul 12, 2023
-
-
Nicolas Dandrimont authored
This separate package was introduced recently and is needed for our CLIs to pass type checking.
-
- Jun 14, 2023
-
-
Nicolas Dandrimont authored
This allows using the "system" tox, if it's recent enough, instead of always provisioning an internal .tox venv with tox 4.
-
Nicolas Dandrimont authored
Instead of going back to py3, pass through the environment name, so that it can be called with an arbitrary interpreter version.
-
Nicolas Dandrimont authored
When parsing the configuration, tox would complain about the unfollowed line continuation (which is what happens when the testenv was qualified with neither full nor minimal). Moving {posargs} to be unqualified allows the line continuation character to always have something behind it.
-
- Mar 16, 2023
-
-
Jérémy Bobbio (Lunar) authored
This adds several helper methods returning SWHIDs to model objects, namely:

  - SkippedContent.swhid()
  - DirectoryEntry.swhid()
  - SnapshotBranch.swhid()
  - Release.target_swhid()
  - Revision.directory_swhid() and Release.parent_swhids()
  - OriginVisitStatus.origin_swhid() and OriginVisitStatus.snapshot_swhid()
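As a rough illustration of what such a helper does, here is a sketch of a `swhid()` method on a stand-in object. The `Node` class is hypothetical and returns a plain string built from the core SWHID layout; the real helpers live on the swh.model classes and may return a dedicated SWHID type instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a model object; not a swh.model class."""

    object_type: str  # short SWHID type tag, e.g. "cnt", "dir", "rev"
    id: bytes  # 20-byte sha1_git-style identifier

    def swhid(self) -> str:
        # Core SWHID layout: swh:<scheme version>:<object type>:<hex id>
        return f"swh:1:{self.object_type}:{self.id.hex()}"
```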
-
- Feb 17, 2023
-
-
Antoine Lambert authored
It is better to use the latest mypy release.
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
-
Jérémy Bobbio (Lunar) authored
Related to swh/meta#4959
-
- Feb 13, 2023
-
-
Antoine Lambert authored
Previously, when looking up data by key in an ImmutableDict, the inner tuple storing keys and values was iterated until the requested key was found. This is inefficient when the ImmutableDict contains a lot of entries, typically for an origin snapshot containing a lot of branches. So use an inner dictionary to speed up lookup-by-key operations and improve loader performance.
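The idea can be sketched as follows. `FrozenMapping` is a hypothetical class written for illustration, not the actual ImmutableDict implementation: it keeps the tuple of pairs for hashing and adds a dict as a lookup index.

```python
from typing import Dict, Generic, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class FrozenMapping(Generic[K, V]):
    """Hypothetical read-only mapping; a sketch, not swh.model's ImmutableDict."""

    def __init__(self, items: Iterable[Tuple[K, V]] = ()):
        # The tuple of pairs keeps the object hashable and ordered...
        self._items: Tuple[Tuple[K, V], ...] = tuple(items)
        # ...while the dict gives average O(1) lookups instead of a linear scan.
        self._index: Dict[K, V] = dict(self._items)

    def __getitem__(self, key: K) -> V:
        return self._index[key]

    def __iter__(self) -> Iterator[K]:
        return iter(self._index)

    def __len__(self) -> int:
        return len(self._index)

    def __hash__(self) -> int:
        return hash(self._items)

    def __eq__(self, other: object) -> bool:
        # Compare the same data that is hashed, so equal objects hash equally.
        return isinstance(other, FrozenMapping) and self._items == other._items
```

The trade-off is extra memory for the index in exchange for lookups that no longer scale with the number of entries.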
-
- Feb 02, 2023
-
-
Antoine Lambert authored
This restores Python 3.7 support, which broke because poetry, a dependency of isort, removed support for that Python version in a recent release.
-
- Dec 19, 2022
-
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in the toc when building the full swh documentation, prefer to include module indices only when building standalone package documentation. Also include them the proper Sphinx way.

Related to T4496
-
- Dec 15, 2022
-
-
Antoine Lambert authored
There were two issues that prevented browsing some of the SWHIDs given as examples in that documentation:

  - Some sphinx links were broken in rDMODe1c3fe80731226618616117dfd67a95f3d365645
  - A SWHID with ';' in its path qualifier was correctly percent-escaped, but when it is used as a URL argument an extra percent escaping is required, as the HTTP server will unescape URL arguments and thus break the SWHID percent escaping.

Closes T4721
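The double escaping can be illustrated with the standard library. The identifier, the path and the `https://example.org/resolve` endpoint below are placeholders chosen for the example, not values from the documentation being fixed.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Placeholder identifier and path, chosen only for illustration.
core = "swh:1:cnt:" + "0" * 40
path = "/dir/file;name.txt"

# First escaping: ';' inside a qualifier value must not look like a separator.
swhid = f"{core};path={quote(path, safe='/')}"

# Second escaping: the whole SWHID is embedded as a URL argument.
url = "https://example.org/resolve?swhid=" + quote(swhid, safe="")

# A server decodes the argument once and gets the escaped SWHID back intact.
received = parse_qs(urlsplit(url).query)["swhid"][0]
```

Without the second escaping, the server-side decoding would turn the escaped ';' back into a literal one and the path qualifier would be cut short.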
-
- Dec 05, 2022
-
-
Antoine Lambert authored
The from_disk.Content object created for a symlink was missing its path info, so add it for consistency with the from_disk.Content object created for a regular file.
-
- Oct 18, 2022
-
-
David Douard authored
  - pre-commit from 4.1.0 to 4.3.0,
  - codespell from 2.2.1 to 2.2.2,
  - black from 22.3.0 to 22.10.0 and
  - flake8 from 4.0.1 to 5.0.4.

Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
David Douard authored
-
- Oct 17, 2022
-
-
Antoine Lambert authored
When using attrs < 21.3.0, adding a field transformer breaks the attrs integration with hypothesis, because attributes transformed by such a function are not cast to the generated AttrsClass but remain a plain list of attributes. This makes hypothesis raise an AttributeError. As we use attrs 21.2.0 in production and when building the Debian buster package, add a workaround for that issue as explained here: https://github.com/python-attrs/attrs/issues/821.
-
Antoine Lambert authored
Previously, the MerkleNode.collect method returned a dict whose keys are node types and whose values are dicts of {<node_hash>: <node_data>}. In order to give client code more flexibility when processing collected nodes, prefer to simply return a set of MerkleNode. As a consequence, MerkleNode objects need to be hashable by Python, so the __hash__ method has also been implemented.

Closes T4633
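A minimal sketch of the new shape, using a hypothetical `Node` class rather than the real MerkleNode. Hashing on the node's own hash value is an assumption made for the example; the commit message only states that `__hash__` was implemented.

```python
from typing import Dict, Set


class Node:
    """Hypothetical Merkle-style node; a sketch, not swh.model's MerkleNode."""

    def __init__(self, node_hash: bytes):
        self.hash = node_hash
        self.children: Dict[bytes, "Node"] = {}
        self.collected = False

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and self.hash == other.hash

    def __hash__(self) -> int:
        # Assumption: nodes are identified by their hash value.
        return hash(self.hash)

    def collect(self) -> Set["Node"]:
        """Return every not-yet-collected node in this subtree as a set."""
        if self.collected:
            return set()
        self.collected = True
        found = {self}
        for child in self.children.values():
            found |= child.collect()
        return found
```

Returning the nodes themselves lets callers group or filter them however they need, instead of receiving a fixed nested-dict layout.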
-
- Sep 30, 2022
-
-
Antoine Lambert authored
There are use cases where sha512 checksums need to be computed (content integrity checks, for instance), so add sha512 to the list of hashing algorithms supported by the MultiHash class.
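Computing several checksums in one pass, sha512 included, can be sketched with `hashlib` alone. `MultiDigest` is a hypothetical helper written for this example and is not the MultiHash class or its API.

```python
import hashlib
from typing import Dict, Iterable


class MultiDigest:
    """Hypothetical helper feeding the same data to several hash functions."""

    def __init__(self, algorithms: Iterable[str] = ("sha1", "sha256", "sha512")):
        # One independent hashlib state per requested algorithm.
        self._states = {name: hashlib.new(name) for name in algorithms}

    def update(self, chunk: bytes) -> None:
        for state in self._states.values():
            state.update(chunk)

    def hexdigest(self) -> Dict[str, str]:
        return {name: state.hexdigest() for name, state in self._states.items()}
```

Reading the content once and updating every state per chunk avoids re-reading large files for each algorithm.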
-
- Sep 29, 2022
-
-
Pierre-Yves David authored
The patterns were validated from $PWD and later applied to paths relative to `root_path`. So we shuffle a bit of code to test them against `root_path`. We make the absolute patterns relative in the same go. This code comes from swh-scanner and should probably get an overhaul; however, for now we start by making it not broken.
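One way to make absolute patterns relative to a root is sketched below. `relative_patterns` is a hypothetical function for illustration and does not reproduce the code moved by this commit; rejecting patterns outside the root is an assumption of the sketch.

```python
import os
from typing import List


def relative_patterns(root_path: str, patterns: List[str]) -> List[str]:
    """Rewrite absolute patterns so they apply to paths relative to root_path."""
    root = os.path.abspath(root_path)
    result = []
    for pattern in patterns:
        if os.path.isabs(pattern):
            # Validate against root_path, not against the current directory.
            if os.path.commonpath([root, pattern]) != root:
                raise ValueError(f"pattern {pattern!r} is outside {root!r}")
            pattern = os.path.relpath(pattern, root)
        result.append(pattern)
    return result
```

The resulting patterns can then be matched against relative paths, for example with `fnmatch.fnmatch`.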
-
- Sep 23, 2022
-
-
Pierre-Yves David authored
This reduces the number of function calls and should be faster.

The mashup of blind optimisations in the previous changesets yields some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have.

When testing all the validator changes on our previous "benchmark", we see quite an interesting improvement.

    swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

= Median time of 3 runs =

  base: 17 minutes 48 seconds
  before: 11 minutes 50 seconds
  after: 11 minutes 11 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:

  base: 43%
  before: 15%
  after: 11%
-
Pierre-Yves David authored
(This commit is actually doing two things /o\)

  - we inline the type-checking in the custom validators to reduce the number of function calls.
  - we optimize some of the custom validators by skipping the creation of intermediate tuples.
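Both points can be illustrated with a sketch of a specialized validator. `validate_tuple_of_bytes` is a hypothetical function, not one of the validators touched by this commit; it only follows the `(instance, attribute, value)` calling convention that attrs uses for validators.

```python
from typing import Any


def validate_tuple_of_bytes(instance: Any, attribute: Any, value: Any) -> None:
    """Specialized check for a tuple-of-bytes field (hypothetical helper)."""
    # Inlined container check: no call into a generic type-matching helper.
    if not isinstance(value, tuple):
        raise TypeError(
            f"{attribute.name} must be a tuple, got {type(value).__name__}"
        )
    # Plain loop: no intermediate tuple of per-item results is built.
    for item in value:
        if not isinstance(item, bytes):
            raise TypeError(
                f"{attribute.name} must contain only bytes, "
                f"got {type(item).__name__}"
            )
```

A generic validator would have to interpret the declared type at every call; the specialized one hard-codes the expected shape and fails on the first offending item.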
-
Pierre-Yves David authored
Since all `generic_type_validator` calls are optimized away, the code will no longer be called. So we remove that code to avoid any drifting. A nice "exception" is provided in case this starts getting called again in the future.
-
Pierre-Yves David authored
Since try/except contexts are known to be expensive in Python, it seems useful to remove them.
-
Pierre-Yves David authored
This ensures we don't have any remaining `generic_type_validator` calls that have not been optimized away.
-
Pierre-Yves David authored
This indirection seems useless and is probably the remains of some long forgotten rituals.
-
Pierre-Yves David authored
This should reduce function calls and speed things up. It might be useful to introduce even more specialized validators in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets.
-
Pierre-Yves David authored
This currently does nothing, but prepares for actually changing the generic validator into faster specialized variants.
-
Pierre-Yves David authored
Before this change we would do the following:

  1) translate from_disk's objects into `dict`,
  2) sort these dicts,
  3) feed the list to `Directory.from_dict`,
  4) create DirectoryEntry objects from these dicts.

Skipping the dictionary creation and directly creating the DirectoryEntries provides us with a small but stable and noticeable performance win.

We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:

    swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

= Median time of 3 runs =

  before: 11 minutes 56 seconds
  after: 11 minutes 50 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:

  before: 17%
  after: 15%
-
Pierre-Yves David authored
Do not create model objects while sorting entries before creating the model object. This is another case of "let us create object X to prepare the creation of object X", slowing things down.

In practice, we will likely skip this code path after the next changeset; however, it seems useful to get this performance footgun out of the way.

We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:

    swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

= Median time of 3 runs =

  before: 12 minutes 59 seconds
  after: 11 minutes 56 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:

  before: 24%
  after: 17%
-
Pierre-Yves David authored
Before this change, a Directory object was built to compute the `id` that we fed to the Directory object we built for `to_model`.

We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:

    swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

= Median time of 3 runs =

  before: 17 minutes 48 seconds
  after: 12 minutes 59 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:

  before: 43%
  after: 24%
-
- Aug 30, 2022
-
- Aug 08, 2022
-