- Feb 14, 2023
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.6.2' with Debian dir e5c2f227d7e6087b3152d6160e9502f4199a6105
- Feb 13, 2023
-
-
Antoine Lambert authored
Previously, when looking up data by key in an ImmutableDict, the inner tuple storing keys and values was iterated until the requested key was found. This is inefficient when the ImmutableDict contains a lot of entries, typically for an origin snapshot containing a lot of branches. So use an inner dictionary to speed up lookup-by-key operations and improve loader performance.
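The idea can be sketched as follows. This is a minimal illustration, not the actual swh.model code: the class name `ImmutableDictSketch` and its attribute names are hypothetical, and it assumes that keys and values are hashable.

```python
from typing import Dict, Generic, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class ImmutableDictSketch(Generic[K, V]):
    """Hypothetical stand-in, not the actual swh.model class."""

    def __init__(self, items: Iterable[Tuple[K, V]] = ()) -> None:
        # The tuple keeps the entries immutable and hashable...
        self._data: Tuple[Tuple[K, V], ...] = tuple(items)
        # ...while the inner dict gives average O(1) lookup by key,
        # instead of scanning the tuple on every access.
        self._index: Dict[K, V] = dict(self._data)

    def __getitem__(self, key: K) -> V:
        return self._index[key]

    def __iter__(self) -> Iterator[K]:
        return (key for key, _ in self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ImmutableDictSketch):
            return NotImplemented
        return self._data == other._data

    def __hash__(self) -> int:
        return hash(self._data)
```

The inner dict costs some extra memory per instance, which is the price paid for not iterating over every branch of a large snapshot on each lookup.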
-
- Feb 02, 2023
-
-
Antoine Lambert authored
This fixes Python 3.7 support, which broke because poetry, a dependency of isort, removed support for that Python version in a recent release.
-
- Dec 19, 2022
-
-
Antoine Lambert authored
In order to remove warnings about /apidoc/*.rst files being included multiple times in the toc when building the full swh documentation, prefer to include module indices only when building standalone package documentation. Also include them the proper Sphinx way. Related to T4496
-
- Dec 15, 2022
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.6.1' with Debian dir 5264b8650def64f97ea2b7aae15cadc2dc496623
-
Antoine Lambert authored
Two issues prevented browsing some of the SWHIDs given as examples in that documentation. First, some Sphinx links were broken in rDMODe1c3fe80731226618616117dfd67a95f3d365645. Second, a SWHID with ';' in its path qualifier was correctly percent-escaped, but when it is used as a URL argument an extra level of percent escaping is required, because the HTTP server unescapes URL arguments and thus breaks the SWHID percent escaping. Closes T4721
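The double escaping can be illustrated with the standard library. This is only a sketch: the path and the all-zero object identifier are made up for the example, and it makes no claim about how the documentation actually builds its links.

```python
from urllib.parse import quote, unquote

# Hypothetical file path containing ';', the SWHID qualifier separator.
path = "/weird;name.c"

# First level: escape the path qualifier inside the SWHID itself.
escaped_path = quote(path, safe="/")
swhid = "swh:1:cnt:" + "0" * 40 + ";path=" + escaped_path

# Second level: escape the whole SWHID before using it as a URL argument,
# so that the server-side unescaping yields the SWHID unchanged.
url_argument = quote(swhid, safe="")

# What the server sees after unescaping the URL argument once.
received = unquote(url_argument)
```

Without the second level, a single server-side unescaping would turn `%3B` back into `;` and the path qualifier would no longer parse as one qualifier.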
- Dec 05, 2022
-
-
Antoine Lambert authored
The from_disk.Content object created for a symlink was missing path info, so add it for consistency with the from_disk.Content object created for a regular file.
-
- Oct 18, 2022
-
-
David Douard authored
- pre-commit from 4.1.0 to 4.3.0, - codespell from 2.2.1 to 2.2.2, - black from 22.3.0 to 22.10.0 and - flake8 from 4.0.1 to 5.0.4. Also freeze flake8 dependencies. Also change flake8's repo config to github (the gitlab mirror being outdated).
-
David Douard authored
-
- Oct 17, 2022
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.6.0' with Debian dir 18c244cdb4270a0e59b359f3522dd0a7f1bf55f8
-
Antoine Lambert authored
When using attr < 21.3.0, adding a field transformer breaks the attrs integration with hypothesis, because attributes transformed with such a function are not cast to the generated AttrsClass but remain just a list of attributes. This causes an error in hypothesis, which raises an AttributeError. As we use attr 21.2.0 in production and when building the Debian buster package, add a workaround for that issue as explained here: https://github.com/python-attrs/attrs/issues/821.
-
Antoine Lambert authored
Previously the MerkleNode.collect method returned a dict whose keys are node types and whose values are dicts of the form {<node_hash>: <node_data>}. In order to give more flexibility to client code for the processing of collected nodes, prefer to simply return a set of MerkleNode. As a consequence, MerkleNode objects need to be hashable by Python, so the __hash__ method has also been implemented. Closes T4633
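A minimal sketch of the new return shape, assuming that hashing a node by its Merkle hash is acceptable. `NodeSketch` and its fields are hypothetical and much simpler than the real MerkleNode.

```python
import hashlib
from typing import Dict, Set


class NodeSketch:
    """Hypothetical, simplified Merkle node (not the swh.model class)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.children: Dict[bytes, "NodeSketch"] = {}
        self.collected = False

    @property
    def hash(self) -> bytes:
        # The node hash covers its own data and its children's hashes.
        hasher = hashlib.sha1(self.data)
        for name in sorted(self.children):
            hasher.update(name + self.children[name].hash)
        return hasher.digest()

    def __eq__(self, other: object) -> bool:
        return isinstance(other, NodeSketch) and self.hash == other.hash

    def __hash__(self) -> int:
        # Needed so that nodes can be stored in a set.
        return hash(self.hash)

    def collect(self) -> Set["NodeSketch"]:
        """Return the not-yet-collected nodes of this subtree as a set."""
        if self.collected:
            return set()
        self.collected = True
        nodes = {self}
        for child in self.children.values():
            nodes |= child.collect()
        return nodes
```

Client code can then group or filter the returned nodes however it likes. One caveat of this sketch: because the hash depends on the children, a node should not be modified while it sits in a set.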
- Sep 30, 2022
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.5.1' with Debian dir a243e7419a0f035452acb893131c246c9f49946a
-
Antoine Lambert authored
There are use cases where sha512 checksums need to be computed (content integrity checks, for instance), so add sha512 to the list of hashing algorithms supported by the MultiHash class.
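Computing several checksums in one pass over the data can be sketched with `hashlib` alone. The function `multi_hash` below is a hypothetical illustration, not the MultiHash API.

```python
import hashlib
from typing import Dict, Iterable, Sequence


def multi_hash(
    chunks: Iterable[bytes],
    algorithms: Sequence[str] = ("sha1", "sha256", "sha512"),
) -> Dict[str, bytes]:
    """Feed every chunk to every hasher, reading the data only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.digest() for name, hasher in hashers.items()}
```

For example, `multi_hash([b"hello ", b"world"])["sha512"]` is the 64-byte digest of `b"hello world"`.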
- Sep 29, 2022
-
-
Pierre-Yves David authored
The patterns were validated from $PWD and later applied to paths relative to `root_path`. So we shuffle a bit of code to test them against root_path. We make absolute patterns relative in the same go. This code comes from swh-scanner and should probably get an overhaul; however, for now we start by making it not broken.
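The "make absolute patterns relative to the root" step might look like the sketch below. The function name `relative_patterns` is hypothetical, the paths in the usage note are made up, and POSIX-style paths are assumed; the actual from_disk code may differ.

```python
import os
from typing import Iterable, List


def relative_patterns(root_path: str, patterns: Iterable[str]) -> List[str]:
    """Express every pattern relative to root_path (hypothetical helper)."""
    root = os.path.abspath(root_path)
    result = []
    for pattern in patterns:
        if os.path.isabs(pattern):
            relative = os.path.relpath(pattern, root)
            # Reject patterns pointing outside of the scanned root.
            if relative == os.pardir or relative.startswith(os.pardir + os.sep):
                raise ValueError(f"pattern {pattern!r} is outside {root!r}")
            pattern = relative
        result.append(pattern)
    return result
```

With this, `relative_patterns("/data/repo", ["/data/repo/build/*", "*.pyc"])` gives patterns that can be matched against paths relative to the root, whatever the current working directory is.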
-
- Sep 26, 2022
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.5.0' with Debian dir 58d5f805181a3d5452f6da0fec2d4dd6cec629b6
- Sep 23, 2022
-
-
Pierre-Yves David authored
This reduces the number of function calls and should be faster. The mix of blind optimisations in the previous changesets yields some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark" we see a quite interesting improvement.

swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

Median time of 3 runs:
base: 17 minutes 48 seconds
before: 11 minutes 50 seconds
after: 11 minutes 11 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
base: 43%
before: 15%
after: 11%
-
Pierre-Yves David authored
(This commit is actually doing two things /o\) - we inline the type-checking in the custom validators to reduce the number of function calls. - we optimize some of the custom validators by skipping the creation of intermediate tuples.
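The difference between a generic validator and a specialised one with the type check inlined can be sketched as follows. All names here are hypothetical; the functions only mimic the `(instance, attribute, value)` validator signature and do not reproduce the swh.model validators.

```python
from typing import Any


class AttributeStub:
    """Hypothetical stand-in for the attribute object given to validators."""

    def __init__(self, name: str) -> None:
        self.name = name


def _is_instance_of(expected: Any, value: Any) -> bool:
    return isinstance(value, expected)


def generic_optional_bytes(instance: Any, attribute: AttributeStub, value: Any) -> None:
    """Generic path: builds a tuple of types and calls a helper each time."""
    allowed = (bytes, type(None))
    if not _is_instance_of(allowed, value):
        raise TypeError(f"{attribute.name} must be bytes or None")


def specialised_optional_bytes(instance: Any, attribute: AttributeStub, value: Any) -> None:
    """Specialised path: the test is inlined, no helper call, no tuple.

    Note: the identity test on the class also rejects subclasses of bytes,
    which is an assumption of this sketch.
    """
    if value is not None and value.__class__ is not bytes:
        raise TypeError(f"{attribute.name} must be bytes or None")
```

Both functions accept and reject the same plain values; the specialised one simply does less work per call, which adds up when millions of objects are validated.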
-
Pierre-Yves David authored
Since all `generic_type_validator` calls are optimized away, the code will no longer be called. So we remove that code to avoid any drifting. A nice "exception" is provided in case this starts getting called again in the future.
-
Pierre-Yves David authored
Since try/except contexts are known to be expensive in Python, it seems useful to remove them.
-
Pierre-Yves David authored
This ensures we don't have any remaining `generic_type_validator` calls that have not been optimized away.
-
Pierre-Yves David authored
This indirection seems useless and is probably the remains of some long forgotten rituals.
-
Pierre-Yves David authored
This should reduce function calls and speed things up. It might be useful to introduce even more specialized validators in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets.
-
Pierre-Yves David authored
This is currently doing nothing, but prepares for actually changing the generic validator into faster specialized variants.
-
Pierre-Yves David authored
Before this change we would do the following: 1) translate from_disk's objects into `dict`s, 2) sort these dicts, 3) feed the list to `Directory.from_dict`, 4) create DirectoryEntry objects from these dicts. Skipping the intermediate Directory creation and directly creating the DirectoryEntry objects provides us with a small but stable and noticeable performance win. We tested this change on a simple run of the Mercurial loader, with a noop-loader storage:

swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

Median time of 3 runs:
before: 11 minutes 56 seconds
after: 11 minutes 50 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 17%
after: 15%
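Sorting the raw data first and building each entry object exactly once can be sketched like this. `EntrySketch`, `raw_sort_key` and `build_entries` are hypothetical names, and the sort key (directories ordered as if their name ended with "/") is an assumption borrowed from git's tree ordering.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Raw entry as (name, type, target, perms); no intermediate dict involved.
RawEntry = Tuple[bytes, str, bytes, int]


class EntrySketch(NamedTuple):
    """Hypothetical stand-in for a directory entry model object."""

    name: bytes
    type: str
    target: bytes
    perms: int


def raw_sort_key(raw: RawEntry) -> bytes:
    # Assumed git-like ordering: directories sort as if named "name/".
    name, type_ = raw[0], raw[1]
    return name + b"/" if type_ == "dir" else name


def build_entries(raw_entries: Iterable[RawEntry]) -> List[EntrySketch]:
    """Sort the raw tuples, then create each entry object exactly once."""
    return [EntrySketch(*raw) for raw in sorted(raw_entries, key=raw_sort_key)]
```

No throwaway model object is created just to be sorted or converted; the only objects allocated are the final entries.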
-
Pierre-Yves David authored
Do not create model objects while sorting entries before creating the model object. This is another case of "let us create object X to prepare the creation of object X", slowing things down. In practice, we will likely skip this code path after the next changeset; however, it seems useful to get this performance footgun out of the way. We tested this change on a simple run of the Mercurial loader, with a noop-loader storage:

swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

Median time of 3 runs:
before: 12 minutes 59 seconds
after: 11 minutes 56 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after: 17%
-
Pierre-Yves David authored
Before this change, a Directory object was built to compute the `id` that we fed to the Directory object we built for `to_model`. We tested this change on a simple run of the Mercurial loader, with a noop-loader storage:

swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel

Median time of 3 runs:
before: 17 minutes 48 seconds
after: 12 minutes 59 seconds

On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 43%
after: 24%
-
- Aug 31, 2022
-
-
Jenkins for Software Heritage authored
Update to upstream version '6.4.1' with Debian dir 057ba180164d4c2d0d5dff8360fd67d0f915cc5b