- Sep 30, 2022
-
-
Antoine Lambert authored
There are use cases where sha512 checksums need to be computed (content integrity checks, for instance), so add sha512 to the list of hashing algorithms supported by the MultiHash class.
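As a rough illustration of computing several checksums, sha512 included, in a single pass over the data, here is a minimal sketch built only on Python's `hashlib`. It does not use the actual `MultiHash` API: the helper name `multi_digest`, the `ALGORITHMS` tuple and the chunk size are assumptions made for this example.

```python
import hashlib
import io

# Hypothetical algorithm list for illustration; not the list used by MultiHash.
ALGORITHMS = ("sha1", "sha256", "sha512")


def multi_digest(fobj, algorithms=ALGORITHMS, chunk_size=65536):
    """Read fobj once and return a {algorithm: hex digest} mapping."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    # Feed every chunk to every hasher so the data is only read once.
    for chunk in iter(lambda: fobj.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


digests = multi_digest(io.BytesIO(b"hello world"))
print(digests["sha512"])
```

Reading the content once and updating all hashers per chunk avoids re-reading large files for each algorithm.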
- Sep 29, 2022
-
-
Pierre-Yves David authored
The patterns were validated from $PWD and later applied to paths relative to `root_path`. So we shuffle a bit of code to test them against `root_path`. We make the absolute patterns relative in the same go. This code comes from swh-scanner and should probably get an overhaul; however, for now we start by making it not broken.
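The idea of checking patterns against `root_path` rather than the current working directory, and of turning absolute patterns into relative ones, could be sketched as follows. This is not the swh-model code: the helper name `relativize_pattern` and its error handling are hypothetical, and the sketch assumes POSIX-style paths.

```python
import os


def relativize_pattern(pattern: str, root_path: str) -> str:
    """Return `pattern` expressed relative to `root_path`.

    Hypothetical helper: relative patterns are interpreted from root_path
    (not from the current working directory), and absolute patterns must
    point inside root_path.
    """
    root = os.path.abspath(root_path)
    # os.path.join keeps `pattern` unchanged when it is already absolute.
    absolute = os.path.normpath(os.path.join(root, pattern))
    if os.path.commonpath([root, absolute]) != root:
        raise ValueError(f"pattern {pattern!r} is outside {root_path!r}")
    return os.path.relpath(absolute, root)
```

For example, `relativize_pattern("/data/repo/build/*.o", "/data/repo")` yields a pattern relative to the root, while a pattern pointing outside the root is rejected.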
-
- Sep 26, 2022
- Sep 23, 2022
-
-
Pierre-Yves David authored
This reduces the number of function calls and should be faster. The mashup of blind optimisations in the previous changesets yields some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark" we see quite an interesting improvement.
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 runs =
base: 17 minutes 48 seconds
before: 11 minutes 50 seconds
after: 11 minutes 11 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
base: 43%
before: 15%
after: 11%
-
Pierre-Yves David authored
(This commit is actually doing two things /o\)
- we inline the type-checking in the custom validators to reduce the number of function calls.
- we optimize some of the custom validators by skipping the creation of intermediate tuples.
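The two points above can be illustrated with a small, standalone sketch. It is not the swh-model code: the function names are hypothetical, and the `(instance, attribute, value)` signature is only meant to mirror the calling convention of attrs validators.

```python
def check_type(expected, value):
    """Generic helper: costs one extra function call per validation."""
    return isinstance(value, expected)


def generic_str_validator(instance, attribute, value):
    # Generic path: builds an intermediate tuple and delegates to a helper.
    expected = (str,)
    if not check_type(expected, value):
        raise TypeError(f"{attribute}: expected str, got {type(value).__name__}")


def inlined_str_validator(instance, attribute, value):
    # Specialized path: the type check is inlined, with no helper call
    # and no intermediate tuple.
    if not isinstance(value, str):
        raise TypeError(f"{attribute}: expected str, got {type(value).__name__}")
```

Both validators accept and reject the same values; the second simply does less work per call, which matters when it runs once per model object.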
-
Pierre-Yves David authored
Since all `generic_type_validator` calls are optimized away, the code will no longer be called. So we remove that code to avoid any drift. A nice "exception" is provided in case this starts getting called again in the future.
-
Pierre-Yves David authored
Since try/except contexts are known to be expensive in Python, it seems useful to remove them.
-
Pierre-Yves David authored
This ensures we don't have any remaining `generic_type_validator` calls that have not been optimized away.
-
Pierre-Yves David authored
This indirection seems useless and is probably the remains of some long-forgotten ritual.
-
Pierre-Yves David authored
This should reduce function calls and speed things up. It might be useful to introduce even more specialized validators in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets.
-
Pierre-Yves David authored
This currently does nothing, but prepares for actually changing the generic validator into faster specialized variants.
-
Pierre-Yves David authored
Before this change we would do the following:
1) translate from_disk's objects into `dict`,
2) sort these dicts,
3) feed the list to `Directory.from_dict`,
4) create DirectoryEntry objects from these dicts.
Skipping the dict creation and directly creating the DirectoryEntries provides us with a small but stable and noticeable performance win. We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 runs =
before: 11 minutes 56 seconds
after: 11 minutes 50 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 17%
after: 15%
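The difference between going through intermediate dicts and building the entries directly can be sketched as below. This is a simplified model, not the swh-model implementation: the `Entry` dataclass, both function names and the plain sort by name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a directory entry model object."""

    name: bytes
    type: str
    target: bytes
    perms: int

    @classmethod
    def from_dict(cls, d):
        return cls(**d)


def entries_via_dicts(nodes):
    # Before: build one dict per node, sort the dicts, then convert each one.
    dicts = [
        {"name": name, "type": type_, "target": target, "perms": perms}
        for (name, type_, target, perms) in nodes
    ]
    dicts.sort(key=lambda d: d["name"])
    return tuple(Entry.from_dict(d) for d in dicts)


def entries_direct(nodes):
    # After: sort the raw tuples and build each Entry exactly once.
    ordered = sorted(nodes, key=lambda node: node[0])
    return tuple(Entry(*node) for node in ordered)
```

Both functions return the same entries; the direct version just avoids allocating and unpacking one dict per entry.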
-
Pierre-Yves David authored
Do not create model objects while sorting entries before creating the model objects. This is another case of "let us create object X to prepare the creation of object X", slowing things down. In practice, we will likely skip this code path after the next changeset; however, it seems useful to get this performance footgun out of the way. We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 runs =
before: 12 minutes 59 seconds
after: 11 minutes 56 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 24%
after: 17%
-
Pierre-Yves David authored
Before this change, a Directory object was built to compute the `id` that we fed to the Directory object we built for `to_model`. We tested this change on a simple invocation of the Mercurial loader, with a noop-loader storage:
swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
= Median time of 3 runs =
before: 17 minutes 48 seconds
after: 12 minutes 59 seconds
On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
before: 43%
after: 24%
-
- Aug 31, 2022
- Aug 30, 2022
-
- Aug 12, 2022
- Aug 08, 2022
-
-
vlorentz authored
This is needed by swh-scrubber when recomputing the hash of such snapshots.
- Aug 04, 2022
- Jul 20, 2022
- Jul 19, 2022
-
- Jul 12, 2022
- Jul 11, 2022
-
- Jul 06, 2022
-
-
vlorentz authored
It will be used by swh.storage.backfiller (so indirectly, swh.scrubber) to load directories from the postgresql database, whose schema accidentally allowed directories with duplicate entries -- without corrupting the shape of the directory too much.
-
- Jul 04, 2022
-
-
vlorentz authored
-
- May 09, 2022
-
-
Pratyush authored
-
- May 01, 2022
-
-
John Ericson authored
We can use `format_git_object_from_parts` inside it.
-
- Apr 27, 2022
-
-
John Ericson authored
This would be useful for the IPFS bridge, and seems good to complete the API in any case.
- Apr 26, 2022
-
-
vlorentz authored
-
- Apr 21, 2022
-
-
Antoine Lambert authored
That hook can be frustrating, as it can discard a long commit message if it finds a typo in it, so it is better to remove it.
-
- Apr 11, 2022
-
-
David Douard authored
It's a piece of information used several times in the swh stack.
- Apr 08, 2022
-
-
Antoine Lambert authored
-
Antoine Lambert authored
Related to T3922
-
Antoine Lambert authored
black has been considered stable since release 22.1.0, and the version we are currently using is quite outdated and not compatible with click 8.1.0, so it is time to bump it to its latest stable release. Please note that the E501 pycodestyle warning related to line length is replaced by the B950 warning from flake8-bugbear, as recommended by black. https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#line-length Related to T3922
-