- Oct 17, 2022
Jenkins for Software Heritage authored
Jenkins for Software Heritage authored
Update to upstream version '6.6.0' with Debian dir 18c244cdb4270a0e59b359f3522dd0a7f1bf55f8
Antoine Lambert authored
When using attrs < 21.3.0, adding a field transformer breaks the attrs integration with Hypothesis, because attributes transformed by such a function are not cast to the generated AttrsClass but remain a plain list of attributes. This causes Hypothesis to raise an AttributeError. As we use attrs 21.2.0 in production and when building the Debian buster package, add a workaround for that issue as explained here: https://github.com/python-attrs/attrs/issues/821.
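With attrs, `attr.fields(cls)` normally returns a tuple on which each attribute can also be reached by name; a field transformer that returns a plain list loses that named access. The standard-library sketch below mimics such a container to show what a workaround has to restore. `Attribute`, `make_attr_tuple_class` and `wrap_transformed_fields` are hypothetical names, not the attrs API; the actual workaround is the one described in the linked issue.

```python
from operator import itemgetter
from typing import List, NamedTuple


class Attribute(NamedTuple):
    """Hypothetical stand-in for attr.Attribute; only `name` matters here."""

    name: str
    default: object = None


def make_attr_tuple_class(cls_name: str, attr_names: List[str]) -> type:
    """Build a tuple subclass exposing each item through a named property."""
    body = {name: property(itemgetter(i)) for i, name in enumerate(attr_names)}
    body["__slots__"] = ()  # no per-instance __dict__, like a plain tuple
    return type(f"{cls_name}Attributes", (tuple,), body)


def wrap_transformed_fields(cls_name: str, fields: List[Attribute]) -> tuple:
    """Hypothetical workaround: turn a transformer's plain list of attributes
    back into a tuple whose items can also be reached by name."""
    attrs_class = make_attr_tuple_class(cls_name, [f.name for f in fields])
    return attrs_class(fields)


fields = [Attribute("id"), Attribute("entries")]
wrapped = wrap_transformed_fields("Directory", fields)
print(wrapped.entries.name)  # entries
```

The plain list `fields` has no `.entries` attribute, which is the kind of access that fails when the transformed fields are left as a list.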
Antoine Lambert authored
Previously, the MerkleNode.collect method returned a dict whose keys are node types and whose values are dicts of {<node_hash>: <node_data>}. To give client code more flexibility in processing the collected nodes, it now simply returns a set of MerkleNode objects. As a consequence, MerkleNode objects must be hashable by Python, so the __hash__ method has also been implemented. Closes T4633
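A minimal sketch of why returning a set requires `__hash__`, assuming (for illustration only) a node that subclasses `dict` to hold its children, which makes instances unhashable by default. `SimpleMerkleNode`, its SHA-1 based `hash` property and the `collected` flag are hypothetical and simplified, not the swh.model.merkle implementation.

```python
import hashlib
from typing import Optional, Set


class SimpleMerkleNode(dict):
    """Hypothetical, simplified Merkle node: children are stored by name."""

    def __init__(self, data: bytes = b"") -> None:
        super().__init__()
        self.data = data
        self.collected = False
        self._hash: Optional[bytes] = None

    @property
    def hash(self) -> bytes:
        # Computed lazily and cached; this sketch does not invalidate the
        # cache when children are added afterwards
        if self._hash is None:
            hasher = hashlib.sha1(self.data)
            for name in sorted(self):
                hasher.update(name)
                hasher.update(self[name].hash)
            self._hash = hasher.digest()
        return self._hash

    def __hash__(self) -> int:
        # dict subclasses are unhashable by default (__hash__ is None), so
        # derive it from the node's content hash to allow storing nodes in sets
        return hash(self.hash)

    def collect(self) -> Set["SimpleMerkleNode"]:
        """Return the set of nodes in this subtree that were not collected yet."""
        nodes: Set[SimpleMerkleNode] = set()
        if not self.collected:
            self.collected = True
            nodes.add(self)
        for child in self.values():
            nodes.update(child.collect())
        return nodes


root = SimpleMerkleNode(b"root")
root[b"src"] = SimpleMerkleNode(b"child")
print(len(root.collect()))  # 2
```

Returning a flat set leaves grouping by node type, or any other processing, to the caller.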
- Sep 30, 2022
Jenkins for Software Heritage authored
Jenkins for Software Heritage authored
Update to upstream version '6.5.1' with Debian dir a243e7419a0f035452acb893131c246c9f49946a
Antoine Lambert authored
There are use cases where sha512 checksums need to be computed (content integrity checks, for instance), so add sha512 to the list of hashing algorithms supported by the MultiHash class.
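The idea behind a multi-algorithm hasher is to feed each chunk of data once to several hash objects. A rough sketch using only `hashlib`; `MultiHashSketch` and `SUPPORTED_ALGORITHMS` are hypothetical names and do not reproduce the actual MultiHash API.

```python
import hashlib
from typing import Dict, Iterable

# Hypothetical set of algorithms supported by this sketch, now including sha512
SUPPORTED_ALGORITHMS = frozenset({"sha1", "sha256", "sha512"})


class MultiHashSketch:
    """Compute several checksums in a single pass over the data."""

    def __init__(self, algorithms: Iterable[str] = SUPPORTED_ALGORITHMS) -> None:
        requested = set(algorithms)
        unknown = requested - SUPPORTED_ALGORITHMS
        if unknown:
            raise ValueError(f"unsupported algorithms: {sorted(unknown)}")
        self._hashers = {name: hashlib.new(name) for name in requested}

    def update(self, chunk: bytes) -> None:
        # Each chunk is read once and fed to every hash object
        for hasher in self._hashers.values():
            hasher.update(chunk)

    def hexdigest(self) -> Dict[str, str]:
        return {name: hasher.hexdigest() for name, hasher in self._hashers.items()}


mh = MultiHashSketch()
for chunk in (b"foo", b"bar"):
    mh.update(chunk)
print(sorted(mh.hexdigest()))  # ['sha1', 'sha256', 'sha512']
```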
- Sep 29, 2022
Pierre-Yves David authored
The patterns were validated from $PWD but later applied to paths relative to `root_path`. So we shuffle a bit of code to test them against `root_path` instead, and we make absolute patterns relative in the same go. This code comes from swh-scanner and should probably get an overhaul; however, for now we start by making it not broken.
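The fix can be sketched with the standard library, assuming fnmatch-style exclusion patterns that are matched against paths relative to `root_path`. `normalize_pattern`, `compile_patterns` and `is_excluded` are hypothetical helpers, not the swh-model or swh-scanner API.

```python
import fnmatch
import os
import re
from typing import Iterable, List, Pattern


def normalize_pattern(root_path: str, pattern: str) -> str:
    """Express an exclusion pattern relative to root_path rather than $PWD."""
    if os.path.isabs(pattern):
        # Absolute patterns are made relative to the scanned root
        pattern = os.path.relpath(pattern, root_path)
    if pattern == os.pardir or pattern.startswith(os.pardir + os.sep):
        raise ValueError(f"pattern {pattern!r} lies outside {root_path!r}")
    return pattern


def compile_patterns(root_path: str, patterns: Iterable[str]) -> List[Pattern[str]]:
    """Translate each normalized glob pattern into a compiled regex."""
    return [
        re.compile(fnmatch.translate(normalize_pattern(root_path, p)))
        for p in patterns
    ]


def is_excluded(relative_path: str, regexes: List[Pattern[str]]) -> bool:
    """Match a path relative to root_path against the compiled patterns."""
    return any(regex.match(relative_path) for regex in regexes)


regexes = compile_patterns("/data/repo", ["/data/repo/build/*", "*.pyc"])
print(is_excluded("build/x.o", regexes))  # True
```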
- Sep 26, 2022
Jenkins for Software Heritage authored
Jenkins for Software Heritage authored
Update to upstream version '6.5.0' with Debian dir 58d5f805181a3d5452f6da0fec2d4dd6cec629b6
- Sep 23, 2022
Pierre-Yves David authored
This reduces the number of function calls and should be faster. The mashup of blind optimisations in the previous changesets yields some interesting results in total. It would be insightful to measure them individually, but that would take more time than we currently have. When testing all the validator changes on our previous "benchmark", we see quite an interesting improvement.
Benchmark command: `swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel`
Median time of 3 runs: base: 17 minutes 48 seconds; before: 11 minutes 50 seconds; after: 11 minutes 11 seconds.
On a profile of the same run, the `to_model` call of from_disk's `Directory` class took the following share of the total time: base: 43%; before: 15%; after: 11%.
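The "median time of 3 runs" methodology can be reproduced for any Python callable with the standard library. `median_runtime` is a hypothetical helper; the numbers above come from timing the `swh loader run` command itself, not from this function.

```python
import statistics
import time
from typing import Callable


def median_runtime(func: Callable[[], object], runs: int = 3) -> float:
    """Call `func` `runs` times and return the median wall-clock duration in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one unusually slow run
    return statistics.median(timings)


print(f"{median_runtime(lambda: sum(range(100_000))):.6f} s")
```

Applied to the reported figures, going from 17 minutes 48 seconds (1068 s) to 11 minutes 11 seconds (671 s) is roughly a 37% reduction in wall-clock time, or about 1.6 times faster.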
Pierre-Yves David authored
(This commit is actually doing two things /o\) First, we inline the type checking in the custom validators to reduce the number of function calls. Second, we optimize some of the custom validators by skipping the creation of intermediate tuples.
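The two ideas can be illustrated on a hypothetical validator for a tuple of directory entries, using the `(instance, attribute, value)` signature that attrs validators receive. `Entry`, `_check_type` and both validators are hypothetical, not the actual swh-model code: the "before" version calls a helper for every type check and builds an intermediate tuple of names, while the "after" version inlines `isinstance` and detects duplicates in a single pass.

```python
from typing import Any, NamedTuple


class Entry(NamedTuple):
    """Hypothetical directory entry used to illustrate the validators."""

    name: bytes
    target: bytes


def _check_type(value: Any, expected: type) -> None:
    if not isinstance(value, expected):
        raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")


def entries_validator_before(instance: Any, attribute: Any, value: Any) -> None:
    """Generic style: one helper call per type check, plus an intermediate tuple."""
    _check_type(value, tuple)
    for entry in value:
        _check_type(entry, Entry)
    names = tuple(entry.name for entry in value)
    if len(set(names)) != len(names):
        raise ValueError("duplicate entry names")


def entries_validator_after(instance: Any, attribute: Any, value: Any) -> None:
    """Inlined style: isinstance checks done in place, duplicates found in one pass."""
    if not isinstance(value, tuple):
        raise TypeError(f"expected tuple, got {type(value).__name__}")
    seen = set()
    for entry in value:
        if not isinstance(entry, Entry):
            raise TypeError(f"expected Entry, got {type(entry).__name__}")
        if entry.name in seen:
            raise ValueError(f"duplicate entry name {entry.name!r}")
        seen.add(entry.name)
```

Both versions accept and reject the same inputs; the second simply does less work per call.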
Pierre-Yves David authored
Since all `generic_type_validator` calls are optimized away, that code is no longer called, so we remove it to avoid any drift. A nice exception is raised in case it starts getting called again in the future.
Pierre-Yves David authored
Since try/except blocks are known to add overhead in Python, it seems useful to remove them.
Pierre-Yves David authored
This ensures we don't have any remaining `generic_type_validator` calls that have not been optimized away.
Pierre-Yves David authored
This indirection seems useless and is probably the remains of some long-forgotten rituals.
Pierre-Yves David authored
This should reduce function calls and speed things up. It might be useful to introduce even more specialized validators in the future. It would also be useful to skip the intermediate try/except. Some of this will be done in later changesets.
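One way to derive specialized validators is to inspect the type annotation once, when the validator is built, and return a closure that only performs the needed `isinstance` checks. A sketch under that assumption; `specialized_type_validator` is a hypothetical helper that handles only plain classes, `Optional`/`Union` of plain classes and `Tuple[X, ...]` of a plain class.

```python
from typing import Any, Callable, Optional, Tuple, Union, get_args, get_origin

Validator = Callable[[Any, Any, Any], None]


def specialized_type_validator(expected: Any) -> Validator:
    """Build a validator specialized for `expected` (hypothetical helper).

    The annotation is inspected once here instead of on every call.
    """
    origin = get_origin(expected)
    if origin is None and isinstance(expected, type):
        def validate_plain(instance: Any, attribute: Any, value: Any) -> None:
            if not isinstance(value, expected):
                raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")
        return validate_plain
    if origin is Union:
        choices = get_args(expected)  # e.g. (bytes, NoneType) for Optional[bytes]
        def validate_union(instance: Any, attribute: Any, value: Any) -> None:
            if not isinstance(value, choices):
                raise TypeError(f"expected {expected!r}, got {type(value).__name__}")
        return validate_union
    args = get_args(expected)
    if origin is tuple and len(args) == 2 and args[1] is Ellipsis:
        item_type = args[0]
        def validate_tuple(instance: Any, attribute: Any, value: Any) -> None:
            if not isinstance(value, tuple):
                raise TypeError(f"expected tuple, got {type(value).__name__}")
            for item in value:
                if not isinstance(item, item_type):
                    raise TypeError(f"expected {item_type.__name__} items")
        return validate_tuple
    raise NotImplementedError(f"no specialized validator for {expected!r}")


check_entries = specialized_type_validator(Tuple[int, ...])
check_entries(None, None, (1, 2))  # passes silently
```

Types the sketch does not recognise raise `NotImplementedError` at build time rather than falling back to a slow generic path.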
Pierre-Yves David authored
This currently does nothing, but prepares for actually turning the generic validator into faster specialized variants.
Pierre-Yves David authored
Before this change, we would do the following: 1) translate from_disk's objects into `dict`s, 2) sort these dicts, 3) feed the list to `Directory.from_dict`, 4) create DirectoryEntry objects from these dicts. Skipping the intermediate dict creation and directly creating the DirectoryEntry objects provides a small but stable and noticeable performance win.
We tested this change on a simple run of the Mercurial loader, using a no-op storage for the loader:
`swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel`
Median time of 3 runs: before: 11 minutes 56 seconds; after: 11 minutes 50 seconds.
On a profile of the same run, the `to_model` call of from_disk's `Directory` class took the following share of the total time: before: 17%; after: 15%.
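The difference between the two code paths can be sketched with lightweight stand-ins. `DiskEntry`, `DirectoryEntry`, `PERMS` and both functions are hypothetical and far simpler than the swh-model classes, and entries are sorted by plain byte-wise name for brevity: the old path goes through dicts before building entries, the new one builds the entries directly.

```python
from typing import List, NamedTuple, Tuple


class DiskEntry(NamedTuple):
    """Hypothetical from_disk-side child: name, kind and target id."""

    name: bytes
    kind: str  # "file" or "dir"
    target: bytes


class DirectoryEntry(NamedTuple):
    """Hypothetical model-side entry."""

    name: bytes
    type: str
    target: bytes
    perms: int


# Hypothetical permission mapping for the two entry kinds
PERMS = {"file": 0o100644, "dir": 0o040000}


def entries_via_dicts(children: List[DiskEntry]) -> Tuple[DirectoryEntry, ...]:
    """Old path: build dicts, sort them, then turn each dict into an entry."""
    dicts = [
        {"name": c.name, "type": c.kind, "target": c.target, "perms": PERMS[c.kind]}
        for c in children
    ]
    dicts.sort(key=lambda d: d["name"])
    return tuple(DirectoryEntry(**d) for d in dicts)


def entries_direct(children: List[DiskEntry]) -> Tuple[DirectoryEntry, ...]:
    """New path: create the entries directly, then sort them."""
    entries = [DirectoryEntry(c.name, c.kind, c.target, PERMS[c.kind]) for c in children]
    entries.sort(key=lambda e: e.name)
    return tuple(entries)
```

Both paths return the same entries; the direct one avoids allocating and then unpacking one dict per child.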
Pierre-Yves David authored
Do not create model objects while sorting entries before creating model objects. This is another case of "let us create object X to prepare the creation of object X", which slows things down. In practice, we will likely skip this code path after the next changeset; however, it seems useful to get this performance footgun out of the way.
We tested this change on a simple run of the Mercurial loader, using a no-op storage for the loader:
`swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel`
Median time of 3 runs: before: 12 minutes 59 seconds; after: 11 minutes 56 seconds.
On a profile of the same run, the `to_model` call of from_disk's `Directory` class took the following share of the total time: before: 24%; after: 17%.
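Sorting can work on the raw data alone, through a key function. A sketch assuming a git-compatible entry order, where a directory sorts as if its name ended with `/`; the `name`/`type` keys, the `"dir"` value and both helpers are hypothetical.

```python
from typing import Any, Dict, List

RawEntry = Dict[str, Any]


def entry_sort_key(entry: RawEntry) -> bytes:
    """Git-style ordering: a directory sorts as if its name ended with b"/"."""
    name: bytes = entry["name"]
    return name + b"/" if entry["type"] == "dir" else name


def sort_raw_entries(entries: List[RawEntry]) -> List[RawEntry]:
    """Sort plain data; no model object is created just to compute the order."""
    return sorted(entries, key=entry_sort_key)


raw = [
    {"name": b"a", "type": "dir"},
    {"name": b"a.txt", "type": "file"},
    {"name": b"b", "type": "file"},
]
print([e["name"] for e in sort_raw_entries(raw)])  # [b'a.txt', b'a', b'b']
```

The directory `a` lands after `a.txt` because `b"a/"` compares greater than `b"a."`, which a plain name sort would not reproduce.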
Pierre-Yves David authored
Before this change, a Directory object was built just to compute the `id` that we then fed to the Directory object we built for `to_model`.
We tested this change on a simple run of the Mercurial loader, using a no-op storage for the loader:
`swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel`
Median time of 3 runs: before: 17 minutes 48 seconds; after: 12 minutes 59 seconds.
On a profile of the same run, the `to_model` call of from_disk's `Directory` class took the following share of the total time: before: 43%; after: 24%.
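Computing the `id` does not require a full Directory object. A sketch assuming the git-compatible tree format that Software Heritage directory identifiers follow: the id is the SHA-1 of a `tree <length>\0` header followed by sorted `<octal mode> <name>\0<20-byte target>` records. `Entry`, `_sort_key` and `directory_id` are hypothetical names, not the swh-model functions.

```python
import hashlib
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """Hypothetical lightweight entry; target is a 20-byte sha1_git."""

    name: bytes
    perms: int  # e.g. 0o100644 for a regular file, 0o040000 for a directory
    target: bytes


def _sort_key(entry: Entry) -> bytes:
    # Git orders directories as if their name ended with '/'
    return entry.name + b"/" if entry.perms == 0o040000 else entry.name


def directory_id(entries: Iterable[Entry]) -> bytes:
    """Hash the tree manifest directly from entries, with no intermediate object."""
    manifest = b"".join(
        b"%o %s\x00%s" % (e.perms, e.name, e.target)
        for e in sorted(entries, key=_sort_key)
    )
    header = b"tree %d\x00" % len(manifest)
    return hashlib.sha1(header + manifest).digest()


print(directory_id([]).hex())  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The printed value is git's well-known empty tree id.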
- Aug 31, 2022
Jenkins for Software Heritage authored
Jenkins for Software Heritage authored
Update to upstream version '6.4.1' with Debian dir 057ba180164d4c2d0d5dff8360fd67d0f915cc5b
- Aug 30, 2022
- Aug 12, 2022
Jenkins for Software Heritage authored
Jenkins for Software Heritage authored
Update to upstream version '6.4.0' with Debian dir 57a9924f627e97fd6dad7f13a16705030ad01fca