- Apr 27, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 23, 2021
-
-
Hakim Baaloudj authored
-
- Apr 16, 2021
-
-
Antoine Lambert authored
This fixes the following warning: <generated>:1: WARNING: Inline emphasis start-string without end-string. Related to T2265
-
- Apr 15, 2021
-
-
vlorentz authored
-
- Apr 09, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
Right now we are generating two different binary mappings on disk for the translation between SWHID <-> webgraph node ID:

  1. The node2swh.bin reverse map, which contains a list of binary SWHIDs. It allows O(1) access to the SWHID of a given node n by seek()ing to the position (n * record size).
  2. The swhid2node.bin map, which contains a list of <SWHID, node ID> pairs ordered by SWHID. The node ID of a given SWHID can be found in O(log N) by doing a binary search on this list.

Because the swhid -> node binary search requires multiple seek() calls on the disk, it is very slow and cannot be used for performance-sensitive code.

The alternative route is to compute the node from the minimal perfect hash function (MPH) and then look up the equivalent node in the permuted graph using the .order file containing the permutation, which can be done in O(1) and is extremely fast. However, this has two caveats:

  - The .order file is ~150 GB, which would be too big to load in memory.
  - MPHs cannot check whether their input is valid. They could do so probabilistically if we signed them, but when replying to a query in the graph service, we want to be absolutely certain that a node is or is not present in the graph.

This code mitigates these problems in two ways. First, it memory-maps the permutation file to avoid loading it in main memory. Second, it uses a roundtrip check to detect invalid SWHIDs: we hash + permute the SWHID, then use the reverse map to check that the obtained node ID is associated with the original SWHID.

This is a big performance gain (basic benchmarks show a ~3x speedup). To go even faster, we offer a boolean option to skip the roundtrip check, to use when we know that the input is always valid.

This will also allow us in the future to remove the swhid2node map completely; however, it is currently still in use by the Python frontend to encode the SWHIDs. This will be done directly on the Java side in the future.
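
The hash + permute + roundtrip lookup described in this message can be illustrated with a small Python sketch. This is not the swh-graph implementation (which lives on the Java side): the names `NodeIdMap`, `make_toy_mph` and `get_node_id`, the 8-byte big-endian entry layout of the permutation file, and the in-memory list standing in for node2swh.bin are all assumptions made for illustration. The toy hash mimics the relevant property of a real MPH: an unknown key still lands on some valid slot.

```python
import mmap
import os
import struct
import tempfile
import zlib

ENTRY = struct.Struct(">q")  # assumed layout: one big-endian 64-bit integer per entry


def make_toy_mph(keys):
    """Toy stand-in for an MPH: known keys get 0..n-1, unknown keys get an arbitrary slot."""
    table = {key: i for i, key in enumerate(keys)}
    n = len(keys)

    def mph(key):
        if key in table:
            return table[key]
        return zlib.crc32(key.encode()) % n  # garbage in, plausible slot out

    return mph


class NodeIdMap:
    """Hypothetical SWHID -> node ID map using a memory-mapped permutation file."""

    def __init__(self, mph, order_path, node2swhid):
        self.mph = mph
        self.node2swhid = node2swhid  # stand-in for the O(1) node2swh.bin reverse map
        self._file = open(order_path, "rb")
        # Memory-map the permutation instead of reading it into main memory.
        self._order = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def get_node_id(self, swhid, check=True):
        hashed = self.mph(swhid)
        node_id = ENTRY.unpack_from(self._order, hashed * ENTRY.size)[0]
        # Roundtrip check: the reverse map must give back the SWHID we started from.
        if check and self.node2swhid[node_id] != swhid:
            raise KeyError(swhid)
        return node_id

    def close(self):
        self._order.close()
        self._file.close()


# Tiny made-up dataset: four SWHIDs and an arbitrary permutation.
swhids = [f"swh:1:cnt:{i:040x}" for i in range(4)]
permutation = [2, 0, 3, 1]
node2swhid = [None] * len(swhids)
for hashed, node in enumerate(permutation):
    node2swhid[node] = swhids[hashed]

with tempfile.NamedTemporaryFile(delete=False, suffix=".order") as f:
    for node in permutation:
        f.write(ENTRY.pack(node))
    order_path = f.name

node_map = NodeIdMap(make_toy_mph(swhids), order_path, node2swhid)
unknown = "swh:1:cnt:" + "f" * 40

found = node_map.get_node_id(swhids[0])                  # valid SWHID
unchecked = node_map.get_node_id(unknown, check=False)   # some node ID, silently wrong
try:
    node_map.get_node_id(unknown)                        # roundtrip check rejects it
    rejected = False
except KeyError:
    rejected = True

node_map.close()
os.unlink(order_path)
```

Skipping the check saves one reverse-map access per lookup, at the cost of returning an arbitrary node ID for an input that is not in the graph, which is why it only makes sense when inputs are known to be valid.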
-
- Apr 06, 2021
-
-
Antoine Pietri authored
-
- Apr 02, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- Mar 31, 2021
-
-
Stefano Zacchiroli authored
tuning batch size by hand is no longer needed since 5a987aae
-
- Mar 23, 2021
-
-
Antoine Pietri authored
-
- Mar 15, 2021
-
-
Stefano Zacchiroli authored
Rationale: we do not need to do so at all, because we are only interested in commits that directly contain the content or dir (at some depth), not in any of their successors. Note: this bug did not cause wrong answers in most cases (because successors tend to have higher timestamps), but it incurred significant extra time (4x in early benchmarks) by exploring large commit histories for any file/dir appearing in large projects.
-
- Mar 11, 2021
-
-
Stefano Zacchiroli authored
-
- Feb 26, 2021
-
-
Antoine Pietri authored
-
Stefano Zacchiroli authored
-
- Feb 25, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
vlorentz authored
The SWHID class is deprecated, and no longer supports 'ori' objects.
-
- Feb 24, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- Feb 23, 2021
-
-
Antoine Pietri authored
-
- Feb 12, 2021
-
-
Antoine Pietri authored
-
- Feb 09, 2021
-
-
Antoine Pietri authored
-
- Feb 03, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- Jan 08, 2021
-
-
-
Antoine Pietri authored
Fix T2595
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- Jan 07, 2021
-
-
Antoine Pietri authored
-
- Nov 12, 2020
-
-
Antoine Lambert authored
Since aiohttp 3.7, the internal handling of aiohttp.web.HTTPException has changed: an aiohttp.web_response.Response is now constructed from the exception text and then returned. The aiohttp exceptions constructed in swh-graph were passing the error message through the body keyword parameter instead of the text one, leading to a unicode decoding error, as body is expected to be bytes. Using the text keyword parameter when constructing aiohttp exceptions now ensures that the response body is properly encoded. Closes T2768
-
- Nov 08, 2020
-
-
Stefano Zacchiroli authored
-
- Nov 07, 2020
-
-
Stefano Zacchiroli authored
-
- Oct 06, 2020
-
-
Thibault Allançon authored
  - Replace the old (and unused) .clang-format with Spotless
  - Add a pre-commit hook to enforce style
  - Fix style on the Java codebase
-
- Oct 05, 2020
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-