- Aug 10, 2021
-
-
vlorentz authored
Most of the time is spent maxing out the CPU in the Python process.

This change has two effects:

1. lines are joined before being encoded (instead of encoding them one by one)
2. larger network packets are sent, instead of a single packet per line

I don't know which of the two affects performance, but overall, this is a consistent 25 to 35% speed-up of the overall run time of SimpleTraversalView.
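As a rough illustration of the two effects described in this message, here is a minimal sketch of per-line versus batched sending. It is not the code from this commit: `RecordingWriter`, `send_per_line`, `send_batched` and the `batch_size` parameter are hypothetical names chosen for the example, and a single `write()` call only approximates a single network packet.

```python
class RecordingWriter:
    """Hypothetical stand-in for a socket or stream; records every write call."""

    def __init__(self):
        self.chunks = []

    def write(self, data: bytes) -> None:
        self.chunks.append(data)


def send_per_line(lines, writer):
    """Encode and send each line separately (one write per line)."""
    for line in lines:
        writer.write((line + "\n").encode("ascii"))


def send_batched(lines, writer, batch_size=1000):
    """Join up to batch_size lines, then encode and send them in one write."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= batch_size:
            writer.write(("\n".join(batch) + "\n").encode("ascii"))
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        writer.write(("\n".join(batch) + "\n").encode("ascii"))
```

Both functions produce the same byte stream; the batched variant simply calls `encode()` and `write()` far less often, which is where a CPU-bound Python process would be expected to save time.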
-
vlorentz authored
-
vlorentz authored
-
- Jul 27, 2021
-
-
vlorentz authored
This required a minor Backend refactoring to make it work without context managers.
-
- Jul 09, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- Jun 19, 2021
-
-
Stefano Zacchiroli authored
-
Stefano Zacchiroli authored
This is equivalent to git2graph, but does not rely on the git object storage (and of course does not crawl revisions).
-
- Jun 17, 2021
-
-
Stefano Zacchiroli authored
-
- Jun 15, 2021
-
- Jun 09, 2021
-
-
Antoine R. Dumont authored
-
- Jun 02, 2021
-
-
Antoine Pietri authored
-
- May 12, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- May 10, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
Antoine Pietri authored
-
- May 08, 2021
-
-
Antoine Pietri authored
-
- May 07, 2021
-
-
Antoine Pietri authored
-
- May 05, 2021
-
-
vlorentz authored
It allows other components that depend on swh-graph to run their tests without depending on the WebGraph process itself.
-
- May 04, 2021
-
-
vlorentz authored
For consistency with other RPC clients, and because RemoteException is too generic to be useful (it also covers internal server errors, which can be temporary, unlike GraphArgumentException, which is for invalid arguments).
-
vlorentz authored
-
Hakim Baaloudj authored
-
vlorentz authored
It made too many assumptions on how swh-graph orders edges, and would break on implementation changes. Motivation: I am working on an alternative implementation.
-
- Apr 27, 2021
-
-
vlorentz authored
It's a purely RPC API; it does not make sense to call it REST.
-
Antoine Lambert authored
Makes it possible to check that the package documentation can be built without producing Sphinx warnings.

The sphinx environment is designed to be used in continuous integration, to prevent changes that break the documentation build from being committed.

The sphinx-dev environment is designed to be used inside a full swh development environment.

Related to T3258
-
- Apr 23, 2021
-
-
Hakim Baaloudj authored
-
- Apr 16, 2021
-
-
Antoine Lambert authored
This fixes the following warning: <generated>:1: WARNING: Inline emphasis start-string without end-string. Related to T2265
-
- Apr 15, 2021
-
-
vlorentz authored
-
- Apr 09, 2021
-
-
Antoine Pietri authored
-
Antoine Pietri authored
Right now we are generating two different binary mappings on disk for the translation between SWHID <-> webgraph node ID:

1. The node2swh.bin reverse map, which contains a list of binary SWHIDs. It allows O(1) access to the SWHID of a given node n by seek()ing to the position (n * record size).
2. The swhid2node.bin map, which contains a list of <SWHID, node ID> pairs ordered by SWHID. The node ID of a given SWHID can be found in O(log N) by doing a binary search on this list.

Because the swhid -> node binary search requires multiple seek() calls on the disk, it is very slow and cannot be used for performance-sensitive code.

The alternative route is to compute the node from the minimal perfect hash function (MPH) and then look up the equivalent node in the permuted graph using the .order file containing the permutation, which can be done in O(1) and is extremely fast. However, this has two caveats:

- The .order file is ~150 GB, which would be too big to load in memory.
- MPHs cannot check whether their input is valid. They could do so probabilistically if we signed them, but when replying to a query in the graph service, we want to be absolutely certain that a node is or is not present in the graph.

This code mitigates these problems in two ways. First, it memory-maps the permutation file to avoid loading it in main memory. Second, it uses a roundtrip check to detect invalid SWHIDs: we hash + permute the SWHID, then use the reverse map to check that the obtained node ID is associated with the original SWHID.

This is a big performance gain (basic benchmarks show a ~3x speedup). To go even faster, we offer a boolean option to skip the roundtrip check, to use when we know that the input is always valid.

This will also allow us to remove the swhid2node map completely in the future; however, it is currently still in use by the Python frontend to encode the SWHIDs. This will be done directly on the Java side in the future.
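The roundtrip check described above can be sketched with in-memory stand-ins. This is only a toy model, not the Java implementation from this commit: `ToyNodeIdMap`, its attributes and the `check` parameter are hypothetical names, a dict plus a CRC fallback imitates an MPH (which returns some in-range value even for unknown keys), and plain lists replace the memory-mapped .order file and the node2swh.bin reverse map.

```python
import zlib


class ToyNodeIdMap:
    """Toy SWHID -> node ID lookup: hash, permute, then verify via the reverse map."""

    def __init__(self, swhids):
        # Assumes a non-empty list of distinct SWHID strings.
        self._n = len(swhids)
        # Stand-in for the MPH: each known key gets a unique value in [0, n).
        self._mph_table = {s: i for i, s in enumerate(swhids)}
        # Stand-in for the .order permutation (here: simply reversed).
        self.order = list(range(self._n - 1, -1, -1))
        # Stand-in for the node2swh.bin reverse map: node ID -> SWHID.
        self.node2swhid = [None] * self._n
        for swhid, h in self._mph_table.items():
            self.node2swhid[self.order[h]] = swhid

    def mph(self, swhid):
        """Like a real MPH, return an in-range value even for unknown keys."""
        if swhid in self._mph_table:
            return self._mph_table[swhid]
        return zlib.crc32(swhid.encode("ascii")) % self._n

    def get_node_id(self, swhid, check=True):
        """Hash + permute; optionally verify the result through the reverse map."""
        node_id = self.order[self.mph(swhid)]
        if check and self.node2swhid[node_id] != swhid:
            raise ValueError(f"Unknown SWHID: {swhid}")
        return node_id
```

With `check=False` an unknown SWHID silently maps to some existing node, which is why skipping the check is only safe when the input is known to be valid.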
-