Commits · topology · Platform / Development / swh-graph

Jun 02, 2021
- ClusteringCoefficient: revrel.txt -> relrev.txt · 7057bacf
  Antoine Pietri authored 3 years ago
  
May 12, 2021
- Replace graph constructor by load/loadMapped calls · 4e6c940c
  Antoine Pietri authored 3 years ago
  
- Add ImmutableGraph load methods · 6ae95b68
  Antoine Pietri authored 3 years ago
  
May 10, 2021
- InOutDegree: fix switch fallthrough bug · 6ef89157
  Antoine Pietri authored 4 years ago
  
- topology: InOutDegree: compute per-layer stats · 65c76530
  Antoine Pietri authored 4 years ago
  
- topology: ClusteringCoefficient: work on undirected graph · e960d201
  Antoine Pietri authored 4 years ago
  
- topology: ClusteringCoefficient: remove allowedNodes parameter · b40f861e
  Antoine Pietri authored 4 years ago
  
- topology: ClusteringCoefficient: version without subgraph · 84a1327a
  Antoine Pietri authored 4 years ago
  
- topology: ClusteringCoefficient: version with subgraph · 34ba6645
  Antoine Pietri authored 4 years ago
  
- java: reformat topology files · d6879309
  Antoine Pietri authored 4 years ago
  
- topology: AveragePaths: print the temporary result regularly · 00384d9a
  Antoine Pietri authored 4 years ago
  
- topology: ConnectedComponents: use new lazy Subgraph · ac7bc2c0
  Antoine Pietri authored 4 years ago
  
- topology: add AveragePaths.java · 02414c10
  Antoine Pietri authored 4 years ago
  
- Add lazy subgraph implementation · 31387876
  Antoine Pietri authored 4 years ago
  
May 08, 2021
- NodeIdMap: more String/MPH compatibility · f7e9ddf7
  Antoine Pietri authored 3 years ago
  
May 07, 2021
- LabelMapBuilder: use sort by default · 991a96db
  Antoine Pietri authored 3 years ago
  
May 05, 2021
- Add a simple alternative "client" in pure-python · 3e0da947
  vlorentz authored 3 years ago
  
  It allows other components that depend on swh-graph to run their tests without depending on the WebGraph process itself.
  Tag: v0.4.0
  
May 04, 2021
- client: Raise GraphArgumentException on 4xx response, instead of generic RemoteException. · d0ccf1e5
  vlorentz authored 3 years ago
  
  For consistency with other RPC clients, and because RemoteException is too generic to be useful (it also covers internal errors of the server, which can be temporary errors, unlike GraphArgumentException which is for invalid arguments)
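The distinction described above can be sketched as follows. This is a minimal illustration, not the actual swh-graph client code: the two exception names come from the commit message, but their definitions here, the helper `raise_for_status`, and the choice to make them unrelated classes are assumptions.

```python
class RemoteException(Exception):
    """Generic failure while talking to the server (possibly temporary)."""


class GraphArgumentException(Exception):
    """The request itself was invalid; retrying it unchanged will not help."""


def raise_for_status(status_code: int, body: str = "") -> None:
    """Hypothetical helper: map an HTTP status code to a client exception."""
    if 400 <= status_code < 500:
        # 4xx: the caller sent an invalid argument.
        raise GraphArgumentException(body)
    if status_code >= 500:
        # 5xx: internal server error, which may be temporary.
        raise RemoteException(body)
    # Any other status is treated as success in this sketch.
```

A caller can then catch `GraphArgumentException` to report bad input, and treat `RemoteException` as a candidate for a retry.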
- Traversal: Fix maxEdges being ignored in overloaded constructor. · 62c2fd37
  vlorentz authored 3 years ago
  
- adds a filter by node type as a query argument · 85880ae7
  Hakim Baaloudj authored 3 years ago
  
- Make test_visit_edges_limited less strict · 1a8b00f4
  vlorentz authored 3 years ago
  
  It made too many assumptions on how swh-graph orders edges, and would break on implementation changes. Motivation: I am working on an alternative implementation.
Apr 27, 2021

- s/REST/RPC/ · 6f21897a
  vlorentz authored 3 years ago
  
  It's a purely RPC API, it does not make sense to call it REST.
  Tag: v0.3.1

tox: Add sphinx environments to check sane doc build · f049e264

Antoine Lambert authored 3 years ago

These environments check that the package documentation can be built
without producing Sphinx warnings.

The sphinx environment is designed to be used in continuous integration,
to prevent committed changes from breaking the documentation build.

The sphinx-dev environment is designed to be used inside a full swh
development environment.

Related to T3258

Apr 23, 2021
- Add an anti DoS limit for edges traversed as a query parameter. · e8f052a3
  Hakim Baaloudj authored 3 years ago
  
Apr 16, 2021

cli: Fix sphinx warning · 04a1656c

Antoine Lambert authored 3 years ago

This fixes the following warning:

<generated>:1: WARNING: Inline emphasis start-string without end-string.

Related to T2265

Apr 15, 2021
- Fix various Sphinx warnings · e86a39d9
  vlorentz authored 3 years ago
  
Apr 09, 2021

NodeIdMap: add backward compatibility for loading MPH on strings · 4f751998
Antoine Pietri authored 4 years ago

NodeIdMap: use the MPH + mmapped .order to translate SWHID -> node ID · 53bbd5c6

Antoine Pietri authored 4 years ago

Right now we are generating two different binary mappings on disk for
the translation between SWHID <-> webgraph node ID:

1. The node2swh.bin reverse map, which contains a list of binary SWHIDs.
   It allows O(1) access to the SWHID of a given node n by seek()ing to
   the position (n * record size).

2. The swhid2node.bin map, which contains a list of <SWHID, node ID>
   ordered by SWHID. The node ID of a given SWHID can be found in
   O(log N) by doing a binary search on this list.
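
The two on-disk layouts can be sketched with in-memory files. This is an illustration only: the 4-byte identifiers, the 8-byte big-endian node IDs, and all function names are assumptions made for the example, not the real swh-graph formats.

```python
import io
import struct

ID_SIZE = 4    # toy fixed-width identifier, standing in for a binary SWHID
NODE_SIZE = 8  # node ID stored as a big-endian 64-bit integer (assumption)


def build_node2id(ids_by_node):
    """Reverse map: record n holds the identifier of node n."""
    return io.BytesIO(b"".join(ids_by_node))


def build_id2node(ids_by_node):
    """Forward map: <identifier, node ID> records sorted by identifier."""
    pairs = sorted((ident, node) for node, ident in enumerate(ids_by_node))
    return io.BytesIO(
        b"".join(ident + struct.pack(">q", node) for ident, node in pairs)
    )


def node_to_id(f, node):
    """O(1): a single seek to (node * record size)."""
    f.seek(node * ID_SIZE)
    return f.read(ID_SIZE)


def id_to_node(f, ident, n_records):
    """O(log N): binary search, one seek per probe."""
    record = ID_SIZE + NODE_SIZE
    lo, hi = 0, n_records - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * record)
        key = f.read(ID_SIZE)
        if key == ident:
            return struct.unpack(">q", f.read(NODE_SIZE))[0]
        if key < ident:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # identifier not present
```

Each probe of the binary search costs a seek, which is why the forward lookup is the slow direction when the file lives on disk.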

Because the swhid -> node binary search requires multiple seek() calls on
disk, it is very slow and cannot be used for performance-sensitive code.

The alternative route is to compute the node from the minimal perfect
hash function and then look up the equivalent node in the permuted graph
using the .order file containing the permutation, which can be done in
O(1) and is extremely fast. However, this has two caveats:

- The .order file is ~150 GB, which would be too big to load in memory.

- MPHs cannot check whether their input is valid. They could do so
  probabilistically if we signed them, but when replying to a query in
  the graph service, we want to be absolutely certain that a node is or
  is not present in the graph.

This code mitigates these problems in two ways. First, it memory-maps
the permutation file to avoid loading it in main memory. Second, it uses
a roundtrip check to detect invalid SWHIDs: we hash + permute the
SWHID, then use the reverse map to check that the obtained node ID is
associated with the original SWHID.

This is a big performance gain (basic benchmarks show a ~3x speedup).
To go even faster, we offer a boolean option to skip the roundtrip
check, to use when we know that the input is always valid.

This will also allow us to remove the swhid2node map completely in the
future. However, it is currently still in use by the Python frontend
to encode the SWHIDs. This encoding will be done directly on the Java
side in the future.

Apr 06, 2021
- java: fix formatting · 15c2da0f
  Antoine Pietri authored 4 years ago
  
Apr 02, 2021
- Recompress test graph with byte array MPH · f055c4ea
  Antoine Pietri authored 4 years ago
  
- Compress graph with byte arrays instead of strings · 7eef7cb3
  Antoine Pietri authored 4 years ago
  
Mar 31, 2021
- docs: drop mention of conffile in quickstart · 8d30918c
  Stefano Zacchiroli authored 4 years ago
  
  tuning batch size by hand is no longer needed since 5a987aae
Mar 23, 2021
- Merge branch 'label_permissions' · 469d7561
  Antoine Pietri authored 4 years ago
  
Mar 15, 2021

FindEarliestRevision: bug fix: do not follow rev:rev edges · 58b46f78

Stefano Zacchiroli authored 4 years ago

Rationale: we do not need to do so at all, because we are only interested in
commits that directly contain the content or dir (at some depth), not in any of
their successors.

Note: this bug was not responsible for wrong answers in most cases (because
successors will tend to have higher timestamps), but incurred significant extra
time (4x in early benchmarks) due to exploring long commit histories for
any file/dir appearing in large projects.
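
The effect of the fix can be sketched on a toy graph. This is an illustrative breadth-first traversal under assumed data structures (a `backward_edges` dict, `node_type` and `timestamp` lookups), not the actual FindEarliestRevision code, whose traversal strategy is not described here.

```python
from collections import deque


def earliest_revision(backward_edges, node_type, timestamp, start):
    """Return the oldest revision containing `start`, or None.

    backward_edges maps a node to the nodes that reference it: a content
    to its directories, a directory to its parent directories and to the
    revisions rooted at it, a revision to its successor revisions.
    """
    seen = {start}
    queue = deque([start])
    best = None
    while queue:
        node = queue.popleft()
        if node_type[node] == "rev":
            if best is None or timestamp[node] < timestamp[best]:
                best = node
            # The fix: stop here instead of following rev:rev edges.
            continue
        for succ in backward_edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return best
```

Because the traversal stops at each revision, successor revisions are never enqueued, so the commit history of a large project is not explored.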

Mar 11, 2021
- FindEarliestRevision: make it work as a *nix filter and add accounting · e0ef3b9b
  Stefano Zacchiroli authored 4 years ago
  
Feb 26, 2021
- LabelMapBuilder: add TextualEdgeLabelLineIterator, fix BSort · 968f9c6c
  Antoine Pietri authored 4 years ago
  
- docs: link to official SANER 2020 paper in the proceedings · 21de1e13
  Stefano Zacchiroli authored 4 years ago
  
Feb 25, 2021
- LabelMapBuilder: support both sorting methods · 0aa06168
  Antoine Pietri authored 4 years ago
  
- LabelMapBuilder: refactor logic in separate line iterators · 4e2fedc3
  Antoine Pietri authored 4 years ago
  
- Use MPH functions operating on byte arrays · 19f7da78
  Antoine Pietri authored 4 years ago
  