Commits · fad1e71325df480ef66f083d8d05c3f7668d61a8 · Antoine Lambert / swh-graph

Nov 12, 2019
- app.py: fix wrong invocation of node_of_pid in /walk · fad1e713
  Stefano Zacchiroli authored 5 years ago
  
- REST server: preliminary support for HEAD requests · f9bdeef3
  Stefano Zacchiroli authored 5 years ago
  
- REST server: validate all query parameters and refactor validators · cc712446
  Stefano Zacchiroli authored 5 years ago
  
- REST server: add validation for PID parameters · 0ab76fb7
  Stefano Zacchiroli authored 5 years ago
  
- REST API doc: document /count method variants · ed9e0c00
  Stefano Zacchiroli authored 5 years ago
  
- REST API doc: update to match current aiohttp implementation · 779fc85f
  Stefano Zacchiroli authored 5 years ago
  
- app.py: inline MIME types, they are single use anyway · 72d62a1a
  Stefano Zacchiroli authored 5 years ago
  
- REST server: set content-type to text or ndjson where appropriate · e602cdd3
  Stefano Zacchiroli authored 5 years ago
  
Nov 11, 2019
- CLI: add "swh graph map lookup" to lookup values in binary maps · 2ad5d55e
  Stefano Zacchiroli authored 5 years ago
  
- backend.py: log which JAR is being used and warn if multiple ones exist · 0b0e84ad
  Stefano Zacchiroli authored 5 years ago
  
Nov 09, 2019
- doc: stop building javadoc for now, as we do not ship it anyway · 6290eda4
  Stefano Zacchiroli authored 5 years ago
  
  ... and it is brittle enough to make the overall doc build fail fairly often. We will reconsider building it when we can actually ship it (see T1971)
- doc: integrate git2graph doc into top-level doc and toc · ddf77c8d
  Stefano Zacchiroli authored 5 years ago
  
- git2graph doc: explain why only HEAD is supported among symbolic refs · 222bfbd6
  Stefano Zacchiroli authored 5 years ago
  
- git2graph: update doc and benchmark to use zstd · 5ddbc352
  Stefano Zacchiroli authored 5 years ago
  
- javadoc: fix docstring syntax error in NodeTypesMap · fcf6f1bf
  Stefano Zacchiroli authored 5 years ago
  
  fix docs.s.o build failure
- Makefile: add convenience target "java-doc" · 87af24e3
  Stefano Zacchiroli authored 5 years ago
  
- doc: update docker-related documentation and scripts · 353b91bf
  Stefano Zacchiroli authored 5 years ago
  
Nov 08, 2019

- webgraph.py: use named pipes to read zst decompression output · 48f1802d
  
  This prevents the MPH step from loading the full (decompressed!) nodes file in memory. To achieve this, force usage of /bin/bash as the shell to run the various steps.

Nov 07, 2019
- Setup.java: remove unused import · 5027a9e4
  Stefano Zacchiroli authored 5 years ago
  
- switch compression pipeline from gzip to zstd · 18775d9b
  Stefano Zacchiroli authored 5 years ago
  
- webgraph.py: make sure {logback} is always interpolated · 3d4ce98d
  Stefano Zacchiroli authored 5 years ago
  
  before this change it wasn't interpolated in case java_tool_options was given in configuration (and contained '{logback}')
- map generation: further logging tuning · 6d9e6725
  Stefano Zacchiroli authored 5 years ago
  
  - add logging for begin/end of loading steps (.mph, .order)
  - add logging of local speed for pid->node, because average speed might be skewed by temporary sort hangs
Nov 06, 2019
- map generation: distinguish log lines by map type · 20f50405
  Stefano Zacchiroli authored 5 years ago
  
- map generation: reduce sort buffer size (by *cough* 1024x *cough*) · 44c6c653
  Stefano Zacchiroli authored 5 years ago
  
- source code layout: move java/server/ to java/ · 355b573b
  Stefano Zacchiroli authored 5 years ago
  
  the extra indirection is no longer needed
- naming: rename map generation class from Setup to MapBuilder · 47e236ad
  Stefano Zacchiroli authored 5 years ago
  
- map generation: add % completion and ETA · 94603d2e
  Stefano Zacchiroli authored 5 years ago
  
- java coding style: remove all tabs in favour of spaces · 17d2cc9f
  Stefano Zacchiroli authored 5 years ago
  
  purely cosmetic, no functional change
- maps generation: implement proper progress logging and use real loggers · 1451788c
  Stefano Zacchiroli authored 5 years ago
  
- map generation: reduce sort memory usage from 66% to 40% of max_ram · 94f07b96
  Stefano Zacchiroli authored 5 years ago
  
Nov 05, 2019

- Setup.java: shell out node2pid map generation to sort · 6d2f04b4
  Stefano Zacchiroli authored 5 years ago
  
  This caps the maximum amount of RAM that will be used for this step, avoiding OOM kills. By relying on GNU sort we use an industry-grade tool for this kind of stuff, paging to disk as needed. Closes T1950
- Makefile: add generic java-* dispatcher target · f246fa85
  Stefano Zacchiroli authored 5 years ago
- webgraph.py: use shell=True in compression step execution · d6d5ef95
  Stefano Zacchiroli authored 5 years ago
  
  this allows more flexibility in how steps are implemented, which will come in handy when we change the compression format for nodes/edges files
- rename int2pid/pid2int to node2pid/pid2node on the Python side · ac154593
  Stefano Zacchiroli authored 5 years ago
  
  For naming uniformity with the Java side, which uses "node" for integer node IDs everywhere. Before this change it was really confusing to have commands like "swh map dump -t int2pid" to generate files like "foo.node2pid.bin".
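
As a rough illustration of the idea behind 6d2f04b4 (handing a large sort to the system `sort` so that memory use stays bounded and excess data spills to disk), here is a minimal Python sketch. It is not the code from Setup.java: the function name `external_sort`, the file names, and the 64M buffer size are assumptions made for this example, and it relies on the installed `sort` accepting `-S` (buffer size), as GNU sort does.

```python
import os
import subprocess
import tempfile


def external_sort(src_path, dst_path, buffer_size="64M"):
    """Sort the lines of src_path into dst_path using the system sort.

    The -S option bounds the in-memory buffer; sort spills to temporary
    files on disk when the input does not fit.
    """
    cmd = ["sort", "-S", buffer_size, "-o", dst_path, src_path]
    # LC_ALL=C gives a plain byte-wise ordering, independent of the locale.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(cmd, check=True, env=env)


def demo():
    """Sort a tiny, made-up 'node value' file and return the sorted lines."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "unsorted.txt")  # hypothetical file name
        dst = os.path.join(workdir, "sorted.txt")    # hypothetical file name
        with open(src, "w") as f:
            f.write("3 c\n1 a\n2 b\n")
        external_sort(src, dst)
        with open(dst) as f:
            return f.read().splitlines()
```

The point of the design is that the memory ceiling is set by the `-S` argument rather than by the size of the input, which is what avoids the OOM kills mentioned in the commit message.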

Nov 04, 2019
- pid2int2int2pid: new tool to generate int->PID map from PID->int one · 6f8266a5
  Stefano Zacchiroli authored 5 years ago
  
- CLI: add new "dumb" sequential map writer "swh graph map write" · e970329f
  Stefano Zacchiroli authored 5 years ago
  
  this subsumes the previous tools/migrations/ used to migrate from CSV to binary maps
- test data: remove obsolete textual maps · 8b5cd013
  Stefano Zacchiroli authored 5 years ago
  
- binary maps: change type IDs on Python side, to be compatible with Java · 55ba50a2
  Stefano Zacchiroli authored 5 years ago
  
  Before this change we had an off-by-1: Java type integer IDs were 0-based, Python ones 1-based. With this change they match and are both 0-based.
  
  WARNING: with this change we break backward compatibility for the Python client when reading binary maps that were generated (via a Python hack) before this change. They will need to be regenerated either using the now available Java-based generation of binary maps or by rerunning the Python hack with the new code.
- cli.py: document configuration parameter and reorder args · db4a9264
  Stefano Zacchiroli authored 5 years ago
  
- cli.py: update docstring doc about available compression steps · 78868a54
  Stefano Zacchiroli authored 5 years ago
  
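
The off-by-one described in 55ba50a2 can be illustrated with a small Python sketch. The type names and their order below are assumptions for illustration only, not taken from the swh-graph source, and all function names are hypothetical. The sketch only shows why a map written with 1-based type IDs cannot be read correctly by code that expects 0-based ones.

```python
# Hypothetical node type names and order, assumed for this example only.
NODE_TYPES = ["cnt", "dir", "ori", "rel", "rev", "snp"]


def encode_zero_based(node_type):
    """Type ID as a 0-based index (the convention both sides share after the change)."""
    return NODE_TYPES.index(node_type)


def encode_one_based(node_type):
    """Type ID as a 1-based index (the Python-side convention before the change)."""
    return NODE_TYPES.index(node_type) + 1


def decode_zero_based(type_id):
    """Decode a type ID assuming the 0-based convention."""
    return NODE_TYPES[type_id]


def mismatches():
    """Map each type to what it decodes to when written 1-based and read 0-based."""
    wrong = {}
    for node_type in NODE_TYPES:
        type_id = encode_one_based(node_type)
        try:
            decoded = decode_zero_based(type_id)
        except IndexError:
            decoded = None  # the ID falls outside the 0-based range
        if decoded != node_type:
            wrong[node_type] = decoded
    return wrong
```

Under these assumptions every type is misread (each one shifts to its successor, and the last one falls out of range), which is why maps generated before the change need to be regenerated.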