v0.2.1 · 0a39c45f
* fat JAR: trim size down from 100 to 40 MB, by excluding mg4j
* webgraph.py: improve logging by adding explicit start/end timings
* app.py: fix wrong invocation of node_of_pid in /walk
* REST server: preliminary support for HEAD requests
* REST server: validate all query parameters and refactor validators
* REST server: add validation for PID parameters
* REST API doc: document /count method variants
* REST API doc: update to match current aiohttp implementation
* app.py: inline MIME types, they are single use anyway
* REST server: set content-type to text or ndjson where appropriate
* CLI: add "swh graph map lookup" to lookup values in binary maps
* backend.py: log which JAR is being used and warn if multiple ones exist
* doc: stop building javadoc for now, as we do not ship it anyway
* doc: integrate git2graph doc into top-level doc and toc
* git2graph doc: explain why only HEAD is supported among symbolic refs
* git2graph: update doc and benchmark to use zstd
* javadoc: fix docstring syntax error in NodeTypesMap
* Makefile: add convenience target "java-doc"
* doc: update docker-related documentation and scripts
v0.2.0 · 48f1802d
* webgraph.py: use named pipes to read zst decompression output
* Setup.java: remove unused import
* switch compression pipeline from gzip to zstd
* webgraph.py: make sure {logback} is always interpolated
* map generation: further logging tuning
* map generation: distinguish log lines by map type
* map generation: reduce sort buffer size (by *cough* 1024x *cough*)
* source code layout: move java/server/ to java/
* naming: rename map generation class from Setup to MapBuilder
* map generation: add % completion and ETA
* java coding style: remove all tabs in favour of spaces
* maps generation: implement proper progress logging and use real loggers
* map generation: reduce sort memory usage from 66% to 40% of max_ram
* Setup.java: shell out node2pid map generation to sort
* Makefile: add generic java-* dispatcher target
* webgraph.py: use shell=True in compression step execution
* rename int2pid/pid2int to node2pid/pid2node on the Python side
* pid2int2int2pid: new tool to generate int->PID map from PID->int one
* CLI: add new "dumb" sequential map writer "swh graph map write"
* test data: remove obsolete textual maps
* binary maps: change type IDs on Python side, to be compatible with Java
* cli.py: document configuration parameter and reorder args
* cli.py: update docstring doc about available compression steps
* switch Java map generation from CSV to binary format
* test_cli.py: reduce memory requirements
* cli.py: add support for --config-file and compression configuration
* tox: more robust detection of swh-graph JAR
* tests: Move dataset folder to fix test_api_client hang
* remove cruft dir java/**/t/, committed by mistake
* temporarily disable test_api_client.py due to T2055
* tox.ini: fix pytest ImportMismatchError
* makefile: add convenience clean-java target
* mypy.ini: ignore psutil, due to missing stubs
* test_cli.py: new test for CLI end-to-end compression
* webgraph.py: automatically generate mappings at the end of compression
* java mapping Setup: make Usage message more standard-looking
* CLI: add one-stop shop compression "swh graph compress ...."
* java deps: bump fastutil dep to 8.3.0
* swh graph CLI: fix bogus "swh graph graph" nesting
* server/app.py: make mypy pass again
* server/app.py: fix flake8 spacing
* java build: include LAW via Maven
* java: benchmark: add ForkCC class
* tests/pid: test iter_prefix
* backend: fix default stack and heap size
* java: Adapt BFS benchmark for a simple visit
* java: Move GenDistribution in benchmark/
* Add single-thread BFS benchmark
* ParallelBFS benchmark: add support to optionally use transposed graph
* benchmark.ParallelBFS: new class to time parallel BFS on the full graph
* Graph: add getBVGraph getter for underlying BVgraph(s)
* java: add GenDistribution.java directly in the tree
* java: Traversal: add a NbNodesAccessed method
* java: public nodeTypesMap for iteration
* Graph: comment out useless loading of nodeIdMap
* java: Entry: add lightweight copies in count methods
* app: add 3.6 compatibility
* backend: add hardcoded java_opts
* java: traversal: use HashSet to reduce initialization time
* Enable iteration on the graph
* git2graph: fix snapshot ID computation, now compatible with swh identify
* git2graph: remove left over, comment out debug prints
v0.1.0 · 9657658b
* git2graph: add support for origins & snapshots
* git2graph: sanitize struct names (refactoring)
* Add count() methods
* graph: cosmetic fixes and comments
* backend: propagate java errors to the toplevel
* graph.py: add 'deep' methods
* Add low level API (WIP)
* conftest: remove unused variable
* java/graph: docs: fix trailing commas
* java/graph: fix javadoc errors
* git2graph: add support for (explicitly passed) origin nodes
* git2graph: add back node output support, with simpler/saner semantics
* git2graph: update benchmark figures in README
* git2graph: drop node filtering and output, it has no sane semantics
* git2graph: implement user-customizable graph filtering
* git2graph: add real CLI parsing and make both nodes/edges files optional
* git2graph: add test suite
* git2graph: add (static) support for filtering desired nodes & edges
* git2graph Makefile: factor out definition of wanted libs
* pom.xml: reindent to 2-space per TAB (Maven convention)
* java toolchain: sanitize (fat) jar naming to swh-graph-X.Y.Z.jar
* Dockerfile: tightening, joining RUN runs together and rm temp stuff
v0.0.3 · 3619d6d3
* mypy: ignore py4j (does not have stubs)
* Merge branch 'aiohttp_server'
* api server: add links to API doc to index
* api client: handle stream decoding in swh.graph
* cli: fix rpc-serve default port
* git2graph: switch to a macro for libgit->swh type conversion
* git2graph Makefile: be more strict (-Werror) and pass cflags to ld too
* git2graph: gitignore gprof report file
* server: move serve command to cli.py
* server: refactor handler proxying to backend
* server: refactor some constants
* server: proper JAR deployment method
* git2graph: make sure it can be used concurrently and document how
* add migration tools from old CSV maps to new binary ones
* git2graph: new tool to crawl a git repo and dump it as a graph
* tox: anticipate mypy run to just after flake8
* java: remove useless StreamTraversal.java
* Update tests to work with the new Python server
* API client: update to use the new streaming lines format
* Server API: reorganize app/main
* Server API: handle visits of paths
* Move example dataset in common tests/dataset directory, add binary maps
* Reimplement REST API in Python with Py4J + aiohttp
* Create the aiohttp server
* wip: streaming interface python <-> java
* __init__.py: switch to documented way of extending path
* Dockerfile: use ARG for webgraph/law version numbers
* Dockerfile: bump law version to currently available upstream (2.6.0)
* Dockerfile: fix path to logging configuration file
* MANIFEST.in: ship py.typed
* swh.graph.pid: add type annotations
* typing: minimal changes to make a no-op mypy run pass
* tox.ini: remove undeclared check-manifest environment
* pid.py: use a dict for more idiomatic file mode check
* test_pid.py: fix alphabetic ordering of node types
* cli.py: avoid importing unused PID_BIN_SIZE constant
* pid.py: avoid importing unused mmap constants
* CLI: make restore of int->pid maps use mmap writing instead of seek
* pid maps: add limited support for updatable maps
* Graph.java: implement a flyweight copy() method
* requirements.txt: add missing dep on aiohttp
* int->pid map restore: support arbitrarily ordered inputs
* binary (de)serializer for more compact PID<->int maps
* integrate cli with swh.core.cli
* docs: fix toc
* fix indentation in test code too
* fix @author in Java files to use team name
* cosmetic: reindent Java code to match coding style
* reports: benchmarks: add unit in tables
* reports: benchmarks: add machine specs
* reports: add benchmarks