Skip to content

Add a script to generate a topological sort

This will (probably) be used to compute the set of all contributors to an origin (swh-dataset#4695 (moved)) by propagating them through revision history.

This uses the order induced by a DFS, so that most revisions will be processed right after their unique ancestor (or at least one of them) to maximize cache hits when propogating the author set.

Additionally, this includes a count of the number of successors of each revision, so it can be recorded and decremented on every cache hit, allowing to discard a revision's author set from the cache once all its successors are processed.

This is implemented as a single-thread algorithm because:

  1. it's simpler
  2. my attempts multi-threading resulted in worse performance, probably because most of the time is spent locking the head of the stack (or the head and/or tail of a queue, as I tried a BFS too)

Test Plan

Works on teaser dataset and the full graph (takes ~35 hours)


Migrated from D8883 (view on Phabricator)

Merge request reports