Make the Mercurial loader incremental (!69) · Merge requests · Platform / Development / swh-loader-mercurial

Raphaël Gomès requested to merge generated-differential-D5687-source into generated-differential-D5687-target May 05, 2021

Before this change, if a previous snapshot of a given origin existed and new changes were detected, we would start from scratch.

This change leverages the recently added database mapping from external ids (such as Mercurial's node ids) to internal SWH ids. Since an SWH id can now be found from a Mercurial node id, the loader can compute what has changed since the latest snapshot.

For revisions, the logic is simple: look at the heads we've saved and ask Mercurial for all the revisions that are neither these heads nor ancestors of them. This is not as "clever" as the full Mercurial discovery algorithm, but it is much simpler and good enough for the scale we're operating at on a single repository.
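As a rough illustration of the revision selection described above, here is a minimal sketch over a toy DAG. It is not the code from this change: the `parents` mapping and both function names are hypothetical, and the real loader asks Mercurial itself rather than walking the graph by hand (in revset terms the idea resembles `not ::(heads)`, which is an assumption about the query, not a quote from the patch).

```python
from typing import Dict, Iterable, List, Set, Tuple


def ancestors_inclusive(
    parents: Dict[str, Tuple[str, ...]], heads: Iterable[str]
) -> Set[str]:
    """Return the given heads plus all of their ancestors."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Unknown nodes (e.g. stripped from the repository) have no parents here.
        stack.extend(parents.get(node, ()))
    return seen


def revisions_to_load(
    parents: Dict[str, Tuple[str, ...]], saved_heads: Iterable[str]
) -> List[str]:
    """Return revisions that are neither saved heads nor ancestors of them."""
    known = ancestors_inclusive(parents, saved_heads)
    # Keep the iteration order of the (hypothetical) `parents` mapping.
    return [node for node in parents if node not in known]


# Toy history: a <- b <- c, and a second branch b <- d <- e.
example_parents = {
    "a": (),
    "b": ("a",),
    "c": ("b",),
    "d": ("b",),
    "e": ("d",),
}
# The previous snapshot only knew about head "c".
example_new_revisions = revisions_to_load(example_parents, ["c"])
```

With the toy history above, only the revisions on the second branch are selected; everything reachable from the saved head is skipped.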

For tags, the previous logic assumed that all possible target revisions were processed in the same run. Here, we look at the difference between the tags Mercurial reports and the ones from the previous snapshot; any new tag will either have its corresponding release in cache (because it was processed in the same run) or have it fetched from the database using the aforementioned mapping.
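The tag handling described above can be sketched as follows. This is only an illustration under stated assumptions: `resolve_tag_targets`, `run_cache` and `lookup_in_db` are hypothetical names standing in for the loader's in-run cache and the database mapping, and treating a tag that moved to a different node as "new" is an assumption of this sketch.

```python
from typing import Callable, Dict, Optional


def resolve_tag_targets(
    hg_tags: Dict[str, str],
    previous_tags: Dict[str, str],
    run_cache: Dict[str, str],
    lookup_in_db: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each new or changed tag name to the SWH id of its target.

    hg_tags / previous_tags: tag name -> Mercurial node id.
    run_cache: node id -> SWH id for objects processed in this run.
    lookup_in_db: hypothetical stand-in for the external-id mapping.
    """
    targets: Dict[str, str] = {}
    for name, node in hg_tags.items():
        if previous_tags.get(name) == node:
            continue  # already recorded in the previous snapshot
        # Prefer the in-run cache, then fall back to the database mapping.
        swh_id = run_cache.get(node)
        if swh_id is None:
            swh_id = lookup_in_db(node)
        if swh_id is None:
            raise LookupError(f"no SWH id known for tag {name!r} ({node})")
        targets[name] = swh_id
    return targets


# Toy data: "v1" is unchanged, "v2" was loaded in an earlier run,
# "v3" was processed in the current run.
example_db = {"n2": "swh-2"}
example_targets = resolve_tag_targets(
    hg_tags={"v1": "n1", "v2": "n2", "v3": "n3"},
    previous_tags={"v1": "n1"},
    run_cache={"n3": "swh-3"},
    lookup_in_db=example_db.get,
)
```

Raising when neither source knows the node is a design choice of this sketch; the actual loader may handle that case differently.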

Test Plan

Note: the test_identify tests seem to fail when run together with the rest of the test suite, and I'm not sure why. I'm running them on the CI to see whether they behave the same way there.

Migrated from D5687 (view on Phabricator)
