Make the Mercurial loader incremental
Before this change, if a previous snapshot of a given origin existed and new changes were detected, we would start from scratch.
This change uses the recently added database mapping from external ids (such as Mercurial node ids) to internal SWH ids to compute what has changed since the latest snapshot: given a Mercurial node id, it is now possible to find the corresponding SWH id.
For revisions, the logic is simple: look at the heads we've saved and ask Mercurial for all the revisions that are not ancestors of these heads (themselves excluded). This is not as "clever" as the full Mercurial discovery algorithm, but is much simpler and good enough for the kinds of scales we're operating at on a single repository.
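To illustrate the revision selection described above, here is a minimal sketch on a toy history graph. It is not the loader's code: the `parents` dictionary, `ancestors_inclusive` and `new_revisions` are hypothetical names used only for this example. In Mercurial itself, the same set can be expressed with a revset along the lines of `not ::(<saved heads>)`, where `::x` means "x and its ancestors".

```python
from typing import Dict, Iterable, List, Set


def ancestors_inclusive(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """Return the given heads plus all of their ancestors."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Unknown nodes simply have no recorded parents in this toy model.
        stack.extend(parents.get(node, []))
    return seen


def new_revisions(parents: Dict[str, List[str]], saved_heads: Iterable[str]) -> List[str]:
    """Return the revisions that are neither a saved head nor one of its ancestors."""
    known = ancestors_inclusive(parents, saved_heads)
    return [node for node in parents if node not in known]


# Toy history (hypothetical): a <- b <- c, then d on top of c and e branching off b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["b"]}

# The previous snapshot only knew about head "c".
print(new_revisions(parents, ["c"]))  # ['d', 'e']
```

With no saved heads, every revision is returned, which matches the old "from scratch" behaviour.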
For tags, the previous logic assumed that all possible target revisions were processed in the same run. Here, we look at the difference between the tags Mercurial reports and the ones from the previous snapshot; any new tag will have its corresponding release either in the cache (because it was processed in the same run) or fetched from the database using the aforementioned mapping.
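The tag handling can be sketched in the same spirit. Everything below is an assumption made for illustration: the function name `resolve_new_tags`, the shape of the tag dictionaries (tag name to Mercurial node id), the `run_cache` dictionary and the `lookup_in_db` callable standing in for the external id mapping. Treating a tag that moved to another node as "new" is also a choice of this sketch, not something the description states.

```python
from typing import Callable, Dict, Optional


def resolve_new_tags(
    hg_tags: Dict[str, str],
    previous_tags: Dict[str, str],
    run_cache: Dict[str, str],
    lookup_in_db: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Map each new (or moved) tag name to the SWH id of its target."""
    resolved: Dict[str, str] = {}
    for name, node in hg_tags.items():
        if previous_tags.get(name) == node:
            continue  # already present in the previous snapshot
        # First try what was processed during this run, then the database.
        swh_id = run_cache.get(node)
        if swh_id is None:
            swh_id = lookup_in_db(node)
        if swh_id is None:
            raise LookupError(f"no SWH id known for node {node} (tag {name})")
        resolved[name] = swh_id
    return resolved


# Hypothetical data: "v1" is unchanged, "v2" targets a node loaded in this run,
# "v3" targets a node loaded by an earlier run and only known to the database.
hg_tags = {"v1": "node1", "v2": "node2", "v3": "node3"}
previous_tags = {"v1": "node1"}
run_cache = {"node2": "swh-id-2"}
database = {"node1": "swh-id-1", "node3": "swh-id-3"}

new_tags = resolve_new_tags(hg_tags, previous_tags, run_cache, database.get)
print(new_tags)  # {'v2': 'swh-id-2', 'v3': 'swh-id-3'}
```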
Test Plan
Note that the test_identify tests seem to fail when run with the rest of the tests, and I'm not sure why. I'm trying them on the CI to see if they behave the same there.
Migrated from D5687 (view on Phabricator)