- May 21, 2021
-
-
Raphaël Gomès authored
-
Raphaël Gomès authored
... instead of doing it in bulk at the end.
-
- May 20, 2021
-
-
Nicolas Dandrimont authored
This code hasn't used this module for years (last import removed in 2786cd48).
-
- May 19, 2021
-
-
Raphaël Gomès authored
This is the minimal amount of code needed to switch from the old one to the new one. If the new loader proves to be good enough, we may remove the old one entirely.
-
- May 07, 2021
-
-
Raphaël Gomès authored
Before this change, if a previous snapshot of a given origin existed and new changes were detected, we would start from scratch. This change leverages the recent new db mapping for external ids (like Mercurial's node ids) to internal SWH ids to compute what has changed from the latest snapshot, now that it is possible to find an SWH id from a Mercurial node id. For revisions, the logic is simple: look at the heads we've saved and ask Mercurial for all the revisions that are not ancestors of these heads (themselves excluded). This is not as "clever" as the full Mercurial discovery algorithm, but is much simpler and good enough for the kinds of scales we're operating at on a single repository. For tags, the previous logic assumed that all possible target revisions were processed in the same run. Here, we look at the difference between the tags Mercurial reports and the ones from the previous snapshot; any new tag will either have its corresponding release in cache (because it was processed in the same run) or have it fetched from the database using the aforementioned mapping.
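The revision and tag logic described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names and the `parents` mapping (node id to parent node ids) are hypothetical and do not come from the loader itself, which asks Mercurial for this set (roughly what a revset such as `not ::(heads)` expresses).

```python
from typing import Dict, Iterable, List, Set


def ancestors(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """Return the given heads together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return seen


def revisions_to_load(
    parents: Dict[str, List[str]], saved_heads: Iterable[str]
) -> List[str]:
    """Revisions that are neither a saved head nor an ancestor of one."""
    known = ancestors(parents, saved_heads)
    return [node for node in parents if node not in known]


def new_or_moved_tags(
    current: Dict[str, str], previous: Dict[str, str]
) -> Dict[str, str]:
    """Tags reported now whose target differs from the previous snapshot."""
    return {
        name: target
        for name, target in current.items()
        if previous.get(name) != target
    }


# Hypothetical history: a <- b <- c, and b <- d; the last snapshot stopped at b.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
to_load = revisions_to_load(history, ["b"])
```

Here `to_load` contains only `c` and `d`; `a` and `b` are already covered by the saved head.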
-
Raphaël Gomès authored
Simply initializing a loader would empty the environment, which can cause seemingly unrelated things to break. Moving the environment handling to the `pre_cleanup` phase ensures that `cleanup` will also be called and the environment will not be left in a broken state. We also add the `HGRCSKIPREPO` variable that I forgot to add in the test environment. This is still needed because the tests invoke `hg` directly. We could potentially have a wrapper util that uses a context-manager to do the environment manipulation closer to the issue, but we'd have to make sure that no other bare `hg` invocations can happen, even in random subprocesses.
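A minimal sketch of the context-manager idea mentioned above, assuming the goal is to scrub the process environment only for the duration of a block and to restore it afterwards even on error. The name `scrubbed_environ` is hypothetical and this is not the loader's actual code.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def scrubbed_environ(overrides: Dict[str, str]) -> Iterator[None]:
    """Temporarily reduce os.environ to PATH plus the given overrides."""
    saved = dict(os.environ)
    try:
        os.environ.clear()
        os.environ["PATH"] = saved.get("PATH", os.defpath)
        os.environ.update(overrides)
        yield
    finally:
        # Always restore the original environment, even if the body raised.
        os.environ.clear()
        os.environ.update(saved)
```

As the message notes, this only helps if every bare `hg` invocation, including those in subprocesses, happens inside such a block.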
-
- Apr 30, 2021
-
-
Raphaël Gomès authored
Some corrupted repos have missing files or broken logical links in the underlying Mercurial data structure, which means that operations can sometimes fail for a given revision. This does not mean we should throw away the rest of the repository. (Tested on repos of various levels and flavors of corruption in the Boatbucket archive.)
-
Raphaël Gomès authored
`HGRCPATH` only tells Mercurial to ignore the user's config files, but some repositories have a `.hg/hgrc` file (present only if the files were copied rather than cloned) that is usually used for server-side configuration. We want to ignore this file, since it might affect loading, for example by asking for hooks that are not there or are otherwise annoying or dangerous.
-
Raphaël Gomès authored
The old loader (bundle2 loader) already received this treatment which ensures Mercurial doesn't pick up on any user customization, but I apparently forgot to apply the same changes to the new one.
-
Raphaël Gomès authored
This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children`. This is incompatible with the SWH infrastructure, which needs to do exactly this. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 13, 2021
-
-
vlorentz authored
-
- Apr 06, 2021
-
-
vlorentz authored
We want to remove Revision.metadata. This aligns HgBundle20Loader's behavior with HgLoaderFromDisk's.
-
vlorentz authored
It already writes it with raw_extrinsic_metadata_add/extid_add, and reads it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
- Mar 30, 2021
-
-
vlorentz authored
SWHID has a specific meaning defined in https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html (in short, the sha1_git is only part of a SWHID), but these variables store only the hash part of the SWHIDs.
-
vlorentz authored
-
vlorentz authored
For now, ExtIDs are used in addition to revision metadata. But in the near future, we want to migrate nodeids from revision metadata to the ExtID storage, and drop all revision metadata.
-
vlorentz authored
-
vlorentz authored
This is a minor performance optimization, removing items from the call to revision_get when we know their result will be None. Motivation: A future commit will refactor this function, and dealing only with revision ids makes it simpler.
-
- Mar 29, 2021
-
-
vlorentz authored
Instead of using hashutil.MultiHash directly and converting to model.Content.
-
- Mar 05, 2021
- Feb 26, 2021
-
-
Summary: When a repository has a corrupted revision, that revision and its descendants are not loaded. This commit only deals with missing `filelogs` and configures the exclusion system. Missing `filelogs` are recoverable errors that should be skipped or saved as `SkippedContent`, but because the data is missing, the `SkippedContent` cannot be computed. This point is left for future commits. Reviewers: #reviewers Subscribers: ardumont, Alphare, vlorentz Differential Revision: https://forge.softwareheritage.org/D4688
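The exclusion behaviour described in this summary (a corrupted revision and all of its descendants are left out, the rest is loaded) could look roughly like the following. This is a sketch only: `CorruptedRevision`, `load_revisions` and the `load_one` callback are hypothetical names, and revisions are assumed to be visited parents-first.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


class CorruptedRevision(Exception):
    """Hypothetical error raised when a revision's data cannot be read."""


def load_revisions(
    order: Iterable[str],
    parents: Dict[str, List[str]],
    load_one: Callable[[str], None],
) -> Tuple[List[str], Set[str]]:
    """Load revisions in topological order, excluding corrupted ones.

    A revision is excluded if loading it fails, or if any of its parents
    was already excluded, so exclusion propagates to all descendants.
    """
    loaded: List[str] = []
    excluded: Set[str] = set()
    for node in order:
        if any(parent in excluded for parent in parents.get(node, [])):
            excluded.add(node)
            continue
        try:
            load_one(node)
        except CorruptedRevision:
            excluded.add(node)
            continue
        loaded.append(node)
    return loaded, excluded
```

The rest of the repository is still processed; only the affected part of the history is dropped.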
-
Raphaël Gomès authored
This generalizes the work done in ef3a2ba7 to (supposedly) all places invoking Mercurial. In short, this limits the environment to the smallest subset needed (i.e. `$PATH`) and uses the Mercurial-specific variables to disable user customizations and configs.
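As an illustration, a restricted environment for `hg` subprocesses might be built as below. The helper names are hypothetical. `HGRCPATH` and `HGRCSKIPREPO` are the variables named elsewhere in this history; including `HGPLAIN` is an assumption about which "script-friendly" variable is meant.

```python
import os
import subprocess
from typing import Dict, List


def hg_environ() -> Dict[str, str]:
    """Smallest environment needed to run `hg` without user customization."""
    return {
        # Needed so that the `hg` executable can be found at all.
        "PATH": os.environ.get("PATH", os.defpath),
        # Assumption: plain mode, for predictable output in scripts.
        "HGPLAIN": "",
        # Empty value: do not read the user's or the system's config files.
        "HGRCPATH": "",
        # When set: do not read the repository's own .hg/hgrc either.
        "HGRCSKIPREPO": "",
    }


def run_hg(args: List[str], cwd: str) -> bytes:
    """Run `hg` with the restricted environment and return its stdout."""
    result = subprocess.run(
        ["hg", *args],
        cwd=cwd,
        env=hg_environ(),
        check=True,
        stdout=subprocess.PIPE,
    )
    return result.stdout
```

Passing `env=` explicitly keeps the calling process's own environment untouched, unlike clearing `os.environ` globally.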
-
Antoine R. Dumont authored
-
Summary: By looking at the previous snapshot heads, loading of an unchanged repository will be uneventful. Reviewers: #reviewers, douardda Reviewed By: #reviewers, douardda Subscribers: douardda Differential Revision: https://forge.softwareheritage.org/D4643
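A minimal sketch of that check, assuming heads are compared as sets of Mercurial node ids; the function name is hypothetical and the real loader may compare other data as well.

```python
from typing import Iterable


def visit_status(repo_heads: Iterable[bytes], snapshot_heads: Iterable[bytes]) -> str:
    """Report "uneventful" when the repository heads match the last snapshot."""
    if set(repo_heads) == set(snapshot_heads):
        return "uneventful"
    return "eventful"
```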
-
Raphaël Gomès authored
By default, Mercurial updates to the default revision after cloning, which - while probably good UX - is wasteful in the context of automated archival and server-side operations.
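On the command line, this corresponds to the `--noupdate` (`-U`) flag of `hg clone`. The helper below is a hypothetical sketch of building such a command; the loader may well go through another API to achieve the same effect.

```python
from typing import List


def clone_command(origin_url: str, destination: str) -> List[str]:
    """Build an `hg clone` command that skips the working-copy checkout."""
    return ["hg", "clone", "--noupdate", origin_url, destination]
```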
-
- Feb 25, 2021
-
-
Antoine R. Dumont authored
This module is not referenced in the setup.py (so no simple CLI call), is not tested, uses an in-memory storage, and is no longer the canonical way of triggering a Mercurial ingestion. Note that this also fixes the visit date CLI input: when provided by the user, this date should be parsed as a datetime prior to being passed to the loader constructor.
-
- Feb 23, 2021
-
-
Raphaël Gomès authored
There is currently no end-to-end test to catch this regression. I'm not certain whether there is something in place to write such tests, but it is already better to have a working CLI.
-
Raphaël Gomès authored
This change adds two environment variables that have been supported for 10+ years by Mercurial to make its output predictable for use in scripts. This was already done by a previous patch in tests, but it is also (even more?) useful here.
-
Raphaël Gomès authored
We don't want the user's environment to affect `hg`'s behavior. The bare minimum is the `PATH`. In the next patch, we add the Mercurial-specific variables to ensure a "vanilla" behavior.
-
Raphaël Gomès authored
The next patch will add an import and `isort` was complaining. It appears that this file hasn't been changed since the `isort` change.
-
Raphaël Gomès authored
Tests can break when run in user environments if the output is customized, either by config options (like aliases) or extensions.
-
- Feb 17, 2021
-
-
Antoine R. Dumont authored
Note that this also updated some docstrings and types along the way. Related to T1410
-
- Feb 15, 2021
-
-
Antoine R. Dumont authored
This avoids failing visits for the wrong comparison check [1].
[1] https://sentry.softwareheritage.org/share/issue/27017710a5ec49f991910a780d38d4ab/
-
Vincent Sellier authored
Related to T3030
-
- Feb 09, 2021
-
-
Vincent Sellier authored
Related to T3030
-
- Feb 03, 2021
-
-
Antoine R. Dumont authored
This should unstick the Debian build, which complains about those not being registered.
-
- Feb 01, 2021
-
-
Antoine R. Dumont authored
-
- Dec 01, 2020
-
-
Antoine Cezar authored
By looking at differences between revisions, the repository tree is updated rather than fully rebuilt for each one. Observed load time improvement on https://www.mercurial-scm.org/repo/hg/: 1:11:02 -> 47:58
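The idea can be sketched with a flat mapping from path to content hash standing in for the repository tree. The names below are hypothetical and the loader's real tree structure is not shown in this history.

```python
from typing import Dict, Iterable, Optional, Tuple

# (path, new content hash), or (path, None) when the file was removed.
Change = Tuple[str, Optional[bytes]]


def apply_changes(tree: Dict[str, bytes], changes: Iterable[Change]) -> None:
    """Update the tree in place with the differences from the previous revision."""
    for path, file_hash in changes:
        if file_hash is None:
            tree.pop(path, None)
        else:
            tree[path] = file_hash


# Hypothetical example: one file modified, one removed, one added.
tree = {"README": b"h1", "setup.py": b"h2"}
apply_changes(tree, [("README", b"h3"), ("setup.py", None), ("src/main.py", b"h4")])
```

Only the touched paths are processed for each revision, so the cost is proportional to the size of the change rather than to the size of the whole tree.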
-