- Jul 28, 2021
-
-
Antoine R. Dumont authored
Prior to this, depending on the load on Jenkins, the test could be flaky and fail for the wrong reason [1].

[1] https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/263/console
-
- Jun 16, 2021
-
-
Antoine Lambert authored
Since rDMODe09446a6f44b9070ea70a9760bc82dee0bbcb687, it must be an iterable of bytes.
-
- Jun 15, 2021
-
-
Raphaël Gomès authored
As discussed in T3352, the branching mechanism of Mercurial is more featureful than Git's. The Snapshot model was not designed with multiple heads, closed heads, bookmarks, etc. in mind, but with only branches being "pointers" to (mostly) revisions. As a workaround, short of a redesign of the Snapshot model (nothing of the sort is planned for now), we define a mapping that better represents Mercurial's branching system. This allows for handling multiple heads per branch and closed branches, whose revisions (if not already covered by another branch) would previously have been lost to the ether. Additionally, bookmarks are now saved to get a better representation of the projects that do use them.
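A minimal sketch of what such a mapping could look like. The `HgHead` type, the function name, and the key prefixes (`branch-tip/`, `branch-heads/`, `branch-closed-heads/`, `bookmarks/`) are hypothetical and chosen for illustration only; the commit itself defines the actual naming.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HgHead:
    """Hypothetical description of one Mercurial head."""

    node: bytes
    branch: str
    closed: bool = False
    is_tip: bool = False


def snapshot_branch_names(
    heads: List[HgHead], bookmarks: Dict[str, bytes]
) -> Dict[bytes, bytes]:
    """Map every head and bookmark to a distinct snapshot branch name.

    The key layout is an assumption made for this sketch.
    """
    mapping: Dict[bytes, bytes] = {}
    counters: Dict[Tuple[str, str], int] = {}
    for head in heads:
        if head.is_tip and not head.closed:
            key = f"branch-tip/{head.branch}"
        else:
            # Extra heads and closed heads get an index so none is dropped.
            prefix = "branch-closed-heads" if head.closed else "branch-heads"
            index = counters.get((prefix, head.branch), 0)
            counters[(prefix, head.branch)] = index + 1
            key = f"{prefix}/{head.branch}/{index}"
        mapping[key.encode()] = head.node
    for name, node in bookmarks.items():
        mapping[f"bookmarks/{name}".encode()] = node
    return mapping
```

With this layout, a branch with one tip, one extra open head, and one closed head yields three distinct entries instead of a single pointer.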
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- Jun 03, 2021
-
-
Raphaël Gomès authored
The subprocess call (which should have been using run) uses the -u bash option, which stops the script at the first sight of an unset variable... but also doesn't set the variable expected by the very docstring example. We also reset the environment so that the hg script is not influenced by user configuration, which was a second source of breakage.
-
- May 28, 2021
-
-
Raphaël Gomès authored
Some repositories have multiple heads for a single branch (they can even be closed heads). The SWH data model does not yet handle this, so we fix the issue by asking the repository a more precise question, one that includes all local heads that are already stored as revisions. We also test that this resolves the issue where the new loader would always see the additional head as missing.
-
- May 27, 2021
-
-
Raphaël Gomès authored
The existing util for getting a repo's branches skips closed branches, but no explanation was left as to why, either in the code or in the commit message. I cannot think of a good reason for ignoring closed branches, so we're removing this exception, which in turn fixes the incremental issue detailed in T3336. This affects the existing tests of the two repositories that had closed branches. A test for the incremental behavior was added as well.
-
- May 21, 2021
-
-
Raphaël Gomès authored
-
Raphaël Gomès authored
... instead of doing it in bulk at the end.
-
- May 20, 2021
-
-
Nicolas Dandrimont authored
This code hasn't used this module for years (last import removed in 2786cd48).
-
- May 19, 2021
-
-
Raphaël Gomès authored
This is the minimal amount of code needed to switch from the old one to the new one. If the new loader proves to be good enough, we may remove the old one entirely.
-
- May 07, 2021
-
-
Raphaël Gomès authored
Before this change, if a previous snapshot of a given origin existed and new changes were detected, we would start from scratch. This change leverages the recent new db mapping for external ids (like Mercurial's node ids) to internal SWH ids to compute what has changed from the latest snapshot, now that it is possible to find an SWH id from a Mercurial node id. For revisions, the logic is simple: look at the heads we've saved and ask Mercurial for all the revisions that are not ancestors of these heads (themselves excluded). This is not as "clever" as the full Mercurial discovery algorithm, but is much simpler and good enough for the kinds of scales we're operating at on a single repository. For tags, the previous logic assumed that all possible target revisions were processed in the same run. Here, we look at the difference between the tags Mercurial reports and the ones from the previous snapshot; any new tag will either have its corresponding release in cache (because it was processed in the same run) or be fetched from the database using the aforementioned mapping.
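The revision and tag logic described above can be sketched on a plain in-memory graph. This is an illustration under stated assumptions, not the loader's code: the function names and the `parents` dictionary are hypothetical, and the real loader asks Mercurial itself (in revset terms, roughly "everything that is not in `::saved_heads`").

```python
from typing import Dict, Iterable, List, Set


def ancestors(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """Return the given heads plus all of their ancestors."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return seen


def revisions_to_load(
    parents: Dict[str, List[str]], saved_heads: Iterable[str]
) -> Set[str]:
    """Revisions that are neither a saved head nor an ancestor of one."""
    known = ancestors(parents, saved_heads)
    return {node for node in parents if node not in known}


def new_tags(current: Dict[str, str], previous: Dict[str, str]) -> Dict[str, str]:
    """Tags that are absent from, or changed since, the previous snapshot."""
    return {
        name: node for name, node in current.items() if previous.get(name) != node
    }
```

For a history where `c` was the only saved head and `d` and `e` were added later, only `d` and `e` are reported as needing to be loaded.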
-
Raphaël Gomès authored
Simply initializing a loader would empty the environment, which can cause seemingly unrelated things to break. Moving the environment handling to the `pre_cleanup` phase ensures that `cleanup` will also be called and the environment will not be left in a broken state. We also add the `HGRCSKIPREPO` variable that I forgot to add in the test environment. This is still needed because the tests invoke `hg` directly. We could potentially have a wrapper util that uses a context-manager to do the environment manipulation closer to the issue, but we'd have to make sure that no other bare `hg` invocations can happen, even in random subprocesses.
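The context-manager wrapper mentioned as a possible alternative might look like the sketch below. The name `clean_environment` is hypothetical, and the value given to `HGRCSKIPREPO` in the usage example is an assumption; as noted above, such a wrapper only protects the `hg` invocations that actually run inside it.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def clean_environment(overrides: Dict[str, str]) -> Iterator[None]:
    """Temporarily replace the process environment, then restore it."""
    saved = dict(os.environ)
    try:
        os.environ.clear()
        os.environ.update(overrides)
        yield
    finally:
        # Always restore, even if the body raised, so the environment
        # is never left in a broken state.
        os.environ.clear()
        os.environ.update(saved)
```

Usage would be along the lines of `with clean_environment({"PATH": os.environ["PATH"], "HGRCSKIPREPO": "1"}): ...`, with the `hg` calls placed inside the block.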
-
- Apr 30, 2021
-
-
Raphaël Gomès authored
Some corrupted repos have missing files or broken logical links in the underlying Mercurial data structure, which means that loading can sometimes fail for a given revision. This does not mean we should throw away the rest of the repository. (Tested on repos of various levels and flavors of corruption in the Boatbucket archive.)
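The general idea, tolerating a failure on one revision while still loading the others, can be sketched as follows. `CorruptedRevision`, `load_all`, and the `load_one` callback are hypothetical names for this illustration; how the real loader detects corruption and what it does with descendants of a bad revision is not shown here.

```python
from typing import Callable, Iterable, List, Tuple


class CorruptedRevision(Exception):
    """Hypothetical error raised when a revision cannot be read."""


def load_all(
    revs: Iterable[int], load_one: Callable[[int], None]
) -> Tuple[List[int], List[int]]:
    """Load every revision that can be read and report the ones skipped."""
    loaded: List[int] = []
    skipped: List[int] = []
    for rev in revs:
        try:
            load_one(rev)
        except CorruptedRevision:
            # One unreadable revision should not discard the whole repository.
            skipped.append(rev)
        else:
            loaded.append(rev)
    return loaded, skipped
```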
-
Raphaël Gomès authored
`HGRCPATH` only tells Mercurial to ignore the user's config files, but some repositories have a `.hg/hgrc` file that is usually used for server-side configuration (it is only carried over when the files are copied instead of cloned). We want to ignore this file, since it might affect loading, for example by asking for hooks that are not there or that are otherwise annoying or dangerous.
-
Raphaël Gomès authored
The old loader (bundle2 loader) already received this treatment which ensures Mercurial doesn't pick up on any user customization, but I apparently forgot to apply the same changes to the new one.
-
Raphaël Gomès authored
This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children`. This is incompatible with the SWH infrastructure, which needs to do exactly this. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
-
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing sphinx warnings. The sphinx environment is designed to be used in continuous integration in order to prevent breaking the documentation build when committing changes. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 13, 2021
-
-
vlorentz authored
-
- Apr 06, 2021
-
-
vlorentz authored
We want to remove Revision.metadata. This aligns HgBundle20Loader's behavior with HgLoaderFromDisk's.
-
vlorentz authored
It already writes it with raw_extrinsic_metadata_add/extid_add, and reads it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
- Mar 30, 2021
-
-
vlorentz authored
SWHID has a specific meaning defined in https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html (in short, the sha1_git is only part of a SWHID), but these variables store only the hash part of the SWHIDs.
-
vlorentz authored
-
vlorentz authored
For now, ExtIDs are used in addition to revision metadata. But in the near future, we want to migrate nodeids from revision metadata to the ExtID storage, and drop all revision metadata.
-
vlorentz authored
-
vlorentz authored
This is a minor performance optimization, removing items from the call to revision_get when we know their result will be None. Motivation: A future commit will refactor this function, and dealing only with revision ids makes it simpler.
-
- Mar 29, 2021
-
-
vlorentz authored
Instead of using hashutil.MultiHash directly and converting to model.Content.
-
- Mar 05, 2021
- Feb 26, 2021
-
-
Summary: When a repository has a corrupted revision, the revision and its descendants are not loaded. This commit only deals with missing `filelogs` and configures the exclusion system. Missing `filelogs` are recoverable errors that should either be skipped or saved as `SkippedContent`; however, because the data is missing, the `SkippedContent` cannot be computed. This point is left for future commits.

Reviewers: #reviewers
Subscribers: ardumont, Alphare, vlorentz
Differential Revision: https://forge.softwareheritage.org/D4688
-
Raphaël Gomès authored
This generalizes the work done in ef3a2ba7 to (supposedly) all places invoking Mercurial. In short, this limits the environment to the smallest subset needed (i.e. `$PATH`) and uses the Mercurial-specific variables to disable user customizations and configs.
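A minimal sketch of such an environment. `HGPLAIN` and `HGRCPATH` are documented Mercurial variables, but whether these are exactly the ones set by this commit is an assumption, and the helper name `minimal_hg_env` is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def minimal_hg_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the smallest environment needed to run `hg` predictably."""
    base = os.environ if base is None else base
    return {
        # Needed to locate the `hg` executable; everything else is dropped.
        "PATH": base.get("PATH", ""),
        # Disables settings (aliases, custom output) that change default output.
        "HGPLAIN": "1",
        # An empty value means no user or system configuration files are read.
        "HGRCPATH": "",
    }
```

The result would typically be passed as the `env` argument of `subprocess.run` when invoking `hg`.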
-
Antoine R. Dumont authored
-
Summary: By looking at the previous snapshot heads, loading of an unchanged repository will be uneventful.

Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda
Differential Revision: https://forge.softwareheritage.org/D4643
-
Raphaël Gomès authored
By default, Mercurial updates to the default revision after cloning, which - while probably good UX - is wasteful in the context of automated archival and server-side operations.
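As an illustration, a clone that skips the working-copy update can be requested with Mercurial's `--noupdate` (`-U`) option. The helper below is a hypothetical sketch that only builds the command line; whether the commit uses this exact flag spelling is an assumption.

```python
from typing import List


def clone_command(url: str, dest: str) -> List[str]:
    """Build an `hg clone` command that does not check out a working copy."""
    # --noupdate leaves the destination with repository data only,
    # which is all an archival loader needs.
    return ["hg", "clone", "--noupdate", url, dest]
```

The resulting list can be handed to `subprocess.run(..., check=True)`.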
-
- Feb 25, 2021
-
-
Antoine R. Dumont authored
This module is:
- not referenced in the setup.py (so no simple cli call)
- not tested
- using an in-memory storage
- no longer the canonical way of triggering a mercurial ingestion

Note that this also fixes the visit date cli input. When user provided, this date should be parsed as a datetime prior to being passed to the loader constructor.
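A small sketch of the visit date fix, assuming the date arrives as an ISO 8601 string and that a missing timezone should be read as UTC. Both points, and the name `parse_visit_date`, are assumptions made for this illustration.

```python
from datetime import datetime, timezone


def parse_visit_date(value: str) -> datetime:
    """Turn a user-provided date string into a timezone-aware datetime."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive dates are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```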
-
- Feb 23, 2021
-
-
Raphaël Gomès authored
There is currently no end-to-end test to catch this regression. I'm not certain whether there is something in place to write such tests, but having a working CLI is already an improvement.
-
Raphaël Gomès authored
This change adds two environment variables that have been supported for 10+ years by Mercurial to make its output predictable for use in scripts. This was already done by a previous patch in tests, but it is also (even more?) useful here.
-
Raphaël Gomès authored
We don't want the user's environment to affect `hg`'s behavior. The bare minimum is the `PATH`. In the next patch, we add the Mercurial-specific variables to ensure a "vanilla" behavior.
-
Raphaël Gomès authored
The next patch will add an import and `isort` was complaining. It appears that this file hasn't been changed since the `isort` change.
-
Raphaël Gomès authored
Tests can break when run in user environments if the output is customized, either by config options (like aliases) or extensions.
-