- Jul 29, 2021
-
-
Jenkins for Software Heritage authored
Update to upstream version '2.1.0' with Debian dir 0e0206897fd866fbbe27b1b465fb800687815846
-
Antoine R. Dumont authored
For now this hardcodes the version to 1 for both reading and writing. This allows us to: store the new hashes with a version (no version actually means version 0); keep the old Mercurial loader ExtID references in the archive (no need to clean them up, which would pose other problems regarding the journal); and, in effect, unblock the current ingestion and updates of existing origins that already have more than one ExtID due to different, incompatible versions. The storage implementation does not allow filtering on extid_version, so it is up to the loader to do the filtering, hence the current implementation.
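The loader-side filtering described above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the `ExtID` dataclass, its field names, and `filter_extids` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Iterable, List


# Hypothetical stand-in for the storage's ExtID object; field names are assumptions.
@dataclass(frozen=True)
class ExtID:
    extid_type: str
    extid: bytes
    target: bytes
    extid_version: int = 0  # an ExtID written without a version counts as version 0


EXTID_VERSION = 1  # hardcoded for now, as the commit message states


def filter_extids(extids: Iterable[ExtID], version: int = EXTID_VERSION) -> List[ExtID]:
    """Keep only the ExtIDs written with the expected version.

    The storage cannot filter on extid_version, so this is done client-side.
    """
    return [e for e in extids if e.extid_version == version]
```

Old, unversioned references stay in the archive untouched; they are simply ignored when read back.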
- Jul 28, 2021
-
-
Antoine R. Dumont authored
Prior to this, depending on the load on Jenkins, the test could be flaky and fail for the wrong reason [1]. [1] https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/263/console
-
- Jun 16, 2021
-
-
Jenkins for Software Heritage authored
Update to upstream version '2.0.0' with Debian dir 10e7add71e53c3001407c36bdc9c34b852e5b839
-
Antoine Lambert authored
Since rDMODe09446a6f44b9070ea70a9760bc82dee0bbcb687, it must be an iterable of bytes.
- Jun 15, 2021
-
-
Raphaël Gomès authored
As discussed in T3352, Mercurial's branching mechanism is more featureful than Git's. The Snapshot model was not designed with multiple heads, closed heads, bookmarks, etc. in mind, but only with branches being "pointers" to (mostly) revisions. As a workaround, pending a possible redesign of the Snapshot model (though nothing of the sort is planned for now), we define a mapping that better represents Mercurial's branching system. This allows for handling multiple heads per branch and closed branches, whose revisions (if not already covered by another branch) would previously have been lost to the ether. Additionally, bookmarks are now saved to better represent the projects that do use them.
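To illustrate the idea of such a mapping, here is a minimal sketch that flattens heads, closed heads and bookmarks into distinct snapshot branch names. The naming scheme (`branch-heads/...`, `branch-closed-heads/...`, `bookmarks/...`), the `Head` type and the `snapshot_branches` function are assumptions made for this example; the commit message does not spell out the names actually used.

```python
from typing import Dict, List, NamedTuple


class Head(NamedTuple):
    """Hypothetical description of one head of a Mercurial named branch."""
    node: str
    closed: bool = False


def snapshot_branches(
    branch_heads: Dict[str, List[Head]], bookmarks: Dict[str, str]
) -> Dict[str, str]:
    """Map every head and bookmark to its own snapshot branch name."""
    out: Dict[str, str] = {}
    for branch, heads in branch_heads.items():
        open_idx = closed_idx = 0
        for head in heads:
            if head.closed:
                # Closed heads get their own namespace so they are not lost.
                out[f"branch-closed-heads/{branch}/{closed_idx}"] = head.node
                closed_idx += 1
            else:
                # Several open heads on one branch are numbered individually.
                out[f"branch-heads/{branch}/{open_idx}"] = head.node
                open_idx += 1
    for name, node in bookmarks.items():
        out[f"bookmarks/{name}"] = node
    return out
```

With one name per head, a branch with two open heads and one closed head yields three snapshot entries instead of a single pointer.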
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- Jun 03, 2021
-
-
Raphaël Gomès authored
The subprocess call (which should have been using `run`) passes the `-u` option to bash, which aborts the script at the first unset variable, yet the call does not set the variable that the docstring example itself expects. We also reset the environment so that the hg script is not influenced by user configuration, which was a second source of breakage.
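The failure mode can be reproduced with a small sketch. It is not the code from this commit: `run_script` and the `HG_REPO` variable are hypothetical, and it uses `sh -u` rather than bash for portability (the `-u` option is assumed to behave the same way here).

```python
import subprocess


def run_script(script: str, env: dict) -> subprocess.CompletedProcess:
    """Run a shell snippet with -u and an explicit, minimal environment.

    Passing env= replaces the inherited environment entirely, so user-level
    settings cannot leak into the child process.
    """
    return subprocess.run(
        ["sh", "-u", "-c", script],
        env=env,
        capture_output=True,
        text=True,
    )


# HG_REPO is a hypothetical variable name used for illustration only.
script = 'echo "repo=$HG_REPO"'

# Without the variable, -u makes the shell abort with a non-zero status.
missing = run_script(script, env={"PATH": "/usr/bin:/bin"})

# With the variable set explicitly, the script runs to completion.
present = run_script(script, env={"PATH": "/usr/bin:/bin", "HG_REPO": "/tmp/repo"})
```

Setting every expected variable in the explicit `env` mapping addresses both sources of breakage at once.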
-
- May 28, 2021
-
-
Jenkins for Software Heritage authored
Update to upstream version '1.1.0' with Debian dir 40505fd6e1c54a6bcee6c8b7ea2157cf3ff5e964
-
Raphaël Gomès authored
Some repositories have multiple heads for a single branch (they can even be closed heads). The SWH data model does not yet handle this, so we fix the issue by asking the repository a more precise question, one that includes all local heads already stored as revisions. We also test that this resolves the issue where the new loader would always see the additional head as missing.
- May 27, 2021
-
-
Raphaël Gomès authored
The existing util for getting a repo's branches skipped closed branches but left no explanation why, either in the code or in the commit message. I cannot think of a good reason for ignoring closed branches, so we're removing this exception, which in turn fixes the incremental issue detailed in T3336. This has affected the existing tests of the two repositories that had closed branches. A test for the incremental behavior was added as well.
-
- May 21, 2021
-
-
Raphaël Gomès authored
-
Raphaël Gomès authored
... instead of doing it in bulk at the end.
-
- May 20, 2021
-
-
Nicolas Dandrimont authored
-
Jenkins for Software Heritage authored
Update to upstream version '1.0.0' with Debian dir 716a4513986c9e33d5796aa11f38fd726f1d022d
-
Nicolas Dandrimont authored
This code hasn't used this module for years (last import removed in 2786cd48).
-
- May 19, 2021
-
-
Raphaël Gomès authored
This is the minimal amount of code needed to switch from the old one to the new one. If the new loader proves to be good enough, we may remove the old one entirely.
-
- May 07, 2021
-
-
Raphaël Gomès authored
Before this change, if a previous snapshot of a given origin existed and new changes were detected, we would start from scratch. This change leverages the recent new db mapping from external ids (like Mercurial's node ids) to internal SWH ids to compute what has changed since the latest snapshot, now that it is possible to find an SWH id from a Mercurial node id. For revisions, the logic is simple: look at the heads we've saved and ask Mercurial for all the revisions that are not ancestors of these heads (themselves excluded). This is not as "clever" as the full Mercurial discovery algorithm, but is much simpler and good enough for the kinds of scales we're operating at on a single repository. For tags, the previous logic assumed that all possible target revisions were processed in the same run. Here, we look at the difference between the tags Mercurial reports and the ones from the previous snapshot; any new tag will either have its corresponding release in cache (because it was processed in the same run) or have it fetched from the database using the aforementioned mapping.
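The revision logic above boils down to a set difference on the revision graph. The sketch below shows it on a plain parent map instead of a real repository; `new_revisions` and the graph representation are assumptions made for this example, not the loader's code, which asks Mercurial directly.

```python
from typing import Dict, Iterable, List, Set


def new_revisions(parents: Dict[str, List[str]], saved_heads: Iterable[str]) -> Set[str]:
    """Return the revisions that are neither saved heads nor their ancestors."""
    known: Set[str] = set()
    # Start from the saved heads still present in the repository.
    stack = [head for head in saved_heads if head in parents]
    while stack:
        rev = stack.pop()
        if rev in known:
            continue
        known.add(rev)
        # Walk towards the roots; unknown parents are simply skipped.
        stack.extend(p for p in parents.get(rev, []) if p in parents)
    return set(parents) - known


# Toy history: a <- b <- c <- e, plus a second head d on top of b.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["c"]}
to_load = new_revisions(history, saved_heads=["c"])
```

Here `c` was the only head saved by the previous visit, so only `d` and `e` remain to be loaded.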
-
Raphaël Gomès authored
Simply initializing a loader would empty the environment, which can cause seemingly unrelated things to break. Moving the environment handling to the `pre_cleanup` phase ensures that `cleanup` will also be called and the environment will not be left in a broken state. We also add the `HGRCSKIPREPO` variable that I forgot to add in the test environment. This is still needed because the tests invoke `hg` directly. We could potentially have a wrapper util that uses a context-manager to do the environment manipulation closer to the issue, but we'd have to make sure that no other bare `hg` invocations can happen, even in random subprocesses.
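The context-manager wrapper mentioned above could look roughly like this. It is only a sketch of the idea, not what this commit implements: `clean_hg_environment` is a hypothetical name, the value `"1"` for `HGRCSKIPREPO` is an assumption, and unlike the loader this version overrides two variables instead of emptying the whole environment.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def clean_hg_environment() -> Iterator[None]:
    """Temporarily neutralise user and repository-level Mercurial configuration."""
    saved = dict(os.environ)
    try:
        os.environ["HGRCPATH"] = ""  # ignore the user's config files
        os.environ["HGRCSKIPREPO"] = "1"  # ignore the repository's own .hg/hgrc
        yield
    finally:
        # Restore the environment exactly, even if the body raised.
        os.environ.clear()
        os.environ.update(saved)
```

Any `hg` subprocess started inside the `with` block inherits the sanitised variables, and the caller's environment is restored on exit.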
-
- Apr 30, 2021
-
-
Raphaël Gomès authored
Some corrupted repos have missing files or broken logical links in the underlying Mercurial data structure, which means that reading a given revision can sometimes fail. This does not mean we should throw away the rest of the repository. (Tested on repos of various levels and flavors of corruption in the Boatbucket archive.)
-
Raphaël Gomès authored
`HGRCPATH` only tells Mercurial to ignore the user's config files, but some repositories have a `.hg/hgrc` file (present only if the files were copied rather than cloned) that is usually used for server-side configuration. We want to ignore this file too, since it might affect loading, for example by asking for hooks that are not there or are otherwise annoying or dangerous.
-
Raphaël Gomès authored
The old loader (bundle2 loader) already received this treatment which ensures Mercurial doesn't pick up on any user customization, but I apparently forgot to apply the same changes to the new one.
-
Raphaël Gomès authored
This circumvents a few celery-related issues, and is consistent with what the rest of the codebase does. stdlib multiprocessing is not able to spawn children from daemonic processes, and even says so plainly if you try: `AssertionError: daemonic processes are not allowed to have children`. This is incompatible with the SWH infrastructure, which needs to do exactly this. Fortunately, we're already using billiard and celery. I'm assuming that there could be other blocking or annoying differences between stdlib and billiard, but we will save ourselves the trouble of finding out.
-
- Apr 29, 2021
-
-
Jenkins for Software Heritage authored
Update to upstream version '0.5.0' with Debian dir 104045b9a285c2f83dd03bd15da8f6b8bcdef615
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings. The sphinx environment is designed to be used in continuous integration, to prevent commits from breaking the documentation build. The sphinx-dev environment is designed to be used inside a full swh development environment. Related to T3258
-
- Apr 13, 2021
-
-
vlorentz authored
-
- Apr 06, 2021
-
-
vlorentz authored
We want to remove Revision.metadata. This aligns HgBundle20Loader's behavior with HgLoaderFromDisk's.
-
vlorentz authored
It already writes it with raw_extrinsic_metadata_add/extid_add, and reads it with extid_get_*. This code was only kept for compatibility while we were migrating the extids. This is now done, so this code is useless.
-
- Mar 30, 2021
-
-
vlorentz authored
SWHID has a specific meaning defined in https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html (in short, the sha1_git is only part of a SWHID), but these variables store only the hash part of the SWHIDs.
-
vlorentz authored
-