- Nov 08, 2023
-
-
Antoine Lambert authored
There may be cases where multiple versions of a package target the same release object. For instance, an rpm package has one specific version for each distribution release, but these versions can share the same intrinsic version and have exactly the same source package contents. So avoid downloading and processing a package version if the corresponding extid has already been encountered during the current loading, by maintaining a mapping between extids and release swhids.
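The skipping logic described above could be sketched as follows. This is only an illustration: `load_versions`, `load_release` and the string-typed extids and swhids are hypothetical names and simplifications, not the loader's actual API.

```python
from typing import Callable, Dict, Iterable, Tuple


def load_versions(
    versions: Iterable[Tuple[str, str]],
    load_release: Callable[[str], str],
) -> Dict[str, str]:
    """Map each package version to a release swhid.

    `versions` yields (version, extid) pairs. `load_release` stands in for
    the expensive download-and-process step (hypothetical callback).
    """
    release_by_extid: Dict[str, str] = {}
    release_by_version: Dict[str, str] = {}
    for version, extid in versions:
        if extid not in release_by_extid:
            # First time this extid is seen during the current loading:
            # do the real work and remember the resulting release.
            release_by_extid[extid] = load_release(version)
        # Otherwise reuse the release already produced for this extid.
        release_by_version[version] = release_by_extid[extid]
    return release_by_version
```

With this shape, two distribution-specific versions sharing one extid trigger a single download.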
-
- Oct 04, 2023
-
-
Antoine Lambert authored
Before computing the nar hash in the fetch_data method, detect whether the fetched artifact comes from a VCS (git, hg or svn) by checking the visit type of the loader, and set the vcs_type parameter of the Nar constructor accordingly. Related to swh/devel/swh-loader-git#4751.
-
Antoine Lambert authored
When computing the recursive nar hash of a directory fetched from a VCS like git or svn, only special directories related to the used VCS (.git or .svn) should be excluded. For instance when a directory was fetched from git, only .git folders should be excluded. Related to swh/devel/swh-loader-git#4751.
-
- Sep 28, 2023
-
-
- Sep 25, 2023
-
-
Antoine Lambert authored
Previously, package versions were sorted according to the keys of the packages dict, but this is not reliable as older versions can be sorted after newer ones. Sort package versions according to their build time instead, as this produces a correct ordering and ensures the HEAD branch alias targets the most recent version of a package.
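A small sketch of why key ordering is unreliable and how sorting on build time fixes it. The `build_time` field name and the ISO 8601 timestamps are assumptions made for the example, not the lister's actual data format.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical input: version string -> package info with a build time.
packages: Dict[str, Dict[str, str]] = {
    "1.10-1": {"build_time": "2023-05-01T00:00:00+00:00"},
    "1.9-1": {"build_time": "2023-01-01T00:00:00+00:00"},
}


def versions_by_build_time(pkgs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return versions ordered from oldest to most recent build."""
    return sorted(
        pkgs, key=lambda v: datetime.fromisoformat(pkgs[v]["build_time"])
    )


# Plain key sorting is lexicographic, so "1.10-1" wrongly comes before "1.9-1".
lexicographic = sorted(packages)
chronological = versions_by_build_time(packages)
head_target = chronological[-1]  # most recent version, candidate for HEAD
```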
-
- Sep 19, 2023
-
-
Raphaël Gomès authored
The initial implementation of the discovery algorithm was incorrectly done in this package. It has now been refactored into the appropriate places, so we can just use those.
-
- Sep 18, 2023
-
-
Antoine Lambert authored
Return 1 as exit code when the "swh loader run" command fails, i.e. when the visit status associated with the performed loading is different from "full". This makes it possible to check whether a loading failed in scripts calling that command.
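The rule above amounts to a mapping from visit status to process exit code. A minimal sketch, where `exit_code_for` is a hypothetical helper and not the actual CLI code:

```python
def exit_code_for(visit_status: str) -> int:
    """Return 0 only for a "full" visit, 1 for any other status."""
    return 0 if visit_status == "full" else 1
```

A command entry point would then pass this value to `sys.exit`, so that a calling shell script can test the return code.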
-
Antoine Lambert authored
It can be useful in tests or to run the directory loader on a local tarball.
-
Antoine Lambert authored
Case sensitivity is important for that representation as it can lead to invalid decodings otherwise.
-
Antoine Lambert authored
Previously only top-level VCS directories were excluded, but that behavior does not match that of the "guix hash -x -S nar" command, which recursively excludes those directories when computing a nar hash. So ensure the same behavior to avoid hash mismatch issues when using a directory loader. Related to swh/devel/swh-loader-git#4751.
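The difference between top-level and recursive exclusion can be illustrated with a directory walk that prunes the excluded names at every depth. This is a sketch only: `iter_files_excluding` is a hypothetical helper, and the real nar serialization is not reproduced here.

```python
import os
from pathlib import Path
from typing import Iterator, Set


def iter_files_excluding(root: Path, excluded_names: Set[str]) -> Iterator[Path]:
    """Yield files under `root` (relative paths), skipping excluded
    directories wherever they appear in the tree, not only at the top."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning `dirnames` in place stops os.walk from descending into
        # excluded directories at any depth.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_names)
        for name in sorted(filenames):
            yield Path(dirpath, name).relative_to(root)
```

Calling it with `{".git"}` skips both `.git` and `sub/.git`, which is the recursive behavior the commit describes.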
-
vlorentz authored
-
- Sep 14, 2023
-
- Aug 28, 2023
-
-
Antoine Lambert authored
Do not attempt to parse versions with the packaging.version module, as the rpm version format does not match the Python package one and thus numerous parsing attempts were failing. Use the package intrinsic version in the message of each produced release so that the loader does not create different releases targeting the same directory. Adapt to the latest changes in the input data sent by the RPM lister, notably in tests. Related to swh/meta#5011.
-
- Aug 08, 2023
-
-
Antoine R. Dumont authored
The current implementation uses the default filtering which accepts all folders. It's going to be needed at least for the git loader which has to ignore the root .git folder and empty ones. Refs. swh/meta#3781
-
- Jul 25, 2023
-
-
Antoine Lambert authored
The Python requests library automatically decompresses downloaded content bytes if the content-encoding response header is set to a supported encoding. However, some HTTP servers serve a tarball with content-type set to application/x-gzip and content-encoding set to gzip, which is wrong as the tarball then gets uncompressed while being downloaded. That behavior can make a file checksum check fail after download, as the expected checksum was computed on the compressed version of the file, not the uncompressed one. So prevent automatic decompression by reading the raw response content instead of using the iter_content method when the content-type and content-encoding headers are both set to gzip format.
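A sketch of the header check and the two read paths, written against a requests-like response object. The helper names and the exact set of gzip content types are assumptions for the example. The raw path relies on urllib3's `stream(..., decode_content=False)` and assumes the request was made with `stream=True`.

```python
from typing import Iterator, Mapping

# Content types treated as "already a gzip file" (assumed list).
GZIP_CONTENT_TYPES = {"application/gzip", "application/x-gzip"}


def should_read_raw(headers: Mapping[str, str]) -> bool:
    """True when both content-type and content-encoding announce gzip."""
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    content_encoding = headers.get("content-encoding", "").strip().lower()
    return content_type in GZIP_CONTENT_TYPES and content_encoding == "gzip"


def iter_body(response, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield body chunks, bypassing automatic decompression when needed."""
    if should_read_raw(response.headers):
        # Read the bytes as sent on the wire so the checksum matches
        # the one computed on the compressed tarball.
        yield from response.raw.stream(chunk_size, decode_content=False)
    else:
        yield from response.iter_content(chunk_size=chunk_size)
```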
-
- Jul 10, 2023
-
-
Antoine Lambert authored
The get_enclosed_fields method does not return fields with no value so we need to use the dictionary get method to avoid raising KeyError and fix the loading of packages with missing metadata.
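For illustration, here is how `dict.get` avoids the KeyError on an absent field. The field names and the `package_metadata` helper are hypothetical, and the dict below merely stands in for what get_enclosed_fields returns.

```python
from typing import Dict, Mapping, Optional


def package_metadata(fields: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Pick metadata fields, using None for those that are missing."""
    return {
        "author": fields.get("author"),
        "description": fields.get("description"),
    }


# A package whose metadata has no "author" value: fields["author"]
# would raise KeyError, while fields.get("author") returns None.
metadata = package_metadata({"description": "An example package"})
```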
-
- Jul 04, 2023
-
-
Antoine Lambert authored
Instead of calling opam once for each package metadata field to extract, get all these fields using a single opam call.
-
Antoine Lambert authored
Use subprocess.run instead of subprocess.call and subprocess.Popen and check opam command return codes to catch any possible issues.
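A minimal sketch of the pattern: `subprocess.run` with `check=True` raises `CalledProcessError` on a non-zero return code instead of letting the failure go unnoticed. The `run_command` wrapper is a hypothetical name, and the Python interpreter stands in for the opam binary so the example runs anywhere.

```python
import subprocess
import sys
from typing import List


def run_command(cmd: List[str]) -> str:
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with a
    non-zero return code.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


output = run_command([sys.executable, "-c", "print('ok')"])
```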
-
Antoine Lambert authored
When using the loader outside of a production context (in docker for instance), the opam root folder is usually not present, so ensure it is initialized, or that its repositories are updated, otherwise the loading of packages will fail.
-
Antoine Lambert authored
opam 2.1 stores package metadata in a tarball instead of having its content unpacked as with previous opam versions, so the workaround of walking the inner directories of the opam root to get all versions of a package no longer works with opam 2.1. So use the proper way to get all versions of a package, by calling the following command:

    $ opam show --root <opam_root> <package> --field all-versions

The previous issue reported in the TODO comment was due to the fact that new opam repositories were not added to the default opam switch. This is now fixed, and package metadata from all repositories can be displayed using the opam show command.
-
Antoine Lambert authored
opam 2.1 removed some CLI options and our test data is in opam 2.0 format, so we need to force its upgrade to the 2.1 format.
-
- Jun 21, 2023
-
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
The default behavior of subprocess is to pull executables from a hardcoded list, which doesn't work when opam is installed manually in the user's home directory.
-
Nicolas Dandrimont authored
-
- Jun 08, 2023
-
-
Antoine R. Dumont authored
As it's already running in production, the 'tar' visit type is now immutable. So we cannot change anything related to it. So, instead rename it as before and adapt the check utility functions to allow those checks to be bypassed for some specific cases (like the ArchiveLoader). New loaders should fix their assertion failure immediately though. Refs. swh/infra/sysadm-environment#4906
-
Co-Authored: Antoine Lambert <antoine.lambert@inria.fr>
-
Antoine R. Dumont authored
This:
- adds a generic test on all tasks modules of the package to ensure the task and lister visit types are ok, so scheduling can happen;
- exposes a utility function so we can use it in other loader modules;
- fixes the archive loader, which was misnamed, and adapts its task and its loader's visit types accordingly.

This now needs a new deployment and a migration script on the scheduler db for that loader. Refs. swh/infra/sysadm-environment#4906
-
Antoine R. Dumont authored
If there is, that will result in tasks not being scheduled when asked to. So now, this test will specifically catch such errors. Equivalent tests should be declared for all "tasks" modules in the various swh packages. This detected the current discrepancy for the TarballDirectoryLoader. Refs. swh/infra/sysadm-environment#4906
-
- Jun 06, 2023
-
-
That should allow centralized configuration for loaders without convolutions in the sysadm dimension. That is generically adding options that may not be understood by all loaders without making them fail. The ones supporting those options will use it. The others not supporting those will just not use it. This is an alternative implementation of loosening the base loader constructor to accept kwargs which could pose side-effecty problems. Refs. swh/infra/ci-cd/swh-charts!62
-
- Jun 01, 2023
-
-
Jérémy Bobbio (Lunar) authored
The recent refactoring done in abe741a9, then used in swh-loader-svn@72dfc411, introduced build issues with the documentation:

…/swh/loader/svn/directory.py:docstring of swh.loader.svn.directory.SvnDirectoryLoader.snapshot:1: WARNING: more than one target found for cross-reference 'Snapshot': swh.fuse.fs.artifact.Snapshot, swh.model.model.Snapshot

…/swh/loader/svn/directory.py:docstring of swh.loader.svn.directory.SvnDirectoryLoader.cnts:1: WARNING: more than one target found for cross-reference 'Content': swh.fuse.fs.artifact.Content, swh.model.from_disk.Content, swh.model.model.Content

…/swh/loader/svn/directory.py:docstring of swh.loader.svn.directory.SvnDirectoryLoader.dirs:1: WARNING: more than one target found for cross-reference 'Directory': swh.fuse.fs.artifact.Directory, swh.model.from_disk.Directory, swh.model.model.Directory

vlorentz explained the cause:

> SvnDirectoryLoader inherits from BaseDirectoryLoader which inherits
> from NodeLoader, which defines:
>
>     self.snapshot: Optional[Snapshot] = None
>
> and it loses the annotation's value (only keeps its string
> representation) because of the inheritence:
> https://github.com/sphinx-doc/sphinx/issues/10124

In order to fix this, we now use a qualified type reference in the initializers of NodeLoader and BaseDirectoryLoader.
-
- May 31, 2023
-
-
Antoine R. Dumont authored
The current and default snapshot built is fine for the TarballDirectoryLoader. We may have to improve the snapshot for directory coming from VCS (where they can be associated to "tag" or something else as well). Refs. swh/meta#4979
-
Antoine R. Dumont authored
This deduplicates current duplicated function in loader git and svn tests. Refs. swh/meta#4979
-
- May 30, 2023
-
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
Antoine R. Dumont authored
This moves the path mangling into the tarball directory loader. This is the root cause of an issue in the loader git directory's current implementation. Refs. swh/meta#4979
-
- May 25, 2023
-
-
Antoine R. Dumont authored
Refs. swh/meta#4979
-
Antoine R. Dumont authored
The former is the base class and the latter is one specific implementation for tarballs. Previously it was a single class. This makes the current use for tarball directories explicit. It will also allow declaring other directory loader implementations to deal with other origin types (VCS 'tree' such as git, svn, hg). For this, developers need to provide a new class implementation per directory type. This class needs to inherit from BaseDirectoryLoader and provide only the `fetch_directory` method. BaseDirectoryLoader is now an abstract class and TarballDirectoryLoader is the first implementation, in charge of ingesting a 'directory' coming from a tarball. Further MRs will be opened to deal with directories coming from Git, Hg or Svn in their respective loader packages. Refs. swh/meta#4979
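The split between an abstract base class and one concrete implementation per directory type could look roughly like this. The classes below are simplified stand-ins: the real BaseDirectoryLoader has a different constructor, and the actual signature of `fetch_directory` is not shown in this log, so the one used here is an assumption.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class BaseDirectoryLoaderSketch(ABC):
    """Stand-in for the abstract base: shared logic lives here."""

    @abstractmethod
    def fetch_directory(self) -> Path:
        """Return the local path of the directory to ingest (assumed shape)."""

    def load(self) -> List[str]:
        # Shared processing, identical for every directory type.
        directory = self.fetch_directory()
        return sorted(entry.name for entry in directory.iterdir())


class LocalDirectoryLoader(BaseDirectoryLoaderSketch):
    """Hypothetical subclass: only the fetching step is specific."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def fetch_directory(self) -> Path:
        return self.path
```

A git, hg or svn variant would follow the same shape, overriding only the fetching step.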
-
- May 15, 2023
-
-
vlorentz authored
"Changed in version 3.11: The population must be a sequence. Automatic conversion of sets to lists is no longer supported." -- https://docs.python.org/3/library/random.html#random.sample
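A short sketch of a fix compatible with that change: convert the set to a sequence before sampling. Sorting is one possible choice, which also makes the result reproducible for a given seed. The `sample_from_set` helper is a hypothetical name.

```python
import random
from typing import AbstractSet, List, Optional


def sample_from_set(
    items: AbstractSet[str], k: int, seed: Optional[int] = None
) -> List[str]:
    """Sample k distinct items from a set.

    random.sample requires a sequence since Python 3.11, so the set is
    converted to a sorted list first.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(items), k)
```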
-
- May 05, 2023
- Apr 26, 2023
-
-
Antoine R. Dumont authored
NodeLoader as in {Content|Directory}Loader. Those can directly ingest, respectively, a (remote) file (as Content) or a (remote) tarball artifact (as Directory). They are currently listed by the nixguix lister. Depending on the checksum_layout (standard, nar), they will compute checksums differently. With the "standard" checksum layout, they compute the usual "swh" checksums (sha1, sha1_git, ...) and validate them. With the "nar" checksum layout, they compute the "nar" checksums and store those (once validated) as ExtID. This keeps the node loader constructor retro-compatible with the previous version. It still deals with `checksums_computation` as `checksum_layout` and falls back to the "standard" layout when nothing is provided. Refs. swh/meta#4979
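The retro-compatible handling of the two parameter names could be sketched as below. `resolve_checksum_layout` is a hypothetical helper. The precedence given to `checksum_layout` when both names are passed, and the rejection of unknown values, are assumptions not stated in the commit message.

```python
from typing import Optional

SUPPORTED_LAYOUTS = ("standard", "nar")


def resolve_checksum_layout(
    checksum_layout: Optional[str] = None,
    checksums_computation: Optional[str] = None,
) -> str:
    """Pick the layout, accepting the legacy parameter name as an alias."""
    # New name first (assumed), then legacy name, then the default.
    layout = checksum_layout or checksums_computation or "standard"
    if layout not in SUPPORTED_LAYOUTS:
        raise ValueError(f"Unsupported checksum layout: {layout!r}")
    return layout
```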
-