Forked from Platform / Development / swh-loader-git
54 commits behind the upstream repository.
Antoine Lambert authored
The git directory loader is used to archive guix source packages where
source code is located in a git repository at a specific reference.

To ensure SWH archives the exact same set of source code files for a
guix package, the recursive NAR hash of the source code directory is
computed and compared against the one computed by guix.

Previously, the loader always fetched git submodules when the repository
defined any, but guix only fetches them for a few packages, not for all
git-based ones. This could result in a directory hash mismatch when the
loader fetched submodules that guix did not.

To work around this, first compute the NAR hash without fetching
submodules and, if this results in a directory hash mismatch, retry
the operation with the submodules fetched.

Related to #4751.
f9c18e78
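The fallback strategy described in the commit message can be sketched as follows. This is an illustration of the logic only, not the loader's implementation: `checkout_matching_hash`, `DirectoryHashMismatch` and both callables are hypothetical. `checkout(fetch_submodules)` is assumed to check out the git reference into a fresh directory and return its path, and `nar_hash(directory)` is assumed to compute the recursive NAR hash in the same encoding as the expected guix hash.

```python
from typing import Callable, Tuple


class DirectoryHashMismatch(Exception):
    """Hypothetical error: no checkout variant matches the expected NAR hash."""


def checkout_matching_hash(
    checkout: Callable[[bool], str],
    nar_hash: Callable[[str], str],
    expected_hash: str,
) -> Tuple[str, bool]:
    """Return (directory, submodules_fetched) for the first checkout whose hash matches.

    The checkout without submodules is tried first, because guix only fetches
    submodules for a few packages; submodules are fetched only after a mismatch.
    """
    for fetch_submodules in (False, True):
        directory = checkout(fetch_submodules)
        if nar_hash(directory) == expected_hash:
            return directory, fetch_submodules
    raise DirectoryHashMismatch(f"no checkout matches NAR hash {expected_hash}")
```

Trying the cheaper checkout first means that, for the common case of packages built without submodules, the submodules are never downloaded at all.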

swh-loader-git

The Software Heritage Git Loader is a tool and a library that walks a local Git repository and injects into the SWH dataset all the files it contains that were not already known.
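Knowing which files are "already known" relies on intrinsic identifiers: for file contents, SWH's `sha1_git` is the same as git's blob id, a SHA-1 over a `blob <length>\0` header followed by the raw bytes. The following simplified sketch is not the loader's implementation. It walks the files of a working tree rather than git's object database, and the hypothetical `unknown_blobs` function with its `known_ids` set stands in for a lookup against the archive.

```python
import hashlib
import os
from typing import Dict, Iterable


def git_blob_id(content: bytes) -> str:
    """Compute the git blob id (SWH "sha1_git") of a file's content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def unknown_blobs(root: str, known_ids: Iterable[str]) -> Dict[str, str]:
    """Map the blob id of every file under root not in known_ids to one path holding it."""
    known = set(known_ids)
    new: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git metadata
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as f:
                blob_id = git_blob_id(f.read())
            if blob_id not in known:
                # Identical contents share one id, so each is reported once.
                new.setdefault(blob_id, path)
    return new
```

Because the id depends only on the content, a file already archived from any other repository is skipped without being sent again.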

The main entry points are:

  • :class:`swh.loader.git.loader.GitLoader`, the main loader, which can ingest the contents of either local or remote git repositories. This is the implementation deployed in production.

  • :class:`swh.loader.git.from_disk.GitLoaderFromDisk`, which ingests only local git clones.

  • :class:`swh.loader.git.loader.GitLoaderFromArchive`, which ingests a git repository wrapped in an archive.

  • :class:`swh.loader.git.directory.GitCheckoutLoader`, which ingests a git tree at a specific commit, branch or tag.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See top-level LICENSE file for the full text of the GNU General Public License along with this program.

Dependencies

Runtime

  • python3
  • python3-dulwich
  • python3-retrying
  • python3-swh.core
  • python3-swh.model
  • python3-swh.storage
  • python3-swh.scheduler

Test

  • python3-nose

Requirements

  • implementation language, Python3
  • coding guidelines: conform to PEP8
  • Git access: via dulwich

CLI Run

You can run the loader directly on a remote origin (loader) or on an origin on disk (from_disk) by calling:

swh loader -C <config-file> run git <git-repository-url>

For an origin on disk, use "git_disk" instead of "git".
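When driving the loader from a script, the command line above can be assembled programmatically. This is a minimal sketch: `loader_command` and `run_loader` are hypothetical helpers, the `git_disk` name is taken from the text above, and the on-disk loader may require arguments beyond the origin URL, so check `swh loader run --help` for the installed version.

```python
import subprocess
from typing import List


def loader_command(config_file: str, origin: str, on_disk: bool = False) -> List[str]:
    """Build the `swh loader run` command line for a remote or on-disk origin."""
    loader = "git_disk" if on_disk else "git"
    return ["swh", "loader", "-C", config_file, "run", loader, origin]


def run_loader(config_file: str, origin: str, on_disk: bool = False) -> int:
    """Run the loader and return its exit code (requires the swh CLI on PATH)."""
    return subprocess.run(loader_command(config_file, origin, on_disk)).returncode
```

For example, `run_loader("/tmp/git.yml", "https://example.org/repo.git")` runs the remote loader with the configuration sample below.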

Configuration sample

/tmp/git.yml:

storage:
  cls: remote
  args:
    url: http://localhost:5002/