The Software Heritage Git Loader is a tool and a library that walks a local Git repository and injects into the SWH dataset every file it contains that was not already known.

License
=======

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the top-level LICENSE file for the full text of the GNU General Public License along with this program.

Dependencies
============

Runtime
-------

- python3
- python3-psycopg2
- python3-pygit2

Test
----

- python3-nose

Requirements
============

- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via libgit2/pygit2
- cache: implemented as Postgres tables

Configuration
=============

swh-git-loader depends on a few tools; here are their configuration files.

swh-db-manager
--------------

This tool is now solely in charge of cleaning up the database.

Create a configuration file in **\~/.config/db-manager.ini**:

``` {.ini}
[main]
# Where to store the logs
log_dir = swh-git-loader/log

# database connection string (DSN)
db_url = dbname=swhgitloader
```

See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the syntax of `db_url`.

swh-git-loader
--------------

Create a configuration file in **\~/.config/swh/git-loader.ini**:

``` {.ini}
[main]
# Where to store the logs
log_dir = /tmp/swh-git-loader/log

# how to access the backend (remote or local)
backend-type = remote

# backend-type remote: URL of the backend's REST API
# backend-type local: path to the backend's .ini configuration file (cf. back.ini)
backend = http://localhost:5000
```

Note:

- [DB url syntax (DSN)](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
- the configuration file can be changed on the command line with the flag `-c <config-filepath>` or `--config-file <config-filepath>`
- with `backend-type = remote`, the port in `backend` must match the `port` the backend listens on (set in **\~/.config/swh/back.ini**; the sample values in this README, 5000 and 6000, differ)

swh-backend
-----------

Backend API.

Create a configuration file in **\~/.config/swh/back.ini**:

``` {.ini}
[main]
# where to store blobs on disk
content_storage_dir = /tmp/swh-git-loader/content-storage

# Where to store the logs
log_dir = swh-git-loader/log

# database connection string: dbname=<dbname> (host=<host> port=<port> user=<user> password=<password>)
db_url = dbname=swhgitloader

# activate compression for each stored VCS object
# storage_compression = true

# depth of the on-disk folder hierarchy (e.g. aa/bb/cc/dd)
# folder_depth = 2

# Debugger (for dev only)
debug = true

# server port to listen to requests on
port = 6000
```

See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the syntax of `db_url`.

Run
===

Environment initialization
--------------------------

``` {.bash}
export PYTHONPATH=`pwd`:$PYTHONPATH
```

Backend
-------

### With initialization

This depends on the swh-sql repository, so:

``` {.bash}
cd /path/to/swh-sql && make clean initdb DBNAME=softwareheritage-dev
```

Alternatively, with the Makefile:

``` {.bash}
make drop-db create-db run-back FOLLOW_LOG=-f
```

### Without initialization

Run the backend:
``` {.bash}
./bin/swh-backend -v
```

With the Makefile:

``` {.bash}
make run-back FOLLOW_LOG=-f
```

Help
----

``` {.bash}
bin/swh-git-loader --help
bin/swh-db-manager --help
```

Parse a repository from a clean slate
-------------------------------------

Clean and initialize the model, then parse the Git repository:

``` {.bash}
bin/swh-db-manager cleandb
bin/swh-git-loader load /path/to/git/repo
```

Or, with the Makefile:

``` {.bash}
time make cleandb run REPO_PATH=~/work/inria/repo/swh-git-cloner
```

Parse an existing repository
----------------------------

``` {.bash}
bin/swh-git-loader load /path/to/git/repo
```

Clean data
----------

This truncates the relevant tables in the schema:

``` {.bash}
bin/swh-db-manager cleandb
```

Or, with the Makefile:

``` {.bash}
make cleandb
```

Init data
---------

``` {.bash}
make drop-db create-db
```
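
Selecting new files (sketch)
----------------------------

The loader only injects files that were not known before. The snippet below is a minimal, hypothetical sketch of that selection step in Python; it is not the loader's actual code. It assumes each file is identified by its Git blob id (the SHA-1 Git computes over the header `blob <size>\0` followed by the content) and uses an in-memory set `known_ids` as a stand-in for the Postgres cache. The names `git_blob_id` and `select_unknown_blobs` are illustrative only.

``` {.python}
import hashlib
from typing import Dict, Iterable, Set, Tuple


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def select_unknown_blobs(
    files: Iterable[Tuple[str, bytes]], known_ids: Set[str]
) -> Dict[str, bytes]:
    """Keep only contents whose blob id is not in known_ids.

    Hypothetical helper: known_ids stands in for the ids already stored
    by the backend. Identical contents found under several paths are
    kept once, keyed by their blob id.
    """
    new: Dict[str, bytes] = {}
    for _path, content in files:
        blob_id = git_blob_id(content)
        if blob_id not in known_ids and blob_id not in new:
            new[blob_id] = content
    return new


if __name__ == "__main__":
    files = [
        ("README", b"hello world\n"),
        ("empty", b""),
        ("copy-of-README", b"hello world\n"),
    ]
    # Pretend the backend already knows the empty blob.
    known = {git_blob_id(b"")}
    # Only one entry remains: the README content, stored once.
    print(select_unknown_blobs(files, known))
```

Keying by content hash rather than by path is what lets two copies of the same file, or a file already seen in another repository, be stored only once.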