The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
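
As a rough illustration of "files that weren't known before", the sketch below computes Git blob identifiers for the files of a working tree and keeps only those missing from a set of known identifiers. It uses only the standard library; the loader itself goes through pygit2, and the names `git_blob_id`, `unknown_files` and `known_ids` are hypothetical, not part of this project.

```python
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 that Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def unknown_files(root: str, known_ids: set) -> dict:
    """Map blob id -> path for files under root whose id is not in known_ids."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                blob_id = git_blob_id(f.read())
            if blob_id not in known_ids:
                # Keep the first path seen for a given content.
                found.setdefault(blob_id, path)
    return found
```

Identical contents share one identifier, so a file that appears several times in the tree is reported once.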

License
=======

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.

See the top-level LICENSE file for the full text of the GNU General
Public License.

Dependencies
============

Runtime
-------

-   python3
-   python3-psycopg2
-   python3-pygit2

Test
----

-   python3-nose

Requirements
============

-   implementation language: Python 3
-   coding guidelines: conform to PEP8
-   Git access: via libgit2/pygit2
-   cache: implemented as Postgres tables

Configuration
=============

swh-git-loader depends on a few tools. Their configuration files are
described below.

swh-db-manager
--------------

This tool is now solely in charge of db cleanup.

Create a configuration file in **\~/.config/db-manager.ini**:

``` {.ini}
[main]

# Where to store the logs
log_dir = swh-git-loader/log

# url access to db
db_url = dbname=swhgitloader
```

See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
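
For illustration, a file like the one above can be read with Python's standard `configparser`. This is a minimal sketch, not the tool's actual code; the function name `read_db_manager_config` is hypothetical.

```python
import configparser
import os


def read_db_manager_config(path="~/.config/db-manager.ini"):
    """Return the log_dir and db_url entries of the [main] section."""
    parser = configparser.ConfigParser()
    # Expand "~" ourselves: configparser does not do it.
    with open(os.path.expanduser(path)) as f:
        parser.read_file(f)
    main = parser["main"]
    # Only the first "=" separates key and value, so a db_url such as
    # "dbname=swhgitloader" is kept whole.
    return {"log_dir": main["log_dir"], "db_url": main["db_url"]}
```

The returned `db_url` is a connection string of the kind `psycopg2.connect` accepts as its first argument.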

swh-git-loader
--------------

Create a configuration file in **\~/.config/swh/git-loader.ini**:

``` {.ini}
[main]
# Where to store the logs
log_dir = /tmp/swh-git-loader/log

# how to access the backend (remote or local)
backend-type = remote

# backend-type remote: url access to api rest's backend
# backend-type local: configuration file to backend file .ini (cf. back.ini file)
backend = http://localhost:5000
```

Note:

-   [DB url
    DSL](http://initd.org/psycopg/docs/module.html#psycopg2.connect)
-   the configuration file can be changed in the CLI with the flag
    `-c <config-filepath>` or `--config-file <config-filepath>`
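
The flag handling and the `backend-type` switch described above can be sketched as follows. This is an assumption-laden illustration using `argparse` and `configparser`, not the loader's real entry point: `parse_args`, `read_loader_config` and `DEFAULT_CONFIG` are hypothetical names, and the real CLI also takes commands such as `load`.

```python
import argparse
import configparser
import os

# Default location taken from the section above.
DEFAULT_CONFIG = "~/.config/swh/git-loader.ini"


def parse_args(argv):
    """Parse only the -c/--config-file flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config-file", default=DEFAULT_CONFIG,
                        help="path to the configuration file")
    return parser.parse_args(argv)


def read_loader_config(path):
    """Read the [main] section and check backend-type."""
    parser = configparser.ConfigParser()
    with open(os.path.expanduser(path)) as f:
        parser.read_file(f)
    main = parser["main"]
    backend_type = main["backend-type"]
    if backend_type not in ("remote", "local"):
        raise ValueError("backend-type must be 'remote' or 'local'")
    # For "remote", backend is the REST API url; for "local", it is the
    # path to the backend .ini file.
    return {
        "log_dir": main["log_dir"],
        "backend_type": backend_type,
        "backend": main["backend"],
    }
```

Typical use would be `read_loader_config(parse_args(sys.argv[1:]).config_file)`.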

swh-backend
-----------

Backend API.

Create a configuration file in **\~/.config/swh/back.ini**:

``` {.ini}
[main]

# where to store blob on disk
content_storage_dir = /tmp/swh-git-loader/content-storage

# Where to store the logs
log_dir = swh-git-loader/log

# url access to db: dbname=<dbname> (host=<host> port=<port> user=<user> password=<pass>)
db_url = dbname=swhgitloader

# activate the compression for each vcs stored object
# storage_compression = true

# compute folder's depth on disk aa/bb/cc/dd
# folder_depth = 2

# Debugger (for dev only)
debug = true

# server port to listen to requests
port = 6000
```

See <http://initd.org/psycopg/docs/module.html#psycopg2.connect> for the
format of the db url.
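
To make the `folder_depth` comment concrete, here is one way an `aa/bb/...` layout could be derived from an object's hexadecimal identifier. It is a sketch under assumptions: the function name `content_path`, the use of two characters per level, and the full identifier as file name are not confirmed by this README.

```python
import os


def content_path(storage_dir, hex_id, folder_depth=2):
    """Return storage_dir/aa/bb/.../hex_id with folder_depth levels."""
    if len(hex_id) < 2 * folder_depth:
        raise ValueError("identifier too short for the requested depth")
    # Each level is named after the next two characters of the identifier.
    levels = [hex_id[2 * i:2 * i + 2] for i in range(folder_depth)]
    return os.path.join(storage_dir, *levels, hex_id)
```

With `folder_depth = 2`, an identifier starting with `aabb` would land under `aa/bb/` inside `content_storage_dir`.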

Run
===

Environment initialization
--------------------------

``` {.bash}
export PYTHONPATH="$(pwd):$PYTHONPATH"
```

Backend
-------

### With initialization

This depends on the swh-sql repository, so initialize the db from there first:

``` {.bash}
cd /path/to/swh-sql && make clean initdb DBNAME=softwareheritage-dev
```

The Makefile makes this easier:

``` {.bash}
make drop-db create-db run-back FOLLOW_LOG=-f
```

### Without initialization

Run the backend:

``` {.bash}
./bin/swh-backend -v
```

With the Makefile:

``` {.bash}
make run-back FOLLOW_LOG=-f
```

Help
----

``` {.bash}
bin/swh-git-loader --help
bin/swh-db-manager --help
```

Parse a repository from a clean slate
-------------------------------------

Clean and initialize the model, then parse the Git repository:

``` {.bash}
bin/swh-db-manager cleandb
bin/swh-git-loader load /path/to/git/repo
```

Or, with the Makefile:

``` {.bash}
time make cleandb run REPO_PATH=~/work/inria/repo/swh-git-cloner
```

Parse an existing repository
----------------------------

``` {.bash}
bin/swh-git-loader load /path/to/git/repo
```

Clean data
----------

This will truncate the relevant table in the schema:

``` {.bash}
bin/swh-db-manager cleandb
```

Or, with the Makefile:

``` {.bash}
make cleandb
```

Init data
---------

``` {.bash}
make drop-db create-db
```