Nancy Chauhan authored
swh-lister
This component of the Software Heritage stack produces listings of software origins and their URLs, as hosted on various public developer platforms or package managers. Because these listing operations are quite similar across platforms, it provides a set of Python modules that abstract the common listing behaviors.
It also provides several lister implementations, contained in the following Python modules:
swh.lister.bitbucket
swh.lister.debian
swh.lister.github
swh.lister.gitlab
swh.lister.gnu
swh.lister.pypi
swh.lister.npm
swh.lister.phabricator
swh.lister.cran
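To check which of these modules can be located in a given environment, a small probe along the following lines may help. This is a minimal sketch, not part of swh-lister: the helper names `module_name` and `available_listers` are hypothetical, and the only assumption taken from the list above is that each lister lives under the `swh.lister` package.

```python
import importlib.util

# Lister names taken from the module list above.
LISTER_NAMES = [
    "bitbucket", "debian", "github", "gitlab", "gnu",
    "pypi", "npm", "phabricator", "cran",
]


def module_name(lister_name):
    """Return the dotted module path for a lister, e.g. 'swh.lister.github'."""
    return f"swh.lister.{lister_name}"


def available_listers(names=LISTER_NAMES):
    """Hypothetical helper: return the names whose module can be located."""
    found = []
    for name in names:
        try:
            # Note: find_spec imports the parent packages as a side effect.
            spec = importlib.util.find_spec(module_name(name))
        except ImportError:
            # Raised when a parent package (swh or swh.lister) is missing.
            spec = None
        if spec is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    print(available_listers())
```

An empty result simply means that none of the modules could be found, for instance because the package is not installed in the current environment.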
Dependencies
All required dependencies can be found in the requirements*.txt files located at the root of the repository.
Local deployment
Lister configuration
Each lister implemented so far by Software Heritage (github, gitlab, debian, pypi, npm) must be configured by following the instructions below. Note that you have to replace <lister_name> with one of the lister names introduced above.
Preparation steps
- create the directories: mkdir -p ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/
- create the configuration file ~/.config/swh/lister_<lister_name>.yml
- bootstrap the db instance schema:
$ createdb lister-<lister_name>
$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>
Note: This bootstraps a minimum data set needed for the lister to run.
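The directory and file-path steps above can also be scripted. The following is a minimal sketch under stated assumptions: the function name `prepare_lister_dirs` and its `home` parameter are hypothetical, and it covers only the directory creation and the configuration file path, not the database bootstrap.

```python
from pathlib import Path


def prepare_lister_dirs(lister_name, home=None):
    """Create the config and cache directories; return the config file path.

    `home` is a hypothetical override, useful for trying this in a scratch
    directory; by default the current user's home directory is used.
    """
    home = Path(home) if home is not None else Path.home()
    config_dir = home / ".config" / "swh"
    cache_dir = home / ".cache" / "swh" / "lister" / lister_name
    # parents=True also creates missing intermediate directories;
    # exist_ok=True makes running the script twice harmless.
    config_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The configuration file itself is not created here, only its path.
    return config_dir / f"lister_{lister_name}.yml"


if __name__ == "__main__":
    print(prepare_lister_dirs("github"))
```

The returned path is where the configuration described in the next section is expected to live.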
Configuration file sample
Minimal configuration shared by all listers, to add to the file ~/.config/swh/lister_<lister_name>.yml:
storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'

scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'

lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
    db: 'postgresql:///lister-<lister_name>'

credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/<lister_name>/
Note: This expects the storage (port 5002) and scheduler (port 5008) services to be running locally.
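Since the sample differs between listers only in the <lister_name> placeholder, it can be generated from a template. The sketch below is illustrative and makes assumptions: the names `CONFIG_TEMPLATE` and `render_config` are hypothetical, and the nesting of keys (in particular, that `credentials`, `cache_responses` and `cache_dir` sit at the top level) is an assumption about the intended YAML layout.

```python
from pathlib import PurePosixPath

# Assumed layout of the sample configuration; the key nesting is an
# assumption, and only the lister name and cache path are filled in.
CONFIG_TEMPLATE = """\
storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'

scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'

lister:
  cls: 'local'
  args:
    db: 'postgresql:///lister-{name}'

credentials: []
cache_responses: True
cache_dir: {cache_dir}/
"""


def render_config(lister_name, home="/home/user"):
    """Hypothetical helper: return the sample config text for one lister."""
    cache_dir = PurePosixPath(home) / ".cache" / "swh" / "lister" / lister_name
    return CONFIG_TEMPLATE.format(name=lister_name, cache_dir=cache_dir)


if __name__ == "__main__":
    print(render_config("github"))
```

The output can be redirected into ~/.config/swh/lister_<lister_name>.yml; the storage and scheduler URLs are copied from the sample and would need adjusting if those services run elsewhere.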
lister-github
Once configured, you can execute a GitHub lister using the following instructions in a python3 script:
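import logging

from swh.lister.github.tasks import range_github_lister

# Print debug-level log messages while the lister runs
logging.basicConfig(level=logging.DEBUG)
# Run the lister over the range bounded by 364 and 365
range_github_lister(364, 365)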
...