swh-lister

This component of the Software Heritage stack produces listings of software origins, and their URLs, hosted on various public developer platforms or package managers. Because these operations are quite similar across platforms, it provides a set of Python modules that abstract common origin-listing behaviors.

It also provides several lister implementations, contained in the following Python modules:

  • swh.lister.bitbucket
  • swh.lister.debian
  • swh.lister.github
  • swh.lister.gitlab
  • swh.lister.gnu
  • swh.lister.pypi
  • swh.lister.npm
  • swh.lister.phabricator
  • swh.lister.cran

Dependencies

All required dependencies can be found in the requirements*.txt files located at the root of the repository.

Local deployment

lister configuration

Each lister implemented so far by Software Heritage (github, gitlab, debian, pypi, npm) must be configured by following the instructions below. Replace <lister_name> with one of the lister names introduced above.

Preparation steps

  1. mkdir -p ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/
  2. Create the configuration file ~/.config/swh/lister_<lister_name>.yml
  3. Bootstrap the db instance schema:
$ createdb lister-<lister_name>
$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>

Note: This bootstraps the minimal data set the lister needs to run.
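
The directory and file-naming conventions used in the preparation steps can also be scripted. The following is a minimal sketch using only the Python standard library; `prepare_lister_paths` and its `home` parameter are hypothetical names introduced for illustration and are not part of swh-lister. It does not create the database, which still has to be bootstrapped as described above.

```python
from pathlib import Path


def prepare_lister_paths(lister_name: str, home: Path) -> dict:
    """Create the config and cache directories for one lister.

    Hypothetical helper, not part of swh-lister: it only mirrors the
    path conventions described in the preparation steps.
    """
    config_dir = home / ".config" / "swh"
    cache_dir = home / ".cache" / "swh" / "lister" / lister_name

    # parents=True behaves like `mkdir -p`; exist_ok=True makes reruns safe.
    config_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)

    return {
        "config_file": config_dir / f"lister_{lister_name}.yml",
        "cache_dir": cache_dir,
        "db_name": f"lister-{lister_name}",
    }
```

For example, `prepare_lister_paths("github", Path.home())` would create both directories under the current user's home and return the expected configuration file path and database name.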

Configuration file sample

Minimal configuration shared by all listers, to be added to the file ~/.config/swh/lister_<lister_name>.yml:

storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'

scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'

lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
    db: 'postgresql:///lister-<lister_name>'

credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/<lister_name>/

Note: This expects the storage (port 5002) and scheduler (port 5008) services to be running locally.
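
Since only `<lister_name>` changes from one lister to the next, the sample above can be generated rather than copied by hand. The sketch below fills the same text in with `string.Template`; `render_lister_config`, `CONFIG_TEMPLATE`, and the `cache_root` parameter are hypothetical names, not part of swh-lister, and the default cache root simply reuses the `/home/user` path from the sample.

```python
from pathlib import PurePosixPath
from string import Template

# Same content as the sample configuration, with the lister-specific
# parts turned into placeholders.
CONFIG_TEMPLATE = Template("""\
storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'

scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'

lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
    db: 'postgresql:///lister-$lister_name'

credentials: []
cache_responses: True
cache_dir: $cache_dir
""")


def render_lister_config(
    lister_name: str,
    cache_root: str = "/home/user/.cache/swh/lister",
) -> str:
    """Return the sample configuration text for one lister (hypothetical helper)."""
    cache_dir = PurePosixPath(cache_root) / lister_name
    return CONFIG_TEMPLATE.substitute(
        lister_name=lister_name,
        cache_dir=f"{cache_dir}/",  # keep the trailing slash used in the sample
    )
```

The returned string can then be written to ~/.config/swh/lister_<lister_name>.yml, for instance with `pathlib.Path.write_text`.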

lister-github

Once configured, you can run the GitHub lister with the following instructions in a Python 3 script:

import logging
from swh.lister.github.tasks import range_github_lister

logging.basicConfig(level=logging.DEBUG)
range_github_lister(364, 365)
...