From 7d192a2f1b4507f6bf043457f2eff7bbde6403c3 Mon Sep 17 00:00:00 2001
From: Antoine Lambert <antoine.lambert@inria.fr>
Date: Mon, 13 May 2019 15:20:05 +0200
Subject: [PATCH] README.md: Fix outdated instructions and improve formatting

---
 README.md | 282 ++++++++++++++++++++++++++----------------------------
 1 file changed, 135 insertions(+), 147 deletions(-)

diff --git a/README.md b/README.md
index d272eed8..24ebb042 100644
--- a/README.md
+++ b/README.md
@@ -1,192 +1,180 @@
-SWH-lister
-============
+swh-lister
+==========
 
-The Software Heritage Lister is both a library module to permit to
-centralize lister behaviors, and to provide lister implementations.
+This component of the Software Heritage stack aims to produce listings of
+software origins and their URLs hosted on various public developer platforms
+or package managers. As these listing operations are quite similar, it provides
+a set of Python modules abstracting common software origin listing behaviors.
 
-Actual lister implementations are:
-
-- swh-lister-bitbucket
-- swh-lister-debian
-- swh-lister-github
-- swh-lister-gitlab
-- swh-lister-pypi
-
-Licensing
-----------
-
-This program is free software: you can redistribute it and/or modify it under
-the terms of the GNU General Public License as published by the Free Software
-Foundation, either version 3 of the License, or (at your option) any later
-version.
-
-This program is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
-PARTICULAR PURPOSE.  See the GNU General Public License for more details.
-
-See top-level LICENSE file for the full text of the GNU General Public License
-along with this program.
+It also provides several lister implementations, contained in the
+following Python modules:
 
+- `swh.lister.bitbucket`
+- `swh.lister.debian`
+- `swh.lister.github`
+- `swh.lister.gitlab`
+- `swh.lister.pypi`
+- `swh.lister.npm`
 
 Dependencies
 ------------
 
-- python3
-- python3-requests
-- python3-sqlalchemy
-
-More details in requirements*.txt
-
+All required dependencies can be found in the `requirements*.txt` files located
+at the root of the repository.
 
 Local deployment
------------
-
-## lister-github
-
-### Preparation steps
-
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/github.com/
-3. create configuration file ~/.config/swh/lister-github.com.yml
-4. Bootstrap the db instance schema
+----------------
 
-    $ createdb lister-github
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-github github
+## lister configuration
 
-### Configuration file sample
-
-Minimalistic configuration:
-
-    $ cat ~/.config/swh/lister-github.com.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-github
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/github.com
-
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+Each lister documented below (`github`, `gitlab`, `debian`, `pypi`, `npm`)
+must be configured by following the instructions in this section (replace
+`<lister_name>` with one of these lister names).
 
-### Run
+### Preparation steps
 
-    $ python3
-    >>> import logging
-    >>> logging.basicConfig(level=logging.DEBUG)
-    >>> from swh.lister.github.tasks import range_github_lister; range_github_lister(364, 365)
-    INFO:root:listing repos starting at 364
-    DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
-    DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
-    DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
-    DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
+1. Create the configuration and cache directories: `mkdir -p ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/`
+2. Create the configuration file `~/.config/swh/lister_<lister_name>.yml` (see the sample below)
+3. Bootstrap the database schema:
 
+```lang=bash
+$ createdb lister-<lister_name>
+$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>
+```
 
-## lister-gitlab
+Note: This bootstraps a minimum data set needed for the lister to run.
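+
+The directory creation from step 1 can also be done from Python. The sketch
+below assumes the default `~/.config` and `~/.cache` locations; `prepare_lister_dirs`
+is a hypothetical helper written for illustration, not part of `swh.lister`. It
+creates the directories (like `mkdir -p`) and returns the configuration file path
+and database name used by the other steps.
+
+```lang=python
+from pathlib import Path
+
+
+def prepare_lister_dirs(lister_name, home=None):
+    """Hypothetical helper (not part of swh.lister).
+
+    Mirrors `mkdir -p ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/`
+    and returns the paths and database name used in the steps above.
+    """
+    home = Path(home) if home is not None else Path.home()
+    config_dir = home / '.config' / 'swh'
+    cache_dir = home / '.cache' / 'swh' / 'lister' / lister_name
+    # parents=True and exist_ok=True behave like `mkdir -p`
+    config_dir.mkdir(parents=True, exist_ok=True)
+    cache_dir.mkdir(parents=True, exist_ok=True)
+    return {
+        'config_file': config_dir / 'lister_{}.yml'.format(lister_name),
+        'cache_dir': cache_dir,
+        'db_name': 'lister-{}'.format(lister_name),
+    }
+```
+
+For instance, `prepare_lister_dirs('npm')['db_name']` returns `'lister-npm'`, the
+database name to pass to `createdb`.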
 
-### preparation steps
+### Configuration file sample
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/gitlab/
-3. create configuration file ~/.config/swh/lister-gitlab.yml
-4. Bootstrap the db instance schema
+Minimal configuration, shared by all listers, to put in `~/.config/swh/lister_<lister_name>.yml`:
 
-    $ createdb lister-gitlab
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-gitlab gitlab
+```lang=yml
+storage:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5002/'
 
-### Configuration file sample
+scheduler:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5008/'
 
-    $ cat ~/.config/swh/lister-gitlab.yml
+lister:
+  cls: 'local'
+  args:
     # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-gitlab
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/gitlab
+    db: 'postgresql:///lister-<lister_name>'
+
+credentials: []
+cache_responses: True
+cache_dir: /home/user/.cache/swh/lister/<lister_name>/
+```
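+
+If you prefer not to edit the file by hand, the minimal configuration above can
+be generated. This is only a sketch: `write_lister_config` is a hypothetical helper
+(not part of `swh.lister`), its template is a copy of the sample above, and it
+assumes the cache directory lives under your own home directory rather than
+`/home/user`.
+
+```lang=python
+from pathlib import Path
+
+# Copy of the minimal configuration sample above; {name} and {cache_dir}
+# are filled in by the hypothetical write_lister_config helper.
+CONFIG_TEMPLATE = """\
+storage:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5002/'
+
+scheduler:
+  cls: 'remote'
+  args:
+    url: 'http://localhost:5008/'
+
+lister:
+  cls: 'local'
+  args:
+    db: 'postgresql:///lister-{name}'
+
+credentials: []
+cache_responses: True
+cache_dir: {cache_dir}/
+"""
+
+
+def write_lister_config(lister_name, home=None):
+    """Hypothetical helper: write ~/.config/swh/lister_<lister_name>.yml."""
+    home = Path(home) if home is not None else Path.home()
+    cache_dir = home / '.cache' / 'swh' / 'lister' / lister_name
+    config_file = home / '.config' / 'swh' / 'lister_{}.yml'.format(lister_name)
+    config_file.parent.mkdir(parents=True, exist_ok=True)
+    config_file.write_text(
+        CONFIG_TEMPLATE.format(name=lister_name, cache_dir=cache_dir))
+    return config_file
+```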
 
 Note: This expects storage (5002) and scheduler (5008) services to run locally
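+
+A quick way to check that both services are listening is sketched below with the
+Python standard library. `services_reachable` is a hypothetical helper: it only
+checks that a TCP connection can be opened on ports 5002 and 5008, not that the
+storage and scheduler APIs answer correctly.
+
+```lang=python
+import socket
+
+
+def services_reachable(host='localhost', ports=(5002, 5008), timeout=1.0):
+    """Hypothetical helper: map each port to True if a TCP connection succeeds."""
+    status = {}
+    for port in ports:
+        try:
+            # create_connection raises OSError on refusal or timeout
+            with socket.create_connection((host, port), timeout=timeout):
+                status[port] = True
+        except OSError:
+            status[port] = False
+    return status
+
+
+if __name__ == '__main__':
+    # e.g. {5002: True, 5008: False} if only the storage service is running
+    print(services_reachable())
+```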
 
-### Run
+## lister-github
 
-    $ python3
-    Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-    [GCC 8.1.0] on linux
-    Type "help", "copyright", "credits" or "license" for more information.
-    >>> from swh.lister.gitlab.tasks import range_gitlab_lister; range_gitlab_lister(1, 2,
-      {'instance': 'debian', 'api_baseurl': 'https://salsa.debian.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import full_gitlab_relister; full_gitlab_relister(
-      {'instance':'0xacab', 'api_baseurl':'https://0xacab.org/api/v4', 'sort': 'asc', 'per_page': 20})
-    >>> from swh.lister.gitlab.tasks import incremental_gitlab_lister; incremental_gitlab_lister(
-      {'instance': 'freedesktop.org', 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
-       'sort': 'asc', 'per_page': 20})
+Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
 
-## lister-debian
+```lang=python
+import logging
+from swh.lister.github.tasks import range_github_lister
 
-### preparation steps
+logging.basicConfig(level=logging.DEBUG)
+range_github_lister(364, 365)
+...
+```
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/debian/
-3. create configuration file ~/.config/swh/lister-debian.yml
-4. Bootstrap the db instance schema
+## lister-gitlab
 
-    $ createdb lister-debian
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-debian debian
- 
-    Note: This bootstraps a minimum data set needed for the debian
-    lister to run (for development)
+Once configured, you can execute a GitLab lister in range, full or incremental mode using the `python3` scripts below:
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import range_gitlab_lister
+
+logging.basicConfig(level=logging.DEBUG)
+range_gitlab_lister(1, 2, {
+    'instance': 'debian',
+    'api_baseurl': 'https://salsa.debian.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import full_gitlab_relister
+
+logging.basicConfig(level=logging.DEBUG)
+full_gitlab_relister({
+    'instance': '0xacab',
+    'api_baseurl': 'https://0xacab.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
+
+```lang=python
+import logging
+from swh.lister.gitlab.tasks import incremental_gitlab_lister
+
+logging.basicConfig(level=logging.DEBUG)
+incremental_gitlab_lister({
+    'instance': 'freedesktop.org',
+    'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
+    'sort': 'asc',
+    'per_page': 20
+})
+```
 
-### Configuration file sample
+## lister-debian
 
-    $ cat ~/.config/swh/lister-debian.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-debian
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/debian
+Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+```lang=python
+import logging
+from swh.lister.debian.tasks import debian_lister
 
-### Run
+logging.basicConfig(level=logging.DEBUG)
+debian_lister('Debian')
+```
 
-  $ python3
-  Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-  [GCC 8.1.0] on linux
-  Type "help", "copyright", "credits" or "license" for more information.
-  >>> import logging; logging.basicConfig(level=logging.DEBUG); from swh.lister.debian.tasks import debian_lister; debian_lister('Debian')
-  DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
-  DEBUG:root:Processing area Area(stretch/main of Debian)
-  DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
-  DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
-  ...
+## lister-pypi
 
+Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
 
-## lister-pypi
+```lang=python
+import logging
+from swh.lister.pypi.tasks import pypi_lister
 
-### preparation steps
+logging.basicConfig(level=logging.DEBUG)
+pypi_lister()
+```
 
-1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
-2. mkdir ~/.config/swh/ ~/.cache/swh/lister/pypi/
-3. create configuration file ~/.config/swh/lister-pypi.yml
-4. Bootstrap the db instance schema
+## lister-npm
 
-    $ createdb lister-pypi
-    $ python3 -m swh.lister.cli --db-url postgres:///lister-pypi pypi
+Once configured, you can execute an npm lister using the following instructions in a `python3` script:
 
-    Note: This bootstraps a minimum data set needed for the pypi
-    lister to run (for development)
+```lang=python
+import logging
+from swh.lister.npm.tasks import npm_lister
 
-### Configuration file sample
+logging.basicConfig(level=logging.DEBUG)
+npm_lister()
+```
 
-    $ cat ~/.config/swh/lister-pypi.yml
-    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
-    lister_db_url: postgres:///lister-pypi
-    credentials: []
-    cache_responses: True
-    cache_dir: /home/user/.cache/swh/lister/pypi
+Licensing
+---------
 
-Note: This expects storage (5002) and scheduler (5008) services to run locally
+This program is free software: you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free Software
+Foundation, either version 3 of the License, or (at your option) any later
+version.
 
-### Run
+This program is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
+PARTICULAR PURPOSE.  See the GNU General Public License for more details.
 
-  $ python3
-  Python 3.6.6 (default, Jun 27 2018, 14:44:17)
-  [GCC 8.1.0] on linux
-  Type "help", "copyright", "credits" or "license" for more information.
-  >>> from swh.lister.pypi.tasks import pypi_lister; pypi_lister()
-  >>>
+See top-level LICENSE file for the full text of the GNU General Public License
+along with this program.
\ No newline at end of file
-- 
GitLab