Commit 7d192a2f authored 5 years ago by Antoine Lambert
README.md: Fix outdated instructions and improve formatting
parent 977d2459
1 merge request: !368 Fix outdated README for listers and improve formatting
Showing 1 changed file: README.md, with 135 additions and 147 deletions
SWH-lister
============
swh-lister
==========
The Software Heritage Lister is both a library module to permit to
centralize lister behaviors, and to provide lister implementations.
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
Actual lister implementations are:
- swh-lister-bitbucket
- swh-lister-debian
- swh-lister-github
- swh-lister-gitlab
- swh-lister-pypi
Licensing
----------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.debian`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.pypi`
- `swh.lister.npm`
Dependencies
------------
- python3
- python3-requests
- python3-sqlalchemy
More details in requirements*.txt
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
-----------
## lister-github
### Preparation steps
1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
2. mkdir ~/.config/swh/ ~/.cache/swh/lister/github.com/
3. create configuration file ~/.config/swh/lister-github.com.yml
4. Bootstrap the db instance schema
----------------
$ createdb lister-github
$ python3 -m swh.lister.cli --db-url postgres:///lister-github github
## lister configuration
### Configuration file sample
Minimalistic configuration:
$ cat ~/.config/swh/lister-github.com.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-github
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/github.com
Note: This expects storage (5002) and scheduler (5008) services to run locally
Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
must be configured by following the instructions below (please note that you have to replace
`<lister_name>` with one of the lister names introduced above).
### Run
### Preparation steps
$ python3
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> from swh.lister.github.tasks import range_github_lister; range_github_lister(364, 365)
INFO:root:listing repos starting at 364
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
1. `mkdir ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/`
2. create configuration file `~/.config/swh/lister_<lister_name>.yml`
3. Bootstrap the db instance schema
```lang=bash
$ createdb lister-<lister_name>
$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>
```
## lister-gitlab
Note: This bootstraps a minimum data set needed for the lister to run.
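The preparation steps above can also be sketched in Python. This is a minimal illustration rather than part of swh-lister: the helper names `prepare_lister_dirs` and `bootstrap_commands` are hypothetical, and the directories are created with `mkdir -p` semantics (an assumption that goes slightly beyond the plain `mkdir` shown in step 1). The database commands are only built as argument lists, not executed.

```python
from pathlib import Path
from typing import List

# Lister names taken from the README; the helpers below are illustrative only.
KNOWN_LISTERS = ("github", "gitlab", "debian", "pypi", "npm")


def prepare_lister_dirs(lister_name: str, home: Path) -> Path:
    """Create the directories from step 1 and return the path where the
    configuration file from step 2 is expected."""
    if lister_name not in KNOWN_LISTERS:
        raise ValueError(f"unknown lister: {lister_name!r}")
    config_dir = home / ".config" / "swh"
    cache_dir = home / ".cache" / "swh" / "lister" / lister_name
    # Assumption: behave like `mkdir -p`, so existing directories are fine.
    config_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return config_dir / f"lister_{lister_name}.yml"


def bootstrap_commands(lister_name: str) -> List[List[str]]:
    """Return the two commands from step 3 as argv lists (not executed)."""
    db_name = f"lister-{lister_name}"
    return [
        ["createdb", db_name],
        ["python3", "-m", "swh.lister.cli",
         "--db-url", f"postgres:///{db_name}", lister_name],
    ]
```

Passing the returned lists to `subprocess.run(..., check=True)` would execute them, provided PostgreSQL and swh-lister are installed locally.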
### preparation steps
### Configuration file sample
1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
2. mkdir ~/.config/swh/ ~/.cache/swh/lister/gitlab/
3. create configuration file ~/.config/swh/lister-gitlab.yml
4. Bootstrap the db instance schema
Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_<lister_name>.yml`:
$ createdb lister-gitlab
$ python3 -m swh.lister.cli --db-url postgres:///lister-gitlab gitlab
```lang=yml
storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'
### Configuration file sample
scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'
$ cat ~/.config/swh/lister-gitlab.yml
lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-gitlab
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/gitlab
    db: 'postgresql:///lister-<lister_name>'
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/<lister_name>/
```
Note: This expects storage (5002) and scheduler (5008) services to run locally
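Before running a lister, it can help to confirm that both services are reachable. The sketch below is one hedged way to do that with a plain TCP connection attempt. The port numbers come from the note above; the host name, the timeout, and the helper names `is_listening` and `missing_services` are assumptions made for illustration.

```python
import socket

# Ports come from the note above; host and timeout are assumptions.
LOCAL_SERVICES = {"storage": 5002, "scheduler": 5008}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def missing_services(host: str = "localhost") -> list:
    """Names of the expected services that do not accept connections."""
    return [name for name, port in LOCAL_SERVICES.items()
            if not is_listening(host, port)]
```

An empty list from `missing_services()` only shows that something accepts connections on both ports, not that it is actually the storage or scheduler service.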
### Run
## lister-github
$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.lister.gitlab.tasks import range_gitlab_lister; range_gitlab_lister(1, 2,
{'instance': 'debian', 'api_baseurl': 'https://salsa.debian.org/api/v4', 'sort': 'asc', 'per_page': 20})
>>> from swh.lister.gitlab.tasks import full_gitlab_relister; full_gitlab_relister(
{'instance':'0xacab', 'api_baseurl':'https://0xacab.org/api/v4', 'sort': 'asc', 'per_page': 20})
>>> from swh.lister.gitlab.tasks import incremental_gitlab_lister; incremental_gitlab_lister(
{'instance': 'freedesktop.org', 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
'sort': 'asc', 'per_page': 20})
Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
## lister-debian
```lang=python
import logging
from swh.lister.github.tasks import range_github_lister
### preparation steps
logging.basicConfig(level=logging.DEBUG)
range_github_lister(364, 365)
...
```
1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
2. mkdir ~/.config/swh/ ~/.cache/swh/lister/debian/
3. create configuration file ~/.config/swh/lister-debian.yml
4. Bootstrap the db instance schema
## lister-gitlab
$ createdb lister-debian
$ python3 -m swh.lister.cli --db-url postgres:///lister-debian debian
Note: This bootstraps a minimum data set needed for the debian
lister to run (for development)
Once configured, you can execute a GitLab lister using the instructions detailed in the `python3` scripts below:
```lang=python
import logging
from swh.lister.gitlab.tasks import range_gitlab_lister
logging.basicConfig(level=logging.DEBUG)
range_gitlab_lister(1, 2, {
'instance': 'debian',
'api_baseurl': 'https://salsa.debian.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
```lang=python
import logging
from swh.lister.gitlab.tasks import full_gitlab_relister
logging.basicConfig(level=logging.DEBUG)
full_gitlab_relister({
'instance': '0xacab',
'api_baseurl': 'https://0xacab.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
```lang=python
import logging
from swh.lister.gitlab.tasks import incremental_gitlab_lister
logging.basicConfig(level=logging.DEBUG)
incremental_gitlab_lister({
'instance': 'freedesktop.org',
'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
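The three GitLab scripts above pass the same four keys and differ only in the instance name and API URL. A small helper can make that explicit. This is a sketch only: `gitlab_lister_params` and `GITLAB_INSTANCES` are hypothetical names that do not exist in swh-lister, and the defaults simply mirror the values used in the scripts.

```python
# Instance names and API URLs copied from the scripts above.
GITLAB_INSTANCES = {
    "debian": "https://salsa.debian.org/api/v4",
    "0xacab": "https://0xacab.org/api/v4",
    "freedesktop.org": "https://gitlab.freedesktop.org/api/v4",
}


def gitlab_lister_params(instance: str, sort: str = "asc",
                         per_page: int = 20) -> dict:
    """Build the parameter dict passed to the GitLab lister tasks."""
    if instance not in GITLAB_INSTANCES:
        raise KeyError(f"no API URL recorded for instance {instance!r}")
    return {
        "instance": instance,
        "api_baseurl": GITLAB_INSTANCES[instance],
        "sort": sort,
        "per_page": per_page,
    }
```

With swh-lister installed, the result could then be passed as in the scripts above, for example `full_gitlab_relister(gitlab_lister_params('0xacab'))`.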
### Configuration file sample
## lister-debian
$ cat ~/.config/swh/lister-debian.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-debian
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/debian
Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
Note: This expects storage (5002) and scheduler (5008) services to run locally
```lang=python
import logging
from swh.lister.debian.tasks import debian_lister
### Run
logging.basicConfig(level=logging.DEBUG)
debian_lister('Debian')
```
$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging; logging.basicConfig(level=logging.DEBUG); from swh.lister.debian.tasks import debian_lister; debian_lister('Debian')
DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
DEBUG:root:Processing area Area(stretch/main of Debian)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
...
## lister-pypi
Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
## lister-pypi
```lang=python
import logging
from swh.lister.pypi.tasks import pypi_lister
### preparation steps
logging.basicConfig(level=logging.DEBUG)
pypi_lister()
```
1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
2. mkdir ~/.config/swh/ ~/.cache/swh/lister/pypi/
3. create configuration file ~/.config/swh/lister-pypi.yml
4. Bootstrap the db instance schema
## lister-npm
$ createdb lister-pypi
$ python3 -m swh.lister.cli --db-url postgres:///lister-pypi pypi
Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
Note: This bootstraps a minimum data set needed for the pypi
lister to run (for development)
```lang=python
import logging
from swh.lister.npm.tasks import npm_lister
### Configuration file sample
logging.basicConfig(level=logging.DEBUG)
npm_lister()
```
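All the scripts above follow one pattern: import a task function from `swh.lister.<lister_name>.tasks` and call it. The sketch below collects the task names shown in this README into a lookup table and resolves them lazily. It is an illustration, not swh-lister API: `LISTER_TASKS`, `tasks_module_name` and `load_task` are hypothetical names, and `swh.lister.bitbucket` is left out because no task is shown for it here.

```python
import importlib

# Task function names as they appear in the scripts above.
LISTER_TASKS = {
    "github": ["range_github_lister"],
    "gitlab": ["range_gitlab_lister", "full_gitlab_relister",
               "incremental_gitlab_lister"],
    "debian": ["debian_lister"],
    "pypi": ["pypi_lister"],
    "npm": ["npm_lister"],
}


def tasks_module_name(lister_name: str) -> str:
    """Return the dotted path of the tasks module for a lister."""
    if lister_name not in LISTER_TASKS:
        raise ValueError(f"unknown lister: {lister_name!r}")
    return f"swh.lister.{lister_name}.tasks"


def load_task(lister_name: str, task_name: str):
    """Import and return a task function (requires swh-lister installed)."""
    module_name = tasks_module_name(lister_name)
    if task_name not in LISTER_TASKS[lister_name]:
        raise ValueError(f"{task_name!r} is not listed for {lister_name!r}")
    # The import happens only here, so the table is usable without swh.
    module = importlib.import_module(module_name)
    return getattr(module, task_name)
```

For example, `load_task('npm', 'npm_lister')()` would be equivalent to the npm script above once the configuration file is in place.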
$ cat ~/.config/swh/lister-pypi.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-pypi
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/pypi
Licensing
---------
Note: This expects storage (5002) and scheduler (5008) services to run locally
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
### Run
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.lister.pypi.tasks import pypi_lister; pypi_lister()
>>>
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
\ No newline at end of file