Deploy lister next-gen in staging
- Debian package
Deploy:
- #2998 (closed): OK: gitlab (instance inria scheduled)
- #2998 (closed): OK: github (cli run only, deemed too big for staging)
- #2998 (closed): OK: bitbucket (cli run only, deemed too big for staging)
- #2998 (closed): OK: phabricator (scheduled)
- #2998 (closed): OK: cgit (rescheduled existing instance)
- #2998 (closed): OK: gitea (cli run)
- #2998 (closed): OK: cran (cli run)
- #2998 (closed): OK: launchpad (cli run)
- #2998 (closed): OK: debian (scheduled)
- #2998 (closed): OK: pypi (scheduled)
- #2998 (closed): OK: npm (scheduled)
Current:
- status on origin listed: #2998 (closed)
- latest deployed python3-swh.lister version: v0.6.1
Two remaining listers (gnu, packagist) still need to be ported; they will be handled in dedicated tasks when the time comes.
Migrated from T2998 (view on Phabricator)
Activity
- Antoine R. Dumont added Lister System administration priority:Normal labels
- Antoine R. Dumont changed the description
- Antoine R. Dumont added state:wip label
- Phabricator Migration user mentioned in commit swh/devel/swh-lister@7e757a15
- Antoine R. Dumont changed the description
- Antoine R. Dumont changed the description
- Antoine R. Dumont changed the description
- Author Owner
gitlab instance deployed, status ok:
- Update current task in scheduler to actually trigger now:
    swh-scheduler=> select * from task where id=962048;
       id   |       type       |                                        arguments                                        |           next_run            | current_interval |         status         |  policy   | retries_left | priority
    --------+------------------+-----------------------------------------------------------------------------------------+-------------------------------+------------------+------------------------+-----------+--------------+----------
     962048 | list-gitlab-full | {"args": [], "kwargs": {"url": "https://gitlab.inria.fr/api/v4/", "instance": "inria"}} | 2021-04-27 17:48:56.188577+00 | 90 days          | next_run_not_scheduled | recurring |            0 |
    (1 row)
- Then check the run:
    Jan 27 17:48:30 worker0 python3[1167]: [2021-01-27 17:48:30,718: INFO/ForkPoolWorker-4] Task swh.lister.gitlab.tasks.FullGitLabRelister[62441d9c-9c07-4305-85eb-cb70cda23ea1] succeeded in 203.48368402300002s: {'pages': 145, 'origins': 2874}
And output:
    swh-scheduler=> select count(*) from listed_origins where url like 'https://gitlab.inria.fr%';
     count
    -------
      2874
    (1 row)
Also added an incremental instance task.
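The first step above does not record the statement that made the existing task trigger immediately. As a hedged illustration only, the sketch below reproduces the idea against an in-memory SQLite table that borrows a few column names from the `task` output above. The real scheduler database is PostgreSQL, and the `force_task_now` helper, the `UPDATE` statement and the placeholder row are all assumptions, not what was run on staging.

```python
import sqlite3
from datetime import datetime, timezone


def force_task_now(conn: sqlite3.Connection, task_id: int) -> int:
    """Set next_run to the current time so the task becomes due immediately.

    Hypothetical helper: it only mirrors column names visible in the psql
    output above (id, next_run, status), not any real scheduler API.
    """
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE task SET next_run = ?, status = 'next_run_not_scheduled' "
        "WHERE id = ?",
        (now, task_id),
    )
    conn.commit()
    return cur.rowcount  # number of rows actually updated


# Stand-in table with a subset of the columns shown above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, type TEXT, next_run TEXT, status TEXT)"
)
# Placeholder row: the far-future date and initial status are illustrative.
conn.execute(
    "INSERT INTO task VALUES (962048, 'list-gitlab-full', "
    "'2099-01-01T00:00:00+00:00', 'next_run_scheduled')"
)

updated = force_task_now(conn, 962048)
row = conn.execute("SELECT next_run, status FROM task WHERE id = 962048").fetchone()
print(updated, row)
```

Returning the row count makes it easy to notice a wrong task id, since an update that matches nothing returns 0 instead of failing.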
- Antoine R. Dumont changed the description
- Author Owner
cli run for the github instance, status ok:
    swhworker@worker0:~$ dpkg -l python3-swh.lister | grep lister
    ii python3-swh.lister 0.5.4-1~swh1~bpo10+1 all Software Heritage Listers (bitbucket, git(lab|hub), pypi, etc...)
    swhworker@worker0:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister github
    ^C
    $ psql service=staging-swh-scheduler
    swh-scheduler=> select * from listers where instance_name='github';
                      id                  |  name  | instance_name |            created            |      current_state      |            updated
    --------------------------------------+--------+---------------+-------------------------------+-------------------------+-------------------------------
     9a27a3ac-1e88-48e0-9a9b-37ba28817473 | github | github        | 2021-01-28 10:52:37.408887+00 | {"last_seen_id": 22865} | 2021-01-28 10:53:07.886609+00
    (1 row)

    swh-scheduler=> select count(*) from listed_origins where url like 'https://github.com/%';
     count
    -------
      5000
    (1 row)
- Author Owner
cli run for the bitbucket lister, status ok:
    $ swhworker@worker0:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/lister.yml swh lister run --lister bitbucket incremental=True
    WARNING:swh.lister.bitbucket.lister:No credentials set in configuration, using anonymous mode
    ^C
    swhworker@worker0:~$
    $ psql service=staging-swh-scheduler
    psql (12.5)
    SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
    Type "help" for help.

    swh-scheduler=> select * from listers where instance_name='bitbucket';
                      id                  |   name    | instance_name |            created            | current_state |            updated
    --------------------------------------+-----------+---------------+-------------------------------+---------------+-------------------------------
     c353a201-e4e1-42c2-b954-8a1c6c5928ae | bitbucket | bitbucket     | 2021-01-28 11:00:10.034268+00 | {}            | 2021-01-28 11:00:10.034268+00
    (1 row)

    swh-scheduler=> select count(*) from listed_origins where url like 'https://bitbucket%';
     count
    -------
     10000
    (1 row)

    swh-scheduler=> select * from listers where instance_name='bitbucket';
                      id                  |   name    | instance_name |            created            |                      current_state                      |            updated
    --------------------------------------+-----------+---------------+-------------------------------+---------------------------------------------------------+-------------------------------
     c353a201-e4e1-42c2-b954-8a1c6c5928ae | bitbucket | bitbucket     | 2021-01-28 11:00:10.034268+00 | {"last_repo_cdate": "2012-06-01T12:57:01.156999+00:00"} | 2021-01-28 11:00:32.443539+00
    (1 row)
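The two `listers` queries above show `current_state` going from `{}` to a `last_repo_cdate` value once some pages had been processed. A minimal sketch of that resume-from-state idea follows. The names `list_incrementally` and `fake_fetch` and the sample repositories are hypothetical; the real bitbucket lister's internals are not shown in this thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page is a list of (creation date, origin URL) pairs; purely illustrative.
Page = List[Tuple[str, str]]


def list_incrementally(
    state: Dict[str, str],
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int,
) -> List[str]:
    """Resume listing after state['last_repo_cdate'] and advance it per page."""
    origins: List[str] = []
    for _ in range(max_pages):
        page = fetch_page(state.get("last_repo_cdate"))
        if not page:
            break
        origins.extend(url for _, url in page)
        # Remember the newest creation date seen so that a later run can
        # start from this point instead of from the beginning.
        state["last_repo_cdate"] = max(cdate for cdate, _ in page)
    return origins


# Fake in-memory "forge", used only to exercise the sketch.
REPOS = [
    ("2012-01-01T00:00:00+00:00", "https://example.org/a"),
    ("2012-03-01T00:00:00+00:00", "https://example.org/b"),
    ("2012-06-01T00:00:00+00:00", "https://example.org/c"),
]


def fake_fetch(after: Optional[str], page_size: int = 2) -> Page:
    # ISO 8601 strings in the same format and offset compare chronologically.
    newer = [r for r in REPOS if after is None or r[0] > after]
    return newer[:page_size]


state: Dict[str, str] = {}
first = list_incrementally(state, fake_fetch, max_pages=1)    # stops early
second = list_incrementally(state, fake_fetch, max_pages=10)  # resumes
print(first, second, state)
```

Updating the state after each page, rather than once at the end, is a design choice in this sketch: it keeps the amount of repeated work small if a run is cut short.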
- Antoine R. Dumont marked the checklist item bitbucket as completed
- Author Owner
Lister phabricator deployed with one instance (swh), status ok:
    swhworker@worker0:~$ swh scheduler --url http://scheduler0.internal.staging.swh.network:5008/ task add list-phabricator-full url=https://forge.softwareheritage.org/api/diffusion.repository.search instance=swh
    Created 1 tasks

    Task 17089493
      Next run: today (2021-01-28T11:09:56.580260+00:00)
      Interval: 90 days, 0:00:00
      Type: list-phabricator-full
      Policy: recurring
      Args:
      Keyword args:
        instance: 'swh'
        url: 'https://forge.softwareheritage.org/api/diffusion.repository.search'
It listed 180 origins in 2 pages:
    Jan 28 11:10:00 worker0 python3[1157]: [2021-01-28 11:10:00,394: INFO/MainProcess] Received task: swh.lister.phabricator.tasks.FullPhabricatorLister[c7b95b0a-0f80-4b72-b7c6-0ea2df51ef02]
    Jan 28 11:10:01 worker0 python3[2392]: [2021-01-28 11:10:01,708: INFO/ForkPoolWorker-6] Task swh.lister.phabricator.tasks.FullPhabricatorLister[c7b95b0a-0f80-4b72-b7c6-0ea2df51ef02] succeeded in 1.285880941999494s: {'pages': 2, 'origins': 180}
Status ok:
    swh-scheduler=> select count(*) from listed_origins where url like 'https://forge.softwareheritage%';
     count
    -------
       180
    (1 row)

    swh-scheduler=> select * from listers where instance_name='swh';
                      id                  |    name     | instance_name |            created            | current_state |            updated
    --------------------------------------+-------------+---------------+-------------------------------+---------------+-------------------------------
     12ded103-af37-41ac-ae3a-3643bb17ecd5 | phabricator | swh           | 2021-01-28 11:09:40.631348+00 | {}            | 2021-01-28 11:09:40.631348+00
    (1 row)
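The check above compares the worker's result line with the row count in the scheduler database by eye. As a small sketch of automating the first half, the snippet below parses the statistics out of a "succeeded" log line like the one above. The `parse_task_stats` helper and its regular expression are illustrative assumptions based only on the log format visible in this thread.

```python
import ast
import re
from typing import Dict, Optional

# Matches "Task <name>[<uuid>] succeeded in <seconds>s: {<stats dict>}".
SUCCEEDED_RE = re.compile(
    r"Task (?P<task>[\w.]+)\[(?P<task_id>[0-9a-f-]+)\] "
    r"succeeded in (?P<seconds>[\d.]+)s: (?P<stats>\{.*\})\s*$"
)


def parse_task_stats(line: str) -> Optional[Dict[str, object]]:
    """Return task name, duration and listing stats, or None if no match."""
    match = SUCCEEDED_RE.search(line)
    if match is None:
        return None
    # The stats are printed as a Python dict literal, so literal_eval
    # reads them safely without executing arbitrary code.
    stats = ast.literal_eval(match.group("stats"))
    return {
        "task": match.group("task"),
        "seconds": float(match.group("seconds")),
        **stats,
    }


LINE = (
    "Jan 28 11:10:01 worker0 python3[2392]: [2021-01-28 11:10:01,708: "
    "INFO/ForkPoolWorker-6] Task swh.lister.phabricator.tasks."
    "FullPhabricatorLister[c7b95b0a-0f80-4b72-b7c6-0ea2df51ef02] "
    "succeeded in 1.285880941999494s: {'pages': 2, 'origins': 180}"
)

stats = parse_task_stats(LINE)
db_count = 180  # value returned by the count(*) query above
print(stats, stats is not None and stats["origins"] == db_count)
```

The parsed `origins` value can then be compared with the `count(*)` result, as done manually in this comment.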
- Antoine R. Dumont marked the checklist item phabricator as completed
- Antoine R. Dumont changed the description
- Phabricator Migration user mentioned in commit swh/devel/swh-lister@ae17b6b9
- Author Owner
One cgit lister scheduled; it finished ok, but see [1]:
    Jan 28 12:36:39 worker0 python3[29180]: [2021-01-28 12:36:39,717: INFO/MainProcess] Received task: swh.lister.cgit.tasks.CGitListerTask[9544dbd3-fa73-42d4-a194-36d82a2370ea]
    Jan 28 12:41:46 worker0 python3[29190]: [2021-01-28 12:41:46,608: INFO/ForkPoolWorker-4] Task swh.lister.cgit.tasks.CGitListerTask[9544dbd3-fa73-42d4-a194-36d82a2370ea] succeeded in 306.8694303520024s: {'pages': 1, 'origins': 1070}
In scheduler, all is well:
    swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='cgit' and l.instance_name='git-kernel';
     count
    -------
      1070
    (1 row)
- [1] Note that this lister seems to need some improvements in how it writes origins, though. It seems to have flushed its writes only at the end of the listing. If that is the real behavior (I'll need to check), it won't bode well for relatively large instances, like the cgit eclipse instance for example.
- Maintainer
> Note that this lister seems to need some improvements in how it writes origins, though. It seems to have flushed its writes only at the end of the listing. If that is the real behavior (I'll need to check), it won't bode well for relatively large instances, like the cgit eclipse instance for example.
The cgit lister should flush origins after each page. Which instance was listed here?
Some listers, like debian, might flush a large number of origins per page; I'll be curious to see how it goes.
- Author Owner
> The cgit lister should flush origins after each page. Which instance was listed here?
Yes, we did not implement anything particular in the cgit implementation. We left that concern to the StatelessLister / Lister class.
- [1]
1121917 | list-cgit | {"args": [], "kwargs": {"url": "https://git.kernel.org", "instance": "git-kernel"}} | 2021-01-29 12:42:12.077249+00 | 1 day | next_run_not_scheduled | recurring | 0 |
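The flushing question discussed above can be illustrated with a minimal sketch. It is not the `StatelessLister` / `Lister` code: `run_lister`, `get_pages` and `write_origins` are hypothetical names, used only to contrast writing origins after each page with buffering everything until the end. Note that the cgit run above reports `'pages': 1`, and with a single page both strategies result in one write at the end of the listing.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def run_lister(
    get_pages: Callable[[], Iterable[List[str]]],
    write_origins: Callable[[List[str]], None],
    flush_per_page: bool = True,
) -> Dict[str, int]:
    """List all pages, writing origins either per page or once at the end."""
    stats = {"pages": 0, "origins": 0}
    buffered: List[str] = []
    for page in get_pages():
        stats["pages"] += 1
        stats["origins"] += len(page)
        if flush_per_page:
            write_origins(page)    # memory use bounded by one page
        else:
            buffered.extend(page)  # memory use grows with the whole instance
    if buffered:
        write_origins(buffered)    # single write at the very end
    return stats


def three_pages() -> Iterator[List[str]]:
    # Fake instance: 3 pages of 2 origins each, for illustration only.
    for p in range(3):
        yield [f"https://example.org/repo-{p}-{i}" for i in range(2)]


per_page_writes: List[int] = []
stats = run_lister(three_pages, lambda origins: per_page_writes.append(len(origins)))

single_write: List[int] = []
run_lister(three_pages, lambda origins: single_write.append(len(origins)),
           flush_per_page=False)

print(stats, per_page_writes, single_write)
```

With per-page flushing, a failure midway still leaves the earlier pages recorded, at the cost of more write calls; a page that is itself very large (the debian case raised above) remains one large write under either strategy.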