staging: Deploy maven indexer/lister/loader
Once the actual diffs land, running the stack in staging should help surface and fix the remaining papercuts.
Plan:

- docker declaration should be complete now, make it run within docker
- #3746 (closed), D7023: Unstick the listing scheduling
- D7052: Land
- Package new version
- #3746 (closed): Failing with a new issue
- D7139: Fix that issue
- Package new version v2.6.3
- Make the lister actually list "maven" (load-maven) tasks to load
- D7178: Make the scheduler actually schedule load-maven tasks
- #4105 (closed): "Industrialize" the maven-index-exporter docker image
- Develop puppet manifests for the maven stack (indexer, lister, loader)
- infra/puppet/puppet-swh-site!505: Deploy node with maven index exporter service which computes the expected lister output (export.fld)
- infra/puppet/puppet-swh-site!506: Expose the export files through apache
- infra/puppet/puppet-swh-site!507: Update lister service to also manage the list maven
- infra/puppet/puppet-swh-site!508: Deploy swh-worker@loader_maven service (in charge of dealing with jar files)
- Install zfs tooling on that node (to reduce future disk space use [2])
- infra/puppet/puppet-swh-site!507: Provision the new node to expose the computation results
- Ensure the maven loader and lister are registered on the staging scheduler [1]
- infra/swh-sysadmin-provisioning!69: Provision new node
- Update inventory
- Configure zfs partitions
- Update firewall rules to allow icinga reporting
- Finally, schedule new maven lister instances to consume maven-central and clojars (for now)
[1]

```
swhscheduler@scheduler0:~$ swh scheduler --config-file /etc/softwareheritage/scheduler/backend.yml task-type register | grep -i maven
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin loader.maven
INFO:swh.scheduler.cli.task_type:Create task type load-maven in scheduler
INFO:swh.scheduler.cli.task_type:Loading entrypoint for plugin lister.maven
```
[2] export.fld can be quite large (maven-central ~18G); others might be smaller (clojars ~60MiB).
Migrated from T3746 (view on Phabricator)
Activity
- Phabricator Migration user marked this issue as related to swh/meta#1724
- Antoine R. Dumont changed title from staging: Deploy maven exporter/lister/loader to staging: Deploy maven indexer/lister/loader
- Antoine R. Dumont changed the description
- Antoine R. Dumont changed the description
- Antoine R. Dumont added state:wip label
- Author Owner
docker
- Update the docker stack to the current repositories' heads:

```
$ swh-doco-rebuild
+ DOCKER_CMD=/nix/store/r50lif10qfix73c3rxvjbmsd1x8v45rj-docker-20.10.12/bin/docker
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ /nix/store/r50lif10qfix73c3rxvjbmsd1x8v45rj-docker-20.10.12/bin/docker build -f Dockerfile --no-cache -t swh/stack .
Sending build context to Docker daemon  99MB
Step 1/13 : FROM python:3.7
 ---> ad37de9b03ef
...
```
- Check the scheduler state after docker up:

```
10:05:11 swh-scheduler@localhost:5433=# select * from task_type where type like 'list-maven%' or type = 'load-jar-file';
+-[ RECORD 1 ]-----+---------------------------------------------------+
| type             | list-maven-full                                   |
| description      | Full update of a Maven repository instance        |
| backend_name     | swh.lister.maven.tasks.FullMavenLister            |
| default_interval | 90 days                                           |
| min_interval     | 90 days                                           |
| max_interval     | 90 days                                           |
| backoff_factor   | 1                                                 |
| max_queue_length | (null)                                            |
| num_retries      | (null)                                            |
| retry_delay      | (null)                                            |
+-[ RECORD 2 ]-----+---------------------------------------------------+
| type             | list-maven-incremental                            |
| description      | Incremental update of a Maven repository instance |
| backend_name     | swh.lister.maven.tasks.IncrementalMavenLister     |
| default_interval | 1 day                                             |
| min_interval     | 1 day                                             |
| max_interval     | 1 day                                             |
| backoff_factor   | 1                                                 |
| max_queue_length | (null)                                            |
| num_retries      | (null)                                            |
| retry_delay      | (null)                                            |
+-[ RECORD 3 ]-----+---------------------------------------------------+
| type             | load-jar-file                                     |
| description      | Load jar's artifacts.                             |
| backend_name     | swh.loader.package.maven.tasks.LoadMaven          |
| default_interval | 1 day                                             |
| min_interval     | 1 day                                             |
| max_interval     | 1 day                                             |
| backoff_factor   | 1                                                 |
| max_queue_length | (null)                                            |
| num_retries      | (null)                                            |
| retry_delay      | (null)                                            |
+------------------+---------------------------------------------------+
Time: 0.338 ms
```
- Looks like the necessary maven stack task types are registered appropriately
- Triggering the listing:

```
$ swh-doco exec swh-scheduler swh scheduler task add list-maven-full url=https://repo1.maven.org/maven2/ index_url=http://swh-lister-maven-nginx/export.fld
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-scheduler swh scheduler task add list-maven-full url=https://repo1.maven.org/maven2/ index_url=http://swh-lister-maven-nginx/export.fld
Created 1 tasks

Task 29197997
  Next run: today (2022-01-24T09:07:22.676425+00:00)
  Interval: 90 days, 0:00:00
  Type: list-maven-full
  Policy: recurring
  Args:
  Keyword args:
    index_url: 'http://swh-lister-maven-nginx/export.fld'
    url: 'https://repo1.maven.org/maven2/'
```

```
10:08:16 swh-scheduler@localhost:5433=# select * from task where type like 'list-maven-full';
+-[ RECORD 1 ]-----+-----------------------------------------------------------------------------------------------------------------------------+
| id               | 29197997                                                                                                                    |
| type             | list-maven-full                                                                                                             |
| arguments        | {"args": [], "kwargs": {"url": "https://repo1.maven.org/maven2/", "index_url": "http://swh-lister-maven-nginx/export.fld"}} |
| next_run         | 2022-01-24 09:07:22.676425+00                                                                                               |
| current_interval | 90 days                                                                                                                     |
| status           | next_run_scheduled                                                                                                          |
| policy           | recurring                                                                                                                   |
| retries_left     | 0                                                                                                                           |
| priority         | (null)                                                                                                                      |
+------------------+-----------------------------------------------------------------------------------------------------------------------------+
Time: 0.342 ms
```
- The lister task is scheduled but the lister does not consume the message...

```
while true; do swh-doco logs -f swh-scheduler-runner; sleep 5; done | grep -i maven
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml logs -f swh-scheduler-runner
swh-scheduler-runner_1 | INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks list-maven-full
swh-scheduler-runner_1 | INFO:swh.scheduler.celery_backend.runner:Grabbed 1 tasks list-maven-full
```
- But so far nothing happens in the lister container...

```
swh-lister_1 | -------------- listers@e1eb28c3ee72 v5.1.2 (sun-harmonics)
swh-lister_1 | --- ***** -----
swh-lister_1 | -- ******* ---- Linux-5.10.0-10-amd64-x86_64-with-debian-11.2 2022-01-24 09:55:28
swh-lister_1 | - *** --- * ---
swh-lister_1 | - ** ---------- [config]
swh-lister_1 | - ** ---------- .> app:         __main__:0x7f5ac155b4d0
swh-lister_1 | - ** ---------- .> transport:   amqp://guest:**@amqp:5672//
swh-lister_1 | - ** ---------- .> results:     disabled://
swh-lister_1 | - *** --- * --- .> concurrency: 1 (prefork)
swh-lister_1 | -- ******* ---- .> task events: ON
swh-lister_1 | --- ***** -----
swh-lister_1 | -------------- [queues]
swh-lister_1 | .> celery exchange=celery(direct) key=celery
swh-lister_1 | .> swh.lister.bitbucket.tasks.FullBitBucketRelister exchange=swh.lister.bitbucket.tasks.FullBitBucketRelister(direct) key=swh.lister.bitbucket.tasks.FullBitBucketRelister
swh-lister_1 | .> swh.lister.bitbucket.tasks.IncrementalBitBucketLister exchange=swh.lister.bitbucket.tasks.IncrementalBitBucketLister(direct) key=swh.lister.bitbucket.tasks.IncrementalBitBucketLister
swh-lister_1 | .> swh.lister.bitbucket.tasks.RangeBitBucketLister exchange=swh.lister.bitbucket.tasks.RangeBitBucketLister(direct) key=swh.lister.bitbucket.tasks.RangeBitBucketLister
swh-lister_1 | .> swh.lister.cgit.tasks.CGitListerTask exchange=swh.lister.cgit.tasks.CGitListerTask(direct) key=swh.lister.cgit.tasks.CGitListerTask
swh-lister_1 | .> swh.lister.cran.tasks.CRANListerTask exchange=swh.lister.cran.tasks.CRANListerTask(direct) key=swh.lister.cran.tasks.CRANListerTask
swh-lister_1 | .> swh.lister.debian.tasks.DebianListerTask exchange=swh.lister.debian.tasks.DebianListerTask(direct) key=swh.lister.debian.tasks.DebianListerTask
swh-lister_1 | .> swh.lister.gitea.tasks.FullGiteaRelister exchange=swh.lister.gitea.tasks.FullGiteaRelister(direct) key=swh.lister.gitea.tasks.FullGiteaRelister
swh-lister_1 | .> swh.lister.gitea.tasks.IncrementalGiteaLister exchange=swh.lister.gitea.tasks.IncrementalGiteaLister(direct) key=swh.lister.gitea.tasks.IncrementalGiteaLister
swh-lister_1 | .> swh.lister.gitea.tasks.RangeGiteaLister exchange=swh.lister.gitea.tasks.RangeGiteaLister(direct) key=swh.lister.gitea.tasks.RangeGiteaLister
swh-lister_1 | .> swh.lister.github.tasks.FullGitHubRelister exchange=swh.lister.github.tasks.FullGitHubRelister(direct) key=swh.lister.github.tasks.FullGitHubRelister
swh-lister_1 | .> swh.lister.github.tasks.IncrementalGitHubLister exchange=swh.lister.github.tasks.IncrementalGitHubLister(direct) key=swh.lister.github.tasks.IncrementalGitHubLister
swh-lister_1 | .> swh.lister.github.tasks.RangeGitHubLister exchange=swh.lister.github.tasks.RangeGitHubLister(direct) key=swh.lister.github.tasks.RangeGitHubLister
swh-lister_1 | .> swh.lister.gitlab.tasks.FullGitLabRelister exchange=swh.lister.gitlab.tasks.FullGitLabRelister(direct) key=swh.lister.gitlab.tasks.FullGitLabRelister
swh-lister_1 | .> swh.lister.gitlab.tasks.IncrementalGitLabLister exchange=swh.lister.gitlab.tasks.IncrementalGitLabLister(direct) key=swh.lister.gitlab.tasks.IncrementalGitLabLister
swh-lister_1 | .> swh.lister.gitlab.tasks.RangeGitLabLister exchange=swh.lister.gitlab.tasks.RangeGitLabLister(direct) key=swh.lister.gitlab.tasks.RangeGitLabLister
swh-lister_1 | .> swh.lister.gnu.tasks.GNUListerTask exchange=swh.lister.gnu.tasks.GNUListerTask(direct) key=swh.lister.gnu.tasks.GNUListerTask
swh-lister_1 | .> swh.lister.launchpad.tasks.FullLaunchpadLister exchange=swh.lister.launchpad.tasks.FullLaunchpadLister(direct) key=swh.lister.launchpad.tasks.FullLaunchpadLister
swh-lister_1 | .> swh.lister.launchpad.tasks.IncrementalLaunchpadLister exchange=swh.lister.launchpad.tasks.IncrementalLaunchpadLister(direct) key=swh.lister.launchpad.tasks.IncrementalLaunchpadLister
swh-lister_1 | .> swh.lister.npm.tasks.NpmIncrementalListerTask exchange=swh.lister.npm.tasks.NpmIncrementalListerTask(direct) key=swh.lister.npm.tasks.NpmIncrementalListerTask
swh-lister_1 | .> swh.lister.npm.tasks.NpmListerTask exchange=swh.lister.npm.tasks.NpmListerTask(direct) key=swh.lister.npm.tasks.NpmListerTask
swh-lister_1 | .> swh.lister.opam.tasks.OpamListerTask exchange=swh.lister.opam.tasks.OpamListerTask(direct) key=swh.lister.opam.tasks.OpamListerTask
swh-lister_1 | .> swh.lister.packagist.tasks.PackagistListerTask exchange=swh.lister.packagist.tasks.PackagistListerTask(direct) key=swh.lister.packagist.tasks.PackagistListerTask
swh-lister_1 | .> swh.lister.phabricator.tasks.FullPhabricatorLister exchange=swh.lister.phabricator.tasks.FullPhabricatorLister(direct) key=swh.lister.phabricator.tasks.FullPhabricatorLister
swh-lister_1 | .> swh.lister.phabricator.tasks.IncrementalPhabricatorLister exchange=swh.lister.phabricator.tasks.IncrementalPhabricatorLister(direct) key=swh.lister.phabricator.tasks.IncrementalPhabricatorLister
swh-lister_1 | .> swh.lister.pypi.tasks.PyPIListerTask exchange=swh.lister.pypi.tasks.PyPIListerTask(direct) key=swh.lister.pypi.tasks.PyPIListerTask
swh-lister_1 |
swh-lister_1 | [tasks]
swh-lister_1 |   . swh.deposit.loader.tasks.ChecksDepositTsk
swh-lister_1 |   . swh.lister.bitbucket.tasks.FullBitBucketRelister
swh-lister_1 |   . swh.lister.bitbucket.tasks.IncrementalBitBucketLister
swh-lister_1 |   . swh.lister.bitbucket.tasks.ping
swh-lister_1 |   . swh.lister.cgit.tasks.CGitListerTask
swh-lister_1 |   . swh.lister.cgit.tasks.ping
swh-lister_1 |   . swh.lister.cran.tasks.CRANListerTask
swh-lister_1 |   . swh.lister.cran.tasks.ping
swh-lister_1 |   . swh.lister.debian.tasks.DebianListerTask
swh-lister_1 |   . swh.lister.debian.tasks.ping
swh-lister_1 |   . swh.lister.gitea.tasks.FullGiteaRelister
swh-lister_1 |   . swh.lister.gitea.tasks.ping
swh-lister_1 |   . swh.lister.github.tasks.FullGitHubRelister
swh-lister_1 |   . swh.lister.github.tasks.IncrementalGitHubLister
swh-lister_1 |   . swh.lister.github.tasks.RangeGitHubLister
swh-lister_1 |   . swh.lister.github.tasks.ping
swh-lister_1 |   . swh.lister.gitlab.tasks.FullGitLabRelister
swh-lister_1 |   . swh.lister.gitlab.tasks.IncrementalGitLabLister
swh-lister_1 |   . swh.lister.gitlab.tasks.ping
swh-lister_1 |   . swh.lister.gnu.tasks.GNUListerTask
swh-lister_1 |   . swh.lister.gnu.tasks.ping
swh-lister_1 |   . swh.lister.launchpad.tasks.FullLaunchpadLister
swh-lister_1 |   . swh.lister.launchpad.tasks.IncrementalLaunchpadLister
swh-lister_1 |   . swh.lister.launchpad.tasks.ping
swh-lister_1 |   . swh.lister.maven.tasks.FullMavenLister
swh-lister_1 |   . swh.lister.maven.tasks.IncrementalMavenLister
swh-lister_1 |   . swh.lister.maven.tasks.ping
swh-lister_1 |   . swh.lister.npm.tasks.NpmIncrementalListerTask
swh-lister_1 |   . swh.lister.npm.tasks.NpmListerTask
swh-lister_1 |   . swh.lister.npm.tasks.ping
swh-lister_1 |   . swh.lister.opam.tasks.OpamListerTask
swh-lister_1 |   . swh.lister.opam.tasks.ping
swh-lister_1 |   . swh.lister.packagist.tasks.PackagistListerTask
swh-lister_1 |   . swh.lister.packagist.tasks.ping
swh-lister_1 |   . swh.lister.phabricator.tasks.FullPhabricatorLister
swh-lister_1 |   . swh.lister.phabricator.tasks.ping
swh-lister_1 |   . swh.lister.pypi.tasks.PyPIListerTask
swh-lister_1 |   . swh.lister.pypi.tasks.ping
swh-lister_1 |   . swh.lister.sourceforge.tasks.FullSourceForgeLister
swh-lister_1 |   . swh.lister.sourceforge.tasks.IncrementalSourceForgeLister
swh-lister_1 |   . swh.lister.sourceforge.tasks.ping
swh-lister_1 |   . swh.lister.tuleap.tasks.FullTuleapLister
swh-lister_1 |   . swh.lister.tuleap.tasks.ping
swh-lister_1 |   . swh.loader.git.tasks.LoadDiskGitRepository
swh-lister_1 |   . swh.loader.git.tasks.UncompressAndLoadDiskGitRepository
swh-lister_1 |   . swh.loader.git.tasks.UpdateGitRepository
swh-lister_1 |   . swh.loader.mercurial.tasks.LoadArchiveMercurial
swh-lister_1 |   . swh.loader.mercurial.tasks.LoadMercurial
swh-lister_1 |   . swh.loader.package.archive.tasks.LoadArchive
swh-lister_1 |   . swh.loader.package.cran.tasks.LoadCRAN
swh-lister_1 |   . swh.loader.package.debian.tasks.LoadDebian
swh-lister_1 |   . swh.loader.package.deposit.tasks.LoadDeposit
swh-lister_1 |   . swh.loader.package.maven.tasks.LoadMaven
swh-lister_1 |   . swh.loader.package.nixguix.tasks.LoadNixguix
swh-lister_1 |   . swh.loader.package.npm.tasks.LoadNpm
swh-lister_1 |   . swh.loader.package.opam.tasks.LoadOpam
swh-lister_1 |   . swh.loader.package.pypi.tasks.LoadPyPI
swh-lister_1 |   . swh.loader.svn.tasks.DumpMountAndLoadSvnRepository
swh-lister_1 |   . swh.loader.svn.tasks.LoadSvnRepository
swh-lister_1 |   . swh.loader.svn.tasks.MountAndLoadSvnRepository
swh-lister_1 |
swh-lister_1 | [2022-01-24 09:55:28,235: INFO/MainProcess] Connected to amqp://guest:**@amqp:5672//
swh-lister_1 | [2022-01-24 09:55:28,244: INFO/MainProcess] mingle: searching for neighbors
swh-lister_1 | [2022-01-24 09:55:29,307: INFO/MainProcess] mingle: sync with 3 nodes
swh-lister_1 | [2022-01-24 09:55:29,308: INFO/MainProcess] mingle: sync complete
swh-lister_1 | [2022-01-24 09:55:29,451: INFO/MainProcess] listers@e1eb28c3ee72 ready.
swh-lister_1 | [2022-01-24 09:55:29,795: INFO/MainProcess] sync with loader-opam@f19be95a78ef
swh-lister_1 | [2022-01-24 09:55:49,564: INFO/MainProcess] sync with loader@728cc9d94e1c
```

So that does indeed need unsticking ^
- Antoine R. Dumont changed the description
- Author Owner
Connecting to the rabbitmq admin page [1], we can see the messages stuck in the ready state (so indeed not consumed).
```
|------------------------------------------+----------+-------+-------+---------+-------+----------+---------------+-----|
| Name                                     | Features | State | Ready | Unacked | Total | incoming | deliver / get | ack |
|------------------------------------------+----------+-------+-------+---------+-------+----------+---------------+-----|
| swh.lister.maven.tasks.FullMavenLister   | D        | idle  | 2     | 0       | 2     | 0.00/s   |               |     |
| swh.loader.package.maven.tasks.LoadMaven | D        | idle  | 0     | 0       | 0     |          |               |     |
|------------------------------------------+----------+-------+-------+---------+-------+----------+---------------+-----|
```
Note: I've tampered with the db to reschedule the listing (hence the 2 messages [2]):

```
11:24:02 swh-scheduler@localhost:5433=# update task set status='next_run_not_scheduled', next_run=now() where type = 'list-maven-full';
UPDATE 1
Time: 2.245 ms
```
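The "Ready" column check above can also be scripted instead of eyeballed on the admin page. A minimal sketch, assuming queue stats shaped like the JSON that RabbitMQ's management API returns from `/api/queues`; the helper name and the exact fields used are illustrative, not part of the swh tooling:

```python
# Sketch: flag queues whose messages sit in the "ready" state (published
# but never consumed), which is what the admin page showed for the
# FullMavenLister queue. Assumption: we only rely on the "name" and
# "messages_ready" fields of the management API payload.

def stuck_queues(queues):
    """Return names of queues with ready (unconsumed) messages."""
    return [q["name"] for q in queues if q.get("messages_ready", 0) > 0]

# Sample data mirroring the table above:
queues = [
    {"name": "swh.lister.maven.tasks.FullMavenLister", "messages_ready": 2},
    {"name": "swh.loader.package.maven.tasks.LoadMaven", "messages_ready": 0},
]
print(stuck_queues(queues))  # a non-empty list means no worker consumes that queue
```

A non-empty result is the "messages grabbed by the runner but never consumed" symptom seen here.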
- Phabricator Migration user mentioned in commit swh/devel/swh-environment@1f5c1393
- Author Owner
After D7023, the scheduling happens:
```
swh-lister_1 | [2022-01-24 10:53:48,054: INFO/ForkPoolWorker-1] Downloading text index from http://swh-lister-maven-nginx/export.fld.
swh-lister_1 | [2022-01-24 10:53:48,068: INFO/ForkPoolWorker-1] Found 2 poms.
swh-lister_1 | [2022-01-24 10:53:48,068: INFO/ForkPoolWorker-1] Fetching poms..
swh-lister_1 | [2022-01-24 10:53:48,068: INFO/ForkPoolWorker-1] Fetching URL https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.0/sprova4j-0.1.0.pom with params {}
swh-lister_1 | [2022-01-24 10:53:48,206: INFO/ForkPoolWorker-1] Fetching URL https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.1/sprova4j-0.1.1.pom with params {}
swh-lister_1 | [2022-01-24 10:53:48,310: INFO/ForkPoolWorker-1] Task swh.lister.maven.tasks.FullMavenLister[f88b25c8-7237-4887-948b-11371958b0cc] succeeded in 0.2693615449825302s: {'pages': 4, 'origins': 4}
```
- Antoine R. Dumont changed the description
Thanks a lot for the advances made @ardumont. If I understand correctly, we're missing some jar entries in the exported maven repositories. I'll figure that out and add some tonight.
- Author Owner
> Thanks a lot for the advances made @ardumont

Sure.

> If I understand correctly, we're missing some jar entries in the exported maven repositories. I'll figure that out and add some tonight.

Well, the output of the lister says "{'pages': 4, 'origins': 4}" but I see only 3 entries in the scheduler db (table listed_origins). So, yes, there is some discrepancy there.
Good news though: we see some 'maven' origins resulting from that listing [1].

Still, there is something fishy about:

- a visit_type 'https' (record 3 below [1])
- missing last_update entries in the tasks (I recall we discussed and fixed those in the related diffs)

Lastly, another issue to understand: the output tasks (visit_type 'maven') are not getting scheduled. Probably another configuration discrepancy, hopefully.
[1]

```
14:19:26 swh-scheduler@localhost:5433=# select visit_type, count(*) from listed_origins group by visit_type having visit_type in ('maven', 'https');
+------------+-------+
| visit_type | count |
+------------+-------+
| https      | 1     |
| maven      | 2     |
+------------+-------+
(2 rows)
Time: 1.268 ms

14:20:46 swh-scheduler@localhost:5433=# select * from listed_origins where visit_type in ('maven', 'https');
+-[ RECORD 1 ]-----------+--------------------------------------------------------------------------------------------------------------------------------+
| lister_id              | 31fd49a6-396e-473e-8522-ed105b48b5d9                                                                                           |
| url                    | https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.0/sprova4j-0.1.0-sources.jar                                               |
| visit_type             | maven                                                                                                                          |
| extra_loader_arguments | {"artifacts": [{"aid": "sprova4j", "gid": "al.aldi", "time": 1626109619335, "version": "0.1.0", "base_url": "https://repo1.maven.org/maven2/"}]} |
| enabled                | t                                                                                                                              |
| first_seen             | 2022-01-24 10:53:48.060195+00                                                                                                  |
| last_seen              | 2022-01-24 13:17:18.01843+00                                                                                                   |
| last_update            | (null)                                                                                                                         |
+-[ RECORD 2 ]-----------+--------------------------------------------------------------------------------------------------------------------------------+
| lister_id              | 31fd49a6-396e-473e-8522-ed105b48b5d9                                                                                           |
| url                    | https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.1/sprova4j-0.1.1-sources.jar                                               |
| visit_type             | maven                                                                                                                          |
| extra_loader_arguments | {"artifacts": [{"aid": "sprova4j", "gid": "al.aldi", "time": 1626111425534, "version": "0.1.1", "base_url": "https://repo1.maven.org/maven2/"}]} |
| enabled                | t                                                                                                                              |
| first_seen             | 2022-01-24 10:53:48.065399+00                                                                                                  |
| last_seen              | 2022-01-24 13:17:18.024056+00                                                                                                  |
| last_update            | (null)                                                                                                                         |
+-[ RECORD 3 ]-----------+--------------------------------------------------------------------------------------------------------------------------------+
| lister_id              | 31fd49a6-396e-473e-8522-ed105b48b5d9                                                                                           |
| url                    | //github.com/aldialimucaj/sprova4j.git                                                                                         |
| visit_type             | https                                                                                                                          |
| extra_loader_arguments | {}                                                                                                                             |
| enabled                | t                                                                                                                              |
| first_seen             | 2022-01-24 10:53:48.204886+00                                                                                                  |
| last_seen              | 2022-01-24 13:17:18.085108+00                                                                                                  |
| last_update            | (null)                                                                                                                         |
+------------------------+--------------------------------------------------------------------------------------------------------------------------------+
Time: 0.992 ms
```
[2] (Optionally) My local setup, if you want to reproduce:

```
$ cat $SWH_ENVIRONMENT_HOME/docker/docker-compose.override.yml
version: '2'
services:
  swh-scheduler-db:
    ports:
      - "5433:5432"

$ grep -A4 swh-scheduler-dev ~/.pg_service.conf
[swh-scheduler-dev]
dbname=swh-scheduler
host=localhost
port=5433
user=postgres
# all ^ this so that you can connect to the docker db without having to run docker
# command

$ grep "swh-scheduler" ~/.pgpass | grep 5433
*:5433:swh-scheduler:postgres:testpassword

$ psql service=swh-scheduler-dev
14:24:57 swh-scheduler@localhost:5433=# \conninfo
You are connected to database "swh-scheduler" as user "postgres" on host "localhost" (address "127.0.0.1") at port "5433".
...
```
- Author Owner
I've opened D7025 to improve some logging statements and add what was missing.

Maybe the erroneous visit_type 'https' in record 3 above is linked to the url starting with "scm:https://github.com..." or something [1]?
[1]

```
swh-lister_1 | [2022-01-24 13:31:06,210: INFO/ForkPoolWorker-1] * Yielding pom https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.0/sprova4j-0.1.0.pom: {'type': 'scm', 'doc': 1, 'url': 'scm:https://github.com/aldialimucaj/sprova4j.git', 'project': 'al.aldi.sprova4j'}
```
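The log above hints at where the bogus visit_type comes from: a well-formed Maven scm connection string is "scm:&lt;provider&gt;:&lt;url&gt;", so naively splitting "scm:https://github.com/..." on ':' puts "https" in the provider slot and a schemeless "//github.com/..." in the url slot, exactly matching record 3. A hypothetical sketch of that failure mode and a stricter parse; the function name, and the whitelist of providers, are mine, not the lister's actual code:

```python
# Sketch of the suspected bug: "scm:git:https://..." parses fine, but the
# malformed "scm:https://github.com/..." fills the provider slot with
# "https". Rejecting unknown providers avoids emitting bogus visit_types.

KNOWN_PROVIDERS = {"git", "hg", "svn"}  # assumption: vcs types swh can load

def parse_scm(scm):
    """Return (visit_type, url), or None when the scm string is not loadable."""
    parts = scm.split(":", 2)
    if len(parts) != 3 or parts[0] != "scm":
        return None
    provider, url = parts[1], parts[2]
    if provider not in KNOWN_PROVIDERS:
        return None  # rejects the malformed "scm:https://..." case
    return provider, url

print(parse_scm("scm:git:https://github.com/aldialimucaj/sprova4j.git"))
print(parse_scm("scm:https://github.com/aldialimucaj/sprova4j.git"))  # None
```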
The fix works on my setup, thanks again. :-)
> In #3746 (closed), @ardumont wrote: Well, the output of the lister says "{'pages': 4, 'origins': 4}" but I see only 3 entries in the scheduler db (table listed_origins). So, yes, there is some discrepancy there.

That sounds ok to me: scm entries are deduplicated. Two pages corresponding to the same scm entry (i.e. with the same scm url) will produce only one output. In this sample we should indeed have 2 jars and 1 scm.
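The 4-vs-3 arithmetic can be sketched: if origins are counted per page processed, two pom pages pointing at the same scm url collapse into one stored entry after deduplication. A toy illustration (not the lister's actual code; the urls are the ones from this sample):

```python
# Toy model of the count discrepancy: the task result counts one origin
# per page processed (4), while listed_origins stores deduplicated
# entries (3), because both pom versions reference the same scm url.

pages = [
    ("maven", "https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.0/sprova4j-0.1.0-sources.jar"),
    ("maven", "https://repo1.maven.org/maven2/al/aldi/sprova4j/0.1.1/sprova4j-0.1.1-sources.jar"),
    ("git", "https://github.com/aldialimucaj/sprova4j.git"),  # from the 0.1.0 pom
    ("git", "https://github.com/aldialimucaj/sprova4j.git"),  # from the 0.1.1 pom: duplicate
]
reported = len(pages)       # what the task result shows: 4
recorded = len(set(pages))  # what listed_origins ends up with: 3
print(reported, recorded)
```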
> Still, there is something fishy about:
> - a visit_type 'https' (record 3 below [1]).

That is fishy. I'm investigating.

> - missing last_update entries in the task (I recalled we discussed and fixed those in the related diffs)

That's fishy too. Investigating.
On my side the lister also fails on a 404 exception, which should not happen. I don't know why it has suddenly become so fragile.

To be continued. Thanks for your time, ardumont.
- Author Owner
> That sounds ok to me: scm entries are deduplicated. 2 pages corresponding to the same scm entry (i.e. with the same scm url) will produce only one output. In this sample we should indeed have 2 jars and 1 scm.

Right.
> In #3746 (closed), @borisbaldassari wrote: Still, there is something fishy about:
> - a visit_type 'https' (record 3 below [1]). that is fishy. I'm investigating.
Ok, got it: unluckily this one string is viciously malformed (it's wrong, but fits the regexp anyway), and I had not considered this case. On the bright side, this surfaces a tricky bug that will now be fixed.

Is there a way to list all available loaders from inside the lister? Then we could dynamically check that a yielded scm string will be safely ingested by the loaders available on the host. It's the safest fix I can foresee.
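On the "list all available loaders" question: the registration log above ("Loading entrypoint for plugin loader.maven") suggests swh components are discovered via setuptools entry points, so one option is to enumerate an entry-point group at runtime. A sketch using only the stdlib; note that the group name "swh.workers" is a guess to verify against swh-core, so the demo below runs on a group that exists in any Python install:

```python
# Sketch: enumerate plugins registered under a setuptools entry-point
# group. Assumption: swh loaders/listers are registered this way under
# some group ("swh.workers" is my guess; check swh.core before relying on it).

from importlib.metadata import entry_points

def plugins_in_group(group):
    """Return the sorted names of all entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ API
        return sorted(ep.name for ep in eps.select(group=group))
    return sorted(ep.name for ep in eps.get(group, []))  # 3.8/3.9 dict-style API

# e.g. plugins_in_group("swh.workers") would (if the group name is right)
# list loader.maven, lister.maven, etc.; demo on a universally present group:
print(plugins_in_group("console_scripts")[:5])
```

The lister could then reject scm entries whose provider does not map to any registered loader, instead of trusting the regexp alone.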