Ingest the AUR (Arch User Repository)
- swh/devel/swh-lister!285 (closed): Implement lister
- swh/devel/swh-loader-core!306 (closed), swh/devel/swh-loader-core!310 (closed): Implement loader
- #4466 Lister run in docker
- #4466 Loader run in docker
- swh/devel/swh-lister!285 (closed): Document lister
- swh/devel/swh-loader-core!306 (closed): Document loader
- swh/infra/sysadm-environment#5061 (closed): Deploy on staging
- Call for public review
- Deploy on production
Migrated from T4466 (view on Phabricator)
- Task #4992: Extend archive coverage [Roadmap - Collect]
- Task #4993: Extend archive coverage [Roadmap - Collect]
- swh/devel/swh-loader-core !310
- swh/devel/swh-loader-core !306
- swh/devel/swh-lister !285
Activity
- Franck Bret added the AUR lister, AUR loader and Archive coverage labels
- Author
AUR lister run in Docker: report
The AUR lister runs fine in Docker, although listing the origins takes quite long (about 30 minutes).
Found 78702 AUR packages in aur_index
Successfully removed /tmp/aur_archive directory
Task swh.lister.aur.tasks.AurListerTask[a7ed0b48-3d3b-4aad-b158-6d888ff9aab5] succeeded in 1619.0577569839952s: {'pages': 78702, 'origins': 78702}
swh-scheduler=# select count(*) from listed_origins where visit_type = 'aur';
 count
-------
 78702
- Franck Bret changed the description
- Author
AUR loader run in Docker: report
The AUR loader runs in Docker, but I don't understand why it starts loading origins as soon as the lister has completed (i.e. I have not run "origin schedule-next aur <qty>").
So far it looks good, and it is quite fast because the packages it downloads are very small. It loads about 25,000 origins per hour without errors:
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='successful';
 count
-------
 27057
(1 row)
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='failed';
 count
-------
     0
(1 row)
- vlorentz added priority:Normal label
- Author
I've made a complete run in Docker.
Lister:
[2022-08-30 10:31:30,328: INFO/ForkPoolWorker-1] Task swh.lister.aur.tasks.AurListerTask[a24d7a3d-81ea-4ef9-90e7-e9cad8a3ffec] succeeded in 946.656092988007s: {'pages': 78803, 'origins': 78803}
swh-scheduler=# select count(*) from listed_origins where visit_type='aur';
-[ RECORD 1 ]
count | 78803
Loader (it takes between 2 and 3 hours to load everything):
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='successful';
-[ RECORD 1 ]
count | 78799
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='failed';
-[ RECORD 1 ]
count | 4
swh-scheduler=# select count(*) from origin_visit_stats where visit_type='aur' and last_visit_status='not_found';
-[ RECORD 1 ]
count | 0
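As a quick sanity check on this complete run, the per-status visit counts can be compared with the number of origins the lister reported. This is only a sketch that hard-codes the figures from the output above; the function name visit_coverage is hypothetical and does not come from swh-scheduler.

```python
def visit_coverage(listed: int, status_counts: dict[str, int]) -> tuple[bool, float]:
    """Return (every listed origin got a visit, fraction of successful visits).

    Hypothetical helper: it only does arithmetic on counts copied from psql.
    """
    visited = sum(status_counts.values())
    success_rate = status_counts.get("successful", 0) / listed
    return visited == listed, success_rate


# Figures copied from the lister and loader output of the complete run.
listed_origins = 78803
counts = {"successful": 78799, "failed": 4, "not_found": 0}

all_visited, rate = visit_coverage(listed_origins, counts)
print(all_visited, f"{rate:.3%}")  # prints: True 99.995%
```

With these numbers every listed origin received a visit (78799 + 4 + 0 = 78803), and only the 4 failed visits remain to investigate.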
- Franck Bret marked the checklist item Loader run in docker as completed
- Franck Bret changed the description
- Benoit Chauvet changed milestone to %Extend archive coverage [Roadmap - Collect]
- Benoit Chauvet removed Archive coverage label
- Benoit Chauvet added #4991 as child task
- Benoit Chauvet added #4992 as child task
- Benoit Chauvet added #4993 as child task
- Benoit Chauvet added ll-status::in-progress label
I just noticed that origin URLs look like this: https://aur.archlinux.org/hg-evolve.git, which doesn't point to anything. They should probably be like this instead: https://aur.archlinux.org/packages/hg-evolve
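The suggested change boils down to deriving the package page URL from the Git clone URL. A minimal sketch, assuming every origin URL has the single-segment form https://aur.archlinux.org/<package>.git; aur_package_url is a hypothetical helper, not part of swh-lister.

```python
from urllib.parse import urlsplit, urlunsplit


def aur_package_url(git_url: str) -> str:
    """Map an AUR Git clone URL to the package's web page (hypothetical helper)."""
    parts = urlsplit(git_url)
    # Expect a single path segment such as "/hg-evolve.git".
    segment = parts.path.lstrip("/")
    if "/" in segment or not segment.endswith(".git"):
        raise ValueError(f"unexpected AUR Git URL: {git_url}")
    name = segment[: -len(".git")]
    if not name:
        raise ValueError(f"missing package name in: {git_url}")
    # Keep scheme and host, replace the path with /packages/<name>.
    return urlunsplit((parts.scheme, parts.netloc, f"/packages/{name}", "", ""))


print(aur_package_url("https://aur.archlinux.org/hg-evolve.git"))
# → https://aur.archlinux.org/packages/hg-evolve
```

The sketch only covers the string mapping; it rejects anything that does not look like a single-segment .git URL instead of guessing.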
- Antoine R. Dumont added ll:ready-for-staging label and removed ll-status::in-progress label
- Antoine R. Dumont mentioned in issue swh/infra/sysadm-environment#5061 (closed)
- Antoine R. Dumont changed the description
- Antoine R. Dumont removed child task #4991
- Antoine R. Dumont added ll-status::staging label and removed ll:ready-for-staging label
- Antoine R. Dumont marked the checklist item swh/infra/sysadm-environment#5061 (closed): Deploy on staging as completed
- Antoine R. Dumont added ll:ready-for-public-review label and removed ll-status::staging label