Arch User Repository (AUR) lister
Add 'aur' module to swh-lister with data fixtures and tests. For now, origin URLs are the packages' VCS (Git) URLs.
Migrated from D8033 (view on Phabricator)
Build has FAILED
Patch application report for D8033 (id=28931)
Rebasing onto 1bf11aa2...
Current branch diff-target is up to date.
Changes applied before test
commit 0238f136b5df5e684b393301d01319ee29abf423 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 [WIP] Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/549/console
Updating !285 (closed): [WIP] Arch User Repository (AUR) lister
Fix an issue with the 'last_modified' date timezone by adding the timezone.utc offset.
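For readers unfamiliar with this kind of timezone fix, here is a minimal sketch of building a timezone-aware 'last_modified' value. It assumes the date arrives either as a Unix timestamp or as a naive datetime; the helper names `parse_last_modified` and `ensure_utc` are hypothetical and not taken from the patch.

```python
from datetime import datetime, timezone


def parse_last_modified(epoch_seconds: int) -> datetime:
    """Convert a Unix timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def ensure_utc(value: datetime) -> datetime:
    """Attach UTC to a naive datetime, or convert an aware one to UTC."""
    if value.tzinfo is None:
        # Assumption: naive values are already expressed in UTC.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)
```

Passing `tz=timezone.utc` to `fromtimestamp` avoids depending on the local timezone of the machine running the lister.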
Build is green
Patch application report for D8033 (id=28933)
Rebasing onto 1bf11aa2...
Current branch diff-target is up to date.
Changes applied before test
commit e91114aca59c17fb4b7e48028ac1687df580aa6d Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 [WIP] Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/550/ for more details.
@ardumont @vlorentz Hi, here is a first implementation of the Arch User Repository (AUR) lister.
The AUR has about 83,841 packages today (https://aur.archlinux.org/packages). I get the package list from https://aur.archlinux.org/packages-meta-ext-v1.json.gz, which is a gzip file of about 6.6 MB.
There is an RPC interface, the Aurweb RPC interface (see https://wiki.archlinux.org/title/Aurweb_RPC_interface), but it's recommended to download a json.gz file that contains some metadata for each package. See https://lists.archlinux.org/pipermail/aur-general/2021-November/036659.html
The main difference from Arch Linux is that end users build their own packages with the help of makepkg + pacman. See https://wiki.archlinux.org/title/Arch_User_Repository for more details. It mainly relies on Git repositories containing a PKGBUILD file and a .SRCINFO file, see https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve for an example.
There is no direct way for the lister to discover where to download older versions of a package. Each package page gives a canonical URL, but it's the latest snapshot URL, and there is no way to know which version you get when downloading from that link.
For now I did not implement scraping or anything else to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but the commit messages are inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).
Maybe we can use git to find some kind of version history from diffs of the .SRCINFO file, as I tried before with the Arch .PKGINFO file, but with this number of repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source for the corresponding package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve
What do you think? Is it OK to say that the origin is a Git repository URL in this case? If so, will the loader really be a package loader or the VCS Git loader?
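As an illustration of the json.gz approach, here is a rough sketch that turns the downloaded archive into one Git origin per package base. It rests on a few assumptions that are not stated in this thread: that the file is a JSON array of objects using the Aurweb RPC field names (`Name`, `PackageBase`, `Version`, `LastModified`), and that the Git URL follows the `https://aur.archlinux.org/<pkgbase>.git` pattern. The function name `iter_origins` is hypothetical.

```python
import gzip
import json
from typing import Any, Dict, Iterator

# Assumption: AUR Git repositories are exposed under this URL pattern.
AUR_GIT_URL = "https://aur.archlinux.org/{pkgbase}.git"


def iter_origins(gz_bytes: bytes) -> Iterator[Dict[str, Any]]:
    """Yield one origin per package base from the gzipped metadata dump."""
    packages = json.loads(gzip.decompress(gz_bytes))
    seen = set()
    for pkg in packages:
        pkgbase = pkg["PackageBase"]
        if pkgbase in seen:
            # Several packages can share one package base (one repository).
            continue
        seen.add(pkgbase)
        yield {
            "url": AUR_GIT_URL.format(pkgbase=pkgbase),
            "name": pkg["Name"],
            "version": pkg.get("Version"),
            "last_modified": pkg.get("LastModified"),
        }
```

Deduplicating on the package base keeps the number of origins equal to the number of repositories rather than the number of packages.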
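To make the .SRCINFO idea more concrete, here is a small sketch that reads the `key = value` lines of a .SRCINFO file and extracts the version and source entries. It is a simplification that assumes a single-package file and merges all sections into one mapping; the function names are hypothetical, and walking the git history to apply this per commit is left out.

```python
from typing import Dict, List


def parse_srcinfo(text: str) -> Dict[str, List[str]]:
    """Collect 'key = value' lines of a .SRCINFO file into lists per key."""
    fields: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: source URLs may contain '=' themselves.
        key, _, value = line.partition("=")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def version_of(fields: Dict[str, List[str]]) -> str:
    """Build a 'pkgver-pkgrel' string from parsed fields."""
    pkgver = fields["pkgver"][0]
    pkgrel = fields.get("pkgrel", ["1"])[0]
    return f"{pkgver}-{pkgrel}"
```

Running this on the .SRCINFO of each commit would give a version per commit, at the cost of reading every repository's history, which is the resource concern raised above.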
! In !285 (closed), @franckbret wrote: There is no direct way for the lister to discover where to download older versions of a package. Each package page gives a canonical URL, but it's the latest snapshot URL, and there is no way to know which version you get when downloading from that link.
If they are not directly available, then it doesn't make sense to have them in snapshots anyway. We'll just have successive visits of the loader as history.
For now I did not implement scraping or anything else to get older versions of a package, because I did not find a common way to do that. There is a git log on the package page, but the commit messages are inconsistent ( https://aur.archlinux.org/cgit/aur.git/log/?h=hg-evolve ).
Maybe we can use git to find some kind of version history from diffs of the .SRCINFO file, as I tried before with the Arch .PKGINFO file, but with this number of repositories it will probably take a lot of time and resources. The .SRCINFO generally contains a link to the 'real' source for the corresponding package version. See https://aur.archlinux.org/cgit/aur.git/tree/.SRCINFO?h=hg-evolve
What do you think? Is it OK to say that the origin is a Git repository URL in this case? If so, will the loader really be a package loader or the VCS Git loader?
Clearly the git loader, as there can be branches and whatnot; history also matters here.
However, the main content of these repositories is the PKGBUILD, which (among other things) fetches the code from somewhere else (tarball, git commit, ...), and the PKGBUILD alone is not very useful without that code. Therefore, it looks like we should implement something like swh-loader-git#3923, to fetch the actual code.
Updating !285 (closed): [WIP] Arch User Repository (AUR) lister
Fix an issue with requests usage when downloading package archives, ensuring it does not decode the binary content directly. (Tests and CI were fine, but got the bug while testing runner on docker)
Build is green
Patch application report for D8033 (id=28971)
Rebasing onto 1bf11aa2...
Current branch diff-target is up to date.
Changes applied before test
commit 4729b0aae165aab640677adefc2f6ceb90072bcd Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/551/ for more details.
! In !285 (closed), @franckbret wrote: but got the bug while testing runner on docker
what was the bug, exactly? I don't see how the new code changes the behavior.
Also, please remove
Updating !285: [WIP] Arch User Repository (AUR) lister
from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/!285#toc less readable
! In !285 (closed), @vlorentz wrote:
! In !285 (closed), @franckbret wrote: but got the bug while testing runner on docker
what was the bug, exactly? I don't see how the new code changes the behavior.
Also, please remove
Updating !285: [WIP] Arch User Repository (AUR) lister
from change descriptions; it's redundant and makes the "History" tab of https://forge.softwareheritage.org/!285#toc less readable
@vlorentz The change forces requests.get to not automatically decompress the response. This way we are sure we always get the raw archive as is. See https://requests.readthedocs.io/en/latest/user/quickstart/#raw-response-content
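For context, the raw-response pattern from the linked requests quickstart section looks roughly like the sketch below. This is not the code from the patch: the helper names `read_raw` and `download_archive` are hypothetical, and the whole archive is read into memory, which assumes a file of modest size.

```python
import io
import shutil
from typing import BinaryIO


def read_raw(raw: BinaryIO) -> bytes:
    """Copy a binary stream into memory without transforming its bytes."""
    buffer = io.BytesIO()
    shutil.copyfileobj(raw, buffer)
    return buffer.getvalue()


def download_archive(url: str) -> bytes:
    """Fetch url and return the body exactly as sent over the wire."""
    # Third-party dependency, imported here so read_raw stays usable alone.
    import requests

    # stream=True defers reading the body; response.raw then exposes the
    # underlying stream, which is not content-decoded by requests.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return read_raw(response.raw)
```

Reading from `response.raw` instead of `response.content` means a `Content-Encoding: gzip` header does not cause the body to be decompressed before it is saved.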
@franckbret ping on earlier comments
Build is green
Patch application report for D8033 (id=29383)
Rebasing onto 1bf11aa2...
Current branch diff-target is up to date.
Changes applied before test
commit d4851a18c7bf97dcfffdb8362dcf4c6a720bde02 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/554/ for more details.
Build is green
Patch application report for D8033 (id=29732)
Rebasing onto cee6bcb5...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit a33fbf745785ebe179bce4fbbbbb52a17fabce65 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/586/ for more details.
Build is green
Patch application report for D8033 (id=29755)
Rebasing onto cee6bcb5...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 280be6e9530238a385843ee09ee5040ca0d60473 Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/587/ for more details.
Build is green
Patch application report for D8033 (id=29760)
Rebasing onto cee6bcb5...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit 3e0c54bf111c316ead8a6804060f3432d8f3209f Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
See https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/588/ for more details.
Build has FAILED
Patch application report for D8033 (id=29766)
Rebasing onto cee6bcb5...
First, rewinding head to replay your work on top of it... Applying: Arch User Repository (AUR) lister
Changes applied before test
commit aeb60cc266f0fdf11e8650c9f7710ec00cf2443f Author: Franck Bret <franck.bret@octobus.net> Date: Fri Jun 24 12:19:15 2022 +0200 Arch User Repository (AUR) lister Add 'aur' module to swh-lister with data fixtures and tests. For now, origin url are package vcs (Git) url.
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tests-on-diff/589/console