GNU Lister
This is the implementation of the GNU lister. It downloads the tree.json.gz file from https://ftp.gnu.org/tree.json.gz, reads its JSON content, and returns the origins of the repositories by parsing the JSON data.
Related T1722
Migrated from D1482 (view on Phabricator)
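As a rough illustration of the description above, here is a minimal sketch of decompressing and parsing a gzip-compressed JSON listing. The helper name `load_tree` and the shape of the sample payload are assumptions made for the example; they are not taken from the lister in this merge request, and the real file would be fetched from https://ftp.gnu.org/tree.json.gz rather than built in memory.

```python
import gzip
import json


def load_tree(raw: bytes):
    """Decompress a gzip payload and parse it as JSON."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


# Tiny in-memory stand-in for the downloaded file.
# The directory/contents layout here is an assumption for illustration.
sample_payload = gzip.compress(
    json.dumps([{"type": "directory", "name": "gnu", "contents": []}]).encode("utf-8")
)
tree = load_tree(sample_payload)
```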
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/119/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/119/console
Related T1351
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/120/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/120/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/121/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/121/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/122/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/122/console
I sense some communication gap. I need to state more clearly what I am planning to do so that you can help me more effectively. Please correct me if I am wrong somewhere or if there is a better method.
This GNU lister will create origins like:
[
  { 'name': 8sync, 'url': https://ftp.gnu.org/gnu/8sync/, time_last_upated: 184692748, origin: gnu },
  { 'name': emacs, 'url': https://ftp.gnu.org/gnu/emacs/, time_last_upated: 184692948, origin: gnu }
]
(This is only a sample, not real data.)
And these will be passed to loaders
Now it is the duty of the loader to look into all the files present in the listed folder, i.e. the loader will recursively look into all the files and folders present in https://ftp.gnu.org/gnu/emacs/ to ensure all the subdirectories are covered, and then create a snapshot of only those which have .tar as extension.
As I understand it, the task_dic() function is used to schedule the task for the loader, so for the dedicated GNU loader it would have to return a task matching the constraints of that loader (which has not been built yet).
Does the way I am doing it sound good?
In !369 (closed), @nahimilega wrote: I sense some communication gap, I need to state more clearly what I am thinking of doing so that you can help me more effectively.
I've opened a task for the gnu loader so we can discuss there.
Please correct me if I am wrong somewhere or if there is a better method.
This GNU lister will create origins like:
[
  { 'name': 8sync, 'url': https://ftp.gnu.org/gnu/8sync/, time_last_upated: 184692748, origin: gnu },
  { 'name': emacs, 'url': https://ftp.gnu.org/gnu/emacs/, time_last_upated: 184692948, origin: gnu }
]
(This is only a sample, not real data.)
Yes, something along those lines.
What you described is more of a scheduler task (which includes the origin URL), but I'm mostly OK with it ;)
And these will be passed to loaders
Not loaders, only 1 loader, the gnu one.
I expect we list only tarballs from the gnu mirror.
Now it is the duty of the loader to look into all the files present in the listed folder, i.e. the loader will recursively look into all the files and folders present in https://ftp.gnu.org/gnu/emacs/ to ensure all the subdirectories are covered, and then create a snapshot of only those which have .tar as extension.
As an implementation detail, I'm not sure whether the loader actually does some listing itself or whether it uses the output of the lister. The lister, after all, has access to the folder listing (coming from the tree.json.gz file).
As I understand it, the task_dic() function is used to schedule the task for the loader, so for the dedicated GNU loader it would have to return a task matching the constraints of that loader (which has not been built yet).
Right.
Does the way I am doing it sound good?
Yes, it does.
I think, in the end, we were thinking along the same lines.
Great.
And these will be passed to loaders
Not loaders, only 1 loader, the gnu one.
Sorry, that extra 's' was a typo again.
I expect we list only tarballs from the gnu mirror. As an implementation detail, I'm not sure whether the loader actually does some listing itself or whether it uses the output of the lister. The lister, after all, has access to the folder listing (coming from the tree.json.gz file).
To list only the tarballs from the GNU mirror, we need to change the lister a bit. We need to recursively check for all the tarballs in all the directories and sub-directories. The new scheduler task will look more like this:
[
  { 'name': 8sync, 'url': https://ftp.gnu.org/gnu/8sync/8sync-0.3.0.tar.gz, time_last_upated: 184692748, origin: gnu },
  { 'name': emacs, 'url': https://ftp.gnu.org/gnu/emacs/windows/emacs-24/emacs-24.1-bin-i386.zip, time_last_upated: 184692948, origin: gnu },
  { 'name': emacs, 'url': https://ftp.gnu.org/gnu/emacs/windows/libXpm-3.5.7-w32-src.zip, time_last_upated: 184692948, origin: gnu }
]
Now the loader just has to create a snapshot of the tarballs at the links listed by the lister.
Shall I make the appropriate changes so that the lister lists according to the new scheduler task (mentioned above)?
And shall I create a task requesting that the tar loader be moved to core?
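The recursive check proposed in this comment could be sketched roughly as follows. This is only an illustration: the function name `collect_tarballs`, the suffix list, and the node layout (`type`, `name`, `contents`, `time` keys) are assumptions made for the example, not the actual code of this lister or a confirmed description of tree.json.gz. `.zip` is included because the sample task above lists `.zip` archives.

```python
# Archive suffixes treated as "tarballs" in this sketch (an assumption).
ARCHIVE_SUFFIXES = (".tar", ".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".zip")


def collect_tarballs(node, base_url):
    """Yield (url, time) for every archive file found below `node`.

    `node` is assumed to be a dict with a "name", a "type" of either
    "directory" or "file", and, for directories, a "contents" list.
    """
    url = base_url.rstrip("/") + "/" + node["name"]
    if node.get("type") == "directory":
        # Descend into every child so that nested folders are covered.
        for child in node.get("contents", []):
            yield from collect_tarballs(child, url)
    elif node["name"].endswith(ARCHIVE_SUFFIXES):
        yield url, node.get("time")


# Hypothetical sample data, mirroring the 8sync example in this thread.
sample = {
    "type": "directory",
    "name": "8sync",
    "contents": [
        {"type": "file", "name": "8sync-0.3.0.tar.gz", "time": "184692748"},
        {"type": "file", "name": "8sync-0.3.0.tar.gz.sig", "time": "184692748"},
    ],
}
found = list(collect_tarballs(sample, "https://ftp.gnu.org/gnu"))
```

With this approach, signature files and other non-archive entries are skipped, and each yielded URL could become the 'url' of one scheduler task.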
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/134/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/143/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/143/console
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/144/ for more details.
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/145/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/145/console
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/146/ for more details.
To check that the code works, I wrote a separate script using the same code as this lister, including the functions that are specific to the GNU lister, such as find_all_tarball and list_packages. I ran the script to verify the algorithm I am using, and it worked fine. Here is the complete output: https://forge.softwareheritage.org/swh/meta$405
Here is a more human-readable output. It does not contain all the data, as I only used it to check that all the tarball URLs are listed properly: https://forge.softwareheritage.org/swh/meta$406
mentioned in merge request !370 (closed)
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/148/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/154/ for more details.
Thanks for the update.
Can you please:
- Try to reduce the samples, though. There is too much data (painful to review and to maintain). I know the initial samples on other listers are also big, but that's on us. Let's not continue in that direction.
- Rework your commits (squashing some). I see one with the message "IDK" (which to me means "I don't know", and that's not descriptive enough).
Cheers,
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DLS/job/tox/158/ See console output for more information: https://jenkins.softwareheritage.org/job/DLS/job/tox/158/console
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/159/ for more details.
Oh yeah, also, come to think of it, I recently changed how the credentials (for the lister's rate limit policy) are read from the configuration (!65 (closed)):
- you'll need to rebase all your lister diffs onto the latest master
- and add a self.instance variable on those listers. For listers whose instance does not change (e.g. check the github one), you can add it directly in the class; for those where it changes (e.g. check the gitlab one), it must be initialized in the constructor.
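To make the two cases concrete, here is a minimal sketch. The class names and instance values are hypothetical and stand in for real listers; only the `instance` attribute itself comes from the comment above.

```python
class FixedInstanceLister:
    """Hypothetical lister whose instance never changes."""

    # Declared directly on the class, since every object shares the same value.
    instance = "github"


class PerInstanceLister:
    """Hypothetical lister that can target several instances."""

    def __init__(self, instance):
        # Set per object in the constructor, since the value varies.
        self.instance = instance


fixed = FixedInstanceLister()
varying = PerInstanceLister("gitlab.example.org")
```

In both cases, code that reads `self.instance` (for example to look up credentials) works the same way.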
And thanks for the git commits rework, nice work!
Cheers,
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/160/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/161/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/162/ for more details.
Build is green See https://jenkins.softwareheritage.org/job/DLS/job/tox/164/ for more details.