We want to ingest all Git repositories of the Eclipse Foundation (and they want us to do that too!).
They are using cgit, and the full repo listing is here: https://git.eclipse.org/c/
It also comes with an "idle" column, which is a good substitute for an actual push feed.
The only annoying thing about that cgit listing is that it's paginated and HTML-based; we might want to fix that and work with them to deploy the change (and push it upstream).
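To make the "paginated and HTML-based" point concrete, here is a minimal sketch of what scraping one cgit index page involves: collecting the repository links, the "idle" text, and the pager links. The sample markup, the `age-*` span classes, the `pager` list, and the `parse_cgit_index` helper are assumptions for illustration; this is not the swh-lister implementation, and the real Eclipse pages may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample mimicking a cgit index page; the real markup may differ.
SAMPLE_PAGE = """
<div class='content'>
<table class='list nowrap'>
<tr class='nohover'><th>Name</th><th>Description</th><th>Owner</th><th>Idle</th></tr>
<tr><td class='toplevel-repo'><a href='/c/foo/bar.git/'>foo/bar.git</a></td>
    <td>Example repo</td><td></td>
    <td><span class='age-months'>5 months</span></td></tr>
<tr><td class='toplevel-repo'><a href='/c/foo/baz.git/'>foo/baz.git</a></td>
    <td>Another repo</td><td></td>
    <td><span class='age-hours'>3 hours</span></td></tr>
</table>
<ul class='pager'>
<li><a class='current' href='/c/?ofs=0'>[1]</a></li>
<li><a href='/c/?ofs=50'>[2]</a></li>
</ul>
</div>
"""


class CgitIndexParser(HTMLParser):
    """Collect (repo_url, idle_text) pairs and pager URLs from one index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.repos = []   # list of (absolute repo URL, idle text or None)
        self.pages = []   # absolute URLs found in the pager
        self._in_row = False
        self._in_age = False
        self._in_pager = False
        self._row_url = None
        self._row_idle = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._in_row, self._row_url, self._row_idle = True, None, None
        elif tag == "ul" and attrs.get("class") == "pager":
            self._in_pager = True
        elif tag == "a" and "href" in attrs:
            url = urljoin(self.base_url, attrs["href"])
            if self._in_pager:
                if url not in self.pages:
                    self.pages.append(url)
            elif self._in_row and self._row_url is None:
                # The first link of a table row is taken as the repository link.
                self._row_url = url
        elif tag == "span" and (attrs.get("class") or "").startswith("age-"):
            self._in_age = self._in_row

    def handle_data(self, data):
        if self._in_age:
            self._row_idle = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_age = False
        elif tag == "ul":
            self._in_pager = False
        elif tag == "tr":
            # Header rows carry no link, so they are skipped here.
            if self._in_row and self._row_url:
                self.repos.append((self._row_url, self._row_idle))
            self._in_row = False


def parse_cgit_index(html, base_url):
    """Return (repos, pages) for a single cgit index page."""
    parser = CgitIndexParser(base_url)
    parser.feed(html)
    return parser.repos, parser.pages


repos, pages = parse_cgit_index(SAMPLE_PAGE, "https://git.eclipse.org/c/")
for repo_url, idle in repos:
    print(repo_url, idle)
```

A full listing would have to repeat this for every URL in `pages`, which is why a non-paginated or machine-readable listing would be preferable.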
swhworker@worker0:~$ SWH_CONFIG_FILENAME=lister.yml swh lister run --lister cgit url=https://git.eclipse.org/c/ instance=eclipse
WARNING:swh.lister.cgit.lister:Unexpected HTTP status code 500 on https://git.eclipse.org/c/osbp/org.eclipse.osbp.runtime.functionlibrary.validation.git/ # <- on their side
Traceback (most recent call last):
...
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) # <- most probably the lister is too aggressive
($928 for the full stacktrace)
Listed only 900 origins:
swh-scheduler=> select count(*) from listed_origins lo inner join listers l on lo.lister_id=l.id and l.name='cgit' and l.instance_name='eclipse';
 count
-------
   900
(1 row)
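If the `ConnectionError` in the run above really comes from the lister being too aggressive, one common mitigation is to retry with exponential backoff. The sketch below is only an illustration under that assumption: `fetch_with_backoff` and its parameters are hypothetical names, not part of swh-lister, and the `fetch` callable stands in for whatever actually performs the HTTP request.

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=1.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call fetch(url), retrying on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2 * base_delay, ...).
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    # Simulated server that drops the first two connections.
    attempts = []

    def flaky_fetch(url):
        attempts.append(url)
        if len(attempts) < 3:
            raise ConnectionError("Remote end closed connection without response")
        return "page content"

    print(fetch_with_backoff(flaky_fetch, "https://git.eclipse.org/c/", base_delay=0.1))
```

Note that `requests.exceptions.ConnectionError` does not derive from the built-in `ConnectionError`, so with `requests` the exception class would have to be passed explicitly through `retry_on`.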
Thanks @ardumont for experimenting with this. The 500 seems normal: we need to tell Eclipse about us first, I'll put you in touch. So maybe it's still a no-brainer, and we just need to document the "contact the owner to get whitelisted" human step :-)
yes, it can happen and the lister is able to deal with it.
Listed only 900 origins:
By the way, after implementing swh/devel/swh-lister#2999 (closed), the test revealed that we had only done about 2/3 of the listing (900 origins out of 1340).
So I think that once swh/devel/swh-lister!394 (closed) is deployed, we should be able to list it completely, since we now end up with only 1 HTTP request.
! In #376 (closed), @rdicosmo wrote:
Thanks @ardumont, that's great! If you think this does not need any more support on the Eclipse side, could you let them know?
softwareheritage-scheduler=> \conninfo
You are connected to database "softwareheritage-scheduler" as user "guest" on host "belvedere.internal.softwareheritage.org" (address "192.168.100.210") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
softwareheritage-scheduler=> select count(*) from listed_origins where lister_id='7a775770-2b2f-4139-aacb-ad715c022b9d';
 count
-------
  1340
(1 row)
Note that this does not mean these origins are or will be ingested anytime soon though.
We are still missing at least one cog to actually schedule those listed origins.
! In #376 (closed), @ardumont wrote:
Note that this does not mean these origins are or will be ingested anytime soon though.
We are still missing at least one cog to actually schedule those listed origins.