Hey @vlorentz, the sentry link isn't working (or maybe isn't publicly accessible).
I tried running sudo docker-compose exec swh-loader swh loader run nixguix "https://guix.gnu.org/sources.json" on my self-hosted swh instance and got the following error:
ERROR:swh.loader.package.loader:Failed to initialize origin_visit for https://guix.gnu.org/sources.json
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/package/loader.py", line 389, in load
    self.storage.origin_add([origin])
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 181, in meth_
    return self.post(meth._endpoint_path, post_data)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 278, in post
    return self._decode_response(response)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 354, in _decode_response
    self.raise_for_status(response)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/storage/api/client.py", line 29, in raise_for_status
    super().raise_for_status(response)
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/core/api/__init__.py", line 344, in raise_for_status
    raise exception from None
swh.core.api.RemoteException: <RemoteException 500 KafkaDeliveryError: ['flush() exceeded timeout (120s)', [['origin', {'url': 'https://guix.gnu.org/sources.json'}, 'No delivery before flush() timeout', 'SWH_FLUSH_TIMEOUT']]]>
{'status': 'failed'}
The error is coming from Kafka. I initiated the containers using docker-compose up -f docker-compose.search.yml docker-compose.yml. Did I do anything wrong here?
The 2nd URL [2] posted by val works for me (the first one [1] does not, though; it returns a 404),
here is the content of URL [2], inlined just in case:
InvalidSchema: No connection adapters were found for 'ftp://ftp.ourproject.org/pub/ytalk/ytalk-3.3.0.tar.gz'
EXCEPTION (most recent call first)
InvalidSchema: No connection adapters were found for 'ftp://ftp.ourproject.org/pub/ytalk/ytalk-3.3.0.tar.gz'
  File "swh/loader/package/loader.py", line 576, in load
    res = self._load_revision(p_info, origin)
  File "swh/loader/package/loader.py", line 713, in _load_revision
    dl_artifacts = self.download_package(p_info, tmpdir)
  File "swh/loader/package/loader.py", line 364, in download_package
    return [download(p_info.url, dest=tmpdir, filename=p_info.filename)]
  File "swh/loader/package/utils.py", line 79, in download
    response = requests.get(url, **params, timeout=timeout, stream=True)
  File "requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "requests/sessions.py", line 640, in send
    adapter = self.get_adapter(url=request.url)
  File "requests/sessions.py", line 731, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
You might be able to reproduce it directly in a unit test, though,
without necessarily running a full-fledged nixguix loader in docker.
Copy, paste, and adapt an existing test so that it uses an ftp URL.
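For what it's worth, here is a minimal sketch of what such a test could look like. It does not reuse any existing swh test fixture; the helper name fetch_raises_invalid_schema and the test name are made up for illustration, and it only assumes that requests is installed. The call fails while requests looks up a transport adapter, so no network access should be needed.

```python
import requests
from requests.exceptions import InvalidSchema

# The ftp URL taken from the sentry report above.
FTP_URL = "ftp://ftp.ourproject.org/pub/ytalk/ytalk-3.3.0.tar.gz"


def fetch_raises_invalid_schema(url: str) -> bool:
    """Return True if requests refuses the URL for lack of a transport adapter.

    Hypothetical helper, not part of swh: it mirrors the requests.get call
    shown in the traceback (timeout and stream=True) and reports whether
    InvalidSchema was raised.
    """
    try:
        requests.get(url, timeout=5, stream=True)
    except InvalidSchema:
        return True
    return False


def test_ftp_url_is_rejected():
    # requests only mounts adapters for http:// and https:// by default,
    # so an ftp:// URL is rejected in Session.get_adapter.
    assert fetch_raises_invalid_schema(FTP_URL)
```

A real test in the loader would probably go through the package loader's download helper instead of calling requests directly, but this is enough to pin down the failure.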
I'm not entirely sold on using that repository,
given what its description says and its lack of tests (again, according to the readme/description):
This library is not intended to be an example of Transport Adapters best practices. This library was cowboyed together in about 4 hours of total work, has no tests, and relies on a few ugly hacks. Instead, it is intended as both a starting point for future development and a useful example for how to implement transport adapters.
Also, I just realized I should have mentioned this earlier, @KShivendu.
To reproduce the issue, there is no need for a loader or anything else,
just run ipython in your venv:
$ workon swh
(swh) $ ipython
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import requests

In [2]: url = 'ftp://ftp.ourproject.org/pub/ytalk/ytalk-3.3.0.tar.gz'

In [3]: requests.get(url)
---------------------------------------------------------------------------
InvalidSchema                             Traceback (most recent call last)
<ipython-input-3-b80aa89477da> in <module>
----> 1 requests.get(url)

~/.virtualenvs/swh/lib/python3.7/site-packages/requests/api.py in get(url, params, **kwargs)
     73     """
     74
---> 75     return request('get', url, params=params, **kwargs)
     76
     77

~/.virtualenvs/swh/lib/python3.7/site-packages/requests/api.py in request(method, url, **kwargs)
     59     # cases, and look like a memory leak in others.
     60     with sessions.Session() as session:
---> 61         return session.request(method=method, url=url, **kwargs)
     62
     63

~/.virtualenvs/swh/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    540         }
    541         send_kwargs.update(settings)
--> 542         resp = self.send(prep, **send_kwargs)
    543
    544         return resp

~/.virtualenvs/swh/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
    647
    648         # Get the appropriate adapter to use
--> 649         adapter = self.get_adapter(url=request.url)
    650
    651         # Start time (approximately) of the request

~/.virtualenvs/swh/lib/python3.7/site-packages/requests/sessions.py in get_adapter(self, url)
    740
    741         # Nothing matches :-/
--> 742         raise InvalidSchema("No connection adapters were found for {!r}".format(url))
    743
    744     def close(self):

InvalidSchema: No connection adapters were found for 'ftp://ftp.ourproject.org/pub/ytalk/ytalk-3.3.0.tar.gz'
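One possible direction, sketched here only as an idea and not as what the loader does today: dispatch on the URL scheme and fetch ftp:// URLs with the standard library's urllib.request, which handles ftp without any third-party adapter. The names needs_ftp_fallback and download_ftp are hypothetical, and so is the default timeout.

```python
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def needs_ftp_fallback(url: str) -> bool:
    """Return True for URLs that requests cannot fetch out of the box (ftp://)."""
    return urlparse(url).scheme.lower() == "ftp"


def download_ftp(url: str, dest_path: str, timeout: float = 60.0) -> str:
    """Stream an ftp:// URL to dest_path using only the standard library.

    Hypothetical helper: the name, signature, and timeout are assumptions
    made for this sketch, not part of swh.loader.
    """
    if not needs_ftp_fallback(url):
        raise ValueError(f"not an ftp URL: {url!r}")
    # urlopen understands ftp:// natively; copy the response in chunks
    # instead of reading the whole tarball into memory.
    with urlopen(url, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

The scheme check could be used to route the URL from the report above to download_ftp while http(s) URLs keep going through requests. Whether that is preferable to a requests transport adapter is an open question.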