save-code-now: Ingestion of a specific svn origin does not finish
The repository [1] is not loaded completely through the save code now pipeline.
A local (svn) checkout of [1] is already at 35G (and unfinished). The save-code-now loaders have a memory limit of 25Gi.
Up until recently, the memory handling logic was a tad inconsistent. It has been fixed since the dates mentioned in the email [2].
Another request was sent to save code now, but it won't go through given what I've said about memory.
Another point is that since the email [2], there has been a git clone [3] which has been processed successfully. That should help ingesting the svn origin since the tree representations of git and svn share some structures (inside the archive).
[3] https://gitlab.com/sosy-lab/software/cpachecker
[1] https://svn.sosy-lab.org/software/cpachecker
[2] original request
On 2024-10-18 I submitted a "save request" for https://svn.sosy-lab.org/software/cpachecker.
As I can see on https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://svn.sosy-lab.org/software/cpachecker
it was picked up and the code was imported until some point on 2016-03-14. But the repository has many more revisions since then.
I tried again with a new save request on 2024-10-31, but this did not change anything. Both save requests are still in status "running".
Is there anything that we can do to make the save request become unstuck and proceed? Or can we somehow import the repository in a different way?
For example, I could easily provide a dumpfile of the repo.
We want to remove the repository from where it is hosted right now, and it would certainly be valuable if the Software Heritage Archive would keep a complete copy of it.
Thank you in advance for your assistance, and also for providing this great service in the first place!
- Author Owner
Another data point: it requires a password at the end of the main checkout operation, to deal with the externals definitions.
While the current svn loader implementation is able to handle the externals, afaik it only works with anonymous connections.
...
A cpachecker/trunk/test/util/VariablesGenerator.java
A cpachecker/trunk/test/witness/violation-witness.yml
U cpachecker
Authentication realm: <https://svn.sosy-lab.org:443> SoSy-Lab Subversion Repositories
Password for 'tony': ^Csvn: warning: W205011: Error handling externals definition for 'cpachecker/tags/cpachecker-1.3.10-svcomp15/test/programs/benchmarks':
svn: warning: W170013: Unable to connect to a repository at URL 'https://svn.sosy-lab.org/software/sv-benchmarks/branches/selected/c'
Authentication realm: <https://svn.sosy-lab.org:443> SoSy-Lab Subversion Repositories
...
- Maintainer
Fortunately the subversion client used by the loader already uses a set of default credentials, so the fetching of externals requiring credentials will likely fail but the loading will not block.
- Maintainer
Related to blocking svn export, this reminded me that I have a pending merge request to handle SSH export of externals that might block: swh/devel/swh-loader-svn!257 (merged)
- Antoine R. Dumont changed title from Specific svn origin seems to not go through the save-code-now pipelne to Specific svn origin seems to not go through the save-code-now pipeline
- Owner (resolved by Guillaume Samson)
There are a lot of running scn requests in the swh-web db:
swh-web=> select count(id) from save_origin_request where loading_task_status = 'running';
 count
-------
   610
(1 row)
- Antoine R. Dumont changed title from Specific svn origin seems to not go through the save-code-now pipeline to save-code-now: Ingestion of a specific svn origin does not finish
- Owner
I'm wondering if we should not deploy a larger scn loader instance with 1 replica pinned to a node with a lot of memory (perhaps saam or metal05, or perhaps it's time to add mam to the cluster).
It will eventually catch the big repositories that crashloop on the standard loaders.
WDYT?
- Author Owner
> I'm wondering if we should not deploy a larger scn loader instance with 1 replica pinned to a node with a lot of memory (perhaps saam or metal05, or perhaps it's time to add mam to the cluster).
We thought about it with @guillaume indeed. It would match what we did at some point for the large-git workload we had.
> It will eventually catch the big repositories that crashloop on the standard loaders.
Heh, indeed, I did not think about that. Note that it might take a long time to succeed though.
Edited by Antoine R. Dumont
- Maintainer
> Another point is that since the email [2], there has been a git clone [3] which has been processed successfully. That should help ingesting the svn origin since the tree representation between git and svn share some structures (inside the archive).
Not really, as the subversion loader in its current state reconstructs the file system of a repository at each revision, updates a swh.model.from_disk.Directory instance in memory (with file contents), then yields the new objects to archive for each first-seen revision. So having a git mirror directory ingested into the archive will not resolve the memory consumption issue.
Nevertheless, we could avoid yielding contents and directories objects if the root directory of a revision is already archived (when a git mirror has been ingested, for instance); this should speed up the overall loading process.
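To make the idea concrete, here is a minimal sketch of that shortcut. It is not the loader's code: `RevisionObjects`, `objects_to_send` and the `is_archived` callback are hypothetical names, and it assumes that an archived root directory implies its whole subtree is archived too. It only decides what gets sent, so it would not reduce memory use during replay.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class RevisionObjects(NamedTuple):
    """Hypothetical container for what replaying one svn revision produces."""

    rev: int
    root_directory_id: str
    contents: list[str]
    directories: list[str]


def objects_to_send(
    revisions: Iterable[RevisionObjects],
    is_archived: Callable[[str], bool],
) -> Iterator[tuple[int, list[str], list[str]]]:
    """Yield (rev, contents, directories) to send to the archive.

    Assumption: if the root directory is already archived, everything
    below it is archived as well, so nothing needs to be sent for it.
    """
    for r in revisions:
        if is_archived(r.root_directory_id):
            # Root already known (e.g. from a git mirror): skip the objects.
            yield r.rev, [], []
        else:
            yield r.rev, r.contents, r.directories


# Toy data: "dir-a" stands for a root directory already in the archive.
archived = {"dir-a"}
revs = [
    RevisionObjects(1, "dir-a", ["c1"], ["dir-a"]),
    RevisionObjects(2, "dir-b", ["c2"], ["dir-b"]),
]
result = list(objects_to_send(revs, archived.__contains__))
```

In this toy run, revision 1 sends nothing while revision 2 still sends its content and directory. The revision object itself would still have to be archived in both cases.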
- Author Owner
Yes, ok, indeed. What I said will only help on the storage side, which is not the current issue...
- Author Owner
Ingestion happened; it picked up from where it left off the last time ('Processing revisions [20237-48258]').
It failed on a disk space issue without crashing (the pod continued with the other repositories to ingest).
2024-11-26T09:16:00.043644918Z loaders {"asctime": "2024-11-26 09:16:00,043", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/celery/worker/strategy.py", "lineno": 161, "funcName": "task_message_handler", "task_name": null, "task_id": null, "name": "celery.worker.strategy", "levelname": "INFO", "message": "Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[5b32b7b8-7aeb-41fe-a8c6-a3cf8c4e07af] received", "data": {"id": "5b32b7b8-7aeb-41fe-a8c6-a3cf8c4e07af", "name": "swh.loader.svn.tasks.DumpMountAndLoadSvnRepository", "args": "()", "kwargs": "{'url': 'https://svn.sosy-lab.org/software/cpachecker'}", "eta": null}}
2024-11-26T09:16:00.354756429Z loaders {"asctime": "2024-11-26 09:16:00,354", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/swh/loader/core/loader.py", "lineno": 413, "funcName": "load", "task_name": null, "task_id": null, "name": "swh.loader.svn.loader.SvnLoaderFromRemoteDump", "levelname": "INFO", "message": "Load origin 'https://svn.sosy-lab.org/software/cpachecker' with type 'svn'"}
2024-11-26T10:10:50.473801375Z loaders {"asctime": "2024-11-26 10:10:50,473", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py", "lineno": 286, "funcName": "start_from", "task_name": null, "task_id": null, "name": "swh.loader.svn.loader.SvnLoaderFromRemoteDump", "levelname": "INFO", "message": "Processing revisions [20237-48258] for {'swh-origin': 'https://svn.sosy-lab.org/software/cpachecker', 'remote_url': 'file:///tmp/swh.loader.svn.ddw0a6cn-22/swh.loader.svn.389qskxi-22/tmpt5c1vqjc', 'local_url': b'/tmp/swh.loader.svn.d4v93s0n-22/tmpt5c1vqjc', 'uuid': b'4712c6d2-40bb-43ae-aa4b-fec3f1bdfe4c'}"}
2024-11-26T13:16:17.126231578Z loaders {"asctime": "2024-11-26 13:16:17,124", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py", "lineno": 513, "funcName": "fetch_data", "task_name": null, "task_id": null, "name": "swh.loader.svn.loader.SvnLoaderFromRemoteDump", "levelname": "ERROR", "message": "[Errno 28] Can't write to file '/tmp/swh.loader.svn.d4v93s0n-22/tmpt5c1vqjc/branches/ki-before-imc/doc/tutorials/fault-localization/svn-Vyl8JT': No space left on device", "exc_info": "Traceback (most recent call last):\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py\", line 507, in fetch_data\n data = next(self.swh_revision_gen)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py\", line 418, in process_svn_revisions\n for rev, commit, new_objects, root_directory in gen_revs:\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/svn_repo.py\", line 568, in swh_hash_data_per_revision\n objects = self.swhreplay.compute_objects(rev, low_water_mark)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/replay.py\", line 850, in compute_objects\n self.replay(rev, low_water_mark)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/replay.py\", line 832, in replay\n self.conn.replay(rev, low_water_mark, self.editor)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/replay.py\", line 243, in add_directory\n self.svnrepo.export(\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 336, in wrapped_f\n return copy(f, *args, **kw)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 475, in __call__\n do = self.iter(retry_state=retry_state)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 376, in iter\n result = action(retry_state)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 398, in <lambda>\n self._add_action_func(lambda rs: rs.outcome.result())\n File \"/usr/local/lib/python3.10/concurrent/futures/_base.py\", line 451, in result\n return self.__get_result()\n File \"/usr/local/lib/python3.10/concurrent/futures/_base.py\", line 403, in __get_result\n raise self._exception\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 478, in __call__\n result = fn(*args, **kwargs)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/svn_repo.py\", line 315, in export\n return self.client.export(\nOSError: [Errno 28] Can't write to file '/tmp/swh.loader.svn.d4v93s0n-22/tmpt5c1vqjc/branches/ki-before-imc/doc/tutorials/fault-localization/svn-Vyl8JT': No space left on device"}
2024-11-26T13:16:22.434841175Z loaders {"asctime": "2024-11-26 13:16:22,434", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/swh/loader/core/loader.py", "lineno": 512, "funcName": "load", "task_name": null, "task_id": null, "name": "swh.loader.svn.loader.SvnLoaderFromRemoteDump", "levelname": "ERROR", "message": "Loading failure, updating to `partial` status", "exc_info": "Traceback (most recent call last):\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/core/loader.py\", line 502, in load\n self.post_load()\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py\", line 602, in post_load\n self._check_revision_divergence(\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py\", line 314, in _check_revision_divergence\n checked_dir = self.swh_revision_hash_tree_at_svn_revision(rev)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/loader.py\", line 157, in swh_revision_hash_tree_at_svn_revision\n local_dirname, local_url = self.svnrepo.export_temporary(revision)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/svn_repo.py\", line 491, in export_temporary\n self.export(\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 336, in wrapped_f\n return copy(f, *args, **kw)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 475, in __call__\n do = self.iter(retry_state=retry_state)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 376, in iter\n result = action(retry_state)\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 398, in <lambda>\n self._add_action_func(lambda rs: rs.outcome.result())\n File \"/usr/local/lib/python3.10/concurrent/futures/_base.py\", line 451, in result\n return self.__get_result()\n File \"/usr/local/lib/python3.10/concurrent/futures/_base.py\", line 403, in __get_result\n raise self._exception\n File \"/opt/swh/.local/lib/python3.10/site-packages/tenacity/__init__.py\", line 478, in __call__\n result = fn(*args, **kwargs)\n File \"/opt/swh/.local/lib/python3.10/site-packages/swh/loader/svn/svn_repo.py\", line 315, in export\n return self.client.export(\nOSError: [Errno 28] Can't close file '/tmp/swh.loader.svn.d4v93s0n-22/check-revision-42096.u1x3lljd/tmpt5c1vqjc/branches/1007-update-JavaSMT-from-v3.10.1-to-v3.13.0/.idea/svn-w5FRzS': No space left on device", "swh_task_args": [], "swh_task_kwargs": {"origin": "https://svn.sosy-lab.org/software/cpachecker", "lister_name": null, "lister_instance_name": null}}
2024-11-26T13:16:47.866521268Z loaders {"asctime": "2024-11-26 13:16:47,866", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", "lineno": 128, "funcName": "info", "task_name": null, "task_id": null, "name": "celery.app.trace", "levelname": "INFO", "message": "Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[5b32b7b8-7aeb-41fe-a8c6-a3cf8c4e07af] succeeded in 14447.818100643344s: {'status': 'failed'}", "data": {"id": "5b32b7b8-7aeb-41fe-a8c6-a3cf8c4e07af", "name": "swh.loader.svn.tasks.DumpMountAndLoadSvnRepository", "return_value": "{'status': 'failed'}", "runtime": 14447.818100643344, "args": "()", "kwargs": "{'url': 'https://svn.sosy-lab.org/software/cpachecker'}"}}
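For anyone digging through such pod logs, a small sketch for pulling out only the error records may help. It assumes each record sits on a single line in the form `<timestamp> <container> <json>`, as in the entries above. The helper names `parse_loader_log` and `errors` are made up for illustration, and the sample records are trimmed to two fields.

```python
import json
from typing import Iterable, Iterator


def parse_loader_log(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Split '<timestamp> <container> <json>' lines into (timestamp, record)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Only split on the first two spaces; the JSON payload contains spaces.
        timestamp, _container, payload = line.split(" ", 2)
        yield timestamp, json.loads(payload)


def errors(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (timestamp, message) for every record logged at ERROR level."""
    return [
        (ts, record["message"])
        for ts, record in parse_loader_log(lines)
        if record.get("levelname") == "ERROR"
    ]


# Trimmed sample records (only two fields kept from the real entries).
sample = [
    '2024-11-26T09:16:00.354756429Z loaders {"levelname": "INFO", '
    '"message": "Load origin \'https://svn.sosy-lab.org/software/cpachecker\' with type \'svn\'"}',
    '2024-11-26T13:16:22.434841175Z loaders {"levelname": "ERROR", '
    '"message": "Loading failure, updating to `partial` status"}',
]

for ts, message in errors(sample):
    print(ts, message)
```

Filtering on `levelname` rather than on the Celery task result matters here, since the task itself is reported as "succeeded" even though its return value is `{'status': 'failed'}`.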
Edited by Antoine R. Dumont
- Author Owner
This got triggered again and is now processing another range of svn commits:
"swh.loader.svn.loader.SvnLoaderFromRemoteDump", "levelname": "INFO", "message": "Processing revisions [42097-48258] for {'swh-origin': 'https://svn.sosy-lab.org/software/cpachecker', 'remote_url': 'file:///tmp/swh.loader.svn.5ga50br5-22/swh.loader.svn.touyflec-22/tmpj4w5ddml', 'local_url': b'/tmp/swh.loader.svn.qtc_j4yk-22/tmpj4w5ddml', 'uuid': b'4712c6d2-40bb-43ae-aa4b-fec3f1bdfe4c'}"}
- Vincent Sellier marked this issue as related to #4390
- Vincent Sellier mentioned in commit swh/infra/ci-cd/swh-charts@4913e117
- Vincent Sellier mentioned in merge request swh/infra/ci-cd/swh-charts!517 (merged)
- Vincent Sellier mentioned in commit swh/infra/ci-cd/swh-charts@f913fff7
- Vincent Sellier mentioned in commit swh/infra/ci-cd/swh-charts@ddf19ac4
- Owner
large svn repository loading stack deployed.
The loading of the repository was triggered:
$ echo "https://svn.sosy-lab.org/software/cpachecker" | swh scheduler origin send-origins-from-file-to-celery --queue-name-prefix large_repository load-svn -
{'name': 'swh.loader.svn.tasks.DumpMountAndLoadSvnRepository', 'task_id': 'a07cd777-635f-4c4c-9e3b-fff3a492dadf', 'args': (), 'kwargs': {'url': 'https://svn.sosy-lab.org/software/cpachecker'}, 'queue': 'large_repository:swh.loader.svn.tasks.DumpMountAndLoadSvnRepository'}
- Owner
The visit completed successfully:
2024-12-04T15:21:18.655600695Z loaders {"asctime": "2024-12-04 15:21:18,654", "threadName": "MainThread", "pathname": "/opt/swh/.local/lib/python3.10/site-packages/celery/app/trace.py", "lineno": 128, "funcName": "info", "task_name": null, "task_id": null, "name": "celery.app.trace", "levelname": "INFO", "message": "Task swh.loader.svn.tasks.DumpMountAndLoadSvnRepository[a07cd777-635f-4c4c-9e3b-fff3a492dadf] succeeded in 21255.482125155628s: {'status': 'eventful'}", "data": {"id": "a07cd777-635f-4c4c-9e3b-fff3a492dadf", "name": "swh.loader.svn.tasks.DumpMountAndLoadSvnRepository", "return_value": "{'status': 'eventful'}", "runtime": 21255.482125155628, "args": "()", "kwargs": "{'url': 'https://svn.sosy-lab.org/software/cpachecker'}"}}
- Vincent Sellier closed