Google code svn import - 'Too many open file' errors
On some repositories with supposedly lots of files, an os error about too many open file error is raised.
Steps to reproduce on a local storage with latest swh-loader-svn.
Use /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/c/c-semantics/c-semantics-repo.svndump.gz:
dump = 'c-semantics-repo.svndump.gz'
origin_url = 'http://foo/bar/svn'
import logging
logging.basicConfig(level=logging.DEBUG)
from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk
t = MountAndLoadSvnRepositoryTsk()
t.run(archive_path=dump, origin_url=origin_url, visit_date='2016-05-03T15:16:32+00:00')
output:
INFO:swh.loader.svn.SvnLoader:Archive to mount and load c-semantics-repo.svndump.gz
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://foo/bar/svn
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://foo/bar/svn
################## ... Strange but not lethal... (could possibly explain an integrity error i've seen some time ago on origin creation...)
################## oh, yeah, loader-svn needs the origin created prior to the actual base class (loader-core) initializing it
################## possibly, this can be improved but this is not the task at end
################## fixed!
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating origin_visit for origin 7 at time 2016-05-03T15:16:32+00:00
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done Creating origin_visit for origin 7 at time 2016-05-03T15:16:32+00:00
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:[revision_start-revision_end]: [1-1320]
INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Processing {'local_url': b'/tmp/tmp.v_ezqr26.swh.loader/tmp.a6ouqrqy.swh.loader.svn', 'remote_url': 'file:///tmp/tmp.a6ouqrqy.swh.loader.svn', 'swh-origin': 7, 'uuid': b'9230c010-68a0-4913-9f46-c5e1b1c9998b'}.
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 1, swhrev: 21cd3d56a4f4fe553d6b0c494579dc2b9590da89, dir: 24e1d3e7645aff42951637d824c0857e91058120
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 2, swhrev: 482ce1333a76e1f8325ec7cdf9d9d590f3dfbed6, dir: 630bb014fb6ce836ba339a7cf66a7b9c346bd71b
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 3, swhrev: 0a910abbe53b7d27c961c2c56f65aa6ea4571fea, dir: b310a8dcdfc5e4a6875b0c17a36af372e1acb31c
...
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 515, swhrev: 3de5df05ced29ce2765ea8007e81e2ec6e469f58, dir: 8e5bbfc7b14f28f8ba2f5acaae081e3f368ce17e
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Eventful partial visit. Detail: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:occ: {'visit': 1, 'target_type': 'revision', 'origin': 7, 'target': b'\x84\xe8\\c^\xba0!\xa5RrzR\xc2\x90H\x9d\xb2\xaaw', 'branch': 'master'}
ERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` status
Traceback (most recent call last):
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 216, in process_swh_revisions
self.config['revision_packet_size']):
File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper
for _data in itertools.zip_longest(*args, fillvalue=None):
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 160, in process_svn_revisions
for rev, nextrev, commit, new_objects, root_directory in gen_revs:
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/svn.py", line 266, in swh_hash_data_per_revision
objects = self.swhreplay.compute_hashes(rev)
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 333, in compute_hashes
self.replay(rev)
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/ra.py", line 318, in replay
self.conn.replay(rev, rev+1, self.editor)
OSError: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 681, in load
self.store_data()
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 308, in store_data
self.last_known_swh_revision)
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 496, in process_repository
svnrepo, revision_start, revision_end, revision_parents)
File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 235, in process_swh_revisions
'id': _id,
##################### raises an eyebrow... where is the rest of the stack trace?
swh.loader.svn.loader.SvnLoaderEventful: [Errno 24] Can't open file '/tmp/tmp.a6ouqrqy.swh.loader.svn/db/revs/0/420': Too many open files
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 7 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 7 with status partial
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 1 occurrences
DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/tmp.a6ouqrqy.swh.loader.svn for project
The loading fails at revision 515.
Migrated from T811 (view on Phabricator)