When swh.loader.svn loads a repository which contains svn:externals properties, the loading interrupts itself, and a partial visit is registered in the archive.
However, the revisions that have been converted up to the one containing externals are not loaded, only the underlying contents and directories. The registered partial visit is empty.
Steps to reproduce on a local storage:
Use /srv/storage/space/mirrors/code.google.com/sources/v2/code.google.com/c/cpcsdk/cpcsdk-repo.svndump.gz
The loading b0rks at revision 457 because of svn:externals; when done, note that the revision 975f88a53cec719fe0f3041f618bfae7528ac2be is not loaded to the archive, but the directory 2dedb203719e482743d4bd6822ae65cde41cf607 is.
Ok, just so you know, i did not reproduce this behavior immediately (even though: swh-env updated, db rebuilt, configuration file 'almost' identical to prod).
I had another issue prior to the one described (about DentryPerms.directory).
The 'almost' identical configuration diverged only on the storage (which was local).
Here is the stacktrace:
$ python3Python 3.5.3 (default, Jan 19 2017, 14:11:04)[GCC 6.3.0 20170118] on linuxType "help", "copyright", "credits" or "license" for more information.>>> import logging>>> logging.basicConfig(level=logging.DEBUG)>>>>>> from swh.loader.svn.tasks import MountAndLoadSvnRepositoryTsk>>>>>> t = MountAndLoadSvnRepositoryTsk()>>> t.run(archive_path='cpcsdk-repo.svndump.gz', origin_url='http://cpcsdk.googlecode.com/svn/', visit_date='2016-05-03T15:16:32+00:00')INFO:swh.loader.svn.SvnLoader:Archive to mount and load cpcsdk-repo.svndump.gzDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://cpcsdk.googlecode.com/svn/DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://cpcsdk.googlecode.com/svn/DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating svn origin for http://cpcsdk.googlecode.com/svn/DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done creating svn origin for http://cpcsdk.googlecode.com/svn/DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Creating origin_visit for origin 1 at time 2016-05-03T15:16:32+00:00DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done Creating origin_visit for origin 1 at time 2016-05-03T15:16:32+00:00INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:[revision_start-revision_end]: [1-623]INFO:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Processing {'swh-origin': 1, 'uuid': b'dd9cee65-5dde-4192-945e-cf0bec8a92d6', 'remote_url': 'file:///tmp/tmp.wmko7y37.swh.loader.svn', 'local_url': b'/tmp/tmp.yp2jitma.swh.loader/tmp.wmko7y37.swh.loader.svn'}.DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 1, swhrev: 8c00e01da274ded154d25b5b77d7e4dc077195d3, dir: 75ed58f260bfa4102d0e09657803511f5f0ab372DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 2, swhrev: e5b5466c376da890e2642dc42d6c6dd8807d2943, dir: d5a2863eb43c43292db4f804c320bc16a1f17375DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 3, swhrev: 936dc143f984c810b6234bff4717c6795a2d3c74, dir: 49a3946d60a6bd324f301ad8632c843fe4e09951DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 4, swhrev: 687abf57995e18c3df48132d8e82e5a8efd5df8a, dir: 1893c4ef234bc876840793f04dad13a6edead5c7DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 5, swhrev: 7cb9e8974a090f477de7528e4bbd818ff7f659b2, dir: e35aed32f4ad5d6ed8e28876373c198a531edc32DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 6, swhrev: e452dcd84af5530a484304373f2d0b5deb65cd69, dir: 4ee26a044b21470b477a18ac6a84fbb3eda31ea4DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 7, swhrev: 5345d0018ba076e8fe4ef2f3ed5b5fb5ad7f7ce3, dir: d143acafc843b621c7ccb7e6e6e41d59cbacb66aDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 1195 contentsDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 1195 contentsDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 8, swhrev: 584d9ff73981cfcab44f3f2e15c91fdf7984e92b, dir: 2862ce4c4ab79660c34a46bf3aa29b7bf639e96aDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 9, swhrev: c4d19fb3e11a454239f468996ef2a4e5bbe65b26, dir: a2555ad5cba6f8514140746a2f2048c5192134a8DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 10, swhrev: 4435a9c2715e45b5334b4a51f5a3c7c649818c28, dir: 70d694bc06cf3165ab5aeefe9f4e2e3e84165293DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 11, swhrev: 22a2cfe9259295f586e13202c6874bc486f25b70, dir: af4bf5e750480398c9031e94c0dd2429f402b36eDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 12, swhrev: 751ad5674d4cfa137b11e403facba6124974a19f, dir: 262185f44411173cb13987f81da51237fb20687fDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 13, swhrev: 22e63e5088103c83f79000bbb20954938aaa3ea0, dir: 0723781250956b12961c04a10433e6ab2baabc9fDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 14, swhrev: d54b1ceed255fe7075c8f6c60a2153dcaaff7365, dir: 2fae94a18650760281732e38437e7d1f2dc54244DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 15, swhrev: 5cb584f26d008661af665d0c1db7dabed6a085dc, dir: 4538ecb42a2315407da7a97bdc9f8308586e64b4DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 16, swhrev: 4c8e6de61330ef0b99d5007ceaf746d7db173a28, dir: ea09ae8d1c83eccd4ab54b4a68af5ac9b49fed64DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 17, swhrev: 4c0012e0cb85146d00ef49b6203248590ef1c4b1, dir: 42c3c256eb962bdb61c82820588c409d3f758701DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 18, swhrev: 9012cbdf7c6ed045e8587a8687e856f2380a9a1d, dir: 17586a9e6e6fc9f82723c43d4368135fa9d6c891DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 19, swhrev: d2f0cb2d5ec820c924625e9a754600c347887fc3, dir: 9c149031fa310991cf1a774e869e153ca8cd0f89DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 20, swhrev: 694e29b5c4b8acc01490fd8d906bb66eb9e3aca8, dir: ffd7e9c62d44cc69a36da387d9f3bd6bc9a616ffDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 21, swhrev: d69c680394270bb9e742ccc362c60516a08440ea, dir: 8890ca5f9cd9621b19ff0ffc25345f5385e839acDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:rev: 22, swhrev: f9a54f0605677a247c59a1806dcfb82f82a01721, dir: ec4ee5e23f27483b7c9bae3683158cc5470ada37DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 223 contentsDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done sending 223 contentsDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Sending 262 directoriesERROR:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Loading failure, updating to `partial` statusTraceback (most recent call last): File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 627, in load self.store_data() File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 303, in store_data self.last_known_swh_revision) File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 492, in process_repository svnrepo, revision_start, revision_end, revision_parents) File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 232, in process_swh_revisions raise e File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 216, in process_swh_revisions self.config['revision_packet_size']): File "/home/tony/work/inria/repo/swh/swh-environment/swh-core/swh/core/utils.py", line 40, in grouper for _data in itertools.zip_longest(*args, fillvalue=None): File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-svn/swh/loader/svn/loader.py", line 165, in process_svn_revisions new_objects.get('directory', {}).values()) File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 494, in maybe_load_directories self.bulk_send_directories(directories) File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 417, in bulk_send_directories self.send_directories(self.directories.pop()) File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f return Retrying(*dargs, **dkw).call(f, *args, **kw) File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call return attempt.get(self._wrap_exception) File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get six.reraise(self.value[0], self.value[1], self.value[2]) File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise raise value File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call attempt = Attempt(fn(*args, **kwargs), attempt_number, False) File "/home/tony/work/inria/repo/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 276, in send_directories self.storage.directory_add(directory_list) File "/home/tony/work/inria/repo/swh/swh-environment/swh-storage/swh/storage/storage.py", line 427, in directory_add cur, File "/home/tony/work/inria/repo/swh/swh-environment/swh-storage/swh/storage/db.py", line 183, in copy_to tblname, ', '.join(columns)), f)psycopg2.DataError: invalid input syntax for integer: "DentryPerms.directory"CONTEXT: COPY tmp_directory_entry_dir, line 1, column perms: "DentryPerms.directory"DEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Updating origin_visit for origin 1 with status partialDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Done updating origin_visit for origin 1 with status partialDEBUG:swh.scheduler.task.MountAndLoadSvnRepositoryTsk:Clean up temp directory /tmp/tmp.wmko7y37.swh.loader.svn for projectDEBUG:amqp:Start from server, version: 0.9, properties: {'cluster_name': 'rabbit@corellia.lan', 'product': 'RabbitMQ','version': '3.6.6', 'platform': 'Erlang/OTP', 'capabilities': {'consumer_priorities': True, 'consumer_cancel_notify': True, 'publisher_confirms': True, 'exchange_exchange_bindings': True, 'direct_reply_to': True, 'connection.blocked': True, 'authentication_failure_close': True, 'basic.nack': True, 'per_consumer_qos': True}, 'information': 'Licensed underthe MPL. See http://www.rabbitmq.com/', 'copyright': 'Copyright (C) 2007-2016 Pivotal Software, Inc.'}, mechanisms: ['AMQPLAIN', 'PLAIN'], locales: ['en_US']DEBUG:amqp:Open OK!DEBUG:amqp:using channel_id: 1DEBUG:amqp:Channel open
Switching the configuration to use a remote storage, and now I reproduce.
Well (remembering now), the fact that only 400 revisions is stored is normal (as in implemented that way).
To avoid having too much in memory, swh-loader-svn walks the history log and groups revision by block size (configured by yaml with key revision_packet_size, which for this test, is 100-revision blocks).
When an exception is raised, it's supposed to take the last revision seen (by blocks so here 400 as in 4th iteration of 100 revisions) to make it the next starting point (for the next visit).
Somehow the first time though, an exception is raised within that edge case scenario.
What's not normal here is that the first time we pass on that origin, another exception is raised (thus breaking what i've just described...).
Looking further into it.
Note:
The fact that the contents/directories are already sent is not problematic.
We will make links (as in revision targeting those directories) later when visiting again that origin (idempotency making it real).