When the ingestion is done, retrieve errors immediately from softwareheritage-log between the start and end date of the ingestion and reference them (e.g paste referenced from that thread or directly as comment here).
The errors should be addressable as per git repository.
Nicolas Dandrimontchanged title from Reference errors after ingestion to Gitorious import: Reference errors after ingestion
changed title from Reference errors after ingestion to Gitorious import: Reference errors after ingestion
Antoine R. Dumontchanged title from Gitorious import: Reference errors after ingestion to Gitorious import: Examine ingestion logs for errors and list them if any
changed title from Gitorious import: Reference errors after ingestion to Gitorious import: Examine ingestion logs for errors and list them if any
These should be rescheduled and driven to successful completion.
And now it has been queued (same goes for swh/devel/swh-loader-git#675 (closed)).
Putting aside the errors we cannot do much about (not a git repository, etc...)
We have been doing "more investigation" for a very long time now. We really need closure closure here.
I don't know if investigation was the right word so far.
We rescheduled (as in queued) massively since there have been infrastructure bumps along the way (db problems -> solved since, queue emptied, etc....).
Also, there were some me bump as well (foot accident and all) which delayed analysis :/
{ "took": 65, "timed_out": false, "_shards": { "total": 30, "successful": 30, "skipped": 0, "failed": 0 }, "hits": { "total": 2, "max_score": null, "hits": [ { "_index": "logstash-2017.12.18", "_type": "journal", "_id": "AWBqQGrgl67zVNRhVsQ-", "_score": null, "_source": { "message": "[2017-12-18 15:31:33,690: ERROR/Worker-1] Loading failure, updating to `partial` statusTraceback (most recent call last): File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 862, in load self.store_data() File "/usr/lib/python3/dist-packages/swh/loader/git/base.py", line 164, in store_data self.send_batch_revisions(self.get_revisions()) File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 647, in send_batch_revisions send_in_packets(revisions, self.send_revisions, packet_size) File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 46, in send_in_packets sender(formatted_objects) File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f return Retrying(*dargs, **dkw).call(f, *args, **kw) File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call return attempt.get(self._wrap_exception) File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get six.reraise(self.value[0], self.value[1], self.value[2]) File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise raise value File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call attempt = Attempt(fn(*args, **kwargs), attempt_number, False) File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 416, in send_revisions self.storage.revision_add(revision_list) File "/usr/lib/python3/dist-packages/swh/storage/api/client.py", line 80, in revision_add return self.post('revision/add', {'revisions': revisions}) File "/usr/lib/python3/dist-packages/swh/core/api.py", line 62, in post return self._decode_response(response) File "/usr/lib/python3/dist-packages/swh/core/api.py", line 91, in _decode_response raise pickle.loads(decode_response(response))psycopg2.extensions.TransactionRollbackError: deadlock detectedDETAIL: Process 44058 waits for ShareLock on transaction 1430906690; blocked by process 40853.Process 40853 waits for ShareLock on transaction 1430906691; blocked by process 44058.HINT: See server log for query details.CONTEXT: while inserting index tuple (26995597,21) in relation "revision_pkey"SQL statement "insert into revision (id, date, date_offset, date_neg_utc_offset, committer_date, committer_date_offset, committer_date_neg_utc_offset, type, directory, message, author, committer, metadata, synthetic) select t.id, t.date, t.date_offset, t.date_neg_utc_offset, t.committer_date, t.committer_date_offset, t.committer_date_neg_utc_offset, t.type, t.directory, t.message, a.id, c.id, t.metadata, t.synthetic from tmp_revision t left join person a on a.fullname = t.author_fullname left join person c on c.fullname = t.committer_fullname"PL/pgSQL function swh_revision_add() line 5 at SQL statement" }, "sort": [ 1513611093695 ] }, { "_index": "logstash-2017.12.18", "_type": "journal", "_id": "AWBqRZr-l67zVNRhVxzM", "_score": null, "_source": { "message": "[2017-12-18 15:37:13,969: ERROR/Worker-3] Loading failure, updating to `partial` statusTraceback (most recent call last): File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 862, in load self.store_data() File "/usr/lib/python3/dist-packages/swh/loader/git/base.py", line 168, in store_data self.send_batch_occurrences(self.get_occurrences()) File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 659, in send_batch_occurrences send_in_packets(occurrences, self.send_occurrences, packet_size) File "/usr/lib/python3/dist-packages/swh/loader/core/loader.py", line 34, in send_in_packets for obj in objects: File "/usr/lib/python3/dist-packages/swh/loader/git/loader.py", line 214, in get_occurrences for refs, target in self.repo.refs.as_dict().items() File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 164, in as_dict keys = self.keys(base) File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 143, in keys return self.allkeys() File "/usr/lib/python3/dist-packages/dulwich/refs.py", line 470, in allkeys sys.getfilesystemencoding())UnicodeEncodeError: 'utf-8' codec can't encode character '\udccd' in position 11: surrogates not allowed" }, "sort": [ 1513611433974 ] } ] }}
The origins in question:
softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.origin) and ov.status='partial'; id | url | type | status----------+-------------------------------------------------------------+------+--------- 58141917 | https://gitorious.org/test-project2009/test-project2009.git | git | partial 58227648 | https://gitorious.org/qt/tolszaks-qtsensors.git | git | partial(2 rows)
The sql error was sheer bad luck, tested locally and no problem, so it was rescheduled, loaded successfully.
Now, only 1 origin to go, this is a bug in the underlying brick (dulwich) and #911 (closed) is opened for it.
softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.origin) and ov.status='partial'; id | url | type | status----------+-------------------------------------------------------------+------+--------- 58141917 | https://gitorious.org/test-project2009/test-project2009.git | git | partial(1 row)
No more gitorious origin with status different than 'full'.
softwareheritage=> select o.id, o.url, o.type, ov.status from origin o inner join origin_visit ov on ov.origin=o.id where type='git' and url like 'https://gitorious.org%' and ov.visit = (select max(visit) from origin_visit where origin=ov.origin) and ov.status<>'full';id | url | type | status----+-----+------+--------(0 rows)