Handle missing objects in `revision_log()` and `revision_shortlog()`
Handle missing objects in revision_log() and revision_shortlog()
A revision in a log can be missing from the storage. While holes are unusual, they can happen. Dedicated tests were added for the issue.
TODO:
-
A decision needs to be made on what should be returned by revision_short_log()
in case of a missing revision.
Merge request reports
Activity
assigned to @lunar
Jenkins job DSTO/gitlab-builds #665 succeeded .
See Console Output and Coverage Report for more details.Could you add a test? (insert revisions 2, 3, 4 and not 1 from
storage_data
; then query revision_log from 3 or 4)Edited by vlorentzThanks for the prompt. The situation seems pretty uneven:
swh/storage/tests/blocking/test_proxy_blocking.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 9%] swh/storage/tests/blocking/test_proxy_no_blocking.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 18%] swh/storage/tests/masking/test_proxy_masking.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 27%] swh/storage/tests/masking/test_proxy_passthrough.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 36%] swh/storage/tests/test_api_client.py::TestStorageApi::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 45%] swh/storage/tests/test_cassandra.py::TestCassandraStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 54%] swh/storage/tests/test_in_memory.py::TestInMemoryStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [ 63%] swh/storage/tests/test_postgresql.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py PASSED [ 72%] swh/storage/tests/test_postgresql_flavor_mirror.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py PASSED [ 81%] swh/storage/tests/test_postgresql_flavor_readreplica.py::TestStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py PASSED [ 90%] swh/storage/tests/test_tenacious.py::TestTenaciousStorage::test_revision_log_missing_revision <- swh/storage/tests/storage_tests.py FAILED [100%]
added 1 commit
- a99153b9 - postgresql: handle missing objects in revision_log()
Jenkins job DSTO/gitlab-builds #666 failed .
See Console Output and Coverage Report for more details.changed milestone to %Tooling for takedown notices [Roadmap - Collect]
Jenkins job DSTO/gitlab-builds #667 failed .
See Console Output and Coverage Report for more details.Returning
None
s in the middle of a list of revisions seems a bit weird to me (I know it's consistent with the behavior of other endpoints, but ugh).These two
revision_log
endpoints are paginated by their very nature, so clients need to already be able to resume a query from incomplete results. With that being given, I would suggest just truncating the results at the "earliest" revision in the history for which we have information in the revision table.Assuming a linear history with revisions
A
(A.parents == ()
),B
(B.parents == (A.id,)
),C
(C.parents == (B.id,)
), if revisionA
is missing from storage, then:revision_log([C.id])
should return results equivalent to[C, B]
revision_shortlog([C.id])
should return results similar to[(C.id, (B.id,)), (B.id, (A.id,))]
Noticing this, a client would be able to try to resume iterating with
A.id
, and notice that the revision is missing somehow which could then return an error.Alternatively, we could deprecate these endpoints, and create others that would have a composite return value which would be able to report this error directly (making it explicit that there's a distinction between "limit reached" and "objects missing"). But that's lots of churn for what should be, effectively, a rare edge case.
But that's lots of churn for what should be, effectively, a rare edge case.
we could use this as an opportunity to make
revision_log()
's successor return model objects instead of dicts (see also !1117)not the worst idea!
Edited by Nicolas Dandrimont
mentioned in merge request swh-alter!41 (merged)