PostgreSQL storage replayer fails on invalid revision dates
Sentry Issue: SWH-STORAGE-2XAW
ValueError: year 30564 is out of range
(25 additional frame(s) were not displayed)
...
File "swh", line 8, in <module>
    sys.exit(main())
Related issues:
- swh/devel/swh-storage#4718
Activity
- Owner
We should check whether the revision was created in Cassandra. If not, we could enable the error offloading to Redis.
If it was, we should figure out how to deal with this entry.
- Author Maintainer
Sentry Issue: SWH-WEBAPP-64C
- Owner
After investigation, one of the faulty revisions is:
INFO:swh.storage.replay:Revision(message=b"Merge branch 'main' of github.com:NguyenHuuDangKhoa/my_profile\n", author=Person(fullname=b'NguyenHuuDangKhoa <nhdkhoa1994@gmail.com>', name=b'NguyenHuuDangKhoa', email=b'nhdkhoa1994@gmail.com'), committer=Person(fullname=b'NguyenHuuDangKhoa <nhdkhoa1994@gmail.com>', name=b'NguyenHuuDangKhoa', email=b'nhdkhoa1994@gmail.com'), date=TimestampWithTimezone(timestamp=Timestamp(seconds=962434443931, microseconds=0), offset_bytes=b'-10049133'), committer_date=TimestampWithTimezone(timestamp=Timestamp(seconds=962434443931, microseconds=0), offset_bytes=b'-10049133'), type=RevisionType.GIT, directory=hash_to_bytes('3911c5a3b0cab7f861632d8e774df0668049a931'), synthetic=False, metadata=None, parents=(hash_to_bytes('16e3d41fe2217e3a4508874996abcee38a2a86e0'), hash_to_bytes('ea222633282cd55f4b38c0d2f56ff76ce6fef256')), id=hash_to_bytes('092b6a553f0f5a30725f940ca4af1fd2965b003d'), extra_headers=(), raw_manifest=None)
The status is:
- The origin and its contents are present in Cassandra: https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/NguyenHuuDangKhoa/my_profile
- Displaying the history of the repository fails: https://sentry.softwareheritage.org/share/issue/7a28f1842bee4d53afc7bb1a3bb3ce20/
- The revision is not present in PostgreSQL, so displaying the history fails, but the origin is displayed correctly: https://webapp-postgresql.internal.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/NguyenHuuDangKhoa/my_profile
For the sysadmins, here is how the revision was extracted:
- Create a debug pod similar to the revision replayer:
kubectl --context archive-production-rke2 debug -n swh storage-replayer-revision-c969f986b-6lvz4 --copy-to=storage-revision-debug --container=storage-replayer -it -- bash
- Add logging to the replayer by patching the installed replay.py in place:
cat > ~/patch.txt <<EOF
index 3e1ded35..5066b768 100644
--- a/swh/storage/replay.py
+++ b/swh/storage/replay.py
@@ -190,7 +190,10 @@ class ModelObjectDeserializer:
 def process_replay_objects(
     all_objects: Dict[str, List[BaseModel]], *, storage: StorageInterface
 ) -> None:
+    logger.info(f"-------------------------")
     for object_type, objects in all_objects.items():
+        for object in objects:
+            logger.info(f"{object}")
         logger.debug("Inserting %s %s objects", len(objects), object_type)
         with statsd.timed(GRAPH_DURATION_METRIC, tags={"object_type": object_type}):
             _insert_objects(object_type, objects, storage)
EOF
patch -p1 -d /opt/swh/.local/lib/python3.10/site-packages < ~/patch.txt
- Reduce the batch size to 10 to get a small block of revisions (it's not easy to find the faulty revision in a block of 1000)
- Start the replayer and wait for it to crash:
/usr/local/bin/python /opt/swh/.local/bin/swh storage replay
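A possible complement to shrinking the batch, sketched under assumptions: if the logged revisions' ids and `seconds` values are dumped, each date can be converted on its own with the same `datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc)` call that fails in swh/storage/postgresql/converters.py (as in the later traceback in this thread), so the faulty revision is reported directly. The `(id, seconds)` pairs and the `find_faulty_dates` helper are hypothetical, not part of swh-storage.

```python
import datetime
from typing import Iterable, List, Tuple

UTC = datetime.timezone.utc


def find_faulty_dates(revisions: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (revision_id, error message) for every date that cannot be converted.

    `revisions` holds hypothetical (id, seconds) pairs, e.g. extracted from the
    replayer debug log. Each date goes through the same conversion as the
    PostgreSQL converter: datetime.datetime.fromtimestamp(seconds, UTC).
    """
    faulty: List[Tuple[str, str]] = []
    for rev_id, seconds in revisions:
        try:
            datetime.datetime.fromtimestamp(seconds, UTC)
        except (ValueError, OverflowError, OSError) as exc:
            # ValueError is what the replayer hit; the other two may be
            # raised instead on some platforms for out-of-range timestamps.
            faulty.append((rev_id, str(exc)))
    return faulty


batch = [
    ("example_ok", 1_700_000_000),  # hypothetical well-formed date (2023)
    ("092b6a553f0f", 962_434_443_931),  # first revision reported above (id shortened)
]
print(find_faulty_dates(batch))
```

Unlike the logging patch, this only checks the date conversion, so it would not catch a crash caused by another field.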
- Owner
Another faulty revision:
INFO:swh.storage.replay:Revision(message=b'add zpmenv.sh example\n', author=Person(fullname=b'mike fulton <mikefultonpersonal@gmail.com>', name=b'mike fulton', email=b'mikefultonpersonal@gmail.com'), committer=Person(fullname=b'mike fulton <mikefultonpersonal@gmail.com>', name=b'mike fulton', email=b'mikefultonpersonal@gmail.com'), date=TimestampWithTimezone(timestamp=Timestamp(seconds=902367323816, microseconds=0), offset_bytes=b'-11783100'), committer_date=TimestampWithTimezone(timestamp=Timestamp(seconds=902367323816, microseconds=0), offset_bytes=b'-11783100'), type=RevisionType.GIT, directory=hash_to_bytes('b60c10d29ad1fd23f24d260acf025a9e9f0dd057'), synthetic=False, metadata=None, parents=(hash_to_bytes('0e08cf9621162aeb13a9c69f4f3ece3973064b8a'),), id=hash_to_bytes('13cb9f4b1bbb5dd406258ae955d5a258e19bc983'), extra_headers=(), raw_manifest=None)
It comes from this repository: https://github.com/Trisk3lion/samples (a fork of https://github.com/MikeFultonDev/samples)
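For a rough idea of how far off these dates are, the `seconds` value can be turned into a calendar year using the average Gregorian year (365.2425 days, i.e. 31,556,952 s). This is a back-of-the-envelope sketch, not swh code: it gives about year 32468 for the first revision above and year 30564 for this one, the same year as in the initial Sentry error. Both are far beyond Python's `datetime.MAXYEAR` (9999).

```python
import datetime

SECONDS_PER_GREGORIAN_YEAR = 31_556_952  # 365.2425 days * 86400 seconds


def approx_year(seconds: int) -> int:
    """Estimate the calendar year of a POSIX timestamp (seconds since 1970-01-01 UTC)."""
    return 1970 + seconds // SECONDS_PER_GREGORIAN_YEAR


# `seconds` values copied from the two Revision log lines above (ids shortened).
for rev_id, seconds in [("092b6a55", 962_434_443_931), ("13cb9f4b", 902_367_323_816)]:
    year = approx_year(seconds)
    print(rev_id, year, "out of range" if year > datetime.MAXYEAR else "ok")
# prints:
# 092b6a55 32468 out of range
# 13cb9f4b 30564 out of range
```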
- Owner
How it was tracked down, by following object_references from the revision up to its origin:
softwareheritage=# select * from object_references_2024w41 where target_type='revision' and target='\x13cb9f4b1bbb5dd406258ae955d5a258e19bc983' limit 10;
 insertion_date | source_type |                   source                   | target_type |                   target
----------------+-------------+--------------------------------------------+-------------+--------------------------------------------
 2024-10-07     | revision    | \x0bed82b358c2693b8773d331da064b9694e0a156 | revision    | \x13cb9f4b1bbb5dd406258ae955d5a258e19bc983
 2024-10-08     | revision    | \x43fb3e01cb9c6cc7ab4057d387b81ccedb90ebe6 | revision    | \x13cb9f4b1bbb5dd406258ae955d5a258e19bc983
(2 rows)

softwareheritage=# select * from object_references_2024w41 where target_type='revision' and target='\x0bed82b358c2693b8773d331da064b9694e0a156' limit 10;
 insertion_date | source_type |                   source                   | target_type |                   target
----------------+-------------+--------------------------------------------+-------------+--------------------------------------------
 2024-10-07     | revision    | \xfc4e812dd809088ec10168eec4ec42ebf14e4811 | revision    | \x0bed82b358c2693b8773d331da064b9694e0a156
(1 row)

softwareheritage=# select * from object_references_2024w41 where target_type='revision' and target='\x43fb3e01cb9c6cc7ab4057d387b81ccedb90ebe6' limit 10;
 insertion_date | source_type |                   source                   | target_type |                   target
----------------+-------------+--------------------------------------------+-------------+--------------------------------------------
 2024-10-07     | snapshot    | \x6aa3b60b7835e0d96f1953446f0ce9cf0bd534bd | revision    | \x43fb3e01cb9c6cc7ab4057d387b81ccedb90ebe6
(1 row)

softwareheritage=# select * from object_references_2024w41 where target_type='snapshot' and target='\x6aa3b60b7835e0d96f1953446f0ce9cf0bd534bd' limit 10;
 insertion_date | source_type |                   source                   | target_type |                   target
----------------+-------------+--------------------------------------------+-------------+--------------------------------------------
 2024-10-07     | origin      | \xebb917de2ac5a3556be6e0788758314439f390b3 | snapshot    | \x6aa3b60b7835e0d96f1953446f0ce9cf0bd534bd
And in Cassandra:
guest@cqlsh:swh> select * from origin where sha1=0xebb917de2ac5a3556be6e0788758314439f390b3;

 sha1                                       | next_visit_id | url
--------------------------------------------+---------------+---------------------------------------
 0xebb917de2ac5a3556be6e0788758314439f390b3 |             2 | https://github.com/Trisk3lion/samples
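The lookup above walks `object_references` upward, from the revision to the objects that reference it, until an origin is reached. A minimal sketch of the same walk as a breadth-first search over the rows returned by those queries (hashes shortened to 8 hex digits; `find_origins` is an illustrative helper, not an swh API):

```python
from collections import deque
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (object type, shortened hash)

# (source_type, source, target_type, target) rows from object_references_2024w41 above.
ROWS = [
    ("revision", "0bed82b3", "revision", "13cb9f4b"),
    ("revision", "43fb3e01", "revision", "13cb9f4b"),
    ("revision", "fc4e812d", "revision", "0bed82b3"),
    ("snapshot", "6aa3b60b", "revision", "43fb3e01"),
    ("origin", "ebb917de", "snapshot", "6aa3b60b"),
]


def find_origins(rows: List[Tuple[str, str, str, str]], start: Node) -> List[List[Node]]:
    """Breadth-first search from `start` to every reachable origin, following references backwards."""
    referrers: Dict[Node, List[Node]] = {}
    for source_type, source, target_type, target in rows:
        referrers.setdefault((target_type, target), []).append((source_type, source))

    paths: List[List[Node]] = []
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node[0] == "origin":
            paths.append(path)
            continue
        for referrer in referrers.get(node, []):
            if referrer not in seen:
                seen.add(referrer)
                queue.append(path + [referrer])
    return paths


for path in find_origins(ROWS, ("revision", "13cb9f4b")):
    print(" <- ".join(f"{kind}:{sha}" for kind, sha in path))
# prints: revision:13cb9f4b <- revision:43fb3e01 <- snapshot:6aa3b60b <- origin:ebb917de
```

With only these rows, the fc4e812d branch is a dead end (it was not queried further), and the single path found ends at origin ebb917de, which the Cassandra origin table resolves to https://github.com/Trisk3lion/samples.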
- Owner
softwareheritage-scheduler=> select * from origin_visit_stats where url='https://github.com/Trisk3lion/samples';
-[ RECORD 1 ]-------------+-------------------------------------------
url                       | https://github.com/Trisk3lion/samples
visit_type                | git
last_snapshot             | \x6aa3b60b7835e0d96f1953446f0ce9cf0bd534bd
last_scheduled            | 2024-10-07 17:00:29.045787+00
next_visit_queue_position | 198170734676
next_position_offset      | 4
successive_visits         | 1
last_successful           | 2024-10-07 17:37:08.360528+00
last_visit                | 2024-10-07 17:37:08.360528+00
last_visit_status         | successful
- Vincent Sellier assigned to @vsellier
- Owner
It seems the PostgreSQL revision replayer is unstuck. If no manual action was performed, this is worth digging into: it's not normal that the message was ignored.
- Owner
The replayers have detected a new error:
INFO:swh.storage.replay:Revision(message=b'This is the 9977 commit.', author=Person(fullname=b'Not the secret holder. <no_secrets@example.com>', name=b'Not the secret holder.', email=b'no_secrets@example.com'), committer=Person(fullname=b'Not the secret holder. <no_secrets@example.com>', name=b'Not the secret holder.', email=b'no_secrets@example.com'), date=TimestampWithTimezone(timestamp=Timestamp(seconds=1200000000105, microseconds=0), offset_bytes=b'+0000'), committer_date=TimestampWithTimezone(timestamp=Timestamp(seconds=1200000000105, microseconds=0), offset_bytes=b'+0000'), type=RevisionType.GIT, directory=hash_to_bytes('2a5111c64577cdb18f9cf3ef8e27c18749bbc04d'), synthetic=False, metadata=None, parents=(hash_to_bytes('cd6beb1bcc97a689065e3813f44678d5b358caf5'),), id=hash_to_bytes('ec71e7b7c87b9e1907c68c69973a5b9251699152'), extra_headers=(), raw_manifest=None)
File "/opt/swh/.local/lib/python3.10/site-packages/swh/storage/postgresql/converters.py", line 148, in date_to_db
    timestamp = datetime.datetime.fromtimestamp(ts.seconds, datetime.timezone.utc)
ValueError: year 39996 is out of range
As the directory replayer is lagging far behind, it's not yet possible to identify the culprit repository.
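Since the converter calls `datetime.datetime.fromtimestamp(ts.seconds, datetime.timezone.utc)`, a revision date can only be written if its `seconds` fall within the range Python's `datetime` covers (years 1 to 9999). The call shown in the traceback passes the UTC timezone, so the offset plays no part in this particular failure. A minimal pre-check under that assumption; `seconds_fit_in_datetime` is a hypothetical helper, not an existing swh-storage function:

```python
import datetime

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)
ONE_SECOND = datetime.timedelta(seconds=1)

# Whole seconds since the epoch that datetime can represent (years 1 to 9999).
MIN_SECONDS = (datetime.datetime.min.replace(tzinfo=UTC) - EPOCH) // ONE_SECOND
MAX_SECONDS = (datetime.datetime.max.replace(tzinfo=UTC) - EPOCH) // ONE_SECOND


def seconds_fit_in_datetime(seconds: int) -> bool:
    """True if fromtimestamp(seconds, UTC) stays within datetime's year range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


# `seconds` values of the three revisions reported in this issue.
for seconds in (962_434_443_931, 902_367_323_816, 1_200_000_000_105):
    print(seconds, seconds_fit_in_datetime(seconds))  # False for all three
```

The bounds use timedelta floor division rather than `datetime.max.timestamp()`: that method returns a float which rounds up to 253402300800, the first second of year 10000, and would let one invalid value through.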
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@53e31f7c
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@5d89b30e
- Vincent Sellier marked this issue as related to swh/devel/swh-storage#4718 (closed)
- Owner
With the fix that will land in swh-model (swh/devel/swh-model!366 (merged)), we should be able to enable the model validation and send the errors to Redis.
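As a rough illustration of the idea, and not of the actual swh-model validation or the replayer's Redis integration: validate each object before insertion and divert the invalid ones to an error store instead of crashing the whole batch. Here a plain dict stands in for Redis, the date check is a stand-in validator, and `split_and_offload` is a hypothetical name.

```python
import datetime
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_and_offload(
    objects: Iterable[T],
    object_id: Callable[[T], str],
    validate: Callable[[T], None],
    error_store: Dict[str, str],
) -> List[T]:
    """Return the objects that pass `validate`; record the others in `error_store`.

    `validate` signals an invalid object by raising; `error_store` is a plain
    dict standing in for the Redis error store.
    """
    valid: List[T] = []
    for obj in objects:
        try:
            validate(obj)
        except (ValueError, OverflowError, OSError) as exc:
            error_store[object_id(obj)] = repr(exc)
        else:
            valid.append(obj)
    return valid


def check_date(revision: Tuple[str, int]) -> None:
    """Stand-in validator: fail if the date cannot become a Python datetime."""
    datetime.datetime.fromtimestamp(revision[1], datetime.timezone.utc)


errors: Dict[str, str] = {}
batch = [("example_ok", 1_700_000_000), ("ec71e7b7", 1_200_000_000_105)]
to_insert = split_and_offload(batch, lambda rev: rev[0], check_date, errors)
print(to_insert)  # only the valid revision is left for insertion
print(errors)     # the faulty one is recorded instead of stopping the replayer
```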
- Guillaume Samson mentioned in commit swh/infra/ci-cd/swh-charts@005b3012
- Owner
The storage replayers' errors are sent to redis-postgresql-replayer.redis: this should unstick the replayers.