
Handle malformed author and committer dates

All errors of type psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block reported by the git loader [1] are caused by the processing of malformed dates.

This is usually caused by a revision whose author or committer date lies far in the future; see for instance:

This produces an invalid computed timezone offset whose value overflows the PostgreSQL smallint type (a signed 16-bit integer, range −32768 to 32767), which makes swh-storage raise the following exception:

```
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/antoine/swh/swh-environment/swh-storage/swh/storage/db.py", line 201, in writer
    tblname, ', '.join(columns)), f)
psycopg2.DataError: ERROR:  value "24193125" is out of range for type smallint
CONTEXT:  COPY tmp_revision, line 19448, column date_offset: "24193125"
```
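To make the failure concrete: smallint is a signed 16-bit column type, so any value outside [−32768, 32767] is rejected by the COPY into tmp_revision. The sketch below only checks that range; the helper name fits_smallint is hypothetical, and it assumes the date_offset value is stored as-is (in minutes) without any prior conversion.

```python
# PostgreSQL's smallint is a signed 16-bit integer.
SMALLINT_MIN = -(2 ** 15)  # -32768
SMALLINT_MAX = 2 ** 15 - 1  # 32767


def fits_smallint(value: int) -> bool:
    """Return True if value can be stored in a PostgreSQL smallint column."""
    return SMALLINT_MIN <= value <= SMALLINT_MAX


# Offset reported in the COPY error for the date_offset column.
bad_offset = 24193125
print(fits_smallint(bad_offset))  # False: far beyond 32767
print(fits_smallint(14 * 60))     # True: UTC+14:00 expressed in minutes
```

Any genuine UTC offset, even expressed in minutes, is several orders of magnitude below the smallint limit, so a value like 24193125 can only come from a malformed date.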

We should handle these corner cases. The simplest solution would be to check whether the computed timezone offset lies within the valid bounds [UTC−14:00, UTC+14:00] and to set it to 0 otherwise. Since other loaders may run into a similar issue, this check could be done directly in swh-storage [2].
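A minimal sketch of the proposed check, assuming offsets are expressed in minutes. The names normalize_offset and normalize_date, and the dict shape with an "offset" key, are hypothetical illustrations, not the actual swh-storage API.

```python
import logging

logger = logging.getLogger(__name__)

# Bounds proposed in this issue, in minutes (assumption: the storage layer
# keeps timezone offsets as a signed number of minutes).
MIN_OFFSET = -14 * 60  # UTC-14:00
MAX_OFFSET = 14 * 60   # UTC+14:00


def normalize_offset(offset: int) -> int:
    """Return offset unchanged if it lies in [UTC-14:00, UTC+14:00], else 0."""
    if MIN_OFFSET <= offset <= MAX_OFFSET:
        return offset
    logger.warning("Out-of-range timezone offset %d replaced with 0", offset)
    return 0


def normalize_date(date: dict) -> dict:
    """Return a copy of a hypothetical date dict with a sanitized offset."""
    return {**date, "offset": normalize_offset(date["offset"])}


print(normalize_offset(120))       # 120 (UTC+02:00, kept)
print(normalize_offset(24193125))  # 0 (malformed, reset)
```

Resetting to 0 rather than clamping to the nearest bound avoids inventing an offset that is just as wrong; the trade-off is that the original value is lost unless it is logged or stored elsewhere.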


Migrated from T1339 (view on Phabricator)

Edited by Phabricator Migration user