Closed
Open
Issue created Nov 14, 2018 by Antoine Lambert (@anlambert, Maintainer)

Handle malformed author and committer dates

All errors of type psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block reported by the git loader [1] correspond to the processing of malformed dates.

This is usually due to a revision whose author or committer date lies far in the future; see for instance:

  • https://github.com/cristiansteib/unAventon/commit/1d05195322ec802238fb7c4608b8db614b4d75c7

  • https://github.com/archlinuxarm/PKGBUILDs/commit/509419e3280b66026c55787ccc8ee97e53ca690f

  • https://github.com/ska-sa/PySPEAD/commit/0ee7c7d41b57b471af00b4e2869cede57706b9fd

  • https://github.com/samthiriot/openmole/commit/b558e2f43067c6471ac656fcadff4559958abbcf

The timezone offset computed from such a date is invalid and overflows the PostgreSQL smallint type (range -32768 to 32767), which raises the following exception in swh-storage:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/antoine/swh/swh-environment/swh-storage/swh/storage/db.py", line 201, in writer
    tblname, ', '.join(columns)), f)
psycopg2.DataError: ERROR:  value "24193125" is out of range for type smallint
CONTEXT:  COPY tmp_revision, line 19448, column date_offset: "24193125"

We should handle these corner cases. The simplest solution would be to check whether the computed timezone offset lies within the bounds [UTC−14:00, UTC+14:00] and to set it to 0 if it does not. This could be handled directly in swh-storage [2], in case other loaders encounter a similar issue.
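
A minimal sketch of the proposed bounds check, assuming the offset is expressed in minutes (an assumption suggested by the date_offset value in the traceback, not confirmed here). The function and constant names are hypothetical and are not taken from swh-storage:

```python
# Hypothetical helper illustrating the proposed check; not swh-storage code.
# Assumption: timezone offsets are expressed in minutes east of UTC.
MIN_OFFSET_MINUTES = -14 * 60  # UTC-14:00
MAX_OFFSET_MINUTES = 14 * 60   # UTC+14:00


def sanitize_offset(offset_minutes: int) -> int:
    """Return the offset unchanged if it lies within the bounds, else 0."""
    if MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0


# The value from the traceback is far outside the bounds and is reset to 0.
print(sanitize_offset(24193125))
# A regular offset such as UTC+02:00 is kept as is.
print(sanitize_offset(120))
```

Resetting to 0 silently discards the original offset, so a real implementation might also want to log the rejected value.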

  • [1] http://kibana0.internal.softwareheritage.org:5601/app/kibana#/dashboard/22195930-d36e-11e8-913b-077937c6a5ef?_g=(refreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow-60d%2Cmode%3Aquick%2Cto%3Anow))

  • [2] https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/converters.py$125


Migrated from T1339 (view on Phabricator)

Edited Jan 08, 2023 by Phabricator Migration user