Prevent timestamps in node properties from being shifted

vlorentz requested to merge no-timezone-shift into master

according to the timezone WriteNodeProperties is being run in.

Because our ORC exports use the timestamp type instead of timestamp with timezone, the reader and writer need to agree out of band on the timezone used in the files they exchange.

However, we don't do this:

  • swh-dataset uses pyorc, which uses the C++ ORC library, which assumes users (us) always write in GMT
  • swh-graph uses the Java ORC library, which assumes the system timezone (or $TZ if set)

So when reading with a non-UTC timezone, the Java ORC library interprets the timestamps in the dataset as being in the local timezone and converts them to UNIX timestamps (number of seconds since the epoch), which shifts them by the local UTC offset. We then write these shifted timestamps to .property.author_timestamp.bin and .property.committer_timestamp.bin.
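To illustrate the size and direction of the shift, here is a minimal sketch in plain Python. It does not use pyorc or the Java ORC library; it only models the disagreement described above. The helper `epoch_seconds`, the sample date, and the UTC+2 reader timezone are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def epoch_seconds(wall_clock: datetime, assumed_tz: timezone) -> int:
    """Interpret a naive wall-clock value in assumed_tz and return UNIX seconds."""
    return int(wall_clock.replace(tzinfo=assumed_tz).timestamp())


# A naive value, as stored in a timestamp column with no timezone attached.
# (Hypothetical sample date.)
stored = datetime(2005, 3, 18, 11, 14, 0)

# Writer's convention: the stored value is in UTC/GMT.
written = epoch_seconds(stored, timezone.utc)

# Reader's convention: the stored value is in the system timezone.
# Here we assume a machine running at UTC+2.
reader_tz = timezone(timedelta(hours=2))
read_back = epoch_seconds(stored, reader_tz)

# The reader's UNIX timestamp is off by exactly the UTC offset.
shift = read_back - written
print(shift)  # -7200
```

With the reader at UTC+2, every timestamp comes out two hours too early; a reader running in UTC sees no shift, which is why the problem only shows up on machines with a non-UTC timezone.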

This commit regenerates the example graph to have the correct timestamps. It also applies the 39ed0d17 change that removes useless padding at the end of all property files.

Resolves #4788 (closed)
