
directory: Align process to checkout a remote git ref with guix one

It has been observed that the process SWH uses to check out a remote git reference can produce recursive NAR hash values that differ from those computed by guix. This seems to be related to CR/LF normalization.
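To illustrate why end-of-line handling matters here, the following minimal sketch shows that converting CR/LF to LF in a file's content changes its SHA-256 digest, and therefore any hash computed over the checked-out tree. It is not the SWH or guix code: `normalize_crlf` is a hypothetical stand-in for whatever conversion happens at checkout time, and the sample bytes are made up.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def normalize_crlf(data: bytes) -> bytes:
    """Hypothetical stand-in for an end-of-line conversion at checkout."""
    return data.replace(b"\r\n", b"\n")


# Made-up file content as stored in the repository, with CR/LF line endings.
original = b"line one\r\nline two\r\n"
converted = normalize_crlf(original)

# The two digests differ, so two checkouts that disagree on line endings
# cannot agree on a content hash of the resulting directory.
print(sha256_hex(original))
print(sha256_hex(converted))
print(sha256_hex(original) == sha256_hex(converted))
```

Since a recursive NAR hash covers the raw bytes of every file in the directory, a single converted file is enough to produce a mismatch like the one in the "Before" output.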

So align the process for checking out a remote git reference with the one used by guix. It also seems to be faster than the previous approach.

Also refine the detection of not-found repositories, as some unrelated git errors could previously be missed.

Related to #4751 (comment 150664).

Before:

$ time swh loader run git-checkout https://github.com/apache/groovy ref=GROOVY_3_0_5 checksum_layout=nar checksums='{"sha256": "d5a653ce9c1f4662cdcaa2fe2d0eb7359936f807d67807fb50c3b22a3b36a500"}'
WARNING:swh.loader.cli:No storage configuration detected, using an in-memory storage instead.
INFO:swh.loader.git.directory.GitCheckoutLoader:Load origin 'https://github.com/apache/groovy' with type 'git-checkout'
Cloning into '/tmp/tmpa6661kuh/groovy'...
remote: Enumerating objects: 37577, done.
remote: Counting objects: 100% (621/621), done.
remote: Compressing objects: 100% (579/579), done.
remote: Total 37577 (delta 66), reused 563 (delta 42), pack-reused 36956
Receiving objects: 100% (37577/37577), 8.43 MiB | 3.15 MiB/s, done.
Resolving deltas: 100% (1613/1613), done.
remote: Enumerating objects: 1186, done.
remote: Counting objects: 100% (420/420), done.
remote: Compressing objects: 100% (330/330), done.
remote: Total 1186 (delta 1), reused 243 (delta 0), pack-reused 766
Receiving objects: 100% (1186/1186), 210.31 KiB | 3.75 MiB/s, done.
Resolving deltas: 100% (2/2), done.
remote: Enumerating objects: 4965, done.
remote: Counting objects: 100% (4440/4440), done.
remote: Compressing objects: 100% (3520/3520), done.
remote: Total 4965 (delta 2531), reused 1132 (delta 920), pack-reused 525
Receiving objects: 100% (4965/4965), 6.85 MiB | 2.81 MiB/s, done.
Resolving deltas: 100% (2748/2748), done.
Updating files: 100% (4988/4988), done.
remote: Enumerating objects: 1116, done.
remote: Counting objects: 100% (230/230), done.
remote: Compressing objects: 100% (230/230), done.
remote: Total 1116 (delta 0), reused 0 (delta 0), pack-reused 886
Receiving objects: 100% (1116/1116), 199.34 KiB | 3.44 MiB/s, done.
remote: Enumerating objects: 2394, done.
remote: Counting objects: 100% (2275/2275), done.
remote: Compressing objects: 100% (2052/2052), done.
remote: Total 2394 (delta 1003), reused 223 (delta 223), pack-reused 119
Receiving objects: 100% (2394/2394), 4.14 MiB | 1.68 MiB/s, done.
Resolving deltas: 100% (1024/1024), done.
Updating files: 100% (3762/3762), done.
HEAD is now at 0be35bb1 Release 3.0.5: update versions
ERROR:swh.loader.git.directory.GitCheckoutLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 433, in load
    more_data_to_fetch = self.fetch_data()
                         ^^^^^^^^^^^^^^^^^
  File "/home/anlambert/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 818, in fetch_data
    raise errors[0]
ValueError: Checksum mismatched on <https://github.com/apache/groovy>: {'sha256': 'cbfc7055d6baa0011c0aa5d8e2fa79ea96148851f09ab203b5d064c7cec0900c'} != {'sha256': 'd5a653ce9c1f4662cdcaa2fe2d0eb7359936f807d67807fb50c3b22a3b36a500'}
{'status': 'failed'} for origin 'https://github.com/apache/groovy'

real    0m15,583s
user    0m5,440s
sys     0m2,389s

After:

$ time swh loader run git-checkout https://github.com/apache/groovy ref=GROOVY_3_0_5 checksum_layout=nar checksums='{"sha256": "d5a653ce9c1f4662cdcaa2fe2d0eb7359936f807d67807fb50c3b22a3b36a500"}'
WARNING:swh.loader.cli:No storage configuration detected, using an in-memory storage instead.
INFO:swh.loader.git.directory.GitCheckoutLoader:Load origin 'https://github.com/apache/groovy' with type 'git-checkout'
{'status': 'eventful'} for origin 'https://github.com/apache/groovy'

real    0m6,735s
user    0m3,672s
sys     0m1,405s
Edited by Antoine Lambert
