After the deployment to production of the new nixguix lister and related loaders, numerous loadings with the git-checkout visit_type have failed.
The reported errors are related to NAR checksum mismatch between the one computed by the git-checkout loader and the one provided by guix or nix in their sources.json manifest.
However there is something odd as if we try to load these repositories locally or in the docker environment, the loadings succeed without hash mismatches.
For instance for the loading of https://github.com/tdf/libcmis at tag v0.6.0, the following error is reported in production:
Checksum mismatched on <https://github.com/tdf/libcmis>: {'sha256': '512b6149a8e249ffedea1977f3239ce55247b0e66e97f7566f37b4df9c4a26f1'} != {'sha256': '136038b8951acaa30cc5589fcde01e60a2cbf85895daf4a134d7577b964b5d9e'}
While executing the loading locally is fine:
$ swh -l DEBUG loader run git-checkout https://github.com/tdf/libcmis ref=v0.6.0 checksum_layout=nar checksums='{"sha256": "136038b8951acaa30cc5589fcde01e60a2cbf85895daf4a134d7577b964b5d9e"}'DEBUG:swh.loader.cli:ctx: <click.core.Context object at 0x7fd3f105d2d0>DEBUG:swh.loader.cli:config_file: NoneDEBUG:swh.loader.cli:config: {}WARNING:swh.loader.cli:No storage configuration detected, using an in-memory storage instead.DEBUG:swh.loader.cli:kw: {'ref': 'v0.6.0', 'checksum_layout': 'nar', 'checksums': {'sha256': '136038b8951acaa30cc5589fcde01e60a2cbf85895daf4a134d7577b964b5d9e'}}DEBUG:swh.loader.cli:registry: {'task_modules': [], 'loader': <class 'swh.loader.git.directory.GitCheckoutLoader'>}DEBUG:swh.loader.cli:loader class: <class 'swh.loader.git.directory.GitCheckoutLoader'>DEBUG:swh.loader.git.directory.GitCheckoutLoader:Loader checksums computation: narINFO:swh.loader.git.directory.GitCheckoutLoader:Load origin 'https://github.com/tdf/libcmis' with type 'git-checkout'DEBUG:swh.loader.git.directory.GitCheckoutLoader:lister_not provided, skipping extrinsic origin metadataDEBUG:swh.loader.git.directory.GitCheckoutLoader:Artifact <git-checkout> with path /tmp/tmpl4k7e1k8/libcmisDEBUG:swh.loader.git.directory.GitCheckoutLoader:Artifact <git-checkout> to check nar hashes: /tmp/tmpl4k7e1k8/libcmisDEBUG:swh.loader.git.directory.GitCheckoutLoader:Number of skipped contents: 0DEBUG:swh.loader.git.directory.GitCheckoutLoader:Number of contents: 333DEBUG:swh.loader.git.directory.GitCheckoutLoader:Number of directories: 21DEBUG:swh.loader.git.directory.GitCheckoutLoader:cleanup{'status': 'eventful'} for origin 'https://github.com/tdf/libcmis'
I attached below the list of failing origin URLs extracted from sentry with that script for both staging and production environments:
How about loading several origins one after the other?
I scheduled the origins to load in the docker environment using that script create-git-checkout-loading-tasks and so far the only hash mismatch errors I got are real ones (they are still a couple left, for instance if upstream force pushed a tag), other origins are loading fine.
$ swh -l DEBUG loader run directory https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1 fallback_urls='[]' checksum_layout='nar' checksums='{"sha256": "6a30e9cb08327ab6ef81eae7d1bf13e232545a2b0ccf0ae1c0a3b583b88eb411"}'WARNING:swh.loader.cli:No storage configuration detected, using an in-memory storage instead.DEBUG:swh.loader.cli:kw: {'fallback_urls': [], 'checksum_layout': 'nar', 'checksums': {'sha256': '6a30e9cb08327ab6ef81eae7d1bf13e232545a2b0ccf0ae1c0a3b583b88eb411'}}DEBUG:swh.loader.cli:registry: {'task_modules': ['swh.loader.core.tasks'], 'loader': <class 'swh.loader.core.loader.TarballDirectoryLoader'>}DEBUG:swh.loader.cli:loader class: <class 'swh.loader.core.loader.TarballDirectoryLoader'>DEBUG:swh.loader.core.loader.TarballDirectoryLoader:Loader checksums computation: narINFO:swh.loader.core.loader.TarballDirectoryLoader:Load origin 'https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1' with type 'tarball-directory'DEBUG:swh.loader.core.loader.TarballDirectoryLoader:lister_not provided, skipping extrinsic origin metadataDEBUG:swh.core.statsd:Error submitting statsd packet. Dropping the packet and closing the socket.DEBUG:swh.loader.core.loader.TarballDirectoryLoader:prepare; origin_url=https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1 fallback=https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1 scheme=https path=/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gzDEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): gitlab.com:443DEBUG:urllib3.connectionpool:https://gitlab.com:443 "GET /api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1 HTTP/1.1" 200 NoneDEBUG:swh.loader.package.utils:filename: iucode-tool-v2.3.1-9ab8dcb833945ffa71d8bfc034b27ef7d858b3dc.tar.gzDEBUG:swh.loader.package.utils:filepath: /tmp/tmpmmobvxl2/iucode-tool-v2.3.1-9ab8dcb833945ffa71d8bfc034b27ef7d858b3dc.tar.gzDEBUG:swh.loader.package.utils:extrinsic_metadata: {'length': 79521, 'filename': 'iucode-tool-v2.3.1-9ab8dcb833945ffa71d8bfc034b27ef7d858b3dc.tar.gz', 'checksums': {'sha1': '5af5f6dec536434503e604e0c8beca388af1698b', 'sha256': '76e79fe4eaee75cfb97d651ec009b557079ba330abfedae06c034f8901dcbe44'}, 'url': 'https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1'}DEBUG:swh.loader.core.loader.TarballDirectoryLoader:uncompressed path to directory: /tmp/tmpmmobvxl2/srcDEBUG:swh.loader.core.loader.TarballDirectoryLoader:Artifact <tarball-directory> with path /tmp/tmpmmobvxl2/src/iucode-tool-v2.3.1-9ab8dcb833945ffa71d8bfc034b27ef7d858b3dcDEBUG:swh.loader.core.loader.TarballDirectoryLoader:Artifact <tarball-directory> to check nar hashes: /tmp/tmpmmobvxl2/src/iucode-tool-v2.3.1-9ab8dcb833945ffa71d8bfc034b27ef7d858b3dcDEBUG:swh.loader.core.loader.TarballDirectoryLoader:Number of skipped contents: 0DEBUG:swh.loader.core.loader.TarballDirectoryLoader:Number of contents: 22DEBUG:swh.loader.core.loader.TarballDirectoryLoader:Number of directories: 2DEBUG:swh.loader.core.loader.TarballDirectoryLoader:cleanup{'status': 'eventful'} for origin 'https://gitlab.com/api/v4/projects/iucode-tool%2Fiucode-tool/repository/archive.tar.gz?sha=v2.3.1'
This issue is hard to debug without inspecting the content of the directory where the NAR hash computation failed.
I think it could make sense, in case of such failure, to dump the output of the repository listing along the SWHIDs for contents and directories in a file and attach it to the sentry event.
It would make sense to add an explicit flag to the loader, to dump the temporary directory, which we could inject in the listed origin entry (or just schedule manually). Sentry isn't really set up to support (potentially large) attachments for all failing visit events.
@vsellier, you were right about the possible cause of that issue in our f2F discussion today !
I gave another look to our NAR hash computation code and indeed file permissions are used during that processing. In particular execution permission is checked and influences the hash output value.
So I took the following failure example for the git-checkout visit of repository https://gitlab.com/mdds/mdds at tag 2.1.1 in the hope to reproduce the same invalid NAR hash reported in the sentry event (first one below).
Checksum mismatched on <https://gitlab.com/mdds/mdds>: {'sha256':'b30561047d7a3d212699f8fd2bc7a675d7780cef540022a27185b6220f370d16'} != {'sha256': '6b8d762e980389833c4cc4e8694ad3947b5b9584b51de873ac33b0bc8000c620'}
First I checked that we could compute the correct NAR hash without hacking on our NAR hashing code:
$ git checkout https://gitlab.com/mdds/mddsCloning into 'mdds'...warning: redirecting to https://gitlab.com/mdds/mdds.git/remote: Enumerating objects: 20109, done.remote: Counting objects: 100% (581/581), done.remote: Compressing objects: 100% (194/194), done.remote: Total 20109 (delta 385), reused 581 (delta 385), pack-reused 19528Receiving objects: 100% (20109/20109), 6.70 MiB | 20.79 MiB/s, done.Resolving deltas: 100% (13601/13601), done.$ cd mdds$ git checkout 2.1.1HEAD is now at d0a5a816 Fix make distcheck$ swh nar -x.6b8d762e980389833c4cc4e8694ad3947b5b9584b51de873ac33b0bc8000c620
All good ! Then I commented out the code checking file execution permission in our NAR hashing code linked above and computed the hash again.
So the cause of the issue is that executable files cannot be detected as such in our production environment likely due to the use of noexec option when mounting /tmp.
So, I mentioned it on IRC, and indeed os.access(..., os.X_OK) honors the noexec flag. It also wouldn't match a file that has an executable bit that the effective gid can't honor, which might happen if we ever started fully separating the decompression process from the loading process.
So, we should instead use os.lstat(...).st_mode & 0o111 != 0, which only relies on whether the filesystem bits are set to executable, and doesn't check the permissions of the mountpoint or the owner of the file.
The deployment got done and the sentry issue marked as solved yesterday.
So far, the issue did not get raised back (as regression) by the loaders yet.
I've rescheduled in staging the failed git-checkout tasks [2] (which are the dominant occurrences here) [1] as another check.
I've bumped the number of pods to ingest those to 3 so it's getting a bit faster...
If all is good, i'll do the same on production after that.
[1]
swh@swh-toolbox-d96bc8d9b-sxdzg:~$ swh scheduler -C $SWH_CONFIG_FILENAME origin send-to-celery git-checkout --lister-name nixguix --lister-instance-name guix.gnu.org --failed-cooldown '1 second'10000 slots available in celery queue4223 visits to send to celery
[2]
Forge guix.gnu.org (nixguix) has 33368 scheduled ingests in the scheduler.failed : 4640None : 5402not_found : 56successful : 23270total : 33368success rate : 69.73747302805083%
The issue got marked as regression but I gather that's because some hash mismatch can still happen (if the issue is upstream of guix's as it already happened before).
Anyway, the numbers are going up fast for git-checkout origins [1]