[winery] Concurrency issue on shard db
The issue occurs between different workers of the same objstorage, or between different objstorage instances.
Several workers write to the same shared database, but each one computes the shard information in a local variable.
Depending on the requests received, workers continue to write to a shard that is assumed to be under packing. Several packing runs can be launched for the same shard.
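The race described above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions, not the winery code: `SharedShard`, `Worker`, `simulate` and `SHARD_MAX_SIZE` are hypothetical names, and the shared database is reduced to a plain in-memory object.

```python
from dataclasses import dataclass

SHARD_MAX_SIZE = 100  # hypothetical threshold, arbitrary units


@dataclass
class SharedShard:
    """Stand-in for the shard state held in the shared database."""
    size: int = 0          # bytes actually written by all workers
    packing_runs: int = 0  # how many times packing was triggered


@dataclass
class Worker:
    shard: SharedShard
    local_size: int = 0  # this worker's private view of the shard size
    done: bool = False   # set once this worker considers the shard full

    def add(self, nbytes: int) -> None:
        self.shard.size += nbytes
        self.local_size += nbytes
        # The "full" decision only looks at the local counter, so it
        # ignores what the other workers have written to the same shard.
        if self.local_size >= SHARD_MAX_SIZE:
            self.shard.packing_runs += 1
            self.done = True


def simulate(write_sizes=(10, 20, 30), rounds=10) -> SharedShard:
    """Interleave writes from one worker per entry in write_sizes."""
    shard = SharedShard()
    workers = [Worker(shard) for _ in write_sizes]
    for _ in range(rounds):
        for worker, nbytes in zip(workers, write_sizes):
            if not worker.done:
                worker.add(nbytes)
    return shard
```

With the default arguments, `simulate()` ends with `packing_runs == 3` and `size == 320`: every worker triggers its own packing, and the shard grows well past the threshold before the last worker notices.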
This was tested in the docker environment based on swh-environment!281 (closed), with the packing failing. The behavior could be a side effect of that failure, but it doesn't look very resilient, because a packing can fail at some point.
Output of some logs added to the RWShard.add method:
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41598184 (full:False) <---- Workers are unsynced
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71513677 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41599806 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 105292487 (full:True)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71516443 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41601662 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71519135 (full:False)
...
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41617351 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71542077 (full:False)
docker-swh-objstorage-3 | Locking shard if1cfd58d6384b0594739d0f0e3531b6 <---- A packing is triggered by a worker
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41619206 (full:False) <---- Another worker continues to write on the same shard
docker-swh-objstorage-3 | Process Process-2: <---- The packing failed
docker-swh-objstorage-3 | Traceback (most recent call last):
docker-swh-objstorage-3 | File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
docker-swh-objstorage-3 | self.run()
docker-swh-objstorage-3 | File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
docker-swh-objstorage-3 | self._target(*self._args, **self._kwargs)
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 107, in pack
docker-swh-objstorage-3 | return Packer(shard, **kwargs).run()
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 115, in __init__
docker-swh-objstorage-3 | self.init()
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 119, in init
docker-swh-objstorage-3 | self.ro = ROShard(self.shard, **self.args)
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/roshard.py", line 63, in __init__
docker-swh-objstorage-3 | self.pool = Pool(shard_max_size=kwargs["shard_max_size"])
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/roshard.py", line 22, in __init__
docker-swh-objstorage-3 | self.rbd = sh.sudo.bake("rbd", f"--pool={self.name}")
docker-swh-objstorage-3 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/sh.py", line 3548, in __getattr__
docker-swh-objstorage-3 | return self.__env[name]
docker-swh-objstorage-3 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/sh.py", line 3330, in __getitem__
docker-swh-objstorage-3 | raise CommandNotFound(k)
docker-swh-objstorage-3 | sh.CommandNotFound: sudo
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71544956 (full:False) <------------ Other workers continue to write on the same shard
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41620718 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 71547827 (full:False)
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 41622094 (full:False)
...
docker-swh-objstorage-3 | Local size of if41ab8e69c449cca572005c46b01ee5: 104948380 (full:True) <----- Until they try to compact too
docker-swh-objstorage-3 | Local size of if1cfd58d6384b0594739d0f0e3531b6: 82357069 (full:False)
docker-swh-objstorage-3 | Local size of if1cfd58d6384b0594739d0f0e3531b6: 82536132 (full:False)
docker-swh-objstorage-3 | Local size of if1cfd58d6384b0594739d0f0e3531b6: 82710496 (full:False)
docker-swh-objstorage-3 | Locking shard if1cfd58d6384b0594739d0f0e3531b6
docker-swh-objstorage-3 | Process Process-2:
docker-swh-objstorage-3 | Local size of if1cfd58d6384b0594739d0f0e3531b6: 82879993 (full:False)
docker-swh-objstorage-3 | Traceback (most recent call last):
docker-swh-objstorage-3 | File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
docker-swh-objstorage-3 | self.run()
docker-swh-objstorage-3 | File "/usr/local/lib/python3.7/multiprocessing/process.py", line 99, in run
docker-swh-objstorage-3 | self._target(*self._args, **self._kwargs)
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 107, in pack
docker-swh-objstorage-3 | return Packer(shard, **kwargs).run()
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 115, in __init__
docker-swh-objstorage-3 | self.init()
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/objstorage.py", line 119, in init
docker-swh-objstorage-3 | self.ro = ROShard(self.shard, **self.args)
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/roshard.py", line 63, in __init__
docker-swh-objstorage-3 | self.pool = Pool(shard_max_size=kwargs["shard_max_size"])
docker-swh-objstorage-3 | File "/src/swh-objstorage"/swh/objstorage/backends/winery/roshard.py", line 22, in __init__
docker-swh-objstorage-3 | self.rbd = sh.sudo.bake("rbd", f"--pool={self.name}")
docker-swh-objstorage-3 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/sh.py", line 3548, in __getattr__
docker-swh-objstorage-3 | return self.__env[name]
docker-swh-objstorage-3 | File "/srv/softwareheritage/venv/lib/python3.7/site-packages/sh.py", line 3330, in __getitem__
docker-swh-objstorage-3 | raise CommandNotFound(k)
docker-swh-objstorage-3 | sh.CommandNotFound: sudo
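One possible direction, sketched here under assumptions rather than taken from the swh-objstorage code, is to keep the size and state of the shard in the shared database and to change them only through conditional updates, so that a single worker can win the transition to packing. In this sketch `sqlite3` stands in for the shared database, and the table layout, the state names and the helpers `open_db`, `add` and `try_start_packing` are all hypothetical.

```python
import sqlite3

SHARD_MAX_SIZE = 100  # hypothetical threshold, arbitrary units


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the shared shard database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shards ("
        "name TEXT PRIMARY KEY, size INTEGER NOT NULL, state TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO shards VALUES ('shard-a', 0, 'writing')")
    conn.commit()
    return conn


def add(conn: sqlite3.Connection, name: str, nbytes: int) -> bool:
    """Record a write; return False if the shard no longer accepts writes."""
    cur = conn.execute(
        "UPDATE shards SET size = size + ? "
        "WHERE name = ? AND state = 'writing'",
        (nbytes, name),
    )
    conn.commit()
    return cur.rowcount == 1


def try_start_packing(conn: sqlite3.Connection, name: str) -> bool:
    """Move the shard to 'packing' if it is full; only one caller succeeds."""
    cur = conn.execute(
        "UPDATE shards SET state = 'packing' "
        "WHERE name = ? AND state = 'writing' AND size >= ?",
        (name, SHARD_MAX_SIZE),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because both the write and the state change are conditional on `state = 'writing'`, a second call to `try_start_packing` returns `False`, and later calls to `add` are refused instead of landing on a shard that is being packed. The sketch does not cover what should happen to the state when the packing itself fails, as in the traceback above; that would need a separate recovery path.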