Refactor the content replayer as a class

David Douard requested to merge douardda/swh-objstorage-replayer:fix-#1 into master

This allows the replayer to actually benefit from multithreading, by not sharing objstorage instances between threads.

Note: the doctest formerly in process_replay_objects_content's docstring has been moved to a proper test in test_replay.py; the mocking required to make it work is no longer suitable for a docstring.

The idea is to replace the worker_fn (process_replay_objects_content) with a class (ContentReplayer), and to use a pool of worker threads responsible for replicating objects (taken from a queue). Objstorages are passed to the ContentReplayer class as configs (rather than as instances) so that each worker thread can instantiate its own pair of objstorages (source, destination), which avoids sharing these objstorage instances between threads.
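The pattern described above can be sketched as follows. This is a minimal illustration, not the code from this merge request: `InMemoryObjStorage`, `make_objstorage`, `ContentReplayerSketch` and the `"backend"` config key are hypothetical names introduced here, and a plain dict stands in for the real storage backends.

```python
import queue
import threading


class InMemoryObjStorage:
    """Hypothetical stand-in for an objstorage client (not the swh API)."""

    def __init__(self, backend):
        # `backend` plays the role of the remote store; the client object
        # wrapping it is what must not be shared between threads.
        self.backend = backend

    def get(self, obj_id):
        return self.backend[obj_id]

    def add(self, content, obj_id):
        self.backend[obj_id] = content


def make_objstorage(config):
    """Hypothetical factory: build a fresh client from a config dict."""
    return InMemoryObjStorage(config["backend"])


class ContentReplayerSketch:
    """Copy objects from source to destination with a pool of threads."""

    def __init__(self, src_config, dst_config, concurrency=4):
        # Only configs are stored; no objstorage instance lives here.
        self.src_config = src_config
        self.dst_config = dst_config
        self.concurrency = concurrency

    def _worker(self, tasks):
        # Each worker thread builds its own (source, destination) pair.
        src = make_objstorage(self.src_config)
        dst = make_objstorage(self.dst_config)
        while True:
            obj_id = tasks.get()
            if obj_id is None:  # sentinel: no more work for this thread
                return
            dst.add(src.get(obj_id), obj_id)

    def replay(self, obj_ids):
        tasks = queue.Queue()
        threads = [
            threading.Thread(target=self._worker, args=(tasks,))
            for _ in range(self.concurrency)
        ]
        for thread in threads:
            thread.start()
        for obj_id in obj_ids:
            tasks.put(obj_id)
        for _ in threads:
            tasks.put(None)  # one sentinel per worker
        for thread in threads:
            thread.join()
```

A real replayer would also need error handling (missing objects, retries) and would read its object ids from the journal rather than from a list; both are left out of this sketch.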

In a bench test, using a docker-deployed seaweedfs as the local objstorage and a source objstorage config that multiplexes S3 (http backend), then azure (azure backend), then Rocq's RO objstorage (remote objstorage), a single replayer used to be limited to about 6 obj/s regardless of the number of concurrent workers. With this version of the replayer, a single process can now achieve 80 to 100 obj/s (using 32 to 48 concurrent threads).

Fix #1 (closed)

Edited by David Douard
