Make opam shared root initialization more robust
The way we initialize the opam shared root on workers, through multiple timer units, is pretty brittle. A lot of the services actually fail to run, either because of a race condition or with a "No switch is currently set" error, as in this log:
Jan 03 00:00:00 worker01 systemd[1]: Started Software Heritage Manage OPAM shared state (coq.inria.fr).
Jan 03 00:00:01 worker01 opam-manage-shared-state.sh[1475377]: [WARNING] No switch is currently set, perhaps you meant '--set-default'?
Jan 03 00:00:02 worker01 opam-manage-shared-state.sh[1475377]: [coq.inria.fr] Initialised
Jan 03 00:00:02 worker01 opam-manage-shared-state.sh[1475377]: [ERROR] No switch is currently set. Please use 'opam switch' to set or install a switch
Jan 03 00:00:02 worker01 systemd[1]: opam-manage-shared-state-coq.inria.fr.service: Main process exited, code=exited, status=50/n/a
Jan 03 00:00:02 worker01 systemd[1]: opam-manage-shared-state-coq.inria.fr.service: Failed with result 'exit-code'.
Instead of having separate timer units, we should probably have a single one running a script that creates the default root first, then runs snippets updating that root for each separate instance.
- use `set -e` in the main script if it's not already set
- make the update script update all instances at once, instead of a single one separately: write a main script which handles the creation of the opam root and the update of the main instance, and run snippets generated for each other instance, using `run-parts`
- make sure the worker only starts after a successful run of the main service
- make sure the timer unit runs at different times, rather than all at midnight
- consider moving the shared root to `/var/tmp` to avoid it being blown away by reboots (probably not needed if we make sure the service dependencies are correct)
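
A minimal sketch of what such a main script could look like, to make the proposal concrete. Everything here is an assumption rather than the current implementation: the default paths, the `SNIPPET_DIR` and `OPAM_BIN` variables, the function names, the test for an existing root (presence of a `config` file), and the exact opam invocations, which would need checking against the opam version deployed on the workers.

```shell
#!/bin/sh
# Sketch only: paths, variable names and snippet layout are hypothetical.
set -e  # abort on the first failing command instead of carrying on

# OPAMROOT is the standard opam variable selecting the root to operate on;
# exporting it lets the per-instance snippets reuse the same root.
export OPAMROOT="${OPAMROOT:-/tmp/opam_root}"            # hypothetical default
SNIPPET_DIR="${SNIPPET_DIR:-/etc/opam-shared-state.d}"  # hypothetical default
OPAM_BIN="${OPAM_BIN:-opam}"                            # overridable for testing

init_shared_root() {
    # Assumption: an initialised root contains a "config" file.
    if [ ! -f "$OPAMROOT/config" ]; then
        # --bare: do not create a switch; --no-setup: leave shell rc files alone.
        "$OPAM_BIN" init --bare --no-setup
    fi
}

update_main_instance() {
    # --all: update every configured repository, since a bare root
    # has no current switch to select repositories from.
    "$OPAM_BIN" update --all
}

update_other_instances() {
    # One executable snippet per additional instance, run in lexical order.
    # Debian's run-parts skips file names containing dots by default, so a
    # snippet should be named e.g. 10-coq-inria-fr, not coq.inria.fr.
    run-parts --exit-on-error "$SNIPPET_DIR"
}

main() {
    init_shared_root
    update_main_instance
    update_other_instances
}
```

The script would end with a call to `main`, and a single service unit would run it. Because the root is created before any snippet runs and any failure stops the whole run, the per-instance updates can no longer race with the initialization.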
Migrated from T3826 (view on Phabricator)