
ra: Send only modified objects to storage after replaying a revision

Previously, all contents and directories of the reconstructed filesystem were sent to the storage after replaying an SVN revision. Filtering them so that only new contents and directories were written to the storage was then delegated to the storage filtering proxy.

This approach has a huge performance impact when loading large Subversion repositories, as large sets of objects are filtered again and again after each revision replay.
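To make that cost concrete, here is a toy Python model of the previous behaviour. It is not the loader's actual code: `FilteringProxy` and `load_old_way` are hypothetical names, each revision is reduced to a dict mapping the paths it changes to object ids, and directories are left out. Because the whole reconstructed tree is resent after every revision, the number of lookups grows with the sum of the tree sizes over the history rather than with the number of changes.

```python
from typing import Dict, List, Set


class FilteringProxy:
    """Hypothetical stand-in for a storage filtering proxy: it forwards
    only objects whose id it has not seen before."""

    def __init__(self) -> None:
        self.archived: Set[str] = set()
        self.lookups = 0  # number of objects the proxy had to check

    def add(self, object_ids: List[str]) -> int:
        new = []
        for oid in object_ids:
            self.lookups += 1
            if oid not in self.archived:
                new.append(oid)
        self.archived.update(new)
        return len(new)


def load_old_way(revisions: List[Dict[str, str]]) -> FilteringProxy:
    """Replay revisions (path -> object id changes) and, like the old
    loader, send every object of the reconstructed tree each time."""
    proxy = FilteringProxy()
    tree: Dict[str, str] = {}
    for changes in revisions:
        tree.update(changes)            # replay the revision
        proxy.add(list(tree.values()))  # resend the *whole* tree
    return proxy
```

With 100 revisions that each add one file, `load_old_way([{f"file{i}": f"obj{i}"} for i in range(100)])` archives 100 objects but performs 1 + 2 + ... + 100 = 5050 lookups, whereas sending only the changed object of each revision would need 100.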

This commit performs the object filtering at the loader level instead of delegating that task to the storage filtering proxy. It does so by maintaining the set of paths added or modified by a revision while replaying it. As we use the svn_ra API, that set of paths can be computed easily and reliably.
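A minimal sketch of that bookkeeping, assuming the changed paths are collected from the replay (for example from the Subversion delta editor's `add_file`/`open_file`/`delete_entry` callbacks; where exactly the loader hooks in is an assumption here). The helpers `ancestors`, `modified_paths` and `objects_to_send` are hypothetical, and the reconstructed tree is modelled as a dict from path to object id, with `""` as the root. One point the sketch makes explicit: changing an entry also changes every ancestor directory, since a directory's identifier depends on its entries, so those directories must be sent too.

```python
from typing import Dict, Iterable, List, Set


def ancestors(path: str) -> List[str]:
    """Parent directories of `path`, deepest first; "" is the root."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, -1, -1)]


def modified_paths(added_or_modified: Iterable[str],
                   deleted: Iterable[str] = ()) -> Set[str]:
    """Paths whose objects must be sent after replaying one revision.

    A changed entry also changes all its ancestor directories. A deleted
    entry has nothing left to send, but its ancestors still change.
    """
    paths: Set[str] = set()
    for path in added_or_modified:
        paths.add(path)
        paths.update(ancestors(path))
    for path in deleted:
        paths.update(ancestors(path))
    return paths


def objects_to_send(tree: Dict[str, str], paths: Set[str]) -> Dict[str, str]:
    """Keep only the objects of the reconstructed tree at `paths`."""
    return {p: tree[p] for p in paths if p in tree}
```

For a tree containing `trunk/README`, `trunk/src/main.c` and `tags`, a revision that modifies only `trunk/src/main.c` yields the paths `trunk/src/main.c`, `trunk/src`, `trunk` and `""`, so `trunk/README` and `tags` are never sent, and the cost per revision follows the size of the change instead of the size of the tree.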

This change significantly speeds up the overall loading of a Subversion repository.

For my tests, I used the large TortoiseSVN repository. Before this change, loading it took around 24h in the Docker environment; after it, loading took around 4h, a 6x speedup!

Related to #3839 (closed)

Depends on !100 (closed)

Test Plan

I added snapshot integrity checks to the tests that lacked them, to ensure that all objects referenced by a snapshot can be found in the archive after loading. No issues were detected.
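Such a check amounts to walking every object reachable from a snapshot and verifying that each one is present in the archive. The sketch below is a hypothetical, simplified model rather than the test helpers used in this change: `missing_objects` is an illustrative name, and the archive is a plain dict mapping each object id to the ids it references (branch targets for a snapshot, root directory for a revision, entries for a directory, nothing for a content).

```python
from collections import deque
from typing import Dict, List, Set


def missing_objects(snapshot_id: str, archive: Dict[str, List[str]]) -> Set[str]:
    """Return the ids reachable from `snapshot_id` that are referenced
    but absent from `archive` (breadth-first traversal)."""
    missing: Set[str] = set()
    seen: Set[str] = set()
    queue = deque([snapshot_id])
    while queue:
        oid = queue.popleft()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in archive:
            missing.add(oid)  # referenced somewhere, but never stored
            continue
        queue.extend(archive[oid])
    return missing
```

For instance, with `archive = {"snp": ["rev1"], "rev1": ["dir_root"], "dir_root": ["cnt_a", "dir_sub"], "dir_sub": ["cnt_b"], "cnt_a": []}`, `missing_objects("snp", archive)` returns `{"cnt_b"}`: exactly the kind of gap a loader that filters too aggressively would leave, which is why these checks matter once filtering moves into the loader.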


Migrated from Phabricator diff D6950
