Properly handle loading of repository sub-tree
Subversion allows checkout and export operations on a specific sub-tree of a repository, as shown below:
anlambert@carnavalet:/tmp$ svn info https://svn.code.sf.net/p/xvidcap/code/trunk/debian
Path: debian
URL: https://svn.code.sf.net/p/xvidcap/code/trunk/debian
Relative URL: ^/trunk/debian
Repository Root: https://svn.code.sf.net/p/xvidcap/code
Repository UUID: 521773ef-0118-0410-98fd-b0fa47ad2f46
Revision: 319
Node Kind: directory
Last Changed Author: charly4711
Last Changed Rev: 319
Last Changed Date: 2009-07-14 09:45:41 +0200 (mar., 14 juil. 2009)
anlambert@carnavalet:/tmp$ svn checkout https://svn.code.sf.net/p/xvidcap/code/trunk/debian xvidcap-debian
A xvidcap-debian/rules
A xvidcap-debian/changelog
A xvidcap-debian/control
A xvidcap-debian/postinst
A xvidcap-debian/postrm
A xvidcap-debian/copyright
A xvidcap-debian/Makefile.am
A xvidcap-debian/xvidcap.menu
A xvidcap-debian/xvidcap.files
A xvidcap-debian/compat
A xvidcap-debian/bts
Checked out revision 319.
Currently, the subversion loader does not handle that case correctly because it relies on svnrdump.
Indeed, svnrdump filters out the repository paths outside of the sub-tree but still dumps every commit of the root repository. This means that the produced dump contains an empty commit for each revision that only modifies paths outside of the sub-tree.
Below is an extract of the dump file generated by svnrdump dump https://svn.code.sf.net/p/xvidcap/code/trunk/debian. We can see that there are commits without any modification to the dumped sub-tree of the repository:
Revision-number: 15
Prop-content-length: 130
Content-length: 130
K 10
svn:author
V 10
charly4711
K 8
svn:date
V 27
2006-08-26T14:12:17.108970Z
K 7
svn:log
V 24
deleting ffmpeg-svn5528
PROPS-END
Revision-number: 16
Prop-content-length: 147
Content-length: 147
K 10
svn:author
V 10
charly4711
K 8
svn:date
V 27
2006-08-26T14:21:45.547817Z
K 7
svn:log
V 41
updated to new ffmpeg, loads of bugfixes
PROPS-END
Consequently, the subversion loader generates a lot of empty revisions targeting the same directory when loading data from such a dump. This can be observed on that repository, which was loaded on staging: its revision history contains many empty revisions that should not have been archived.
So the loader implementation should be improved to properly handle the loading of a sub-tree by filtering out the commits that do not modify paths in it.
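One possible way to detect such commits, given here only as a sketch and not as the loader's actual implementation, is to scan the dump stream and keep the revisions that are followed by at least one node record (a record starting with a `Node-path` header). The function names `iter_records` and `non_empty_revisions` are hypothetical. Record bodies are skipped using their `Content-length` header so that log messages or file contents that look like headers are not misread.

```python
from typing import BinaryIO, Dict, Iterator, List


def iter_records(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the headers of each record in a dump stream, skipping bodies."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator between records
        headers = {}
        while line.strip():
            key, _, value = line.decode("utf-8").partition(":")
            headers[key.strip()] = value.strip()
            line = stream.readline()
        # Skip the body (properties and/or text) using its declared length
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def non_empty_revisions(stream: BinaryIO) -> List[int]:
    """Return the revision numbers that carry at least one node record."""
    node_counts: Dict[int, int] = {}
    current = None
    for headers in iter_records(stream):
        if "Revision-number" in headers:
            current = int(headers["Revision-number"])
            node_counts[current] = 0
        elif "Node-path" in headers and current is not None:
            node_counts[current] += 1
    return [rev for rev, count in node_counts.items() if count > 0]
```

Applied to the extract above, revisions 15 and 16 would be left out because no node record follows them. A loader could use such a list to skip the corresponding commits instead of archiving empty revisions.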
Migrated from T3896 (view on Phabricator)