$ svnadmin create asf-mirror
$ 7z x -so svn-asf-public-r0:1164363.7z | svnadmin load ./asf-mirror
...
------- Committed revision 923 >>>
<<< Started new transaction, based on original revision 924
     * editing path : incubator/directory/ldap/trunk/sandbox0/.cvsignore ... done.
------- Committed revision 924 >>>
<<< Started new transaction, based on original revision 925
     * editing path : incubator/directory/ldap/trunk/sandbox0/newbackend/src/java/ldapd/server/jndi/InterceptorPipeline.java ... done.
svnadmin: E125005: Invalid property value found in dumpstream; consider repairing the source or using --bypass-prop-validation while loading.
svnadmin: E125005: Cannot accept non-LF line endings in 'svn:log' property
Repairing the source (the first suggestion in the error message) seems a no-go, as this would modify the original log message (at least for that revision).
That alone would be enough to mess up the revision hash history.
The other suggestion (loading with --bypass-prop-validation) is currently being tested and so far so good (more than 700k revisions have been loaded so far).
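For reference, the load with --bypass-prop-validation can be scripted roughly as below. This is only a sketch: the function names are made up for illustration, and it assumes `7z` and `svnadmin` are on the PATH and that the repository was already created with `svnadmin create`.

```python
import subprocess


def build_load_commands(archive, repo_path, bypass_prop_validation=True):
    """Build the two commands of the `7z x -so ... | svnadmin load ...` pipeline."""
    extract = ["7z", "x", "-so", archive]
    load = ["svnadmin", "load"]
    if bypass_prop_validation:
        # Accept property values (e.g. svn:log with non-LF line endings)
        # as they are, instead of aborting the load with E125005.
        load.append("--bypass-prop-validation")
    load.append(repo_path)
    return extract, load


def load_dump(archive, repo_path):
    """Stream the extracted dump into svnadmin; return both exit codes."""
    extract, load = build_load_commands(archive, repo_path)
    with subprocess.Popen(extract, stdout=subprocess.PIPE) as producer:
        loader = subprocess.run(load, stdin=producer.stdout)
    return producer.returncode, loader.returncode
```

Calling `load_dump("svn-asf-public-r0:1164363.7z", "./asf-mirror")` would then mirror the command shown above, with the validation bypassed.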
During our latest exchange with our ASF contact (Greg Stein), I asked about history modification, and here is his answer:
The ASF never rewrites svn revision history. Given the size of our repository, it would be prohibitive, even if we philosophically thought it was proper (and we don't! static!)
That's good.
That said, we do allow log messages to be edited. Records of such changes are only on mailing lists. We have no structured history for this.
That's not good news.
From our viewpoint, they do modify their history: the log message is used for the revision hash computation.
So we will hit altered-history hiccups when incrementally loading the ASF mirror.
And that also means we face the same possibility for any other live repository.
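To illustrate why an edited log message matters, here is a toy model. It is not the actual swh identifier computation: it only assumes that each revision id hashes the log message together with the parent's id, which is enough to show how one edited message changes every id after it.

```python
import hashlib


def chain_hashes(log_messages):
    """Toy revision ids: each id hashes its log message and its parent's id."""
    ids = []
    parent = ""
    for message in log_messages:
        payload = (parent + "\n" + message).encode("utf-8")
        rev_id = hashlib.sha1(payload).hexdigest()
        ids.append(rev_id)
        parent = rev_id
    return ids


def first_divergence(old_ids, new_ids):
    """Return the index of the first differing id, or None if none differ."""
    for index, (old, new) in enumerate(zip(old_ids, new_ids)):
        if old != new:
            return index
    return None


# Made-up log messages; only the second one is edited after the fact.
original = ["initial import", "fix typo", "add feature"]
edited = ["initial import", "fix typo in README", "add feature"]

old_ids = chain_hashes(original)
new_ids = chain_hashes(edited)
print(first_divergence(old_ids, new_ids))  # 1
```

In this model, the edited revision and everything after it get new ids, so an incremental load that resumes from a previously computed id would no longer match.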
I don't see anything new here. Subversion offers no integrity guarantees; that applies to the ASF repos like it applies to any other SVN repo out there. We need to decide a policy about when (if at all) to re-do full ingestions of Subversion repos (which will allow us to re-inject modified objects at the cost of forking the resulting history on Software Heritage) or just say shrug and never re-ingest in a non-incremental way any Subversion repo we have previously ingested.
I had it in mind, but it hit me much harder when I read it.
Also, explicit is better than implicit.
Subversion offers no integrity guarantees,
Yes. That's somewhat bad.
it applies to the ASF repos like it applies to any other SVN repo out there.
Sure.
We need to decide a policy about when (if at all) to re-do full ingestions of Subversion repos (which will allow us to re-inject modified objects at the cost of forking the resulting history on Software Heritage)
Please, let's decide then...
I recently added a start_from_scratch flag (to permit rescheduling missing objects in the googlecode dumps). So, this can be leveraged.
or just say shrug and never re-ingest in a non-incremental way any Subversion repo we have previously ingested.
That sounds rough and at odds with the overall swh goal as I have come to understand it.
I prefer option 1.
We could also mix options 1 and 2 depending on the repository's size (in terms of svn revisions).
There may also be a third option, as a trade-off for very large repositories: don't use the swh revision hash as the way forward, but the svn revision number (within the swh revision, we have the svn revision).
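A rough sketch of how such a mixed policy might look. Everything here is hypothetical: the function name, the strategy labels (other than start_from_scratch, the flag mentioned above) and the size limit are placeholders for discussion, not an existing loader API.

```python
def choose_strategy(svn_revision_count, history_diverged, full_reload_limit=100_000):
    """Pick how to continue ingesting a previously ingested svn repository.

    full_reload_limit is an arbitrary placeholder; the real cut-off would
    have to come out of the policy decision.
    """
    if not history_diverged:
        # Nothing changed upstream: keep loading incrementally.
        return "incremental"
    if svn_revision_count <= full_reload_limit:
        # Option 1: redo a full ingestion, accepting a forked history.
        return "start_from_scratch"
    # Third option: too large to reload, so continue from the last svn
    # revision number instead of matching on the swh revision hash.
    return "resume_from_svn_revision"
```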
I meant missing from uffizi. Checking the source, it's missing from the index page listing.
I need to notify our ASF contact, but I'll make sure there are no other holes first.
No need. All that needs to be there is there.
The joke is on me about the dumps.
There are overlaps (and holes) in the dumps provided...
1693677 ['1700376', '1727878']
1700377 ['1706178'] <- redundant with 2nd dump (1693677:1727878)
1706179 ['1711710'] <- redundant with 2nd dump
1711711 ['1717366'] <- redundant with 2nd dump
1717367 ['1722480'] <- redundant with 2nd dump
<holes from 1722480 to 1727878 which must be present in the 2nd dump from 1693677 to 1727878>
1727879 ['1732985']
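A check like this can be automated. The sketch below assumes each entry in the listing means "first revision [last revision(s)]" of a dump file, with inclusive ranges; the function names are made up for illustration.

```python
def find_redundant(ranges):
    """Return the ranges fully contained in another, different range."""
    redundant = []
    for start, end in ranges:
        for other_start, other_end in ranges:
            if (other_start, other_end) == (start, end):
                continue
            if other_start <= start and end <= other_end:
                redundant.append((start, end))
                break
    return redundant


def find_holes(ranges):
    """Return the revision gaps left uncovered between the sorted ranges."""
    holes = []
    covered_up_to = None
    for start, end in sorted(ranges):
        if covered_up_to is not None and start > covered_up_to + 1:
            holes.append((covered_up_to + 1, start - 1))
        covered_up_to = end if covered_up_to is None else max(covered_up_to, end)
    return holes


# Ranges transcribed from the listing above.
dumps = [
    (1693677, 1700376),
    (1693677, 1727878),
    (1700377, 1706178),
    (1706179, 1711710),
    (1711711, 1717366),
    (1717367, 1722480),
    (1727879, 1732985),
]

print(find_holes(dumps))  # []
without_big_dump = [r for r in dumps if r != (1693677, 1727878)]
print(find_holes(without_big_dump))  # [(1722481, 1727878)]
```

With all the dumps, the range is fully covered; the hole only appears if the large 1693677:1727878 dump is left out, which matches the note in the listing.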