fix regular expression used for matching RCS keywords
The function RcsKeywords.expand_keyword() uses a regular expression to detect keywords in lines of a file. This expression treated the closing $-sign of the keyword as optional. This fails when a line contains an invalid keyword declaration which lacks the trailing $-sign, followed by a valid keyword declaration.
For example, consider this line from a file in the ccvs CVS repository:
#ident "@(#)cvs/contrib/pcl-cvs:$Name:
Here, the CVS server expands the
Our regular expression would split this line into two match groups:
1: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $ 2: Id$
Neither match produces a valid keyword and hence expand_keyword() would not expand keywords on this line.
Tweak our test data to cover the above example, and fix the regular expression such that the trailing $-sign is required. With this change in place we match the
Migrated from D6684 (view on Phabricator)
Merge request reports
Activity
Build is green
Patch application report for D6684 (id=24276)
Could not rebase; Attempt merge onto 1f6580c4...
Updating 1f6580c..00b218b Fast-forward swh/loader/cvs/cvs2gitdump/cvs2gitdump.py | 2 +- swh/loader/cvs/loader.py | 23 ++++++++++++++++++++++- swh/loader/cvs/tests/data/greek-repository4.tgz | Bin 12393 -> 12525 bytes swh/loader/cvs/tests/test_loader.py | 20 ++++++++++---------- 4 files changed, 33 insertions(+), 12 deletions(-)
Changes applied before test
commit 00b218b3ef4ac51ec6547ae03dbc339c72db257e Author: Stefan Sperling <stsp@stsp.name> Date: Wed Nov 24 12:13:11 2021 +0100 fix regular expression used for matching RCS keywords The function RcsKeywords.expand_keyword() uses a regular expression to detect keywords in lines of a file. This expression treated the closing $-sign of the keyword as optional. This fails when a line contains an invalid keyword declaration which lacks the trailing $-sign, followed by a valid keyword declaration. For example, consider this line from a file in the ccvs CVS repository: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $Id$" Here, the CVS server expands the $Id$ keyword, and it does not expand the $Name$ keyword (which lacks a trailing $-sign). Our regular expression would split this line into two match groups: 1: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $ 2: Id$ Neither match produces a valid keyword and hence expand_keyword() would not expand keywords on this line. Tweak our test data to cover the above example, and fix the regular expression such that the trailing $-sign is required. With this change in place we match the $Id$ part of the above example and expand keywords on this line just like the CVS server does. commit 5539ccb67b2a3398019eff571f633a227a498fcd Author: Stefan Sperling <stsp@stsp.name> Date: Tue Nov 23 15:16:11 2021 +0100 attempt to avoid content differences due to paths in keywords Some RCS keywords, such has "Header", contain absolute file paths derived from the on-disk filesystem path of the CVS repository. When we fetch files over the pserver protocol such keywords are expanded by the CVS server. But when using the rsync protocol we will first copy the CVS repository to local disk and the path to this local copy will correspond to some temporary directory. Try to avoid file content differences between pserver and rsync access methods by deriving a likely server-side path from path information found in the rsync:// origin URL. This will work as expected as long as the CVS server-side setup exposes the same path to the CVS repository over both access methods, which is the case for GNU savannah for example. In general, we should recommend treating pserver and rsync as distinct origins and not rely on them to be interchangable and always produce the same conversion result. But we can still try our best to avoid needless differences in content hashes.
See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/66/ for more details.
mentioned in issue #3691