Skip to content
Snippets Groups Projects

fix regular expression used for matching RCS keywords

The function RcsKeywords.expand_keyword() uses a regular expression to detect keywords in lines of a file. This expression treated the closing $-sign of the keyword as optional. This fails when a line contains an invalid keyword declaration which lacks the trailing $-sign, followed by a valid keyword declaration.

For example, consider this line from a file in the ccvs CVS repository:

#ident "@(#)cvs/contrib/pcl-cvs:$Name:

IdId
"

Here, the CVS server expands the

IdId
keyword, and it does not expand the
NameName
keyword (which lacks a trailing $-sign).

Our regular expression would split this line into two match groups:

1: #ident "@(#)cvs/contrib/pcl-cvs:$Name: $ 2: Id$

Neither match produces a valid keyword and hence expand_keyword() would not expand keywords on this line.

Tweak our test data to cover the above example, and fix the regular expression such that the trailing $-sign is required. With this change in place we match the

IdId
part of the above example and expand keywords on this line just like the CVS server does.


Migrated from D6684 (view on Phabricator)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Build is green

    Patch application report for D6684 (id=24276)

    Could not rebase; Attempt merge onto 1f6580c4...

    Updating 1f6580c..00b218b
    Fast-forward
     swh/loader/cvs/cvs2gitdump/cvs2gitdump.py       |   2 +-
     swh/loader/cvs/loader.py                        |  23 ++++++++++++++++++++++-
     swh/loader/cvs/tests/data/greek-repository4.tgz | Bin 12393 -> 12525 bytes
     swh/loader/cvs/tests/test_loader.py             |  20 ++++++++++----------
     4 files changed, 33 insertions(+), 12 deletions(-)
    Changes applied before test
    commit 00b218b3ef4ac51ec6547ae03dbc339c72db257e
    Author: Stefan Sperling <stsp@stsp.name>
    Date:   Wed Nov 24 12:13:11 2021 +0100
    
        fix regular expression used for matching RCS keywords
        
        The function RcsKeywords.expand_keyword() uses a regular expression
        to detect keywords in lines of a file. This expression treated the
        closing $-sign of the keyword as optional. This fails when a line
        contains an invalid keyword declaration which lacks the trailing
        $-sign, followed by a valid keyword declaration.
        
        For example, consider this line from a file in the ccvs CVS repository:
        
          #ident        "@(#)cvs/contrib/pcl-cvs:$Name:  $Id$"
        
        Here, the CVS server expands the $Id$ keyword, and it does not expand
        the $Name$ keyword (which lacks a trailing $-sign).
        
        Our regular expression would split this line into two match groups:
        
          1: #ident     "@(#)cvs/contrib/pcl-cvs:$Name:  $
          2: Id$
        
        Neither match produces a valid keyword and hence expand_keyword()
        would not expand keywords on this line.
        
        Tweak our test data to cover the above example, and fix the regular
        expression such that the trailing $-sign is required. With this change
        in place we match the $Id$ part of the above example and expand keywords
        on this line just like the CVS server does.
    
    commit 5539ccb67b2a3398019eff571f633a227a498fcd
    Author: Stefan Sperling <stsp@stsp.name>
    Date:   Tue Nov 23 15:16:11 2021 +0100
    
        attempt to avoid content differences due to paths in keywords
        
        Some RCS keywords, such has "Header", contain absolute file paths
        derived from the on-disk filesystem path of the CVS repository.
        
        When we fetch files over the pserver protocol such keywords are
        expanded by the CVS server. But when using the rsync protocol we
        will first copy the CVS repository to local disk and the path to
        this local copy will correspond to some temporary directory.
        
        Try to avoid file content differences between pserver and rsync
        access methods by deriving a likely server-side path from path
        information found in the rsync:// origin URL.
        This will work as expected as long as the CVS server-side setup
        exposes the same path to the CVS repository over both access
        methods, which is the case for GNU savannah for example.
        
        In general, we should recommend treating pserver and rsync as distinct
        origins and not rely on them to be interchangable and always produce
        the same conversion result. But we can still try our best to avoid
        needless differences in content hashes.

    See https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/66/ for more details.

  • This patch is wrong. Keywords stored in already-expanded form in the repository are no longer being expanded. We need a different fix for this issue.

  • Merge request was abandoned

  • mentioned in issue #3691

Please register or sign in to reply
Loading