Forked from Platform / Development / swh-model
244 commits behind the upstream repository.
    SWHID parsing: simplify and deduplicate validation logic · 57468505
    Stefano Zacchiroli authored
    Before this change there was a lot of overlap between parse_swhid() and the
    attrs-based validators in the SWHID class. Also, the validation implementation
    in parse_swhid() was done by hand.
    
    With this change the coarse-grained validation done by parse_swhid() is now
    delegated to a regex. The semantic validation of SWHIDs is left to attrs
    validators. The regex is also exposed as a module attribute, to be used by
    client code that wants to syntactically validate SWHIDs without necessarily
    instantiating SWHID objects (we have several other modules doing that already,
    and they are using slightly different hand-made regexes, which isn't great).
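
A minimal sketch of what regex-based syntactic validation could look like. The
pattern and the names SWHID_SKETCH_RE and looks_like_swhid are hypothetical and
are not the ones defined in identifiers.py; the sketch only assumes the public
SWHID layout "swh:<version>:<type>:<hex id>" with optional ";"-separated
qualifiers.

```python
import re

# Hypothetical coarse-grained pattern (an assumption for illustration only).
# It checks the overall shape and deliberately leaves semantic checks
# (known version, known object type, id length) to a later step.
SWHID_SKETCH_RE = re.compile(
    r"swh:(?P<version>\d+):(?P<object_type>[a-z]+):(?P<object_id>[0-9a-f]+)"
    r"(?:;(?P<qualifiers>\S+))?"
)


def looks_like_swhid(text: str) -> bool:
    """Return True if text has the coarse shape of a SWHID."""
    return SWHID_SKETCH_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(looks_like_swhid("swh:1:cnt:" + "a" * 40))
    print(looks_like_swhid("not-a-swhid"))
```

Note that a string such as "swh:2:foo:abc" passes this coarse check even though
it is not a meaningful SWHID; rejecting it is the job of the semantic
validators.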
    
    As part of this change we also clean up the use of ValidationError exceptions,
    systematically passing the problematic parts of the SWHID as arguments and
    making error messages uniform.
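
As an illustration of passing the offending part along with a uniform message,
here is a sketch using plain functions. SketchValidationError, KNOWN_TYPES and
check_core_fields are hypothetical names; the real code uses the project's own
ValidationError and attrs validators, whose exact signatures are not shown
here.

```python
class SketchValidationError(ValueError):
    """Hypothetical stand-in for the project's ValidationError."""


# Object types defined by the SWHID scheme.
KNOWN_TYPES = {"snp", "rel", "rev", "dir", "cnt"}


def check_core_fields(version: int, object_type: str, object_id: str) -> None:
    """Semantic checks; each error carries the problematic value as an argument."""
    if version != 1:
        raise SketchValidationError("Invalid SWHID: invalid version", version)
    if object_type not in KNOWN_TYPES:
        raise SketchValidationError("Invalid SWHID: invalid type", object_type)
    if len(object_id) != 40 or any(c not in "0123456789abcdef" for c in object_id):
        raise SketchValidationError("Invalid SWHID: invalid checksum", object_id)


if __name__ == "__main__":
    try:
        check_core_fields(1, "foo", "a" * 40)
    except SketchValidationError as exc:
        # exc.args holds the uniform message and the offending part.
        print(exc.args)
```

Keeping the offending value as a separate argument lets callers inspect it
without parsing the message text.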
    
    This change also brings a modest speedup in SWHID parsing. On a benchmark
    parsing ~30 M valid SWHIDs, the previous implementation took ~3:06 (min:sec)
    and the new one ~2:50, a ~9% speedup.
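
The arithmetic behind the ~9% figure can be checked from the two rounded
timings quoted above. The variable names are for illustration only, and the
inputs are the approximate values from this message, not fresh measurements.

```python
# Timings quoted in the commit message, converted to seconds.
old_seconds = 3 * 60 + 6    # ~3:06
new_seconds = 2 * 60 + 50   # ~2:50

# Relative reduction in wall-clock time.
time_saved = (old_seconds - new_seconds) / old_seconds

# Relative gain in throughput (SWHIDs parsed per second).
throughput_gain = old_seconds / new_seconds - 1

print(f"time saved: {time_saved:.1%}")
print(f"throughput gain: {throughput_gain:.1%}")
```

Both ways of expressing the change round to roughly 9%.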
    
    Closes T2788
identifiers.py 27.70 KiB