
License detection: adapt ScanCode to scale to the SWH archive

Main steps

  • Create framework for scanning
    • Run test scans at scale to calibrate scan durations and to estimate the compute and storage resources required to scan the whole SWH archive
    • Create ScanCode plugins: an output plugin that writes minimal results tailored for reduced space usage, and an input plugin that combines these results into a coherent scan for a whole package
  • Run massive scan
    • Complete an initial test scan of a batch of ~100 to ~500K files and fix the issues found
    • Apply code updates and adjustments based on the initial scan batch
    • Run reassembly of per-package scans at scale (using swh-fuse), apply consistency checks, and update the code accordingly
  • Technical support to SWH team
    • Support the SWH team for bulk ScanCode scans in multiple batches
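
As an illustration of the calibration step above, the sketch below extrapolates compute time and results storage from one test scan to a larger file count. It is only a back-of-the-envelope model, not the project's actual method: it assumes cost scales linearly with the number of files and that work parallelizes perfectly across workers. The names `SampleScan` and `extrapolate`, and all the numbers, are hypothetical placeholders rather than measured values.

```python
from dataclasses import dataclass


@dataclass
class SampleScan:
    """Measurements from one test scan (hypothetical structure)."""
    files_scanned: int
    wall_seconds: float  # elapsed time for the sample on a single worker
    output_bytes: int    # size of the scan results written for the sample


def extrapolate(sample: SampleScan, total_files: int, workers: int) -> dict:
    """Linearly extrapolate sample costs to total_files.

    Assumes a constant per-file cost and perfect parallel speed-up,
    so the result is a rough first estimate only.
    """
    per_file_seconds = sample.wall_seconds / sample.files_scanned
    per_file_bytes = sample.output_bytes / sample.files_scanned
    cpu_hours = per_file_seconds * total_files / 3600
    return {
        "cpu_hours": cpu_hours,
        "wall_hours": cpu_hours / workers,
        "storage_gib": per_file_bytes * total_files / 2**30,
    }


# Placeholder numbers for illustration only, not real measurements.
sample = SampleScan(files_scanned=1000, wall_seconds=500.0, output_bytes=2_000_000)
estimate = extrapolate(sample, total_files=1_000_000, workers=10)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

Real scans are unlikely to be this uniform, since per-file cost varies with file size and type; running several test batches and comparing their per-file figures would show how far the linear assumption can be trusted.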

Actors

AboutCode

Specifications

Specifications and documentation related to ScanCode for CodeCommons are centralized in this directory:

https://gitlab.softwareheritage.org/teams/codecommons/cc-public-resources/-/tree/main/specifications/scancode?ref_type=heads

Edited by Benoit Chauvet