Closed
Issue created Jul 19, 2022 by Antoine R. Dumont (@ardumont, Owner). 4 of 5 checklist items completed.

Fill in the gap with scanoss tool

After discussing with the upstream of the scanoss tool, Roberto compiled a list of (GitHub) repositories (large [1] and normal [2]) that we are currently missing. Let's try to ingest those using what we did for the chromium repository [3].

fwiw, we have a huge number of those reported by sentry [6].

Plan:

  • Clean up the large worker17 and worker18 setup and keep them out of the standard consumption loop [4]

  • Schedule large repositories on dedicated queue oneshot:swh.loader.git.tasks.UpdateGitRepository

  • Schedule normal repositories on dedicated queue oneshot2:swh.loader.git.tasks.UpdateGitRepository

  • Keep parallelism low as well (large repo queue: 1, normal repo queue: 5)

  • Babysit processes (grafana dashboard [5])

  • [1] big: ingest.big.list

  • [2] normal: ingest.normal.list

  • [3] #4283 (closed)

  • [4] Recent tryouts on the chromium and liferay-portal repositories failed, possibly because the standard consumption was happening in parallel. If large repositories are consumed at the same time, the machine might become unable to finish both repositories...

  • [5] https://grafana.softwareheritage.org/goto/6HwEWEgVk?orgId=1

  • [6] https://sentry.softwareheritage.org/share/issue/bbcb3aef5b974dac9a3194f7bf8ede87/
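The scheduling part of the plan above can be sketched as follows. This is only an illustration under stated assumptions, not the actual tooling used: it assumes the list files [1] and [2] hold one origin URL per line, and the names `QueuePlan`, `PLANS`, `parse_origin_list` and `build_messages`, as well as the `url` keyword argument, are hypothetical. Only the queue names, the task name and the parallelism values (1 and 5) come from the plan.

```python
from dataclasses import dataclass

# Task name taken from the plan; everything else below is illustrative.
TASK_NAME = "swh.loader.git.tasks.UpdateGitRepository"


@dataclass(frozen=True)
class QueuePlan:
    """Hypothetical description of one dedicated queue."""
    queue: str        # dedicated queue name
    concurrency: int  # how many loaders may consume this queue at once


# Queue names and parallelism as stated in the plan.
PLANS = {
    "big": QueuePlan(queue=f"oneshot:{TASK_NAME}", concurrency=1),
    "normal": QueuePlan(queue=f"oneshot2:{TASK_NAME}", concurrency=5),
}


def parse_origin_list(text):
    """Return unique origin URLs in file order.

    Assumes one URL per line; blank lines and '#' comments are skipped.
    """
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def build_messages(kind, urls):
    """Build one message descriptor per origin for the given queue kind."""
    plan = PLANS[kind]
    return [
        # The 'url' keyword argument is an assumption about the task signature.
        {"queue": plan.queue, "task": TASK_NAME, "kwargs": {"url": url}}
        for url in urls
    ]
```

For example, `build_messages("big", parse_origin_list(open("ingest.big.list").read()))` would produce the descriptors for the large repositories. How they are actually sent to the queues, and how the consumers are started with the concurrency from `PLANS`, is left out on purpose.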


Migrated from T4400 (view on Phabricator)

Edited Oct 18, 2022 by Antoine R. Dumont