  1. May 21, 2021
  2. May 20, 2021
  3. May 19, 2021
      Replace old loader with the new one · 92441e90
      Raphaël Gomès authored
      This is the minimal amount of code needed to switch from the old one to
      the new one. If the new loader proves to be good enough, we may remove
      the old one entirely.
      92441e90
  4. May 07, 2021
      Make the Mercurial loader incremental · 4630de8a
      Raphaël Gomès authored
      Before this change, if a previous snapshot of a given origin existed
      and new changes were detected, we would start from scratch.
      
      This change leverages the recently added db mapping from external ids (like
      Mercurial's node ids) to internal SWH ids to compute what has changed
      from the latest snapshot, now that it is possible to find an SWH id from
      a Mercurial node id.
      
      For revisions, the logic is simple: look at the heads we've saved and
      ask Mercurial for all the revisions that are not ancestors of these
      heads (themselves excluded). This is not as "clever" as the full
      Mercurial discovery algorithm, but is much simpler and good enough for
      the kinds of scales we're operating at on a single repository.
      
      For tags, the previous logic assumed that all possible target revisions
      were done in the same run. Here, we look at the difference between the
      tags Mercurial reports and the ones from the previous snapshot; any new
      tag will either have its corresponding release in cache (because it was
      processed in the same run) or have it fetched from the database using
      the aforementioned mapping.
      4630de8a
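The incremental logic described in this commit can be illustrated with a minimal sketch. It does not use the loader's actual code or Mercurial's API: the repository is modelled as a plain `parents` dictionary, and the helper names (`ancestors_inclusive`, `new_revisions`, `new_tags`) are hypothetical. Treating a tag whose target changed as "new" is also an assumption of this sketch.

```python
from typing import Dict, Iterable, List, Set


def ancestors_inclusive(parents: Dict[str, List[str]], heads: Iterable[str]) -> Set[str]:
    """Return the given heads plus all of their ancestors (toy DAG walk)."""
    seen: Set[str] = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def new_revisions(parents: Dict[str, List[str]], saved_heads: Iterable[str]) -> List[str]:
    """Revisions that are neither a saved head nor an ancestor of one."""
    known = ancestors_inclusive(parents, saved_heads)
    return [node for node in parents if node not in known]


def new_tags(current: Dict[str, str], previous: Dict[str, str]) -> Dict[str, str]:
    """Tags absent from the previous snapshot, or pointing at a different node."""
    return {name: node for name, node in current.items() if previous.get(name) != node}


# Toy history: a <- b <- c (saved head), and b <- d <- e (new work).
toy_parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}
to_load = new_revisions(toy_parents, saved_heads=["c"])
```

In the real loader the set of known nodes would come from the previous snapshot and the external-id mapping; here it is simply recomputed from the toy graph.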
      Move `os.environ` manipulation to pre_cleanup · 773d872a
      Raphaël Gomès authored
      Simply initializing a loader would empty the environment, which can
      cause seemingly unrelated things to break. Moving the environment
      handling to the `pre_cleanup` phase ensures that `cleanup` will also
      be called and the environment will not be left in a broken state.
      
      We also add the `HGRCSKIPREPO` variable that I forgot to add in the
      test environment. This is still needed because the tests invoke
      `hg` directly. We could potentially have a wrapper util that uses a
      context-manager to do the environment manipulation closer to the issue,
      but we'd have to make sure that no other bare `hg` invocations can
      happen, even in random subprocesses.
      773d872a
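The wrapper util mentioned in this commit as a possible alternative could look roughly like the sketch below. It is not part of the loader: `temporary_environ` is a hypothetical name, and the sketch overrides variables rather than emptying the environment, so that `PATH` and similar variables stay available.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_environ(overrides: Dict[str, str]) -> Iterator[None]:
    """Apply `overrides` to os.environ and restore the original on exit."""
    saved = dict(os.environ)
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Restore even if the body raised, so the process environment
        # is never left in a modified state.
        os.environ.clear()
        os.environ.update(saved)
```

Any `hg` invocation made inside `with temporary_environ({"HGRCPATH": "", "HGRCSKIPREPO": "1"}):` would see the overridden variables, and the original environment comes back afterwards. As the commit message notes, this only helps if no bare `hg` invocation happens outside such a block.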
  5. Apr 30, 2021
      Handle more cases of corruption · 88847148
      Raphaël Gomès authored
      Some corrupted repos have missing files or broken logical links in the
      underlying Mercurial data structure, which means that loading can
      sometimes fail for a given revision. This does not mean we should throw
      away the rest of the repository. (Tested on repos of various levels and
      flavors of corruption in the Boatbucket archive.)
      88847148
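The idea of keeping the healthy part of a repository can be sketched as a loop that isolates failures per revision. This is an illustration only: `CorruptedRevision`, `load_all` and the `load_one` callback are hypothetical names, not the loader's real exception types or functions.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)


class CorruptedRevision(Exception):
    """Hypothetical error raised when a single revision cannot be read."""


def load_all(
    revisions: Iterable[str], load_one: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Load each revision, skipping (and recording) the corrupted ones."""
    loaded: List[str] = []
    skipped: List[str] = []
    for rev in revisions:
        try:
            load_one(rev)
        except CorruptedRevision as exc:
            # One bad revision should not discard the rest of the repository.
            logger.warning("Skipping corrupted revision %s: %s", rev, exc)
            skipped.append(rev)
        else:
            loaded.append(rev)
    return loaded, skipped
```

Catching a narrow exception type, rather than every `Exception`, keeps genuine bugs visible while still tolerating known corruption.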
      Ignore the repository's config · f73d960b
      Raphaël Gomès authored
      `HGRCPATH` only tells Mercurial to ignore the user's config files, but
      some repositories have a `.hg/hgrc` file, usually used for server-side
      configuration (it is only carried over if you copy the files instead of
      cloning). We want to ignore it, since it might affect loading, for
      example by asking for hooks that are not there or are otherwise
      annoying or dangerous.
      f73d960b
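As a rough illustration of the two variables discussed here, the sketch below builds a small environment for a single `hg` subprocess. The helper names `minimal_hg_env` and `run_hg` are hypothetical, and keeping `PATH` is an assumption made so that the `hg` executable can still be found.

```python
import os
import subprocess
from typing import Dict, List


def minimal_hg_env() -> Dict[str, str]:
    """Environment that asks Mercurial to skip user and repository config."""
    env = {
        "HGRCPATH": "",       # empty: do not read the user's config files
        "HGRCSKIPREPO": "1",  # set: do not read the repository's .hg/hgrc
    }
    if "PATH" in os.environ:
        env["PATH"] = os.environ["PATH"]
    return env


def run_hg(args: List[str], cwd: str) -> str:
    """Run `hg` with the minimal environment and return its stdout."""
    result = subprocess.run(
        ["hg", *args],
        cwd=cwd,
        env=minimal_hg_env(),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing `env=` to the subprocess leaves the parent process environment untouched, which avoids the kind of side effect described in the `pre_cleanup` commit.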
      Also use minimal env in the new Mercurial loader · aa80a360
      Raphaël Gomès authored
      The old loader (bundle2 loader) already received this treatment which
      ensures Mercurial doesn't pick up on any user customization, but I
      apparently forgot to apply the same changes to the new one.
      aa80a360
      Use billiard instead of stdlib multiprocessing · 23260277
      Raphaël Gomès authored
      This circumvents a few celery-related issues, and is consistent with
      what the rest of the codebase does.
      
      stdlib multiprocessing is not able to spawn children from daemonic
      processes, and even says so plainly if you try:
      
      `AssertionError: daemonic processes are not allowed to have children`
      
      This is incompatible with the SWH infrastructure, which needs to do
      exactly this. Fortunately, we're already using billiard and celery. I'm
      assuming that there could be other blocking or annoying differences
      between stdlib and billiard, but we will save ourselves the trouble of
      finding out.
      23260277
  6. Apr 26, 2021
      tox: Add sphinx environments to check sane doc build · 504ee123
      Antoine Lambert authored
      Allow checking that the package documentation can be built without
      producing sphinx warnings.
      
      The sphinx environment is designed to be used in continuous integration
      in order to prevent breaking documentation build when committing changes.
      
      The sphinx-dev environment is designed to be used inside a full swh
      development environment.
      
      Related to T3258
      v0.5.0
      504ee123
  7. Apr 13, 2021
  8. Apr 06, 2021
  9. Mar 30, 2021
  10. Mar 29, 2021
  11. Mar 05, 2021
  12. Feb 26, 2021
  13. Feb 25, 2021
      mercurial.cli: Deprecate cli in favor of the generic `swh loader run` · d8572187
      Antoine R. Dumont authored
      This module is:
      - not referenced in the setup.py (so no simple cli call)
      - not tested
      - using an in-memory storage
      - no longer the canonical way of triggering a mercurial ingestion
      
      Note that this also fixes the visit date cli input. When provided by the
      user, this date should be parsed as a datetime before being passed to the
      loader constructor.
      d8572187
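The visit date fix can be illustrated with a small sketch that turns the string received from the command line into a `datetime` before it reaches a loader. `parse_visit_date` is a hypothetical helper; using the standard library's `datetime.fromisoformat` and defaulting naive dates to UTC are assumptions of this sketch, not necessarily what the cli did.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_visit_date(raw: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 visit date string, or pass None through."""
    if raw is None:
        return None
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: dates without an offset are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Doing the conversion at the cli boundary means the loader constructor only ever sees a `datetime` (or `None`), never a raw string.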
  14. Feb 23, 2021
  15. Feb 17, 2021
  16. Feb 15, 2021
  17. Feb 09, 2021
  18. Feb 03, 2021
  19. Feb 01, 2021
  20. Dec 01, 2020