  1. Nov 14, 2023
    • Replace the `dir_filter` with a `path_filter` in `Directory` · 1286c8a4
      Raphaël Gomès authored
      `dir_filter` only filters directories. `swh-scanner` needs to
      accurately filter out ignored files before making expensive requests
      to the web API. We introduce a more general `path_filter` that allows
      us to differentiate between files and folders.
      
      `dir_filter` is now deprecated and will be removed once the remaining
      users in other packages are migrated over to the new API.
      
      `accept_all_directories` is also deprecated, because it only implies
      accepting *directories* even though its behavior also accepts
      non-directory entries when used with `path_filter`.
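The difference between a directory-only filter and a path filter can be sketched with the standard library alone. The `walk_filtered` helper and the `(path, is_dir)` filter signature below are hypothetical and are not the `swh.model` API; they only illustrate a single callback that can reject files as well as directories.

```python
import os
from typing import Callable, Iterator

# Hypothetical filter signature: receives a path and whether it is a directory.
PathFilter = Callable[[str, bool], bool]


def walk_filtered(root: str, path_filter: PathFilter) -> Iterator[str]:
    """Yield paths under `root` accepted by `path_filter`.

    Rejected directories are pruned, so nothing below them is visited.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune rejected directories in place so os.walk does not descend.
        dirnames[:] = sorted(
            d for d in dirnames if path_filter(os.path.join(dirpath, d), True)
        )
        for d in dirnames:
            yield os.path.join(dirpath, d)
        for f in sorted(filenames):
            full = os.path.join(dirpath, f)
            if path_filter(full, False):
                yield full


def ignore_git_and_pyc(path: str, is_dir: bool) -> bool:
    """Example filter: skip `.git` directories and `*.pyc` files."""
    name = os.path.basename(path)
    if is_dir:
        return name != ".git"
    return not name.endswith(".pyc")
```

A directory-only filter could express the `.git` rule but not the `*.pyc` rule, which is the gap the commit message describes.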
      1286c8a4
  2. Sep 25, 2023
  3. Aug 29, 2023
  4. Aug 21, 2023
  5. Jul 12, 2023
  6. Jun 14, 2023
  7. Mar 16, 2023
    • Add several helper methods returning SWHIDs · 48a46285
      Jérémy Bobbio (Lunar) authored
      This adds several helper methods returning SWHIDs to model objects,
      namely:
      
      - SkippedContent.swhid()
      - DirectoryEntry.swhid()
      - SnapshotBranch.swhid()
      - Release.target_swhid()
      - Revision.directory_swhid() and Revision.parent_swhids()
      - OriginVisitStatus.origin_swhid() and
        OriginVisitStatus.snapshot_swhid()
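As a rough illustration of what such a helper saves callers from doing by hand, here is a minimal sketch. The `Entry` class and `format_swhid` function are hypothetical, and they return plain strings rather than the SWHID objects used by the model; only the core `swh:1:<tag>:<hex>` layout is taken from the SWHID format.

```python
from dataclasses import dataclass

# Core SWHID object types and their three-letter tags.
_TAGS = {
    "content": "cnt",
    "directory": "dir",
    "revision": "rev",
    "release": "rel",
    "snapshot": "snp",
}


def format_swhid(object_type: str, object_id: bytes) -> str:
    """Format a core SWHID: `swh:1:<tag>:<40 hex digits>`."""
    if len(object_id) != 20:
        raise ValueError("expected a 20-byte SHA-1 identifier")
    return f"swh:1:{_TAGS[object_type]}:{object_id.hex()}"


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a model object that points at a target."""

    target_type: str
    target: bytes

    def swhid(self) -> str:
        # The helper spares callers from assembling the identifier themselves.
        return format_swhid(self.target_type, self.target)
```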
      v6.7.0
      48a46285
  8. Feb 17, 2023
  9. Feb 16, 2023
  10. Feb 13, 2023
    • collections: Improve ImmutableDict look up by key performance · d6d17dad
      Antoine Lambert authored
      Previously, looking up data by key in an ImmutableDict iterated over
      the inner tuple storing keys and values until the requested key was
      found.
      
      This is inefficient when the ImmutableDict contains many entries,
      typically for an origin snapshot with a lot of branches.
      
      So use an inner dictionary to speed up lookups by key and improve
      loader performance.
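The approach described above can be sketched as follows. `TupleBackedDict` is a hypothetical, simplified class, not the actual `ImmutableDict` implementation: it keeps the tuple of pairs as the immutable, hashable payload and adds a private dict used only for lookups.

```python
from typing import Dict, Iterator, Mapping, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class TupleBackedDict(Mapping[K, V]):
    """Hypothetical immutable mapping: a tuple of pairs plus a lookup index."""

    def __init__(self, items: Mapping[K, V]) -> None:
        # The tuple keeps the data immutable and hashable.
        self._data: Tuple[Tuple[K, V], ...] = tuple(items.items())
        # The inner dict gives O(1) average lookup instead of an O(n) scan.
        self._index: Dict[K, V] = dict(self._data)

    def __getitem__(self, key: K) -> V:
        return self._index[key]

    def __iter__(self) -> Iterator[K]:
        return (k for k, _ in self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __hash__(self) -> int:
        # Requires hashable values, as with any tuple of pairs.
        return hash(self._data)
```

The cost is a second copy of the references in memory; for a snapshot with many branches that trade is usually worth it, since every lookup no longer scans the tuple.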
      v6.6.2
      d6d17dad
  11. Feb 02, 2023
  12. Dec 19, 2022
  13. Dec 15, 2022
    • docs/persistent-identifiers: Fix some broken links for browsing SWHIDs · f883e224
      Antoine Lambert authored
      Two issues prevented browsing some of the SWHIDs given as examples in
      that documentation:
      
      - Some sphinx links were broken in rDMODe1c3fe80731226618616117dfd67a95f3d365645
      
      - A SWHID with ';' in its path qualifier was correctly percent-escaped,
        but when it is used as a URL argument an extra level of percent
        escaping is required: the HTTP server unescapes URL arguments and
        would otherwise break the SWHID percent escaping.
      
      Closes T4721
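The second issue can be reproduced with `urllib.parse`. The path and the all-zero object id below are made up for illustration; the point is only that a value which is already percent-escaped needs a second round of escaping before it is placed in a URL.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing ';', which also separates SWHID qualifiers.
path = "/src/a;b.c"

# First escaping: required inside the SWHID itself.
escaped_once = quote(path, safe="/")
swhid = "swh:1:cnt:" + "0" * 40 + ";path=" + escaped_once

# Second escaping: protects the '%' signs when the SWHID travels in a URL.
url_argument = quote(swhid, safe="")

# A server that unescapes the argument once recovers the valid SWHID.
recovered = unquote(url_argument)

# Without the second escaping, the same unescaping step exposes the raw ';'
# and the path can no longer be told apart from a qualifier separator.
broken = unquote(swhid)
```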
      v6.6.1
      f883e224
  14. Dec 05, 2022
  15. Oct 18, 2022
  16. Oct 17, 2022
    • model: Fix hypothesis integration with attr < 21.3.0 · dd3bab81
      Antoine Lambert authored
      When using attr < 21.3.0, adding a field transformer breaks the attrs
      integration with hypothesis, because attributes transformed with
      such a function are not cast to the generated AttrsClass but remain
      a plain list of attributes. This causes an error in hypothesis, which
      raises an AttributeError.
      
      As we use attr 21.2.0 in production and when building the Debian buster
      package, add a workaround for that issue as explained here:
      https://github.com/python-attrs/attrs/issues/821.
      v6.6.0
      dd3bab81
    • merkle: Make MerkleNode.collect return a set of nodes instead of a dict · 13e7adc3
      Antoine Lambert authored
      Previously the MerkleNode.collect method returned a dict whose keys
      are node types and whose values are dicts of {<node_hash>: <node_data>}.
      
      In order to give more flexibility to client code for the processing of
      collected nodes, prefer to simply return a set of MerkleNode.
      
      As a consequence, MerkleNode objects need to be hashable by Python so
      the __hash__ method has also been implemented.
      
      Closes T4633
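A minimal sketch of the new shape follows. `Node` is a hypothetical, heavily simplified stand-in (its hash is passed in rather than computed from its children), so this shows the set-returning `collect` and the matching `__hash__`, not the actual `MerkleNode` code.

```python
from typing import Dict, Set


class Node:
    """Hypothetical, simplified stand-in for a Merkle node."""

    def __init__(self, object_type: str, node_hash: bytes) -> None:
        self.object_type = object_type
        self.hash = node_hash
        self.children: Dict[bytes, "Node"] = {}
        self.collected = False

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Node) and (
            (self.object_type, self.hash) == (other.object_type, other.hash)
        )

    def __hash__(self) -> int:
        # Defining __eq__ disables the default __hash__, so define one that
        # agrees with it; this is what lets nodes live in a set.
        return hash((self.object_type, self.hash))

    def collect(self) -> Set["Node"]:
        """Return every not-yet-collected node in this subtree as a flat set."""
        if self.collected:
            return set()
        self.collected = True
        result = {self}
        for child in self.children.values():
            result |= child.collect()
        return result
```

Client code that still wants the old grouping can rebuild it from the set, for example by bucketing nodes on `object_type`.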
      13e7adc3
  17. Sep 30, 2022
  18. Sep 29, 2022
    • from_disks: fix some of the pattern checking logic · 6a38c4ad
      Pierre-Yves David authored
      The patterns were validated from $PWD and later applied to paths
      relative to `root_path`. So we shuffle a bit of code to test them
      against `root_path`. We make the absolute patterns relative in the
      same go.
      
      This code comes from swh-scanner and should probably get an
      overhaul; however, for now we start by making it not broken.
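The fix amounts to resolving patterns against `root_path` instead of the current directory. The `relativize_patterns` function below is a hypothetical sketch of that idea using `os.path` only, and assumes POSIX-style paths; it is not the code from the changeset.

```python
import os
from typing import List


def relativize_patterns(root_path: str, patterns: List[str]) -> List[str]:
    """Express every pattern relative to `root_path`.

    Absolute patterns must live under `root_path`; relative ones are taken
    to be relative to `root_path` already, never to the current directory.
    """
    root = os.path.abspath(root_path)
    result = []
    for pattern in patterns:
        if os.path.isabs(pattern):
            # commonpath checks containment without touching the filesystem.
            if os.path.commonpath([root, pattern]) != root:
                raise ValueError(f"{pattern!r} is outside {root!r}")
            pattern = os.path.relpath(pattern, root)
        result.append(pattern)
    return result
```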
      6a38c4ad
  19. Sep 23, 2022
    • model: inline the call to `_check_swhid` · 2d65a24a
      Pierre-Yves David authored
      This reduces the number of function calls and should be faster.
      
      The mashup of blind optimisations in the previous changesets yields
      some interesting results in total.
      
      It would be insightful to measure them individually, but that would
      take more time than we currently have.
      
      When testing all the validator changes on our previous "benchmark" we
      see quite an interesting improvement.
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      base:   17 minutes 48 seconds
      before: 11 minutes 50 seconds
      after:  11 minutes 11 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      base:   43%
      before: 15%
      after:  11%
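To put the median timings quoted above in relative terms, the arithmetic can be written out as a short script. The figures are copied from the commit message; the helper names are only for illustration.

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + secs


def reduction(old: int, new: int) -> float:
    """Relative wall-clock reduction, in percent."""
    return 100 * (old - new) / old


base = seconds(17, 48)    # 1068 s
before = seconds(11, 50)  # 710 s
after = seconds(11, 11)   # 671 s

print(f"validator changes: {reduction(before, after):.1f}%")  # 5.5%
print(f"base to after:     {reduction(base, after):.1f}%")    # 37.2%
```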
      v6.5.0
      2d65a24a
    • model: optimization pass on custom validator · 3608271a
      Pierre-Yves David authored
      (This commit is actually doing two things /o\)
      
      - we inline the type-checking in the custom validators to reduce the
        number of function calls.
      
      - we optimize some of the custom validators by skipping the creation of
        intermediate tuples.
      3608271a
    • model: delete unused validator code · 3796e5ba
      Pierre-Yves David authored
      Since all `generic_type_validator` calls are optimized away, the code
      will no longer be called. So we remove that code to avoid any drift.
      
      A nice "exception" is provided in case this starts getting called
      again in the future.
      3796e5ba
    • model: remove the try/except · b7267a89
      Pierre-Yves David authored
      Since try/except contexts are known to be expensive in Python, it seems
      useful to remove them.
      b7267a89
    • model: also optimize combined validator · cf529cd1
      Pierre-Yves David authored
      This ensures we don't have any remaining `generic_type_validator` calls
      that have not been optimized away.
      cf529cd1
    • model: drop the `type_validator()` indirection · 6ababdeb
      Pierre-Yves David authored
      This indirection seems useless and is probably the remains of some long
      forgotten rituals.
      6ababdeb
    • model: implement specialized attribute-validator functions · edb57fb1
      Pierre-Yves David authored
      This should reduce function calls and speed things up.
      
      It might be useful to introduce even more specialized validators in
      the future. It would also be useful to skip the intermediate try/except.
      
      Some of this will be done in later changesets.
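The general idea of swapping a generic validator for specialized ones might look like the sketch below. Every name here is hypothetical, and the functions only mimic the `(instance, attribute, value)` validator calling convention; this is not the code from the changeset.

```python
from typing import Any, Callable

Validator = Callable[[Any, Any, Any], None]


def generic_validator(expected: type) -> Validator:
    """Slow path: one closure that handles any class via isinstance."""

    def validate(instance: Any, attribute: Any, value: Any) -> None:
        if not isinstance(value, expected):
            raise TypeError(f"{attribute}: expected {expected}, got {type(value)}")

    return validate


def validate_bytes(instance: Any, attribute: Any, value: Any) -> None:
    """Fast path: the check is written out, with no helper call."""
    if type(value) is not bytes:
        raise TypeError(f"{attribute}: expected bytes, got {type(value)}")


def validate_int(instance: Any, attribute: Any, value: Any) -> None:
    # An exact type check is stricter than isinstance: it rejects bool.
    if type(value) is not int:
        raise TypeError(f"{attribute}: expected int, got {type(value)}")


# Hypothetical dispatch table from declared types to specialized validators.
_SPECIALIZED = {bytes: validate_bytes, int: validate_int}


def pick_validator(declared_type: type) -> Validator:
    """Prefer a specialized validator, falling back to the generic one."""
    return _SPECIALIZED.get(declared_type) or generic_validator(declared_type)
```

The selection happens once, when the class is defined, so the per-instance cost is a single direct call.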
      edb57fb1
    • model: prepare the filtering of type_validator into something faster · 1dfea324
      Pierre-Yves David authored
      This currently does nothing, but prepares for actually changing the
      generic validator into faster specialized variants.
      1dfea324
    • from_disk: skip intermediate dictionary creation when building model · a2e8f18c
      Pierre-Yves David authored
      Before this change we would do the following:
      
      1) translate from_disk's objects into `dict`s,
      2) sort these dicts,
      3) feed the list to `Directory.from_dict`,
      4) create DirectoryEntry objects from these dicts.
      
      Skipping the dictionary creation and directly creating the
      DirectoryEntries provides us with a small but stable and noticeable
      performance win.
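The two code shapes can be contrasted with a small sketch. `Entry` is a hypothetical stand-in for a directory entry, and sorting by raw name is a simplifying assumption; the actual ordering and classes in `from_disk` may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a directory entry model object."""

    name: bytes
    type: str
    target: bytes
    perms: int

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Entry":
        return cls(**d)


# Raw on-disk data as (name, type, target, perms) tuples.
Raw = Tuple[bytes, str, bytes, int]


def entries_via_dicts(raw: List[Raw]) -> Tuple[Entry, ...]:
    """Old shape: tuple -> dict -> sort -> Entry."""
    dicts = [dict(name=n, type=t, target=h, perms=p) for n, t, h, p in raw]
    dicts.sort(key=lambda d: d["name"])
    return tuple(Entry.from_dict(d) for d in dicts)


def entries_direct(raw: List[Raw]) -> Tuple[Entry, ...]:
    """New shape: sort the raw tuples, then build each Entry exactly once."""
    return tuple(Entry(*item) for item in sorted(raw, key=lambda item: item[0]))
```

Both functions return the same entries; the second simply never allocates the per-entry dict.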
      
      We tested this change on a simple run of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 11 minutes 56 seconds
      after:  11 minutes 50 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 17%
      after:  15%
      a2e8f18c
    • model: avoid another extra creation of Model object · ad3ecac9
      Pierre-Yves David authored
      Do not create model objects while sorting entries before creating the
      model object.
      
      This is another case of "let us create object X to prepare the creation
      of object X", slowing things down.
      
      In practice, we will likely skip this code path after the next
      changeset; however, it seems useful to get this performance footgun
      out of the way.
      
      We tested this change on a simple run of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 12 minutes 59 seconds
      after:  11 minutes 56 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 24%
      after:  17%
      ad3ecac9
    • from_disk: only build a model object once · 814a6c84
      Pierre-Yves David authored
      Before this change, a Directory object was built just to compute the
      `id` that we fed to the Directory object we built for `to_model`.
      
      We tested this change on a simple run of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 17 minutes 48 seconds
      after:  12 minutes 59 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 43%
      after:  24%
      814a6c84
  20. Aug 30, 2022
  21. Aug 08, 2022