  1. Oct 17, 2022
  2. Sep 30, 2022
  3. Sep 29, 2022
    • from_disks: fix some of the pattern checking logic · 6a38c4ad
      Pierre-Yves David authored
      The patterns were validated from $PWD but later applied to paths
      relative to `root_path`. So we shuffle a bit of code to test them
      against `root_path`. We make the absolute patterns relative in the
      same go.
      
      This code comes from swh-scanner and should probably get an
      overhaul; however, for now we start by making it not broken.
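
A minimal sketch of the idea described above, not the actual swh-model code: patterns are interpreted relative to `root_path` instead of the current working directory, and absolute patterns are made relative in the same pass. The helper name `normalize_patterns`, the use of `str` patterns, and the `ValueError` are assumptions made for illustration.

```python
import os
from typing import List


def normalize_patterns(root_path: str, patterns: List[str]) -> List[str]:
    """Return the patterns expressed relative to root_path (hypothetical helper)."""
    root = os.path.abspath(root_path)
    normalized = []
    for pattern in patterns:
        if os.path.isabs(pattern):
            # An absolute pattern only makes sense if it lives under root_path.
            if os.path.commonpath([root, pattern]) != root:
                raise ValueError(f"pattern {pattern!r} is outside {root!r}")
            # Make it relative so it can be matched against relative paths.
            pattern = os.path.relpath(pattern, root)
        normalized.append(pattern)
    return normalized
```

Relative patterns pass through unchanged, so the result can be matched directly against paths that are themselves relative to `root_path`.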
      6a38c4ad
  4. Sep 26, 2022
  5. Sep 23, 2022
    • model: inline the call to `_check_swhid` · 2d65a24a
      Pierre-Yves David authored
      This reduces the number of function calls and should be faster.
      
      The mashup of blind optimisations in the previous changesets yields
      some interesting results in total.
      
      It would be insightful to measure them individually, but that would
      take more time than we currently have.
      
      When testing all the validator changes on our previous "benchmark" we
      see a quite interesting improvement.
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      base:   17 minutes 48 seconds
      before: 11 minutes 50 seconds
      after:  11 minutes 11 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      base:   43%
      before: 15%
      after:  11%
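
To put the median times above in relative terms, here is a small arithmetic sketch. It only reuses the three timings quoted in this message; the helper names are made up for illustration, and "base" is taken to mean the state before the whole series while "before" is the state just before this changeset.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'minutes, seconds' duration to seconds."""
    return minutes * 60 + seconds


# Median times of 3 runs, copied from the commit message.
medians = {
    "base": to_seconds(17, 48),
    "before": to_seconds(11, 50),
    "after": to_seconds(11, 11),
}


def reduction_percent(old: int, new: int) -> float:
    """Percentage of wall-clock time saved when going from old to new."""
    return 100.0 * (old - new) / old


for start in ("base", "before"):
    saved = reduction_percent(medians[start], medians["after"])
    print(f"{start} -> after: {saved:.1f}% less wall-clock time")
```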
      v6.5.0
      2d65a24a
    • model: optimization pass on custom validator · 3608271a
      Pierre-Yves David authored
      (This commit is actually doing two things /o\)
      
      - we inline the type-checking in the custom validators to reduce the
        number of function calls.
      
      - we optimize some of the custom validators by skipping the creation
        of intermediate tuples.
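
The two points above can be illustrated with a hedged sketch. The validator names and the `_is_instance` helper are hypothetical; the `(instance, attribute, value)` signature follows the attrs validator convention, which is assumed to be what the model uses.

```python
def _is_instance(value, expected_type):
    """Tiny helper: costs one extra function call per check."""
    return isinstance(value, expected_type)


def validate_str_tuple_generic(instance, attribute, value):
    """Before: helper calls plus an intermediate tuple of booleans."""
    if not _is_instance(value, tuple):
        raise TypeError(f"{attribute}: expected a tuple, got {value!r}")
    results = tuple(_is_instance(item, str) for item in value)
    if not all(results):
        raise TypeError(f"{attribute}: expected only str items, got {value!r}")


def validate_str_tuple_inlined(instance, attribute, value):
    """After: type checks inlined, no intermediate tuple, early exit."""
    if not isinstance(value, tuple):
        raise TypeError(f"{attribute}: expected a tuple, got {value!r}")
    for item in value:
        if not isinstance(item, str):
            raise TypeError(f"{attribute}: expected only str items, got {value!r}")
```

Both variants accept and reject the same values; the second simply does less work per call.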
      3608271a
    • model: delete unused validator code · 3796e5ba
      Pierre-Yves David authored
      Since all `generic_type_validator` calls are optimized away, this code
      will no longer be called. So we remove it to avoid any drift.
      
      A nice "exception" is provided in case it starts getting called again
      in the future.
      3796e5ba
    • model: remove the try/except · b7267a89
      Pierre-Yves David authored
      Since try/except contexts are known to be expensive in Python, it
      seems useful to remove them.
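
A hedged way to check this kind of claim on a given interpreter is a micro-benchmark such as the one below. The `ValidationError` class and both validators are hypothetical stand-ins, not the swh-model code, and the measured difference depends on the Python version and on how often the exception path is actually taken.

```python
import timeit


class ValidationError(Exception):
    """Hypothetical error type used only in this sketch."""


def _check_int(value):
    if not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")


def validate_wrapped(value):
    """Before: delegate to a helper and translate its exception."""
    try:
        _check_int(value)
    except TypeError as exc:
        raise ValidationError(str(exc)) from exc


def validate_direct(value):
    """After: same check, no try/except and no helper call."""
    if not isinstance(value, int):
        raise ValidationError(f"expected int, got {type(value).__name__}")


if __name__ == "__main__":
    # Time the success path, which is the common case during loading.
    for func in (validate_wrapped, validate_direct):
        elapsed = timeit.timeit(lambda: func(42), number=1_000_000)
        print(f"{func.__name__}: {elapsed:.3f}s")
```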
      b7267a89
    • model: also optimize combined validator · cf529cd1
      Pierre-Yves David authored
      This ensures we don't have any remaining `generic_type_validator`
      calls that have not been optimized away.
      cf529cd1
    • model: drop the `type_validator()` indirection · 6ababdeb
      Pierre-Yves David authored
      This indirection seems useless and is probably a remnant of some
      long-forgotten ritual.
      6ababdeb
    • model: implement specialized attribute-validator functions · edb57fb1
      Pierre-Yves David authored
      This should reduce function calls and speed things up.
      
      It might be useful to introduce even more specialized validators in
      the future. It would also be useful to skip the intermediate try/except.
      
      Some of this will be done in later changesets.
      edb57fb1
    • model: prepare the filtering of type_validator into something faster · 1dfea324
      Pierre-Yves David authored
      This currently does nothing, but prepares for actually changing the
      generic validator into faster specialized variants.
      1dfea324
    • from_disk: skip intermediate dictionary creation when building model · a2e8f18c
      Pierre-Yves David authored
      Before this change we would do the following:
      
      1) translate from_disk's objects into `dict`s,
      2) sort these dicts,
      3) feed the list to `Directory.from_dict`,
      4) create DirectoryEntry objects from these dicts.
      
      Skipping the dictionary creation and directly creating the
      DirectoryEntry objects provides us with a small but stable and
      noticeable performance win.
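
The difference between the two code paths can be sketched as follows. The `DirectoryEntry` dataclass below is a simplified stand-in for the real model class, and sorting by raw name is an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """Simplified stand-in for the model's directory entry."""
    name: bytes
    type: str
    target: bytes
    perms: int

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


def entries_via_dicts(children):
    """Before: build dicts, sort them, then build entries from the dicts."""
    dicts = [
        {"name": name, "type": type_, "target": target, "perms": perms}
        for name, type_, target, perms in children
    ]
    dicts.sort(key=lambda d: d["name"])
    return tuple(DirectoryEntry.from_dict(d) for d in dicts)


def entries_direct(children):
    """After: sort the raw data and build the entries directly."""
    return tuple(
        DirectoryEntry(name=name, type=type_, target=target, perms=perms)
        for name, type_, target, perms in sorted(children, key=lambda c: c[0])
    )
```

Both functions return the same tuple of entries; the second one never allocates the throwaway dictionaries.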
      
      We tested this change on a simple invocation of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 11 minutes 56 seconds
      after:  11 minutes 50 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 17%
      after:  15%
      a2e8f18c
    • model: avoid another extra creation of Model object · ad3ecac9
      Pierre-Yves David authored
      Do not create model objects while sorting entries before creating the
      model object.
      
      This is another case of "let us create object X to prepare the creation
      of object X", slowing things down.
      
      In practice, we will likely skip this code path after the next
      changeset; however, it seems useful to get this performance footgun
      out of the way.
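
The "create object X to prepare the creation of object X" pattern can be made visible with a small counter. The `Entry` class and both builder functions are hypothetical and only illustrate the pattern; they are not taken from swh-model.

```python
class Entry:
    """Stand-in for a model object; counts how many instances get built."""
    created = 0

    def __init__(self, name, target):
        Entry.created += 1
        self.name = name
        self.target = target


def build_sorted_wasteful(raw):
    """Builds temporary objects only to sort them, then builds them again."""
    temporary = sorted((Entry(n, t) for n, t in raw), key=lambda e: e.name)
    return [Entry(e.name, e.target) for e in temporary]


def build_sorted_direct(raw):
    """Sorts the raw (name, target) pairs, then builds each object once."""
    return [Entry(n, t) for n, t in sorted(raw, key=lambda item: item[0])]
```

For `n` entries, the first function constructs `2 * n` objects and the second constructs `n`, which matters when every construction also runs attribute validators.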
      
      We tested this change on a simple invocation of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 12 minutes 59 seconds
      after:  11 minutes 56 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 24%
      after:  17%
      ad3ecac9
    • from_disk: only build a model object once · 814a6c84
      Pierre-Yves David authored
      Before this change, a Directory object was built to compute the `id`
      that we then fed to the Directory object we built for `to_model`.
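
A hedged sketch of the double construction described above. The `Directory` class and `compute_id` function are simplified stand-ins: the real identifier is computed differently, and plain SHA-1 over the entries is used here only to keep the example self-contained.

```python
import hashlib


def compute_id(entries):
    """Toy identifier: NOT the real hashing scheme, just a stand-in."""
    digest = hashlib.sha1()
    for name, target in entries:
        digest.update(name + b"\0" + target)
    return digest.digest()


class Directory:
    """Stand-in model object that computes its id when none is given."""

    def __init__(self, entries, id=None):
        self.entries = tuple(entries)
        self.id = id if id is not None else compute_id(self.entries)


def to_model_twice(entries):
    """Before: a throwaway Directory is built only to obtain the id."""
    identifier = Directory(entries).id
    return Directory(entries, id=identifier)


def to_model_once(entries):
    """After: a single Directory is built and computes its own id."""
    return Directory(entries)
```

Both paths yield the same `id`; the second avoids constructing and validating a whole object that is immediately discarded.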
      
      We tested this change on a simple invocation of the Mercurial loader,
      with a no-op loader storage:
      
          swh loader run mercurial https://foss.heptapod.net/mercurial/mercurial-devel directory=/data/repos/mercurial-devel
      
      = Median time of 3 runs =
      before: 17 minutes 48 seconds
      after:  12 minutes 59 seconds
      
      On a profile of the same run, the `to_model` call of the from_disk's `Directory` class took the following percentage:
      before: 43%
      after:  24%
      814a6c84
  6. Aug 31, 2022
  7. Aug 30, 2022
  8. Aug 12, 2022