  1. Sep 09, 2024
  2. Aug 27, 2024
  3. Jun 11, 2024
    • from_disk: Do not recurse in ignored directories · 34f61010
      vlorentz authored
      Using os.walk() does not make much sense when we want to control what
      directories to recurse into.
      
      Additionally, this uses os.scandir directly, which allows us to directly
      sort symlinks and files apart from directories (while os.walk groups
      symlinks with directories) without two extra system calls.
      v6.14.0
    • Add a mk_tree() helper function for tests · 8689b0c0
      David Douard authored
      This creates a tree structure from an easy-to-read textual
      representation of said structure.
    • Rework from_disk.Directory.from_disk() implementation · f0f21b4d
      David Douard authored
      It used to traverse the whole directory tree and filter its elements
      only afterwards; this version should be a bit smarter and avoid
      descending into directories that should be ignored.
      
      We cannot just use the subtree eviction mechanism of
      'os.walk(topdown=True)' because the filtering callback needs some
      context about the subdirectory's content (typically to be able to evict
      empty directories).
      
      This version of the code is a bit more complex but should do the trick.
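The two from_disk commits above describe replacing os.walk() with a hand-written recursion over os.scandir(). A minimal sketch of that idea follows; `iter_files` and `ignore_dir` are hypothetical names chosen for illustration, not the swh-model API, and the sketch does not attempt the eviction of empty directories mentioned in the commit message.

```python
import os
from typing import Callable, Iterator


def iter_files(root: str, ignore_dir: Callable[[str], bool]) -> Iterator[str]:
    """Yield paths of files and symlinks under root, in name order.

    Hypothetical helper: directories for which ignore_dir() returns True
    are never opened, so their content is not read at all.
    """
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            # follow_symlinks=False makes a symlink to a directory count
            # as a leaf, unlike os.walk() which lists it among directories.
            if entry.is_dir(follow_symlinks=False):
                if ignore_dir(entry.path):
                    continue  # skip the whole subtree
                yield from iter_files(entry.path, ignore_dir)
            else:
                yield entry.path
```

A call such as `iter_files(".", lambda p: os.path.basename(p) == ".git")` would list a checkout without ever reading the `.git` directory.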
  4. Jun 10, 2024
  5. May 30, 2024
  6. May 29, 2024
  7. May 28, 2024
  8. May 23, 2024
  9. May 17, 2024
  10. May 16, 2024
  11. May 15, 2024
    • QualifiedSWHID: Fix (de)serialization of 'origin' qualifier · 9cf7ad9d
      vlorentz authored and Antoine Lambert committed
      Having the escaped URL in `swhid.origin` is inconsistent with self.path
      (which is always unescaped) and never what we want, because it is only
      useful while serializing, which is already handled by `__str__`.
      
      This led to swh-indexer#4738
      where swh-deposit parsed a qualified SWHID, then used `.origin` to get
      an origin URL.
      
      Additionally, as serialization always escapes the `origin` qualifier,
      this means that deserializing then re-serializing a qualified SWHID
      would double-escape it.
      
      Finally, fixing this made the test uncover that `%` was not escaped
      while serializing, while `;` was, leading to incorrect (and ambiguous)
      escaped URLs.
    • DiskBackedContent: add a small temporary compatibility layer · f1f62388
      Pierre-Yves David authored
      There are two other packages using DiskBackedContent: "swh-loader-svn"
      and "swh-loader-cvs". Both use it to check
      "DiskBackedContent.object_type" at the same time as
      "model.Content.object_type".
      
      So we do this small hack to avoid breaking these other modules until
      they migrate.
    • from_disk: introduce a ModelObjectType enum · 8b29444a
      Pierre-Yves David authored
      This puts the pieces in place to finally clean up the confusion between
      the various object_type attributes. They now have different types, so we
      should be able to start detecting errors at some point.
      
      As for FromDiskType, we keep compatibility with string values for now.
      This avoids breaking existing code.
    • DiskBackedContent: remove the class in favor of a simpler composition approach · d65a844a
      Pierre-Yves David authored
      Instead of having multiple classes and `object_type` values, we just add
      a few lines to the main `model.Content` class to retrieve data on
      demand. The `with_data` logic already existed there anyway.
      
      This avoids having from_disk extend the model from the outside.
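The QualifiedSWHID commit above turns on two points: the object should hold the unescaped origin URL, and serialization must escape `%` as well as `;`, otherwise a parse followed by a re-serialization is not stable. A minimal sketch of that round trip, assuming hypothetical helper names (`escape_qualifier`, `unescape_qualifier`) that are not part of swh-model:

```python
from urllib.parse import unquote


def escape_qualifier(value: str) -> str:
    """Escape a qualifier value for serialization (hypothetical helper)."""
    # '%' must be escaped first; otherwise the '%' introduced by '%3B'
    # would itself be escaped on this same pass.
    return value.replace("%", "%25").replace(";", "%3B")


def unescape_qualifier(value: str) -> str:
    """Return the plain value stored on the parsed object."""
    return unquote(value)


origin = "https://example.org/repo;v=1%20x"
serialized = escape_qualifier(origin)
parsed = unescape_qualifier(serialized)

# Parsing yields the original URL, and re-serializing the parsed value
# reproduces the same string instead of escaping it a second time.
assert parsed == origin
assert escape_qualifier(parsed) == serialized
```

If `%` were left unescaped, the URL above would serialize with a literal `%20` that parsing would then decode to a space, which is the kind of ambiguity the commit message refers to.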
  12. May 14, 2024
  13. Apr 24, 2024
    • Add size limit to origin URLs · 906e5093
      vlorentz authored
      Currently the only limit is "enforced" by PostgreSQL.
      
      This makes sure that origins created after we switch to Cassandra as the
      primary storage remain compatible with a PostgreSQL-based storage.
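The commit above moves the origin URL size limit from the database into the model, so that it holds whichever storage backend is in use. A sketch of such a check, under stated assumptions: the commit message does not give the limit, so `MAX_URL_BYTES` below is a placeholder value, and `check_origin_url` is a hypothetical name rather than the swh-model implementation.

```python
# Placeholder value: the actual limit is not stated in the commit message.
MAX_URL_BYTES = 2048


def check_origin_url(url: str) -> str:
    """Return url unchanged, or raise ValueError if it is too long.

    The size is measured on the UTF-8 encoding, since a character count
    can understate the number of bytes a storage backend has to hold.
    """
    size = len(url.encode("utf-8"))
    if size > MAX_URL_BYTES:
        raise ValueError(
            f"origin URL is {size} bytes, limit is {MAX_URL_BYTES} bytes"
        )
    return url
```

Validating when the object is created means an oversized URL is rejected before it reaches any storage backend.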
  14. Mar 29, 2024
  15. Mar 26, 2024