  1. Nov 30, 2022
  2. Nov 29, 2022
  3. Nov 28, 2022
    • vlorentz's avatar
      storage: Insert from temporary tables in consistent order · f7833b7e
      vlorentz authored
      This avoids having a transaction inserting row A then B, while another
      inserts row B then A; which (probably) leads to deadlocks like this:
      
      ```
      DeadlockDetected: deadlock detected
      DETAIL:  Process 1842336 waits for ShareLock on transaction 1051957280; blocked by process 64261.
      Process 64261 waits for ShareLock on transaction 1051957281; blocked by process 1842336.
      HINT:  See server log for query details.
      CONTEXT:  while inserting index tuple (1972253,5) in relation "origin_extrinsic_metadata"
      SQL statement "insert into origin_extrinsic_metadata (id, metadata, indexer_configuration_id, from_remd_id, metadata_tsvector, mappings)
      ```
      
      https://sentry.softwareheritage.org/share/issue/52b06caae89f4235a758887fd6817656/
      
      This was already mitigated by sorting before inserting in temporary
      tables, then expecting postgresql to read from temporary tables in the
      same order rows were inserted. This is often true, but not guaranteed.
      
      No test for this, because I do not see a way to replicate this more than
      existing deadlock tests do.
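
A minimal sketch of the idea, not the actual swh-indexer code: the copy out of the temporary table carries an explicit `ORDER BY`, so every transaction touches rows in the same order. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs anywhere; the table names `metadata` and `tmp_metadata` and the helper name are hypothetical.

```python
import sqlite3


def insert_from_temp_in_order(conn: sqlite3.Connection) -> None:
    """Copy rows from the temporary table in ascending id order.

    The explicit ORDER BY is the point: without it, the database is free
    to read the temporary table in any order.
    """
    conn.execute(
        "INSERT INTO metadata (id, payload) "
        "SELECT id, payload FROM tmp_metadata ORDER BY id"
    )


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TEMP TABLE tmp_metadata (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO tmp_metadata VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],  # deliberately unsorted
)
insert_from_temp_in_order(conn)
rows = conn.execute("SELECT id, payload FROM metadata ORDER BY id").fetchall()
```

SQLite does not exhibit the row-lock deadlock described above; the sketch only illustrates the shape of the statement.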
      f7833b7e
  4. Nov 21, 2022
  5. Nov 03, 2022
  6. Nov 02, 2022
  7. Oct 26, 2022
  8. Oct 25, 2022
  9. Oct 24, 2022
  10. Oct 18, 2022
  11. Oct 07, 2022
  12. Sep 28, 2022
  13. Sep 27, 2022
  14. Sep 12, 2022
  15. Sep 08, 2022
    • vlorentz's avatar
      npm: Do not generate URIs with spaces in them · 6d7efad9
      vlorentz authored
      It makes rdflib complain, and is invalid anyway
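
One way to enforce this, sketched under the assumption that values containing whitespace are simply skipped (the actual commit may handle them differently); `uri_or_none` is a hypothetical helper, not part of swh-indexer or rdflib.

```python
import re
from typing import Optional


def uri_or_none(value: str) -> Optional[str]:
    """Return the stripped value, or None if it cannot be a valid URI.

    Only checks for emptiness and embedded whitespace; it is not a full
    URI validator.
    """
    value = value.strip()
    if not value or re.search(r"\s", value):
        return None
    return value
```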
      v2.6.0
      6d7efad9
    • vlorentz's avatar
      Convert SWHID to str before passing to sentry_sdk.set_tag · f4e08f95
      vlorentz authored
      Sentry uses repr() by default, which does not look good in a UI
      f4e08f95
    • vlorentz's avatar
      Fix crash when indexing the same directory twice with non-deterministic order · b6385cec
      vlorentz authored
      persist_index_computations deduplicated row entries based on the entire
      content of the row, but postgresql requires the 'id' to be unique.
      
      This was not an issue in older versions of swh-indexer, because all
      operations were deterministic, given a specific directory as input.
      
      The recent switch to rdflib introduced non-determinism, so different
      outputs may be returned for the same directory id; causing the
      deduplication to not be good enough to avoid duplicate ids.
      
      With this commit, deduplication is now done on 'id', as expected.
      
      As a side-effect, persist_index_computations is now more efficient
      because:
      
      1. it runs in linear time instead of quadratic in the number of
         metadata items
      2. it only compares dir ids, instead of the content of indexed metadata
         (which is arbitrarily large JSON-like data)
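
A rough sketch of deduplication keyed on 'id' alone, in a single pass with a set of ids already seen. The function name, the dict-shaped rows, and the choice to keep the first occurrence are assumptions for illustration; the real persist_index_computations may differ.

```python
from typing import Any, Dict, Iterable, List


def deduplicate_by_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one row per 'id', in linear time.

    Only the ids are compared, never the (possibly large) metadata.
    """
    seen = set()
    unique = []
    for row in rows:
        if row["id"] in seen:
            continue  # same id already kept, whatever its metadata
        seen.add(row["id"])
        unique.append(row)
    return unique
```

Two rows with the same 'id' but different metadata collapse to one here, whereas comparing whole rows would keep both and violate the uniqueness constraint.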
      b6385cec
    • vlorentz's avatar
      github: Add support for 'topics' · dd027419
      vlorentz authored
      dd027419
  16. Sep 05, 2022
    • vlorentz's avatar
      Fix crash when RawExtrinsicMetadata target new origins · befdbd7e
      vlorentz authored
      RawExtrinsicMetadata contain a swh:1:ori: identifier of the origin,
      which the indexer needs to resolve, by querying its storage replica.
      
      Because RawExtrinsicMetadata are created by loaders, they are often
      created shortly after the origin is created by the corresponding lister,
      so the origin may not be known to the storage replica used by the
      indexer, causing this function to crash.
      
      Waiting 10s seems to be good enough when run on my computer with
      production data and moma's replica; so I set it to 60s just to be safe.
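
The waiting behaviour can be sketched as a poll-until-deadline loop. This is an illustration only: `wait_for`, its parameters, and the one-second polling interval are assumptions, and `fetch` stands for whatever call looks up the origin in the storage replica.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], Optional[T]],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    """Call fetch() until it returns a non-None value or timeout elapses.

    Returns the fetched value, or None if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without actually waiting.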
      befdbd7e
    • vlorentz's avatar
      Fix crash when RawExtrinsicMetadata objects have the same target · 68940cfc
      vlorentz authored
      ... and they are processed in the same batch.
      
      The last one received takes precedence, as it is likely to be more
      up-to-date
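
A minimal sketch of "last one wins" within a batch, assuming each object is a dict with a hypothetical "target" key; the real code operates on RawExtrinsicMetadata objects and may be structured differently.

```python
from typing import Any, Dict, Iterable, List


def keep_last_per_target(batch: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the last object seen for each target.

    Later objects overwrite earlier ones with the same target; the output
    follows the order in which each target first appeared.
    """
    latest: Dict[Any, Dict[str, Any]] = {}
    for obj in batch:
        latest[obj["target"]] = obj
    return list(latest.values())
```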
      68940cfc
  17. Sep 02, 2022
  18. Sep 01, 2022
  19. Aug 31, 2022
  20. Aug 30, 2022