  1. Nov 21, 2022
    • luigi: Remove copies of stamp files to/from S3 · b39436e3
      vlorentz authored
      They are only useful while exporting the dataset -- after the export is
      finished, meta.json is good enough, and stamp files only save a couple
      of minutes when only some object types are needed (i.e. never in practice)
  2. Nov 15, 2022
  3. Nov 10, 2022
  4. Nov 04, 2022
  5. Nov 03, 2022
  6. Oct 18, 2022
  7. Sep 08, 2022
  8. Aug 29, 2022
  9. Jul 06, 2022
  10. Jun 21, 2022
  11. May 23, 2022
  12. May 09, 2022
  13. Apr 29, 2022
  14. Apr 28, 2022
  15. Apr 26, 2022
  16. Apr 21, 2022
  17. Apr 14, 2022
    • relational exports: add ID field to origin table · 9f342d99
      Antoine Pietri authored
      The origin table now contains the origin URL and a sha1 of the URL as
      the "ID" field. This allows us to join this table more easily with the
      SWHIDs retrieved from the compressed graph, as well as generate the edge
      dataset without having to compute sha1s manually.
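
To illustrate the "ID" field described above, here is a minimal sketch of deriving an identifier from an origin URL. The function names `origin_id` and `origin_swhid` are hypothetical, and both the UTF-8 encoding with hexadecimal output and the `swh:1:ori:` prefix are assumptions; the commit only states that the ID is a sha1 of the URL.

```python
import hashlib


def origin_id(url: str) -> str:
    """Return the hex-encoded sha1 of an origin URL (assumed UTF-8 encoded)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def origin_swhid(url: str) -> str:
    """Build an origin identifier string; the prefix used here is an assumption."""
    return f"swh:1:ori:{origin_id(url)}"


if __name__ == "__main__":
    # Hypothetical origin URL, used only for illustration.
    print(origin_swhid("https://example.org/repo.git"))
```

With the ID stored in the table, a join against identifiers coming from the graph can compare strings directly instead of hashing every URL at query time.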
  18. Apr 12, 2022
  19. Apr 08, 2022
  20. Apr 07, 2022
  21. Apr 06, 2022
  22. Apr 01, 2022
  23. Mar 31, 2022
  24. Mar 30, 2022
    • Improve the progress reporting look and feel · 58b9f92e
      David Douard authored
      - use shortened values in the progress bar (e.g. '11.3M/566M' instead of
        something like '11310201/566123456')
      - shorten the description strings and align them.
      
      The result looks like:
      
      Exporting release:
        - Offset: 100%|███████████████████████████████| 64/64 [00:03<00:00, 18.61it/s]
        - Export: 100%|█████████████████| 130/130 [00:01<00:00, 10.3it/s, workers=4/4]
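
The shortened values mentioned above can be sketched as follows. This is not the code from the commit: `shorten` and `SUFFIXES` are hypothetical names, and the choice of three significant digits is an assumption made to match the '11.3M/566M' example.

```python
SUFFIXES = ["", "k", "M", "G", "T", "P"]


def shorten(value: int) -> str:
    """Format a count with at most three significant digits and an SI suffix."""
    num = float(value)
    index = 0
    # Divide by 1000 until the value fits in three digits (or suffixes run out).
    while abs(num) >= 999.5 and index < len(SUFFIXES) - 1:
        num /= 1000.0
        index += 1
    return f"{num:.3g}{SUFFIXES[index]}"


if __name__ == "__main__":
    done, total = 11310201, 566123456
    print(f"{shorten(done)}/{shorten(total)}")
```

Progress bar libraries often offer a built-in equivalent, which is usually preferable to a hand-written formatter.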
    • Delay the unsubscribe to the end of handle_messages · be3c5da2
      David Douard authored
      to prevent a possible race condition leading to Kafka errors like:
      
        rd_kafka_assignment_partition_stopped: Assertion `rktp->rktp_started' failed
      
      This could occur when, at the time of unsubscribing from a partition,
      another partition is also depleted. Since unsubscribing consists of
      resubscribing to all partitions except the unsubscribed ones, we could try
      to subscribe to such a depleted partition, leading to the error message
      listed above.
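
The idea of delaying the unsubscribe can be sketched without a Kafka client. Everything below is a simplified stand-in: `DeferredUnsubscriber`, `last_offsets`, and the `resubscribe` callback are hypothetical names, and messages are modeled as `(partition, offset)` tuples. Depleted partitions are collected while the batch is processed, and a single resubscription happens at the end, so it never includes a partition that was depleted later in the same batch.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Message = Tuple[int, int]  # (partition id, offset), a stand-in for a Kafka message


class DeferredUnsubscriber:
    def __init__(
        self,
        last_offsets: Dict[int, int],
        resubscribe: Callable[[List[int]], None],
    ) -> None:
        # last_offsets maps each partition to the last offset to consume (inclusive).
        self.last_offsets = dict(last_offsets)
        self.assigned: Set[int] = set(last_offsets)
        self.resubscribe = resubscribe

    def handle_messages(self, messages: Iterable[Message]) -> None:
        depleted: Set[int] = set()
        for partition, offset in messages:
            # Message processing would happen here.
            if offset >= self.last_offsets[partition]:
                depleted.add(partition)  # remember it, but do not unsubscribe yet
        if depleted:
            # One resubscription, after the whole batch has been handled.
            self.assigned -= depleted
            self.resubscribe(sorted(self.assigned))


if __name__ == "__main__":
    calls: List[List[int]] = []
    consumer = DeferredUnsubscriber({0: 1, 1: 1, 2: 4}, calls.append)
    consumer.handle_messages([(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)])
    print(calls)
```

Unsubscribing inside the loop would instead trigger one resubscription per depleted partition, and the first of those could still list a partition that turns out to be depleted a few messages later.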