  1. Oct 18, 2022
  2. Sep 08, 2022
  3. Aug 29, 2022
  4. Jul 06, 2022
  5. Jun 21, 2022
  6. May 23, 2022
  7. May 09, 2022
  8. Apr 29, 2022
  9. Apr 28, 2022
  10. Apr 26, 2022
  11. Apr 21, 2022
  12. Apr 14, 2022
    • relational exports: add ID field to origin table · 9f342d99
      Antoine Pietri authored
      The origin table now contains the origin URL and a sha1 of the URL as
      the "ID" field. This lets us join the table more easily with the SWHIDs
      retrieved from the compressed graph, and generate the edge dataset
      without having to compute sha1s manually.
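As a rough illustration of the "ID" field described above, the sketch below hashes an origin URL with SHA-1. It assumes the URL is encoded as UTF-8 and the digest is stored as lowercase hex; the function names and the `swh:1:ori:` prefix used for the join key are illustrative assumptions, not taken from the commit.

```python
import hashlib


def origin_id(url: str) -> str:
    """Return the hex SHA-1 of an origin URL (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def origin_swhid(url: str) -> str:
    """Build an origin identifier string usable as a join key (assumed format)."""
    return f"swh:1:ori:{origin_id(url)}"


if __name__ == "__main__":
    # Hypothetical origin URL, used only for demonstration.
    print(origin_swhid("https://example.org/repo.git"))
```

With the ID precomputed in the table, a join against graph-derived identifiers no longer needs to hash every URL at query time.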
  13. Apr 12, 2022
  14. Apr 08, 2022
  15. Apr 07, 2022
  16. Apr 06, 2022
  17. Apr 01, 2022
  18. Mar 31, 2022
  19. Mar 30, 2022
    • Improve the progress reporting look and feel · 58b9f92e
      David Douard authored
      - use shortened values in the progress bar (e.g. '11.3M/566M' instead of
        something like '11310201/566123456')
      - shorten the description strings and align them.
      
      The result looks like:
      
      Exporting release:
        - Offset: 100%|███████████████████████████████| 64/64 [00:03<00:00, 18.61it/s]
        - Export: 100%|█████████████████| 130/130 [00:01<00:00, 10.3it/s, workers=4/4]
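The shortened counters mentioned above can be approximated with a small formatter. This is a minimal sketch, not the project's code: `short_count` is a hypothetical helper that keeps three significant digits and a decimal (1000-based) unit suffix. Progress-bar libraries such as tqdm offer comparable behaviour through a `unit_scale` option.

```python
def short_count(n: int) -> str:
    """Format a count with three significant digits and a k/M/G/T suffix."""
    value = float(n)
    for unit in ("", "k", "M", "G", "T"):
        # Below 999.5 the value still rounds to at most three digits.
        if abs(value) < 999.5:
            return f"{value:.3g}{unit}"
        value /= 1000
    # Fallback for very large counts (beyond the units listed above).
    return f"{value:.3g}P"


if __name__ == "__main__":
    done, total = 11310201, 566123456
    print(f"{short_count(done)}/{short_count(total)}")
```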
    • Delay the unsubscribe to the end of handle_messages · be3c5da2
      David Douard authored
      This prevents a possible race condition that leads to Kafka errors such as:
      
        rd_kafka_assignment_partition_stopped: Assertion `rktp->rktp_started' failed
      
      This could occur when, at the time of unsubscribing from a partition,
      another partition is also depleted. Since unsubscribing consists of
      resubscribing to all partitions except the unsubscribed ones, we could
      try to subscribe to such a depleted partition, leading to the error
      message listed above.
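The idea of deferring the unsubscription can be sketched as follows. Everything here is hypothetical: `FakeConsumer` stands in for a real Kafka consumer, and the depletion check against `end_offsets` is an assumed criterion. The point is only that depleted partitions are collected during the loop and dropped in one resubscription at the end, so a partition that becomes depleted mid-batch is never re-added.

```python
class FakeConsumer:
    """Hypothetical stand-in for a Kafka consumer; it only records calls."""

    def __init__(self, partitions):
        self.assigned = set(partitions)
        self.subscribe_calls = []

    def resubscribe(self, partitions):
        # Record each resubscription so the behaviour can be inspected.
        self.subscribe_calls.append(sorted(partitions))
        self.assigned = set(partitions)


def handle_messages(consumer, messages, end_offsets):
    """Process a batch, then drop every depleted partition in a single step."""
    depleted = set()
    for partition, offset in messages:
        # (message processing would happen here)
        if offset >= end_offsets[partition] - 1:
            # Only remember the partition; do not unsubscribe yet.
            depleted.add(partition)
    if depleted:
        # One resubscription, excluding all partitions depleted in this batch.
        consumer.resubscribe(consumer.assigned - depleted)
    return depleted
```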
    • Move exporter config entries in dedicated sections · e01daba4
      David Douard authored
      e.g. config entries specific to the ORC exporter are now under the
      'orc' section, like:
      
        journal:
          brokers: [...]
      
        orc:
          remove_pull_requests: true
          max_rows:
            revision: 100000
            directory: 10000
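Once entries live in per-exporter sections, each exporter can read only its own part of the configuration. The sketch below mirrors the layout above as a Python dict; `exporter_config` is a hypothetical helper, and the broker list stays elided as in the commit message.

```python
# Configuration mirroring the example above; the broker list is elided
# in the original, so it is left as a placeholder here.
config = {
    "journal": {"brokers": [...]},
    "orc": {
        "remove_pull_requests": True,
        "max_rows": {"revision": 100000, "directory": 10000},
    },
}


def exporter_config(config: dict, exporter: str) -> dict:
    """Return the section dedicated to one exporter, or an empty dict."""
    return config.get(exporter, {})


orc = exporter_config(config, "orc")
max_rows = orc.get("max_rows", {})
```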
    • Add support for limited row numbers in ORC files · 3df08fd7
      David Douard authored
      Make it possible to specify a maximum number of rows a table can store
      in a single ORC file. The limit can only be set on main tables for now
      (i.e. cannot be specified for tables like revision_history or
      directory_entry).
      
      This can be set by configuration only (no extra CLI options).
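A per-file row limit amounts to starting a new output file once the current one is full. The sketch below models that with in-memory lists rather than real ORC files; `RotatingTableWriter` and its attributes are hypothetical names, and the actual exporter may rotate files differently.

```python
class RotatingTableWriter:
    """Hypothetical writer that starts a new file once max_rows is reached."""

    def __init__(self, table, max_rows=None):
        self.table = table
        self.max_rows = max_rows  # None means no limit
        self.files = []  # each file is modelled as a list of rows

    def write(self, row):
        current_full = (
            self.max_rows is not None
            and self.files
            and len(self.files[-1]) >= self.max_rows
        )
        if not self.files or current_full:
            # Open a new "file" for the first row or after hitting the limit.
            self.files.append([])
        self.files[-1].append(row)


if __name__ == "__main__":
    writer = RotatingTableWriter("revision", max_rows=3)
    for i in range(7):
        writer.write(i)
    print([len(f) for f in writer.files])
```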
  20. Mar 29, 2022