- Nov 22, 2021
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.20.0' with Debian dir 648a27f772aa18cbaeb88421ad44cfbf18517068
-
vlorentz authored
grab_next_visits grabs from `listed_origins`, whose primary key is `(lister_id, url, visit_type)` and uses it to upsert in origin_visit_stats, whose primary key is `(url, visit_type)`. This causes the error `ON CONFLICT DO UPDATE command cannot affect row a second time` when the same (origin, type) pair is grabbed twice. This commit deduplicates the (origin, type) pairs before upserting.
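A minimal sketch of the deduplication step, assuming the grabbed rows are available in Python as tuples. The `ListedOrigin` type and the `unique_origin_keys` helper are hypothetical names used for illustration, not the scheduler's actual code.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ListedOrigin(NamedTuple):
    """Hypothetical stand-in for a row of `listed_origins`."""

    lister_id: str
    url: str
    visit_type: str


def unique_origin_keys(origins: Iterable[ListedOrigin]) -> List[Tuple[str, str]]:
    """Return each (url, visit_type) pair once, in first-seen order.

    Two listers can list the same origin, so the same pair may appear
    several times; upserting it twice in one statement is what triggers
    the ON CONFLICT error.
    """
    # dict.fromkeys keeps insertion order and drops repeated keys
    return list(dict.fromkeys((o.url, o.visit_type) for o in origins))
```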
- Oct 29, 2021
-
-
Nicolas Dandrimont authored
The ratios weren't checked for normalization; using relative weights explicitly ensures that the settings won't be misinterpreted.
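As an illustration of why relative weights are harder to misread than ratios, here is a small sketch that divides a batch of visits by weight. The `split_by_weight` function and the policy names in the usage note are assumptions made for this example, not the settings used by the scheduler.

```python
from typing import Dict


def split_by_weight(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` visits between policies in proportion to their weights.

    The weights do not need to sum to 1: they are normalized here, so
    {"a": 2, "b": 1, "c": 1} and {"a": 0.5, "b": 0.25, "c": 0.25} give
    the same split.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    # Rounding down means a few visits may be left unassigned
    return {name: int(total * w / weight_sum) for name, w in weights.items()}
```

For instance, `split_by_weight({"policy_a": 2, "policy_b": 1, "policy_c": 1}, 100)` assigns 50, 25 and 25 visits respectively.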
-
Nicolas Dandrimont authored
-
- Oct 28, 2021
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.19.0' with Debian dir 25276af0f39eed8f7341a27f86fd12f2168c0212
-
For each known visit type, we run a loop which:

- monitors the size of the relevant celery queue
- schedules more visits of the relevant type once the number of available slots goes over a given threshold (currently set to 5% of the max queue size).

The scheduling of visits combines multiple scheduling policies, for now using static ratios set in the `POLICY_RATIOS` dict. We emit a warning if the ratio of origins fetched for each policy is skewed with respect to the original request (allowing, for now, manual adjustment of the ratios).

The CLI endpoint spawns one thread for each visit type, which all handle connections to RabbitMQ and the scheduler backend separately.

For now, we handle exceptions in the visit scheduling threads by (stupidly) respawning the relevant thread directly. We should probably improve this to give up after a specific number of tries.

Co-authored-by: Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org>
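The threshold check in the loop described above could look roughly like this. It is a sketch only: `visits_to_schedule` and `min_free_ratio` are hypothetical names, and the real loop also has to query the celery queue and the scheduler backend.

```python
def visits_to_schedule(
    queue_length: int, max_queue_length: int, min_free_ratio: float = 0.05
) -> int:
    """Return how many visits to schedule for one visit type.

    Nothing is scheduled until the number of free slots goes over
    `min_free_ratio` of the maximum queue size (5% by default).
    """
    free_slots = max_queue_length - queue_length
    if free_slots <= max_queue_length * min_free_ratio:
        # Not enough room yet (this also covers an over-full queue)
        return 0
    return free_slots
```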
- Oct 27, 2021
-
-
Nicolas Dandrimont authored
When the database is in a non-UTC timezone with DST, and a `timestamptz - interval` calculation crosses a DST change, the result of the calculation can be one hour off from the expected value: PostgreSQL shifts the timestamp by the number of days in the interval and keeps the same (local) time, which is then offset by an hour because of the DST change. Doing the datetime ± timedelta calculations in Python instead of PostgreSQL avoids this caveat altogether.
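A small worked example of the one-hour drift, assuming for illustration a database session in the Europe/Paris timezone, where DST ended on 2021-10-31. The fixed offsets below stand in for that timezone so that the snippet does not depend on a timezone database; the local-time result is written out by hand rather than computed by PostgreSQL.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/Paris (assumed for this example)
CEST = timezone(timedelta(hours=2))  # summer time, before 2021-10-31
CET = timezone(timedelta(hours=1))  # winter time, after 2021-10-31

now = datetime(2021, 11, 1, 12, 0, tzinfo=CET)

# Day-based subtraction in local time: same wall-clock time seven days
# earlier, which falls in summer time.
local_result = datetime(2021, 10, 25, 12, 0, tzinfo=CEST)

# timedelta arithmetic on the UTC value: exactly 7 * 24 hours earlier.
exact_result = now.astimezone(timezone.utc) - timedelta(days=7)

# Subtracting aware datetimes with different offsets compares instants
drift = exact_result - local_result
print(drift)
```

Here the two results differ by one hour, which is the discrepancy the commit message describes.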
-
- Oct 22, 2021
-
-
Antoine R. Dumont authored
Otherwise, in some edge cases, such as when running in Docker, the install fails on a conflict. Related to P1205#8092
-
- Oct 20, 2021
-
-
Antoine R. Dumont authored
Related to T3667
-
Antoine R. Dumont authored
It's been deprecated for enough time. Related to T3667
-
Antoine R. Dumont authored
-
- Oct 18, 2021
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.18.2' with Debian dir 064dff3f140dad4fe33f0893b48748460076b59a
-
Antoine R. Dumont authored
This actually fixes the Debian build failure. Related to T3666
- Oct 15, 2021
-
-
Jenkins for Software Heritage authored
Update to upstream version '0.18.1' with Debian dir 33b5f1325f7a752172f38a90c9be255f3bac2402
-
Antoine R. Dumont authored
This scenario happens with the oneshot loader, for example. That loader deals with more than one type of origin to ingest in the same queue, so the computation in that function returned a negative value, which ultimately cannot be executed in SQL [1]. This commit fixes that behavior. It also makes explicit in the function's docstring that it must return positive values.

[1]
```
...
psycopg2.errors.InvalidRowCountInLimitClause: LIMIT must not be negative
```
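One way to guarantee a value that is safe to use as a SQL `LIMIT` is to clamp the result at zero. This is a sketch under assumptions: `available_slots` is a hypothetical name, and the actual function fixed by this commit may be written differently.

```python
def available_slots(max_length: int, current_length: int) -> int:
    """Return the number of free slots in a queue, never a negative number.

    When a queue is shared by several visit types, `current_length` can
    exceed `max_length`; the result is clamped to 0 so it stays usable
    as a LIMIT.
    """
    return max(0, max_length - current_length)
```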
- Sep 02, 2021
-
-
Jenkins for Software Heritage authored
-
Jenkins for Software Heritage authored
Update to upstream version '0.18.0' with Debian dir e98b6b91a4c915d0ad9f6ee1286273ed99ee3b5b
-
Antoine R. Dumont authored
- Aug 27, 2021
-
-
Antoine R. Dumont authored
In the non-optimal case, we may want to trigger specific cases (not-yet-enabled origins, origins from a specific lister...). Related to T3350
-
- Aug 26, 2021
-
-
Nicolas Dandrimont authored
For origins that have never been visited, and for which we don't have a queue position yet, we want to visit them in the order they've been added.
-
Nicolas Dandrimont authored
The subcommand bypasses the legacy task-based mechanism to directly send new origin visits to celery
-
Nicolas Dandrimont authored
Running common operations on all git origins is pretty intense. Using table sampling gives us the opportunity to at least schedule some jobs in a (decently small) amount of time.
-
Antoine R. Dumont authored
-