- Mar 14, 2023
-
-
vlorentz authored
Luigi workflows contain plenty of ad-hoc bash scripts like 'zstdcat | java | pv | zstdmt > output.tmp'. This is a source of problems, because:
1. a command failing in the script won't fail the script (bash eats errors), so the Python code believes the temporary output was successfully written, commits it, and proceeds: swh/devel/swh-graph#4773
2. most arguments aren't properly escaped
3. a future change to CountPaths will need to build a pipeline dynamically (`zstdcat` in one case, `zstdcat | cat - <(cat other_file.txt | sed)` in another, `zstdcat | cat <(cat other_file.txt | sed) -` in another one), which would be pretty unreadable with string substitutions
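To illustrate the first two points, here is a minimal sketch of running a pipeline from Python without a shell. It is not the code introduced by this commit: `run_pipeline` and its parameters are hypothetical names, and only standard-library `subprocess` calls are used. Each stage is an argument list (so nothing needs escaping), every stage's exit status is checked, and the output file only appears if all stages succeeded.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_pipeline(commands, output_path):
    """Run `commands` like `a | b | c > output_path`, without a shell.

    Hypothetical helper, for illustration only. Each command is a list of
    arguments. Raises CalledProcessError if ANY stage exits with a non-zero
    status; the output is written to a temporary file and renamed to
    `output_path` only when every stage succeeded.
    """
    output_path = Path(output_path)
    tmp_path = output_path.with_name(output_path.name + ".tmp")
    procs = []
    try:
        with open(tmp_path, "wb") as out:
            prev_stdout = None
            for i, cmd in enumerate(commands):
                is_last = i == len(commands) - 1
                proc = subprocess.Popen(
                    cmd,
                    stdin=prev_stdout,
                    stdout=out if is_last else subprocess.PIPE,
                )
                if prev_stdout is not None:
                    # The child now owns the read end; close our copy so the
                    # upstream stage notices if the downstream one exits.
                    prev_stdout.close()
                prev_stdout = proc.stdout
                procs.append(proc)
            returncodes = [proc.wait() for proc in procs]
        # Unlike a plain bash pipeline, check every stage, not just the last.
        for cmd, code in zip(commands, returncodes):
            if code != 0:
                raise subprocess.CalledProcessError(code, cmd)
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up: stop stages that are still running, drop partial output.
        for proc in procs:
            if proc.poll() is None:
                proc.kill()
                proc.wait()
        tmp_path.unlink(missing_ok=True)
        raise
```

Because the pipeline is a plain list of lists, stages can be appended or skipped with ordinary Python conditionals instead of string substitution. The process substitutions (`<(...)`) mentioned in point 3 are not covered by this sketch.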
-
vlorentz authored
-
vlorentz authored
-
- Mar 09, 2023
-
-
vlorentz authored
-
- Mar 03, 2023
- Mar 02, 2023
-
-
vlorentz authored
-
- Feb 28, 2023
-
- Feb 23, 2023
-
-
Jérémy Bobbio (Lunar) authored
GitLab will display the content of the README file when browsing the repository. But if the file is a symlink, it will display the path pointed to by the symlink instead. There is a 6-year-old issue about this: https://gitlab.com/gitlab-org/gitlab/-/issues/15093 We can work around the issue by having the content at the root of the repository and a symlink to this file in the `docs/` directory. Tested in swh/devel/swh-py-template!27
-
- Feb 22, 2023
- Feb 20, 2023
-
- Feb 17, 2023
-
-
Antoine Lambert authored
Related to swh/meta#4960
-
- Feb 16, 2023
-
-
Related to swh/meta#4959
-
Antoine Lambert authored
-
- Feb 15, 2023
-
- Feb 14, 2023
-
-
vlorentz authored
Each thread handled one 96th of the node id range. But nodes are not homogeneously randomized across that range, so some threads had a lot more work to do than others and finished weeks later, while most CPU cores sat idle. By splitting the range this way, threads should have more homogeneous workloads.
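The message does not spell out the new split, so the following is only a sketch of one common way to even out such workloads, not the change made here: cut the node id range into many more chunks than there are threads and let idle threads pick up the next chunk. All names (`split_range`, `process_all`, `chunks_per_thread`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(num_nodes, num_chunks):
    """Split [0, num_nodes) into num_chunks contiguous half-open ranges
    of near-equal size."""
    bounds = [num_nodes * i // num_chunks for i in range(num_chunks + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_chunks)]


def process_all(num_nodes, num_threads, process_range, chunks_per_thread=100):
    """Call process_range(start, end) on many small chunks.

    With one big chunk per thread, a thread that drew an expensive part of
    the range keeps running long after the others are done. With many small
    chunks, a thread that finishes early simply takes the next one.
    """
    chunks = split_range(num_nodes, num_threads * chunks_per_thread)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda chunk: process_range(*chunk), chunks))
```

For example, `process_all(1000, 4, lambda start, end: end - start, chunks_per_thread=10)` processes 40 chunks whose sizes sum to 1000.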
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
-
vlorentz authored
This is much more performant on the dir layer: it only takes 27 hours on swh1.enst.fr instead of an ETA of one or two years, most of it spent on this particular line:
```
LazyLongIterator successorAncestors = graph.successors(successorNodeId);
```
Even when replacing all the code that used `successorAncestors`, it was still the major cause of the huge expected runtime.
-
Earlier protobuf 4.12.* versions crashed when `swhgraph_pb2.py` was discovered by pytest. This has been fixed in the 4.12.11 release. See: https://github.com/protocolbuffers/protobuf/issues/10151 Preventing protobuf 4.12.* from being used makes `pip` install grpcio-tools version 1.49.0, which fails to build on Debian bookworm. To allow more recent, fixed versions of grpcio-tools to be used, we bump the dependency on protobuf to version 4.12.11 or later.
-
vlorentz authored
This will be used as a metric for the 'popularity' of directories, which will in turn be used to weigh the results of PopularContents (which counts the most popular names used to refer to each content).
-