raw_extrinsic_metadata partitions have endless growth
Some partitions of the Cassandra table raw_extrinsic_metadata are already larger than the recommended partition size.
The real impact will depend on how the data is queried, but the problem is that these partitions keep growing as new rows are added every day (more than 700 rows per day for some opam-related directories).
cqlsh:swh> describe table raw_extrinsic_metadata;
CREATE TABLE swh.raw_extrinsic_metadata (
target text,
authority_type text,
authority_url text,
discovery_date timestamp,
id blob,
directory text,
fetcher_name ascii,
fetcher_version ascii,
format ascii,
metadata blob,
origin text,
path blob,
release text,
revision text,
snapshot text,
type text,
visit bigint,
PRIMARY KEY (target, authority_type, authority_url, discovery_date, id)
) WITH CLUSTERING ORDER BY (authority_type ASC, authority_url ASC, discovery_date ASC, id ASC)
...
All rows are partitioned by target alone, so the partition for a given target will keep growing until it reaches a critical limit.
cqlsh:swh> select target,authority_type, authority_url, id, discovery_date, origin, release from raw_extrinsic_metadata where target='swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84' limit 5;
target | authority_type | authority_url | id | discovery_date | origin | release
----------------------------------------------------+----------------+------------------------+--------------------------------------------+---------------------------------+-------------------------------------------------------------------------------------+----------------------------------------------------
swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84 | forge | https://opam.ocaml.org | 0xd735764bc4ca2223f17c43917a59132153c574af | 2023-07-04 16:07:55.000000+0000 | opam+https://opam.ocaml.org/packages/tezos-baking-alpha/ | swh:1:rel:9e5ef4bf91f60fa5e12d217744c68b2980ab865a
swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84 | forge | https://opam.ocaml.org | 0xd8465a13851725e44433b460189cf1b1fe47f73a | 2023-07-04 16:09:57.000000+0000 | opam+https://opam.ocaml.org/packages/octez-client/ | swh:1:rel:47ac985a22ccaa4a7a60d7c42f5416f4ef481950
swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84 | forge | https://opam.ocaml.org | 0xa933536f087cd3ba60438856d4b80efb718bb0f4 | 2023-07-04 16:11:01.000000+0000 | opam+https://opam.ocaml.org/packages/tezos-protocol-plugin-009-PsFLoren-registerer/ | swh:1:rel:7bd2914dd7b756bb5e2075c74fe9e1fa0aa73e6a
swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84 | forge | https://opam.ocaml.org | 0xb12847615d53a47e431a551e3cba862645592409 | 2023-07-04 16:12:33.000000+0000 | opam+https://opam.ocaml.org/packages/tezos-client-demo-counter/ | swh:1:rel:aca66a206d65f7fe2e43051a960f7c666609fdb2
swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84 | forge | https://opam.ocaml.org | 0xd7b51ab90a33045dfed94f82afa9012d8747d866 | 2023-07-04 16:12:59.000000+0000 | opam+https://opam.ocaml.org/packages/tezos-protocol-plugin-007-PsDELPH1/ | swh:1:rel:19aa24e3d4c9a056f2ec8f493f73c761f6756d06
(copied from swh/infra/sysadm-environment#5287 (comment 168586))
% ./dsbulk count --driver.advanced.auth-provider.username=admin --driver.advanced.auth-provider.password=$PASS --driver.basic.contact-points $(hostname -f) -k swh -t raw_extrinsic_metadata -stats partitions -partitions 10
...
id lines percent
'swh:1:dir:a00426caacd26b8fd20704887fb4d7cefcf7edc1' 153954 0.04
'swh:1:dir:a51b9a6cad80abba3f7ce4ec051ace7c2b0eeb90' 152906 0.04
'swh:1:dir:c24ff7f51209509c28ad026f06c4254f57663458' 144716 0.04
'swh:1:dir:0c00174e41033b337f67575ce08c7492eedb5f9a' 141241 0.04
'swh:1:dir:45b87e1ca127914b936884dc73c9f788df5b7abd' 138551 0.04
'swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84' 138534 0.04
'swh:1:dir:2a9145a732b61adc6ff97a182ed2be914bc700df' 133214 0.03
'swh:1:dir:0145f0b12499573f9a2600878397e47dcdf6d3cc' 133207 0.03
'swh:1:dir:6745965831928f63790580d9587fa70ca0499fad' 131010 0.03
'swh:1:dir:7d7acd7ac3143e9d8e2cf3083947ec0263439ac3' 129105 0.03
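To make "bigger than the recommended sizes" concrete, here is a minimal sketch that parses dsbulk-style partition statistics and flags partitions above a row ceiling. The ceiling of 100,000 rows is an assumption (a commonly cited Cassandra rule of thumb, not a value taken from this issue), and the helper names `parse_partition_stats` and `oversized` are hypothetical. Only the first three partitions from the output above are included.

```python
# Assumed ceiling: a commonly cited rule of thumb, not a figure from this issue.
MAX_ROWS_PER_PARTITION = 100_000

# First three lines of the dsbulk output shown above, plus the header.
DSBULK_OUTPUT = """\
id lines percent
'swh:1:dir:a00426caacd26b8fd20704887fb4d7cefcf7edc1' 153954 0.04
'swh:1:dir:a51b9a6cad80abba3f7ce4ec051ace7c2b0eeb90' 152906 0.04
'swh:1:dir:c24ff7f51209509c28ad026f06c4254f57663458' 144716 0.04
"""


def parse_partition_stats(text):
    """Return (target, rows, percent) tuples, skipping headers and blank lines."""
    stats = []
    for line in text.splitlines():
        parts = line.split()
        # Data lines have exactly three fields with a numeric row count.
        if len(parts) != 3 or not parts[1].isdigit():
            continue
        target, rows, percent = parts
        stats.append((target.strip("'"), int(rows), float(percent)))
    return stats


def oversized(stats, limit=MAX_ROWS_PER_PARTITION):
    """Return (target, rows) for partitions holding more rows than `limit`."""
    return [(target, rows) for target, rows, _ in stats if rows > limit]


stats = parse_partition_stats(DSBULK_OUTPUT)
for target, rows in oversized(stats):
    ratio = rows / MAX_ROWS_PER_PARTITION
    print(f"{target}: {rows} rows ({ratio:.2f}x the assumed limit)")
```

Row count is only a proxy here: the metadata column is a blob, so the on-disk size of a partition may matter more than the number of rows.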
cqlsh:swh> select count(*) from raw_extrinsic_metadata where target='swh:1:dir:ab964a5bd58ffce2bb24667bc08dd90b7ffaea84';
count
--------
138770
(1 rows)
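A rough projection of how this partition could grow, as a sketch only. It assumes a constant rate of 700 rows per day (the figure quoted above for some opam-related directories; whether it applies to this exact target is an assumption) and starts from the count(*) result above. The function name `projected_rows` is hypothetical.

```python
def projected_rows(current_rows, rows_per_day, days):
    """Linear projection: assumes the daily insert rate stays constant."""
    return current_rows + rows_per_day * days


CURRENT_ROWS = 138_770  # count(*) result for the target queried above
ROWS_PER_DAY = 700      # rate quoted in this issue; assumed constant here

for days in (30, 365, 5 * 365):
    total = projected_rows(CURRENT_ROWS, ROWS_PER_DAY, days)
    print(f"after {days:>4} days: {total} rows")
```

Under these assumptions the partition would hold 394270 rows after one year, which is why partitioning on target alone does not look sustainable.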