cassandra: Split author/committer/date/committer_date into individual columns (swh-storage!1156)

Open: vlorentz requested to merge `no-person-udt` into `master` 4 months ago

Mar 18, 2025
- Fix test_object_delete · 1c0b6203
  vlorentz authored 1 month ago
- Fix test_storage_replay_anonymized · 5fb4dfa4
  vlorentz authored 1 month ago
Mar 17, 2025
- Add 2025-03-17_flatten_person_udt_replay pseudo-migration · f0e74f58
  vlorentz authored 1 month ago
- Add migration · c731beeb
  vlorentz authored 1 month ago
- Merge branch 'master' into no-person-udt · 0502834e
  vlorentz authored 1 month ago
- Merge branch 'master' into no-person-udt · 24cb1c7b
  vlorentz authored 1 month ago
Dec 23, 2024

- Change type of minimal_revision from hg to git · 51d8725c
  vlorentz authored 3 months ago

  test_extid_add_hg expects all hg revisions to have a 'node' extra header,
  which minimal_revision does not have.
- Fix rebase · 75d167a2
  vlorentz authored 3 months ago
- Flatten dates too + add tests for nulls · 49e06138
  vlorentz authored 4 months ago and vlorentz committed 3 months ago
- Fix docstring · f5aaa1f5
  vlorentz authored 4 months ago and vlorentz committed 3 months ago

- cassandra: Split author/committer into individual columns · f61d649c
  vlorentz authored 4 months ago and vlorentz committed 3 months ago

Cassandra does not support filtering on individual fields of a UDT, because it treats
the whole structure as a single value.

However, the infra team needs to filter on author.email and committer.email,
so those fields need columns of their own.

This commit reads and writes the new split columns, but still reads the UDT as
a fallback for rows that have not been migrated yet. The fallback will be removed
once all rows have been migrated.
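The read/write behaviour described above can be sketched as follows. This is a minimal sketch, not the code from this merge request: `Person`, `person_to_columns` and `person_from_row` are hypothetical names, rows are modelled as plain dicts, and it assumes a row counts as migrated when its `<prefix>_fullname` column is set.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical stand-in for an author/committer value."""

    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]


def person_to_columns(prefix: str, person: Optional[Person]) -> Dict[str, Optional[bytes]]:
    """Write path: flatten a person into <prefix>_fullname/_name/_email columns."""
    return {
        f"{prefix}_fullname": person.fullname if person is not None else None,
        f"{prefix}_name": person.name if person is not None else None,
        f"{prefix}_email": person.email if person is not None else None,
    }


def person_from_row(prefix: str, row: Dict[str, Any]) -> Optional[Person]:
    """Read path: prefer the split columns, else fall back to the legacy UDT column."""
    fullname = row.get(f"{prefix}_fullname")
    if fullname is not None:
        # Row written (or rewritten) by the new code: use the split columns.
        return Person(fullname, row.get(f"{prefix}_name"), row.get(f"{prefix}_email"))
    legacy = row.get(prefix)  # e.g. row["author"], the old UDT value (assumed dict-like)
    if legacy is None:
        return None  # no person stored at all
    return Person(legacy["fullname"], legacy["name"], legacy["email"])


# Usage: an old UDT-only row and a new split-column row decode to the same Person.
jane = Person(b"Jane Doe <jane@example.org>", b"Jane Doe", b"jane@example.org")
old_row = {"author": {"fullname": jane.fullname, "name": jane.name, "email": jane.email}}
new_row = person_to_columns("author", jane)
```

Keeping the fallback inside the read path means the code can be deployed before any row is rewritten; once every row has its split columns, the `legacy` branch becomes dead code and can be dropped.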

Migration plan:

1. add the new columns:
   ```
   ALTER TABLE revision
   ADD (
       author_fullname                 blob,
       author_name                     blob,
       author_email                    blob,
       committer_fullname              blob,
       committer_name                  blob,
       committer_email                 blob
   );
   ALTER TABLE release
   ADD (
       author_fullname                 blob,
       author_name                     blob,
       author_email                    blob
   );
   ```

2. update Python code and restart

3. run a replayer on `revision` and `release` objects without a filtering proxy,
   so that rows which already exist are rewritten with the new columns populated
   (a filtering proxy would skip them as already present)
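Why step 3 must bypass filtering can be shown with a toy model. This is a sketch under stated assumptions, not the swh-storage replayer: the table is a dict keyed by revision id, `write_revision` and `replay` are hypothetical names, and it assumes a filtering proxy drops every object whose id is already in storage. Writes are modelled as upserts, as with a Cassandra `INSERT`, so columns that are not written keep their old values.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical journal entry: (revision id, author fullname, author name, author email)
JournalRevision = Tuple[bytes, bytes, bytes, bytes]
# Toy table: revision id -> row, where a row maps column name -> value
Table = Dict[bytes, Dict[str, object]]


def write_revision(table: Table, rev: JournalRevision) -> None:
    """Updated write path: set the split author_* columns (upsert semantics)."""
    rev_id, fullname, name, email = rev
    # Columns not written here, such as the legacy "author" UDT, are left as-is.
    table.setdefault(rev_id, {}).update(
        author_fullname=fullname, author_name=name, author_email=email
    )


def replay(table: Table, journal: Iterable[JournalRevision], filtering: bool) -> None:
    """Re-apply journal objects to the table, optionally skipping known ids."""
    for rev in journal:
        if filtering and rev[0] in table:
            # Assumed filtering-proxy behaviour: already-present objects are
            # dropped, so legacy rows never get their split columns.
            continue
        write_revision(table, rev)


# Usage: the same legacy row, replayed with and without filtering.
journal = [(b"rev1", b"Jane Doe <jane@example.org>", b"Jane Doe", b"jane@example.org")]
legacy_udt = {"fullname": b"Jane Doe <jane@example.org>", "name": b"Jane Doe", "email": b"jane@example.org"}

filtered: Table = {b"rev1": {"author": legacy_udt}}
replay(filtered, journal, filtering=True)

unfiltered: Table = {b"rev1": {"author": legacy_udt}}
replay(unfiltered, journal, filtering=False)
```

With filtering, the legacy row keeps only its UDT; without it, the row gains the split columns while the UDT stays readable for the fallback.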
