Fix crash when indexing the same directory twice with non-deterministic order
persist_index_computations deduplicated row entries based on the entire content of the row, but PostgreSQL enforces that 'id' is unique.
This was not an issue in older versions of swh-indexer, because all operations were deterministic given a specific directory as input.
The recent switch to rdflib introduced non-determinism, so different outputs may be returned for the same directory id, and content-based deduplication no longer guarantees unique ids.
With this commit, deduplication is now done on 'id', as expected.
As a side effect, persist_index_computations is now more efficient because:
- it runs in linear time instead of quadratic in the number of metadata items
- it only compares dir ids, instead of the content of indexed metadata (which is arbitrarily large JSON-like data)
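The difference between the two strategies can be sketched as follows. This is a minimal illustration, not the actual swh-indexer code: the function names, the dict-shaped rows, and the choice to keep the first row seen for each id are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List


def deduplicate_by_content(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the old behaviour: compare whole rows.

    Each membership test scans the list and compares full row content,
    so this is quadratic in the number of rows.
    """
    unique: List[Dict[str, Any]] = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique


def deduplicate_by_id(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the new behaviour: compare ids only.

    A set of already-seen ids gives a single linear pass, and the
    (arbitrarily large) metadata is never compared.
    Assumption: the first row seen for a given id is the one kept.
    """
    seen_ids = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        if row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        unique.append(row)
    return unique


# Two different outputs for the same directory id, as may happen when
# indexing is non-deterministic, plus one unrelated directory.
rows = [
    {"id": b"dir1", "metadata": {"name": "foo"}},
    {"id": b"dir1", "metadata": {"name": "foo", "license": "MIT"}},
    {"id": b"dir2", "metadata": {"name": "bar"}},
]

by_content = deduplicate_by_content(rows)  # both b"dir1" rows survive
by_id = deduplicate_by_id(rows)            # one row per id
```

With content-based deduplication both `b"dir1"` rows are kept, which is what would violate a uniqueness constraint on 'id'; with id-based deduplication only one row per id remains.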
Migrated from D8417 (view on Phabricator)