Type swh-storage endpoints with swh.model objects
TL;DR
Our data is typed in the database, but it is not once consumed from swh-storage's internal api. This results in too much conversion code in the consumers of swh-storage's internal api (mainly the api). Having types would help.
Details
The data structures returned by swh-storage can be dictionaries, lists, bytes, ints, dates, etc. Some of them are not natively serializable in formats such as JSON or YAML.
The connection between the api's client and swh-storage's internal api uses custom types for data that is not natively serializable (bytes, datetime, etc.), so that such data can transit.
However, as soon as the api's swh-storage client has consumed the data, we are back where we started: with possibly non-serializable data structures (bytes, dates, etc.).
It is then up to the api to convert those data structures into something serializable (JSON, YAML, etc.) before returning the results to api consumers.
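A minimal sketch of such a transit encoding, assuming JSON on the wire and a hypothetical __type__ tag (the real swh-storage serialization format may use different tags and encodings). It also illustrates the problem: once the client has decoded the payload, it holds bytes and datetime values again, which json.dumps rejects.

```python
import base64
import json
from datetime import datetime, timezone

# Hypothetical tag key; the real swh-storage wire format may differ.
TYPE_TAG = "__type__"


def encode_value(obj):
    """json.dumps `default` hook: wrap non-JSON types in a tagged dict."""
    if isinstance(obj, bytes):
        return {TYPE_TAG: "bytes", "value": base64.b64encode(obj).decode("ascii")}
    if isinstance(obj, datetime):
        return {TYPE_TAG: "datetime", "value": obj.isoformat()}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode_value(dct):
    """json.loads `object_hook`: turn tagged dicts back into native values."""
    tag = dct.get(TYPE_TAG)
    if tag == "bytes":
        return base64.b64decode(dct["value"])
    if tag == "datetime":
        return datetime.fromisoformat(dct["value"])
    return dct


def dumps(data):
    return json.dumps(data, default=encode_value)


def loads(text):
    return json.loads(text, object_hook=decode_value)


if __name__ == "__main__":
    revision = {
        "id": bytes.fromhex("aafb16d69fd30ff58afdd69036a26047f3aebdc6"),
        "date": datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc),
    }
    decoded = loads(dumps(revision))  # server encodes, client decodes
    assert decoded == revision
    try:
        json.dumps(decoded)  # what the api would have to do next
    except TypeError as exc:
        print("api must convert again:", exc)
```

The round trip is lossless, which is the point of the custom types, but it hands the api exactly the values it cannot return as-is.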
Implementation-wise, the api currently converts values based on key names. This is not a sustainable model: whenever a new key with non-serializable data appears (and it will), the code base has to be updated to handle it. Furthermore, this is handled per endpoint type ({content, directory, revision, release, occurrence, etc.}), so if the same key appears in several endpoints, there is even more to adapt.
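To make the fragility concrete, here is a minimal sketch of key-name-based conversion. The per-endpoint key lists and the hex/ISO 8601 output formats are assumptions for illustration, not the api's actual code. A key missing from the lists (here the hypothetical new_key) passes through as raw bytes, and json.dumps then fails.

```python
import json
from datetime import datetime, timezone

# Hypothetical per-endpoint key lists; each new key must be added by hand.
HEX_KEYS = {
    "content": {"sha1", "sha1_git", "sha256"},
    "revision": {"id", "directory", "parents"},
}
DATE_KEYS = {
    "revision": {"date", "committer_date"},
}


def convert_by_key(endpoint, data):
    """Convert values whose key name is known to hold bytes or dates."""
    out = {}
    for key, value in data.items():
        if key in HEX_KEYS.get(endpoint, set()):
            out[key] = [v.hex() for v in value] if isinstance(value, list) else value.hex()
        elif key in DATE_KEYS.get(endpoint, set()):
            out[key] = value.isoformat()
        else:
            out[key] = value  # unknown key: passed through untouched
    return out


if __name__ == "__main__":
    revision = {
        "id": b"\xaa\xfb",
        "parents": [b"\x01"],
        "date": datetime(2017, 3, 1, tzinfo=timezone.utc),
        "new_key": b"\x00",  # hypothetical key nobody registered yet
    }
    converted = convert_by_key("revision", revision)
    try:
        json.dumps(converted)
    except TypeError as exc:
        print("unhandled key:", exc)
```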
If the output of swh-storage stayed typed until after it has been consumed from swh-storage's internal api, we could transform that data generically according to its type.
We would then only need to update the code base when a new type arises (which should be rarer than a new key with non-serializable values).
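The type-based alternative can be sketched with functools.singledispatch from the Python standard library: one handler per type, applied recursively, regardless of key names or endpoint. Encoding bytes as hex and datetimes as ISO 8601 strings is an assumption here; the api may choose other representations.

```python
import json
from datetime import datetime, timezone
from functools import singledispatch


@singledispatch
def to_serializable(value):
    """Fallback: leave other values (str, int, None, ...) unchanged."""
    return value


@to_serializable.register
def _(value: bytes):
    return value.hex()  # assumption: bytes are exposed as hex strings


@to_serializable.register
def _(value: datetime):
    return value.isoformat()


@to_serializable.register
def _(value: dict):
    return {key: to_serializable(item) for key, item in value.items()}


@to_serializable.register
def _(value: list):
    return [to_serializable(item) for item in value]


if __name__ == "__main__":
    revision = {
        "id": b"\xaa\xfb",
        "parents": [b"\x01"],
        "date": datetime(2017, 3, 1, tzinfo=timezone.utc),
        "new_key": b"\x00",  # handled without any code change
    }
    print(json.dumps(to_serializable(revision)))
    # → {"id": "aafb", "parents": ["01"], "date": "2017-03-01T00:00:00+00:00", "new_key": "00"}
```

A new key needs no change at all; a new type means registering one more handler.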
Note: I think this is swh-model's goal, but it has not been pushed that far yet.
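Typing the endpoints themselves, as the title suggests, could look like the following sketch. Revision is a hypothetical dataclass standing in for a swh.model object, and InMemoryStorage.revision_get is an illustrative endpoint, not the actual swh-storage API. Because the consumer receives typed objects, the conversion only has to know about types (model objects, bytes, datetimes, containers).

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Revision:
    """Hypothetical stand-in for a swh.model revision object."""
    id: bytes
    directory: bytes
    parents: List[bytes]
    date: datetime


class InMemoryStorage:
    """Hypothetical storage whose endpoint is typed with model objects."""

    def __init__(self, revisions: Iterable[Revision]):
        self._revisions: Dict[bytes, Revision] = {r.id: r for r in revisions}

    def revision_get(self, ids: List[bytes]) -> Iterator[Optional[Revision]]:
        # One result per requested id, None when the id is unknown.
        for rev_id in ids:
            yield self._revisions.get(rev_id)


def to_serializable(value):
    """Generic conversion driven by the value's type, including model objects."""
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_serializable(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, bytes):
        return value.hex()
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (list, tuple)):
        return [to_serializable(v) for v in value]
    if isinstance(value, dict):
        return {k: to_serializable(v) for k, v in value.items()}
    return value


if __name__ == "__main__":
    rev = Revision(
        id=b"\x01" * 20,
        directory=b"\x02" * 20,
        parents=[],
        date=datetime(2017, 3, 1, tzinfo=timezone.utc),
    )
    storage = InMemoryStorage([rev])
    results = list(storage.revision_get([rev.id, b"\x00" * 20]))
    print(json.dumps(to_serializable(results)))
```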
Migrated from T645