
Store content -> revision cache in Azure Table storage

We've been hitting PostgreSQL limitations when storing the content -> revision cache. Azure Table storage looks like a suitable candidate for that cache.

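Table storage provides a schemaless storage API built around a compound primary key made of a PartitionKey and a RowKey. Entities are clustered by PartitionKey, and query results are ordered by RowKey within a partition. Each entity can have up to 255 properties (a count that includes the system properties PartitionKey, RowKey and Timestamp) and weigh up to 1 MiB.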

A good candidate for PartitionKey would be the content identifier (well distributed, except for corner cases). We still need to figure out a RowKey that is intrinsic to the entry being stored (its properties are the revision identifier and the path) and that gives us a relevant ordering for files with multiple entries.
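To make the discussion concrete, here is a minimal sketch of one possible key layout. It is an assumption for illustration only, not a decided design: the function name `make_entity`, the use of hex-encoded identifiers, and the choice of "revision id, then a SHA-1 of the path" for the RowKey are all hypothetical. In particular, the resulting ordering (by revision identifier) is a placeholder and may not be the "relevant ordering" we actually want.

```python
import hashlib


def make_entity(content_id: bytes, revision_id: bytes, path: bytes) -> dict:
    """Build a table entity for one content -> revision cache entry.

    Hypothetical layout (an assumption, not a settled design):
    - PartitionKey: hex of the content identifier
    - RowKey: hex of the revision identifier, then a hash of the path,
      so the key depends only on the entry itself
    """
    partition_key = content_id.hex()
    # Hash the path instead of embedding it: paths are arbitrary bytes
    # and may contain characters that are not allowed in keys.
    path_digest = hashlib.sha1(path).hexdigest()
    row_key = f"{revision_id.hex()}_{path_digest}"
    return {
        "PartitionKey": partition_key,
        "RowKey": row_key,
        # The original values are kept as regular properties.
        "revision": revision_id.hex(),
        "path": path,
    }
```

With this layout, two entries for the same content and revision but different paths get distinct RowKeys, and re-inserting the same entry always maps to the same key.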

Limitations:

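PartitionKey and RowKey are strings, and a number of characters aren't allowed in them, control characters among them. The forward slash is also disallowed, so a raw path cannot be used as a key. Sticking to a restricted ASCII encoding is probably the safest option. Each key can be up to 1 KiB in size.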
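The limitations above can be checked before writing, for instance with a small validator like the sketch below. The helper name `is_valid_key` is hypothetical. The character set is deliberately stricter than what the service requires (printable ASCII only, following the "use some kind of ASCII" idea), and the extra forbidden characters (`/`, `\`, `#`, `?`) are taken from the Table service documentation as I remember it, so they are worth double-checking. How the service counts the 1 KiB limit is not verified here; the sketch simply counts ASCII characters.

```python
# Characters rejected in keys besides control characters (assumption based
# on the Table service documentation; verify before relying on it).
FORBIDDEN_CHARS = set("/\\#?")


def is_valid_key(key: str, max_size: int = 1024) -> bool:
    """Return True if key looks safe to use as a PartitionKey or RowKey.

    Conservative rule: only printable ASCII (0x20..0x7E), none of the
    forbidden characters, and at most max_size characters.
    """
    if len(key) > max_size:
        return False
    for ch in key:
        code = ord(ch)
        if code < 0x20 or code > 0x7E:
            # Rejects control characters and anything outside ASCII.
            return False
        if ch in FORBIDDEN_CHARS:
            return False
    return True
```

A hex-encoded identifier passes this check, while a raw path such as `src/main.c` does not, which is one reason to hash or otherwise encode the path before putting it in a key.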

Resources:


Migrated from T598 (view on Phabricator)
