diff --git a/docs/graph/dataset.rst b/docs/graph/dataset.rst
index 2071258b0557eccccb7da0144136f61286fe86ea..838802a11a09c51a8d11ec84cecc6d51a9ec81ee 100644
--- a/docs/graph/dataset.rst
+++ b/docs/graph/dataset.rst
@@ -284,8 +284,16 @@ A full export of the graph dated from March 2021.
 2020-12-15
 ~~~~~~~~~~
 
-A full export of the graph dated from December 2020. Only available in
-compressed representation.
+A full export of the graph dated from December 2020.
+
+This export has a CSV representation of nodes and edges instead of columnar:
+
+* edges as :file:`graph.edges.{cnt,ori,rel,rev,snp}.csv.zst` and
+  :file:`graph.edges.dir.{00..21}.csv.zst`
+* nodes as :file:`graph.nodes.csv.zst`
+* deduplicated labels as :file:`graph.labels.csv.zst`
+* statistics as :file:`graph.edges.count.txt`, :file:`graph.edges.stats.txt`,
+  :file:`graph.labels.count.txt`, :file:`graph.nodes.count.txt`, and :file:`graph.nodes.stats.txt`
 
 - **Compressed graph**:
 
@@ -293,6 +301,9 @@ compressed representation.
     <https://annex.softwareheritage.org/public/dataset/graph/2020-12-15/compressed/>`_
   - **S3**: ``s3://softwareheritage/graph/2020-12-15/compressed``
 
+- **Edges**:
+  - **S3**: ``s3://softwareheritage/graph/2020-12-15/edges``
+
 .. _graph-dataset-2020-05-20:
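
The edge file names added in this diff use a brace shorthand. As a rough illustration, the sketch below expands ``graph.edges.{cnt,ori,rel,rev,snp}.csv.zst`` and ``graph.edges.dir.{00..21}.csv.zst`` into explicit file names. It assumes the braces follow shell-style brace expansion, with ``{00..21}`` meaning the zero-padded, inclusive range 00 through 21. The function name ``edge_files`` is hypothetical and not part of any Software Heritage tooling.

```python
from typing import List

# Suffixes listed in the diff as {cnt,ori,rel,rev,snp}.
EDGE_KINDS = ["cnt", "ori", "rel", "rev", "snp"]

# Assumption: {00..21} is an inclusive, zero-padded range (22 shards).
DIR_SHARDS = range(0, 22)


def edge_files() -> List[str]:
    """Return the edge CSV file names implied by the brace patterns."""
    per_kind = [f"graph.edges.{kind}.csv.zst" for kind in EDGE_KINDS]
    dir_shards = [f"graph.edges.dir.{i:02d}.csv.zst" for i in DIR_SHARDS]
    return per_kind + dir_shards


if __name__ == "__main__":
    for name in edge_files():
        print(name)
```

Under that reading, the two patterns cover 27 edge files in total: five single files plus 22 ``dir`` shards.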