crates: Fixes and improvements

I tested the current state of the crates lister in the docker environment and found several issues to fix and improvements to make.

Below is the commit log of the changes.

commit 6dd62b35f8b690559f639c8a375d93855aaced94 (HEAD -> crates-lister-fixes, anlambert/crates-lister-fixes)
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Fri Aug 23 11:46:24 2024 +0200

    crates: Remove crates metadata as loader argument
    
    This extrinsic metadata can be fetched directly by the loader
    through the crates Web API, which also provides more metadata fields.

commit 0af8a332a5bc0d8fc3cfc49df8009eb524153cbd
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Aug 22 16:38:44 2024 +0200

    crates: Speed up listing by processing crates in batch
    
    Instead of having a single crate and its versions info per page,
    prefer to have up to 1000 crates per page to significantly speed up
    the listing process.
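
The batching idea can be sketched as follows. This is only an illustration, not the lister's actual code: the helper name `batched` is hypothetical, and the default page size of 1000 is taken from the commit message.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield successive pages holding at most `size` items each.

    Hypothetical helper: the real lister may build its pages differently.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        # Pull the next `size` items without materializing the whole input.
        page = list(islice(iterator, size))
        if not page:
            return
        yield page
```

With 2500 crates and the default size, this yields three pages instead of 2500 single-crate pages.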

commit 7ca067e82e4a19e5381ccc45579c019ad6963ec2
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Aug 21 16:12:37 2024 +0200

    crates: Record lister state only if all crates were processed
    
    Previously, the lister state was recorded even if errors occurred
    when listing crates, because the finalize method is called whether or
    not an exception was raised during listing.
    
    As a consequence some crates could be missed as the incremental listing
    restarts from the dump date of the last processed crate database.
    
    So ensure all crates have been processed by the lister before recording
    its state.
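
A minimal sketch of that guard, assuming a simplified stand-in class: `SketchLister`, `all_processed` and `recorded_state` are hypothetical names and do not reflect the real lister API. The point is that finalization still runs after a failure but only records the state when the flag was set.

```python
from typing import Callable, Iterable, List, Optional


class SketchLister:
    """Hypothetical stand-in for a lister; not the real lister API."""

    def __init__(
        self,
        pages: Iterable[List[str]],
        process_page: Callable[[List[str]], None],
    ) -> None:
        self.pages = pages
        self.process_page = process_page
        self.all_processed = False
        self.recorded_state: Optional[str] = None

    def run(self, dump_date: str) -> None:
        try:
            for page in self.pages:
                self.process_page(page)
            # Only reached when no page raised an exception.
            self.all_processed = True
        finally:
            # finalize is called whether or not listing failed.
            self.finalize(dump_date)

    def finalize(self, dump_date: str) -> None:
        # Record the state only after a complete listing, so that the
        # next incremental run does not skip unprocessed crates.
        if self.all_processed:
            self.recorded_state = dump_date
```

If `process_page` raises, the exception still propagates, but `recorded_state` stays `None` and the next incremental run restarts from the previously recorded dump date.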

commit e1f9ec540c66e4e05627f54f7354c785c33806fd
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Aug 21 15:17:28 2024 +0200

    crates: Use looseversion.LooseVersion2 to parse crate versions
    
    packaging.version.parse is dedicated to parsing Python package version
    numbers but crate versions do not necessarily respect Python version
    number conventions and thus some crate versions cannot be parsed.
    
    Prefer to use looseversion.LooseVersion2 instead, which is a drop-in
    replacement for the deprecated distutils.version.LooseVersion and can
    parse all kinds of version numbers.
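
To illustrate what "loose" parsing means here, below is a stdlib-only sketch of a permissive sort key. It is neither the `looseversion` algorithm nor the lister's code: `loose_key` is a hypothetical helper, and its ordering of pre-release suffixes does not follow semver precedence rules.

```python
import re
from typing import List, Tuple, Union

# Runs of ASCII digits or ASCII letters; separators such as ".", "-"
# and "+" are simply skipped.
_COMPONENT = re.compile(r"[0-9]+|[A-Za-z]+")


def loose_key(version: str) -> List[Tuple[int, Union[int, str]]]:
    """Build a sort key that never fails on an unusual version string.

    Each component is tagged so that integers and strings are never
    compared with each other directly, which would raise a TypeError.
    """
    key: List[Tuple[int, Union[int, str]]] = []
    for part in _COMPONENT.findall(version):
        if part.isdigit():
            key.append((1, int(part)))  # numeric components compare as ints
        else:
            key.append((0, part))  # alphabetic components compare as strings
    return key
```

Used as `sorted(versions, key=loose_key)`, numeric components are compared as integers, so `0.10.0` sorts after `0.9.1`, and any string yields a key instead of raising a parse error.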

commit 6c16aeea7ed4dadc7ac309fe4b3ce33b46e9d36c
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Wed Aug 21 13:46:03 2024 +0200

    crates: Bump csv field size limit
    
    A size limit of 1000000 was not enough to properly process
    all CSV crates data so bump to a higher value.
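
For reference, the limit is controlled by `csv.field_size_limit` in the Python standard library, which sets a new limit and returns the previous one. The sketch below uses a hypothetical helper `read_rows` and made-up sample data; the value the lister actually uses is not stated in the commit message.

```python
import csv
import io
from typing import List


def read_rows(text: str, limit: int) -> List[List[str]]:
    """Parse CSV text with a temporary field size limit.

    Hypothetical helper for illustration only.
    """
    # field_size_limit returns the limit that was in effect before the call.
    previous = csv.field_size_limit(limit)
    try:
        return list(csv.reader(io.StringIO(text)))
    finally:
        # Restore the previous limit so other parsers are unaffected.
        csv.field_size_limit(previous)
```

A field longer than the limit in effect makes `csv.reader` raise `csv.Error`, which is why large fields in the crates CSV data required a higher value.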