npm: Add workaround for mangled package descriptions

Null bytes in JSON produced by indexers cause the indexer-storage to crash (#4277). Mangled npm package descriptions seem to be the only current source of such bytes, so this should fix the issue for now.

A future commit will sanitize all JSON documents before storage.
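
That sanitization is not part of this MR. As a rough sketch only, assuming indexer documents are plain JSON-compatible Python values (dicts, lists, strings, numbers, booleans, None), a recursive pass could drop U+0000 from every string, keys included, before the document reaches PostgreSQL. The names `strip_nul` and `to_storable_json` are hypothetical and are not swh-indexer APIs.

```python
import json
from typing import Any


def strip_nul(value: Any) -> Any:
    """Hypothetical helper: recursively remove U+0000 from every string
    (dict keys included) in a JSON-compatible value."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, list):
        return [strip_nul(item) for item in value]
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    return value  # numbers, booleans and None cannot contain NUL


def to_storable_json(document: Any) -> str:
    """Hypothetical helper: serialize a document only after sanitizing it,
    so no "\\u0000" escape ends up in the JSON text."""
    return json.dumps(strip_nul(document))
```

Dropping the character is one possible policy; replacing it with U+FFFD would keep string lengths intact. Note that keys differing only by NUL bytes collapse into one after stripping, and the last one wins.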


Migrated from D7992 (Phabricator)

Activity

```python
# is a common mistake, which indicates a README.md was saved as UTF-16,
# and some NPM tool opened it as UTF-8 and used the first line as
# description.

description_bytes = description.encode()

# Strip the two Unicode replacement characters
assert description_bytes.startswith(b"\xef\xbf\xbd\xef\xbf\xbd")
description_bytes = description_bytes[6:]

# If the following attempts fail to recover the description, discard it
# entirely because the current indexer storage backend (postgresql) cannot
# store zero bytes in JSON columns.
description = None

if not description_bytes.startswith(b"\x00"):
```

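For readers unfamiliar with this failure mode, here is a hedged sketch of both directions. `simulate_mangling` reproduces what the comment above describes (a UTF-16 README, BOM included, decoded as UTF-8 with replacement, first line kept), and `recover_description` tries to undo it. Both helpers are hypothetical and are not the code merged here. They assume the first line is ASCII, because any non-ASCII character has already been replaced by U+FFFD and cannot be recovered. Our guess is that the `startswith(b"\x00")` check above separates byte orders in a similar way: in big-endian UTF-16, an ASCII character starts with a zero byte.

```python
import codecs
from typing import Optional

# A UTF-16 byte-order mark (FF FE or FE FF) decoded as UTF-8 with
# errors="replace" yields one U+FFFD per byte, i.e. two of them.
MANGLED_PREFIX = "\ufffd\ufffd"


def simulate_mangling(readme: str, big_endian: bool = False) -> str:
    """Hypothetical: save README as UTF-16 with a BOM, read it back as
    UTF-8, and keep only the first line as the description."""
    if big_endian:
        raw = codecs.BOM_UTF16_BE + readme.encode("utf-16-be")
    else:
        raw = codecs.BOM_UTF16_LE + readme.encode("utf-16-le")
    return raw.decode("utf-8", errors="replace").split("\n", 1)[0]


def recover_description(description: str) -> Optional[str]:
    """Hypothetical: undo the mangling if possible, else return None."""
    if not description.startswith(MANGLED_PREFIX):
        return description  # not mangled in this way; keep as-is
    rest = description[len(MANGLED_PREFIX):]
    if "\ufffd" in rest:
        return None  # non-ASCII bytes were already replaced; data is lost
    raw = rest.encode()  # no replacement happened, so these are the original bytes
    if raw.startswith(b"\x00"):
        # Big-endian: an ASCII char is 00 xx. Splitting on the 0x0A byte of
        # the newline left its leading 0x00 at the end, so drop an odd byte.
        encoding = "utf-16-be"
        raw = raw[: len(raw) - len(raw) % 2]
    else:
        encoding = "utf-16-le"
    try:
        recovered = raw.decode(encoding).strip()
    except UnicodeDecodeError:
        return None  # odd length or lone surrogate: not valid UTF-16
    if not recovered or "\x00" in recovered:
        return None  # still unusable; PostgreSQL JSON cannot store NUL
    return recovered
```

For example, `recover_description(simulate_mangling("# my-package\nSome text"))` returns `"# my-package"`, while a first line containing `é` comes back as `None`, matching the "discard it entirely" fallback described in the comment.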
  • mentioned in issue #4277 (closed)

  • lgtm, one remark inline.

  • Merge request was accepted

  • Antoine R. Dumont approved this merge request

  • Author Maintainer

    add missing case

  • Build is green

    Patch application report for D7992 (id=28808)

    Rebasing onto 71046713...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 62db0cb2086be7a4c6c48f6488fb425485e56093
    Author: Valentin Lorentz <vlorentz@softwareheritage.org>
    Date:   Wed Jun 15 18:29:09 2022 +0200
    
        npm: Add workaround for mangled package descriptions
        
        Null bytes in JSON produced by indexers cause the indexer-storage to crash,
        and this case seems to be the only current source of such crashes;
        so this should fix the issue for now.
        
        A future commit will sanitize all JSON documents before storage.

    See https://jenkins.softwareheritage.org/job/DCIDX/job/tests-on-diff/259/ for more details.

  • Author Maintainer

    Merge request was merged

  • closed
