CVS loader: UnicodeEncodeError when calling 'rcsparse.rcsfile' on files that are not valid UTF-8
https://sentry.softwareheritage.org/organizations/swh/issues/9008/?referrer=phabricator_plugin
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce4' in position 54: surrogates not allowed
  File "swh/loader/core/loader.py", line 335, in load
    self.prepare()
  File "swh/loader/cvs/loader.py", line 415, in prepare
    rcsfile = rcsparse.rcsfile(filepath)  # noqa: F841
Loading failure, updating to `failed` status
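The `'\udce4'` in the message looks like a `surrogateescape` artifact: Python maps an undecodable byte `0xE4` (for example "ä" in Latin-1) to the lone surrogate U+DCE4 when it decodes a path as UTF-8, and a later strict UTF-8 encode of that string then fails. The sketch below reproduces that behavior with the standard library only. The file name is hypothetical, and whether this is exactly what happens inside `rcsparse.rcsfile` is an assumption, not something confirmed by the traceback.

```python
# Hypothetical RCS file name encoded as Latin-1 rather than UTF-8 (assumption).
raw_name = b"Pr\xe4sentation.txt,v"

# Decoding with surrogateescape (what Python does for filesystem paths)
# turns the invalid byte 0xE4 into the lone surrogate U+DCE4.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode of that string raises the error seen in the report.
try:
    name.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode("utf-8", errors="surrogateescape")
```

If this is the cause, one possible direction is to handle such paths as bytes, or to encode them with `surrogateescape` (as `os.fsencode` does), before they reach code that performs a strict encode.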
List of origins causing this:
- rsync://a.cvs.sourceforge.net/cvsroot/dynamicdraw/Doc/
- rsync://a.cvs.sourceforge.net/cvsroot/maklerdesign/Kunde/
- rsync://a.cvs.sourceforge.net/cvsroot/maklerdesign/Zwischenpraesentation_FS2/
- rsync://a.cvs.sourceforge.net/cvsroot/smtpsvr/bbs/
- rsync://a.cvs.sourceforge.net/cvsroot/aspintranet/aspintranet/
- rsync://a.cvs.sourceforge.net/cvsroot/stricqv2/stricqv2/
- rsync://a.cvs.sourceforge.net/cvsroot/epgsukapa/epgret/
- rsync://a.cvs.sourceforge.net/cvsroot/keepalivexp/kAllStatus/
- rsync://a.cvs.sourceforge.net/cvsroot/soliton/soliton/
- rsync://a.cvs.sourceforge.net/cvsroot/yast/Dokumente/
Migrated from T3980 on Phabricator