Skip to content
Snippets Groups Projects
Commit 9cf7ad9d authored by vlorentz's avatar vlorentz Committed by Antoine Lambert
Browse files

QualifiedSWHID: Fix (de)serialization of 'origin' qualifier

Having the escaped URL in `swhid.origin` is inconsistent with self.path
(which is always escaped) and never what we want, because it is only
useful while serializing, which is already handled by `__str__`.

This led to swh/devel/swh-indexer#4738
where swh-deposit parsed a qualified SWHID, then used `.origin` to get
an origin URL.

Additionally, as serialization always escapes the `origin` qualifier,
this means that deserializing then re-serializing a qualified SWHID
would double-escape it.

Finally, fixing this made the test uncover that `%` was not escaped
while serializing, while `;` was, leading to incorrect (and ambiguous)
escaped URLs.
parent f1f62388
No related branches found
No related tags found
No related merge requests found
......@@ -322,14 +322,14 @@ class QualifiedSWHID(_BaseSWHID[ObjectType]):
)
def qualifiers(self) -> Dict[str, str]:
"""Returns URL-escaped qualifiers of this SWHID, for use in serialization"""
origin = self.origin
if origin:
unescaped_origin = origin
origin = origin.replace("%", "%25")
origin = origin.replace(";", "%3B")
assert urllib.parse.unquote_to_bytes(
origin
) == urllib.parse.unquote_to_bytes(
unescaped_origin
assert (
urllib.parse.unquote(origin) == unescaped_origin
), "Escaping ';' in the origin qualifier corrupted the origin URL."
d: Dict[str, Optional[str]] = {
......@@ -377,6 +377,9 @@ class QualifiedSWHID(_BaseSWHID[ObjectType]):
"Invalid qualifier(s): %(qualifiers)s",
params={"qualifiers": ", ".join(invalid_qualifiers)},
)
if "origin" in qualifiers:
qualifiers["origin"] = urllib.parse.unquote(qualifiers["origin"])
try:
return QualifiedSWHID(**parts, **qualifiers)
except ValueError as e:
......
......@@ -490,7 +490,7 @@ QUALIFIED_SWHIDS = [
QualifiedSWHID(
object_type=ObjectType.CONTENT,
object_id=_x(HASH),
origin="https://example.org/foo%3Bbar%25baz",
origin="https://example.org/foo;bar%baz",
),
),
(
......@@ -607,15 +607,15 @@ def test_QualifiedSWHID_parse_serialize_qualifiers(string, parsed):
assert str(parsed) == string
def test_QualifiedSWHID_serialize_origin():
def test_QualifiedSWHID_deserialize_origin_extra_escapes():
"""Checks that semicolon in origins are escaped."""
string = f"swh:1:cnt:{HASH};origin=https://example.org/foo%3Bbar%25baz"
swhid = QualifiedSWHID(
object_type=ObjectType.CONTENT,
object_id=_x(HASH),
origin="https://example.org/foo;bar%25baz",
origin="https://example.org/foo;bar%baz",
)
assert str(swhid) == string
assert QualifiedSWHID.from_string(string) == swhid
def test_QualifiedSWHID_attributes():
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment