origin_search: Filters and sorting for date_{created,modified,published}

intrinsic_metadata often contains date_{created,modified,published} which can be used as sorting options as well as filters.
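
A rough sketch of what filtering and sorting on one of these dates could look like, using plain Python over in-memory dicts rather than the real search backend. The `search` function, the `min_date_created` parameter, the `-` prefix for descending order, and the flat `date_created` key are all assumptions made for illustration; they are not taken from this merge request.

```python
from datetime import date

# Placeholder used when an origin has no date in its metadata (assumption).
MISSING = "0001-01-01"

# Hypothetical documents; the real index stores expanded JSON-LD metadata.
ORIGINS = [
    {"url": "https://example.org/a", "date_created": "2021-01-15"},
    {"url": "https://example.org/b", "date_created": "2019-06-01"},
    {"url": "https://example.org/c"},  # no date_created at all
]


def search(origins, min_date_created=None, sort_by=None):
    """Filter on a lower bound for date_created, then sort.

    A leading "-" in a sort_by entry requests descending order.
    """
    results = list(origins)
    if min_date_created is not None:
        bound = date.fromisoformat(min_date_created)
        results = [
            o for o in results
            if "date_created" in o
            and date.fromisoformat(o["date_created"]) >= bound
        ]
    # Apply sort keys from last to first; list.sort is stable, so the
    # first entry of sort_by ends up as the primary key.
    for key in reversed(sort_by or []):
        descending = key.startswith("-")
        field = key.lstrip("-")
        results.sort(
            key=lambda o: date.fromisoformat(o.get(field, MISSING)),
            reverse=descending,
        )
    return results


newest_first = search(ORIGINS, sort_by=["-date_created"])
recent_only = search(ORIGINS, min_date_created="2020-01-01")
```

Origins without the field sort as if they carried the earliest possible date, and they are excluded as soon as a date filter is applied.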


Migrated from D5964 (view on Phabricator)

Activity

          # don't bother indexing tokens in these URIs, as the
          # are used as namespaces
          "type": "keyword",
    -     }
    +     },
    +     "http://schema": {
    +         "properties": {
    +             "org/dateCreated": {
  • Build has FAILED

    Patch application report for D5964 (id=21499)

    Rebasing onto f378a989...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 31b0f67cc99e49d0c398960ed93ac4d9c5134931
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Mon Jul 5 16:17:23 2021 +0000
    
        origin_search: Filters and sorting for date_{created,modified,published}

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/190/
    See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/190/console

  • Author Contributor

    ! In !57 (closed), @vlorentz wrote: Can you either add tests, or deduplicate this code so we don't need to test every field?

    By "deduplicate this code so we don't need to test every field", do you mean common code and tests for all the date fields (last_visit, last_release, ... datePublished, dateModified, ...)?

  • Author Contributor
    • elasticsearch.py: Use "lenient: true"
    • origin_search: Validate intrinsic_metadata date field format before storing
    • test_search: Fix failing tests
  • Build is green

    Patch application report for D5964 (id=21505)

    Rebasing onto f378a989...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 976c7229d2221cbdc82517ab7a24d121ad0ced62
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 6 09:44:27 2021 +0000
    
        origin_search: Validate intrinsic_metadata date field format before storing
    
    commit 31b0f67cc99e49d0c398960ed93ac4d9c5134931
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Mon Jul 5 16:17:23 2021 +0000
    
        origin_search: Filters and sorting for date_{created,modified,published}

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/191/ for more details.

  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  •   # * {"author": [{"@value": "Jane Doe"}]}
      # and JSON-LD expansion will convert them all to the last one.
      if "intrinsic_metadata" in res:
    -     res["intrinsic_metadata"] = codemeta.expand(res["intrinsic_metadata"])
    +     intrinsic_metadata = res["intrinsic_metadata"]
    +     for date_field in ["dateCreated", "dateModified", "datePublished"]:
    +         if date_field in intrinsic_metadata:
    +             date = intrinsic_metadata[date_field]
    +
    +             # If date{Created,Modified,Published} value isn't parsable
    +             # It gets rejected and isn't stored (unlike other fields)
    +             if not is_date_parsable(date):
    +                 intrinsic_metadata.pop(date_field)
    +
    +     res["intrinsic_metadata"] = codemeta.expand(intrinsic_metadata)
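
    The rejection step in this hunk can be tried in isolation with a sketch like the one below. It leaves out `codemeta.expand` entirely and returns a filtered copy instead of mutating the input; `drop_unparsable_dates` and `DATE_FIELDS` are hypothetical names, and the `%Y-%m-%d` check mirrors only the first iteration of `is_date_parsable` shown in this review.

```python
from datetime import datetime

# The three codemeta date fields handled by the diff above.
DATE_FIELDS = ("dateCreated", "dateModified", "datePublished")


def is_date_parsable(date_str):
    """Return True if date_str matches %Y-%m-%d, False otherwise."""
    try:
        datetime.strptime(date_str, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        # TypeError covers non-string values such as None or lists.
        return False


def drop_unparsable_dates(metadata):
    """Return a copy of metadata without date fields that fail to parse.

    Hypothetical helper: other fields are kept untouched, whatever
    their value.
    """
    return {
        key: value
        for key, value in metadata.items()
        if key not in DATE_FIELDS or is_date_parsable(value)
    }


cleaned = drop_unparsable_dates(
    {
        "name": "example",
        "dateCreated": "2021-07-05",
        "dateModified": "last week",
        "datePublished": None,
    }
)
```

    Only the date fields are subject to rejection here, which matches the "unlike other fields" remark in the comment of the diff.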
  • Author Contributor
    • origin_update: Document rejection of metadata date fields if not parsable
  • Build is green

    Patch application report for D5964 (id=21531)

    Rebasing onto f378a989...

    Current branch diff-target is up to date.
    Changes applied before test
    commit dc52c981fb953add9be87f464a0921ea6601bc02
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Wed Jul 7 11:27:35 2021 +0000
    
        origin_update: Document rejection of metadata date fields if not parsable
    
    commit 976c7229d2221cbdc82517ab7a24d121ad0ced62
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 6 09:44:27 2021 +0000
    
        origin_search: Validate intrinsic_metadata date field format before storing
    
    commit 31b0f67cc99e49d0c398960ed93ac4d9c5134931
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Mon Jul 5 16:17:23 2021 +0000
    
        origin_search: Filters and sorting for date_{created,modified,published}

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/192/ for more details.

  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  •     if sep:
            return sep.join(METADATA_FIELDS[field])

        return METADATA_FIELDS[field]


    def is_date_parsable(date_str):
        """
        Return True if date_str is in the format
        %Y-%m-%d or the standard ISO format.
        Otherwise return False.
        """
        try:
            datetime.strptime(date_str, "%Y-%m-%d")
            return True
        except Exception:
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • from datetime import datetime
  • vlorentz
    vlorentz @vlorentz started a thread on the diff
  • +
          if field == "score":
              if reversed:
                  return -origin.get(field, 0)
              else:
                  return origin.get(field, 0)

    -     datetime_max = datetime.max.replace(tzinfo=timezone.utc)
    +     if field in ["date_created", "date_modified", "date_published"]:
    +         date = datetime.strptime(
    +             _nested_get(origin, get_expansion(field), DATE_MIN)[0], "%Y-%m-%d"
    +         )
    +         if reversed:
    +             return DATE_OBJ_MAX - date
    +         else:
    +             return date
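
    The reversed-key trick in this hunk can be illustrated standalone. The sketch below reads the date from a flat dict instead of going through `_nested_get` and `get_expansion`, and the values chosen for `DATE_MIN` and `DATE_OBJ_MAX` are assumptions, since their definitions are not visible in this thread.

```python
from datetime import datetime

# Assumed values; the real constants are defined outside the quoted hunk.
DATE_MIN = "0001-01-01"      # stands in for origins with no date
DATE_OBJ_MAX = datetime.max  # largest representable datetime


def date_sorting_key(origin, field, reverse=False):
    """Build a sort key for a %Y-%m-%d date stored under `field`.

    For descending order the date is subtracted from DATE_OBJ_MAX, so
    later dates give smaller timedeltas and sort first. Keys within one
    sort are either all datetimes or all timedeltas, never mixed.
    """
    date = datetime.strptime(origin.get(field, DATE_MIN), "%Y-%m-%d")
    if reverse:
        return DATE_OBJ_MAX - date
    return date


origins = [
    {"id": 1, "date_created": "2020-05-01"},
    {"id": 2},  # missing date, treated as DATE_MIN
    {"id": 3, "date_created": "2021-01-01"},
]

ascending = sorted(origins, key=lambda o: date_sorting_key(o, "date_created"))
descending = sorted(
    origins, key=lambda o: date_sorting_key(o, "date_created", reverse=True)
)
```

    Flipping the key rather than passing `reverse=True` to the sort lets each field in a multi-field sort have its own direction.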
  • Author Contributor
    • Add test for sort_by : ["date_created"]
    • Deduplicate calculation of some variables in _get_sorting_key
    • Use iso8601 library to validate date format in intrinsic_metadata fields
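
    As a rough idea of what ISO 8601 validation looks like, here is a sketch that swaps in the standard library's `datetime.fromisoformat` for the `iso8601` package named in the commit, so that it runs without extra dependencies. `is_iso_date` is a hypothetical name, and `fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11, so this is not equivalent to the merged code.

```python
from datetime import datetime


def is_iso_date(value):
    """Return True if value parses with datetime.fromisoformat.

    Stand-in for an iso8601-based check (assumption): the stdlib parser
    is stricter than a full ISO 8601 parser on older Python versions.
    """
    try:
        datetime.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: bad format.
        return False


checks = {
    "2021-07-13": is_iso_date("2021-07-13"),
    "2021-07-13T14:59:53+05:30": is_iso_date("2021-07-13T14:59:53+05:30"),
    "13/07/2021": is_iso_date("13/07/2021"),
}
```

    Compared with a fixed `%Y-%m-%d` pattern, an ISO parser also accepts timestamps with a time and a UTC offset.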
  • Build is green

    Patch application report for D5964 (id=21612)

    Rebasing onto f378a989...

    Current branch diff-target is up to date.
    Changes applied before test
    commit fe7640f71024084554ab4d36209f6da5d1c76267
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 14:59:53 2021 +0530
    
        origin_search: Filters and sorting for date_{created,modified,published}
        
        intrinsic_metadata often contains date_{created,modified,published} which
        can be used as sorting options as well as filters.

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/193/ for more details.

  • thanks!

  • Merge request was accepted

  • Author Contributor

    Merge request was merged
