query_language: Setup tree-sitter and grammar.js

This revision introduces the grammar for the search query language and completes the setup needed to develop the grammar smoothly.

The parsers generated from the proposed grammar serve two different purposes:

  • Translation of search queries into Elasticsearch DSL (or the query format of any other search backend we may use in the future)
  • Autocompletion of the queries in the SWH Archive User Interface

tree-sitter is an excellent candidate for the task because it has bindings for Python (swh.search) as well as WebAssembly (swh.web).
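To make the first purpose listed above more concrete, here is a minimal sketch of the translation step. It assumes the parser has already reduced a query to a flat list of (field, operator, value) filters. The `Filter` type, the field names (`origin`, `visits`) and the operator set are hypothetical and chosen only for illustration; the grammar in this revision defines the real ones. Only the output shape (`bool`/`must`, `match`, `range`) is standard Elasticsearch query DSL.

```python
from typing import Any, Dict, List, NamedTuple


class Filter(NamedTuple):
    """One parsed filter, e.g. `visits >= 10` (hypothetical shape)."""

    field: str
    op: str
    value: Any


# Map comparison operators to Elasticsearch range keywords.
_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def filter_to_es(f: Filter) -> Dict[str, Any]:
    """Translate a single filter into an Elasticsearch query clause."""
    if f.op == "=":
        return {"match": {f.field: f.value}}
    if f.op in _RANGE_OPS:
        return {"range": {f.field: {_RANGE_OPS[f.op]: f.value}}}
    raise ValueError(f"unsupported operator: {f.op!r}")


def filters_to_es(filters: List[Filter]) -> Dict[str, Any]:
    """Combine all filters with a logical AND (a `bool`/`must` query)."""
    return {"bool": {"must": [filter_to_es(f) for f in filters]}}
```

Keeping the translation separate from the parsing is what would let another backend be supported later: only these two functions would need a counterpart, while the parsed filters stay the same.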


Migrated from D5990 (view on Phabricator)

Closed by Phabricator Migration user on Jul 26, 2021, 3:23pm UTC

Merge details

  • The changes were not merged into generated-differential-D5990-target.

Activity

  • Build is green

    Patch application report for D5990 (id=21616)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit b4fd1eeda546341e2f4a7e10834a668bfb5df6eb
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        Summary:
        This diff is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.
        
        Test Plan:
        
        Reviewers:
        
        Subscribers:

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/194/ for more details.

  • Author Contributor
    • Add newline at the end of files
  • Build is green

    Patch application report for D5990 (id=21618)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit a406f05fd695f9d7d096dd9aeb8b4d25793bdb07
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 18:15:46 2021 +0530
    
        Add newline at the end of files
    
    commit 874b437f8c84ea6090a466e02ce866373425fcb0
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        Summary:
        This diff is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.
        
        Reviewers: #reviewers
        
        Differential Revision: https://forge.softwareheritage.org/!58

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/195/ for more details.

  • @anlambert We are going to need to precompile some assets in swh-search using JS dependencies. We define the grammar in JS, which can then be compiled into a big JSON file that will be read from Python; see swh/search/parser/package.json. Do you have some advice to make this work nicely?

  • Based on the tree-sitter documentation, we should follow the guide there to properly define our search language parser. I would also move the package.json file to the root of the repository and create a search_language directory to hold all tree-sitter development.

    Also, some Makefile targets could be added to run the parser generation.

    How will the generated parser be consumed by Python? py-tree-sitter could be an interesting option, but there might be a simpler solution.

    Otherwise, for the generated assets, I think we could proceed as in swh-web: put them in a static folder at the root of the repository, keep them out of git, and bundle them as data_files in setup.py. This way, web assets (js and wasm files) could easily be consumed by webpack when building swh-web assets, by adding the swh-search data folder as a webpack source folder.
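As a rough illustration of generating the assets at build time instead of storing them in git, the sketch below works out which steps a build would have to run. The file names (`grammar.js`, `src/parser.c`, `generated/swh_ql.so`), the step labels and the mtime-based staleness check are all assumptions made for this example, not the layout or mechanism this revision settles on.

```python
from pathlib import Path
from typing import List


def is_stale(source: Path, artifact: Path) -> bool:
    """Return True if `artifact` is missing or older than `source`."""
    if not artifact.exists():
        return True
    return artifact.stat().st_mtime < source.stat().st_mtime


def plan_build(root: Path) -> List[str]:
    """List the build steps needed under `root`, in execution order."""
    grammar = root / "grammar.js"                 # hand-written, tracked in git
    parser_c = root / "src" / "parser.c"          # generated from the grammar
    library = root / "generated" / "swh_ql.so"    # compiled from parser.c

    if is_stale(grammar, parser_c):
        # Regenerating parser.c always invalidates the compiled library.
        return ["generate-parser", "compile-library"]
    if is_stale(parser_c, library):
        return ["compile-library"]
    return []
```

A Makefile target or a setup.py build hook could call something like `plan_build` and run the matching commands, so a fresh checkout (with no generated files) always regenerates everything.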

  • Author Contributor
    • Move parser to search_language dir
    • Introduce Makefile.local and add TreeSitter related commands
    • Set data_files of setup.py to 'generated/search_ql.so'
  • Build has FAILED

    Patch application report for D5990 (id=21632)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 9bda5b1d270b1c60f1c468fc89991b562bd11f09
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/196/ See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/196/console

  • Author Contributor
    • Fix failing build ( because of data_files )
  • Build has FAILED

    Patch application report for D5990 (id=21638)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 14c610679f85c27eea1926d778d2e0df3c12cb17
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/197/ See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/197/console

  • Author Contributor
    • Generate swh_ql.so at builds
  • Build has FAILED

    Patch application report for D5990 (id=21640)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 9822798cb4389685e96551263ee6a33df54ca229
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Thu Jul 15 13:13:40 2021 +0530
    
        setup.py: Generate swh_ql.so at builds
    
    commit 14c610679f85c27eea1926d778d2e0df3c12cb17
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/198/ See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/198/console

  • Author Contributor
    • Fix installation/build errors
  • Build has FAILED

    Patch application report for D5990 (id=21641)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/199/ See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/199/console

  • Author Contributor
    • Generate parser before building swh_ql.so
  • Build has FAILED

    Patch application report for D5990 (id=21642)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 186dacb1e51f5c42dd7bb6047bdccc07422fce03
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Thu Jul 15 16:03:42 2021 +0530
    
        Generate parser before building swh_ql.so
    
    commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/200/ See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/200/console

  • Author Contributor
    • Install tree-sitter-cli (NodeJS) during builds
  • Build is green

    Patch application report for D5990 (id=21643)

    Rebasing onto fe7640f7...

    Current branch diff-target is up to date.
    Changes applied before test
    commit 128ec9f103bb5691aa31a56aefbadf2d816a24bd
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Thu Jul 15 16:07:23 2021 +0530
    
        Install tree-sitter-cli (NodeJS) during builds
    
    commit 186dacb1e51f5c42dd7bb6047bdccc07422fce03
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Thu Jul 15 16:03:42 2021 +0530
    
        Generate parser before building swh_ql.so
    
    commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
    Author: KShivendu <shivendu@iitbhilai.ac.in>
    Date:   Tue Jul 13 16:21:51 2021 +0530
    
        parser: Setup TreeSitter with first draft for the grammar
        
        This is the first step towards implementing the search query language which
        can be directly translated to Elasticsearch (or any other search backend) queries
        and is also useful for introducing autocomplete in the swh archive.
        
        Also, we need a parser that can be used in swh.search backend as well as the
        swh.web interface so we've decided to go with TreeSitter which satisfies these
        conditions, is easier to write (written in JS) and is compatible
        with many languages.

    See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/201/ for more details.

  • Thanks!

    Can you update the README (and/or add a new documentation page) to explain this machinery? How to use it, what it does at compile/install/run time, etc.

  • Merge request was returned for changes

  • Author Contributor
    • Polish the code