query_language: Setup tree-sitter and grammar.js
This revision introduces the grammar for the search query language and completes the setup needed to develop the grammar smoothly.
The parsers generated from the proposed grammar serve two purposes:
- Translation of search queries into Elasticsearch DSL (or any other search backend we may use in the future)
- Autocompletion of queries in the SWH Archive user interface
tree-sitter is an excellent candidate for the task because it has bindings for Python (swh.search) as well as WebAssembly (swh.web).
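To make the first purpose concrete, here is a minimal Python sketch of translating an already-parsed query into the Elasticsearch query DSL. It assumes a hypothetical tuple-based AST: the node kinds `and`/`or`/`not`/`match`/`cmp` and the field names used below are illustrative, not the node types of the grammar in this MR. Only the output shapes (`bool` with `must`/`should`/`must_not`, `match`, `term`, `range`) are standard Elasticsearch DSL.

```python
from typing import Any, Dict, Tuple

# Hypothetical AST for a parsed query (NOT the node types of grammar.js):
#   ("and", left, right) | ("or", left, right) | ("not", operand)
#   ("match", field, text)       full-text match on a field
#   ("cmp", field, op, value)    equality or range comparison
Node = Tuple[Any, ...]

# Comparison operators mapped onto Elasticsearch range-query keys.
RANGE_KEYS = {"<": "lt", "<=": "lte", ">": "gt", ">=": "gte"}


def to_es(node: Node) -> Dict[str, Any]:
    """Recursively translate an AST node into an Elasticsearch query DSL dict."""
    kind = node[0]
    if kind == "and":
        return {"bool": {"must": [to_es(node[1]), to_es(node[2])]}}
    if kind == "or":
        return {
            "bool": {
                "should": [to_es(node[1]), to_es(node[2])],
                "minimum_should_match": 1,
            }
        }
    if kind == "not":
        return {"bool": {"must_not": [to_es(node[1])]}}
    if kind == "match":
        _, field, text = node
        return {"match": {field: text}}
    if kind == "cmp":
        _, field, op, value = node
        if op == "=":
            # Exact value: a term query, no text analysis.
            return {"term": {field: value}}
        if op in RANGE_KEYS:
            return {"range": {field: {RANGE_KEYS[op]: value}}}
        raise ValueError(f"unsupported operator: {op!r}")
    raise ValueError(f"unknown node kind: {kind!r}")
```

For example, `to_es(("and", ("match", "origin", "linux"), ("not", ("cmp", "visit_type", "=", "git"))))` returns `{"bool": {"must": [{"match": {"origin": "linux"}}, {"bool": {"must_not": [{"term": {"visit_type": "git"}}]}}]}}`. Keeping the translation a pure function over the tree means another backend could later be targeted by swapping only this layer.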
Migrated from D5990 (view on Phabricator)
Build is green
Patch application report for D5990 (id=21616)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit b4fd1eeda546341e2f4a7e10834a668bfb5df6eb
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Summary: This diff is the first step towards implementing the search query language, which can be directly translated to Elasticsearch (or any other search backend) queries and is also useful for introducing autocomplete in the swh archive. Also, we need a parser that can be used in the swh.search backend as well as the swh.web interface, so we've decided to go with TreeSitter, which satisfies these conditions, is easier to write (written in JS) and is compatible with many languages.
Test Plan:
Reviewers:
Subscribers:
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/194/ for more details.
Build is green
Patch application report for D5990 (id=21618)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit a406f05fd695f9d7d096dd9aeb8b4d25793bdb07
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 18:15:46 2021 +0530
Add newline at the end of files
commit 874b437f8c84ea6090a466e02ce866373425fcb0
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Summary: This diff is the first step towards implementing the search query language, which can be directly translated to Elasticsearch (or any other search backend) queries and is also useful for introducing autocomplete in the swh archive. Also, we need a parser that can be used in the swh.search backend as well as the swh.web interface, so we've decided to go with TreeSitter, which satisfies these conditions, is easier to write (written in JS) and is compatible with many languages.
Reviewers: #reviewers
Differential Revision: https://forge.softwareheritage.org/!58
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/195/ for more details.
@anlambert We are going to need to precompile some assets in swh-search using JS dependencies. We define the grammar in JS, which can then be compiled into a big JSON file that will be read from Python; see swh/search/parser/package.json. Do you have any advice on making this work nicely?
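Running `tree-sitter generate` writes, among other files, `src/grammar.json`, a JSON rendering of `grammar.js`, so Python can read the grammar without a JS runtime. The sketch below is a minimal illustration, assuming that file's layout: a top-level `rules` object whose rule trees mark literal tokens as `{"type": "STRING", "value": ...}`. It extracts those literals and uses them for naive prefix autocompletion. `load_keywords` and `complete` are hypothetical helpers, not part of swh.search.

```python
import json
from pathlib import Path
from typing import Any, List, Set


def collect_string_literals(rule: Any, found: Set[str]) -> None:
    """Walk a grammar.json rule tree and collect the values of STRING rules."""
    if isinstance(rule, dict):
        if rule.get("type") == "STRING":
            found.add(rule["value"])
        # Recurse into nested rules (members, content, ...).
        for child in rule.values():
            collect_string_literals(child, found)
    elif isinstance(rule, list):
        for child in rule:
            collect_string_literals(child, found)


def load_keywords(grammar_json: Path) -> List[str]:
    """Return the sorted literal tokens (keywords, operators) of a generated grammar."""
    grammar = json.loads(Path(grammar_json).read_text(encoding="utf-8"))
    found: Set[str] = set()
    collect_string_literals(grammar["rules"], found)
    return sorted(found)


def complete(prefix: str, keywords: List[str]) -> List[str]:
    """Naive autocompletion: the keywords that start with the text typed so far."""
    return [kw for kw in keywords if kw.startswith(prefix)]
```

A real autocompleter would also use the parser state to suggest only tokens that are valid at the cursor; this sketch only shows that the generated JSON is directly usable from Python.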
Based on the tree-sitter documentation, we should follow that guide to properly define our search language parser. I would also move the package.json file to the root of the repository and create a search_language directory to hold all tree-sitter developments. Also, some Makefile targets could be added to run the parser generation.
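A Makefile target for this would essentially shell out to tree-sitter-cli. The same steps can also be driven from Python, which is the direction the later commits in this MR take ("Generate parser before building swh_ql.so", "setup.py: Generate swh_ql.so at builds"). This is a hypothetical sketch: `generate_parser` and `build_shared_library` are illustrative names, and `Language.build_library` only exists in py-tree-sitter releases older than 0.22.

```python
import shutil
import subprocess
from pathlib import Path


def generate_parser(grammar_dir: Path, cli: str = "tree-sitter") -> Path:
    """Run `tree-sitter generate` in grammar_dir and return the path of parser.c.

    Assumes grammar_dir contains grammar.js and that tree-sitter-cli (an npm
    package, e.g. installed with `npm install tree-sitter-cli`) is on PATH.
    """
    exe = shutil.which(cli)
    if exe is None:
        raise RuntimeError(f"{cli!r} not found on PATH; install tree-sitter-cli first")
    # Among other files, this writes src/parser.c, src/grammar.json and
    # src/node-types.json next to grammar.js.
    subprocess.run([exe, "generate"], cwd=grammar_dir, check=True)
    return Path(grammar_dir) / "src" / "parser.c"


def build_shared_library(grammar_dir: Path, output: Path) -> None:
    """Compile the generated C sources into a loadable library (e.g. swh_ql.so).

    Language.build_library was removed in py-tree-sitter 0.22, so this only
    works with older releases of the binding.
    """
    from tree_sitter import Language  # lazy import: optional dependency

    Language.build_library(str(output), [str(grammar_dir)])
```

In setup.py, a custom build command could call `generate_parser(grammar_dir)` and then `build_shared_library(grammar_dir, Path("swh_ql.so"))`, which matches the ordering fixed by the "Generate parser before building swh_ql.so" commit.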
How will the generated parser be consumed from Python? py-tree-sitter could be an interesting option, but there might be a simpler solution.
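If py-tree-sitter is used, the parse tree arrives as `Node` objects exposing `type`, `children`, `start_byte` and `end_byte` (byte offsets into the parsed source). The minimal sketch below walks such a tree from Python. It is typed against a small protocol rather than importing tree_sitter, so `NodeLike`, `node_text` and `leaves` are illustrative names, not py-tree-sitter API.

```python
from typing import Iterator, Protocol, Sequence, Tuple


class NodeLike(Protocol):
    """The subset of py-tree-sitter's Node interface used below."""

    type: str
    start_byte: int
    end_byte: int
    children: Sequence["NodeLike"]


def node_text(node: NodeLike, source: bytes) -> str:
    """Slice the node's text out of the original source bytes."""
    return source[node.start_byte:node.end_byte].decode("utf-8")


def leaves(node: NodeLike, source: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (node type, text) for every leaf node, left to right."""
    if not node.children:
        yield node.type, node_text(node, source)
        return
    for child in node.children:
        yield from leaves(child, source)
```

With py-tree-sitter, `leaves(tree.root_node, source)` would yield the query's tokens in order, including anonymous ones such as operators, which is roughly the input both the Elasticsearch translation and autocompletion need.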
Otherwise, for the generated assets, I think we could proceed as in swh-web: put them in a static folder at the root of the repository, keep them out of git, and bundle them as data_files in setup.py. This way, the web assets (JS and Wasm files) could easily be consumed by webpack when building the swh-web assets, by adding the swh-search data folder as a webpack source folder.
Build has FAILED
Patch application report for D5990 (id=21632)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit 9bda5b1d270b1c60f1c468fc89991b562bd11f09
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
This is the first step towards implementing the search query language, which can be directly translated to Elasticsearch (or any other search backend) queries and is also useful for introducing autocomplete in the swh archive. Also, we need a parser that can be used in the swh.search backend as well as the swh.web interface, so we've decided to go with TreeSitter, which satisfies these conditions, is easier to write (written in JS) and is compatible with many languages.
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/196/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/196/console
Build has FAILED
Patch application report for D5990 (id=21638)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit 14c610679f85c27eea1926d778d2e0df3c12cb17
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/197/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/197/console
Build has FAILED
Patch application report for D5990 (id=21640)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit 9822798cb4389685e96551263ee6a33df54ca229
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Thu Jul 15 13:13:40 2021 +0530
setup.py: Generate swh_ql.so at builds
commit 14c610679f85c27eea1926d778d2e0df3c12cb17
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/198/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/198/console
Build has FAILED
Patch application report for D5990 (id=21641)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/199/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/199/console
Build has FAILED
Patch application report for D5990 (id=21642)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit 186dacb1e51f5c42dd7bb6047bdccc07422fce03
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Thu Jul 15 16:03:42 2021 +0530
Generate parser before building swh_ql.so
commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/200/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/200/console
Build is green
Patch application report for D5990 (id=21643)
Rebasing onto fe7640f7...
Current branch diff-target is up to date.
Changes applied before test
commit 128ec9f103bb5691aa31a56aefbadf2d816a24bd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Thu Jul 15 16:07:23 2021 +0530
Install tree-sitter-cli (NodeJS) during builds
commit 186dacb1e51f5c42dd7bb6047bdccc07422fce03
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Thu Jul 15 16:03:42 2021 +0530
Generate parser before building swh_ql.so
commit b9a3b8108f5495907ac405f8e7a3b6ed4a484125
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date: Tue Jul 13 16:21:51 2021 +0530
parser: Setup TreeSitter with first draft for the grammar
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/201/ for more details.