Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now, and I feel that they have put a lot of work into polishing their search query language. I think we can learn from them and adapt our language accordingly. Here are a few suggestions:
Instead of making the origin and metadata keywords mandatory, we can let users type bare keywords without naming a field, and search those terms in both the origin (higher score) and metadata fields.
This will allow users to write shorter, more effective queries:
django last_visit > 2022 instead of origin:django and last_visit > 2022
progval instead of metadata:progval
Make it faster to write array filters like language and license
language: python|go instead of language in [python, go]
It should be possible to negate any filter with -
-origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)
Provide aliases for writing queries faster
o:xyz should be equivalent to origin:xyz
m:abc should be equivalent to metadata:abc
lang:python or l:python should be equivalent to language:python
Assume and between filters when no operator is given.
origin:X metadata:Y instead of origin: X and metadata: Y
These suggestions are based on the following assumptions:
Search queries should be small and hence fast to type.
Search query languages should intelligently pick up the user's most common intention while still allowing the default behavior to be overridden.
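As a rough illustration of how these rules could compose, here is a minimal Python sketch of a parser for the proposed shorthand. It is not the existing query language implementation: the `Filter` class, the `any` pseudo-field for bare keywords, and the `parse` function are hypothetical names, and quoted values or values containing `:` (such as full URLs) are not handled.

```python
import re
from dataclasses import dataclass

# Aliases taken from the suggestions above; the table itself is hypothetical.
ALIASES = {"o": "origin", "m": "metadata", "lang": "language", "l": "language"}


@dataclass(frozen=True)
class Filter:
    field: str            # "origin", "metadata", ... or "any" for bare keywords
    values: tuple         # alternatives; any one of them may match
    op: str = ":"         # ":" for term filters, or a comparison operator
    negated: bool = False


TOKEN = re.compile(
    r"""
      (?P<cmp_field>[A-Za-z_]+)\s*(?P<op>[<>]=?|=)\s*(?P<cmp_value>\S+)  # last_visit > 2022
    | (?P<neg>-)?(?:(?P<field>[A-Za-z_]+):\s*)?(?P<value>[^\s:]+)        # [-][field:]value
    """,
    re.VERBOSE,
)


def parse(query: str) -> list:
    """Return the filters found in `query`; the list is an implicit conjunction."""
    filters = []
    for m in TOKEN.finditer(query):
        if m["cmp_field"]:
            field = ALIASES.get(m["cmp_field"], m["cmp_field"])
            filters.append(Filter(field, (m["cmp_value"],), op=m["op"]))
            continue
        # An explicit "and" is allowed but adds nothing: filters are and-ed anyway.
        if m["value"].lower() == "and" and not m["field"] and not m["neg"]:
            continue
        field = m["field"] or "any"          # bare keyword: search every default field
        field = ALIASES.get(field, field)    # expand o:, m:, lang:, l:
        values = tuple(v for v in m["value"].split("|") if v)
        filters.append(Filter(field, values, negated=bool(m["neg"])))
    return filters
```

With this sketch, `origin:X metadata:Y` and `origin: X and metadata: Y` parse to the same two filters, `language: python|go` becomes one `language` filter with two alternatives, and `-o:XYZ` yields a negated `origin` filter.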
! In #3560, @KShivendu wrote:
Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now, and I feel that they have put a lot of work into polishing their search query language. I think we can learn from them and adapt our language accordingly. Here are a few suggestions:
Thanks for investigating this and making a list of actionable suggestions!
Here is a case-by-case commentary below:
Instead of making the origin and metadata keywords mandatory, we can let users type bare keywords without naming a field, and search those terms in both the origin (higher score) and metadata fields.
This will allow users to write shorter, more effective queries:
django last_visit > 2022 instead of origin:django and last_visit > 2022
progval instead of metadata:progval
This one gives me pause, but only because we need to make sure it's not semantically ambiguous. Let's see if I'm getting it right:
if there are no qualifiers ("o:", "m:"), we search by default in both origin and metadata, and rank the results
if there are qualifiers we only search in the associated data
Correct?
If so, I'm fine with this, but we need to check how much worse performance gets.
Also, I'm not so sure the ranking criterion should be "origin hits win"; maybe there's something smarter we could use there...
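To make the two cases concrete, here is a minimal Python sketch of that behavior, with the ranking rule isolated in a weight table so it can be swapped for something smarter. Everything in it is an assumption for illustration: the `FIELD_WEIGHTS` values, the substring matching, and the `score` and `search` functions are hypothetical and do not reflect how the search backend actually scores results.

```python
# Hypothetical weights: the proposal only says origin hits should score
# higher than metadata hits, not by how much.
FIELD_WEIGHTS = {"origin": 2.0, "metadata": 1.0}


def score(term, doc, weights):
    """Sum the weights of the fields of `doc` whose text contains `term`."""
    needle = term.lower()
    return sum(w for field, w in weights.items()
               if needle in doc.get(field, "").lower())


def search(term, docs, field=None):
    """With a qualifier, look only at `field`; without one, look at all weighted fields."""
    weights = {field: 1.0} if field else FIELD_WEIGHTS
    scored = [(score(term, doc, weights), doc) for doc in docs]
    hits = [(s, doc) for s, doc in scored if s > 0]
    # sorted() is stable, so documents with equal scores keep their input order.
    return [doc for s, doc in sorted(hits, key=lambda pair: pair[0], reverse=True)]
```

An unqualified search has to look at every weighted field, which is where the performance cost would come from; a qualified search touches a single field.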
Make it faster to write array filters like language and license
language: python|go instead of language in [python, go]
LGTM
It should be possible to negate any filter with -
-origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)
LGTM
Provide aliases for writing queries faster
o:xyz should be equivalent to origin:xyz
m:abc should be equivalent to metadata:abc
lang:python or l:python should be equivalent to language:python
OK, but again as long as they're not ambiguous.
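One way to keep aliases unambiguous is to check the alias table mechanically whenever it changes. The sketch below is a hypothetical helper, not existing code: it assumes the searchable fields are the ones named in this thread, and it treats an alias as ambiguous when it is a prefix of more than one field name, which is only one possible definition.

```python
# Field names mentioned in this thread; the real list may differ.
FIELDS = {"origin", "metadata", "language", "license", "last_visit"}

# Aliases proposed in the quoted suggestion.
ALIASES = {"o": "origin", "m": "metadata", "lang": "language", "l": "language"}


def check_aliases(aliases, fields):
    """Return a list of human-readable problems; an empty list means the table is clean."""
    problems = []
    for alias, target in aliases.items():
        if target not in fields:
            problems.append(f"{alias!r} points to unknown field {target!r}")
        # A reader could expand the alias to any field it is a prefix of.
        candidates = sorted(f for f in fields if f.startswith(alias))
        if len(candidates) > 1:
            problems.append(
                f"{alias!r} is a prefix of several fields: {', '.join(candidates)}"
            )
    return problems
```

On the aliases proposed above, this check flags only `l`, which could be read as language, last_visit, or license; `o`, `m`, and `lang` pass.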
Assume and between filters when no operator is given.
origin:X metadata:Y instead of origin: X and metadata: Y
We can continue to support the query language, and the QL can be generated using the UI. This will help us to support saved searches and bookmarks.
We could have more search contexts (more than origin and metadata) in the future as we index more data.
It will be too hard to support different contexts with varying inputs using a QL alone. It can be done in a UI with some moving elements.
What do you think?