Polish the swh-search QL

mentioned in issue #4613 (closed)

assigned to @vlorentz

added Archive search priority:Normal labels

changed the description

marked this issue as related to #3909

marked this issue as related to #3926

marked this issue as related to #3927 (closed)

marked this issue as related to #3941 (closed)

marked this issue as related to #3952

unassigned @vlorentz

Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now and I feel that they have worked a lot on polishing their search query language. I think we can learn from them and adapt our language. Here are a few suggestions:

Instead of making the origin and metadata keywords mandatory, we can allow users to write bare terms without naming a field, and search those terms in both the origin (higher score) and metadata fields. This will allow users to write shorter and more effective queries:
- django last_visit > 2022 instead of origin:django and last_visit > 2022
- progval instead of metadata:progval
Make it faster to write array filters like language and license
- language: python|go instead of language in [python, go]
It should be possible to negate any filter with -
- -origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)
Provide aliases for writing queries faster
- o:xyz should be equivalent to origin:xyz
- m:abc should be equivalent to metadata:abc
- lang:python or l:python should be equivalent to language:python
Assume and between filters if no operator is provided.
- origin:X metadata:Y instead of origin: X and metadata: Y

They are based on the following assumptions:

Search queries should be small and hence fast to type.
Search query languages should intelligently pick up the most common intention of the user while still allowing overriding the default behavior.
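One way to try the shorthand suggestions without touching the existing grammar would be to desugar them into the current verbose syntax first. The following is a minimal sketch under assumptions, not the swh-search parser: the `desugar` function, the `ALIASES` and `LIST_FIELDS` tables, and the `not (...)` output form are all hypothetical. It only handles `field:value` tokens; bare terms and comparisons such as last_visit > 2022 are left out.

```python
import re

# Hypothetical alias table; the short names come from the suggestions above.
ALIASES = {"o": "origin", "m": "metadata", "l": "language", "lang": "language"}
# Fields written as "field in [a, b]" in the verbose syntax.
LIST_FIELDS = {"language", "license"}

# One shorthand token: optional leading "-", a field name, ":", a value.
TOKEN = re.compile(r"(?P<neg>-)?(?P<field>\w+):(?P<value>\S+)")


def desugar(query: str) -> str:
    """Rewrite 'field:value' shorthand tokens into the verbose form."""
    clauses = []
    for token in query.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            # Bare terms and comparisons are outside the scope of this sketch.
            raise ValueError(f"unsupported token: {token!r}")
        # Expand aliases such as "o" or "lang"; full names pass through.
        field = ALIASES.get(match["field"], match["field"])
        value = match["value"]
        if field in LIST_FIELDS:
            # "python|go" becomes "[python, go]".
            clause = f"{field} in [{', '.join(value.split('|'))}]"
        else:
            clause = f"{field}:{value}"
        if match["neg"]:
            # Assumed negation form; the real QL may spell this differently.
            clause = f"not ({clause})"
        clauses.append(clause)
    # Whitespace between filters is read as an implicit "and".
    return " and ".join(clauses)


print(desugar("o:django lang:python|go"))
print(desugar("-origin:XYZ m:progval"))
```

Keeping this as a rewrite step means the short and verbose forms stay equivalent by construction, so the existing syntax would keep working unchanged.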

In #3560, @KShivendu wrote: Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now and I feel that they have worked a lot on polishing their search query language. I think we can learn from them and adapt our language. Here are a few suggestions:

Thanks for investigating this and making a list of actionable suggestions! Here is a case-by-case commentary:

Instead of making the origin and metadata keywords mandatory, we can allow users to write bare terms without naming a field, and search those terms in both the origin (higher score) and metadata fields. This will allow users to write shorter and more effective queries:

django last_visit > 2022 instead of origin:django and last_visit > 2022

progval instead of metadata:progval

This one gives me pause, but only because we need to make sure it's not semantically ambiguous. Let's see if I'm getting it right:

if there are no qualifiers ("o:", "m:"), we search by default in both origin and metadata, and rank the results
if there are qualifiers we only search in the associated data

Correct?

If so, I'm fine with this, but we need to check how much worse performance gets.

Also, I'm not so sure the ranking criterion should be "origin hits win"; maybe there's something smarter to be used there...
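To make the two cases concrete, here is a rough sketch of how they could map onto a query body, assuming an Elasticsearch-style backend. The function name `build_term_query`, the field names `origin` and `metadata`, and the default boost of 2.0 are illustrative assumptions, not the actual swh-search index mapping. The boost is a parameter precisely because "origin hits win" may not be the right ranking.

```python
def build_term_query(term, field=None, origin_boost=2.0):
    """Return an Elasticsearch-style query body for a single search term."""
    if field is not None:
        # Qualified term ("o:", "m:"): search only the associated field.
        return {"match": {field: term}}
    # Bare term: search both fields and rank origin hits higher.
    # "name^N" is the multi_match syntax for a per-field boost.
    return {
        "multi_match": {
            "query": term,
            "fields": [f"origin^{origin_boost}", "metadata"],
        }
    }


print(build_term_query("django"))
print(build_term_query("progval", field="metadata"))
```

Since the bare-term case touches two fields instead of one, this is also the query shape whose performance cost would need to be measured.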

Make it faster to write array filters like language and license

language: python|go instead of language in [python, go]

LGTM

It should be possible to negate any filter with -

-origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)

LGTM

Provide aliases for writing queries faster

o:xyz should be equivalent to origin:xyz

m:abc should be equivalent to metadata:abc

lang:python or l:python should be equivalent to language:python

OK, but again as long as they're not ambiguous.
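The ambiguity concern can be checked mechanically. Below is a small sketch that flags any short form claimed by more than one field. The `PROPOSED` table and the `find_ambiguous_aliases` helper are hypothetical, and the `l` alias for license is an assumption added only to show what a collision looks like; it is not part of the suggestions above.

```python
from collections import defaultdict

# Hypothetical alias proposals: field name -> short forms.
PROPOSED = {
    "origin": ["o"],
    "metadata": ["m"],
    "language": ["lang", "l"],
    "license": ["l"],  # assumed here only to demonstrate a collision
}


def find_ambiguous_aliases(proposed):
    """Return a dict of alias -> sorted fields for aliases with several owners."""
    owners = defaultdict(set)
    for field, aliases in proposed.items():
        for alias in aliases:
            owners[alias].add(field)
    # An alias is ambiguous when more than one field claims it.
    return {
        alias: sorted(fields)
        for alias, fields in owners.items()
        if len(fields) > 1
    }


print(find_ambiguous_aliases(PROPOSED))
```

A check like this could run as a unit test whenever a new field or alias is added.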

Assume and between filters if no operator is provided.

origin:X metadata:Y instead of origin: X and metadata: Y

Hell yes!

marked this issue as related to #4296

What about having a UI like the one in GitHub or Phabricator to create an advanced query? e.g. https://github.com/search/advanced

We can continue to support the query language, and the UI can generate QL queries. This will help us support saved searches and bookmarks. We could have more search contexts (beyond origin and metadata) in the future as we index more data. It will be too hard to support different contexts with varying inputs using just a QL. It can be done in a UI with some moving elements. What do you think?
