- Aug 13, 2021
- Aug 09, 2021
-
- Aug 06, 2021
-
-
Kumar Shivendu authored
Integrate the query language translator in the Elasticsearch implementation
-
- Jul 30, 2021
-
-
vlorentz authored
Putting it in a subdirectory that isn't a subpackage should make it undiscoverable.
-
Kumar Shivendu authored
Translate swh search query language queries into Elasticsearch DSL
-
- Jul 28, 2021
-
-
vlorentz authored
Like this:

```
import os

from pkg_resources import resource_filename
from tree_sitter import Language

# Candidate locations of the compiled grammar, relative to the swh.search package.
ql_rel_paths = [
    "swh_ql.so",  # installed
    "../../query_language/swh_ql.so",  # development
]
for ql_rel_path in ql_rel_paths:
    ql_path = resource_filename("swh.search", ql_rel_path)
    if os.path.exists(ql_path):
        break
else:
    # None of the candidate paths exists on disk.
    assert False, 'not found'

search_ql = Language(ql_path, "swh_search_ql")
```

`data_files` is not designed to be accessed from the same Python package. It is meant for writing files to standard locations (typically `.desktop` files in /usr/share) that other packages read with their own discovery mechanisms.
-
vlorentz authored
-
- Jul 26, 2021
-
-
Kumar Shivendu authored
The grammar should not allow sort_by and limit to be used more than once in a query. Unlike other filters, these two must not be combined with 'and' or 'or'.
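The constraint above can be illustrated outside the grammar with a minimal sketch. It assumes a hypothetical representation in which a parsed query is reduced to the ordered list of its clause names; the function name and the error type are illustrative only and are not taken from swh.search.

```python
from collections import Counter

# Clauses that may appear at most once in a query (per the commit message).
SINGLETON_CLAUSES = {"sort_by", "limit"}


def check_singleton_clauses(clause_names):
    """Raise ValueError if sort_by or limit appears more than once.

    `clause_names` is a hypothetical flat list of the clause names found in a
    parsed query, for example ["origin", "visited", "sort_by", "limit"].
    """
    counts = Counter(clause_names)
    repeated = sorted(name for name in SINGLETON_CLAUSES if counts[name] > 1)
    if repeated:
        raise ValueError("clauses used more than once: " + ", ".join(repeated))
```

In the actual change the restriction is expressed in the grammar itself, so an invalid query would fail to parse rather than reach a check like this one.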
-
Kumar Shivendu authored
This revision defines the grammar for the search query language and prepares swh.search for smoother development of the grammar.

The parsers generated from the proposed grammar serve two different purposes:

  - Translation of search queries into Elasticsearch DSL in swh.search (or any other search backend that we may use in the future)
  - Autocompletion of queries in swh.web (Archive UI)

tree-sitter has been selected for the task because it has bindings for Python (swh.search) as well as wasm (swh.web).
-
- Jul 23, 2021
-
-
Nicolas Dandrimont authored
Sometimes, in a very loaded situation, the producer can return and let the consumer start before the topic is actually created. Adding a `producer.flush()` avoids that race condition.
-
- Jul 22, 2021
-
-
Kumar Shivendu authored
Documentation for the proposed search query language
-
- Jul 21, 2021
-
-
Nicolas Dandrimont authored
The origin_visit_status topic now contains the `type` key, which is all the information that we used from the origin_visit topic, so we can stop processing that topic altogether.
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
Nicolas Dandrimont authored
-
- Jul 13, 2021
-
-
Kumar Shivendu authored
intrinsic_metadata often contains date_{created,modified,published} which can be used as sorting options as well as filters.
-
- Jul 05, 2021
-
-
Kumar Shivendu authored
intrinsic_metadata contains keywords and a description, which are very useful for finding relevant origins based on a list of keywords provided by the user.
-
- Jul 02, 2021
-
-
Kumar Shivendu authored
intrinsic_metadata often contains programmingLanguage and license fields, which are very useful as search filters. These values can be used until the license and language mined by swh.indexer are ready to use.
-
- Jun 28, 2021
-
-
Kumar Shivendu authored
Sorting options are an important feature of an advanced search interface. This diff introduces a sort_by parameter in origin_search to support them.
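As a rough illustration of how a sort_by parameter could map onto an Elasticsearch `sort` clause, here is a minimal sketch. The convention that a leading "-" means descending order, the helper name, and the field names in the usage note are assumptions made for this example, not details confirmed by the commit message.

```python
def build_sort_clause(sort_by):
    """Turn a list of field names into an Elasticsearch `sort` clause.

    Assumption: a field prefixed with "-" is sorted in descending order,
    any other field in ascending order.
    """
    clause = []
    for field in sort_by:
        if field.startswith("-"):
            clause.append({field[1:]: {"order": "desc"}})
        else:
            clause.append({field: {"order": "asc"}})
    return clause
```

For example, `build_sort_clause(["-nb_visits", "url"])` yields a list that can be passed as the `sort` key of a search body, ordering results by the hypothetical `nb_visits` field (descending) and then by `url`.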
-
- Jun 23, 2021
-
-
Kumar Shivendu authored
The number of fields in ES is expected to grow over time, so the update script should be easy to read.
-
Kumar Shivendu authored
Last revision/release dates are excellent candidates for sorting options and filters. These changes store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage, so these changes also configure a swh.storage client in swh.search.
-
- Jun 17, 2021
-
-
Kumar Shivendu authored
Summary: last_eventful_visit_date is a good candidate for sorting options and filters. These changes will store this value in ES when OriginVisitStatus objects are received by the swh.search journal client.

Reviewers: #reviewers, vsellier
Reviewed By: #reviewers, vsellier
Subscribers: vlorentz, vsellier
Differential Revision: https://forge.softwareheritage.org/D5878
-
- Jun 16, 2021
-
-
Vincent Sellier authored
Ensure all the metadata.* fields are created as strings in the mapping.

Related to T3373
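One way to force fields under a path to a string type is an Elasticsearch dynamic template. The sketch below is only an illustration under stated assumptions: the template names, the list of scalar types, and the choice of `keyword` as the string type are not taken from the commit.

```python
# Scalar JSON types that Elasticsearch would otherwise map to non-string fields.
SCALAR_TYPES = ("boolean", "long", "double", "date")


def string_templates(path_match, scalar_types=SCALAR_TYPES):
    """Build dynamic templates mapping scalar values under `path_match` to strings."""
    return [
        {
            f"{scalar_type}_as_string": {
                "match_mapping_type": scalar_type,
                "path_match": path_match,
                # Assumption: "keyword" is used as the string type here.
                "mapping": {"type": "keyword"},
            }
        }
        for scalar_type in scalar_types
    ]


# Fragment that could be merged into an index mapping.
mapping = {"dynamic_templates": string_templates("metadata.*")}
```

Generating one template per detected type keeps object values out of scope, so nested objects under the path can still be mapped as objects.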
-
- Jun 15, 2021
-
-
Kumar Shivendu authored
swh.storage passes the visit count and visit date for each OriginVisitStatus through swh.journal (Kafka). These two values are good candidates for filters and for the sorting feature, so this commit provides the code to store them when they are received by the swh.search journal client.
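The storing step described above can be sketched in plain Python. This is a minimal illustration that assumes messages may arrive out of order, so a stored value is only replaced by a larger count or a later date. The field names `nb_visits` and `last_visit_date` and the function name are hypothetical.

```python
from datetime import datetime, timezone


def merge_visit_status(document, visit_count, visit_date):
    """Return a copy of `document` updated with one visit status message.

    Assumption: the stored count and date must never go backwards, even if
    an older message is processed after a newer one.
    """
    merged = dict(document)
    merged["nb_visits"] = max(document.get("nb_visits", 0), visit_count)
    previous = document.get("last_visit_date")
    if previous is None or visit_date > previous:
        merged["last_visit_date"] = visit_date
    return merged


# Usage: a newer message followed by an older one leaves the newer values in place.
doc = {"url": "https://example.org/repo"}
doc = merge_visit_status(doc, 3, datetime(2021, 6, 15, tzinfo=timezone.utc))
doc = merge_visit_status(doc, 2, datetime(2021, 6, 1, tzinfo=timezone.utc))
```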
-
- Jun 14, 2021
-
- Jun 11, 2021
-
-
Antoine Lambert authored
The previous value polluted the pytest log when tests failed, because the whole JSON query is printed in that case. This resulted in a very large log file that was hard to use.
-
Antoine Lambert authored
Debugging the update painless script is hard because errors are returned in a barely readable JSON format. To save debugging time, wrap search.origin_update calls when running Elasticsearch tests in order to catch painless script errors and pretty-print them. Tests will also fail immediately when such errors are detected.
-
- Jun 09, 2021
-
-
Antoine Lambert authored
-
- Apr 29, 2021
- Apr 26, 2021
-
-
Antoine Lambert authored
Make it possible to check that the package documentation can be built without producing Sphinx warnings.

The sphinx environment is designed to be used in continuous integration, to prevent changes that break the documentation build from being committed. The sphinx-dev environment is designed to be used inside a full swh development environment.

Related to T3258
-
- Apr 13, 2021
-
-
vlorentz authored
-
- Apr 08, 2021
-
-
Nicolas Dandrimont authored
This adds a boolean field, "blocklisted", to the origin documents. When this field is true, the documents are not returned in search results. The "blocklisted" field is sticky for updates, meaning that another run of the swh.search journal client (or further visits of the origin) will keep the origin hidden from search results.
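The two behaviours described above, filtering at query time and stickiness at update time, can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the actual implementation may apply the sticky rule inside the Elasticsearch update script rather than in Python.

```python
def exclude_blocklisted(query):
    """Wrap an Elasticsearch query so blocklisted documents are never returned."""
    return {
        "bool": {
            "must": [query],
            "must_not": [{"term": {"blocklisted": True}}],
        }
    }


def apply_update(stored, update):
    """Merge `update` into `stored`, keeping "blocklisted" sticky.

    Once a document is blocklisted, a later update that omits the field or
    sets it to False does not clear it.
    """
    merged = {**stored, **update}
    merged["blocklisted"] = bool(stored.get("blocklisted")) or bool(
        update.get("blocklisted")
    )
    return merged
```

With this rule, replaying the journal over a blocklisted origin produces a document that is still excluded by `exclude_blocklisted`.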
- Mar 19, 2021
-
-
Stefano Zacchiroli authored
to match the style of other headings
-
- Mar 04, 2021
-
-
Vincent Sellier authored
Related to T3076
- Mar 03, 2021
-
-
Vincent Sellier authored
  - Use a Flask hook to be sure the index is initialized before performing any action
  - Fix the server tests to avoid a side effect caused by the global application configuration not being reset between two tests

Related to T3076
-