- Oct 18, 2017
-
-
Antoine R. Dumont authored
-
- Oct 17, 2017
-
- Oct 16, 2017
-
-
moranegg authored
Differential Revision: https://forge.softwareheritage.org/D255
-
- Oct 12, 2017
-
-
Nicolas Dandrimont authored
-
- Oct 10, 2017
-
-
Antoine R. Dumont authored
Close T801
-
- Sep 27, 2017
-
-
Antoine R. Dumont authored
-
- Sep 05, 2017
-
-
Stefano Zacchiroli authored
-
- Aug 30, 2017
-
-
Stefano Zacchiroli authored
Change cherry-picked from python module template commit 71b117ba0cf9f1251b1cac26d0994df03a4c787d
-
- Jul 28, 2017
-
-
moranegg authored
Reviewers: ardumont
Differential Revision: https://forge.softwareheritage.org/D236
-
moranegg authored
Added CodeRepository to translated_metadata for revision_metadata. Also run main with click with multiple args.
-
- Jul 26, 2017
- Jul 25, 2017
-
-
moranegg authored
Summary:
  - renaming methods filter_contents to filter and index_content to index in all sub-classes and orchestrator
  - renaming dependencies to ContentIndexer instead of BaseIndexer
  - renaming in tests
Added RevisionMetadataIndexer with a detection tool for metadata
  - RevisionMetadataIndexer takes a list of revisions and detects in the root directory all the file names supported by the swh-metadata-detector version 0.0.1 that can contain metadata
  - checks if files were translated before in the content_metadata table
  - if not: sends the files for indexing
  - aggregates results
Note: should keep results in revision_metadata but this part is not ready in the storage
  - also, changed init of ContentMetadataIndexer with tool in args
Updated documentation with new revision indexer
Test Plan: WIP (will be updated today)
Reviewers: ardumont
Differential Revision: https://forge.softwareheritage.org/D233
-
moranegg authored
  - RevisionMetadataIndexer takes a list of revisions and detects in the root directory all the file names supported by the swh-metadata-detector version 0.0.1 that can contain metadata
  - checks if files were translated before in the content_metadata table
  - if not: sends the files for indexing
  - aggregates results
Note: should keep results in revision_metadata but this part is not ready in the storage
  - also, changed init of ContentMetadataIndexer with tool in args
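The flow described in this commit message (detect candidate files, reuse earlier translations, index the rest, aggregate) could be sketched roughly as follows. This is a minimal illustration, not the project's code: `SUPPORTED_FILENAMES`, `detect_metadata_files`, `index_revision`, and the shape of the directory entries are all assumptions made for the example.

```python
# Hypothetical list of file names the detector recognizes (assumption).
SUPPORTED_FILENAMES = {"package.json"}


def detect_metadata_files(root_entries):
    """Return the sha1s of root-directory entries whose name is supported.

    `root_entries` is assumed to be a list of {"name": ..., "sha1": ...} dicts.
    """
    return [
        entry["sha1"]
        for entry in root_entries
        if entry["name"].lower() in SUPPORTED_FILENAMES
    ]


def index_revision(root_entries, already_translated, translate):
    """Aggregate translated metadata for one revision's root directory.

    `already_translated` maps sha1 -> metadata computed earlier;
    `translate` is a callable used only for files not seen before.
    """
    results = {}
    for sha1 in detect_metadata_files(root_entries):
        if sha1 in already_translated:
            # Reuse the earlier translation instead of indexing again.
            results[sha1] = already_translated[sha1]
        else:
            results[sha1] = translate(sha1)
    return results
```

Passing the translation step in as a callable keeps the sketch independent of any storage or indexer API.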
-
- Jul 24, 2017
-
-
moranegg authored
  - renaming methods filter_contents to filter and index_content to index in all sub-classes and orchestrator
  - renaming dependencies to ContentIndexer instead of BaseIndexer
  - renaming in tests
-
- Jul 11, 2017
-
-
moranegg authored
Summary: makes it possible to extract the metadata_dictionary into a standalone repo as a complete tool. This is also the naming used in the storage.
Reviewers: ardumont
Differential Revision: https://forge.softwareheritage.org/D221
-
- Jun 28, 2017
-
-
Morane Otilia Gruenpeter authored
Summary: for indexing content in content_metadata we want to use metadata tools to extract metadata from manifest files and keep it in the same format (syntax) and with the same terms (semantics).
Temporary solution: in the Metadata_Dictionary class, dispatch the content for parsing and translation using a hard-coded mapping (should be extracted from storage) to translate package.json files to codemeta terms.
Testing: in test_metadata, 3 running tests for the compute_metadata function, with and without content, and for the metadata indexer (the storage part of it isn't implemented) with a local npm mapping.
Differential Revision: https://forge.softwareheritage.org/D215
Refactor metadata dictionary with a class for each mapping
Change to compute_metadata without a MetaDict class
Added a tools mapping to help with dispatch
Added convert function to decode content
Note: deleted the compare function, which returned a wrong result
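As a rough illustration of translating a package.json manifest through a hard-coded mapping, consider the sketch below. The `NPM_TO_CODEMETA` table and the `translate_package_json` function are assumptions for the example only; the mapping actually used by the commit is not shown in this log, and unmapped keys are simply dropped here.

```python
import json

# Illustrative mapping from npm manifest keys to codemeta terms (assumption).
NPM_TO_CODEMETA = {
    "name": "name",
    "version": "version",
    "description": "description",
    "license": "license",
    "homepage": "url",
}


def translate_package_json(raw):
    """Decode raw manifest bytes and rename known keys to codemeta terms.

    Returns None when the content is empty, undecodable, or not a JSON object.
    """
    try:
        content = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    if not isinstance(content, dict):
        return None
    return {
        NPM_TO_CODEMETA[key]: value
        for key, value in content.items()
        if key in NPM_TO_CODEMETA
    }
```

Keeping the table as plain data would make it easy to load from storage later, which is what the commit message says should eventually happen.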
-
- Jun 13, 2017
-
-
Morane Otilia Gruenpeter authored
Summary:
  - moved MockObjStorage to new file test_utils.py
  - added file test_language to test the indexer with Python and C content, exercising the pygments tool
Reviewers: ardumont
Differential Revision: https://forge.softwareheritage.org/D210
-
Antoine R. Dumont authored
This has the nice side effect of easing initialization in tests.
-
- Jun 12, 2017
-
-
Nicolas Dandrimont authored
-
- Jun 07, 2017
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Before this commit, the following situations could arise:
  - no encoding detected (we cannot do much after that yet)
  - no result after multiple attempts to compute the lang
These resulted in exceptions at runtime. So, for now, we check for those situations and reference the sha1 with lang None. The purpose is to allow those contents to be rescheduled later on (e.g. when we improve the encoding policy).
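The fallback described in this commit message might look like the sketch below. The function name `compute_language` and the two callables `detect_encoding` and `guess_lang` are hypothetical stand-ins for whatever detection tools the indexer uses; only the "record the sha1 with lang None" behaviour comes from the message.

```python
def compute_language(sha1, raw, detect_encoding, guess_lang):
    """Return a language record for `raw`, never raising on detection failure.

    `detect_encoding(raw)` is assumed to return an encoding name or None;
    `guess_lang(text)` is assumed to return a language name or None.
    """
    encoding = detect_encoding(raw)
    if encoding is None:
        # No encoding detected: keep a record so the content can be retried.
        return {"id": sha1, "lang": None}
    try:
        lang = guess_lang(raw.decode(encoding))
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown encoding: treat it like a failed detection.
        lang = None
    return {"id": sha1, "lang": lang}
```

Records with `lang` set to None can then be selected for a later pass once detection improves.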
-
- Jun 06, 2017
-
-
Antoine R. Dumont authored
When chunking on the wrong byte, decoding can raise 'unexpected end of data'. This commit catches that error and retries the same content with the next ending byte. Related T722
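A minimal sketch of this retry, assuming UTF-8 content: when the cut falls inside a multi-byte character, Python's decoder reports the reason "unexpected end of data", and the cut is moved forward one byte at a time. The function name `decode_prefix` and the `max_extra` bound are illustrative choices, not taken from the commit.

```python
def decode_prefix(raw, size, max_extra=3):
    """Decode roughly the first `size` bytes of `raw` as UTF-8.

    If the cut lands inside a multi-byte character, extend it by one byte
    and try again, at most `max_extra` times (a UTF-8 character spans at
    most 4 bytes, so 3 extra bytes always suffice for valid input).
    """
    last_error = None
    for extra in range(max_extra + 1):
        end = min(size + extra, len(raw))
        try:
            return raw[:end].decode("utf-8")
        except UnicodeDecodeError as exc:
            if exc.reason != "unexpected end of data":
                raise  # genuinely invalid bytes, not a bad cut
            last_error = exc
            if end == len(raw):
                break  # nothing left to extend with
    raise last_error
```

For example, `decode_prefix("héllo".encode("utf-8"), 2)` first tries a cut in the middle of `é`, then succeeds one byte later.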
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related T721
-
Antoine R. Dumont authored
-
- Jun 02, 2017
-
-
Antoine R. Dumont authored
Related T722
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related T722
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related T722
-
- Jun 01, 2017
-
-
Antoine R. Dumont authored
It's up to the indexer to query the storage to determine the indexer configuration's identifier and pass along the information when sending data for filter or update operations. Related T722
-
- May 29, 2017
-
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
Related T722
-
- May 22, 2017
-
-
Antoine R. Dumont authored
-
- May 17, 2017
-
-
Antoine R. Dumont authored
Related T713
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
-
Antoine R. Dumont authored
No rescheduling by default
-
Antoine R. Dumont authored
Prior to this, each indexer received the same batch of contents as the orchestrator. As some of our indexers need quite some time to finish when the batch size is large, the only reasonable step was to reduce that batch size. In effect, however, this also impacted other indexers which did not need such a restriction (for example, it creates more, smaller db transactions). Introducing a per-indexer batch size option avoids that impact. This also introduces the option to check presence on a per-indexer basis. Related T713
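The per-indexer options described here could be sketched as follows. The option names `batch_size` and `check_presence`, the default size, and the `plan_tasks` helper are assumptions chosen for illustration; the commit message does not spell out the actual configuration keys.

```python
def chunks(items, size):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_tasks(sha1s, indexers, default_batch_size=10):
    """Split one orchestrator batch into per-indexer tasks.

    `indexers` maps an indexer name to its own options, so a slow indexer
    can get small batches without shrinking everyone else's.
    """
    tasks = []
    for name, options in indexers.items():
        size = options.get("batch_size", default_batch_size)
        check = options.get("check_presence", False)
        for batch in chunks(sha1s, size):
            tasks.append((name, batch, check))
    return tasks
```

With five contents, an indexer configured with a batch size of 5 receives one task, while one configured with a batch size of 2 receives three.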
-