Setup async interface for discovery module
This will allow us to use this interface in async code like swh-scanner
.
Unfortunately, this means calling asyncio.run
for sync code, but the
performance impact should be negligible.
The swh_storage.*missing*
APIs are inconsistent for each type, which
requires a lot of boilerplate code. This should be addressed in a
follow-up.
There is also one hack (marked as such inline) needed to have an API that isn't also plagued by implementation details of needing to keep ID -> object matchers around. This should be a removed as a direct consequence of aligning the storage APIs.
Migrated from D8538 (view on Phabricator)
Merge request reports
Activity
Build is green
Patch application report for D8538 (id=30790)
Rebasing onto 798f749e...
Current branch diff-target is up to date.
Changes applied before test
commit 19db5769cbe618b77c322189b66994222e2c8668 Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Sep 26 18:21:54 2022 +0200 Setup async interface for discovery module This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up. There is also one hack (marked as such inline) needed to have an API that isn't also plagued by implementation details of needing to keep ID -> object matchers around. This should be a removed as a direct consequence of aligning the storage APIs.
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/896/ for more details.
Build is green
Patch application report for D8538 (id=30795)
Rebasing onto 798f749e...
Current branch diff-target is up to date.
Changes applied before test
commit fe6610629b1bff545b30ea86743d8e90a0487a47 Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Sep 26 18:21:54 2022 +0200 Setup async interface for discovery module This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up. There is also one hack (marked as such inline) needed to have an API that isn't also plagued by implementation details of needing to keep ID -> object matchers around. This should be a removed as a direct consequence of aligning the storage APIs.
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/897/ for more details.
39 def __init__( 40 self, 41 contents: List[model.Content], 42 skipped_contents: List[model.SkippedContent], 43 directories: List[model.Directory], 44 ) -> None: 45 self.contents = contents 46 self.skipped_contents = skipped_contents 47 self.directories = directories 48 49 @abc.abstractmethod 50 async def content_missing(self, contents: List[Sha1Git]) -> Iterable[Sha1Git]: 51 """List content missing from the archive by sha1""" 52 53 @abc.abstractmethod 54 async def skipped_content_missing( Build is green
Patch application report for D8538 (id=30851)
Rebasing onto 798f749e...
Current branch diff-target is up to date.
Changes applied before test
commit 404c1b9a30928f42a8f32bbfbcfe64986fdc8126 Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Sep 26 18:21:54 2022 +0200 Setup async interface for discovery module This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up. There is also one hack (marked as such inline) needed to have an API that isn't also plagued by implementation details of needing to keep ID -> object matchers around. This should be a removed as a direct consequence of aligning the storage APIs. See inline comment.
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/899/ for more details.
I still don't understand why you need the
_all_contents
hack instead of doing this:async def skipped_content_missing( self, skipped_contents: List[Sha1Git] ) -> Iterable[Sha1Git]: """List skipped content missing from the archive by sha1""" contents = [{"sha1_git": s} s in skipped_contents] return (d["sha1_git"] for d in self.storage.skipped_content_missing(contents))
! In !321 (closed), @vlorentz wrote: I still don't understand why you need the
_all_contents
hack instead of doing this:async def skipped_content_missing( self, skipped_contents: List[Sha1Git] ) -> Iterable[Sha1Git]: """List skipped content missing from the archive by sha1""" contents = [{"sha1_git": s} s in skipped_contents] return (d["sha1_git"] for d in self.storage.skipped_content_missing(contents))
Ah. I was sure
skipped_content_missing
tookSkippedContent
instead of dicts. Sorry about that, will update this patch soon.Build is green
Patch application report for D8538 (id=30912)
Rebasing onto 798f749e...
Current branch diff-target is up to date.
Changes applied before test
commit 1facea3cd215155f77ae4083a33837b7c6f642b0 Author: Raphaël Gomès <rgomes@octobus.net> Date: Mon Sep 26 18:21:54 2022 +0200 Setup async interface for discovery module This will allow us to use this interface in async code like ``swh-scanner``. Unfortunately, this means calling ``asyncio.run`` for sync code, but the performance impact should be negligible. The ``swh_storage.*missing*`` APIs are inconsistent for each type, which requires a lot of boilerplate code. This should be addressed in a follow-up.
See https://jenkins.softwareheritage.org/job/DLDBASE/job/tests-on-diff/905/ for more details.
Still a couple of remarks/questions.
- Isn't this missing an asyncio dependency requirements in /requirements.txt
nope, it's fine [1]
- [1]
10:35 <+ardumont> is asyncio part of python3.7? 10:36 <+ardumont> (for loader.core and the new discovery async code ^) 10:36 <Alphare> yes 10:36 <Alphare> https://docs.python.org/3.7/library/asyncio.html
- Why is this defined here and not in the storage since all the current dicsovery methods involve the storage?
see [2]
mentioned in merge request swh-scanner!64 (closed)