swh-journal: Add tests to the journal publisher using kafka instance
Testing the publisher with a real kafka instance through the pytest-kafka dependency.
This also:
- documents with a script how and what to install in regards to the kafka needed
- extracts the configuration checks done on the publisher via the cli and push those checks in the publisher directly.
- those checks are now tested (they helped in the debugging process of those tests)
- update the readme
Related T1276 Depends on D1280 Depends on D1282
Test Plan
tox
Migrated from D1239 (view on Phabricator)
Merge request reports
Activity
Build is green See https://jenkins.softwareheritage.org/job/DJNL/job/tox/28/ for more details.
Thanks for looking into pytest-kafka!
Before you go too much further with this approach, I'll just say that it makes me very uncomfortable...
We don't install external daemon dependencies (e.g. PostgreSQL, RabbitMQ, statsd, ...) in setup.py in any other module, and I don't see a reason for this module to be any different.
If the CI needs Kafka to be installed to be able to run, then we should make sure the image that the tests run on have Kafka available, the same way we currently install PostgreSQL so pytest-postgresql can work.
For local development, i feel we should just do the same thing and document how to install Kafka on the developer's system, rather than have setup.py unconditionally download stuff from the Internet (although the way it's done here looks fine at a glance).
! In !242 (closed), @olasd wrote: Thanks for looking into pytest-kafka!
Before you go too much further with this approach, I'll just say that it makes me very uncomfortable...
yes, that sounds off to my ears as well but i did not see any other way so far.
We don't install external daemon dependencies (e.g. PostgreSQL, RabbitMQ, statsd, ...) in setup.py in any other module, and I don't see a reason for this module to be any different.
indeed.
If the CI needs Kafka to be installed to be able to run, then we should make sure the image that the tests run on have Kafka available, the same way we currently install PostgreSQL so pytest-postgresql can work.
thanks for reminding me of this, i'll go check the details ;)
For local development, i feel we should just do the same thing and document how to install Kafka on the developer's system, rather than have setup.py unconditionally download stuff from the Internet (although the way it's done here looks fine at a glance).
As a trade-off, wouldn't it be ok that we keep the code for local development?
That is, dev can do, to ease setup:
./setup.py develop # <- the name should probably change, i just kept what upstream did ;)
And simply not use that part for other automatic stuff?
Thanks for the feedback
mentioned in commit swh-environment@c5a48242
Will remove the 'develop' step later
In the mean time:
- tox.ini: Remove unused dependency
- publisher: separate pytest test from mocked ones
- test_publisher: Reorganize code
- publisher: Initiate the journal cli to start running the publisher
- test_publisher2: Start trying to test with kafka
- test_publisher: Use direct instantiation of the in-memory storage
- test_publisher2: Progress towards actual publisher test with kafka
- test_publisher2: Make the test pass
- test_publisher2: Explain why we need the transformation
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tox/35/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tox/35/console
Rebase on latest master (and removes the cli adaptation part, !9 (closed))
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tox/38/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tox/38/console
Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tox/41/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tox/41/console
I think i'm done for the main part of the tests.
Now i need to concentrate on removing the part about the target
./setup.py develop
. Well not necessarily removing that part (because that could prove useful for others) but at least not make the ci job rely on it (through tox in the current diff's state).Build has FAILED
Link to build: https://jenkins.softwareheritage.org/job/DJNL/job/tox/42/ See console output for more information: https://jenkins.softwareheritage.org/job/DJNL/job/tox/42/console
Build is green See https://jenkins.softwareheritage.org/job/DJNL/job/tox/43/ for more details.
mentioned in commit swh/infra/ci-cd/swh-jenkins-dockerfiles@cc483aa7
So, if we keep the setup.py snippet to install kafka, I think it should be a completely separate subcommand of setup.py, rather than overriding the default behavior of the develop subcommand, which has a somewhat standard meaning of "install the current module for a development setup".
But I really think those 50 lines of python should just be turned into 5-10 lines of shell script that would be more readable and do the exact same thing.
(disclaimer: I still haven't looked too closely at the actual tests)
So, if we keep the setup.py snippet to install kafka, I think it should be a completely separate subcommand of setup.py, rather than overriding the default behavior of the develop subcommand, which has a somewhat standard meaning of "install the current module for a development setup".
well, it was for that indeed, for the tests to work so for development.
(I don't know yet how to make a subcommand of setup.py, i just started from the pytest-kafka's intial setup.py which does that)
But I really think those 50 lines of python should just be turned into 5-10 lines of shell script that would be more readable and do the exact same thing.
fine by me, i had it in swh/infra/ci-cd/swh-jenkins-dockerfiles!42 (closed) (prior to removing it) so i'll just take it back for here.
(disclaimer: I still haven't looked too closely at the actual tests)
ack, that's fine ;)
For information, it's just:
- write to the publisher (as our swh.storage.listener does)
- publisher reacts, reifies objects and publish them
- a consumer of the publisher asserts the objects are the expected reified ones