Investigate why logging is inconsistent
Logging investigation
Storage service crashed on connection reset with objstorage service. https://forge.softwareheritage.org/$834 Thought it was an objstorage crash with no objstorage logs, but after investigation it seems not the case (all but HTTP 500 errors are not logged).
Observation: objstorage server only logs internal server errors, whereas storage also logs access. How is it so?
TL;DR: The aiohttp-based services use an aiohttp Gunicorn worker which implements a different feature set than the builtin Gunicorn workers used by the other servers. The same configuration yield different logging behavior out-of-the box depending on the workers used. In order to have consistent logging accross WSGI services, a fix is proposed as a common configuration file that would lead to uniforn logging.
Library server implementations
Aiohttp
https://docs.aiohttp.org/en/stable/logging.html https://docs.aiohttp.org/en/stable/web_quickstart.html#aiohttp-web-exceptions https://docs.aiohttp.org/en/stable/web_advanced.html#middlewares "If the exception is not an instance of HTTPException it is converted to 500 HTTPInternalServerError after processing the middlewares chain."
Access and error logs active by default, given that logging is configured (not default). All logs defaults >=DEBUG and to stderr if no handlers. Access logs are actually out as INFO, whereas errors as ERROR.
Flask
https://flask.palletsprojects.com/en/1.1.x/logging/
Main logger app.logger
. Access logger werkzeug
.
Flask default adds a StreamHandler
to app.logger
, if logging not configured.
Werkzeug default adds a StreamHandler
to logging.root
, if logging not configured.
Python stream handlers log to stderr by default.
Gunicorn
https://docs.aiohttp.org/en/stable/deployment.html#aiohttp-deployment-gunicorn https://docs.gunicorn.org/en/stable/settings.html#accesslog
Defaults: accesslog: None, errorlog: - Gunicorn main logging options: --error-logfile --access-logfile --loglevel --log-config --capture-output --log-syslog Example logging config file https://github.com/benoitc/gunicorn/blob/master/examples/logging.conf
SWH server implementations
Find everything related to logs:
git grep -iP "[\s_.]log(?:ger|ging)?[\s_.]"
core.api.asynchronous
Base: aiohttp
Logger: no logging config or getting
Both error and access logs enabled.
Error handler:
bubbles up aiohttp.web.HTTPException errors (not only 4XX) but log and send 5XX errors.
HTTP errors not used beyond the defaults of the framework (403, 404, 405, 413, 417)
grep -n "raise HTTP" ~/.virtualenvs/swh/lib/python3.7/site-packages/aiohttp*/**
-> remove special aiohttp.web.HTTPException handling?
core.api.__init__
Base: Flask
logger = logging.getLogger(__name__)
Error handler:
handles all errors by logging and sending
objstorage
Base: core.api.asynchronous
No special logging or error handler.
Absolutely no relevant logging code.
storage
Base: core.api.__init__
2 error_handlers: StorageArgumentException = 400, other exceptions = 500
Logging handler is added, seems useless because the same as Flask default.
swh/storage/api/server.py: handler = logging.StreamHandler(); app.logger.addHandler(handler) // useless?
bin/swh-storage-add-dir: logging.basicConfig(level=logging.INFO)
swh/storage/cli.py: logging.getLogger("werkzeug").setLevel(ctx.obj["log_level"])
swh/storage/cli.py: logging.basicConfig()
Loggers mostly used for warnings (2 occurences) or replay/backfill (irrelevant). Where does the displayed API access logs DEBUG come from?
vault
Base: core.api.asynchronous
No special logging or error handler.
swh/vault/__init__.py:logger = logging.getLogger(__name__)
swh/vault/__init__.py: logger.debug("Instantiating
swh/vault/cli.py: setup_log_handler(loglevel=ctx.obj.get("log_level", logging.INFO),
swh/vault/cookers/base.py: logging.exception("Bundle cooking failed.")
Testing objstorage logging
Setup
Have a valid FS tree for objstorage:
sudo mkdir -p /srv/softwareheritage/objects && sudo chown -R me:me /srv
Devel server:
swh objstorage -C conf/objstorage.yml rpc-serve
Production server:
--worker-class aiohttp.worker.GunicornWebWorker --log-level DEBUG
'swh.objstorage.api.server:make_app_from_configfile()'```
Added prints in `core.api.asynchronous:middleware_handler` `except` block.
Base request:
`curl -s -X POST http://127.1:5003/check_config -H "Content-Type: application/json" -d '{"check_write": false}'`
Base logs:
ERROR:root:... Traceback (most recent call last): ... TypeError: check_config() got an unexpected keyword argument 'check_writ'
INFO:aiohttp.access:127.0.0.1 [29/Oct/2020:16:22:21 +0000] "POST /check_config HTTP/1.1" 500 897 "-" "curl/7.64.0"
Objstorage devel server log *all* access like above access log, log *some* errors like above error log.
Server: always above access log, sometimes above logging.root error log
### Tests
- POST, /check_config, json, {"check_write": false}
no error log, 200, server no traceback, client "."
- POST, /check_confi, json, {"check_writ": false}
no error log, 404, server no traceback, client "404: Not Found"
- GET, /check_config
no error log, 405, server no traceback, client "405: Method Not Allowed"
- POST, /check_config, json, {"check_writ": false}
error log, 500, server+client traceback "TypeError: .*unexpected keyword argument"
- POST, /check_config, jso, {"check_write": false}
error log, 500, server+client traceback "ValueError: Wrong content type"
Error 500 are logged and sent to client. They come not as aiohttp exceptions but as any other Python exception.
Error 4XX are not logged but sent to client. They are the only `aiohttp.web.HTTPException`s raised.
Other statuses are not logged.
## Testing storage logging
Added marker "TEST" in `gunicorn.worker.pre_request` log output to verify this is the log source.
-> Which it was.
`SWH_CONFIG_FILENAME=conf/storage.yml gunicorn --bind 0.0.0.0:5002 --log-level DEBUG
--error-logfile None 'swh.storage.api.server:make_app_from_configfile()'`
-> No more logging, pre_request` logs go to stderr along with errors, as it uses the same logger.
## Conclusion
Gunicorn (default) workers call `pre_request` at each `handle_request`. This does log through errorlog.
aiohttp GunicornWebWorker inherits Gunicorn BaseWorker but does not implement some features like Gunicorn server hooks like `pre_request`
Our configs disable access logs but Gunicorn workers do "accesslog" through errorlog...
To have consistant logging accross SWH API servers, we could disable the `pre_request` hook through the Gunicorn config file, and enable accesslog.
Should we remove aiohttp error special case in core.api.async error handler?
Proposed config file would be similar to:
pre_request = lambda *args: None accesslog = "-"
Do we want to keep special handling of `aiohttp.web.HTTPException`s in `core.api.asynchronous`?
## Annex: Docker configurations findings
Searched through Docker configurations for logging.
- docker-compose files: no logging
- SWH service configs (`conf/*.yml`): no logging
- `env/common_python.env`: no logging
- `services/swh-{service}/entrypoint.sh`:
- All same gunicorn config: "accesslog: None, errorlog: -, loglevel: DEBUG" which is default except for loglevel
- swh.core/web.gunicorn_config defines only sentry config,
including Flask (core) Django (web) integration.
- `--worker-class` used to specify async (aiohttp) instead of sync, in objstorage
- Inconsistency in options passing: --worker-class, --reload, --log-level
- All use Gunicorn except Vault which uses aiohttp devel server!
- Deposit is Django != architecture schema, uses swh core gunicorn_config which does not handle Django integration.
----
*Migrated from T2743 ([view on Phabricator](https://forge.softwareheritage.org/T2743))*