Allow logging configuration from configuration yaml file
This will allow proper logging configuration for the services currently running in the dynamic infrastructure, whose logs are currently written to the wrong elasticsearch indices.
The existing behavior is kept as is, so the deployment of the new logging configuration can be changed incrementally.
With this in place, a json logging configuration file could look like:
---
version: 1
handlers:
  console:
    class: logging.StreamHandler
    formatter: json
    stream: ext://sys.stdout
formatters:
  json:
    class: pythonjsonlogger.jsonlogger.JsonFormatter
    # python-json-logger parses the format argument to get the variables it actually expands into the json
    format: "%(asctime)s:%(threadName)s:%(pathname)s:%(lineno)s:%(funcName)s:%(task_name)s:%(task_id)s:%(name)s:%(levelname)s:%(message)s"
loggers:
  celery:
    level: INFO
  amqp:
    level: WARNING
  urllib3:
    level: WARNING
  azure.core.pipeline.policies.http_logging_policy:
    level: WARNING
  swh: {}
  celery.task: {}
root:
  level: <<loglevel>>
  handlers:
    - console
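For illustration, such a file can be applied with the standard library alone; a minimal sketch, assuming the <<loglevel>> placeholder has been substituted with a real level first and using an arbitrary file name:

# Minimal sketch (not part of this MR): load the yaml above and apply it
# with the stdlib dictConfig machinery.
import logging.config

import yaml

with open("logging.yml") as f:  # arbitrary file name, for illustration
    logging.config.dictConfig(yaml.safe_load(f))

# every record now goes through the json formatter defined above
logging.getLogger("swh").info("hello")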
While a systemd logging config matching the current setup for celery would look like:
---
version: 1
handlers:
  console:
    class: logging.StreamHandler
    formatter: task
    stream: ext://sys.stdout
  systemd:
    class: swh.core.logger.JournalHandler
    formatter: task
formatters:
  task:
    (): celery.app.log.TaskFormatter
    fmt: "[%(asctime)s: %(levelname)s/%(processName)s] %(task_name)s[%(task_id)s]: %(message)s"
    use_color: false
loggers:
  celery:
    level: INFO
  amqp:
    level: WARNING
  urllib3:
    level: WARNING
  azure.core.pipeline.policies.http_logging_policy:
    level: WARNING
  swh: {}
  celery.task: {}
root:
  level: DEBUG
  handlers:
    - console
    - systemd
Note: the task_name and task_id qualifiers are only available within celery, when using the TaskFormatter
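For reference, the "task" formatter entry in the systemd config above is roughly equivalent to the following hand-written setup; this is only a sketch to illustrate the note, not code from this MR:

# Sketch only: wire celery's TaskFormatter by hand, equivalent to the "task"
# formatter entry in the yaml above. Outside of a running celery task the
# task_name/task_id fields carry no useful value, hence the note above.
import logging

from celery.app.log import TaskFormatter

handler = logging.StreamHandler()
handler.setFormatter(
    TaskFormatter(
        "[%(asctime)s: %(levelname)s/%(processName)s] "
        "%(task_name)s[%(task_id)s]: %(message)s",
        use_color=False,
    )
)
logging.getLogger().addHandler(handler)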
Ref. swh/infra/sysadm-environment#4524 (closed)
Depends on swh-core!335 (merged)
Activity
I believe that json logging and the default stdout logging should be exclusive (or else how are we going to distinguish between lines that are expected to be json, and lines that are not?)
I think the complexity of this logging setup function should be pushed to a dictConfig, like we do in swh.core: https://gitlab.softwareheritage.org/swh/devel/swh-core/-/blob/8c455e6324a287bb58c4e9cbbd757368e2029aca/swh/core/cli/init.py#L137-156
This way, we don't need to fiddle with extra dependencies either: it'll be up to the deployment manifest to select which loggers to use, how to install them (using a config similar to https://github.com/madzak/python-json-logger#using-a-config-file). We then won't have to patch the code when a deployment wants different logging to pinpoint an error, etc.
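A rough sketch of the kind of setup function this suggests (not the actual swh.core code; names are illustrative): the code only hands a deployment-provided dict to dictConfig, and everything else - which formatters, which handlers, per-logger levels - lives in the configuration file shipped by the deployment.

# Illustrative sketch, not the swh.core implementation: keep the setup
# function minimal and let the deployment-provided dictConfig decide
# handlers, formatters and levels.
import logging
import logging.config


def setup_logging(log_config=None, default_level=logging.INFO):
    if log_config:
        logging.config.dictConfig(log_config)
    else:
        # fall back to the current plain stdout behavior
        logging.basicConfig(level=default_level)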
Edited by Nicolas Dandrimont
I believe that json logging and the default stdout logging should be exclusive (or else how are we going to distinguish between lines that are expected to be json, and lines that are not?)
Totally. I did not realize that yet ^
I think the complexity of this logging setup function should be pushed to a dictConfig, like we do in swh.core: https://gitlab.softwareheritage.org/swh/devel/swh-core/-/blob/8c455e6324a287bb58c4e9cbbd757368e2029aca/swh/core/cli/init.py#L137-156
ok, sounds like the better path indeed.
If we go the dictConfig way, I should still probably adapt the current swh.scheduler.celery_backend.config.setup_log_handler to read the said dictConfig file like the swh cli does.

That's part of your point, right?
This way, we don't need to fiddle with extra dependencies either: it'll be up to the deployment manifest to select which loggers to use, how to install them (using a config similar to https://github.com/madzak/python-json-logger#using-a-config-file). We then won't have to patch the code when a deployment wants different logging to pinpoint an error, etc.
ok
sounds like a plan ;)
I'll drop this MR then.
Thanks a lot for the insight!
Edited by Antoine R. Dumont

If we go the dictConfig way, I should still probably adapt the current swh.scheduler.celery_backend.config.setup_log_handler to read the said dictConfig file like the swh cli does.

Yeah, absolutely.
I see a few steps to go that way:
1. extract the swh.core cli logging setup to a generic function (in a new module swh.core.logging - different from swh.core.logger, which depends on python-systemd)
2. override the logic in the swh.scheduler celery logging setup hook if the SWH_LOG_CONFIG envvar is set (see the sketch after this list)
3. migrate all celery worker workloads (elastic, static in prod, docker dev) to an explicit config file
4. once 3 is done, drop the current logic in the celery logging setup hook
Does that make sense?
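A rough sketch of what step 2 could look like (simplified and hypothetical; the real hook signature and surrounding celery signal wiring are omitted):

# Hypothetical sketch of step 2, not the actual swh.scheduler code: honor the
# proposed SWH_LOG_CONFIG envvar and bypass the legacy handler setup when it
# points at a dictConfig yaml file.
import logging.config
import os

import yaml


def setup_log_handler(*args, **kwargs):
    log_config_path = os.environ.get("SWH_LOG_CONFIG")
    if log_config_path:
        with open(log_config_path) as f:
            logging.config.dictConfig(yaml.safe_load(f))
        return
    # ... otherwise keep the existing hard-coded handler/formatter logic ...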
A json logging config would look like:
---
version: 1
handlers:
  console:
    class: logging.StreamHandler
    formatter: json
    stream: ext://sys.stdout
formatters:
  json:
    class: pythonjsonlogger.jsonlogger.JsonFormatter
    # python-json-logger parses the format argument to get the variables it actually expands into the json
    format: "%(asctime)s:%(threadName)s:%(pathname)s:%(lineno)s:%(funcName)s:%(task_name)s:%(task_id)s:%(name)s:%(levelname)s:%(message)s"
loggers:
  celery:
    level: INFO
  amqp:
    level: WARNING
  urllib3:
    level: WARNING
  azure.core.pipeline.policies.http_logging_policy:
    level: WARNING
  swh: {}
  celery.task: {}
root:
  level: <<loglevel>>
  handlers:
    - console
A systemd logging config for celery would look like
---
version: 1
handlers:
  console:
    class: logging.StreamHandler
    formatter: task
    stream: ext://sys.stdout
  systemd:
    class: swh.core.logger.JournalHandler
    formatter: task
formatters:
  task:
    (): celery.app.log.TaskFormatter
    fmt: "[%(asctime)s: %(levelname)s/%(processName)s] %(task_name)s[%(task_id)s]: %(message)s"
    use_color: false
loggers:
  celery:
    level: INFO
  amqp:
    level: WARNING
  urllib3:
    level: WARNING
  azure.core.pipeline.policies.http_logging_policy:
    level: WARNING
  swh: {}
  celery.task: {}
root:
  level: DEBUG
  handlers:
    - console
    - systemd
(the task_name and task_id qualifiers are only available within celery, when using the TaskFormatter)
Edited by Nicolas Dandrimont
added 1 commit
- b5479b52 - Allow logging configuration from configuration yaml file
added 1 commit
- ed52ac55 - Allow logging configuration from configuration yaml file
added 1 commit
- 8f0849a7 - Allow logging configuration from configuration yaml file