Improve access_logs parsing
Currently, the Apache logs are sent to Elasticsearch, with some parsing done by Logstash [1]. This data can be used to create useful dashboards in Grafana and to help diagnose issues (response time per time of request / response code / ...).
To do so, the logs and their parsing must be improved to:
- Add the request duration to the log
- Convert the response code to an integer
- Convert the "bytes" field to an integer
- Add tags such as the application and the environment to help create usable filters
Example of a document as currently indexed (note that "response" and "bytes" are stored as strings):
{
"_index": "apache_logs-2020.11.17",
"_type": "_doc",
"_id": "UAH81XUBO-oK4hKuO702",
"_score": 1,
"_source": {
"@version": "1",
"agent": {
"id": "926d1a92-fb11-4e60-b29e-59550ea0ade8",
"type": "filebeat",
"hostname": "moma",
"name": "moma",
"version": "7.8.0",
"ephemeral_id": "e5ad2d14-108b-471b-8a88-f4dddee584ad"
},
"ident": "-",
"httpversion": "1.1",
"referrer": "\"-\"",
"host": {
"name": "moma"
},
"log": {
"file": {
"path": "/var/log/apache2/archive.softwareheritage.org_non-ssl_access.log"
},
"offset": 13209487
},
"fields": {
"apache_log_type": "access_log"
},
"ecs": {
"version": "1.5.0"
},
"auth": "-",
"verb": "GET",
"input": {
"type": "log"
},
"tags": [
"beats_input_codec_plain_applied"
],
"@timestamp": "2020-11-17T11:34:37.000Z",
"timestamp": "17/Nov/2020:11:34:37 +0000",
"request": "/browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/",
"message": "::1 - - [17/Nov/2020:11:34:37 +0000] \"GET /browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/ HTTP/1.1\" 200 8378 \"-\" \"Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)\"",
"response": "200",
"bytes": "8378",
"clientip": "::1"
},
"fields": {
"@timestamp": [
"2020-11-17T11:34:37.000Z"
]
}
}
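As an illustration only, the improvements listed above can be sketched in Python. This is not the Logstash configuration itself. It assumes the Apache log format is extended with a trailing request duration in microseconds (for example via `%D`). The function name `parse_access_log`, the fields `duration_us`, `user_agent`, `application` and `environment`, the tag values and the sample duration `41250` are all hypothetical.

```python
import re

# Combined log format followed by a trailing request duration.
# Assumption: the Apache LogFormat has been extended with %D (microseconds).
ACCESS_LOG_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<duration_us>\d+)$'
)


def parse_access_log(line, application, environment):
    """Parse one access log line into a dict with typed fields.

    Returns None when the line does not match the expected format.
    """
    match = ACCESS_LOG_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Numeric fields become integers so they can be aggregated in dashboards.
    event["response"] = int(event["response"])
    # Apache logs "-" when no bytes were sent; treat it as 0 here.
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    event["duration_us"] = int(event["duration_us"])
    # Hypothetical tags used to filter by application and environment.
    event["application"] = application
    event["environment"] = environment
    return event


# Log line from the document above, with a hypothetical duration appended.
line = (
    '::1 - - [17/Nov/2020:11:34:37 +0000] '
    '"GET /browse/revision/dde8d9775f9fe122dfeb03c3fd736118e1062887/ HTTP/1.1" '
    '200 8378 "-" '
    '"Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)" '
    '41250'
)
event = parse_access_log(line, application="webapp", environment="production")
```

In Logstash, the equivalent would likely be an extended grok pattern plus a type conversion and added fields; the exact configuration depends on the existing pipeline referenced in [1].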
Migrated from T2787 (view on Phabricator)