diff --git a/docs/source/AdministratorGuide/ServerInstallations/index.rst b/docs/source/AdministratorGuide/ServerInstallations/index.rst
index 743b7fdbaf8..3a4ece8265a 100644
--- a/docs/source/AdministratorGuide/ServerInstallations/index.rst
+++ b/docs/source/AdministratorGuide/ServerInstallations/index.rst
@@ -14,6 +14,7 @@ This sections constains the documentation for installing new DIRAC Servers or se
    InstallingWebAppDIRAC
    HTTPSServices
    centralizedLogging
+   tornadoComponentsLogs
    rabbitmq
    scalingAndLimitations
    environment_variable_configuration
diff --git a/docs/source/AdministratorGuide/ServerInstallations/tornadoComponentsLogs.rst b/docs/source/AdministratorGuide/ServerInstallations/tornadoComponentsLogs.rst
new file mode 100644
index 00000000000..f56c40d5d0d
--- /dev/null
+++ b/docs/source/AdministratorGuide/ServerInstallations/tornadoComponentsLogs.rst
@@ -0,0 +1,254 @@
.. _tornado_components_logs:

===============================
Split Tornado logs by component
===============================

DIRAC writes logs for each component; they can be found in ``/startup//log/current``.

In the case of Tornado, however, logs from many components end up in the same file and can be hard to sort.

Fluent-bit can collect logs from these files, rearrange their content, and send them elsewhere, for example to an ELK instance or simply to other files.
In the ELK case, it then becomes possible to monitor and display information through Kibana and Grafana, using filters to sort the logs; in the file case, one can simply read separate log files, one per component.

The idea is to handle logs independently of DIRAC. Fluent-bit can also gather server metrics such as CPU, memory and disk usage, which makes it possible to correlate logs with server load.
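To make the goal concrete, the splitting that Fluent-bit performs below can be sketched in a few lines of Python: read JSON records and bucket them by their ``tornadoComponent`` field. This is only an illustration, not part of the setup; the record contents and component names are made up, while the field names follow the JSON log backend configured in the next section.

```python
import json
from collections import defaultdict

def split_by_component(lines):
    """Group JSON log records by the component that emitted them."""
    buckets = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # not a JSON record (e.g. a bare print in the code)
        # Tornado records carry the originating component name; "/" is
        # replaced so the name can be used as a tag or file name.
        component = record.get("tornadoComponent", "unknown").replace("/", "_")
        buckets[component].append(record)
    return buckets

# Hypothetical records, mimicking the JSON log backend output
logs = [
    '{"asctime": "2024-01-01 12:00:00,000", "levelname": "INFO",'
    ' "tornadoComponent": "DataManagement/TornadoFileCatalog", "message": "ping"}',
    '{"asctime": "2024-01-01 12:00:01,000", "levelname": "NOTICE", "message": "no component"}',
]
buckets = split_by_component(logs)
print(sorted(buckets))  # ['DataManagement_TornadoFileCatalog', 'unknown']
```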
DIRAC Configuration
-------------------

First of all, you should configure a JSON log backend in your ``Resources`` and ``Operations`` sections like::

    Resources
    {
      LogBackends
      {
        StdoutJson
        {
          Plugin = StdoutJson
        }
      }
    }

    Operations
    {
      Defaults
      {
        Logging
        {
          DefaultBackends = StdoutJson
        }
      }
    }


Fluent-bit Installation
-----------------------

On each DIRAC server, install Fluent-bit (https://docs.fluentbit.io)::

    curl https://raw.githubusercontent.com/fluent/fluent-bit/master/install.sh | sh

Fluent-bit Configuration
------------------------

Edit ``/etc/fluent-bit/fluent-bit.conf`` and add::

    @INCLUDE dirac-json.conf

Then create the following files in ``/etc/fluent-bit``.

``dirac-json.conf`` (add all the components you need and choose the outputs you want)::

    [SERVICE]
        flush        1
        log_level    info
        parsers_file dirac-parsers.conf

    [INPUT]
        name         cpu
        tag          metric
        Interval_Sec 10

    [INPUT]
        name         mem
        tag          metric
        Interval_Sec 10

    [INPUT]
        name         disk
        tag          metric
        Interval_Sec 10

    [INPUT]
        name          tail
        parser        dirac_parser_json
        path          /startup//log/current
        Tag           log..log
        Mem_Buf_Limit 50MB

    [INPUT]
        name          tail
        parser        dirac_parser_json
        path          /startup//log/current
        Tag           log..log
        Mem_Buf_Limit 50MB

    [FILTER]
        Name   modify
        Match  log.*
        Rename log message
        Add    levelname DEV

    [FILTER]
        Name  modify
        Match *
        Add   hostname ${HOSTNAME}

    [FILTER]
        Name   Lua
        Match  log.*
        script dirac.lua
        call   add_raw

    [FILTER]
        Name         rewrite_tag
        Match        log.tornado
        Rule         $tornadoComponent .$ $TAG.$tornadoComponentclean.log false
        Emitter_Name re_emitted

    #[OUTPUT]
    #    name  stdout
    #    match *

    [OUTPUT]
        Name     file
        Match    log.*
        Path     /vo/dirac/logs
        Mkdir    true
        Format   template
        Template {raw}

    [OUTPUT]
        name            es
        host
        port
        logstash_format true
        logstash_prefix
        tls             on
        tls.verify      off
        tls.ca_file
        tls.crt_file
        tls.key_file
        match           log.*

    [OUTPUT]
        name            es
        host
        port
        logstash_format true
        logstash_prefix
        tls             on
        tls.verify      off
        tls.ca_file
        tls.crt_file
        tls.key_file
        match           metric

``dirac-json.conf`` is the main file. It defines the different steps:

- ``[SERVICE]`` declares our JSON parser (matching the DIRAC JSON log backend).
- ``[INPUT]`` describes the DIRAC component log files and the way they are parsed (JSON).
- ``[FILTER]`` applies modifications to the parsed data: for example, adding a ``levelname`` of "DEV" whenever logs are not well formatted (typically bare ``print`` statements in the code), or adding fields such as the hostname to know which host the logs come from, as well as more complex processing like the ``dirac.lua`` script (described below).
- ``[OUTPUT]`` describes the destinations of the formatted logs; here stdout, files on disk and Elasticsearch.

``dirac-parsers.conf``::

    [PARSER]
        Name        dirac_parser_json
        Format      json
        Time_Key    asctime
        Time_Format %Y-%m-%d %H:%M:%S,%L
        Time_Keep   On

``dirac-parsers.conf`` describes the source format to be parsed, and the field used as the time reference (here ``asctime``).

``dirac.lua``::

    function add_raw(tag, timestamp, record)
        new_record = record

        if record["asctime"] ~= nil then
            -- well-formatted JSON record: rebuild a human-readable line
            raw = record["asctime"] .. " [" .. record["levelname"] .. "] [" .. record["componentname"] .. "] "
            if record["tornadoComponent"] ~= nil then
                -- clean the component name ("/" -> "_") for use in the rewritten tag
                patterns = {"/"}
                str = record["tornadoComponent"]
                for i, v in ipairs(patterns) do
                    str = string.gsub(str, v, "_")
                end
                new_record["tornadoComponentclean"] = str
                raw = raw .. "[" .. record["tornadoComponent"] .. "] "
            else
                raw = raw .. "[] "
            end
            raw = raw .. "[" .. record["customname"] .. "] " .. record["message"] .. " " .. record["varmessage"] .. " [" .. record["hostname"] .. "]"
            new_record["raw"] = raw
        else
            -- not a JSON record (e.g. a bare print): build a minimal line with the current time
            new_record["raw"] = os.date("%Y-%m-%d %H:%M:%S %Z") .. " [" .. record["levelname"] .. "] " .. record["message"] .. " [" .. record["hostname"] ..
"]" + end + + return 2, timestamp, new_record + end + +``dirac.lua`` is the most important transformation we perform on primarily logs, it builds new record depending on logs containing or not special field tornadocomponent, then cleans and formats it before sending to the outputs. + +Testing +------- + +Before throwing logs to ElasticSearch, config can be tested in Standard output by uncommenting:: + + [OUTPUT] + name stdout + match * + +...and commenting ElasticSearch outputs. + +Then by using command:: + + /opt/fluent-bit/bin/fluent-bit -c //etc/fluent-bit/fluent-bit.conf + +NOTE: When all is OK, uncomment ElasticSearch outputs and comment stdout output + +Service +------- + +``sudo systemctl start/stop fluent-bit.service`` + +Dashboards +---------- + +In case of logs sent to an ELK instance, dashboards are available here: ??? + +On disk +------- + +In case of logs sent to local files, Logrotate is mandatory. + +Having a week log retention, Logrotate config file should look like +/etc/logrotate.d/diraclogs:: + + /vo/dirac/logs/* { + rotate 7 + daily + missingok + notifempty + compress + delaycompress + create 0644 diracsgm dirac + sharedscripts + postrotate + /bin/kill -HUP `cat /var/run/syslogd.pid 2>/dev/null` 2>/dev/null || true + endscript + } + +along with crontab line like + +``0 0 * * * logrotate /etc/logrotate.d/diraclogs``