You can use optional run parameters:
- -s - to specify the logs source directory (by default logs-hf is set)
- -d - to specify the logs destination directory (by default ./ is set)
- -w - to specify the workflow.json file path (by default ./ is set)
- -o - to omit creating a new dedicated directory in the destination path (defaults to false)
e.g.
python3 parser.py -s logs-hf -d parsed-logs -w workflow.json
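The options above could be declared with argparse roughly as follows. This is only a sketch: the `dest` names, the help texts, and the choice to treat -o as a boolean switch are assumptions, not taken from parser.py itself.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI; only the flags and their
    # defaults come from the list above, everything else is assumed.
    parser = argparse.ArgumentParser(description="Parse workflow logs")
    parser.add_argument("-s", dest="source", default="logs-hf",
                        help="logs source directory")
    parser.add_argument("-d", dest="destination", default="./",
                        help="logs destination directory")
    parser.add_argument("-w", dest="workflow", default="./",
                        help="workflow.json file path")
    # Assumption: -o is a switch that is false unless given.
    parser.add_argument("-o", dest="omit_dir", action="store_true",
                        help="do not create a dedicated directory")
    return parser


# Same arguments as in the example command above.
args = build_arg_parser().parse_args(
    ["-s", "logs-hf", "-d", "parsed-logs", "-w", "workflow.json"]
)
print(args.source, args.destination, args.workflow, args.omit_dir)
```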
Logs are written to a directory with the name pattern <dest_dir>/<workflow_name>__<workflow_size>__<version>__<date_time>, where:
- dest_dir - destination directory from run parameters,
- workflow_name - extracted from the workflow.json file, undefined if the name key does not exist,
- workflow_size - extracted from the workflow.json file, number of processes if the size key does not exist,
- version - extracted from the workflow.json file, 1.0.0 if the version key does not exist,
- date_time - timestamp in %Y-%m-%d-%H-%M-%S format

e.g. montage__0.25__1.0.0__2020-04-20-12-01-24
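The naming rule and its fallbacks can be sketched as below. The function name `build_output_dir` is hypothetical, and the assumption that processes are listed under a `processes` key in workflow.json is mine; the real parser may count them differently.

```python
import os
from datetime import datetime


def build_output_dir(dest_dir: str, workflow: dict, now: datetime) -> str:
    """Build <dest_dir>/<name>__<size>__<version>__<date_time>."""
    name = workflow.get("name", "undefined")
    # Assumption: the process list lives under the "processes" key.
    size = workflow.get("size", len(workflow.get("processes", [])))
    version = workflow.get("version", "1.0.0")
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    return os.path.join(dest_dir, f"{name}__{size}__{version}__{stamp}")


# Values taken from the example directory name above.
workflow = {"name": "montage", "size": "0.25", "version": "1.0.0"}
print(build_output_dir("parsed-logs", workflow, datetime(2020, 4, 20, 12, 1, 24)))
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the function, keeps the sketch deterministic and easy to check.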
The parser generates the following files in JSON Lines format:
- job_descriptions.jsonl
- sys_info.jsonl
- metrics.jsonl
Identifiers:
- hyperflowId - e.g. HbD2SFH5
- workflowId - e.g. HbD2SFH5-16
- jobId - e.g. HbD2SFH5-16-44
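Judging only from the examples above, each identifier appears to extend the previous one with a dash and a number. Under that assumption, the parent identifiers can be recovered from a jobId as in this sketch (`split_job_id` is a hypothetical helper, not part of the parser):

```python
def split_job_id(job_id: str) -> dict:
    """Derive workflowId and hyperflowId by stripping trailing "-<n>" parts."""
    # Split from the right, in case the leading identifier itself contains a dash.
    workflow_id, _, _job_number = job_id.rpartition("-")
    hyperflow_id, _, _workflow_number = workflow_id.rpartition("-")
    return {
        "hyperflowId": hyperflow_id,
        "workflowId": workflow_id,
        "jobId": job_id,
    }


print(split_job_id("HbD2SFH5-16-44"))
# {'hyperflowId': 'HbD2SFH5', 'workflowId': 'HbD2SFH5-16', 'jobId': 'HbD2SFH5-16-44'}
```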
{
"workflowName":"montage",
"size":"0.25",
"version":"1.0.0",
"hyperflowId":"6ZYgjDbbG",
"jobId":"6ZYgjDbbG-1-29",
"env":{
"podIp":"10.40.0.56",
"nodeName":"gke-cluster-x-default-pool-917ea268-7zqx",
"podName":"job4kxmy-mdifffit-29-1-qxfg7",
"podServiceAccount":"default",
"podNamespace":"default"
},
"nodeName":"gke-cluster-x-default-pool-917ea268-7zqx",
"executable":"mBgModel",
"args":[
"-i",
"100000",
"pimages_20180402_165339_22325.tbl",
"fits.tbl",
"corrections.tbl"
],
"inputs":[
{
"name":"fits.tbl",
"size":3745
},
{
"name":"pimages_20180402_165339_22325.tbl",
"size":1936
}
],
"outputs":[
{
"name":"corrections.tbl",
"size":573
}
],
"name":"mBgModel",
"command":"mBgModel -i 100000 pimages_20180402_165339_22325.tbl fits.tbl corrections.tbl",
"execTimeMs":1030
}
{
"cpu":{
"manufacturer":"Intel®",
"brand":"Xeon®",
"vendor":"",
"family":"",
"model":"",
"stepping":"",
"revision":"",
"voltage":"",
"speed":"2.00",
"speedmin":"",
"speedmax":"",
"governor":"",
"cores":2,
"physicalCores":2,
"processors":1,
"socket":"",
"cache":{
"l1d": 32768,
"l1i": 32768,
"l2": 1048576,
"l3": 40370176
}
},
"mem":{
"total":2095239168,
"free":130646016,
"used":1964593152,
"active":849100800,
"available":1246138368,
"buffers":105852928,
"cached":1078996992,
"slab":149585920,
"buffcache":1334435840,
"swaptotal":0,
"swapused":0,
"swapfree":0
},
"jobId":"6ZYgjDbbG-1-29"
}
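Both records above carry a jobId, so job descriptions can be matched with system information. The sketch below assumes that each record occupies a single line in its .jsonl file (the examples here are pretty-printed for readability); `read_jsonl` and `join_by_job_id` are hypothetical helpers, and the demo writes two shortened records to a temporary directory.

```python
import json
import os
import tempfile


def read_jsonl(path: str) -> list:
    """Return one parsed object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def join_by_job_id(jobs: list, sys_infos: list) -> list:
    """Attach the matching sys_info record (or None) to every job."""
    info_by_job = {info["jobId"]: info for info in sys_infos}
    return [{**job, "sysInfo": info_by_job.get(job["jobId"])} for job in jobs]


# Demo with shortened versions of the two example records.
job = {"jobId": "6ZYgjDbbG-1-29", "name": "mBgModel", "execTimeMs": 1030}
info = {"jobId": "6ZYgjDbbG-1-29", "cpu": {"cores": 2}, "mem": {"total": 2095239168}}

with tempfile.TemporaryDirectory() as tmp:
    jobs_path = os.path.join(tmp, "job_descriptions.jsonl")
    sys_path = os.path.join(tmp, "sys_info.jsonl")
    with open(jobs_path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps(job) + "\n")
    with open(sys_path, "w", encoding="utf-8") as fh:
        fh.write(json.dumps(info) + "\n")
    joined = join_by_job_id(read_jsonl(jobs_path), read_jsonl(sys_path))

print(joined[0]["name"], joined[0]["sysInfo"]["cpu"]["cores"])
```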
Two types of metrics (values of the parameter key):
- events - event
- measurements - cpu, memory, ctime, io, network
Possible values for events:
- handlerStart
- jobStart
- jobEnd
- handlerEnd
{
"time":"2020-03-30T17:22:45.160",
"workflowId":"6ZYgjDbbG-1",
"jobId":"6ZYgjDbbG-1-1",
"name":"mProjectPP",
"parameter":"event",
"value":"jobStart"
}
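Event records like the one above can be paired to estimate how long a job ran. The sketch below is one possible approach, not the parser's own logic: it assumes a single jobStart and jobEnd per jobId, and the jobEnd record in the demo is made up for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def job_durations(metrics: list) -> dict:
    """Map jobId to seconds elapsed between its jobStart and jobEnd events."""
    starts = {}
    durations = {}
    for record in metrics:
        if record.get("parameter") != "event":
            continue  # skip measurements such as cpu or memory
        moment = datetime.strptime(record["time"], TIME_FORMAT)
        job_id = record["jobId"]
        if record["value"] == "jobStart":
            starts[job_id] = moment
        elif record["value"] == "jobEnd" and job_id in starts:
            durations[job_id] = (moment - starts[job_id]).total_seconds()
    return durations


events = [
    {"time": "2020-03-30T17:22:45.160", "jobId": "6ZYgjDbbG-1-1",
     "parameter": "event", "value": "jobStart"},
    # Hypothetical end event, not taken from real logs.
    {"time": "2020-03-30T17:22:46.410", "jobId": "6ZYgjDbbG-1-1",
     "parameter": "event", "value": "jobEnd"},
]
print(job_durations(events))
```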
cpu
{
"time":"2020-03-30T17:23:31.083",
"pid":"8",
"workflowId":"6ZYgjDbbG-1",
"jobId":"6ZYgjDbbG-1-29",
"name":"mBgModel",
"parameter":"cpu",
"value":0
}
memory
{
...,
"parameter":"memory",
"value":11304960
}
ctime
{
...,
"parameter":"ctime",
"value":30
}
io
{
...,
"parameter":"io",
"value":{
"read":1225,
"write":1,
"readSyscalls":5,
"writeSyscalls":1,
"readReal":0,
"writeReal":0,
"writeCancelled":0
}
}
network
{
...,
"parameter":"network",
"value":{
"name":"eth0",
"rxBytes":5777,
"rxPackets":15,
"rxErrors":0,
"rxDrop":0,
"rxFifo":0,
"rxFrame":0,
"rxCompressed":0,
"rxMulticast":0,
"txBytes":1336,
"txPackets":15,
"txErrors":0,
"txDrop":0,
"txFifo":0,
"txColls":0,
"txCarrier":0,
"txCompressed":0
}
}
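Since every measurement names its kind in the parameter key, records from metrics.jsonl can be filtered by that key and summarised per job. A minimal sketch for scalar measurements (cpu, memory, ctime) follows; `peak_values` is a hypothetical helper, and the second memory sample in the demo is invented.

```python
def peak_values(metrics: list, parameter: str) -> dict:
    """Return the highest scalar value seen per jobId for one parameter."""
    peaks = {}
    for record in metrics:
        if record.get("parameter") != parameter:
            continue
        value = record["value"]
        # io and network values are objects, so only plain numbers are compared.
        if isinstance(value, (int, float)):
            job_id = record["jobId"]
            peaks[job_id] = max(peaks.get(job_id, value), value)
    return peaks


samples = [
    {"jobId": "6ZYgjDbbG-1-29", "parameter": "memory", "value": 11304960},
    # Hypothetical second sample for the same job.
    {"jobId": "6ZYgjDbbG-1-29", "parameter": "memory", "value": 9000000},
    {"jobId": "6ZYgjDbbG-1-29", "parameter": "cpu", "value": 0},
]
print(peak_values(samples, "memory"))
```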