The implementation of SwissLog in ISSRE'20 and TDSC'22

IntelligentDDS/SwissLog

SwissLog

This repository is the basic implementation of our ISSRE'20 conference paper, SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults, and its extended version in TDSC, SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs. SwissLog contains two parts: log parsing and anomaly detection.

Description

SwissLog adopts a novel log parsing method that extracts multiple templates by tokenizing, dictionarizing, and clustering historical log data. Unlike other log parsing methods, our dictionary-based method requires no parameter tuning. The templates are kept as natural sentences instead of event IDs. We link log statements that share the same identifier, or simply use a sliding window, to construct log sequences called sessions. Each log sequence is then transformed into semantic information and temporal information. SwissLog uses a BERT encoder to encode the semantic information into a semantic embedding and projects the temporal information onto a time embedding. The concatenation of the semantic embedding and the time embedding is fed into an attention-based Bi-LSTM to learn the features of normal, abnormal, and performance-anomalous log sequences.
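
The two ways of constructing sessions mentioned above can be illustrated with a minimal sketch. It is not taken from the repository code: the function names, the sample log lines, and the `window_size` and `step` parameters are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sessions_by_identifier(logs):
    """Group (identifier, message) pairs into one session per identifier, keeping order."""
    sessions = defaultdict(list)
    for identifier, message in logs:
        sessions[identifier].append(message)
    return dict(sessions)


def sessions_by_sliding_window(messages, window_size, step):
    """Cut a message list into windows of `window_size`, advancing by `step`.

    Trailing messages that do not fill a complete window are dropped,
    unless the whole input is shorter than one window.
    """
    last_start = max(len(messages) - window_size, 0)
    return [messages[i:i + window_size] for i in range(0, last_start + 1, step)]


# Hypothetical interleaved log lines, already reduced to (identifier, template).
logs = [
    ("blk_1", "Receiving block"),
    ("blk_2", "Receiving block"),
    ("blk_1", "PacketResponder terminating"),
    ("blk_2", "Deleting block"),
]

by_id = sessions_by_identifier(logs)
windows = sessions_by_sliding_window([m for _, m in logs], window_size=2, step=1)
```

Grouping by identifier untangles interleaved logs when each line carries an ID such as a block ID, while the sliding window is the fallback when no such identifier exists.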

Project Structure

The file structure is as follows:

.
├── LICENSE
├── README.md
├── anomaly_detection
│   ├── encoder
│   │   ├── BertEncoder.py
│   │   └── word2vecEncoder.py
│   ├── perf_model.py
│   ├── perf_predict.py
│   └── perf_train.py
├── log_parser
│   ├── EngCorpus.pkl
│   ├── logs
│   ├── offline_logparser
│   │   ├── evaluator
│   │   ├── layers
│   │   └── run.py
│   └── online_logparser
│       ├── evaluator
│       ├── layers
│       ├── online_run.py
│       └── utils
└── requirements.txt

Datasets

This demo uses the LogPai benchmark, which covers 16 real-world log datasets ranging from distributed systems, supercomputers, operating systems, mobile systems, and server applications to standalone software: HDFS, Hadoop, Spark, Zookeeper, BGL, HPC, Thunderbird, Windows, Linux, Android, HealthApp, Apache, Proxifier, OpenSSH, OpenStack, and Mac. These log datasets are provided by LogHub. Each dataset contains 2,000 log samples with ground truth tagged by a rule-based log parser.

Requirements

This project is reproducible under Python 3.7. Run the following command to install the other key packages.

pip install -r requirements.txt

Quick Start

Log Parser

Step 1: Construct a dictionary

We first construct a dictionary from an English corpus of 5.2 million sentences, which is available in the repository (or can be downloaded directly from this link). After splitting the corpus on the space delimiter, we collect 588,054 distinct words. Since not every word that occurs is valid (e.g., location names), we set an occurrence threshold to keep only common valid words. The final dictionary retains only 18,653 common words. In the evaluation, we use these 18,653 common words as the dictionary D to recognize valid words. The dictionary is stored in the file EngCorpus.pkl.

You may also use your own dictionary, but please follow the dictionary format carefully. For now, the program only accepts a .pkl file storing a dict whose keys are words and whose values are their occurrence counts.
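
For readers who want to build their own dictionary, the following is a minimal sketch of the process described above: split on spaces, count occurrences, filter by a threshold, and pickle the resulting dict. The function name, the toy corpus, the threshold value, the `>=` comparison, and the file name `MyCorpus.pkl` are assumptions for illustration; the repository does not state the threshold it used.

```python
import os
import pickle
import tempfile
from collections import Counter


def build_dictionary(sentences, min_occurrence):
    """Count space-delimited words and keep those seen at least `min_occurrence` times."""
    counts = Counter(
        word for sentence in sentences for word in sentence.split(" ") if word
    )
    return {word: n for word, n in counts.items() if n >= min_occurrence}


# Toy corpus standing in for the real 5.2-million-sentence corpus.
corpus = [
    "the server started",
    "the server stopped",
    "the disk failed in Guangzhou",
]
dictionary = build_dictionary(corpus, min_occurrence=2)

# Store the dict in the expected format: word -> occurrence count.
path = os.path.join(tempfile.mkdtemp(), "MyCorpus.pkl")
with open(path, "wb") as f:
    pickle.dump(dictionary, f)

# Reload it to confirm the file holds a plain dict.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Rare words such as the location name fall below the threshold and are left out, which is the filtering effect described in Step 1. The resulting file path is what you would pass to the `--dictionary` option.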

Step 2: Just run the file

Offline version

Please execute the run.py file in the offline_logparser directory.

cd log_parser/offline_logparser
python3 run.py --dictionary=$PATH_OF_DICTIONARY

Online version

Please execute the online_run.py file in the online_logparser directory.

cd log_parser/online_logparser
python3 online_run.py --dictionary=$PATH_OF_DICTIONARY

Results

In this demo, we present benchmark results on 16 datasets. Overall, SwissLog achieves nearly the best parsing accuracy (PA) on all datasets except the Mac logs. Moreover, SwissLog parses the HDFS, Windows, Apache, OpenSSH, and OpenStack datasets with 1.000 accuracy. The average accuracy of SwissLog reaches 0.962, which exceeds that of other log parsers by 10%.

Dataset      F1 measure  Accuracy
HDFS         1.000000    1.0000
Hadoop       0.999901    0.9920
Spark        0.999978    0.9965
Zookeeper    0.999763    0.9845
BGL          0.999831    0.9695
HPC          0.992245    0.9095
Thunderbird  0.999980    0.9920
Windows      1.000000    1.0000
Linux        0.989943    0.8690
Android      0.995815    0.9535
HealthApp    0.993429    0.9010
Apache       1.000000    1.0000
Proxifier    0.999980    0.9900
OpenSSH      1.000000    1.0000
OpenStack    1.000000    1.0000
Mac          0.976316    0.8400
Average      0.9967      0.9623
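
This README does not define how the Accuracy column is computed. The sketch below assumes it follows the grouping-based definition used by the LogPai benchmark, where a log message counts as correctly parsed only if its predicted template groups exactly the same messages as its ground-truth template. The function name and the sample labels are hypothetical, and this is not the repository's evaluator code.

```python
from collections import defaultdict


def parsing_accuracy(predicted, ground_truth):
    """Fraction of messages whose predicted group equals a ground-truth group."""

    def groups(labels):
        # Map each label to the set of message indices that carry it.
        members = defaultdict(set)
        for index, label in enumerate(labels):
            members[label].add(index)
        return members

    truth_groups = {frozenset(g) for g in groups(ground_truth).values()}
    correct = sum(
        len(g) for g in groups(predicted).values() if frozenset(g) in truth_groups
    )
    return correct / len(ground_truth)


# Five messages: the parser splits ground-truth event E2 into two templates.
ground_truth = ["E1", "E1", "E2", "E2", "E3"]
predicted = ["T1", "T1", "T2", "T3", "T4"]
accuracy = parsing_accuracy(predicted, ground_truth)
```

Under this definition a single over-split or merged template marks every message in the affected groups as wrong, which is why accuracy can fall well below the F1 measure on the same dataset.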

Changelog

  • 2023.05
    • Fix bugs: use original delimiters instead of space to join template tokens
    • Add parameter list extraction
  • 2022.06
    • Update the online version log parser
    • Update anomaly detection code

Acknowledgements

SwissLog is implemented based on the work of the LogPai team; we appreciate their contributions to the community.

We also thank all the contributors to this project:

Name           GitHub
Xiaoyun Li     @humanlee1011
Pengfei Chen*  @chen0031
Linxiao Jing   @jl0x61
Zilong He      @QAZASDEDC
Guangba Yu     @yuxiaoba

Reference

Please cite our ISSRE'20 paper if you find this work helpful.

@inproceedings{li2020swisslog,
  title={SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults},
  author={Li, Xiaoyun and Chen, Pengfei and Jing, Linxiao and He, Zilong and Yu, Guangba},
  booktitle={2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)},
  pages={92--103},
  year={2020},
  organization={IEEE}
}

@article{li2022swisslog,
  title={SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs},
  author={Li, Xiaoyun and Chen, Pengfei and Jing, Linxiao and He, Zilong and Yu, Guangba},
  journal={IEEE Transactions on Dependable and Secure Computing},
  year={2022},
  publisher={IEEE}
}
