A Python implementation of the paper Spell: Streaming Parsing of System Event Logs by Min Du and Feifei Li (University of Utah).
This implementation is a refactored and enhanced version of logpai's logparser.
`pip install spellpy`

`python example.py`

After executing the commands above, the result folder will be created and you will see two files: structured.csv and templates.csv.
*_main_structured.csv
| ... | Level | Component | Content | EventId | EventTemplate | ParameterList |
|---|---|---|---|---|---|---|
| ... | INFO | dfs.DataNode$DataXceiver | Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106 dest: /10.250.19.102:50010 | f57d69cf | Receiving block blk_-1608999687919862906 src <*> <*> dest <*> 50010 | ['/10.250.19.102:54106', '/10.250.19.102'] |
| ... | INFO | dfs.DataNode$PacketResponder | PacketResponder 1 for block blk_-1608999687919862906 terminating | 7b619377 | PacketResponder <*> for block blk_-1608999687919862906 terminating | ['1'] |
| ... | INFO | dfs.DataNode$DataXceiver | Receiving block blk_-1608999687919862906 src: /10.250.10.6:40524 dest: /10.250.10.6:50010 | f57d69cf | Receiving block blk_-1608999687919862906 src <*> <*> dest <*> 50010 | ['/10.250.10.6:40524', '/10.250.10.6'] |
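
The ParameterList column holds the pieces of Content that fall under the `<*>` wildcards of EventTemplate. The following is a minimal sketch of that idea, not the code used by this package: the function name `extract_parameters` is hypothetical, and it assumes the template and the content differ only at the wildcard positions.

```python
import re
from typing import List


def extract_parameters(template: str, content: str) -> List[str]:
    """Return the substrings of `content` that line up with `<*>` in `template`.

    Hypothetical helper for illustration; returns [] when the content
    does not fit the template.
    """
    # Escape the literal parts so characters such as '$' or '.' match themselves.
    literal_parts = [re.escape(part) for part in template.split("<*>")]
    # Each wildcard becomes a lazy capture group between the literal parts.
    pattern = "(.*?)".join(literal_parts)
    match = re.fullmatch(pattern, content)
    return list(match.groups()) if match else []


params = extract_parameters(
    "PacketResponder <*> for block blk_-1608999687919862906 terminating",
    "PacketResponder 1 for block blk_-1608999687919862906 terminating",
)
print(params)
```

For the PacketResponder row in the table above, this yields the single parameter `1`.
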
*_main_templates.csv
| EventId | EventTemplate | Occurrences |
|---|---|---|
| 6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | 5 |
| 26ae4ce0 | BLOCK* NameSystem.allocateBlock <*> | 2 |
| dc2c74b7 | PacketResponder <*> for block <*> terminating | 4 |
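
The Spell paper builds templates from the longest common subsequence (LCS) of tokenized log messages. The sketch below shows how two messages can collapse into one template with `<*>` placeholders. It is a simplified illustration under stated assumptions, not this package's implementation: the function names `lcs` and `merge_template` and the short block ids `blk_1` and `blk_2` are made up for the example.

```python
from typing import List


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    # dp[i][j] is the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back through the table to recover one LCS.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def merge_template(template: List[str], message: List[str]) -> List[str]:
    """Keep tokens shared with `template`; replace the rest of `message` with <*>."""
    common = lcs(template, message)
    merged: List[str] = []
    k = 0  # index of the next LCS token still to be matched
    for token in message:
        if k < len(common) and token == common[k]:
            merged.append(token)
            k += 1
        elif not merged or merged[-1] != "<*>":
            # Collapse a run of differing tokens into a single wildcard.
            merged.append("<*>")
    return merged


first = "PacketResponder 1 for block blk_1 terminating".split()
second = "PacketResponder 0 for block blk_2 terminating".split()
print(" ".join(merge_template(first, second)))
```

With these two illustrative messages the merged template has the same shape as the PacketResponder row in the templates table.
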
As you can see, there are three test log files. A for loop over them simulates a (nearly) streaming situation.
The result folder contains _main_*.csv files and *.log_*.csv files. The _main_*.csv files keep growing: each newly arriving log is appended once it has been parsed.
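
The append-as-you-go behaviour described above can be sketched with the standard library alone. This is only an illustration of the pattern, not what example.py does internally: `append_rows`, `simulate_stream`, `demo`, the column names, and the file names `a.log` and `b.log` are all hypothetical, and the "parsing" step is a stand-in that copies each line unchanged.

```python
import csv
import os
import tempfile
from typing import Iterable, List, Sequence, Tuple


def append_rows(main_csv: str, header: Sequence[str], rows: Iterable[Sequence[str]]) -> None:
    """Append rows to main_csv, writing the header only when the file is new."""
    is_new = not os.path.exists(main_csv) or os.path.getsize(main_csv) == 0
    with open(main_csv, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def simulate_stream(batches: Iterable[Tuple[str, List[str]]], main_csv: str) -> None:
    """Handle log files one after another, as if they arrived over time."""
    for name, lines in batches:
        # Stand-in for real parsing: each log line becomes one output row.
        rows = [(name, line) for line in lines]
        append_rows(main_csv, ["Source", "Content"], rows)


def demo() -> List[List[str]]:
    """Run two hypothetical batches and return the accumulated CSV rows."""
    with tempfile.TemporaryDirectory() as tmp:
        main_csv = os.path.join(tmp, "main.csv")
        simulate_stream([("a.log", ["x", "y"]), ("b.log", ["z"])], main_csv)
        with open(main_csv, newline="") as f:
            return list(csv.reader(f))
```

Calling `demo()` returns one header row followed by the rows of both batches in arrival order, which mirrors how a main file accumulates results across log files.
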
We can use graphviz to visualize the tree structure of the parser.

`python plot_tree.py`
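
The contents of plot_tree.py are not shown here, so the following is only a sketch of how a token tree could be turned into Graphviz DOT text. It assumes the tree is available as a nested dict mapping each token to its children; that layout and the function name `tree_to_dot` are assumptions for the example, not the parser's actual data structure.

```python
from typing import Dict, List


def tree_to_dot(tree: Dict[str, dict], root_label: str = "root") -> str:
    """Render a nested {token: children} mapping as Graphviz DOT source."""
    lines: List[str] = ["digraph tree {", f'  n0 [label="{root_label}"];']
    counter = 0  # gives every node a unique id, since tokens can repeat

    def walk(node: Dict[str, dict], parent_id: str) -> None:
        nonlocal counter
        for token, children in node.items():
            counter += 1
            node_id = f"n{counter}"
            # Escape backslashes and quotes so the label stays valid DOT.
            label = token.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  {node_id} [label="{label}"];')
            lines.append(f"  {parent_id} -> {node_id};")
            walk(children, node_id)

    walk(tree, "n0")
    lines.append("}")
    return "\n".join(lines)


dot_source = tree_to_dot({"PacketResponder": {"<*>": {}}, "Receiving": {"block": {}}})
print(dot_source)
```

Saved to a `.gv` file, text like this can be rendered with Graphviz, for example `dot -Tpng tree.gv -o tree.gv.png`.
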
`sh test_coverage.sh`

| Name | Stmts | Miss | Cover |
|---|---|---|---|
| spellpy/__init__.py | 3 | 0 | 100% |
| spellpy/spell.py | 319 | 174 | 45% |
| tests/test_spellpy.py | 71 | 1 | 98% |
| TOTAL | 393 | 175 | 55% |
This tree structure is generated by the Mac terminal tool `tree` and copy-pasted into README.md.

`tree -I "__pycache__|tmp.*" >> tmp.txt`
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── data
│   ├── empty_log.log
│   ├── tiny_hdfs_1.log
│   ├── tiny_hdfs_2.log
│   └── tiny_hdfs_3.log
├── example.py
├── plot
│   ├── tree.gv
│   └── tree.gv.png
├── plot_tree.py
├── requirements.txt
├── setup.cfg
├── setup.py
├── spellpy
│   ├── __init__.py
│   └── spell.py
└── tests
    ├── __init__.py
    ├── test_data.log
    └── test_spellpy.py