LogEval: A Comprehensive Benchmark Suite for LLMs in Log Analysis
LogEval is a comprehensive benchmark suite designed to evaluate the capabilities of Large Language Models (LLMs) in log parsing, anomaly detection, fault diagnosis, and log summarization. It uses 4,000 publicly available log data entries and 15 different prompts for each task to rigorously evaluate multiple mainstream LLMs, measuring their performance under self-consistency and few-shot learning, and it reports findings related to model quantization, Chinese-English question-answering evaluation, and prompt engineering. The evaluation results reveal the strengths and limitations of LLMs in log analysis tasks, giving researchers a practical reference when selecting models for such tasks. See the paper for more details.
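
To make the evaluation protocol concrete, here is a minimal Python sketch of a self-consistency loop in the spirit of LogEval's setup, where each log entry is queried under every prompt variant and the answers are aggregated by majority vote. The `query_model` function and the `{log}` template format are illustrative placeholders, not the actual LogEval API.

```python
from collections import Counter

# Placeholder model interface -- wire this up to your own LLM client.
def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API call here")

def self_consistency_answer(log_entry: str, prompt_templates: list[str]) -> str:
    """Query the model once per prompt variant (LogEval uses 15 per task)
    and return the majority-vote answer across all variants."""
    answers = [query_model(template.format(log=log_entry))
               for template in prompt_templates]
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return most_common_answer
```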

@misc{cui2024logevalcomprehensivebenchmarksuite,
      title={LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis},
      author={Tianyu Cui and Shiyu Ma and Ziang Chen and Tong Xiao and Shimin Tao and Yilun Liu and Shenglin Zhang and Duoming Lin and Changchang Liu and Yuzhe Cai and Weibin Meng and Yongqian Sun and Dan Pei},
      year={2024},
      eprint={2407.01896},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.01896},
}
If you have any questions about LogEval or are interested in collaborating, please fill out your email on the Submit page, or send an email directly to cuitianyu@mail.nankai.edu.cn.