About me
I am an incoming PhD student at Northeastern University, supervised by Prof. Weiyan Shi. I am currently a third-year Master's student at the Wangxuan Institute of Computer Technology, Peking University, supervised by Prof. Xiaojun Wan. Previously, I obtained my Bachelor's degree from the School of Electronics Engineering and Computer Science at Peking University. I was also a visiting student at the Yale NLP Lab, supervised by Prof. Arman Cohan.
I am interested in the evaluation of NLP systems and LLMs. I believe evaluation is interdisciplinary by nature, drawing on human factors, machine learning, and statistics, among other fields. My work has focused on the evaluation of summarization, text generation, and LLMs, covering automatic evaluation, human evaluation, and meta-evaluation.
I believe that evaluation is crucial to current research. Without more reliable evaluation mechanisms, it is difficult to determine whether an innovation is a genuine advance or merely an illusion, especially amid a large volume of incremental research.
Selected Publications
(* indicates equal contribution)
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao*, Xinyu Hu*, Xunjian Yin, Jie Ruan, Xiao Pu, Xiaojun Wan
Computational Linguistics [pdf]
Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference
Mingqi Gao*, Yixin Liu*, Xinyu Hu, Xiaojun Wan, Jonathan Bragg, Arman Cohan
Findings of NAACL 2025 (To appear) [pdf]
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao*, Xinyu Hu*, Li Lin, Xiaojun Wan
NAACL 2025 (To appear) [pdf]
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability
Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan
EMNLP 2024 [pdf] [code]
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu*, Mingqi Gao*, Sen Hu, Yang Zhang, Yicheng Chen, Teng Xu, Xiaojun Wan
ACL 2024 [pdf] [code]
Is Summary Useful or Not? An Extrinsic Human Evaluation of Text Summaries on Downstream Tasks
Xiao Pu, Mingqi Gao, Xiaojun Wan
LREC-COLING 2024 [pdf]
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu
AAAI 2024 [pdf]
Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP
Anya Belz, Craig Thomson, Ehud Reiter, and 36 more authors
Fourth Workshop on Insights from Negative Results in NLP, 2023 [pdf]
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework
Mingqi Gao, Xiaojun Wan, Jia Su, Zhefeng Wang, Baoxing Huai
ACL 2023 [pdf] [code]
Evaluating Factuality in Cross-lingual Summarization
Mingqi Gao*, Wenqing Wang*, Xiaojun Wan, Yuemei Xu
Findings of ACL 2023 [pdf] [code]
DialSummEval: Revisiting Summarization Evaluation for Dialogues
Mingqi Gao, Xiaojun Wan
NAACL 2022 [pdf] [code]
Academic Services
Served as a reviewer for:
- Conferences: AAAI 2023, EMNLP 2023, ACL Rolling Review 2023-2024, ICLR 2025.
- Workshops: HumEval @ RANLP 2023, LLMAgents @ ICLR 2024.