I am an incoming Ph.D. student (starting Fall 2025) in the TsinghuaNLP Group at Tsinghua University, Beijing, under the supervision of Prof. Zhiyuan Liu. My research focuses on efficient AI and machine learning systems, particularly LLM inference systems. I am currently developing efficient algorithms and system frameworks for long-context processing to speed up LLM inference. My research spans model compression, speculative decoding, and long-context inference acceleration, which I believe are critical to improving the efficiency of LLM systems. Going forward, I plan to explore long-CoT inference acceleration as a promising direction.
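If you are curious what speculative decoding looks like, here is a minimal toy sketch of the general draft-and-verify idea (greedy variant; illustrative only, and not the Ouroboros algorithm: the functions `draft_next` and `target_next` are hypothetical stand-ins for a small draft LM and a large target LM):

```python
# Toy sketch of the draft-and-verify loop behind speculative decoding
# (greedy variant; illustrative only, not any specific paper's method).
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]  # greedy next-token function

def speculative_decode(prefix: List[Token],
                       draft_next: NextFn,    # hypothetical small-model stub
                       target_next: NextFn,   # hypothetical large-model stub
                       k: int = 4,
                       max_new: int = 32) -> List[Token]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) The expensive target model checks each proposal. In a real
        #    system all k positions are scored in ONE batched forward
        #    pass, which is where the speedup comes from.
        accepted = 0
        for i in range(k):
            if target_next(out + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        # 3) The target model then emits one guaranteed-correct token,
        #    so the output matches plain greedy decoding with the
        #    target model alone.
        out.append(target_next(out))
    return out
```

The point of the pattern is that accepted draft tokens cost roughly one large-model step instead of k, while the verification step keeps the output identical to decoding with the large model alone.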

I am a strong advocate of the idea that scaling laws, especially test-time scaling, are a pathway to AGI, and I believe efficiency is the key to scaling. I have published papers at COLM and EMNLP.

  • News: We are releasing APB, our new long-context inference acceleration method: a 10x speedup without any performance degradation!

  • News: One paper (Ouroboros) accepted to the EMNLP 2024 main conference.

  • News: One paper (CA-LoRA) accepted to COLM 2024. The first-ever COLM was a great experience!

Publications and Preprints

Huang, Y.*, Li, M.*, Han, X., Xiao, C., Zhao, W., Sun, A., Zhou, J., Zhou, H., Liu, Z., & Sun, M. (2025). APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs. arXiv preprint arXiv:2502.12085 (In submission).

Huang, Y., Yuan, B., Han, X., Xiao, C., & Liu, Z. (2024). Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads. arXiv preprint arXiv:2410.01805 (In submission).

Zhao, W.*, Huang, Y.*, Han, X., Xu, W., Xiao, C., Zhang, X., Fang, Y., Zhang, K., Liu, Z., & Sun, M. (2024). Ouroboros: Speculative Decoding with Large Model Enhanced Drafting. Conference on Empirical Methods in Natural Language Processing (EMNLP 2024, Main Conference).

Zhao, W.*, Huang, Y.*, Han, X., Liu, Z., Zhang, Z., Li, K., Chen, C., Yang, T., & Sun, M. (2024). CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices. Conference on Language Modeling (COLM 2024).

Hu, S., Tu, Y., Han, X., Cui, G., He, C., Zhao, W., … & Sun, M. (2024). MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies. Conference on Language Modeling (COLM 2024).

Qin, Y., Hu, S., Lin, Y., Chen, W., Ding, N., Cui, G., … & Sun, M. (2023). Tool Learning with Foundation Models. ACM Computing Surveys.

Xiao, J., Huang, Y., Hu, C., Song, S., Huang, X., & Wang, J. (2022). Time series data encoding for efficient storage: a comparative analysis in Apache IoTDB. Proceedings of the VLDB Endowment, 15(10), 2148-2160.

(Note: * indicates equal contribution.)

Research Experiences

  • 2022.07-now: Research in THUNLP, Dept. of CST, on efficient LLMs.
  • 2024.07-2024.09: Research internship at HKUST on LLM long-context inference, advised by Prof. Binhang Yuan.
  • 2021.10-2022.07: SRT (Student Research Training) at the School of Software, on compression algorithms for big-data databases, advised by Prof. Shaoxu Song.

Honors and Awards

  • Academic Excellence Award of Dept. of CST, 2023.09-2024.07
  • Academic Excellence in Research Award of Dept. of CST, 2022.09-2023.07
  • Comprehensive Scholarship (Scholarship from Prof. Zesheng Tang) of Dept. of CST, 2021.09-2022.07
  • Third Prize, the 40th Tsinghua Challenge Cup
  • High School Graduation with Honor, Beijing No.9 Middle School, 2021.07

Education

  • 2021.09-now, Tsinghua University, Beijing, China. Undergraduate Student.
  • 2024.07-2024.09, The Hong Kong University of Science and Technology, Sai Kung, Hong Kong S.A.R., China. Research Internship.
  • 2023.09-2023.12, University of Washington, Seattle, U.S.A. Exchange Student at the College of Arts and Sciences.
  • 2018.09-2021.07, Beijing No.9 Middle School, Beijing, China. High School Student.

Service and Voluntary Work

  • Reviewer: ICLR 2025, ACL ARR 2023 December Cycle.

  • Maintainer: Ouroboros GitHub repository

  • 2022 Autumn - 2023 Spring: Volunteer teaching support for Qinghai University; taught in the Foundations of Programming (higher level) course. Lecture 1: Search (in Chinese). Lecture 2: Graphs and Trees (in Chinese).

Collaborators

I work closely with the MLSys researchers in my lab, TsinghuaNLP: Xu Han (Research Assistant Professor), Weilin Zhao (Ph.D. candidate), and Ao Sun (M.Sc. student). If you would like to collaborate with me, please feel free to reach out. I also welcome discussions of all kinds, e.g., MLSys, AI, academic choices, daily life. Below, I list the people I have collaborated with.

  • Mingye Li @ Central South Univ. (Summer 2024 - now)

More

  • Recently I have found that taking notes in LaTeX is fun for math and math-related CS courses, so I created this repository: CourseNotes. If you are looking for learning materials for THU CST courses, please check out the repository. If you also take notes in LaTeX, feel free to contact me!

  • I was an exchange student at the University of Washington in Autumn 2023. Being an overseas exchange student was an amazing experience. If you want to do an exchange at UW or Tsinghua and want to talk to someone about it, I am always happy to chat. (TL;DR: If you want to exchange at UW, you must be nominated by your home institution; for Tsinghua University, exchange students cannot be Chinese citizens. For anything else you are not sure about, just ask me!)

  • I speak Chinese and English, and I have recently been learning German (yes, I wanted to write "Deutsch" at first, then realized I was writing in English). Feel free to contact me in Chinese or English. As for German... give me a few more years and I might understand what you have written :)