Blogs
[2025/06] It’s Time to Support Sparse Attention in Inference Engines (In Chinese) | 是时候在推理引擎中支持稀疏注意力了
[2025/02] APB: 10x Lossless Speedup for Long-Context Inference
[2024/09] Locret: Enabling Long-Context Inference on Personal Devices
[2024/09] Ouroboros: Speculative Decoding that is Relatively Fast and Absolutely Accurate
[2024/09] CA-LoRA: Personal Devices-Friendly Downstream Task Adapting

Non-Academic Stuff
[2024/10] A Guidebook of Exchange Studies for DST@THU Students (In Chinese) | 贵系交换指南 | Unfinished
[2024/04] Acceleration of LLM’s Generation (In Chinese) | 大模型推理加速