Yuxiang Huang (Shawn Huang)

First-year Ph.D. student @TsinghuaNLP

Blogs

[2026/01] NOSA: Native and Offloadable Sparse Attention (In Chinese) | NOSA: 原生可卸载的稀疏注意力（中文）

[2025/06] It’s Time to Support Sparse Attention in Inference Engines (In Chinese) | 是时候在推理引擎中支持稀疏注意力了（中文）

[2025/02] APB: 10x Lossless Speedup for Long-Context Inference

[2024/09] Locret: Enabling Long-Context Inference on Personal Devices

[2024/09] Ouroboros: Speculative Decoding that is Relatively Fast and Absolutely Accurate

[2024/09] CA-LoRA: Personal Devices-Friendly Downstream Task Adapting

Non-Academic Stuff

[2024/10] A Guidebook of Exchange Studies for DST@THU Students (In Chinese) | 贵系交换指南 | Unfinished

[2024/04] Acceleration of LLM’s Generation (In Chinese) | 大模型推理加速