📝 Publications
📩 denotes corresponding author, 📌 denotes co-first author.

Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning
Maomao Li📌, Lijian Lin📌, Yunfei Liu, Ye Zhu, Yu Li
- We propose a novel dual-frame-guided framework for portrait video editing, which propagates fine-grained local modifications from the start and end video frames to the rest of the video.
- We propose a recursive inference strategy named Quadrant-grid Propagation (QGP), which can stably generate arbitrarily long videos.

GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu📩, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang📩
- ⚡️ Reconstructs a 3D upper-body Gaussian avatar from a single image in 0.1s
- ⏱️ Supports real-time expressive animation and novel view synthesis at 50 FPS!

HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang
- We propose HRAvatar, a 3D Gaussian Splatting-based method that reconstructs high-fidelity, relightable 3D head avatars from monocular videos by jointly optimizing tracking, deformation, and appearance modeling.
- By leveraging learnable blendshapes, physically-based shading, and end-to-end optimization, HRAvatar significantly improves head quality and realism under novel lighting conditions.

TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Yunfei Liu, Lei Zhu, Lijian Lin, Ye Zhu, Ailing Zhang, Yu Li
- We propose a novel approach that achieves more accurate facial expression reconstruction by predicting a hybrid face representation from a single image.
- We introduce a multi-scale facial appearance tokenizer and a token-guided neural renderer to generate high-fidelity facial images. The extracted tokens are interpretable and highly disentangled, enabling various downstream applications.

MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions
Yunfei Liu, Lijian Lin, Fei Yu, Changyin Zhou, Yu Li
- We propose a unified system for multi-person, diverse, and high-fidelity talking portrait video generation.
- Extensive evaluations demonstrate that the proposed system produces more natural and realistic video portraits compared to previous methods.

Accelerating the Training of Video Super-Resolution Models
Lijian Lin, Xintao Wang, Zhongang Qi, Ying Shan
- Our method largely speeds up wall-clock training time without a performance drop for various VSR models.

Dual Semantic Fusion Network for Video Object Detection
Lijian Lin📌, Haosheng Chen📌, Honglun Zhang, Jun Liang, Yu Li, Ying Shan, Hanzi Wang
- We present a dual semantic fusion network that performs multi-granularity semantic fusion at both the frame and instance levels in a unified framework, generating enhanced features for video object detection.
- We introduce a geometric similarity measure, alongside the widely used appearance similarity measure, to alleviate the information distortion caused by noise during fusion.
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation, Xiangyang Luo, Ye Zhu, Yunfei Liu, Lijian Lin, Cong Wan, Zijian Cai, Shao-Lun Huang, Yu Li. ICCV 2025
AnyTalk: Multi-modal Driven Multi-domain Talking Head Generation, Yu Wang, Yunfei Liu, Fa-Ting Hong, Meng Cao, Lijian Lin, Yu Li. AAAI 2025
GPAvatar: Generalizable and Precise Head Avatar from Image(s), Xuangeng Chu, Yu Li, Ailing Zeng, Tianyu Yang, Lijian Lin, Yunfei Liu, Tatsuya Harada. ICLR 2024
Accurate 3D Face Reconstruction with Facial Component Tokens, Tianke Zhang, Xuangeng Chu, Yunfei Liu, Lijian Lin, Zhendong Yang, et al. ICCV 2023
Tagging Before Alignment: Integrating Multi-modal Tags for Video-Text Retrieval, Yizhen Chen, Jie Wang, Lijian Lin, Zhongang Qi, Jin Ma, Ying Shan. AAAI 2023