Publications

You can also find my articles on my Google Scholar profile.

First-Author


A Unified Approach to Interpreting Knowledge Distillation for Large Language Models via Interactions

Published in ICML, 2026

We interpret knowledge distillation from a game-theoretic interaction perspective, revealing that the essence of distillation is the sparsification of interactions, and propose the CIP loss to explicitly enforce this mechanism.

Recommended citation: Qingzhuo Wang*, Ruiyang Qin*, Zhenxin Qin, Wen Shen, Zhihua Wei. (2026). "A Unified Approach to Interpreting Knowledge Distillation for Large Language Models via Interactions." ICML 2026.
Download Paper

Multilingual Safety Alignment via Self-Distillation

Published on arXiv, 2026

We propose an on-policy self-distillation method for multilingual safety alignment, transferring the model’s own safety capabilities from high-resource to low-resource languages without reliance on human-annotated safety data.

Recommended citation: Ruiyang Qin*, Qingzhuo Wang*, Dongrui Liu, Qiang Li, Zhihua Wei, Wen Shen. (2026). "Multilingual Safety Alignment via Self-Distillation." arXiv 2026.
Download Paper

TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation

Published on arXiv, 2026

We propose TME-PSR, a framework that integrates time-awareness, multi-interest modeling, and personalized explanations for sequential recommendation.

Recommended citation: Qingzhuo Wang, Leilei Wen, Juntao Chen, Kunyu Peng, Ruiyang Qin, Zhihua Wei, Wen Shen. (2026). "TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation." arXiv 2026.
Download Paper

Co-Author


Evaluating and Explaining Prompt Sensitivity of LLMs Using Interactions

Published in ICML, 2026

We introduce game-theoretic interactions to analyze the prompt sensitivity of LLMs at a fine-grained level, proposing the IPS metric and uncovering that factors such as SFT and model scale reduce sensitivity by stabilizing low-order interactions.

Recommended citation: Ruiyang Qin, Qingzhuo Wang, Tian Wang, Zhihua Wei, Wen Shen. (2026). "Evaluating and Explaining Prompt Sensitivity of LLMs Using Interactions." ICML 2026.
Download Paper

Mitigating Action-Relation Hallucinations in LVLMs via Relation-aware Visual Enhancement

Published in ACL, 2026

We define the ARS score to locate action-relation-sensitive attention heads, and propose RVE, a training-free method that enhances attention to action-relevant image regions to mitigate action-relation hallucinations in LVLMs.

Recommended citation: Zhenxin Qin, Qiang Li, Qingzhuo Wang, Ruiyang Qin, Zhihua Wei, Wen Shen. (2026). "Mitigating Action-Relation Hallucinations in LVLMs via Relation-aware Visual Enhancement." ACL 2026.

Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift

Published on arXiv, 2026

We show that VLM jailbreaks are not perception failures but distinct internal states driven by image-induced representation shifts, and propose JRS-Rem to remove these shifts at inference time.

Recommended citation: Zhihua Wei, Qiang Li, Jian Ruan, Zhenxin Qin, Leilei Wen, Ruiyang Qin, Qingzhuo Wang, Dongrui Liu, Wen Shen. (2026). "Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift." arXiv 2026.
Download Paper