Qingzhuo Wang

About Me

I am a first-year M.S. student in Computer Science and Technology at Tongji University, supervised by Associate Professor Wen Shen and Professor Zhihua Wei. I received my B.Eng. in Computer Science and Technology from Tongji University in 2025.

My research interests lie in LLM post-training, knowledge distillation, and agentic reinforcement learning.


Publications

First-Author

  • A Unified Approach to Interpreting Knowledge Distillation for Large Language Models via Interactions
    Qingzhuo Wang*, Ruiyang Qin*, Zhenxin Qin, Wen Shen, Zhihua Wei
    ICML 2026   [URL]
    TL;DR: We interpret KD from a game-theoretic interaction perspective, revealing that the essence of distillation is the sparsification of interactions — student models selectively inherit salient simple interactions from teachers while compressing complex ones. We further propose the CIP loss to explicitly enforce this sparsification.

  • Multilingual Safety Alignment via Self-Distillation
    Ruiyang Qin*, Qingzhuo Wang*, Dongrui Liu, Qiang Li, Zhihua Wei, Wen Shen
    arXiv 2026   [URL]
    TL;DR: We propose an on-policy self-distillation method that transfers the model’s own safety capabilities from high-resource languages to low-resource ones, eliminating the dependency on high-quality human-annotated safety data while improving both in-distribution and out-of-distribution multilingual safety.

  • TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation
    Qingzhuo Wang, Leilei Wen, Juntao Chen, Kunyu Peng, Ruiyang Qin, Zhihua Wei, Wen Shen
    arXiv 2026   [URL]
    TL;DR: A unified framework that simultaneously introduces time-aware personalization, multi-interest modeling, and explanation personalization into sequential recommendation, addressing the lack of comprehensive personalization in existing methods.

Co-Author

  • Evaluating and Explaining Prompt Sensitivity of LLMs Using Interactions
    Ruiyang Qin, Qingzhuo Wang, Tian Wang, Zhihua Wei, Wen Shen
    ICML 2026   [URL]
    TL;DR: We introduce game-theoretic interactions as a fine-grained tool to analyze prompt sensitivity, proposing the IPS metric and revealing that even when outputs stay the same, most internal interactions are unstable — and that factors like SFT and scale reduce sensitivity by stabilizing low-order interactions.

  • Mitigating Action-Relation Hallucinations in LVLMs via Relation-aware Visual Enhancement
    Zhenxin Qin, Qiang Li, Qingzhuo Wang, Ruiyang Qin, Zhihua Wei, Wen Shen
    ACL 2026
    TL;DR: We define the Action-Relation Sensitivity (ARS) score to locate attention heads sensitive to action-relation changes, and propose Relation-aware Visual Enhancement (RVE), a training-free method that enhances attention to action-relevant image regions to mitigate action-relation hallucinations.

  • Understanding and Defending VLM Jailbreaks via Jailbreak-Related Representation Shift
    Zhihua Wei, Qiang Li, Jian Ruan, Zhenxin Qin, Leilei Wen, Ruiyang Qin, Qingzhuo Wang, Dongrui Liu, Wen Shen
    arXiv 2026   [URL]
    TL;DR: We show that VLMs recognize harmful intent but enter a distinct jailbreak state rather than refusing, driven by a jailbreak-related representation shift induced by visual inputs. We propose JRS-Rem, a training-free defense that removes this shift at inference time.


Internships

WeQuant — Quant Researcher   Mar. 2025 – Oct. 2025

  • Independently led the full lifecycle of a low-frequency A-share Alpha strategy, covering data ingestion/cleaning, label construction, factor engineering, model training & prediction, and order-generation backtesting.
  • Built multi-source factors (high-frequency aggregation, low-frequency price-volume, Wind, fundamentals, and risk-style); performed missing/lagged-data repair, look-ahead (future-data) auditing, and correlation & volatility screening; improved model stability via sample weighting, clip-scaling, and early stopping. Backtest rankIC 0.16, excess return 0.019%.
  • Built a reusable live-trading framework; the strategy ran continuously from Jun. to Oct. 2025. Live performance: cumulative return 26.5%, cumulative excess return 6.1%, max drawdown 6.6%, Sharpe 3.7, IR 1.07.

Bilibili — Algorithm Engineer   Jul. 2024 – Dec. 2024

  • Designed and launched a redirect recall strategy targeting users who had wishlisted or saved items, significantly improving post-engagement conversion: Orders +33.13%, GMV +113.14%, GPM +112.94%.
  • Trained product image and title embeddings with CNCLIP and integrated Faiss for online similarity filtering, reducing repetitive product exposure in feeds; maintained business metrics while improving diversity: avg. 4th-level category exposure 4.92 → 4.96, IP exposure 5.80 → 5.85.
  • Applied isotonic regression with Bayesian smoothing for ranking model calibration, improving COPC 0.91 → 1.01 and driving Orders +5%, GMV +13%, GPM +13%.