
Chenglin Li

李成林

Ph.D. Student

Zhejiang University & Shanghai Innovation Institute

I work on large language models, long-video understanding, multimodal evaluation, and agentic LLM systems.

About

I am a Ph.D. student jointly affiliated with Zhejiang University and Shanghai Innovation Institute. My research focuses on multimodal large language models, long-video understanding, and agentic reasoning.

I am advised by Jiaqi Wang and Yin Zhang.

Education

Zhejiang University & Shanghai Innovation Institute

Ph.D. in Artificial Intelligence, College of Computer Science and Technology

Zhejiang University

M.S. in Artificial Intelligence, School of Software Technology

Research areas: large language models, video understanding, and agents

Northeastern University

B.S. in Computer Science and Technology

GPA: 4.05/5.00, Rank: 13/221, CET-4: 561, CET-6: 479

Publications [Google Scholar]

  1. CVPR 2026 Findings First Author
  2. ACL 2026 Main First Author
  3. EMNLP 2024 Findings First Author
  4. EMNLP 2024 Findings First Author
  5. Under Review First Author
  6. Under Review First Author
  7. EMNLP 2024 Main Third Author
  8. NeurIPS 2025 Main Third Author

Research Experience

Agentic VideoLLMs for Long Video Understanding

Developed VideoThinker, a VideoLLM framework that turns long-video understanding into an agentic retrieval-and-zoom reasoning problem.

  • Introduced a unified retrieval-and-zoom mechanism for temporal localization and fine-grained evidence inspection.
  • Built synthetic tool-use supervision for VideoLLMs, improving accuracy on MLVU and LVBench by 6.8%–10.6% (the agentic loop is sketched below).
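
Below is a minimal sketch of the retrieval-and-zoom loop, assuming hypothetical interfaces (model.plan, tools.retrieve_segments, tools.zoom_frames); the names are illustrative, not the released VideoThinker API.

    from typing import NamedTuple

    class Action(NamedTuple):
        kind: str           # "retrieve", "zoom", or "answer"
        query: str = ""     # retrieval query (for "retrieve")
        start: float = 0.0  # segment bounds in seconds (for "zoom")
        end: float = 0.0
        text: str = ""      # final answer (for "answer")

    def answer_long_video(video, question, model, tools, max_steps=5):
        evidence = []  # frames and captions gathered so far
        for _ in range(max_steps):
            act = model.plan(question, evidence)  # model picks the next tool call
            if act.kind == "retrieve":
                # Coarse temporal localization over the whole video.
                evidence += tools.retrieve_segments(video, act.query, top_k=3)
            elif act.kind == "zoom":
                # Fine-grained inspection: densely re-sample one segment.
                evidence += tools.zoom_frames(video, act.start, act.end, fps=4)
            else:
                return act.text  # enough evidence collected
        return model.answer(question, evidence)  # budget exhausted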

Adaptive Program Reasoning for Long Video Understanding

Studied how VideoLLMs can combine direct multimodal reasoning with program-based tool use for complex video queries.

  • Designed an adaptive routing strategy that selects between fast VideoLLM reasoning and slow executable workflows according to query difficulty.
  • Built a code-based workflow planning pipeline that unifies model inference with external tool execution; the routing logic is sketched below.
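
A sketch of the fast/slow routing idea under stated assumptions: estimate_difficulty stands in for any lightweight difficulty scorer, and the 0.5 threshold is illustrative.

    def route_query(video, query, videollm, planner, executor, threshold=0.5):
        # Estimate difficulty in [0, 1]; estimate_difficulty is a hypothetical
        # stand-in for any lightweight difficulty classifier.
        difficulty = videollm.estimate_difficulty(query)
        if difficulty < threshold:
            # Fast path: a single direct multimodal forward pass.
            return videollm.answer(video, query)
        # Slow path: plan an executable workflow, run its tool calls,
        # then answer conditioned on the collected results.
        program = planner.write_program(query)
        results = executor.run(program, video)
        return videollm.answer(video, query, context=results)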

Benchmarking Advanced Multimodal Video Cognition

Built a controllable benchmark for evaluating symbolic, abstract, and high-level cognitive reasoning in video understanding.

  • Constructed an automated pipeline for scalable benchmark generation and task synthesis.
  • Covered object tracking, action perception, spatio-temporal reasoning, and cross-modal understanding with controllable difficulty (see the synthesis sketch below).
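
A sketch of how a difficulty config can drive task synthesis; the config fields and question template below are assumptions, not the benchmark's actual generator.

    import random
    from dataclasses import dataclass

    @dataclass
    class DifficultyConfig:
        num_objects: int = 3  # distractor objects placed in the scene
        num_events: int = 2   # temporal events the question chains over

    def synthesize_task(objects, events, cfg, seed=0):
        rng = random.Random(seed)
        objs = rng.sample(objects, cfg.num_objects)
        evts = rng.sample(events, cfg.num_events)
        # Difficulty scales with how many entities and events the template uses.
        question = (f"After {' and then '.join(evts)}, "
                    f"which of {', '.join(objs)} changed state?")
        # Ground truth is known by construction during generation
        # (placeholder choice here).
        return {"question": question, "answer": objs[0]}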

Instruction Evolution with Monte Carlo Tree Search

Studied instruction data synthesis with tree search to improve data quality for low-resource alignment.

  • Applied Monte Carlo Tree Search to explore and evaluate prompt rewriting actions.
  • Generated higher-quality synthetic instructions and obtained consistent gains on OpenLLM, AlpacaEval, and Wizard-eval (the search loop is sketched below).
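
A compact UCT-style sketch of the search: nodes hold instruction texts, actions are rewriting operators, and a quality scorer supplies the reward. The operators and scorer are assumed interfaces, not the paper's exact components.

    import math
    import random

    class Node:
        def __init__(self, text, parent=None):
            self.text, self.parent = text, parent
            self.children, self.visits, self.value = [], 0, 0.0

    def uct(node, c=1.4):  # standard UCB1 selection score
        return (node.value / node.visits
                + c * math.sqrt(math.log(node.parent.visits) / node.visits))

    def evolve(seed_text, rewrite_ops, score, iters=100):
        root = Node(seed_text)
        for _ in range(iters):
            node = root
            # Selection: descend while every child has been visited.
            while node.children and all(ch.visits for ch in node.children):
                node = max(node.children, key=uct)
            # Expansion: apply each rewriting operator once.
            if not node.children:
                node.children = [Node(op(node.text), node) for op in rewrite_ops]
            fresh = [ch for ch in node.children if ch.visits == 0]
            child = random.choice(fresh or node.children)
            # Simulation: score the rewritten instruction (reward in [0, 1]).
            reward = score(child.text)
            # Backpropagation.
            while child:
                child.visits += 1
                child.value += reward
                child = child.parent
        best = max(root.children, key=lambda ch: ch.value / max(ch.visits, 1))
        return best.text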

Distilling Reasoning Ability into Small Language Models

Proposed a mixed distillation framework for transferring reasoning supervision from strong LLMs to smaller, deployable language models.

  • Used Program-of-Thought (PoT) and Chain-of-Thought (CoT) signals as complementary supervision for numerical reasoning tasks.
  • Combined filtered reasoning traces with mixed-task distillation, lifting LLaMA-7B above GPT-3.5-turbo on some benchmarks (data construction sketched below).
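
A sketch of the data-construction step under stated assumptions: teacher_cot and teacher_pot stand in for calls to a strong teacher model, and traces are kept only when their final answer verifies.

    def run_pot(code):
        """Execute a PoT program and return its `answer` variable, or None."""
        scope = {}
        try:
            exec(code, {}, scope)  # teacher-generated code only; sandbox in practice
            return scope.get("answer")
        except Exception:
            return None

    def build_mixed_set(problems, teacher_cot, teacher_pot, parse_final):
        data = []
        for p in problems:
            cot = teacher_cot(p["question"])  # natural-language reasoning trace
            if parse_final(cot) == p["answer"]:  # keep only verified CoT
                data.append({"input": p["question"], "target": cot, "type": "cot"})
            pot = teacher_pot(p["question"])  # executable reasoning trace
            if run_pot(pot) == p["answer"]:  # keep only PoT that runs correctly
                data.append({"input": p["question"], "target": pot, "type": "pot"})
        return data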

Industry Experience

JD Explore Research

Worked on the full research pipeline for multimodal foundation models, including data, training, and evaluation.

ByteDance CapCut

Worked on video understanding systems for user intent modeling in video creation scenarios.

  • Built an end-to-end multimodal prompting pipeline over video, image, and text inputs.
  • Reduced hallucinations with decoding strategies and improved efficiency through visual token pruning (a pruning sketch follows).
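
A sketch of attention-based visual token pruning, one common way to realize the efficiency gain mentioned above; the 30% keep-ratio is an illustrative choice, not the production setting.

    import numpy as np

    def prune_visual_tokens(visual_tokens, attn_to_visual, keep_ratio=0.3):
        """visual_tokens: (n_vis, d); attn_to_visual: (n_txt, n_vis)."""
        # Score each visual token by the attention mass it receives from text.
        scores = attn_to_visual.sum(axis=0)      # shape (n_vis,)
        k = max(1, int(len(scores) * keep_ratio))
        keep = np.sort(np.argsort(scores)[-k:])  # top-k, kept in original order
        return visual_tokens[keep], keep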

Alibaba Quark

Worked on experience-aware content modeling for search scenarios requiring subjective or experiential evidence.

  • Led the project from investigation and sample mining to model training, achieving 75% accuracy with a BERT-based model.

Alibaba AiCheng

Worked on post-training and evaluation for enterprise dialogue models, with a focus on data quality and efficient deployment.

  • Improved data quality with reward modeling and diversity control, enabling Qwen-14B to match Qwen-72B on internal evaluations (curation sketch below).
  • Built subjective and objective evaluation sets for enterprise dialogue tasks and supported efficient deployment.
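
A sketch of the curation idea behind the first bullet, assuming a reward model and an embedding function as interfaces; both thresholds are illustrative.

    import numpy as np

    def curate(samples, reward_model, embed, min_reward=0.7, max_sim=0.9):
        kept, kept_embs = [], []
        for s in samples:
            # Quality filter: drop low-reward responses.
            if reward_model.score(s["prompt"], s["response"]) < min_reward:
                continue
            e = embed(s["response"])
            e = e / np.linalg.norm(e)
            # Diversity control: drop near-duplicates of anything already kept.
            if any(float(e @ k) > max_sim for k in kept_embs):
                continue
            kept.append(s)
            kept_embs.append(e)
        return kept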