Ph.D. Candidate · Computer Vision · Multimodal AI

Hao Wang (王豪)

I'm currently a Ph.D. candidate at HCP Lab, SYSU, and Pengcheng Laboratory, supervised by Prof. Xiaodan Liang and Associate Prof. Xiangyuan Lan. Before that, I received my Master's degree from CASIA, supervised by Prof. Jing Liu, and my Bachelor's degree from BJTU.

Research interests

Open-ended computer vision · Multi-modal large language models · Multi-modal agentic models

Open source promotes the development of technology.
I'm currently looking for collaborations; feel free to contact me via email or WeChat.

News

2026.04 Excited to share that our latest project, X2SAM, has launched!
2025.11 Happy to announce that our paper X-SAM has been accepted to AAAI 2026.

Publications

First-author Publications

X2SAM: Any Segmentation in Images and Videos (New)
Hao Wang, Limeng Qiao, Chi Zhang, Guanglu Wan, Lin Ma, Xiangyuan Lan, Xiaodan Liang
arXiv Preprint, 2026
Project · Paper · Code

A unified segmentation MLLM that extends any-segmentation from images to videos, supporting conversational instructions and visual prompts through Mask Memory for temporally consistent pixel-level perception.

X-SAM: From Segment Anything to Any Segmentation (AAAI 2026)
Hao Wang, Limeng Qiao, Zequn Jie, Zhijian Huang, Chengjian Feng, Qingfang Zheng, Lin Ma, Xiangyuan Lan, Xiaodan Liang
AAAI, 2026
Project · Paper · Code

A unified multimodal large language model (MLLM) framework that extends segmentation from "segment anything" to "any segmentation", enhancing pixel-level perceptual understanding.

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang
arXiv Preprint, 2024
Project · Paper · Code

A unified open-vocabulary detection method, pre-trained on diverse large-scale datasets with language-aware selective fusion in a single framework.

TMANet: Temporal Memory Attention for Video Semantic Segmentation (ICIP 2021)
Hao Wang, Weining Wang, Jing Liu
ICIP, 2021
Paper · Code

A self-attention and temporal memory mechanism that captures long-range temporal relations between frames, avoiding the computational cost of optical flow prediction.

Co-author Publications

WL-MSR: Watch and Listen for Multimodal Subtitle Recognition (ICASSP 2023)
Jiawei Liu, Hao Wang, Weining Wang, Xingjian He, Jing Liu
ICASSP, 2023
Paper

A framework that fuses OCR and ASR information using a Transformer model with mask/crop strategies and multi-level identity embeddings to generate comprehensive video subtitles.

Experience

2025.01 - Present
Research intern at Meituan M17-MM, working with Limeng Qiao, Lin Ma, and Guanglu Wan.
2022.07 - 2025.01
Research intern at the Meituan Vision Intelligence Department, working with Zequn Jie and Lin Ma.
2021.05 - 2021.08
Application research intern at the Tencent AI Platform Department.
2019.09 - 2020.07
Application project intern at the Huawei Photo Processing Department.

Education

2022.09 - Present
Ph.D. student in the School of Intelligent Systems Engineering, SYSU, and Pengcheng Laboratory, co-supervised by Prof. Xiaodan Liang and Associate Prof. Xiangyuan Lan.
2019.09 - 2022.06
Master's student in the School of Artificial Intelligence, UCAS, and the Institute of Automation, CAS, supervised by Prof. Jing Liu.
2015.09 - 2019.06
Bachelor's student in the School of Electronic and Information Engineering, BJTU.

Services

Conference Reviewer: AAAI 2026, ICCV 2023, ECCV 2024
Journal Reviewer: Proceedings of the IEEE

Awards

2021.09 1st place in the 1st VSPW Challenge Workshop, ICCV 2021.
