Computer Vision · Multimodal AI · Agentic AI

Hao Wang (王豪)

I'm currently a Ph.D. candidate at HCP Lab, SYSU, and Pengcheng Laboratory, supervised by Prof. Xiaodan Liang and Associate Prof. Xiangyuan Lan. Before that, I received my Master's degree from CASIA, UCAS, supervised by Prof. Jing Liu, and my Bachelor's degree from BJTU.

Research interests

Open-ended computer vision
Multi-modal large language models
Multi-modal agentic models

Open source promotes the development of technology.
I will graduate in December 2026 and am actively seeking research positions in industry. I am also open to collaborating on innovative projects. If you have a suitable opportunity or are interested in working together, please feel free to contact me via email or WeChat.

News

2026.04 Excited to share that our latest project X2SAM is officially released!
2025.11 Happy to announce that our paper X-SAM has been accepted by AAAI 2026!
2025.08 Excited to share that our project X-SAM is officially released!
2024.07 Excited to share that our project OV-DINO is officially released!

Publications

First-author Publications

2026
X2SAM: Any Segmentation in Images and Videos
Hao Wang, Limeng Qiao, Chi Zhang, Guanglu Wan, Lin Ma, Xiangyuan Lan, Xiaodan Liang
arXiv Preprint, 2026
Project Paper Code

A novel unified segmentation MLLM that extends any-segmentation from images to videos, supporting conversational instructions and visual prompts through Mask Memory for temporally consistent pixel-level perception.

2025
X-SAM: From Segment Anything to Any Segmentation
Hao Wang, Limeng Qiao, Zequn Jie, Zhijian Huang, Chengjian Feng, Qingfang Zheng, Lin Ma, Xiangyuan Lan, Xiaodan Liang
AAAI, 2026
Project Paper Code

A novel unified multimodal large language model (MLLM) framework that extends the segmentation paradigm from "segment anything" to "any segmentation," enhancing pixel-level perceptual understanding.

2024
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang
arXiv Preprint, 2024
Project Paper Code

A novel open-vocabulary detection method that is pre-trained on diverse large-scale datasets within a unified framework, using language-aware selective fusion.

2021
TMANet: Temporal Memory Attention for Video Semantic Segmentation
Hao Wang, Weining Wang, Jing Liu
ICIP, 2021
Paper Code

A novel self-attention and temporal memory mechanism that captures long-range temporal relations between frames, avoiding the computational cost of optical flow prediction.

Co-author Publications

2023
WL-MSR: Watch and Listen for Multimodal Subtitle Recognition
Jiawei Liu, Hao Wang, Weining Wang, Xingjian He, Jing Liu
ICASSP, 2023
Paper

A framework that fuses OCR and ASR information using a Transformer model with mask/crop strategies and multi-level identity embeddings to generate comprehensive video subtitles.

Experience

2025.01 - Present
Research intern at Meituan M17-MM, working with Limeng Qiao, Lin Ma, and Guanglu Wan.
2022.07 - 2025.01
Research intern at the Meituan Vision Intelligence Department, working with Zequn Jie and Lin Ma.
2021.05 - 2021.08
Application research intern at the Tencent AI Platform Department.
2019.09 - 2020.07
Application project intern at the Huawei Photo Processing Department.

Education

2022.09 - Present
Ph.D. student in the School of Intelligent Systems Engineering, SYSU, and Pengcheng Laboratory, co-supervised by Prof. Xiaodan Liang and Associate Prof. Xiangyuan Lan.
2019.09 - 2022.06
Master's student in the School of Artificial Intelligence, UCAS, and the Institute of Automation, CAS, supervised by Prof. Jing Liu.
2015.09 - 2019.06
Bachelor's student in the School of Electronic and Information Engineering, BJTU.

Services

Conference Reviewer: AAAI 2026, ICCV 2023, ECCV 2024
Journal Reviewer: Proceedings of the IEEE

Awards

2021.09 1st place in the 1st VSPW Challenge Workshop, ICCV 2021.
