Junke Wang @ FDU

Junke Wang 「王君可」

I'm a fourth-year Ph.D. student in the School of Computer Science at Fudan University, supervised by Prof. Zuxuan Wu and Prof. Yu-Gang Jiang. I was very fortunate to be mentored by Dongdong Chen and Yi Jiang.

My research interests lie in computer vision and deep learning, with an emphasis on multimodal understanding and generation. I have developed the Omni series of models, including OmniTokenizer (one codebook for joint image-video tokenization), OmniVid (a generative framework for general video understanding), OmniTracker (a unified tracking model), and OmniVL (an image-video-language foundation model).

I'm now working on training world models, which can simulate real-world environments and interact with embodied agents.

Email: wangjk21 [at] m.fudan.edu.cn

Google Scholar / GitHub

(* denotes equal contribution)

Publications

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation. [Code]
Junke Wang, Yi Jiang, Zehuan Yuan, Binyue Peng, Zuxuan Wu, Yu-Gang Jiang.
NeurIPS, 2024.

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang.
Proceedings of the IEEE, 2025.

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang.
TPAMI, 2025.

OmniVid: A Generative Framework for Universal Video Understanding. [Code]
Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang.
CVPR, 2024.

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang.
CVPR, 2023.

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan.
NeurIPS, 2022.

Efficient Video Transformers with Spatial-Temporal Token Selection. [Code]
Junke Wang*, Xitong Yang*, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang.
ECCV, 2022.

M2TR: Multi-modal Multi-scale Transformer for Deepfake Detection. [Code]
Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang.
ICMR, 2022.

ObjectFormer for Image Manipulation Detection and Localization. [Code]
Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Yu-Gang Jiang, Ser-Nam Lim.
CVPR, 2022.

FT-TDR: Frequency-guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.
Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang.
TMM, 2022.

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition.
Yuqian Fu, Li Zhang, Junke Wang, Yanwei Fu, Yu-Gang Jiang.
ACM MM, 2020.

Preprints

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL. [Code]
Junke Wang, Zhi Tian, Xun Wang, Xinyu Zhang, Weilin Huang, Zuxuan Wu, Yu-Gang Jiang.
arXiv, 2025.

Perception Encoder: The best visual embeddings are not at the output of the network. [Code]
PE Team from FAIR, Meta.
arXiv, 2025.

Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning.
Zuyao You*, Junke Wang*, Lingyu Kong, Bo He, Zuxuan Wu.
arXiv, 2025.

Projects

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning. [Dataset] [Project page]
Junke Wang*, Lingchen Meng*, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang.

We introduce a fine-grained visual instruction dataset, LVIS-INSTRUCT4V, which contains 220K visually aligned and context-aware instructions produced by prompting the powerful GPT-4V with images from LVIS.

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System. [Code]
Junke Wang, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang.

We present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo.

Academic Services

Conference Reviewer for CVPR, ICCV, ICML, NeurIPS, ICLR, ECCV, etc.

Journal Reviewer for TPAMI, TIP, IJCV, etc.

Awards

Academic Star of Fudan University (10 PhD students). 2025.

Fundamental Research Program for PhD students, sponsored by NSFC. 2024.

Young Elite Scientists Sponsorship Program for PhD students, sponsored by CAAI. 2024.

Intel Fellowship. 2023.

National Scholarship (Top 1%). 2022.

Outstanding Graduate of Shanghai (undergraduate). 2021.

First-class Scholarship (Top 5%). 2019, 2021.

Uniqlo Scholarship (33 undergrads from China). 2019.

Updated in May 2025.
