Junke Wang @ FDU

Junke Wang 「王君可」

I'm a final-year Ph.D. student at Fudan University, supervised by Prof. Zuxuan Wu and Prof. Yu-Gang Jiang. I have interned at frontier AI labs, including ByteDance Seed and Meta FAIR.

My research interests lie in multimodal AI and embodied AI. Recently, I have been working on visual tokenizers, generative models, and world action models. Feel free to reach out if you are interested in working with me.

Email: jkwang0724 [at] gmail [dot] com

Google Scholar [Full publications] / GitHub

Technical Reports

* denotes equal contribution.

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers. [Code] [Page]
Junke Wang, Qihang Zhang, Shuai Yang, Yiming Luo, Yujun Shen, Zuxuan Wu, Yu-Gang Jiang, Yinghao Xu.

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations. [Page]
Junke Wang*, Xiao Wang*, Jiacheng Pan*, Xuefeng Hu*, Feng Li, et al.

SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL. [Code]
Junke Wang, Zhi Tian, Xun Wang, Xinyu Zhang, Weilin Huang, Zuxuan Wu, Yu-Gang Jiang.

Perception Encoder: The best visual embeddings are not at the output of the network. [Code]
Core contributor, FAIR perception.

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning. [Code]
Junke Wang*, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang.

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System. [Code]
Junke Wang, Dongdong Chen, Yiweng Xie, Chong Luo, Xiyang Dai, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang.

Publications

† denotes project lead.

DeRA: Decoupled Representation Alignment for Video Tokenization.
Pengbo Guo, Junke Wang†, Zhen Xing, Chengxu Liu, Daoguo Dong, Xueming Qian, Zuxuan Wu.
European Conference on Computer Vision (ECCV), 2026.

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding. [Code]
Yiweng Xie, Bo He, Junke Wang†, Xiangyu Zheng, Ziyi Ye, Zuxuan Wu.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2026.

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding. [Code]
Jiapeng Shi, Junke Wang†, Zuyao You, Bo He, Zuxuan Wu.
International Conference on Machine Learning (ICML), 2026.

OmniGen-AR: AutoRegressive Any-to-Image Generation. [Code]
Junke Wang, Xun Wang, Qiushan Guo, Peize Sun, Weilin Huang, Zuxuan Wu, Yu-Gang Jiang.
Advances in Neural Information Processing Systems (NeurIPS), 2025.

OmniTracker: Unifying Object Tracking by Tracking-with-Detection.
Junke Wang*, Zuxuan Wu*, Dongdong Chen, Chong Luo, Xiyang Dai, Lu Yuan, Yu-Gang Jiang.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

Fighting Malicious Media Data: A Survey on Tampering Detection and Deepfake Detection.
Junke Wang, Zhenxin Li, Chao Zhang, Jingjing Chen, Zuxuan Wu, Larry S. Davis, Yu-Gang Jiang.
Proceedings of the IEEE, 2025.

OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation. [Code]
Junke Wang, Yi Jiang, Zehuan Yuan, Binyue Peng, Zuxuan Wu, Yu-Gang Jiang.
Advances in Neural Information Processing Systems (NeurIPS), 2024.

OmniVid: A Generative Framework for Universal Video Understanding. [Code]
Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

Look Before You Match: Instance Understanding Matters in Video Object Segmentation.
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks.
Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan.
Advances in Neural Information Processing Systems (NeurIPS), 2022.

Efficient Video Transformers with Spatial-Temporal Token Selection. [Code]
Junke Wang*, Xitong Yang*, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang.
European Conference on Computer Vision (ECCV), 2022.

ObjectFormer for Image Manipulation Detection and Localization. [Code]
Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Yu-Gang Jiang, Ser-Nam Lim.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

FT-TDR: Frequency-Guided Transformer and Top-Down Refinement Network for Blind Face Inpainting.
Junke Wang, Shaoxiang Chen, Zuxuan Wu, Yu-Gang Jiang.
IEEE Transactions on Multimedia (TMM), 2022.

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection. [Code]
Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang.
International Conference on Multimedia Retrieval (ICMR), 2022.

Services

Conference Reviewer for CVPR, ICCV, ICML, NeurIPS, ICLR, ECCV, CoRL, etc.

Journal Reviewer for TPAMI, TIP, etc.

Awards

ByteDance Fellowship (20 people in China and Singapore). 2025.

CCF-CV Academic Rising Star Award (3 people in China). 2025.

National Scholarship (Top 1%). 2022, 2025.