Xiuye's Home

XIUYE GU

Hi, I'm a researcher at Google Research. Before that, I was a Google AI Resident, where I was lucky to be advised by Yin Cui and Tsung-Yi Lin and worked on open-vocabulary visual recognition. Previously, I obtained my M.S. in Computer Science from Stanford University and my B.E. in Computer Science from Zhejiang University. I spent half a year (9/2018-3/2019) and summer 2016 working happily with Prof. Yong Jae Lee at UC Davis. In summer 2018, I interned at TuSimple and worked on 3D point cloud scene flow estimation with Dr. Panqu Wang and Yijie Wang. At Zhejiang University, I was advised by Prof. Deng Cai and worked on content-based image retrieval and computer vision.

Currently, I'm interested in video generation and open-vocabulary visual recognition.


Publications

Dan Kondratyuk*, Lijun Yu*, Xiuye Gu*, José Lezama*, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang.
VideoPoet: A large language model for zero-shot video generation.
arXiv:2312.14125, 2023.


[paper]   [website with demos]   [blog]

Shuyang Sun, Runjia Li, Philip Torr, Xiuye Gu*, Siyang Li*.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor.
arXiv:2312.07661, 2023.


[paper]   [website]

Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama.
Photorealistic video generation with diffusion models (W.A.L.T).
arXiv:2312.06662, 2023.


[paper]   [website]   [video demo]   [more samples]

Lijun Yu, José Lezama, Nitesh B Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A Ross, Lu Jiang.
Language Model Beats Diffusion--Tokenizer is Key to Visual Generation.
ICLR, 2024.


[paper]   [website]

Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross.
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model.
NeurIPS, 2023.


[paper]   [Objects365 instance segmentation dataset]   [poster]

Xuan Yang, Liangzhe Yuan, Kimberly Wilber, Astuti Sharma, Xiuye Gu, Siyuan Qiao, Stephanie Debats, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Liang-Chieh Chen.
PolyMaX: General Dense Prediction with Mask Transformer.
WACV, 2024.


[paper]   [supp]

James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Jeremiah Zhe Liu, Xiuye Gu, Yin Cui, Dustin Tran, Balaji Lakshminarayanan.
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models.
ICML, 2023.


[paper]

Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova.
F-VLM: Open-vocabulary object detection upon frozen vision and language models.
ICLR, 2023.


[paper]   [code]   [website]   [blog]

Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin.
Scaling open-vocabulary image segmentation with image-level labels (OpenSeg).
ECCV, 2022.


[paper]   [code]   [colab demo]   [poster]

Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui.
Open-vocabulary object detection via vision and language knowledge distillation (ViLD).
ICLR, 2022.


[paper]   [code]   [colab demo]

Xiuye Gu, Weixin Luo, Michael S. Ryoo, Yong Jae Lee.
Password-conditioned Anonymization and Deanonymization with Face Identity Transformers.
ECCV, 2020.


[paper]   [code]   [demo video]
[1-min presentation]   [10-min presentation]

Xiuye Gu, Yijie Wang, Chongruo Wu, Panqu Wang, Yong Jae Lee.
HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-scale Point Clouds.
CVPR, 2019.


[paper]   [supp]   [code]   [poster]   [video]

Maheen Rashid, Xiuye Gu, Yong Jae Lee.
Interspecies Knowledge Transfer for Facial Keypoint Detection.
CVPR, 2017.


[paper]   [code]   [annotation tool]

Deng Cai, Xiuye Gu, Chaoqi Wang.
A Revisit on Deep Hashing for Large-scale Content Based Image Retrieval.
arXiv:1711.06016, 2017.


[paper]

Community Services


Projects

Explore Deep Graph Generation


Course project for CS224W: Machine Learning with Graphs.
Explored deep graph generation in two directions:
1) using CNN GANs to model the whole adjacency matrix directly after sorting the nodes;
2) building upon the very recent Graph Recurrent Attention Networks (GRANs) by proposing a graph completeness judger network and improving its attention mechanism.

[poster]   [report]   [code]

Simulating and Rendering Explosion


Course project for CS348B: Image Synthesis Techniques.
Extended PBRT to support emissive volumes and the OpenVDB input format.
Self-studied and implemented blackbody radiation, closed-form and delta tracking.
Simulated an explosion and flying rubble using Blender.

[report]   [video]   [code]

License Plate Detection and Character Segmentation


A robust iterative license plate character segmentation algorithm, and a license plate detection system with robust skew and slant correction to improve character segmentation.

[character segmentation report]   [detection report]
[detection survey]   [segmentation survey]   [recognition survey]
[code]   (reports and surveys are in Chinese)

Deep Stereo Matching


Course project for CS231A: Computer Vision, From 3D Reconstruction to Recognition.

[code]   [report]   [supp]