whwu95/README.md

Hi, I'm Wenhao Wu 👋

Wenhao Wu 知乎 github LinkedIn Google Scholar X

I am Wenhao Wu (吴文灏🇨🇳), a Ph.D. student in the School of Computer Science at The University of Sydney, supervised by Prof. Wanli Ouyang. I collaborate closely with the Department of Computer Vision Technology (VIS) at Baidu, led by Dr. Jingdong Wang (IEEE Fellow). I received my M.S.E. degree from the Multimedia Laboratory (MMLab@SIAT), University of Chinese Academy of Sciences, supervised by Prof. Shifeng Chen and Prof. Yu Qiao. I was also fortunate to work as an intern or research assistant at MMLab@CUHK, Baidu, iQIYI, SenseTime, Samsung Research, and the Chinese Academy of Sciences. I am honored to have been awarded the 11th Baidu PhD Fellowship (2023).

My current research interests include Cross-Modal Learning and Video Understanding. I have published 30+ papers at top international CV/AI conferences and journals, such as CVPR/ICCV/ECCV/NeurIPS/AAAI/IJCAI/ACMMM/TPAMI/IJCV.


🔭 Research Interest

My research interests broadly lie in the areas of Computer Vision and Deep Learning, including:

  • Cross-Modal Learning (2022-Present): Video-Language Matching, Multimodal Large Language Model (MLLM)
  • Video Foundation Model (2017-Present): Video Recognition, Efficient Video Tuning
  • Video-related Applications (2017-2022): Video Sampler, Temporal Action Detection, Anomaly Detection in Video
  • Self-supervised Learning (2021-2022): Contrastive Video Learning, Masked Video Modeling
  • Low-level Vision (2021-2022): Image Colorization, Style Transfer, Image Rescaling

🔥 News

  • 2024.09: [2/2] Dense Connector and AMP are accepted by NeurIPS 2024! Dense Connector was cited by Apple MM1.5.
  • 2024.05: The extension of Cap4Video has been accepted by TPAMI.
  • 2024.01: I am honored to receive the 11th🎖Baidu Scholarship🎖, a prestigious fellowship that awards 200,000 RMB (about $30,000) to each of 10 PhD students worldwide in Artificial Intelligence, selected from thousands of applicants.
  • 2023.11: We release GPT4Vis, which provides a Quantitative Evaluation of GPT-4 for Visual Understanding across images, videos and point clouds, spanning 16 popular datasets.
  • 2023.11: We release Side4Video, a Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning, which significantly reduces the training memory cost for action recognition (↓75%) and text-video retrieval (↓30%).
  • 2023.08: The extension of Text4Vis has been accepted by IJCV.
  • 2023.07: Two first-author papers (Temporal Modeling: ATM, Cross-Modal Retrieval: UA) are accepted by ICCV 2023.
  • 2023.02: Two first-author papers for video understanding (BIKE, Cap4Video) are accepted by CVPR 2023. Cap4Video, which uses GPT to enhance text-video learning, is selected as a 🎉Highlight paper🎉 (Top 2.5%).
  • 2022.11: Two papers (Video Recognition: Text4Vis, Style Transfer: AdaCM) are accepted by AAAI 2023.
  • 2022.07: Three papers (Video Sampling: NSNet, TSQNet; Cross-Modal Learning: CODER) are accepted by ECCV 2022.
  • 2022.06: Our MaMiCo, a new video self-supervised learning work, is accepted by ACMMM 2022 (🎉Oral Presentation🎉).

Pinned

  1. MVFNet (Public)

    【AAAI'2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition

    Python 143 12

  2. Text4Vis (Public)

    【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

    Python 205 15

  3. Cap4Video (Public)

    【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

    Python 249 20

  4. ATM (Public)

    【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?

    Python 76 5

  5. GPT4Vis (Public)

    GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

    Python 209 25

  6. HJYao00/DenseConnector (Public)

    【NeurIPS 2024】Dense Connector for MLLMs

    Python 152 6