---
layout: homepage
---
I am a Ph.D. student at Xidian University, advised by Prof. Bo Chen. My long-term research goal is to build an explainable multi-modal cognition system in which machines can reason, make logical decisions, and generate content as humans do.

My current research focuses on multi-modal understanding and generation, including:
- Image-to-text understanding: vision-language models (VLMs), visual captioning, visual question answering (VQA), and prompt learning for vision-language models
- Knowledge-aware machine learning: retrieval-augmented generation and knowledge enhancement
- Remote sensing foundation models: vision foundation models for multi-modal remote sensing tasks
- Cross-modal image synthesis: SAR-to-optical image generation

Recent news:

- [Jul. 2024] Our paper on image captioning evaluation has been accepted to ACM MM 2024!
- [Mar. 2024] Our paper on memory-augmented image captioning has been accepted to CVPR 2024!
- [Jul. 2023] Our paper on multi-label image classification has been accepted to ICCV 2023!
- [Mar. 2023] Our paper on zero-shot image captioning has been accepted to CVPR 2023!
- [Feb. 2022] Our paper on image paragraphing has been accepted to IJCV!
{% include_relative _includes/publications.md %}
{% include_relative _includes/services.md %}