My research strengthens the generalization and safety of generative AI, spanning vision models, LLMs, and VLMs. As steps toward this goal, I work on:
- Generalizable multimodal representation learning: foundation models for table recognition (UniTable, Table Transformer, Self-supervised Pretraining), RGB-infrared fusion object tracking (DsiamMFT, SiamFT), structural health monitoring (system identification).
- Safe and robust machine learning models: LLM loss landscape (coming soon!), robust CNN design principles (#1 on RobustBench CIFAR-10), multi-task person tracking (SkeleVision), and defending against LLM attacks (LLM Self Defense).
- Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models, preprint - [paper] [code coming soon]
- UniTable: Towards a Unified Framework for Table Structure Recognition via Self-Supervised Pretraining, preprint - [paper] [code]
- Self-Supervised Pre-Training for Table Structure Recognition Transformer, AAAI'24 Workshop Oral - [paper] [code]
- High-Performance Transformers for Table Structure Recognition Need Early Convolutions, NeurIPS'23 Workshop Oral - [paper] [code]
- Robust Principles: Architectural Design Principles for Adversarially Robust CNNs, BMVC'23 Best Poster Award - [paper] [code]
- SkeleVision: Towards Adversarial Resiliency of Person Tracking with Multi-Task Learning, ECCV'22 Workshop - [paper] [code]