CVPR2023 Top Open Papers

Best Papers

Type	Title	Homepage	Code
Best Paper	Planning-oriented Autonomous Driving	Link	Github
Best Paper	Visual Programming: Compositional visual reasoning without training	Link	Github
Best Paper Honorable Mention	DynIBaR: Neural Dynamic Image-Based Rendering	Link	Github
Best Student Paper	3D Registration with Maximal Cliques	Link	Github
Best Student Paper Honorable Mention	DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation	Link	Github

Top CVPR2023 Papers with Code

The CVPR2023 paper information below is extracted from the following web pages and saved in the papers_info.json file.

https://openaccess.thecvf.com/CVPR2023?day=all
https://cvpr2023.thecvf.com/Conferences/2023/AcceptedPapers

If you find any errors in the paper information or missing GitHub links, you are welcome to edit the corresponding entries in the papers_info_refined.json file and submit a Pull Request.
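
Before opening a Pull Request, it may help to list which entries still lack a code link. The sketch below is only an illustration: it assumes papers_info_refined.json holds a list of records, and the field names `title` and `github` are hypothetical, so check the actual file and adjust the two constants first.

```python
import json
from pathlib import Path

# ASSUMPTION: these field names are illustrative only; the real schema of
# papers_info_refined.json may differ, so adjust them after inspecting the file.
TITLE_KEY = "title"
GITHUB_KEY = "github"


def load_entries(path):
    """Load a JSON file that is assumed to hold a list of paper records."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of paper records")
    return data


def titles_missing_github(entries):
    """Return the titles of records whose GitHub field is absent or blank."""
    missing = []
    for entry in entries:
        link = entry.get(GITHUB_KEY)
        # Treat a missing key, a non-string value, or whitespace as "no link".
        if not isinstance(link, str) or not link.strip():
            missing.append(entry.get(TITLE_KEY, "<untitled>"))
    return missing
```

Under those assumptions, `titles_missing_github(load_entries("papers_info_refined.json"))` would return the titles that still need a link.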

Title	Paper	Code
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors	Link	Github
From Images to Textual Prompts: Zero-Shot Visual Question Answering With Frozen Large Language Models	Link	Github
Co-Training 2^L Submodels for Visual Recognition	Link	Github
Token Turing Machines	Link	Github
How Can Objects Help Action Recognition?	Link	Github
GINA-3D: Learning To Generate Implicit Neural Assets in the Wild	Link	Github
Images Speak in Images: A Generalist Painter for In-Context Visual Learning	Link	Github
Planning-Oriented Autonomous Driving	Link	Github
Beyond Appearance: A Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks	Link	Github
InternImage: Exploring Large-Scale Vision Foundation Models With Deformable Convolutions	Link	Github
DepGraph: Towards Any Structural Pruning	Link	Github
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale	Link	Github
Universal Instance Perception As Object Discovery and Retrieval	Link	Github
PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°	Link	Github
EfficientViT: Memory Efficient Vision Transformer With Cascaded Group Attention	Link	Github
Unifying Vision, Text, and Layout for Universal Document Processing	Link	Github
ConvNeXt V2: Co-Designing and Scaling ConvNets With Masked Autoencoders	Link	Github
FlexiViT: One Model for All Patch Sizes	Link	Github
CLIPPO: Image-and-Language Understanding From Pixels Only	Link	Github
Neighborhood Attention Transformer	Link	Github
SeqTrack: Sequence to Sequence Learning for Visual Object Tracking	Link	Github
Deep Learning of Partial Graph Matching via Differentiable Top-K	Link	Github
Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation	Link	Github
Paint by Example: Exemplar-Based Image Editing With Diffusion Models	Link	Github
Cut and Learn for Unsupervised Object Detection and Instance Segmentation	Link	Github
Masked Image Modeling With Local Multi-Scale Reconstruction	Link	Github
PAniC-3D: Stylized Single-View 3D Reconstruction From Portraits of Anime Characters	Link	Github
Learning To Generate Image Embeddings With User-Level Differential Privacy	Link	Github
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures	Link	Github
InstMove: Instance Motion for Object-Centric Video Segmentation	Link	Github
Activating More Pixels in Image Super-Resolution Transformer	Link	Github
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking	Link	Github
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking	Link	Github
OpenGait: Revisiting Gait Recognition Towards Better Practicality	Link	Github
Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks	Link	Github
All Are Worth Words: A ViT Backbone for Diffusion Models	Link	Github
Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion	Link	Github
MAGE: MAsked Generative Encoder To Unify Representation Learning and Image Synthesis	Link	Github
Mask-Free Video Instance Segmentation	Link	Github
Compressing Volumetric Radiance Fields to 1 MB	Link	Github
PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers	Link	Github
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network	Link	Github
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction	Link	Github
Detecting Everything in the Open World: Towards Universal Object Detection	Link	Github
Temporal Attention Unit: Towards Efficient Spatiotemporal Predictive Learning	Link	Github
Cross-Domain Image Captioning With Discriminative Finetuning	Link	Github
NeuralLift-360: Lifting an In-the-Wild 2D Photo to a 3D Object With 360° Views	Link	Github
Scaling Language-Image Pre-Training via Masking	Link	Github
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation	Link	Github
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation	Link	Github
MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors	Link	Github
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing	Link	Github
BiFormer: Vision Transformer With Bi-Level Routing Attention	Link	Github
All in One: Exploring Unified Video-Language Pre-Training	Link	Github
Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation	Link	Github
Wavelet Diffusion Models Are Fast and Scalable Image Generators	Link	Github
Efficient and Explicit Modelling of Image Hierarchies for Image Restoration	Link	Github
3D Registration With Maximal Cliques	Link	Github
Prompting Large Language Models With Answer Heuristics for Knowledge-Based Visual Question Answering	Link	Github
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks	Link	Github
DSVT: Dynamic Sparse Voxel Transformer With Rotated Sets	Link	Github
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points	Link	Github
EDICT: Exact Diffusion Inversion via Coupled Transformations	Link	Github
Disentangling Writer and Character Styles for Handwriting Generation	Link	Github
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation	Link	Github
Conditional Image-to-Video Generation With Latent Flow Diffusion Models	Link	Github
Inversion-Based Style Transfer With Diffusion Models	Link	Github
Recurrent Vision Transformers for Object Detection With Event Cameras	Link	Github
Dense Distinct Query for End-to-End Object Detection	Link	Github
Neural Video Compression With Diverse Contexts	Link	Github
Spherical Transformer for LiDAR-Based 3D Recognition	Link	Github
You Only Segment Once: Towards Real-Time Panoptic Segmentation	Link	Github
Referring Image Matting	Link	Github
VideoMAE V2: Scaling Video Masked Autoencoders With Dual Masking	Link	Github
Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation	Link	Github
NIKI: Neural Inverse Kinematics With Invertible Neural Networks for 3D Human Pose and Shape Estimation	Link	Github
High-Fidelity 3D GAN Inversion by Pseudo-Multi-View Optimization	Link	Github
GeoLayoutLM: Geometric Pre-Training for Visual Information Extraction	Link	Github
OTAvatar: One-Shot Talking Face Avatar With Controllable Tri-Plane Rendering	Link	Github
PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces	Link	Github
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation	Link	Github
Robust Model-Based Face Reconstruction Through Weakly-Supervised Outlier Segmentation	Link	Github
LargeKernel3D: Scaling Up Kernels in 3D Sparse CNNs	Link	Github
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation	Link	Github
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis	Link	Github
Learning a Sparse Transformer Network for Effective Image Deraining	Link	Github
Visual Prompt Multi-Modal Tracking	Link	Github
DeepSolo: Let Transformer Decoder With Explicit Points Solo for Text Spotting	Link	Github
HumanBench: Towards General Human-Centric Perception With Projector Assisted Pretraining	Link	Github
Learning Visual Representations via Language-Guided Sampling	Link	Github
GP-VTON: Towards General Purpose Virtual Try-On via Collaborative Local-Flow Global-Parsing Learning	Link	Github
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection	Link	Github
NeRF-RPN: A General Framework for Object Detection in NeRFs	Link	Github
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation	Link	Github
Position-Guided Text Prompt for Vision-Language Pre-Training	Link	Github
Query-Centric Trajectory Prediction	Link	Github
Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need	Link	Github
LoGoNet: Towards Accurate 3D Object Detection With Local-to-Global Cross-Modal Fusion	Link	Github
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training	Link	Github
BEVHeight: A Robust Framework for Vision-Based Roadside 3D Object Detection	Link	Github
SimpleNet: A Simple Network for Image Anomaly Detection and Localization	Link	Github
Think Twice Before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving	Link	Github
Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention	Link	Github
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion	Link	Github
Standing Between Past and Future: Spatio-Temporal Modeling for Multi-Camera 3D Multi-Object Tracking	Link	Github
Identity-Preserving Talking Face Generation With Landmark and Appearance Priors	Link	Github
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation	Link	Github
Delving Into Shape-Aware Zero-Shot Semantic Segmentation	Link	Github
Aligning Bag of Regions for Open-Vocabulary Object Detection	Link	Github
ZegCLIP: Towards Adapting CLIP for Zero-Shot Semantic Segmentation	Link	Github
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers	Link	Github
Data-Driven Feature Tracking for Event Cameras	Link	Github
FeatureBooster: Boosting Feature Descriptors With a Lightweight Neural Network	Link	Github
Omni Aggregation Networks for Lightweight Image Super-Resolution	Link	Github
Shifted Diffusion for Text-to-Image Generation	Link	Github
A Generalized Framework for Video Instance Segmentation	Link	Github
Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild	Link	Github
LANA: A Language-Capable Navigator for Instruction Following and Generation	Link	Github
Learning Generative Structure Prior for Blind Text Image Super-Resolution	Link	Github
Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement	Link	Github
TriDet: Temporal Action Detection With Relative Boundary Modeling	Link	Github
GD-MAE: Generative Decoder for MAE Pre-Training on LiDAR Point Clouds	Link	Github
Fix the Noise: Disentangling Source Feature for Controllable Domain Translation	Link	Github
Multimodal Prompting With Missing Modalities for Visual Recognition	Link	Github
Temporal Consistent 3D LiDAR Representation Learning for Semantic Perception in Autonomous Driving	Link	Github
Enhanced Training of Query-Based Object Detection via Selective Query Recollection	Link	Github
Data-Efficient Large Scale Place Recognition With Graded Similarity Supervision	Link	Github
Super-Resolution Neural Operator	Link	Github
Revisiting Rotation Averaging: Uncertainties and Robust Losses	Link	Github
PlaneDepth: Self-Supervised Depth Estimation via Orthogonal Planes	Link	Github
Human Guided Ground-Truth Generation for Realistic Image Super-Resolution	Link	Github
DynamicDet: A Unified Dynamic Architecture for Object Detection	Link	Github
FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation	Link	Github
HelixSurf: A Robust and Efficient Neural Implicit Surface Learning of Indoor Scenes With Iterative Intertwined Regularization	Link	Github
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information	Link	Github
UniHCP: A Unified Model for Human-Centric Perceptions	Link	Github
NeuFace: Realistic 3D Neural Face Rendering From Multi-View Images	Link	Github
Adaptive Assignment for Geometry Aware Local Feature Matching	Link	Github
Learning To Generate Text-Grounded Mask for Open-World Semantic Segmentation From Only Image-Text Pairs	Link	Github
CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation	Link	Github
Anchor3DLane: Learning To Regress 3D Anchors for Monocular 3D Lane Detection	Link	Github
Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision	Link	Github
CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup via Adversarial Latent Search	Link	Github
DNF: Decouple and Feedback Network for Seeing in the Dark	Link	Github
Curricular Contrastive Regularization for Physics-Aware Single Image Dehazing	Link	Github
Scalable, Detailed and Mask-Free Universal Photometric Stereo	Link	Github
Learning To Dub Movies via Hierarchical Prosody Models	Link	Github
BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation	Link	Github
Generic-to-Specific Distillation of Masked Autoencoders	Link	Github
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding	Link	Github
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning	Link	Github
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?	Link	Github
Unifying Short and Long-Term Tracking With Graph Hierarchies	Link	Github
Hierarchical Fine-Grained Image Forgery Detection and Localization	Link	Github
CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution	Link	Github
Vita-CLIP: Video and Text Adaptive CLIP via Multimodal Prompting	Link	Github
Masked Image Training for Generalizable Deep Image Denoising	Link	Github
CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP	Link	Github
Efficient Frequency Domain-Based Transformers for High-Quality Image Deblurring	Link	Github
Multimodal Industrial Anomaly Detection via Hybrid Fusion	Link	Github
LinK: Linear Kernel for LiDAR-Based 3D Perception	Link	Github
V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting	Link	Github
Meta Architecture for Point Cloud Analysis	Link	Github
CF-Font: Content Fusion for Few-Shot Font Generation	Link	Github
ViTs for SITS: Vision Transformers for Satellite Image Time Series	Link	Github
ISBNet: A 3D Point Cloud Instance Segmentation Network With Instance-Aware Sampling and Box-Aware Dynamic Convolution	Link	Github
A Light Weight Model for Active Speaker Detection	Link	Github
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark	Link	Github
DeltaEdit: Exploring Text-Free Training for Text-Driven Image Manipulation	Link	Github
Understanding Imbalanced Semantic Segmentation Through Neural Collapse	Link	Github
MP-Former: Mask-Piloted Transformer for Image Segmentation	Link	Github
Hierarchical Dense Correlation Distillation for Few-Shot Segmentation	Link	Github
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	Link	Github
IFSeg: Image-Free Semantic Segmentation via Vision-Language Model	Link	Github
AutoFocusFormer: Image Segmentation off the Grid	Link	Github
EqMotion: Equivariant Multi-Agent Motion Prediction With Invariant Interaction Reasoning	Link	Github
GrowSP: Unsupervised Semantic Segmentation of 3D Point Clouds	Link	Github
Solving 3D Inverse Problems Using Pre-Trained 2D Diffusion Models	Link	Github
Finetune Like You Pretrain: Improved Finetuning of Zero-Shot Vision Models	Link	Github
Augmentation Matters: A Simple-Yet-Effective Approach to Semi-Supervised Semantic Segmentation	Link	Github
Two-View Geometry Scoring Without Correspondences	Link	Github
CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability	Link	Github
Learning Semantic Relationship Among Instances for Image-Text Matching	Link	Github
LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation	Link	Github
Robust Mean Teacher for Continual and Gradual Test-Time Adaptation	Link	Github
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning With Masked Autoencoders	Link	Github
Directional Connectivity-Based Segmentation of Medical Images	Link	Github
Zero-Shot Referring Image Segmentation With Global-Local Context Features	Link	Github
Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank	Link	Github
Dynamic Focus-Aware Positional Queries for Semantic Segmentation	Link	Github
Vision Transformer With Super Token Sampling	Link	Github
Sampling Is Matter: Point-Guided 3D Human Mesh Reconstruction	Link	Github
3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds	Link	Github
PROB: Probabilistic Objectness for Open World Object Detection	Link	Github
Benchmarking Robustness of 3D Object Detection to Common Corruptions	Link	Github
Adaptive Sparse Convolutional Networks With Global Context Enhancement for Faster Object Detection on Drone Images	Link	Github
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg	Link	Github
ConZIC: Controllable Zero-Shot Image Captioning by Sampling-Based Polishing	Link	Github
Interactive and Explainable Region-Guided Radiology Report Generation	Link	Github
SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection	Link	Github
Real-Time 6K Image Rescaling With Rate-Distortion Optimization	Link	Github
Revisiting Temporal Modeling for CLIP-Based Image-to-Video Knowledge Transferring	Link	Github
Frequency-Modulated Point Cloud Rendering With Easy Editing	Link	Github
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning	Link	Github
BBDM: Image-to-Image Translation With Brownian Bridge Diffusion Models	Link	Github
LAVENDER: Unifying Video-Language Understanding As Masked Language Modeling	Link	Github
DynaFed: Tackling Client Data Heterogeneity With Global Dynamics	Link	Github
Frame Flexible Network	Link	Github
GeoMAE: Masked Geometric Target Prediction for Self-Supervised Point Cloud Pre-Training	Link	Github
Collaboration Helps Camera Overtake LiDAR in 3D Detection	Link	Github
CODA-Prompt: COntinual Decomposed Attention-Based Prompting for Rehearsal-Free Continual Learning	Link	Github
RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving	Link	Github
Generalized Relation Modeling for Transformer Tracking	Link	Github
WildLight: In-the-Wild Inverse Rendering With a Flashlight	Link	Github
Equiangular Basis Vectors	Link	Github
DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium	Link	Github
Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-Identification	Link	Github
Diversity-Aware Meta Visual Prompting	Link	Github
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training	Link	Github
Texts as Images in Prompt Tuning for Multi-Label Image Recognition	Link	Github
PointConvFormer: Revenge of the Point-Based Convolution	Link	Github
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection	Link	Github
RILS: Masked Visual Reconstruction in Language Semantic Space	Link	Github
Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization	Link	Github
StyleRes: Transforming the Residuals for Real Image Editing With StyleGAN	Link	Github
SmallCap: Lightweight Image Captioning Prompted With Retrieval Augmentation	Link	Github
Learning With Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning	Link	Github
Handwritten Text Generation From Visual Archetypes	Link	Github
Post-Training Quantization on Diffusion Models	Link	Github
DPF: Learning Dense Prediction Fields With Weak Supervision	Link	Github
OSRT: Omnidirectional Image Super-Resolution With Distortion-Aware Transformer	Link	Github
SCPNet: Semantic Scene Completion on Point Cloud	Link	Github
Dynamic Graph Enhanced Contrastive Learning for Chest X-Ray Report Generation	Link	Github
Novel Class Discovery for 3D Point Cloud Semantic Segmentation	Link	Github
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation With Cross-Scale Distortion Awareness	Link	Github
M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis	Link	Github
Masked and Adaptive Transformer for Exemplar Based Image Translation	Link	Github
DCFace: Synthetic Face Generation With Dual Condition Diffusion Model	Link	Github
T-SEA: Transfer-Based Self-Ensemble Attack on Object Detection	Link	Github
SMPConv: Self-Moving Point Representations for Continuous Convolution	Link	Github
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution	Link	Github
A Large-Scale Homography Benchmark	Link	Github
GeoMVSNet: Learning Multi-View Stereo With Geometry Perception	Link	Github
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression	Link	Github
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks	Link	Github
Learning Transferable Spatiotemporal Representations From Natural Script Knowledge	Link	Github
Rethinking Federated Learning With Domain Shift: A Prototype View	Link	Github
Visual-Language Prompt Tuning With Knowledge-Guided Context Optimization	Link	Github
Dynamic Coarse-To-Fine Learning for Oriented Tiny Object Detection	Link	Github
Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning	Link	Github
Joint Video Multi-Frame Interpolation and Deblurring Under Unknown Exposure Time	Link	Github
Guiding Pseudo-Labels With Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation	Link	Github
Attribute-Preserving Face Dataset Anonymization via Latent Code Optimization	Link	Github
Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process	Link	Github
A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation From a Single RGB Image	Link	Github
DexArt: Benchmarking Generalizable Dexterous Manipulation With Articulated Objects	Link	Github
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition With Pre-Trained Vision-Language Models	Link	Github
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation	Link	Github
Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation	Link	Github
Visibility Constrained Wide-Band Illumination Spectrum Design for Seeing-in-the-Dark	Link	Github
VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud	Link	Github
Sharpness-Aware Gradient Matching for Domain Generalization	Link	Github
Deep Graph-Based Spatial Consistency for Robust Non-Rigid Point Cloud Registration	Link	Github
Decoupled Multimodal Distilling for Emotion Recognition	Link	Github
Open-Vocabulary Point-Cloud Object Detection Without 3D Annotation	Link	Github
An Image Quality Assessment Dataset for Portraits	Link	Github
Leveraging Hidden Positives for Unsupervised Semantic Segmentation	Link	Github
Semantic-Conditional Diffusion Networks for Image Captioning	Link	Github
STMixer: A One-Stage Sparse Action Detector	Link	Github
Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset	Link	Github
Joint Visual Grounding and Tracking With Natural Language Specification	Link	Github
Where Is My Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization	Link	Github
Power Bundle Adjustment for Large-Scale 3D Reconstruction	Link	Github
Rethinking Domain Generalization for Face Anti-Spoofing: Separability and Alignment	Link	Github
A Unified Pyramid Recurrent Network for Video Frame Interpolation	Link	Github
Revisiting Reverse Distillation for Anomaly Detection	Link	Github
SOOD: Towards Semi-Supervised Oriented Object Detection	Link	Github
POEM: Reconstructing Hand in a Point Embedded Multi-View Stereo	Link	Github
Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors	Link	Github
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation	Link	Github
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID	Link	Github
Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment	Link	Github
Task Residual for Tuning Vision-Language Models	Link	Github
Structured Sparsity Learning for Efficient Video Super-Resolution	Link	Github
Uncertainty-Aware Unsupervised Image Deblurring With Deep Residual Prior	Link	Github
Imitation Learning As State Matching via Differentiable Physics	Link	Github
PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration	Link	Github
Twin Contrastive Learning With Noisy Labels	Link	Github
TarViS: A Unified Approach for Target-Based Video Segmentation	Link	Github
Clover: Towards a Unified Video-Language Alignment and Fusion Model	Link	Github
Towards Realistic Long-Tailed Semi-Supervised Learning: Consistency Is All You Need	Link	Github
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers	Link	Github
Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images	Link	Github
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos	Link	Github
Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection With Single Point Supervision	Link	Github
Interactive Segmentation As Gaussian Process Classification	Link	Github
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation	Link	Github
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization	Link	Github
Adaptive Patch Deformation for Textureless-Resilient Multi-View Stereo	Link	Github
TrojDiff: Trojan Attacks on Diffusion Models With Diverse Targets	Link	Github
Exploring Discontinuity for Video Frame Interpolation	Link	Github
Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections	Link	Github
Affordance Grounding From Demonstration Video To Target Image	Link	Github
Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection	Link	Github
How to Backdoor Diffusion Models?	Link	Github
LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising	Link	Github
Neuron Structure Modeling for Generalizable Remote Physiological Measurement	Link	Github
Boundary-Enhanced Co-Training for Weakly Supervised Semantic Segmentation	Link	Github
STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection	Link	Github
RiDDLE: Reversible and Diversified De-Identification With Latent Encryptor	Link	Github
Perception-Oriented Single Image Super-Resolution Using Optimal Objective Estimation	Link	Github
Learning Federated Visual Prompt in Null Space for MRI Reconstruction	Link	Github
Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution	Link	Github
Learning Distortion Invariant Representation for Image Restoration From a Causality Perspective	Link	Github
PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery	Link	Github
MSF: Motion-Guided Sequential Fusion for Efficient 3D Object Detection From Point Cloud Sequences	Link	Github
CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection	Link	Github
Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective	Link	Github
Polynomial Implicit Neural Representations for Large Diverse Datasets	Link	Github
3D-Aware Multi-Class Image-to-Image Translation With NeRFs	Link	Github
Masked Motion Encoding for Self-Supervised Video Representation Learning	Link	Github
Histopathology Whole Slide Image Analysis With Heterogeneous Graph Representation Learning	Link	Github
Towards Scalable Neural Representation for Diverse Videos	Link	Github
CLOTH4D: A Dataset for Clothed Human Reconstruction	Link	Github
Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration	Link	Github
Learning Procedure-Aware Video Representation From Instructional Videos and Their Narrations	Link	Github
Robust Test-Time Adaptation in Dynamic Scenarios	Link	Github
Task-Specific Fine-Tuning via Variational Information Bottleneck for Weakly-Supervised Pathology Whole Slide Image Classification	Link	Github
FashionSAP: Symbols and Attributes Prompt for Fine-Grained Fashion Vision-Language Pre-Training	Link	Github
MOSO: Decomposing MOtion, Scene and Object for Video Prediction	Link	Github
ALOFT: A Lightweight MLP-Like Architecture With Dynamic Low-Frequency Transform for Domain Generalization	Link	Github
A Whac-a-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others	Link	Github
SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency	Link	Github
Best of Both Worlds: Multimodal Contrastive Learning With Tabular and Imaging Data	Link	Github
Viewpoint Equivariance for Multi-View 3D Object Detection	Link	Github
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection	Link	Github
Regularizing Second-Order Influences for Continual Learning	Link	Github
Backdoor Defense via Adaptively Splitting Poisoned Dataset	Link	Github
Towards Artistic Image Aesthetics Assessment: A Large-Scale Dataset and a New Method	Link	Github
JacobiNeRF: NeRF Shaping With Mutual Information Gradients	Link	Github
Accelerating Vision-Language Pretraining With Free Language Modeling	Link	Github
Explicit Boundary Guided Semi-Push-Pull Contrastive Learning for Supervised Anomaly Detection	Link	Github
PA&DA: Jointly Sampling Path and Data for Consistent NAS	Link	Github
An Empirical Study of End-to-End Video-Language Transformers With Masked Visual Modeling	Link	Github
QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity	Link	Github
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection	Link	Github
ZBS: Zero-Shot Background Subtraction via Instance-Level Background Modeling and Foreground Selection	Link	Github
Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation	Link	Github
AdaptiveMix: Improving GAN Training via Feature Space Shrinkage	Link	Github
Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation	Link	Github
Camouflaged Object Detection With Feature Decomposition and Edge Reconstruction	Link	Github
A Strong Baseline for Generalized Few-Shot Semantic Segmentation	Link	Github
FrustumFormer: Adaptive Instance-Aware Resampling for Multi-View 3D Detection	Link	Github
Global-to-Local Modeling for Video-Based 3D Human Pose and Shape Estimation	Link	Github
Siamese DETR	Link	Github
Distribution Shift Inversion for Out-of-Distribution Prediction	Link	Github
Towards Unified Scene Text Spotting Based on Sequence Generation	Link	Github
CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer	Link	Github
Supervised Masked Knowledge Distillation for Few-Shot Transformers	Link	Github
MELTR: Meta Loss Transformer for Learning To Fine-Tune Video Foundation Models	Link	Github
Unsupervised Inference of Signed Distance Functions From Single Sparse Point Clouds Without Learning Priors	Link	Github
KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation	Link	Github
Adaptive Human Matting for Dynamic Videos	Link	Github
Making Vision Transformers Efficient From a Token Sparsification View	Link	Github
ViPLO: Vision Transformer Based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection	Link	Github
Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning	Link	Github
ACL-SPC: Adaptive Closed-Loop System for Self-Supervised Point Cloud Completion	Link	Github
Weakly Supervised Posture Mining for Fine-Grained Classification	Link	Github
H2ONet: Hand-Occlusion-and-Orientation-Aware Network for Real-Time 3D Hand Mesh Reconstruction	Link	Github
E2PN: Efficient SE(3)-Equivariant Point Network	Link	Github
Audio-Visual Grouping Network for Sound Localization From Mixtures	Link	Github
StyleIPSB: Identity-Preserving Semantic Basis of StyleGAN for High Fidelity Face Swapping	Link	Github
MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset	Link	Github
Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation	Link	Github
Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation	Link	Github
Glocal Energy-Based Learning for Few-Shot Open-Set Recognition	Link	Github
Indiscernible Object Counting in Underwater Scenes	Link	Github
Curricular Object Manipulation in LiDAR-Based Object Detection	Link	Github
TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning With Structure-Trajectory Prompted Reconstruction for Person Re-Identification	Link	Github
Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification	Link	Github
HOICLIP: Efficient Knowledge Transfer for HOI Detection With Vision-Language Models	Link	Github
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers	Link	Github
Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection	Link	Github
DAA: A Delta Age AdaIN Operation for Age Estimation via Binary Code Transformer	Link	Github
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution	Link	Github
The Best Defense Is a Good Offense: Adversarial Augmentation Against Adversarial Attacks	Link	Github
Dynamic Conceptional Contrastive Learning for Generalized Category Discovery	Link	Github
Class Adaptive Network Calibration	Link	Github
Instance-Specific and Model-Adaptive Supervision for Semi-Supervised Semantic Segmentation	Link	Github
FAC: 3D Representation Learning via Foreground Aware Feature Contrast	Link	Github
NICO++: Towards Better Benchmarking for Domain Generalization	Link	Github
Bridging Search Region Interaction With Template for RGB-T Tracking	Link	Github
Rotation-Invariant Transformer for Point Cloud Matching	Link	Github
Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm	Link	Github
CXTrack: Improving 3D Point Cloud Tracking With Contextual Information	Link	Github
CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition With Variational Alignment	Link	Github
Revisiting Residual Networks for Adversarial Robustness	Link	Github
Upcycling Models Under Domain and Category Shift	Link	Github
Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video	Link	Github
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos	Link	Github
NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation	Link	Github
Bridging the Gap Between Model Explanations in Partially Annotated Multi-Label Classification	Link	Github
Detecting Backdoors in Pre-Trained Encoders	Link	Github
Equivalent Transformation and Dual Stream Network Construction for Mobile Image Super-Resolution	Link	Github
TAPS3D: Text-Guided 3D Textured Shape Generation From Pseudo Supervision	Link	Github
Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container	Link	Github
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution	Link	Github
Re-Thinking Federated Active Learning Based on Inter-Class Diversity	Link	Github
Joint Appearance and Motion Learning for Efficient Rolling Shutter Correction	Link	Github
Federated Incremental Semantic Segmentation	Link	Github
Evading Forensic Classifiers With Attribute-Conditioned Adversarial Faces	Link	Github
Learning Common Rationale To Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems	Link	Github
Neural Koopman Pooling: Control-Inspired Temporal Dynamics Encoding for Skeleton-Based Action Recognition	Link	Github
Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data	Link	Github
Optimization-Inspired Cross-Attention Transformer for Compressive Sensing	Link	Github
Context-Based Trit-Plane Coding for Progressive Image Compression	Link	Github
Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation	Link	Github
Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection	Link	Github
GradICON: Approximate Diffeomorphisms via Gradient Inverse Consistency	Link	Github
BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation	Link	Github
On the Effects of Self-Supervision and Contrastive Alignment in Deep Multi-View Clustering	Link	Github
Diverse 3D Hand Gesture Prediction From Body Dynamics by Bilateral Hand Disentanglement	Link	Github
sRGB Real Noise Synthesizing With Neighboring Correlation-Aware Noise Model	Link	Github
Reliability in Semantic Segmentation: Are We on the Right Track?	Link	Github
Diversity-Measurable Anomaly Detection	Link	Github
ABCD: Arbitrary Bitwise Coefficient for De-Quantization	Link	Github
Block Selection Method for Using Feature Norm in Out-of-Distribution Detection	Link	Github
Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution	Link	Github
Two-Shot Video Object Segmentation	Link	Github
MoLo: Motion-Augmented Long-Short Contrastive Learning for Few-Shot Action Recognition	Link	Github
Extracting Class Activation Maps From Non-Discriminative Features As Well	Link	Github
Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception	Link	Github
MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos	Link	Github
Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction	Link	Github
Visual Prompt Tuning for Generative Transfer Learning	Link	Github
Improved Test-Time Adaptation for Domain Generalization	Link	Github
Watch or Listen: Robust Audio-Visual Speech Recognition With Visual Corruption Modeling and Reliability Scoring	Link	Github
Enlarging Instance-Specific and Class-Specific Information for Open-Set Action Recognition	Link	Github
Inferring and Leveraging Parts From Object Shape for Improving Semantic Image Synthesis	Link	Github
DiGA: Distil To Generalize and Then Adapt for Domain Adaptive Semantic Segmentation	Link	Github
Learning a Practical SDR-to-HDRTV Up-Conversion Using New Dataset and Degradation Models	Link	Github
SliceMatch: Geometry-Guided Aggregation for Cross-View Pose Estimation	Link	Github
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection	Link	Github
On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks	Link	Github
ScarceNet: Animal Pose Estimation With Scarce Annotations	Link	Github
Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric	Link	Github
Stimulus Verification Is a Universal and Effective Sampler in Multi-Modal Human Trajectory Prediction	Link	Github
Preserving Linear Separability in Continual Learning by Backward Feature Projection	Link	Github
Generalizable Implicit Neural Representations via Instance Pattern Composers	Link	Github
Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching	Link	Github
Progressive Neighbor Consistency Mining for Correspondence Pruning	Link	Github
Trainable Projected Gradient Method for Robust Fine-Tuning	Link	Github
Independent Component Alignment for Multi-Task Learning	Link	Github
Deep Arbitrary-Scale Image Super-Resolution via Scale-Equivariance Pursuit	Link	Github
DualVector: Unsupervised Vector Font Synthesis With Dual-Part Representation	Link	Github
Interventional Bag Multi-Instance Learning on Whole-Slide Pathological Images	Link	Github
Learning on Gradients: Generalized Artifacts Representation for GAN-Generated Images Detection	Link	Github
Partial Network Cloning	Link	Github
Ultra-High Resolution Segmentation With Ultra-Rich Context: A Novel Benchmark	Link	Github
Object Detection With Self-Supervised Scene Adaptation	Link	Github
Generative Bias for Robust Visual Question Answering	Link	Github
MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation	Link	Github
Coreset Sampling From Open-Set for Fine-Grained Self-Supervised Learning	Link	Github
Sparsely Annotated Semantic Segmentation With Adaptive Gaussian Mixtures	Link	Github
SE-ORNet: Self-Ensembling Orientation-Aware Network for Unsupervised Point Cloud Shape Correspondence	Link	Github
B-Spline Texture Coefficients Estimator for Screen Content Image Super-Resolution	Link	Github
High-Fidelity Facial Avatar Reconstruction From Monocular Video With Generative Priors	Link	Github
DivClust: Controlling Diversity in Deep Clustering	Link	Github
Large-Scale Training Data Search for Object Re-Identification	Link	Github
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning	Link	Github
CREPE: Can Vision-Language Foundation Models Reason Compositionally?	Link	Github
Semi-Supervised Domain Adaptation With Source Label Adaptation	Link	Github
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning	Link	Github
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples	Link	Github
ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images	Link	Github
PIP-Net: Patch-Based Intuitive Prototypes for Interpretable Image Classification	Link	Github
DIP: Dual Incongruity Perceiving Network for Sarcasm Detection	Link	Github
Weakly Supervised Video Representation Learning With Unaligned Text for Sequential Videos	Link	Github
PVT-SSD: Single-Stage 3D Object Detector With Point-Voxel Transformer	Link	Github
Continuous Intermediate Token Learning With Implicit Motion Manifold for Keyframe Based Motion Interpolation	Link	Github
VQACL: A Novel Visual Question Answering Continual Learning Setting	Link	Github
RONO: Robust Discriminative Learning With Noisy Labels for 2D-3D Cross-Modal Retrieval	Link	Github
PCT-Net: Full Resolution Image Harmonization Using Pixel-Wise Color Transformations	Link	Github
MixTeacher: Mining Promising Labels With Mixed Scale Teacher for Semi-Supervised Object Detection	Link	Github
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training	Link	Github
Computationally Budgeted Continual Learning: What Does Matter?	Link	Github
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers	Link	Github
Weakly Supervised Video Emotion Detection and Prediction via Cross-Modal Temporal Erasing Network	Link	Github
R2Former: Unified Retrieval and Reranking Transformer for Place Recognition	Link	Github
Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization	Link	Github
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement	Link	Github
DistilPose: Tokenized Pose Regression With Heatmap Distillation	Link	Github
Bitstream-Corrupted JPEG Images Are Restorable: Two-Stage Compensation and Alignment Framework for Image Restoration	Link	Github
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks	Link	Github
BiCro: Noisy Correspondence Rectification for Multi-Modality Data via Bi-Directional Cross-Modal Similarity Consistency	Link	Github
Representation Learning for Visual Object Tracking by Masked Appearance Transfer	Link	Github
AnchorFormer: Point Cloud Completion From Discriminative Nodes	Link	Github
TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation	Link	Github
Proximal Splitting Adversarial Attack for Semantic Segmentation	Link	Github
NVTC: Nonlinear Vector Transform Coding	Link	Github
CLAMP: Prompt-Based Contrastive Learning for Connecting Language and Animal Pose	Link	Github
Enhancing the Self-Universality for Transferable Targeted Attacks	Link	Github
Randomized Adversarial Training via Taylor Expansion	Link	Github
Long Range Pooling for 3D Large-Scale Scene Understanding	Link	Github
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training	Link	Github
Federated Domain Generalization With Generalization Adjustment	Link	Github
CoMFormer: Continual Learning in Semantic and Panoptic Segmentation	Link	Github
Fusing Pre-Trained Language Models With Multimodal Prompts Through Reinforcement Learning	Link	Github
MIST: Multi-Modal Iterative Spatial-Temporal Transformer for Long-Form Video Question Answering	Link	Github
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition	Link	Github
An In-Depth Exploration of Person Re-Identification and Gait Recognition in Cloth-Changing Conditions	Link	Github
Learning Weather-General and Weather-Specific Features for Image Restoration Under Multiple Adverse Weather Conditions	Link	Github
Out-of-Distributed Semantic Pruning for Robust Semi-Supervised Learning	Link	Github
Long-Tailed Visual Recognition via Self-Heterogeneous Integration With Knowledge Excavation	Link	Github
Bias Mimicking: A Simple Sampling Approach for Bias Mitigation	Link	Github
OReX: Object Reconstruction From Planar Cross-Sections Using Neural Fields	Link	Github
Multi-Level Logit Distillation	Link	Github
Real-Time Evaluation in Online Continual Learning: A New Hope	Link	Github
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction	Link	Github
CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network With Large Input	Link	Github
Boosting Video Object Segmentation via Space-Time Correspondence Learning	Link	Github
Hunting Sparsity: Density-Guided Contrastive Learning for Semi-Supervised Semantic Segmentation	Link	Github
TINC: Tree-Structured Implicit Neural Compression	Link	Github
Improving Weakly Supervised Temporal Action Localization by Bridging Train-Test Gap in Pseudo Labels	Link	Github
DeGPR: Deep Guided Posterior Regularization for Multi-Class Cell Detection and Counting	Link	Github
Large-Capacity and Flexible Video Steganography via Invertible Neural Network	Link	Github
VDN-NeRF: Resolving Shape-Radiance Ambiguity via View-Dependence Normalization	Link	Github
LINe: Out-of-Distribution Detection by Leveraging Important Neurons	Link	Github
Neural Transformation Fields for Arbitrary-Styled Font Generation	Link	Github
Super-CLEVR: A Virtual Benchmark To Diagnose Domain Robustness in Visual Reasoning	Link	Github
Few-Shot Class-Incremental Learning via Class-Aware Bilateral Distillation	Link	Github
Geometry and Uncertainty-Aware 3D Point Cloud Class-Incremental Semantic Segmentation	Link	Github
FCC: Feature Clusters Compression for Long-Tailed Visual Recognition	Link	Github
Neural Vector Fields: Implicit Representation by Explicit Learning	Link	Github
Learning Action Changes by Measuring Verb-Adverb Textual Relationships	Link	Github
Make Landscape Flatter in Differentially Private Federated Learning	Link	Github
Confidence-Aware Personalized Federated Learning via Variational Expectation Maximization	Link	Github
Unsupervised Visible-Infrared Person Re-Identification via Progressive Graph Matching and Alternate Learning	Link	Github
Knowledge Combination To Learn Rotated Detection Without Rotated Annotation	Link	Github
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias	Link	Github
Symmetric Shape-Preserving Autoencoder for Unsupervised Real Scene Point Cloud Completion	Link	Github
PointCert: Point Cloud Classification With Deterministic Certified Robustness Guarantees	Link	Github
Advancing Visual Grounding With Scene Knowledge: Benchmark and Method	Link	Github
Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt	Link	Github
3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention	Link	Github
Self-Supervised 3D Scene Flow Estimation Guided by Superpoints	Link	Github
End-to-End Video Matting With Trimap Propagation	Link	Github
Transductive Few-Shot Learning With Prototype-Based Label Propagation by Iterative Graph Refinement	Link	Github
Discriminative Co-Saliency and Background Mining Transformer for Co-Salient Object Detection	Link	Github
RIATIG: Reliable and Imperceptible Adversarial Text-to-Image Generation With Natural Prompts	Link	Github
Spectral Enhanced Rectangle Transformer for Hyperspectral Image Denoising	Link	Github
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network	Link	Github
MAGVLT: Masked Generative Vision-and-Language Transformer	Link	Github
Focused and Collaborative Feedback Integration for Interactive Image Segmentation	Link	Github
OpenMix: Exploring Outlier Samples for Misclassification Detection	Link	Github
Adaptive Data-Free Quantization	Link	Github
VideoTrack: Learning To Track Objects via Video Transformer	Link	Github
Semi-Supervised 2D Human Pose Estimation Driven by Position Inconsistency Pseudo Label Correction Module	Link	Github
Towards Better Stability and Adaptability: Improve Online Self-Training for Model Adaptation in Semantic Segmentation	Link	Github
Contrastive Grouping With Transformer for Referring Image Segmentation	Link	Github
Fuzzy Positive Learning for Semi-Supervised Semantic Segmentation	Link	Github
3D-POP – An Automated Annotation Approach to Facilitate Markerless 2D-3D Tracking of Freely Moving Birds With Marker-Based Motion Capture	Link	Github
PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering	Link	Github
Towards Open-World Segmentation of Parts	Link	Github
PCR: Proxy-Based Contrastive Replay for Online Class-Incremental Continual Learning	Link	Github
Quantum Multi-Model Fitting	Link	Github
Few-Shot Learning With Visual Distribution Calibration and Cross-Modal Distribution Alignment	Link	Github
Practical Network Acceleration With Tiny Sets	Link	Github
Feature Alignment and Uniformity for Test Time Adaptation	Link	Github
Finding Geometric Models by Clustering in the Consensus Space	Link	Github
VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation	Link	Github
Meta-Learning With a Geometry-Adaptive Preconditioner	Link	Github
Divide and Conquer: Answering Questions With Object Factorization and Compositional Reasoning	Link	Github
Physical-World Optical Adversarial Attacks on 3D Face Recognition	Link	Github
Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning	Link	Github
On Calibrating Semantic Segmentation Models: Analyses and an Algorithm	Link	Github
Binary Latent Diffusion	Link	Github
Q: How To Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!	Link	Github
MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection	Link	Github
Behavioral Analysis of Vision-and-Language Navigation Agents	Link	Github
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding	Link	Github
Progressive Spatio-Temporal Alignment for Efficient Event-Based Motion Estimation	Link	Github
Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections	Link	Github
Normalizing Flow Based Feature Synthesis for Outlier-Aware Object Detection	Link	Github
Non-Contrastive Unsupervised Learning of Physiological Signals From Video	Link	Github
Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning	Link	Github
Markerless Camera-to-Robot Pose Estimation via Self-Supervised Sim-to-Real Transfer	Link	Github
Event-Guided Person Re-Identification via Sparse-Dense Complementary Learning	Link	Github
PeakConv: Learning Peak Receptive Field for Radar Semantic Segmentation	Link	Github
Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation	Link	Github
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning	Link	Github
Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification	Link	Github
Multiple Instance Learning via Iterative Self-Paced Supervised Contrastive Learning	Link	Github
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices	Link	Github
Introducing Competition To Boost the Transferability of Targeted Adversarial Examples Through Clean Feature Mixup	Link	Github
Boosting Verified Training for Robust Image Classifications via Abstraction	Link	Github
DaFKD: Domain-Aware Federated Knowledge Distillation	Link	Github
Resource-Efficient RGBD Aerial Tracking	Link	Github
BiasBed – Rigorous Texture Bias Evaluation	Link	Github
Progressive Open Space Expansion for Open-Set Model Attribution	Link	Github
Harmonious Feature Learning for Interactive Hand-Object Pose Estimation	Link	Github
Masked Images Are Counterfactual Samples for Robust Fine-Tuning	Link	Github
MMANet: Margin-Aware Distillation and Modality-Aware Regularization for Incomplete Multimodal Learning	Link	Github
CFA: Class-Wise Calibrated Fair Adversarial Training	Link	Github
Regularization of Polynomial Networks for Image Recognition	Link	Github
SlowLiDAR: Increasing the Latency of LiDAR-Based Detection Using Adversarial Examples	Link	Github
Depth Estimation From Indoor Panoramas With Neural Scene Representation	Link	Github
Improving Robustness of Vision Transformers by Reducing Sensitivity To Patch Corruptions	Link	Github
EfficientSCI: Densely Connected Network With Space-Time Factorization for Large-Scale Video Snapshot Compressive Imaging	Link	Github
GKEAL: Gaussian Kernel Embedded Analytic Learning for Few-Shot Class Incremental Task	Link	Github
Boundary-Aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval	Link	Github
Towards Practical Plug-and-Play Diffusion Models	Link	Github
Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-Localization Using Hierarchies and Scenes	Link	Github
PEFAT: Boosting Semi-Supervised Medical Image Classification via Pseudo-Loss Estimation and Feature Adversarial Training	Link	Github
From Node Interaction To Hop Interaction: New Effective and Scalable Graph Learning Paradigm	Link	Github
Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-Shot Learning With Hyperspherical Embeddings	Link	Github
Architecture, Dataset and Model-Scale Agnostic Data-Free Meta-Learning	Link	Github
Layout-Based Causal Inference for Object Navigation	Link	Github
Ensemble-Based Blackbox Attacks on Dense Prediction	Link	Github
Adversarial Robustness via Random Projection Filters	Link	Github
NLOST: Non-Line-of-Sight Imaging With Transformer	Link	Github
Fast Contextual Scene Graph Generation With Unbiased Context Augmentation	Link	Github
Event-Based Blurry Frame Interpolation Under Blind Exposure	Link	Github
Defending Against Patch-Based Backdoor Attacks on Self-Supervised Learning	Link	Github
GradMA: A Gradient-Memory-Based Accelerated Federated Learning With Alleviated Catastrophic Forgetting	Link	Github
Balanced Product of Calibrated Experts for Long-Tailed Recognition	Link	Github
Principles of Forgetting in Domain-Incremental Semantic Segmentation in Adverse Weather Conditions	Link	Github
Annealing-Based Label-Transfer Learning for Open World Object Detection	Link	Github
Make-a-Story: Visual Memory Conditioned Consistent Story Generation	Link	Github
Revisiting Prototypical Network for Cross Domain Few-Shot Learning	Link	Github
Perception and Semantic Aware Regularization for Sequential Confidence Calibration	Link	Github
Semi-Weakly Supervised Object Kinematic Motion Prediction	Link	Github
Image Quality-Aware Diagnosis via Meta-Knowledge Co-Embedding	Link	Github
MaLP: Manipulation Localization Using a Proactive Scheme	Link	Github
Adjustment and Alignment for Unbiased Open Set Domain Adaptation	Link	Github
Knowledge Distillation for 6D Pose Estimation by Aligning Distributions of Local Predictions	Link	Github
Sliced Optimal Partial Transport	Link	Github
HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions	Link	Github
Trap Attention: Monocular Depth Estimation With Manual Traps	Link	Github
GEN: Pushing the Limits of Softmax-Based Out-of-Distribution Detection	Link	Github
Learning From Noisy Labels With Decoupled Meta Label Purifier	Link	Github
Local Connectivity-Based Density Estimation for Face Clustering	Link	Github
Physics-Guided ISO-Dependent Sensor Noise Modeling for Extreme Low-Light Photography	Link	Github
Probing Neural Representations of Scene Perception in a Hippocampally Dependent Task Using Artificial Neural Networks	Link	Github
A Probabilistic Framework for Lifelong Test-Time Adaptation	Link	Github
PointCMP: Contrastive Mask Prediction for Self-Supervised Learning on Point Cloud Videos	Link	Github
Deep Polarization Reconstruction With PDAVIS Events	Link	Github
Optimal Transport Minimization: Crowd Localization on Density Maps for Semi-Supervised Counting	Link	Github
Probabilistic Debiasing of Scene Graphs	Link	Github
PMR: Prototypical Modal Rebalance for Multimodal Learning	Link	Github
Logical Consistency and Greater Descriptive Power for Facial Hair Attribute Learning	Link	Github
HyperCUT: Video Sequence From a Single Blurry Image Using Unsupervised Ordering	Link	Github
Document Image Shadow Removal Guided by Color-Aware Background	Link	Github
DLBD: A Self-Supervised Direct-Learned Binary Descriptor	Link	Github
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning	Link	Github
Learning Debiased Representations via Conditional Attribute Interpolation	Link	Github
Bayesian Posterior Approximation With Stochastic Ensembles	Link	Github
Decoupling Learning and Remembering: A Bilevel Memory Framework With Knowledge Projection for Task-Incremental Learning	Link	Github
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning	Link	Github
Noisy Correspondence Learning With Meta Similarity Correction	Link	Github
RMLVQA: A Margin Loss Approach for Visual Question Answering With Language Biases	Link	Github
Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval	Link	Github
BUFFER: Balancing Accuracy, Efficiency, and Generalizability in Point Cloud Registration	Link	Github
Are Data-Driven Explanations Robust Against Out-of-Distribution Data?	Link	Github
Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection	Link	Github
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning	Link	Github
High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition	Link	Github
A Bag-of-Prototypes Representation for Dataset-Level Applications	Link	Github
Neural Dependencies Emerging From Learning Massive Categories	Link	Github
Learning With Noisy Labels via Self-Supervised Adversarial Noisy Masking	Link	Github
CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset	Link	Github
Balanced Energy Regularization Loss for Out-of-Distribution Detection	Link	Github
Being Comes From Not-Being: Open-Vocabulary Text-to-Motion Generation With Wordless Training	Link	Github
Masked Representation Learning for Domain Generalized Stereo Matching	Link	Github
Where Is My Spot? Few-Shot Image Generation via Latent Subspace Optimization	Link	Github
Genie: Show Me the Data for Quantization	Link	Github
G-MSM: Unsupervised Multi-Shape Matching With Graph-Based Affinity Priors	Link	Github
TokenHPE: Learning Orientation Tokens for Efficient Head Pose Estimation via Transformers	Link	Github
Hierarchical Prompt Learning for Multi-Task Learning	Link	Github
Structure Aggregation for Cross-Spectral Stereo Image Guided Denoising	Link	Github
Re-GAN: Data-Efficient GANs Training via Architectural Reconfiguration	Link	Github
Paired-Point Lifting for Enhanced Privacy-Preserving Visual Localization	Link	Github
Towards Effective Visual Representations for Partial-Label Learning	Link	Github
Pose-Disentangled Contrastive Learning for Self-Supervised Facial Representation	Link	Github
Black-Box Sparse Adversarial Attack via Multi-Objective Optimisation	Link	Github
Spatio-Temporal Pixel-Level Contrastive Learning-Based Source-Free Domain Adaptation for Video Semantic Segmentation	Link	Github
Data-Free Knowledge Distillation via Feature Exchange and Activation Region Constraint	Link	Github
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-Channel Video-Language Retrieval	Link	Github
Discriminating Known From Unknown Objects via Structure-Enhanced Recurrent Variational AutoEncoder	Link	Github
Towards Bridging the Performance Gaps of Joint Energy-Based Models	Link	Github
Pixels, Regions, and Objects: Multiple Enhancement for Salient Object Detection	Link	Github
AsyFOD: An Asymmetric Adaptation Paradigm for Few-Shot Domain Adaptive Object Detection	Link	Github
ConStruct-VL: Data-Free Continual Structured VL Concepts Learning	Link	Github
X-Pruner: eXplainable Pruning for Vision Transformers	Link	Github
Efficient Mask Correction for Click-Based Interactive Image Segmentation	Link	Github
Dynamic Aggregated Network for Gait Recognition	Link	Github
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery	Link	Github
Weakly Supervised Semantic Segmentation via Adversarial Learning of Classifier and Reconstructor	Link	Github
Adaptive Plasticity Improvement for Continual Learning	Link	Github
Jedi: Entropy-Based Localization and Removal of Adversarial Patches	Link	Github
BAAM: Monocular 3D Pose and Shape Reconstruction With Bi-Contextual Attention Module and Attention-Guided Modeling	Link	Github
Leverage Interactive Affinity for Affordance Learning	Link	Github
Evolved Part Masking for Self-Supervised Learning	Link	Github
CHMATCH: Contrastive Hierarchical Matching and Robust Adaptive Threshold Boosted Semi-Supervised Learning	Link	Github
High-Fidelity Event-Radiance Recovery via Transient Event Frequency	Link	Github
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures	Link	Github
Detection of Out-of-Distribution Samples Using Binary Neuron Activation Patterns	Link	Github
Decoupled Semantic Prototypes Enable Learning From Diverse Annotation Types for Semi-Weakly Segmentation in Expert-Driven Domains	Link	Github
A Soma Segmentation Benchmark in Full Adult Fly Brain	Link	Github
KD-DLGAN: Data Limited Image Generation via Knowledge Distillation	Link	Github
PIVOT: Prompting for Video Continual Learning	Link	Github
Rate Gradient Approximation Attack Threats Deep Spiking Neural Networks	Link	Github
L-CoIns: Language-Based Colorization With Instance Awareness	Link	Github
Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph	Link	Github
Towards Building Self-Aware Object Detectors via Reliable Uncertainty Quantification and Calibration	Link	Github
Dense Network Expansion for Class Incremental Learning	Link	Github
Unsupervised Intrinsic Image Decomposition With LiDAR Intensity	Link	Github
Neuralizer: General Neuroimage Analysis Without Re-Training	Link	Github
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers	Link	Github
Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling	Link	Github
Modular Memorability: Tiered Representations for Video Memorability Prediction	Link	Github
Federated Learning With Data-Agnostic Distribution Fusion	Link	Github
Four-View Geometry With Unknown Radial Distortion	Link	Github
Manipulating Transfer Learning for Property Inference	Link	Github
BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image	Link	Github
3D Spatial Multimodal Knowledge Accumulation for Scene Graph Prediction in Point Cloud	Link	Github
Efficient Loss Function by Minimizing the Detrimental Effect of Floating-Point Errors on Gradient-Based Attacks	Link	Github
Towards Professional Level Crowd Annotation of Expert Domain Data	Link	Github
Improving Robustness of Semantic Segmentation to Motion-Blur Using Class-Centric Augmentation	Link	Github
Similarity Metric Learning for RGB-Infrared Group Re-Identification	Link	Github
On the Difficulty of Unpaired Infrared-to-Visible Video Translation: Fine-Grained Content-Rich Patches Transfer	Link	Github
Camouflaged Instance Segmentation via Explicit De-Camouflaging	Link	Github
Global Vision Transformer Pruning With Hessian-Aware Saliency	Link	Github
DoNet: Deep De-Overlapping Network for Cytology Instance Segmentation	Link	Github
ERM-KTP: Knowledge-Level Machine Unlearning via Knowledge Transfer	Link	Github
AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning	Link	Github
Simulated Annealing in Early Layers Leads to Better Generalization	Link	Github
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding	Link	Github
Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation	Link	Github
Compositor: Bottom-Up Clustering and Compositing for Robust Part and Object Segmentation	Link	Github
MEDIC: Remove Model Backdoors via Importance Driven Cloning	Link	Github
Mitigating Task Interference in Multi-Task Learning via Explicit Task Routing With Non-Learnable Primitives	Link	Github
Adaptive Graph Convolutional Subspace Clustering	Link	Github
Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language	Link	Github
Correlational Image Modeling for Self-Supervised Visual Pre-Training	Link	Github
Text With Knowledge Graph Augmented Transformer for Video Captioning	Link	Github
Panoptic Video Scene Graph Generation	Link	Github
DartBlur: Privacy Preservation With Detection Artifact Suppression	Link	Github
IDGI: A Framework To Eliminate Explanation Noise From Integrated Gradients	Link	Github
Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity	Link	Github
Vector Quantization With Self-Attention for Quality-Independent Representation Learning	Link	Github
Privacy-Preserving Representations Are Not Enough: Recovering Scene Content From Camera Poses	Link	Github
DETRs With Hybrid Matching	Link	Github
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods	Link	Github
AltFreezing for More General Video Face Forgery Detection	Link	Github
Heterogeneous Continual Learning	Link	Github
EMT-NAS: Transferring Architectural Knowledge Between Tasks From Different Datasets	Link	Github
Efficient Movie Scene Detection Using State-Space Transformers	Link	Github
Private Image Generation With Dual-Purpose Auxiliary Classifier	Link	Github
BASiS: Batch Aligned Spectral Embedding Space	Link	Github
A Large-Scale Robustness Analysis of Video Action Recognition Models	Link	Github
Neumann Network With Recursive Kernels for Single Image Defocus Deblurring	Link	Github
Rebalancing Batch Normalization for Exemplar-Based Class-Incremental Learning	Link	Github
ToThePoint: Efficient Contrastive Learning of 3D Point Clouds via Recycling	Link	Github
Self-Supervised Blind Motion Deblurring With Deep Expectation Maximization	Link	Github
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning	Link	Github
DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling	Link	Github
Patch-Craft Self-Supervised Training for Correlated Image Denoising	Link	Github
Learning Decorrelated Representations Efficiently Using Fast Fourier Transform	Link	Github
AstroNet: When Astrocyte Meets Artificial Neural Network	Link	Github
PanoSwin: A Pano-Style Swin Transformer for Panorama Understanding	Link	Github
Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge	Link	Github
Polarized Color Image Denoising	Link	Github
