Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation

Introduction

This is the official repository for the paper "Generative ConvNet Foundation Model with Sparse and Low-Frequency Filtered Masked Modeling for Remote Sensing Image Interpretation".

Abstract: Foundation models offer a highly versatile and precise solution for intelligent interpretation of remote sensing images, thus greatly facilitating various remote sensing applications. Nevertheless, current foundation models for remote sensing predominantly employ vision transformers based on generative methods, with no corresponding exploration of ConvNets with masked image modeling (MIM). In this paper, we make the first attempt to propose a generative ConvNet foundation model tailored for remote sensing scenarios, which comprises two key components. First, a large dataset named GeoSense, containing approximately nine million diverse remote sensing images, is constructed to enhance the robustness and generalization of the foundation model during the pre-training phase. Second, a sparse and low-frequency filtered masked modeling (SLFFM) self-supervised learning framework is designed for representation learning of the ConvNet foundation model. Specifically, we introduce sub-manifold sparse convolutions to enable the ConvNet to process variable-length sequences for MIM self-supervised pre-training. Additionally, a low-frequency filtered reconstruction target is designed to guide the model's attention towards essential ground object features in remote sensing images, while mitigating unnecessary detail interference. To evaluate the general performance of our proposed foundation model, comprehensive experiments have been carried out on five datasets across three downstream tasks (i.e., object detection, semantic segmentation, and change detection). Experimental results demonstrate that our method consistently achieves state-of-the-art performance across all benchmark datasets and downstream tasks.
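
The two ideas named in the abstract, masking patches and reconstructing a low-frequency target, can be illustrated with a small standard-library sketch. It is not the repository's implementation: the box blur is a stand-in for whatever low-frequency filter the paper uses, the mask ratio and patch size are arbitrary, no sparse convolution is performed, and restricting the loss to masked patches follows common MIM practice rather than anything stated here. All function names are hypothetical.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=None):
    """Return a grid of booleans; True marks a patch hidden from the encoder."""
    rng = random.Random(seed)
    total = grid_h * grid_w
    hidden = set(rng.sample(range(total), round(total * mask_ratio)))
    return [[(r * grid_w + c) in hidden for c in range(grid_w)]
            for r in range(grid_h)]


def box_blur(image, radius=1):
    """Low-pass filter a 2D list by averaging each pixel's neighbourhood.

    Assumption: a box blur stands in for the paper's low-frequency filter.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def masked_mse(pred, target, patch_mask, patch_size):
    """Mean squared error counted only on pixels that fall in masked patches."""
    errors = [(pred[y][x] - target[y][x]) ** 2
              for y in range(len(target))
              for x in range(len(target[0]))
              if patch_mask[y // patch_size][x // patch_size]]
    return sum(errors) / len(errors) if errors else 0.0


# Toy usage: an 8x8 checkerboard split into four 4x4-pixel patches.
image = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
mask = random_patch_mask(2, 2, 0.5, seed=0)   # hide half of the patches
target = box_blur(image)                      # smoothed reconstruction target
loss = masked_mse(image, target, mask, patch_size=4)
```

The checkerboard is almost pure high-frequency content, so the blurred target differs strongly from the raw image. That is the kind of fine detail a low-frequency target asks the model not to reproduce.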

Pre-trained and Fine-tuned Models

Pre-training

GeoSense

Pretrain	Backbone	Input Size	Parameters	Pretrained Model
SLFFM	ConvNeXt-Base	224x224	89M	Weights
SLFFM	ConvNeXt-Large	224x224	198M	Weights

Object Detection

DOTA-v1.0

Method	Pre-train	Backbone	Lr Schd	mAP	Config	Model
Oriented R-CNN	SLFFM	ConvNeXt-Base	1x	79.15	Config	Weights
Oriented R-CNN	SLFFM	ConvNeXt-Large	1x	79.33	Config	Weights

DIOR-R

Method	Pre-train	Backbone	Lr Schd	mAP	Config	Model
Oriented R-CNN	SLFFM	ConvNeXt-Base	1x	71.50	Config	Weights
Oriented R-CNN	SLFFM	ConvNeXt-Large	1x	72.33	Config	Weights

Semantic Segmentation

Potsdam

Method	Pre-train	Backbone	Lr Schd	OA	Config	Model
UperNet	SLFFM	ConvNeXt-Base	160k	91.72	Config	Weights
UperNet	SLFFM	ConvNeXt-Large	160k	91.82	Config	Weights

LoveDA

Method	Pre-train	Backbone	Lr Schd	mIoU	Config	Model
UperNet	SLFFM	ConvNeXt-Base	160k	52.59	Config	Weights
UperNet	SLFFM	ConvNeXt-Large	160k	53.03	Config	Weights
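
The Potsdam table reports overall accuracy (OA) and the LoveDA table reports mean intersection over union (mIoU). As a reminder of what these numbers measure, the sketch below computes both from a confusion matrix using the textbook definitions. It is not the evaluation code used for the tables, which may treat ignored or absent classes differently. The function names are hypothetical.

```python
def overall_accuracy(conf):
    """Fraction of pixels on the diagonal of conf[true_class][predicted_class]."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(len(conf)))
    return correct / total


def mean_iou(conf):
    """Average of per-class TP / (TP + FP + FN); classes with no pixels are skipped."""
    n = len(conf)
    ious = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                        # true class i, predicted other
        fp = sum(conf[j][i] for j in range(n)) - tp   # predicted i, true class other
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Toy two-class confusion matrix (rows: ground truth, columns: prediction).
conf = [[8, 2],
        [1, 9]]
oa = overall_accuracy(conf)
miou = mean_iou(conf)
```

The tables list these quantities as percentages, so a value such as 0.85 here would appear as 85.00.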

Change Detection

LEVIR-CD

Method	Pre-train	Backbone	Lr Schd	F1	Config	Model
BIT	SLFFM	ConvNeXt-Base	20k	93.66	Config	Weights
BIT	SLFFM	ConvNeXt-Large	20k	93.89	Config	Weights
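
The F1 column is the harmonic mean of precision and recall. A minimal sketch of the standard definition follows, assuming the counts are taken over pixels of the "changed" class. It is not the code that produced the table, and the function name is hypothetical.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 90 changed pixels found, 10 false alarms, 10 misses.
example = f1_score(tp=90, fp=10, fn=10)
```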

Usage

Environment

python 3.8.13
pytorch 1.12.1+cu113
torchvision 0.13.1+cu113
timm 0.6.12
mmdet 2.28.2
mmsegmentation 0.30.0
opencd 0.0.3
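
To compare an existing environment against the versions listed above, a small check along these lines may help. It is a sketch: the pins are copied from the list, but the distribution names passed to `importlib.metadata` (for example `torch` for pytorch, `opencd` for Open-CD) are assumptions about how the packages were installed, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the Environment list; distribution names are assumptions.
EXPECTED = {
    "torch": "1.12.1+cu113",
    "torchvision": "0.13.1+cu113",
    "timm": "0.6.12",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "opencd": "0.0.3",
}


def check_versions(expected, get_version=metadata.version):
    """Return {name: (wanted, found)} for each mismatch; found is None if missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every listed package matches its pin. The Python interpreter version itself is not checked.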

Pre-training

torchrun --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=1234 main.py --data_path=${DataPath} --exp_name=${ExpName} --exp_dir=${ExpDir} --model=${Model} --bs=1024 --init_weight=${InitWeight}
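
When launching pre-training from a script, the command can be assembled as an argument list so that paths with spaces are passed safely. The sketch below mirrors the flags in the command above. `build_pretrain_command` is a hypothetical helper, the placeholder values are not real paths or model names, and writing `data_path` with the same `--` prefix as the other options is an assumption.

```python
import shlex


def build_pretrain_command(data_path, exp_name, exp_dir, model, init_weight,
                           bs=1024, nproc_per_node=8, master_port=1234):
    """Assemble the single-node torchrun call as an argv list."""
    return [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        "--nnodes=1",
        "--node_rank=0",
        "--master_addr=localhost",
        f"--master_port={master_port}",
        "main.py",
        # Assumption: data_path takes the same "--" prefix as the other options.
        f"--data_path={data_path}",
        f"--exp_name={exp_name}",
        f"--exp_dir={exp_dir}",
        f"--model={model}",
        f"--bs={bs}",
        f"--init_weight={init_weight}",
    ]


# Placeholder values only; substitute real paths and a model name the code accepts.
cmd = build_pretrain_command("DATA_PATH", "EXP_NAME", "EXP_DIR", "MODEL", "INIT_WEIGHT")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains `main.py`.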

Finetune on Object Detection

Train:

bash tools/dist_train.sh ${ConfigPath} 8

Test:

bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --format-only --eval-options submission_dir=${SubmissionDir}

Finetune on Semantic Segmentation

Train:

bash tools/dist_train.sh ${ConfigPath} 8

Test:

bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval 'mFscore' 'mIoU'

Finetune on Change Detection

Train:

bash tools/dist_train.sh ${ConfigPath} 8

Test:

bash tools/dist_test.sh ${ConfigPath} ${CheckpointPath} 8 --eval mFscore mIoU
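
The train and test commands for the three downstream tasks differ only in their trailing evaluation arguments, so they can be generated by one helper. This is a sketch that reproduces the commands above. The task labels (`"detection"`, `"segmentation"`, `"change_detection"`) and function names are hypothetical, and each command is assumed to be run from the corresponding task directory.

```python
def train_command(config_path, gpus=8):
    """Distributed training call shared by all three downstream tasks."""
    return ["bash", "tools/dist_train.sh", config_path, str(gpus)]


def eval_command(task, config_path, checkpoint_path, gpus=8, submission_dir=None):
    """Distributed test call; the trailing arguments depend on the task."""
    base = ["bash", "tools/dist_test.sh", config_path, checkpoint_path, str(gpus)]
    if task == "detection":
        if submission_dir is None:
            raise ValueError("detection testing needs a submission_dir")
        # Detection writes result files for submission instead of evaluating locally.
        return base + ["--format-only", "--eval-options",
                       f"submission_dir={submission_dir}"]
    if task in ("segmentation", "change_detection"):
        return base + ["--eval", "mFscore", "mIoU"]
    raise ValueError(f"unknown task: {task!r}")
```

For example, `eval_command("segmentation", "CONFIG_PATH", "CHECKPOINT_PATH")` yields the semantic segmentation test command with eight GPUs.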
