🌟A collection of papers, datasets, benchmarks, code, and pre-trained weights for Remote Sensing Foundation Models (RSFMs).
🔥🔥🔥 Last Updated on 2024.04.03 🔥🔥🔥
- 2024.04.03: Update SAMRS and msGFM.
- 2024.04.01: Update PIS and H2RSVLM.
- 2024.03.27: Update Remote Sensing Task-specific Foundation Models and LuoJiaHOG.
- 2024.03.25: Update DOFA.
Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
- | Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding | CVPRW2021 | Paper | link |
GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | link |
DINO-MM | Self-supervised Vision Transformers for Joint SAR-optical Representation Learning | IGARSS2022 | DINO-MM | link |
SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | SatMAE | link |
RS-BYOL | Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images | JSTARS2022 | RS-BYOL | null |
GeCo | Geographical Supervision Correction for Remote Sensing Representation Learning | TGRS2022 | GeCo | null |
RingMo | RingMo: A remote sensing foundation model with masked image modeling | TGRS2022 | RingMo | Code |
RVSA | Advancing plain vision transformer toward remote sensing foundation model | TGRS2022 | RVSA | link |
RSP | An Empirical Study of Remote Sensing Pretraining | TGRS2022 | RSP | link |
MATTER | Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks | CVPR2022 | MATTER | null |
CSPT | Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain | RS2022 | CSPT | link |
- | Self-supervised Vision Transformers for Land-cover Segmentation and Classification | CVPRW2022 | Paper | link |
BFM | A billion-scale foundation model for remote sensing images | Arxiv2023 | BFM | null |
TOV | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | link |
CMID | CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding | TGRS2023 | CMID | link |
RingMo-Sense | RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling | TGRS2023 | RingMo-Sense | null |
IaI-SimCLR | Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery | CVPRW2023 | IaI-SimCLR | null |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | link |
SatLas | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatLas | link |
GFM | Towards Geospatial Foundation Models via Continual Pretraining | ICCV2023 | GFM | link |
Scale-MAE | Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning | ICCV2023 | Scale-MAE | link |
DINO-MC | DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops | Arxiv2023 | DINO-MC | link |
CROMA | CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders | NeurIPS2023 | CROMA | link |
Cross-Scale MAE | Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing | NeurIPS2023 | Cross-Scale MAE | link |
DeCUR | DeCUR: decoupling common & unique representations for multimodal self-supervision | Arxiv2023 | DeCUR | link |
Presto | Lightweight, Pre-trained Transformers for Remote Sensing Timeseries | Arxiv2023 | Presto | link |
CtxMIM | CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding | Arxiv2023 | CtxMIM | null |
FG-MAE | Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing | Arxiv2023 | FG-MAE | link |
Prithvi | Foundation Models for Generalist Geospatial Artificial Intelligence | Arxiv2023 | Prithvi | link |
RingMo-lite | RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework | Arxiv2023 | RingMo-lite | null |
- | A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion | IGARSS2023 | Paper | null |
EarthPT | EarthPT: a foundation model for Earth Observation | NeurIPS2023 CCAI workshop | EarthPT | link |
USat | USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery | Arxiv2023 | USat | link |
FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon |
AIEarth | Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data | Arxiv2023 | AIEarth | link |
- | Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture | Arxiv2023 | Paper | null |
Clay | Clay Foundation Model | - | null | link |
Hydro | Hydro--A Foundation Model for Water in Satellite Imagery | - | null | link |
U-BARN | Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series | JSTARS2024 | Paper | null |
GeRSP | Generic Knowledge Boosted Pre-training For Remote Sensing Images | Arxiv2024 | GeRSP | GeRSP |
SwiMDiff | SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image | Arxiv2024 | SwiMDiff | null |
OFA-Net | One for All: Toward Unified Foundation Models for Earth Vision | Arxiv2024 | OFA-Net | null |
SMLFR | Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation | TGRS2024 | SMLFR | link |
SpectralGPT | SpectralGPT: Spectral Foundation Model | TPAMI2024 | SpectralGPT | link |
S2MAE | S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data | CVPR2024 | S2MAE | null |
SatMAE++ | Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery | CVPR2024 | SatMAE++ | link |
msGFM | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | CVPR2024 | msGFM | link |
SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon |
MTP | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Arxiv2024 | MTP | link |
DOFA | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Arxiv2024 | DOFA | link |
PIS | Pretrain A Remote Sensing Foundation Model by Promoting Intra-instance Similarity | - | null | link |
Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv2023 | RemoteCLIP | link |
GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | null |
- | Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs | Arxiv2023 | Paper | link |
- | Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models | Arxiv2024 | Paper | link |
SkyEyeGPT | SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Arxiv2024 | Paper | link |
EarthGPT | EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain | Arxiv2024 | Paper | null |
SkyCLIP | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyCLIP | link |
GeoChat | GeoChat: Grounded Large Vision-Language Model for Remote Sensing | CVPR2024 | GeoChat | link |
LHRS-Bot | LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model | Arxiv2024 | Paper | link |
H2RSVLM | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Arxiv2024 | Paper | link |
Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
Seg2Sat | Seg2Sat - Segmentation to aerial view using pretrained diffuser models | Github | null | link |
- | Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps | NeurIPSW2023 | Paper | link |
DiffusionSat | DiffusionSat: A Generative Foundation Model for Satellite Imagery | ICLR2024 | DiffusionSat | link |
CRS-Diff | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Arxiv2024 | Paper | null |
Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | link |
Abbreviation | Title | Publication | Paper | Code & Weights |
---|---|---|---|---|
- | Self-supervised audiovisual representation learning for remote sensing data | JAG2022 | Paper | link |
Abbreviation | Title | Publication | Paper | Code & Weights | Task |
---|---|---|---|---|---|
SS-MAE | SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification | TGRS2023 | Paper | link | Image Classification |
TTP | Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection | Arxiv2023 | Paper | link | Change Detection |
CSMAE | Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing | Arxiv2024 | Paper | link | Image Retrieval |
RSPrompter | RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model | TGRS2024 | Paper | link | Instance Segmentation |
BAN | A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection | TGRS2024 | Paper | link | Change Detection |
- | Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) | Arxiv2024 | Paper | null | Change Detection (Optical & OSM data) |
AnyChange | Segment Any Change | Arxiv2024 | Paper | null | Zero-shot Change Detection |
RS-CapRet | Large Language Models for Captioning and Retrieving Remote Sensing Images | Arxiv2024 | Paper | null | Image Caption & Text-image Retrieval |
- | Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation | Arxiv2024 | Paper | null | Image Segmentation (Noisy labels) |
RSBuilding | RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model | Arxiv2024 | Paper | link | Building Extraction and Change Detection |
SAM-Road | Segment Anything Model for Road Network Graph Extraction | Arxiv2024 | Paper | link | Road Extraction |
Abbreviation | Title | Publication | Paper | Link | Downstream Tasks |
---|---|---|---|---|---|
- | Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | link | Classification |
GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | Paper | link | Classification & Segmentation |
FoMo-Bench | FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models | Arxiv2023 | FoMo-Bench | Coming soon | Classification & Segmentation & Detection for forest monitoring |
PhilEO | PhilEO Bench: Evaluating Geo-Spatial Foundation Models | Arxiv2024 | Paper | link | Segmentation & Regression estimation |
SkySense | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | CVPR2024 | SkySense | Coming soon | Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification |
VLEO-Bench | Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Arxiv2024 | VLEO-bench | link | Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection |
Abbreviation | Title | Publication | Paper | Attribute | Link |
---|---|---|---|---|---|
fMoW | Functional Map of the World | CVPR2018 | fMoW | Vision | link |
SEN12MS | SEN12MS -- A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion | - | SEN12MS | Vision | link |
BEN-MM | BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval | GRSM2021 | BEN-MM | Vision | link |
MillionAID | On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID | JSTARS2021 | MillionAID | Vision | link |
SeCo | Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data | ICCV2021 | SeCo | Vision | link |
fMoW-S2 | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | NeurIPS2022 | fMoW-S2 | Vision | link |
TOV-RS-Balanced | TOV: The original vision model for optical remote sensing image understanding via self-supervised learning | JSTARS2023 | TOV | Vision | link |
SSL4EO-S12 | SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation | GRSM2023 | SSL4EO-S12 | Vision | link |
SSL4EO-L | SSL4EO-L: Datasets and Foundation Models for Landsat Imagery | Arxiv2023 | SSL4EO-L | Vision | link |
SatlasPretrain | SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding | ICCV2023 | SatlasPretrain | Vision (Supervised) | link |
CACo | Change-Aware Sampling and Contrastive Learning for Satellite Images | CVPR2023 | CACo | Vision | Coming soon |
SAMRS | SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model | NeurIPS2023 | SAMRS | Vision | link |
RSVG | RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data | TGRS2023 | RSVG | Vision-Language | link |
RS5M | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | RS5M | Vision-Language | link |
GEO-Bench | GEO-Bench: Toward Foundation Models for Earth Monitoring | Arxiv2023 | GEO-Bench | Vision (Evaluation) | link |
RSICap & RSIEval | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | Vision-Language | Coming soon |
Clay | Clay Foundation Model | - | null | Vision | link |
SATIN | SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models | ICCVW2023 | SATIN | Vision-Language | link |
SkyScript | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | AAAI2024 | SkyScript | Vision-Language | link |
ChatEarthNet | ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing | Arxiv2024 | ChatEarthNet | Vision-Language | Coming soon |
LuoJiaHOG | LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval | Arxiv2024 | LuoJiaHOG | Vision-Language | null |
Title | Publication | Paper | Attribute |
---|---|---|---|
Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works | TGRS2023 | Paper | Vision & Vision-Language |
Vision-Language Models in Remote Sensing: Current Progress and Future Trends | Arxiv2023 | Paper | Vision-Language |
The Potential of Visual ChatGPT For Remote Sensing | Arxiv2023 | Paper | Vision-Language |
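Remote Sensing Large Models: Progress and Prospects | Geomatics and Information Science of Wuhan University 2023 | Paper | Vision & Vision-Language |
Geographic Artificial Intelligence Samples: Models, Quality, and Services | Geomatics and Information Science of Wuhan University 2023 | Paper | - |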
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey | JSTARS2023 | Paper | Vision & Vision-Language |
Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters | Arxiv2023 | Paper | Vision |
An Agenda for Multimodal Foundation Models for Earth Observation | IGARSS2023 | Paper | Vision |
Transfer learning in environmental remote sensing | RSE2024 | Paper | Transfer learning |
A Review of Remote Sensing Foundation Model Development and Future Prospects | National Remote Sensing Bulletin 2023 | Paper | - |
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications | Arxiv2023 | Paper | Vision-Language |
If you find this repository useful, please consider giving a star ⭐ and citation:
@InProceedings{guo2023skysense,
    title     = {SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery},
    author    = {Xin Guo and Jiangwei Lao and Bo Dang and Yingying Zhang and Lei Yu and Lixiang Ru and Liheng Zhong and Ziyuan Huang and Kang Wu and Dingxiang Hu and Huimei He and Jian Wang and Jingdong Chen and Ming Yang and Yongjun Zhang and Yansheng Li},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {},
    year      = {2024},
    pages     = {}
}