The official PyTorch code for the paper "Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information", TGRS 2022.
#### News:
#### 2021.9.26: ---->Under update ...<----
This is GaLR, a cross-modal retrieval method for remote sensing (RS) images. We use the MIDF module to fuse multi-level RS image features and add the DREA mechanism to improve the quality of the local features. In addition, a multivariate rerank algorithm is designed to make full use of the information in the similarity matrix at test time. Our method achieved state-of-the-art performance (as of 2021.10) on the RS cross-modal retrieval task across multiple RS image-text datasets.
The proposed RSCTIR framework based on global and local information. Compared with retrieval models built only on global features, GaLR incorporates optimized local features into the visual encoding to account for the target redundancy of RS images. The multi-level information dynamic fusion (MIDF) module fuses the two types of information, using the global information to supplement the local information and the latter to correct the former. The proposed multivariate rerank algorithm, as a post-processing step, further improves retrieval accuracy without extra training.
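At its core, the fusion stage (detailed in the MIDF description below) can be pictured as a learned gate that mixes the global and local streams. The following is a minimal illustrative sketch, not the paper's MIDF implementation; the class name `DynamicFusion`, the concatenation-plus-sigmoid gate, and the `(batch, dim)` feature shapes are all assumptions:

```python
# Illustrative sketch only: a learned per-dimension gate mixing two feature streams.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_global, f_local):
        # Per-dimension weights in (0, 1) decide how much of each stream to keep.
        g = self.gate(torch.cat([f_global, f_local], dim=-1))
        return g * f_global + (1 - g) * f_local
```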
To alleviate the pressure on the model from redundant target relations and to increase its focus on salient instances, we introduce a denoised representation matrix and an enhanced adjacency matrix to help the GCN produce better local representations. DREA filters out redundant features with high mutual similarity and enhances the features of salient targets, enabling GaLR to obtain stronger visual representations.
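As a rough illustration of the idea (not the paper's DREA implementation; the function name, the cosine-similarity threshold, and the top-quarter salience heuristic below are all assumptions), one could build such an adjacency matrix as follows:

```python
# Illustrative sketch only: a denoised, salience-enhanced adjacency matrix
# for a GCN over local region features. All heuristics here are assumptions.
import torch
import torch.nn.functional as F

def drea_style_adjacency(feats, sim_thresh=0.9, enhance=1.5):
    f = F.normalize(feats, dim=-1)          # feats: (num_regions, dim)
    sim = f @ f.t()                         # pairwise cosine similarity
    adj = sim.clone()
    adj[sim > sim_thresh] = 0.0             # cut edges between near-duplicate (redundant) regions
    adj.fill_diagonal_(1.0)                 # keep self-loops
    salience = sim.mean(dim=1)              # crude proxy for a region's importance
    topk = salience.topk(max(1, feats.size(0) // 4)).indices
    adj[topk] *= enhance                    # strengthen edges of salient regions
    return adj
```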
The proposed multi-level information dynamic fusion (MIDF) module. The method falls into two stages: feature retransformation and dynamic fusion. MIDF first uses SA and GA modules to retransform features, then uses global information to supplement local information and leverages the latter to correct the former. The dynamic fusion module then completes the fusion of the multi-level features.

The proposed multivariate rerank algorithm. To make full use of the similarity matrix, we take k candidates for a reverse search and optimize the similarity results by considering multiple ranking factors. The figure illustrates multivariate rerank with k = 3, using image i as the query; a minimal sketch of this idea follows below.

Comparisons of retrieval performance on the RSICD and RSITMD test sets.
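The reverse-search idea can be sketched as follows. This is not the code in postprocessing/rerank.py; the function name, the linear weighting `alpha`, and the reciprocal-rank rescoring are assumptions made for illustration:

```python
# Illustrative sketch only: rerank an image query's top-k texts by checking
# where the image ranks when each candidate text is used as the reverse query.
import numpy as np

def rerank_image_query(sims, img_idx, k=3, alpha=0.5):
    forward = sims[img_idx]                  # sims: (num_images, num_texts)
    top_k = np.argsort(-forward)[:k]         # top-k candidate texts
    rescored = forward.copy()
    for t in top_k:
        # Reverse search: rank of img_idx among all images for text t.
        reverse_rank = int(np.where(np.argsort(-sims[:, t]) == img_idx)[0][0])
        rescored[t] = alpha * forward[t] + (1 - alpha) / (1 + reverse_rank)
    return np.argsort(-rescored)             # reranked text indices
```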
#### Installation
We recommend the following dependencies:
Python 3
PyTorch > 0.3
Numpy
h5py
nltk
yaml
File structure:
-- checkpoint # savepath of ckpt and logs
-- data # sorted annotations of four datasets
-- rsicd_precomp
-- train_caps.txt # train anns
-- train_filename.txt # corresponding imgs
-- test_caps.txt # test anns
-- test_filename.txt # corresponding imgs
-- images # rsicd images here
-- rsitmd_precomp
...
-- exec # .sh file
-- layers # model definitions
-- logs # tensorboard log files
-- option # different config for different datasets and models
-- util # scripts for data processing
-- vocab # vocabs for different datasets
-- seq2vec # some files about seq2vec
-- bi_skip.npz
-- bi_skip.npz.pkl
-- btable.npy
-- dictionary.txt
-- uni_skip.npz
-- uni_skip.npz.pkl
-- utable.npy
-- postprocessing # multivariate rerank
-- rerank.py
-- file
-- data.py # load data
-- engine.py # training and validation details
-- test.py # test k-fold answers
-- test_single.py # test one model
-- train.py # main file
-- utils.py # some tools
-- vocab.py # generate vocab
Note:
1. In order to facilitate reproduction, we have provided processed annotations.
2. We provide some required files:
(1)[seq2vec (Password:NIST)](https://pan.baidu.com/s/1FOPldSGO6ctETiXMlPGC8g)
(2)[RSICD images (Password:NIST)](https://pan.baidu.com/s/1lH5m047P9m2IvoZMPsoDsQ)
#### Run
We take the RSITMD dataset as an example.
Step 1:
Put the images of different datasets in ./data/{dataset}_precomp/images/
--data
--rsitmd_precomp
-- train_caps.txt # train anns
-- train_filename.txt # corresponding imgs
-- test_caps.txt # test anns
-- test_filename.txt # corresponding imgs
-- images # images here
--img1.jpg
--img2.jpg
...
Step 2:
Modify the corresponding yaml file in ./option.
Taking RSITMD_AMFMN.yaml as opt, the fields you need to change are (see the sketch after this list):
opt['dataset']['data_path'] # change to precomp path
opt['dataset']['image_path'] # change to image path
opt['model']['seq2vec']['dir_st'] # some files about seq2vec
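A quick way to verify these fields is to load the config and print them. This is a minimal sketch; the exact location of the yaml inside ./option is an assumption, so adjust the path to your checkout:

```python
# Minimal sketch: load the config and check the three paths described above.
# 'option/RSITMD_AMFMN.yaml' is an assumed location; adjust as needed.
import yaml

with open('option/RSITMD_AMFMN.yaml') as f:
    opt = yaml.safe_load(f)

print(opt['dataset']['data_path'])        # should point to ./data/rsitmd_precomp/
print(opt['dataset']['image_path'])       # should point to ./data/rsitmd_precomp/images/
print(opt['model']['seq2vec']['dir_st'])  # should point to the seq2vec directory
```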
Step 3:
Run the corresponding .sh file in ./exec.
Note the GPU configuration defined in the specific .sh file.
cd exec/RSICD
bash run_GaLR_rsicd.sh
Note: We use k-fold validation for a fair comparison. For other details, please see the code itself.
#### Citation
If you find this code helpful or use this code or dataset, please cite:
Z. Yuan et al., "Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information," in IEEE Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2022.3163706.