Author: Zhiqiang Yuan @ AIR CAS, Send an Email
A simple project for text-to-image remote sensing (RS) image generation. We will later release the code for using multiple text prompts to control regions in super-large RS image generation.
You are also welcome to check out our project on image-conditioned fake sample generation, published in TGRS, 2023.
This project follows the original training repo; thanks to its authors.
We used the RS image-text dataset RSITMD as training data and fine-tuned Stable Diffusion for 10 epochs on 1 x A100 GPU. With a batch size of 4, GPU memory consumption is about 40+ GB during training and about 20+ GB during sampling. The pretrained weights are released as last-pruned.ckpt.
Download the pretrained weights last-pruned.ckpt to the current directory, then run:
python scripts/txt2img.py \
--prompt 'Some boats drived in the sea' \
--outdir 'outputs/RS' \
--H 512 --W 512 \
--n_samples 4 \
--config 'configs/stable-diffusion/RSITMD.yaml' \
--ckpt './last-pruned.ckpt'
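To sample several captions in one go, the command above can be assembled from Python. The following is only a minimal sketch: the helper name `build_txt2img_command` is hypothetical and not part of this repo, and it uses only the flags shown in the command above. It prints the commands instead of running them; actually executing them assumes the repo environment and the checkpoint are in place.

```python
import shlex


def build_txt2img_command(prompt, outdir="outputs/RS", n_samples=4,
                          config="configs/stable-diffusion/RSITMD.yaml",
                          ckpt="./last-pruned.ckpt", height=512, width=512):
    """Hypothetical helper: build the argument list for scripts/txt2img.py.

    Only the flags that appear in the sampling command above are used.
    """
    return [
        "python", "scripts/txt2img.py",
        "--prompt", prompt,
        "--outdir", outdir,
        "--H", str(height), "--W", str(width),
        "--n_samples", str(n_samples),
        "--config", config,
        "--ckpt", ckpt,
    ]


# Example prompts taken from the captions in this README.
prompts = [
    "Some boats drived in the sea",
    "A lot of cars parked in the airport",
]

commands = [build_txt2img_command(p) for p in prompts]
for cmd in commands:
    # Print a shell-safe version of each command.
    # To actually generate images, call subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell-quoting problems when a caption contains quotes or commas.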
To fine-tune, put the RSITMD images in data/RSITMD/images, then run:
python main.py \
-t \
--base configs/lammbda/RSITMD.yaml \
--gpus 1 \
--scale_lr False \
--num_nodes 1 \
--check_val_every_n_epoch 10 \
--finetune_from './last-pruned.ckpt'
Caption: Some boats drived in the sea.
Caption: A lot of cars parked in the airport.
Caption: A large number of vehicles are parked in the parking lot, next to the bare desert.
Caption: There is a church in a dark green forest with two table tennis courts next to it.