
No GDAL lib (tifffile replaces) #10

Draft · wants to merge 3 commits into base: main
Conversation

@agrouaze (Member) commented Jul 5, 2023

Rationale: https://kipling.medium.com/nogdal-e5b60b114a1c
Background that helps move further towards a "no GDAL" reader: cgohlke/tifffile#200 (comment)

  • make tifffile work to lazily load digital numbers without resampling (as a first step); see the sketch after this list
  • make tifffile work to lazily load digital numbers with resampling (using F. Nouguier's code)
  • validate against the xsar Sphinx documentation
  • compare GDAL output versus tifffile output
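A minimal sketch of the lazy-loading pattern targeted by the first item above (not the actual safe_s1 code; the measurement path, chunk sizes and dimension names are placeholders), assuming tifffile's zarr interface and a zarr 2.x-style store:

import dask.array as da
import tifffile
import xarray as xr
import zarr

# Placeholder path to a Sentinel-1 measurement GeoTIFF inside a .SAFE product
tiff_path = 'measurement/s1a-iw-grd-vv-example.tiff'

# tifffile reads only the TIFF metadata here; pixel data stay on disk
store = tifffile.imread(tiff_path, aszarr=True)
z = zarr.open(store, mode='r')

# Wrap the zarr view in a lazy dask array and give it named dimensions
dn = da.from_zarr(z).rechunk((1000, 3000))
digital_number = xr.DataArray(dn, dims=('line', 'sample'), name='digital_number')
print(digital_number)  # nothing is decoded until .compute()/.load()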

@agrouaze agrouaze marked this pull request as draft July 5, 2023 09:44
@agrouaze agrouaze self-assigned this Jul 5, 2023
@agrouaze agrouaze added the enhancement New feature or request label Jul 5, 2023
@agrouaze agrouaze requested a review from lanougue July 5, 2023 12:49
@agrouaze (Member, Author) commented Jul 5, 2023

@lanougue Hello,
I have a working version of the reader where GDAL/rasterio/rioxarray are replaced by tifffile (the library proposed by ODL).
The resampling part of the code is directly inspired by xsarslc.interface.
Major concern: the time needed to resample a GRD product:

  • with the new code (using tifffile and compute_low_res_tiles):
/home/antoine/Documents/data/sentinel1/S1A_IW_GRDH_1SDV_20170907T103020_20170907T103045_018268_01EB76_992F.SAFE
(200.0, <xarray.Dataset>
Dimensions:         (pol: 2, azimuth: 839, range: 1259)
Coordinates:
 * pol             (pol) object 'VV' 'VH'
Dimensions without coordinates: azimuth, range
Data variables:
   digital_number  (pol, azimuth, range) float64 202.7 197.3 ... 71.23 75.42)
elapsed time:66.09 sec
RAM 3382.39 Mo
  • with GDAL, using rioxarray for the resampling:
/home/antoine/Documents/data/sentinel1/S1A_IW_GRDH_1SDV_20170907T103020_20170907T103045_018268_01EB76_992F.SAFE
(200.0, <xarray.Dataset>
Dimensions:         (pol: 2, line: 838, sample: 1259)
Coordinates:
 * pol             (pol) object 'VV' 'VH'
 * line            (line) float64 9.5 29.5 49.5 ... 1.673e+04 1.675e+04
 * sample          (sample) float64 9.5 29.5 49.5 ... 2.515e+04 2.517e+04
Data variables:
   digital_number  (pol, line, sample) uint16 dask.array<chunksize=(1, 838, 1259), meta=np.ndarray>)
elapsed time:1.44 sec
RAM 195.24 Mo

Do you have any tips to improve the performance?
Minor concern: I don't know to what extent the assumption of constant pixel spacing in range holds for SLC products (even though resampling SLC data is a bit unusual). I would suggest simply removing the resampling feature for SLC products in this reader library.

code to reproduce the test:

from safe_s1 import Sentinel1Reader, sentinel1_xml_mappings
import time


def getCurrentMemoryUsage():
    '''Resident memory (VmRSS) of the current process in kB; Linux only.'''
    with open('/proc/self/status') as f:
        memusage = f.read().split('VmRSS:')[1].split('\n')[0][:-3]
    return int(memusage.strip())


t0 = time.time()

# SLC case (sub-swath IW1), kept for reference but overridden by the GRD case below
ff = '/home/antoine/Documents/data/sentinel1/S1A_IW_SLC__1SDV_20220507T162437_20220507T162504_043107_0525DE_B14E.SAFE'
strg = 'SENTINEL1_DS:' + ff + ':IW1'

# GRD case used for the benchmark above
ff = '/home/antoine/Documents/data/sentinel1/S1A_IW_GRDH_1SDV_20170907T103020_20170907T103045_018268_01EB76_992F.SAFE'
strg = ff
print(strg)

reader = Sentinel1Reader(strg)
dt = reader.datatree
# print('image', dt['image'].ds)
chunks = {'line': 1000, 'sample': 3000}

# test without resampling
# dn = reader.load_digital_number(chunks=chunks)

# test with resampling to 200 m
dn = reader.load_digital_number(chunks=chunks, resolution='200m')
print(dn)

print('elapsed time:%1.2f sec' % (time.time() - t0))
mem = getCurrentMemoryUsage()
print('RAM %1.2f Mo' % (mem / 1000.))  # kB -> Mo (MB)
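
One possible direction for the performance question above (a sketch only, assuming xarray and dask, not the PR's compute_low_res_tiles implementation): do the block averaging with xarray's coarsen on the lazily loaded digital numbers, so the reduction runs chunk by chunk and the full-resolution array is never materialised. The array size and the 20x20 factor (roughly 10 m native spacing down to 200 m) are illustrative:

import dask.array as da
import xarray as xr

# Stand-in for the lazily loaded digital numbers (in practice they would come
# from the tifffile/zarr sketch in the PR description); the shape roughly
# matches the IW GRD scene above.
dn = xr.DataArray(
    da.random.random((16760, 25180), chunks=(1000, 3000)).astype('float32'),
    dims=('line', 'sample'),
    name='digital_number',
)

# 200 m output from ~10 m native spacing -> average over 20 x 20 pixel blocks;
# boundary='trim' drops the incomplete blocks at the edges.
dn_200m = dn.coarsen(line=20, sample=20, boundary='trim').mean()
print(dn_200m)               # still lazy
result = dn_200m.compute()   # triggers the actual reduction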

@lanougue commented Jul 5, 2023

Using the xsarslc.interface approach to resample SLC data is a very poor way to achieve what you want. The basic assumption in xsarslc.interface is that the spacing is constant per tile, so it is not suited to full-burst processing. An efficient SLC -> GRD-like algorithm is part of what I still have to do (I have already done it for SWOT data).

@lanougue commented Jul 5, 2023

I suggest removing the resampling capability from the SLC reader for now. Resampling can easily be done after the SLC -> GRD-like processing.

@agrouaze (Member, Author) commented Jul 5, 2023

Any suggestions regarding an efficient way to do the resampling for GRD?
