requirements bump, update and improve readme #52

Merged
merged 14 commits into from
Aug 21, 2020

Conversation

scottyhq
Collaborator

@scottyhq scottyhq commented Aug 18, 2020

first pass at tackling #49

Summary of changes:

  • updated readme and contributing.rst documentation
  • added libraries to ci/dev.yml environments, including satstac 0.4 and satsearch 0.3.0rc1 for STAC 1.0
  • now have 3 example notebooks with 1.0.0beta STAC catalogs:
    1. planet disaster data
    2. landsat8 on AWS
    3. sentinel2 (using sat-search STAC-API output)
  • refactoring of catalog.py for compatibility with satstac 0.4 and STAC 1.0
  • refactoring of catalog.py so that the stack_bands() function works with the new item_assets and eo extension layout

@jhamman jhamman mentioned this pull request Aug 18, 2020
@scottyhq
Collaborator Author

scottyhq commented Aug 18, 2020

Just hit an error running the example notebook

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/fsspec/registry.py in get_filesystem_class(protocol)
    185         try:
--> 186             register_implementation(protocol, _import_class(bit["class"]))
    187         except ImportError as e:

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/fsspec/registry.py in _import_class(cls, minv)
    200 
--> 201     mod = importlib.import_module(mod)
    202     if minversion:

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/__init__.py in import_module(name, package)
    126             level += 1
--> 127     return _bootstrap._gcd_import(name[level:], package, level)
    128 

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap.py in _find_and_load(name, import_)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap.py in _load_unlocked(spec)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap_external.py in exec_module(self, module)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/fsspec/implementations/http.py in <module>
      2 
----> 3 import aiohttp
      4 import asyncio

ModuleNotFoundError: No module named 'aiohttp'
ImportError: HTTPFileSystem requires "requests" to be installed 

@jhamman - it seems aiohttp is required (conda install aiohttp resolves this) but is not installed automatically with more recent fsspec. Should we add it to requirements.txt?
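
A quick way to check which optional packages are missing from an environment before opening an HTTP catalog is sketched below. The helper name `missing_modules` is hypothetical and not part of fsspec or intake-stac; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# aiohttp is the module named in the traceback; json is a stdlib control.
print(missing_modules(["json", "aiohttp"]))
```

An empty list means every listed module is available; anything printed here is a candidate for the requirements file.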

@scottyhq
Collaborator Author

scottyhq commented Aug 19, 2020

Reading a top-level catalog only returns a limited amount of metadata, for example,

url = 'https://raw.githubusercontent.com/cholmes/sample-stac/master/stac/catalog.json'
cat = intake.open_stac_catalog(url)
cat.metadata.keys()

dict_keys(['description', 'stac_version'])

Should we keep the entire json here? cc @jhamman @matthewhanson

Furthermore, list(cat) is empty, but we should be able to walk down the URL chain to eventually get to https://raw.githubusercontent.com/cholmes/sample-stac/master/stac/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-RGB.json
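
As a rough sketch of what "walking down the chain" means, the function below follows `child` links until it reaches `item` links. It works on an in-memory dictionary rather than the network: the `fetch` callable, the `FAKE_CATALOGS` table, and the intermediate `catalog.json` paths are all assumptions made for illustration, and relative-link resolution is skipped.

```python
def walk_items(href, fetch):
    """Yield the href of every item reachable from the catalog at `href`.

    `fetch` maps an href to a parsed STAC JSON dict.
    """
    doc = fetch(href)
    for link in doc.get("links", []):
        if link.get("rel") == "item":
            yield link["href"]
        elif link.get("rel") == "child":
            # Descend one level and keep looking for items.
            yield from walk_items(link["href"], fetch)


# Hypothetical stand-in for the remote catalog tree.
FAKE_CATALOGS = {
    "catalog.json": {
        "links": [{"rel": "child", "href": "hurricane-harvey/catalog.json"}],
    },
    "hurricane-harvey/catalog.json": {
        "links": [{"rel": "child", "href": "hurricane-harvey/0831/catalog.json"}],
    },
    "hurricane-harvey/0831/catalog.json": {
        "links": [
            {"rel": "item",
             "href": "hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-RGB.json"},
        ],
    },
}

items = list(walk_items("catalog.json", FAKE_CATALOGS.__getitem__))
print(items)
```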

@scottyhq
Collaborator Author

@jhamman - I could use some guidance here. After some initial debugging I'm still coming up short. There is an issue with populating subcatalogs:

url = 'https://raw.githubusercontent.com/cholmes/sample-stac/master/stac/catalog.json'
intake_cat = intake.open_stac_catalog(url)
print(intake_cat.name)
list(intake_cat)
#planet-disaster-data
#[]

But using satstac directly seems to load everything:

# underlying satstac 0.4 Catalog
cat = intake_cat._stac_obj

# get first (and only in this case) sub-catalog
subcat = [c for c in cat.children()][0]

# print some IDs
print("Root Catalog: ", cat.id)
print("Sub Catalog: ", subcat.id)
print("Sub Catalog parent: ", subcat.parent().id)

# iterate through child catalogs of the sub-catalog
print("Sub Catalog children:")
for child in subcat.children():
    print('    ', child.id)

Output

Root Catalog:  planet-disaster-data
Sub Catalog:  hurricane-harvey
Sub Catalog parent:  planet-disaster-data
Sub Catalog children:
     hurricane-harvey-0831

I'm pushing two notebooks, which I've separated into 'planet' and 'landsat' test examples

@scottyhq
Collaborator Author

I also get the following traceback when trying to load the landsat-8-l1 example from sat-stac with cat['landsat-8-l1']:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-27a02dfc9d90> in <module>
----> 1 subcat = cat['landsat-8-l1']

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/base.py in __getitem__(self, key)
    385             e._catalog = self
    386             e._pmode = self.pmode
--> 387             return e()
    388         if isinstance(key, str) and '.' in key:
    389             key = key.split('.')

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/entry.py in __call__(self, persist, **kwargs)
     75             raise ValueError('Persist value (%s) not understood' % persist)
     76         persist = persist or self._pmode
---> 77         s = self.get(**kwargs)
     78         if persist != 'never' and s.has_been_persisted:
     79             from ..container.persist import store

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/local.py in get(self, **user_parameters)
    279         """Instantiate the DataSource for the given parameters"""
    280         plugin, open_args = self._create_open_args(user_parameters)
--> 281         data_source = plugin(**open_args)
    282         data_source.catalog_object = self._catalog
    283         data_source.name = self.name

~/GitHub/intake-stac/intake_stac/catalog.py in __init__(self, stac_obj, **kwargs)
     37             raise ValueError('Expected %s instance, got: %s' % (self._stac_cls, type(stac_obj)))
     38 
---> 39         metadata = self._get_metadata(**kwargs.pop('metadata', {}))
     40 
     41         try:

~/GitHub/intake-stac/intake_stac/catalog.py in _get_metadata(self, **kwargs)
    191 
    192     def _get_metadata(self, **kwargs):
--> 193         metadata = self._stac_obj.properties.copy()
    194         for attr in [
    195             'title',

AttributeError: 'Collection' object has no attribute 'properties'
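
One defensive way to avoid this kind of failure is to stop assuming every STAC object has `properties`. The sketch below is not the change made in this PR; `FakeItem`, `FakeCollection`, and `get_metadata` are hypothetical stand-ins used only to show the `getattr` fallback.

```python
class FakeItem:
    """Stand-in for a STAC Item, which carries a `properties` dict."""

    def __init__(self):
        self.id = "item-1"
        self.properties = {"datetime": "2020-08-18T00:00:00Z"}


class FakeCollection:
    """Stand-in for a STAC Collection, which has no `properties` attribute."""

    def __init__(self):
        self.id = "landsat-8-l1"
        self.title = "Landsat 8 L1"


def get_metadata(stac_obj, attrs=("id", "title")):
    """Build a metadata dict without assuming `properties` exists."""
    # Fall back to an empty dict when the attribute is missing or None.
    metadata = dict(getattr(stac_obj, "properties", None) or {})
    for attr in attrs:
        value = getattr(stac_obj, attr, None)
        if value is not None:
            metadata[attr] = value
    return metadata


print(get_metadata(FakeCollection()))
print(get_metadata(FakeItem()))
```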

@scottyhq
Collaborator Author

I think there is an issue here in that not all catalogs necessarily have 'collections', so instead this could be 'catalogs()'...

def _load(self):
    """
    Load the STAC Catalog.
    """
    for collection in self._stac_obj.collections():

@jhamman
Collaborator

jhamman commented Aug 19, 2020

@scottyhq - I was running into a similar set of challenges yesterday when working on #43. Bottom line is that I think we need to reevaluate our mapping from the stac hierarchy to the intake hierarchy.

@matthewhanson had mentioned some changes in this space were coming, perhaps now is the right time to have that conversation.

@scottyhq
Collaborator Author

Yes, it seems that ItemCollection might also no longer be in core STAC (radiantearth/stac-api-spec#25):

class StacItemCollection(AbstractStacCatalog):
    """
    Intake Catalog representing a STAC ItemCollection
    """

@digitaltopo

digitaltopo commented Aug 20, 2020

Can we also add panel to the environment dependencies here? Working on #34 and running intake.gui complains that panel isn't installed. Or should this go directly in intake?

@scottyhq
Collaborator Author

scottyhq commented Aug 20, 2020

> Can we also add panel to the environment dependencies here? Working on #34 and running intake.gui complains that panel isn't installed. Or should this go directly in intake?

@digitaltopo Probably best to keep holoviz libraries as an optional install, so prompting the user to install them in an error message when invoking catalog.gui is OK. But we can definitely add panel here: https://github.com/intake/intake-stac/blob/master/ci/environment-dev-3.7.yml

@scottyhq
Collaborator Author

scottyhq commented Aug 20, 2020

Made a bunch of progress this afternoon with a prototype for 1.0 compatibility, with some small changes to catalog.py. Note the two notebooks in this PR branch with functional examples using 1.0.0beta1 catalogs for Landsat and Planet. Resolves #57

Currently running into an error related to new mimetypes (related to #56):

in landsat8-l1.ipynb

# Load single band into xarray data array
da = item.B4.to_dask()

giving:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-45d3b623eccf> in <module>
      1 # Load single band into xarray data array
----> 2 da = item.B4.to_dask()

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/base.py in __getattr__(self, item)
    337             # Fall back to __getitem__.
    338             try:
--> 339                 return self[item]  # triggers reload_on_change
    340             except KeyError:
    341                 raise AttributeError(item)

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/base.py in __getitem__(self, key)
    385             e._catalog = self
    386             e._pmode = self.pmode
--> 387             return e()
    388         if isinstance(key, str) and '.' in key:
    389             key = key.split('.')

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/entry.py in __call__(self, persist, **kwargs)
     75             raise ValueError('Persist value (%s) not understood' % persist)
     76         persist = persist or self._pmode
---> 77         s = self.get(**kwargs)
     78         if persist != 'never' and s.has_been_persisted:
     79             from ..container.persist import store

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/local.py in get(self, **user_parameters)
    278     def get(self, **user_parameters):
    279         """Instantiate the DataSource for the given parameters"""
--> 280         plugin, open_args = self._create_open_args(user_parameters)
    281         data_source = plugin(**open_args)
    282         data_source.catalog_object = self._catalog

~/miniconda3/envs/intake-stac-dev/lib/python3.7/site-packages/intake/catalog/local.py in _create_open_args(self, user_parameters)
    259                              'at https://intake.readthedocs.io/en/latest/plugin'
    260                              '-directory.html .'
--> 261                              % self._driver)
    262         elif isinstance(self._plugin, list):
    263             plugin = self._plugin[0]

ValueError: No plugins loaded for this entry: image/tiff; application=geotiff
A listing of installable plugins can be found at https://intake.readthedocs.io/en/latest/plugin-directory.html .
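
The error shows the full media type string, parameters included, being used as the driver name. A minimal sketch of one way to handle that is to strip the parameters before looking up a driver. The `DRIVERS` table and the `driver_for` helper are hypothetical, and the driver names in the table are assumptions for illustration, not the mapping intake-stac ships.

```python
# Hypothetical lookup table; the driver names are assumptions.
DRIVERS = {
    "image/tiff": "rasterio",
    "image/vnd.stac.geotiff": "rasterio",
}


def driver_for(media_type, default=None):
    """Look up a driver by media type, ignoring parameters such as `application=geotiff`."""
    base = media_type.split(";", 1)[0].strip().lower()
    return DRIVERS.get(base, default)


print(driver_for("image/tiff; application=geotiff"))
```

Returning a default instead of raising lets the caller decide how to report unsupported asset types.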

@scottyhq scottyhq requested a review from jhamman August 20, 2020 01:53
)

# Ultimately we get to a Collection of Catalog of Items
if lastCat:
Collaborator


can you help me understand the lastCat bit here? Will iterating over items high in the catalog hierarchy return items multiple levels down?

Collaborator Author


that's correct. sat-utils.Catalog.items(), .catalogs(), and .collections() work recursively, but children() just gets the first level

# NOTE: quick fix, logic/syntax likely could be improved!
lastCat = True

for subcatalog in self._stac_obj.children():
Collaborator


I think a for-else clause would make sense here:

for subcatalog in self._stac_obj.children():
    self._entries[subcatalog.id] = LocalCatalogEntry(...)
else:
    # no children
    for item in self._stac_obj.items():
        self._entries[item.id] = LocalCatalogEntry(...)
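
One thing worth checking before adopting this layout: Python runs a loop's `else` branch whenever the loop finishes without `break`, not only when the iterable is empty. The small sketch below uses a hypothetical `load` function and plain lists in place of STAC objects to show the effect.

```python
def load(children, items):
    """Mimic the for-else layout: collect children, then items in the else branch."""
    entries = []
    for child in children:
        entries.append(child)
    else:
        # Runs whenever the loop ends without `break`, even if it iterated.
        for item in items:
            entries.append(item)
    return entries


print(load(["subcat"], ["item"]))  # ['subcat', 'item']
print(load([], ["item"]))          # ['item']
```

So with this layout items are registered even when sub-catalogs exist, which may or may not be the intended behavior.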

Collaborator Author


another option is to check for items:

items = [{'rel': 'item'}.items() <= x.items() for x in self._stac_obj._data['links']]
if any(items):
    for item in self._stac_obj.items():
        self._entries[item.id] = LocalCatalogEntry(...)

@matthewhanson am I missing an easier way to do this?
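
For comparison, here is the dict-view subset test next to a plain key lookup on a small list of links. The `links` data is made up for illustration; both expressions answer the same question, namely whether any link has `rel` equal to `item`.

```python
links = [
    {"rel": "self", "href": "catalog.json"},
    {"rel": "item", "href": "scene.json"},
]

# Subset test on dict item views, as in the snippet above.
via_views = any({"rel": "item"}.items() <= link.items() for link in links)

# Plain key lookup; gives the same answer for this data.
via_get = any(link.get("rel") == "item" for link in links)

print(via_views, via_get)
```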

Collaborator Author


I've gone with:

        subcatalog = None
        # load first sublevel catalog(s)
        for subcatalog in self._stac_obj.children():
            self._entries[subcatalog.id] = LocalCatalogEntry(...)
        if subcatalog is None:
            # load items under last catalog
            for item in self._stac_obj.items():
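
A self-contained version of this sentinel pattern is sketched below. `Node` and `load_entries` are hypothetical stand-ins, not satstac or intake-stac classes, and the stand-in `items()` only returns first-level items, whereas the real method is recursive as noted earlier in this review.

```python
class Node:
    """Hypothetical stand-in for a catalog with first-level children and items."""

    def __init__(self, id, children=(), items=()):
        self.id = id
        self._children = list(children)
        self._items = list(items)

    def children(self):
        return iter(self._children)

    def items(self):
        return iter(self._items)


def load_entries(stac_obj):
    """Register sub-catalogs if there are any, otherwise register items."""
    entries = {}
    subcatalog = None
    for subcatalog in stac_obj.children():
        entries[subcatalog.id] = "catalog"
    if subcatalog is None:
        # The loop never ran, so this is the last catalog level.
        for item in stac_obj.items():
            entries[item.id] = "item"
    return entries


leaf = Node("hurricane-harvey-0831", items=[Node("scene-1")])
root = Node("hurricane-harvey", children=[leaf])
print(load_entries(root))
print(load_entries(leaf))
```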

if not band_info:
    raise AttributeError(
        'Unable to parse "eo:bands" information from STAC Collection or Item Assets'
    )
Collaborator


thank you for doing this!

Collaborator

@jhamman jhamman left a comment


Thanks @scottyhq - this looks great. Feel free to merge when you are ready. We can ignore the doc build failure for now. I'll fix that once things stabilize on the version update.

@scottyhq scottyhq merged commit f4e636d into intake:master Aug 21, 2020
This was referenced Aug 24, 2020
@scottyhq scottyhq deleted the sat-stac-bump branch November 24, 2020 21:59