Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul registry interface #289

Merged
merged 49 commits into from
Jun 3, 2022
Merged

Conversation

msdemlei
Copy link
Contributor

@msdemlei msdemlei commented Jan 13, 2022

This PR refactors the Registry interface to enable advanced, data discovery-like discovery queries interesting in particular in interactive (notebook) usage; the RegTAP queries are now constructed under the hood by Constraint objects. The legacy constraint specification using keyword arguments is translated to the new scheme transparently (or so I hope).

This also adds constraints for space, time, and spectrum using features from the draft for RegTAP 1.2.

A bit more on the rationale for this is discussed on https://blog.g-vo.org/towards-data-discovery-in-pyvo.html, where you will also find a jupyter notebook showcasing some of the new features.

This is intended to address bugs #278, #238, and #176.

@codecov
Copy link

codecov bot commented Jan 13, 2022

Codecov Report

Merging #289 (47026f1) into main (0702f29) will increase coverage by 1.81%.
The diff coverage is 86.05%.

@@            Coverage Diff             @@
##             main     #289      +/-   ##
==========================================
+ Coverage   76.18%   78.00%   +1.81%     
==========================================
  Files          44       45       +1     
  Lines        5119     5423     +304     
==========================================
+ Hits         3900     4230     +330     
+ Misses       1219     1193      -26     
Impacted Files Coverage Δ
pyvo/registry/regtap.py 78.24% <70.18%> (+26.90%) ⬆️
pyvo/registry/rtcons.py 97.68% <97.68%> (ø)
pyvo/registry/__init__.py 100.00% <100.00%> (ø)
pyvo/dal/exceptions.py 78.37% <0.00%> (+1.35%) ⬆️
pyvo/utils/formatting.py 100.00% <0.00%> (+47.36%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@bsipocz bsipocz added this to the v1.3 milestone Jan 14, 2022
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only comment on the pieces that would fix the currently present CI failures, and leave the actual content review for those who know more about the VO internals.

CHANGES.rst Outdated Show resolved Hide resolved
pyvo/registry/rtcons.py Show resolved Hide resolved
@@ -314,25 +486,120 @@ def standard_id(self):
"""
return self.get("standard_id", decode=True)
Copy link
Contributor

@zoghbi-a zoghbi-a Jan 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the method for standard_id be similar to access_url, where it returns the first element of standard_ids if it has one element, otherwise it fails?

An example code like:

services = pyvo.regsearch(servicetype='tap', keywords=['heasarc'])
services[0].describe()

fails in describe because self.standard_id is None.

If def standard_id is modified to something like access_url:

    @property
    def standard_id(self):
        """
        the IVOA standard identifier
        """
        #return self.get("standard_id", decode=True)
        standard_ids = list(set(self["standard_ids"]))
        if len(standard_ids)==1:
            return standard_ids[0]
        else:
            raise dalq.DALQueryError(
                "No unique standard_id.  Use get_service.")

and the call to standard_id in describe is changed to:

# stdid = self.get("standard_id", decode=True).lower()
stdid = self.standard_id

Things seem to work.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 17, 2022 via email

@zoghbi-a
Copy link
Contributor

Aw, dang, I had forgotten about .describe(). Actually, that has been broken for multi-capability resources before, too. So, I've given it a bit of a facelift in commit 646d5f00. This also includes an update to standard_id as you suggest, but that wouldn't have been enough, as describe's attempt to read the access URL would now fail for multi-capability resources. I hope describe's output -- human-oriented as it is -- won't count against the API and hence can be changed within reason without breaking things. Except that of course it must still work for humans. Does what there's now still work for you?

That works now. Thanks.

self.stdids.add(SERVICE_TYPE_MAP[std])
elif "://" in std:
self.stdids.add(std)
else:
Copy link
Contributor

@zoghbi-a zoghbi-a Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std is checked only in the keys of SERVICE_TYPE_MAP, not the values.
Previously, it checks both the keys and values.

So now, a search like:

services = vo.regsearch(servicetype='conesearch')

doesn't work, where it previously used to work. servicetype='scs' works. I am not sure if this is the intended behavior. I think it will be useful to allow the search with both sets of words.

The function accepts query constraints either as Constraint objects
passed in as positional arguments or as their associated keywords.
For what constraints are available, see TODO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of constraints is a TODO. Opening a case here so we don't forget it.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 18, 2022 via email

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 18, 2022 via email

@zoghbi-a
Copy link
Contributor

Ah, right, I had forgot that. What I'd want to reference here is the local equivalent of http://docs.g-vo.org/temporary-pyvo/registry/index.html#basic-interface; is there any sphinx buff here that can tell me what to write in the docstring to have a robust link that works regardless of where the rendering ends up in?

Thanks.
I know little about sphinx to answer. Others may be able to help.

[r.short_name for r in self],
[r.res_title for r in self],
[r.res_description for r in self],
[", ".join(sorted(r.access_modes())) for r in self]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the rational may be to make the returned table as simple as possible, and as a user, I may also be interested in the ivoid in the returned table. Is it possible to add it?

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 19, 2022 via email

@zoghbi-a
Copy link
Contributor

So, I think before we rack our heads about how we can attach some provenance-like info on the table, we ought to be clear on what workflow (or whatever) we want to enable by attaching it. Do you perhaps have something like a user story where such an information might be relevant? Or more simply: What made you miss the ivoid?

My understanding is that this function is to meant to make it easy to summarize the content of the registry records, so I was thinking that adding ivoid provides more information in this summary.

My use case here is that some of the datasets may be available from multiple providers (chandra data in mind here), so by including the ivoid, it helps quickly see the source of the data.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 20, 2022 via email

@zoghbi-a
Copy link
Contributor

zoghbi-a commented Jan 21, 2022

What about adding a reference to the originating RegistryRecord instance as a weakref in an origin attribute?

I don't see how a reference helps, because one can access the RegistryRecord directory if needed. I was more looking for an easy way to access or display the content without looping through the individual records.

Also, to backup a little, my suggestions stem from the fact that such feature (ivoid in the table) has been used in the navo-pyvo workshops that are conducted at the AAS.

As an alternative, would it be possible for example if to_table had a parameter (extra or extended) that if set to True, it returns a table with more columns, and when False (default), it returns just what we have now.

"reference_url",
"creator_seq",
"content_type",
"source_format",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't source_value be here too? (or maybe in to_table). I am not sure about their exact meaning, but it is another thing that breaks few cells in the tutorial notebooks.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 24, 2022 via email

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 24, 2022 via email

msdemlei added a commit to msdemlei/pyvo that referenced this pull request Jan 24, 2022
msdemlei added a commit to msdemlei/pyvo that referenced this pull request Jan 24, 2022
This points back to the parent instance.

This is an attempt to address
astropy#289 (comment)
@zoghbi-a
Copy link
Contributor

zoghbi-a commented Jan 25, 2022

Hm -- do you have some sample code showing what you're trying to do? Be that as it may, I've tentatively added an .origin attribute to these VOSI tables in commit b111e8b. This seems a reasonable thing to do anyway, and it's basically free. I'm happy to pull it again if it doesn't help.

An example code may look something like the following:

uv_services = vo.regsearch(servicetype='image',keywords='galex', waveband='uv')
uv_services.to_table()['ivoid','short_name']

The second line fails because ivoid is not included in the table, unlike the previous version.

The solution may be to loop through the records with something like:

for s in uv_services:
     print(f'{s.ivoid} {s.short_name}')

but my understanding is that the to_table method is there to avoid looping through the records.

Copy link
Contributor

@zoghbi-a zoghbi-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are additional small suggestions that I came across testing the code as a user who doesn't know the deep details of the vo.

resource.
* :py:class:`pyvo.registry.Servicetype` (``servicetype``): constrain to
one of tap, ssa, sia, conesearch. This is the constraint you want
to use for service discovery.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or their local parts of the ivoid (ssap, image, scs)


>>> registry.Spatial([23, -40, 26, -39, 25, -43])

To find resources claiming to cover a MOC, pass an ASCII MOC::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a link to the MOC document. Is this the one? https://www.ivoa.net/documents/MOC/

>>> registry.Spatial(SkyCoord("23d +3d"))

SkyCoords also work as circle centers::

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... where the radius is in degrees.

A spectral point or interval to cover. This must be a wavelength,
a frequency, or an energy, or a pair of such quantities,
in which case the argument is interpreted as an interval.
All resources *overlapping* the interval are returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems to suggest that a float or int is assumed to be in joules. Maybe worth adding a sentence on that here.

webbrowser.open(self.access_url, 2)


class Interface:
Copy link
Contributor

@zoghbi-a zoghbi-a Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a __repr__ method to Interface ?
The example in mind is an interactive session with something like:

services[0].interfaces[0].to_service()

prints: <pyvo.dal.ssa.SSAService at 0x...>, so I know this is an SSA service, but this:

services[0].interfaces[0]

prints: <pyvo.registry.regtap.Interface at 0x...>, which is somewhat obscure.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 26, 2022 via email

msdemlei added a commit to msdemlei/pyvo that referenced this pull request Jan 26, 2022
Also, adding a __repr__ method to regtap.Interface that returns a
constructor call literal.

This is in response to
astropy#289 (review)
@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 26, 2022 via email

@zoghbi-a
Copy link
Contributor

I'm tempted to (a) call current to_table something else, perhaps "show" or "overview" and thus un-hide previous to_table. The advantage here would be that RegistryResults.to_table behaves as expectable from DALResults. The disadvantage is that I'd consider the output essentially unreadable in interactive applications. The alternative would be to (b) just paste the ivoid column to the right end of the current table. The advantage would be that to_table's return value is still halfway readable. The disadvantage would be that if someone wants to do some magic on the full result table, that wouldn't work. I'm leaning towards (a). Opinions?

Apologies. I may have not explained the issue properly.
I, too, like solution (a).

@msdemlei
Copy link
Contributor Author

msdemlei commented Jan 27, 2022 via email

msdemlei added a commit to msdemlei/pyvo that referenced this pull request Jan 27, 2022
This is because overwriting DALResults.to_table proved a bad idea
(see astropy#289 (comment))
@bsipocz
Copy link
Member

bsipocz commented Jun 1, 2022

Version bump to 4.1 is done here in this PR.

The issue I opened to discuss the policy (whether there should be one?) about dropping version supports, either on release timeframe or on need basis is this one #336

The messenger_vocabulary fixture should have made them non-remote, but
I've never actually tested that.  It now turns out that astropy.data doesn't
use requests and hence my nice requests mocker just didn't work.

I've now replaced it by code seeding astropy.data's cache.  This is a
pattern we ought to use whenever code uses IVOA vocabularies.  I wonder
where we should document this?
@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 2, 2022 via email

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 2, 2022 via email

@tomdonaldson
Copy link
Contributor

@msdemlei , based on the current test failures, is it possible that the new commonfixtures. was omitted from the commit?

@bsipocz
Copy link
Member

bsipocz commented Jun 2, 2022

(1) the example starting with tables = resources[4].get_tables()
fails, and unless I misunderstand pytest's output, that's because it
throws warnings unrelated to pyVO (I've talked to VizieR about them).
Before I start digging: What's the idiomatic way to shut these
warnings up in doctests like these?

For warning that are totally unrelated and we shouldn't do anything about them, you can use the following at the end of each line producing the warning:
# doctest: +IGNORE_WARNINGS

@bsipocz
Copy link
Member

bsipocz commented Jun 2, 2022

(2) there are two examples for "all-VO" querying. I suggest to
keep them switched off for testing/CI -- they hit a lot of services,
several of which will always be down, so it's unlikely they'll ever
pass. Sure: One could restrict that to a subset of known-reliable
services, but even then all that is a really expensive thing that
doesn't actually exercise registry but the DAL services on the end of
the providers, so that doesn't belong here.

The issue with those that the current timeout is not doctest related, they would timeout on my python terminal if I were a user.
So maybe summarise what you said above in a short sentence in the docs, that the command is expected to fail as it hits so many services, (and then skip the testing).

@bsipocz
Copy link
Member

bsipocz commented Jun 2, 2022

On the other hand, I'd really like to show how this works in the docs
in plain code, so I'd hate to take the examples out, too.

The issue with untested examples is that over time they can, and most certainly will go out of sync with the code, so we should have them sparingly.


This will ignore VOSI (infrastructure) services.
"""
return set(shorten_stdid(intf.standard_id) or "web"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msdemlei - would you consider ordering the output here? #334, I would not block merging the PR on it, but if it's something to consider ideally it should be done before it lands in a release.

msdemlei and others added 2 commits June 3, 2022 09:34
(sorry; also, I'm busy elsewhere, so I'll address the remaining issues on
Monday)
@tomdonaldson tomdonaldson mentioned this pull request Jun 3, 2022
Copy link
Contributor

@tomdonaldson tomdonaldson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready for merging. We have a couple of related cleanup issues to consider: #334 and #337.

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 9, 2022 via email

@msdemlei
Copy link
Contributor Author

msdemlei commented Jun 9, 2022 via email

@bsipocz
Copy link
Member

bsipocz commented Jun 9, 2022

The issue with mocking is that it may not pick up all the real issues.
There are "slow" directives for pytest, for akaik it is not yet supported doctests (maybe they should though).

E.g these examples timeouted for me, suggesting that maybe it's enough to mention or show to users how to work around it.

@bsipocz
Copy link
Member

bsipocz commented Jan 24, 2023

@msdemlei - in your OP you list a couple of issues this intended to fix. If it indeed did so, could you please close the issues, too?

This is intended to address bugs #278, #238, and #176.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants