Slow Simbad.query_objects & IRSA.query_region searches #3025
Comments
I would suggest separating these into two different issues, one for SIMBAD and one for IRSA. If possible, please include code examples too: that would help with debugging and benchmarking, and it lets us spot whether something is being used in an unintended way (so that we can improve the docs to point out what not to do). For IRSA, I can say that we completely switched out the backend. Not much has changed in the method's code, but a lot could have happened on the server side in the past 3 years. Example code would also help us narrow the problem down to a useful suggestion (e.g. new methods have been added since then).
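For the benchmarking mentioned above, per-call timings are often more informative than a single total, since they show whether calls get slower as the loop progresses. The following is only a minimal sketch using the standard library: `time_calls`, `summarize`, and the `query_one` argument are hypothetical names introduced here, and `query_one` stands in for whatever per-object search is being measured.

```python
import time
from statistics import mean


def time_calls(query_one, names):
    """Call query_one(name) for every name and record each call's duration in seconds."""
    durations = []
    for name in names:
        start = time.perf_counter()
        query_one(name)  # hypothetical callable: replace with the real per-object search
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Compare the mean latency of the first and second half of the run."""
    half = len(durations) // 2
    return {
        "calls": len(durations),
        "total_s": sum(durations),
        "mean_first_half_s": mean(durations[:half]) if half else None,
        "mean_second_half_s": mean(durations[half:]) if durations else None,
    }
```

A second-half mean that is clearly larger than the first-half mean would support the impression that later objects are slower. It would not by itself say whether the cause is on the client or the server side.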
On the SIMBAD part
If I assume that you want the list of identifiers, the main identifier, and the positions for your 2MASS objects, then the proper way to do your query for now is with a TAP query (in the next astroquery version, this will be used behind the scenes by query_objects). Let's first generate a sample of 10k 2MASS identifiers:
# let's get 10000 random 2MASS objects
from astroquery.simbad import Simbad
query = """SELECT TOP 10000 id from ident
WHERE id like '2MASS%'
"""
random_2MASS = Simbad.query_tap(query)
print(random_2MASS)
This part will be skipped for you, as you already have your own list. But you should have an astropy table with a single column containing your own sample (if there are more columns, you will lose upload time when we send the table to SIMBAD).
We will now write the TAP query:
query = """SELECT main_id, ra, dec, ids
FROM random_2MASS
JOIN ident ON ident.id = random_2MASS.id
JOIN basic ON basic.oid = ident.oidref
JOIN ids ON basic.oid = ids.oidref
"""
result = Simbad.query_tap(query, random_2MASS=random_2MASS)
It took 5.2 seconds on my machine.
Query explanation
We select
You could choose more columns from The
See this help page for more explanation. Another possible speed-up is to make sure that you use the SIMBAD mirror closest to you (there is one in Europe and one in the USA).
On Xmatch
@fxpineau: you have a happy user 🙂
@ManonMarchand Thank you for the SIMBAD example! I've never used the TAP search function before, since query_objects has always worked for me up until now, so this is super helpful :-)
@bsipocz Here is an example of the IRSA behavior I'm noticing (particularly for the name matching using IRSA.query_region, where it still seems to be using a coordinate match rather than searching by the 2MASS identifier).
These are a few example 2MASS identifiers for which I have noticed the behavior: 2MASS J21065473+3844265, 2MASS J21065341+3844529, 2MASS J11052903+4331357, 2MASS J05420897+1229252, 2MASS J23055131-3551130
If you run the following code:
If instead you expand the radius to 10 arcseconds using the same code above, the appropriate object is found. Perhaps I am making the same mistake here as I was with SIMBAD, as @ManonMarchand pointed out, and I should be using a TAP query instead?
As for the time, I used IRSA.query_region to look for 16,055 objects in a loop, one by one (the 16,055 is not a unique list; some objects are repeated multiple times), which took 13 hours to run. Granted, a few other things happen in the loop (saving the results table to a dictionary and printing a progress report), so that is likely an exaggerated run time, but the querying still takes much longer than in astroquery 0.4.3. The loop looks like this:
I spent some time this afternoon looking into this and it seems like the
Looking through the IRSA VO Table Access Protocol (TAP) Instructions, there is no way to TAP query by name as there is for SIMBAD, which is kind of frustrating. I think that the old
My guess is that the search is now slower than in astroquery 0.4.3 due to the response time of IRSA. Based on my experience with how fast the Simbad.query_tap function was this afternoon (very fast), it is interesting to me how slow the
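Since the list of 16,055 names described earlier contains repeats, one simple way to cut the number of round trips in a per-object loop is to query each distinct name only once and reuse the result. This is a minimal sketch, not a measurement of how much time it would save: `query_unique` and its `query_one` argument are hypothetical names, with `query_one` standing in for whichever per-object search is used.

```python
def query_unique(names, query_one):
    """Query each distinct name once and return results aligned with the input order."""
    # Collapse runs of whitespace so trivially different spellings share one cache entry.
    keys = [" ".join(name.split()) for name in names]
    cache = {}
    for key in keys:
        if key not in cache:
            cache[key] = query_one(key)  # hypothetical callable, invoked once per distinct name
    return [cache[key] for key in keys]
```

The returned list has one entry per input name, so downstream code that expects a result per row of the original list keeps working.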
Sorry that I made it sound like a mistake: query_tap has only been available for Simbad since astroquery 0.4.7.
If we want to dig a bit more into the SIMBAD time issue using
Hi!
I have what might be considered an unusual use case for astroquery: I cross-match ~20,000 objects with different catalogs using the Simbad, IRSA, and Xmatch queries for an instrument archive. I wrote code that completed all of this cross-matching for me several years ago and have been using it to update the archive I manage since then. I recently updated my environment, moving astroquery to the newest 0.4.7 release, but my prior code doesn't work like it used to in astroquery 0.4.3, and I was wondering if something changed.
In particular:
Simbad.query_objects has become unusable for name searching on my list of objects (~17,000). I primarily use this function to reverse search 2MASS names returned from another cross-match method to verify the integrity of the cross-match. Even after increasing the timeout limit to over 24 hours, the function still failed to search all of the objects in that time. In astroquery 0.4.3 I did not have this issue; a similar number of objects was often searched in 1-2 hours. I am mostly confused because if I loop through each object one at a time and search the names using Simbad.query_object, the results are incredibly fast (done in under an hour).
IRSA.query_region using 2MASS names (I query a 5 arcsecond cone) has also become incredibly slow per object and appears to become even slower for objects later in the list, which, again, I didn't struggle with in astroquery 0.4.3.
IRSA.query_region when using names to search for objects in the 2MASS catalog. Some of the names I put into the function return no results when searched with a radius of 5 arcsec, but show up in the results if I widen the search radius to 10 arcsec. I thought that by using the 2MASS names it would just return a result if the name appeared in the 2MASS catalog, but it seems like the search is still coordinate based? Is there a way to search the 2MASS catalog directly using the 2MASS name and not the coordinates?
I do feel like the Xmatch function has sped up significantly since astroquery 0.4.3, which I love! I was just wondering if there were any changes made there that could have affected the Simbad and IRSA search functions.
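One detail that may help when checking the name-versus-coordinate behavior: a 2MASS designation is itself the object's truncated J2000 sexagesimal position, in the form JHHMMSSss±DDMMSSs. The sketch below decodes a designation into decimal degrees so it can be compared with the position a query actually used. The function name `twomass_name_to_degrees` is hypothetical and is not part of astroquery. Because the digits are truncated, the decoded position can differ slightly from the catalog position.

```python
import re

# JHHMMSSss+DDMMSSs: RA to 0.01 s of time, Dec to 0.1 arcsec, both truncated.
_DESIGNATION = re.compile(
    r"^(?:2MASS\s+)?J(\d{2})(\d{2})(\d{2})(\d{2})([+-])(\d{2})(\d{2})(\d{2})(\d)$"
)


def twomass_name_to_degrees(name):
    """Decode a 2MASS designation into (ra_deg, dec_deg) in decimal degrees."""
    match = _DESIGNATION.match(name.strip())
    if match is None:
        raise ValueError(f"not a 2MASS designation: {name!r}")
    hh, mm, ss, ss_frac, sign, dd, dm, ds, ds_frac = match.groups()
    ra_hours = int(hh) + int(mm) / 60 + (int(ss) + int(ss_frac) / 100) / 3600
    dec_abs = int(dd) + int(dm) / 60 + (int(ds) + int(ds_frac) / 10) / 3600
    dec_deg = -dec_abs if sign == "-" else dec_abs
    return ra_hours * 15.0, dec_deg  # 1 hour of right ascension is 15 degrees
```

Decoding one of the names that fails at 5 arcsec and comparing it with the position returned at 10 arcsec would show how far apart the two actually are.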