
First photutils notebook: background estimation #1

Merged: 16 commits into eteq:master, Sep 6, 2018

Conversation

Contributor

@laurenmarietta commented Jun 28, 2018

Added the first photutils tutorial, for background estimation, as well as some supporting files (the photutils logo and the matplotlib style file). Updated the .gitignore to exclude the other notebooks, which are still in development.

Please provide feedback, comments, critique, etc.!

@eteq self-requested a review July 19, 2018 15:13
Owner

@eteq left a comment


@laurenmarietta - A great start on this sequence of notebooks!

Note that I couldn't use inline commenting on the notebook itself because of the first item I mention below. So, to avoid slowing us down any further, I'll just leave my comments as separate paragraphs here. Going forward, though, it's worth addressing that first item so that we can use inline GitHub comments. Or, if it would make back-and-forth commenting easier, you could make that change first and then I can move these comments inline.

Initial: In the notebook as you've checked it in, you've executed all the cells. That means the notebook is 8 MB, primarily because of all the embedded images. This isn't desirable because git isn't well suited to tracking that kind of content. Instead, you should check the unevaluated notebook into git (i.e., do a "clear outputs" before the git add). That's a much better file both for reviewing on GitHub and for storing in git. Then, when the notebooks are actually hosted somewhere, we run them automatically. That's also a good check, because it confirms the notebook actually works as written, rather than only working because you did something in a cell you later deleted (for example). Going forward we'll need to rebase this so the large outputs don't stay in the repo history, but that's not urgent (i.e., we can talk about that later).
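
For reference, one way to do that "clear outputs" step programmatically - a minimal sketch using the nbformat package, with a hypothetical notebook path:

import nbformat

# Hypothetical path to the notebook being checked in
path = "photutils/01_background_estimation.ipynb"

nb = nbformat.read(path, as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        # Drop the stored images/results and the execution counter
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, path)

(jupyter nbconvert --clear-output --inplace should do the same thing from the command line.)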

Intro: Not sure how much we care about this... but it might be good to say just a bit about what the image is - that is, a "this is an ACS image of ". That both makes it clear that these are space telescope data and attracts the interest of the average astronomer, who likely wants to know the relevant science context.

Cell [1]: I don't think the import matplotlib is needed, nor the LogNorm import, since it doesn't look like you use them anywhere else.

Cell [2]: This is good, but FYI I think in the future you'll want that to say something like "a shared style file over all the notebooks". But it's good as-is for now since there's currently just the one.

Words above Cell [3]: this might also be a good place to give the science context (instead of in the "intro" part above). It also might be good to point out that this is a High Level Science Product (HLSP) rather than a raw image from the archive (otherwise we'd suggest using astroquery, since that's more robust to changes in the URLs, but since it's an HLSP it makes sense to do it this way).

Cell In[4]: Is there a specific reason you're using for-loops here? I think this will work in its place (and run a lot faster):

# Mask the blank (zero-valued) pixels and count the remaining data pixels
mask = data == 0
n_data_pixels = len(data[~mask])

# Build a synthetic background gradient and assign it to the unmasked pixels in one step
background = np.linspace(-1e-4, 5e-4, num=n_data_pixels)

modified_data = np.copy(data)
modified_data[~mask] = background[:]

Cell In[6], and others below: The plt.show() at the end isn't necessary - that happens automatically with the inline backend (and in fact the show sometimes causes the plot to show up twice, depending on how your mpl is set up).

Above Cell In[11]: you might want to define what you mean by "scalar" - I know you mean "in contrast to 2D", but the reader who's only made it this far might not quite get that distinction yet. Just a sentence or two would be fine, perhaps referencing the upcoming section.

Cell In[15]: Instead of re-writing the data, I think this does what you want: xdf_scalar_bkgdsub = xdf_image.subtract(mean*u.ct/u.s) - I know it's a bit weird that this isn't the - operation, but that's intentional (I don't recall the motivation for sure, but I think it has to do with different ways of propagating uncertainties).
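
Spelled out with the import it assumes (just a sketch - mean here is taken to be the scalar background estimate computed earlier in the notebook):

import astropy.units as u

# The CCDData/NDData arithmetic methods handle units (and, if present,
# uncertainty propagation), which is why .subtract() is used instead of `-`
xdf_scalar_bkgdsub = xdf_image.subtract(mean * u.ct / u.s)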

Cell In[16]: There's a subtlety that's not obvious here: the reason they look different is that your stretch uses a fixed range instead of "minmax" or the like. It might be worth mentioning something about this - when I first saw that I thought "shouldn't they look the same since the range is the same, just offset by a fixed amount?". Then again, maybe the fact that I saw that and then figured it out on my own is good. So your call if you want to modify the text at all, but thought I'd mention it.
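
To illustrate the distinction, a rough sketch using astropy.visualization (the particular vmin/vmax values and figure layout are just placeholders, not what's in the notebook):

import matplotlib.pyplot as plt
from astropy.visualization import ImageNormalize, MinMaxInterval

fig, (ax1, ax2) = plt.subplots(1, 2)

# With a fixed display range, the subtracted image looks noticeably darker
# than the original shown on the same scale
ax1.imshow(xdf_scalar_bkgdsub.data, vmin=-1e-4, vmax=1e-3, origin='lower')

# With a per-image "minmax" range, the constant offset washes out and the
# two images would look essentially the same
norm = ImageNormalize(xdf_scalar_bkgdsub.data, interval=MinMaxInterval())
ax2.imshow(xdf_scalar_bkgdsub.data, norm=norm, origin='lower')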

Cell In[19]: I love this plot! Don't change a thing about it. 😉

Cell In[20]: Same as In[15].

The rest looks great! One possible suggestion for something to add at the end: do another sigma_clipped_stats on the 2D-subtracted one (or show the histogram). I speculate the new std will be significantly lower, demonstrating the whole point. Or alternatively, this could be a third exercise.
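
Something along these lines (a sketch - xdf_2d_bkgdsub is a hypothetical name for the 2D-background-subtracted image):

from astropy.stats import sigma_clipped_stats

# Compare sigma-clipped statistics before and after the 2D background
# subtraction; the standard deviation should drop if the subtraction helped
_, _, std_before = sigma_clipped_stats(xdf_image.data, sigma=3.0)
_, _, std_after = sigma_clipped_stats(xdf_2d_bkgdsub.data, sigma=3.0)
print(std_before, std_after)
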

.gitignore Outdated
@@ -1,3 +1,7 @@
.pyc
.ipynb_checkpoints/
*.nbconvert.ipynb
photutils/dev/*
photutils/02_photutils_source_detection.ipynb
Owner

the bottom 3 of these should not be in the .gitignore file - I'm guessing you're ignoring them temporarily, but eventually we want them in the repo. Or imagine you get bought out by Google but then someone else later wants to implement this - they won't know they're supposed to delete these lines.

Instead, I suggest you have 3 other branches of this repo and implement the 3 notebooks separately on those branches. That way you can keep track of their version history completely separately, and not worry about ignoring them in one place but not others. (Maybe you're doing that already, in which case you can just drop this from the .gitignore.)

Contributor Author

(copying from Slack conversation)

What I’ve been doing so far is I have one branch, just called photutils, where I have been developing all of the notebooks. I just created this PR branch, photutils-background-estimation, off of that development branch to ultimately merge the first notebook into master.

You suggest having one branch per notebook, but I've found that I pretty frequently need to make changes to multiple or all of the notebooks, not just one at a time. Does it sound like a feasible workflow for me to keep this main development branch, and then branch off and delete the irrelevant notebooks whenever I submit a PR?

Owner

Probably not many people will be looking at this other than us, but just in case... we concluded in an out-of-band discussion that we should change to a separate branch per notebook and make separate commits in the individual branches.

@laurenmarietta
Contributor Author

Intro: Not sure how much we care about this... but it might be good to say just a bit about what the image is - that is, a "this is an ACS image of ". That both makes it clear that these are space telescope data and attracts the interest of the average astronomer, who likely wants to know the relevant science context.

Would you say that the description I provide before downloading the data (before [3]) isn't sufficiently clear?

otherwise we'd suggest using astroquery, since that's more robust to changes in the URLs, but since it's an HLSP it makes sense to do it this way

Just to be clear, we could use astroquery to get the raw data from MAST, but since this file is an HLSP, it isn't available on MAST and we have to go straight to the archive.stsci.edu URL, which astroquery cannot do. Is that correct?

modified_data[~mask] = background[:]

This is so much more clever than what I had - thank you!

do another sigma_clipped_stats on the 2D-subtracted one (or show the histogram). I speculate the new std will be significantly lower, demonstrating the whole point.

I had actually tried to do this before, but I found that the results weren't as exciting as I would have thought. The data before background subtraction has a standard deviation of 7.51e-4, and after it is 7.31e-4. I was unconvinced that this was a big enough difference to make any claims about, but if you think it is worth it, I can add it as an exercise.

@eteq
Owner

eteq commented Aug 7, 2018

@laurenmarietta

Would you say that the description I provide before downloading the data (before [3]) isn't sufficiently clear?

I think it was clear, but it might just need a sentence or two of motivation along the lines of "... of exposure time. While some of the faintest and most distant galaxies known can be found in this image, background subtraction is essential for science like what the later notebooks demonstrate." (or the last clause could be tweaked a bit to be more specific, but hopefully that gets the idea across). That said, I guess the biggest thing would just be for it to come earlier rather than at the time of download.

Just to be clear, we could use astroquery to get the raw data from MAST, but since this file is an HLSP, it isn't available on MAST and we have to go straight to the archive.stsci.edu URL, which astroquery cannot do. Is that correct?

That is correct as far as I know. It may be that MAST has some API (or an upcoming one) to discover HLSPs - if so, we might consider switching to that to find the URL (since URLs sometimes change). I'll ask someone to make sure, but I think that doesn't exist right now.

I had actually tried to do this before, but I found that the results weren't as exciting as I would have thought. ...

Hmm, ok, I see your point. I guess the background variation isn't enough to be very statistically strong relative to the actual real sources. You might get around this by just increasing your "inserted" background? The exercise could even be "try it. notice it's small. does this make sense? try to make it big." We're "leaving it as an exercise for the reader" 😉
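
For instance, reusing the earlier snippet with a deliberately exaggerated ramp (the specific numbers here are just placeholders):

# Inject a much stronger synthetic background so the change in the
# sigma-clipped standard deviation is actually noticeable
background = np.linspace(-1e-2, 5e-2, num=n_data_pixels)
modified_data[~mask] = background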

@laurenmarietta
Contributor Author

Okay, I tried to implement all the suggestions you gave!

The main part I had trouble with was the scientific motivation - mainly because we are using an HLSP, and thus it isn't /really/ necessary to perform background estimation on it (right?). I actually don't include any sort of background subtraction in subsequent notebooks after downloading the data, because the background has already been removed. So the sentence that I added, "Background subtraction is essential for accurate photometric analysis of astronomical data like the XDF," feels a little contrived. But maybe I'm just overthinking it?

laurenmarietta pushed a commit to laurenmarietta/csi-stsci-notebooks that referenced this pull request Aug 15, 2018
laurenmarietta pushed a commit to laurenmarietta/csi-stsci-notebooks that referenced this pull request Aug 15, 2018
@eteq
Owner

eteq commented Sep 6, 2018

@laurenmarietta - looks great now aside from some very minor things (eb15247), so merging!

@eteq merged commit 80103c5 into eteq:master Sep 6, 2018