Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved documentation and Q2 summary visualizations; centralizing parameters #80

Merged
merged 61 commits into from
Sep 30, 2019

Conversation

fedarko
Copy link
Collaborator

@fedarko fedarko commented Sep 20, 2019

An attempt to summarize new stuff:

  • A lot of changes to the README. I'll summarize a few of the important things here, but it's probably best if you just read over the new README.

    • Reorganized things: add quick examples of running Songbird standalone + in Q2 to the top of the README, both examples using the Red Sea dataset (and using parameters that I've previously ensured make the model fit reasonably well). Both examples stress the importance of checking model fit.
    • Added explicit sections on "Interpreting model fitting information", "Adjusting parameters to get reasonable fitting", and "Specifying a formula". A lot of these sections is cobbled together from stuff that was previously buried in the FAQs.
    • Added screenshots of tensorboard and the q2 summarize-single visualizations for the same dataset, with the same parameters. (These URLs will be un-broken when this PR is merged in—for now, the screenshots can be viewed here (tensorboard) and here (q2).)
    • Please double check to make sure that these plots look good to use as reference screenshots for "good" model fitting. I'm pretty sure they're good, but I didn't write Songbird :) If they need changing, I can change up the parameters or whatevs.
    • Restructured the FAQs into separate sections: added more detailed explanations of the Q2 outputs, cleaned up some of the existing text, etc.
    • Added a few sentences shilling Qurro ;) (mostly where the README is explaining what you can do with the differentials)
  • Modified the q2 summary visualizations to be a lot easier to interpret:

    • Switched around the cv and loss plots to match tensorboard (CV on top, loss on bottom)
    • Renamed "loglikehood" to loss for the now-bottom plot, to be consistent with tensorboard
    • Added in-HTML text explaining:
      • the need to decrease --p-summary-interval if you don't see anything in the plots
      • how to interpret the plots (this links to the new "Interpreting..." section of the README)
      • how to adjust parameters to get models to fit reasonably (this links to the new "Adjusting..." section of the README)
    • Tidied up formatting of the pseudo Q2 value for paired summaries (the plots aren't uncomfortably shifted to the right now :P)
    • These changes should close Improving QIIME 2 summary visualizations #72.
  • Removed the --i-feature-table parameter for qiime songbird summarize-single, since the number of samples is only used to compute Q2 values in the paired summary. (Closes summarize-single doesn't actually need a feature table as input #76.)

    • This necessitated making n an optional parameter for _summarize(). As you can see in the diff, I added some code to check for the never-should-happen case where n is None but baseline is not None (i.e. we're making a paired summary but don't have access to n for computing Q2 values). _summarize() throws an error in this case.
    • I also added a test to make sure that error is triggered properly.
  • All descriptions and defaults for all parameters (of multinomial, at least) are now stored in songbird/parameter_info.py. This centralized information is used by both the standalone and Q2 versions of Songbird, so now if you want to change a parameter you only need to change one thing.

    • (This is the same way we solved this sort of problem in DEICODE and Qurro.)
    • Making all of these changes messed up the styling of scripts/songbird, so instead of spending a bunch of time manually formatting it I just ran the file through black -l 79. (That's why that file looks so different in the diff.)
    • Following up on that, I lowered the default summary interval from 60 seconds to 10 seconds. Since most people's first experiences with Songbird will likely be running a small model that stops quickly, using a smaller default summary interval should help them see something. (Closes Using a smaller default summary-interval #77.)

OK, I think that's the bulk of it. Please let me know if you'd like any clarification or for me to change anything. In sum, I think these changes will make using Songbird a lot more pleasant (and understandable) for many people. Thanks!

This should make it clearer that you need to actually use this
for validating that your model is fitting properly.

Even though this is an important step, I've seen many people download
Songbird, run it once, and just ignore the tensorboard stuff because
it doesn't look important. (I was also guilty of this for a while, so
something something glass houses.) Making this more visible will
help indicate to people how to use this tool properly.
shouldn't cause the plots to get shifted to the right any more.

also highlights the label ("Pseudo Q-squared:") in <strong> tags.
should hopefully be a lot more usable now, but we'll see if i still
feel that way tomorrow morning
my fork's README took into account lisa's change, so no need
to do anything fancy when resolving the merge
just in the command-line examples! this makes the indentation levels
consistent.

...as it should be ;)
also added a "what will this cover" section
... i think many computers would complain if you tried to name a
directory <results>
Will incorporate this into the standalone script in a sec
I also ran the standalone script through black, since I didn't
want to have to reformat the entire dang thing by hand. Can run
the other files through black (or add this to the Makefile)
if desired.
Closes biocore#77. This should make the "default" run of Songbird
produce legible-ish diagnostic plots!
The fact that "n" is now optional in _summarize() means that there's
ostensibly a weird case where _summarize() could be called *with*
baseline stats but without n. So I made _summarize() throw an error
in that case (should never happen but might as well be defensive),
and also added a corresponding test case.

This closes biocore#76.
also updated the README summary screenshot accordingly.

I think now is kosher to submit a pr for biocore#72!
the interpreting graphs stuff pertained to the old tensorboard graph;
updated to reference redsea graphs
@fedarko
Copy link
Collaborator Author

fedarko commented Sep 20, 2019

Also, it'd probably be a good idea for multiple people to review this since this is a lot of changes, and it's important that they're correct. @lisa55asil if you'll be around tomorrow, would you mind sitting down and going over this with me? A lot of the documentation changes here involved your FAQs, so you'd probably be the most qualified person to comment on that :)

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 23, 2019

Ok, Lisa's changes have been integrated in. We've both made some changes (I fixed the --num-random-test-examples thing, as well as tidied up the discussions of # of iterations). Also tried to clean up the "baseline model" description in the summarize-paired help here.

@mortonjt, pending your final approval I think this should be good to merge in.

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 23, 2019

...made a few last-minute commits showing off using qiime metadata tabulate on a differentials file, but i think that's it from me for right now :)

@lisa55asil
Copy link
Contributor

lisa55asil commented Sep 23, 2019 via email

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 23, 2019

Ok! I'm not sure adding a link to a QZV is the best idea, since I guess that'd have to be periodically updated if we also update Songbird... but I added a screenshot of the QZV output. I think that's a good advertisement for that functionality :) (here's a link to the screenshot)

@mortonjt
Copy link
Collaborator

Noting qiime metadata tabulate is a great idea!

@mortonjt
Copy link
Collaborator

@fedarko are you finished with edits?

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 24, 2019 via email

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 24, 2019

@mortonjt I'm done! Added a small subsection clearly describing setting the reference for categorical formulas, since this has confused a lot of people (me included) in the past.

Should be ready to merge in—thanks @lisa55asil + @mortonjt for your help with this :)

@lisa55asil
Copy link
Contributor

lisa55asil commented Sep 24, 2019 via email

@fedarko
Copy link
Collaborator Author

fedarko commented Sep 26, 2019

@mortonjt I'm done with editing, so provided you're cool with it I think this is ready to go. Thanks!

@mortonjt
Copy link
Collaborator

Awesome. Thanks @fedarko !

@mortonjt mortonjt merged commit 00df924 into biocore:master Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants