Improved documentation and Q2 summary visualizations; centralizing parameters #80

fedarko · 2019-09-20T05:08:21Z

An attempt to summarize new stuff:

A lot of changes to the README. I'll summarize a few of the important things here, but it's probably best if you just read over the new README.
- Reorganized things: added quick examples of running Songbird standalone + in Q2 to the top of the README, both using the Red Sea dataset (with parameters that I've previously confirmed make the model fit reasonably well). Both examples stress the importance of checking model fit.
- Added explicit sections on "Interpreting model fitting information", "Adjusting parameters to get reasonable fitting", and "Specifying a formula". Much of the content in these sections is cobbled together from stuff that was previously buried in the FAQs.
- Added screenshots of tensorboard and the q2 summarize-single visualizations for the same dataset, with the same parameters. (These URLs will be un-broken when this PR is merged in—for now, the screenshots can be viewed here (tensorboard) and here (q2).)
- Please double check to make sure that these plots look good to use as reference screenshots for "good" model fitting. I'm pretty sure they're good, but I didn't write Songbird :) If they need changing, I can change up the parameters or whatevs.
- Restructured the FAQs into separate sections: added more detailed explanations of the Q2 outputs, cleaned up some of the existing text, etc.
- Added a few sentences shilling Qurro ;) (mostly where the README is explaining what you can do with the differentials)
Modified the q2 summary visualizations to be a lot easier to interpret:
- Switched around the cv and loss plots to match tensorboard (CV on top, loss on bottom)
- Renamed "loglikehood" to loss for the now-bottom plot, to be consistent with tensorboard
- Added in-HTML text explaining:
  - the need to decrease --p-summary-interval if you don't see anything in the plots
  - how to interpret the plots (this links to the new "Interpreting..." section of the README)
  - how to adjust parameters to get models to fit reasonably (this links to the new "Adjusting..." section of the README)
- Tidied up formatting of the pseudo Q² value for paired summaries (the plots aren't uncomfortably shifted to the right now :P)
- These changes should close #72 ("Improving QIIME 2 summary visualizations").
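
To illustrate why a short run can leave the summary plots empty, here is a rough sketch. It assumes one summary is written each time the summary interval (in seconds) elapses, which is a simplification of the real training loop; `expected_summary_points` is a hypothetical helper, not part of Songbird.

```python
def expected_summary_points(run_seconds: float, summary_interval: float) -> int:
    """Roughly how many summary points a run of the given length would record.

    Assumes one summary is written each time `summary_interval` seconds
    have elapsed (a simplification, not Songbird's actual logic).
    """
    if summary_interval <= 0:
        raise ValueError("summary_interval must be positive")
    return int(run_seconds // summary_interval)


# A hypothetical 45-second run under two different intervals:
for interval in (60, 10):
    points = expected_summary_points(45, interval)
    print(f"interval={interval}s -> about {points} summary point(s)")
```

Under that assumption, a run that finishes before the first interval elapses records nothing, which is the situation the in-HTML text warns about.
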
Removed the --i-feature-table parameter for qiime songbird summarize-single, since the number of samples is only used to compute Q² values in the paired summary. (Closes #76, "summarize-single doesn't actually need a feature table as input".)
- This necessitated making n an optional parameter for _summarize(). As you can see in the diff, I added some code to check for the never-should-happen case where n is None but baseline is not None (i.e. we're making a paired summary but don't have access to n for computing Q² values). _summarize() throws an error in this case.
- I also added a test to make sure that error is triggered properly.
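
A minimal sketch of the defensive check described above, for readers who don't want to dig through the diff. This is not the actual `_summarize()` code: `check_summary_inputs` and its `has_baseline` flag are hypothetical stand-ins, and raising `ValueError` is an assumption (the description only says an error is thrown).

```python
from typing import Optional


def check_summary_inputs(n: Optional[int], has_baseline: bool) -> str:
    """Return which kind of summary can be made, or raise if inputs conflict.

    Hypothetical stand-in for the argument check: a paired summary
    (one with baseline stats) needs n, a single summary does not.
    """
    if has_baseline and n is None:
        # The "should never happen" case: baseline stats without n.
        raise ValueError(
            "n must be provided when baseline stats are given "
            "(it is needed to compute the pseudo Q-squared value)."
        )
    return "paired" if has_baseline else "single"


print(check_summary_inputs(n=None, has_baseline=False))  # single summary, n not needed
print(check_summary_inputs(n=100, has_baseline=True))    # paired summary, n available

try:
    check_summary_inputs(n=None, has_baseline=True)
except ValueError as err:
    print(f"error: {err}")
```
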
All descriptions and defaults for all parameters (of multinomial, at least) are now stored in songbird/parameter_info.py. This centralized information is used by both the standalone and Q2 versions of Songbird, so now if you want to change a parameter you only need to change one thing.
- (This is the same way we solved this sort of problem in DEICODE and Qurro.)
- Making all of these changes messed up the styling of scripts/songbird, so instead of spending a bunch of time manually formatting it I just ran the file through black -l 79. (That's why that file looks so different in the diff.)
- Following up on that, I lowered the default summary interval from 60 seconds to 10 seconds. Since most people's first experiences with Songbird will likely be running a small model that stops quickly, using a smaller default summary interval should help them see something. (Closes #77, "Using a smaller default summary-interval".)
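
The centralization idea can be sketched roughly as follows. The `DESCS` and `DEFAULTS` names mirror the commit titles in this PR, but the dictionary layout, the description text, and both consumer functions are assumptions made for illustration; `argparse` is used here only as a standard-library stand-in for the real command-line interface.

```python
import argparse

# Stand-ins for what a shared module like songbird/parameter_info.py might hold.
# Only the 10-second summary interval comes from this PR; the description text
# and the dictionary layout are assumptions.
DEFAULTS = {"summary-interval": 10}
DESCS = {"summary-interval": "Number of seconds between summary writes."}


def build_standalone_parser() -> argparse.ArgumentParser:
    """A toy command-line parser that pulls its default and help text
    from the shared dictionaries instead of hard-coding them."""
    parser = argparse.ArgumentParser(prog="toy-songbird")
    parser.add_argument(
        "--summary-interval",
        type=float,
        default=DEFAULTS["summary-interval"],
        help=DESCS["summary-interval"],
    )
    return parser


def plugin_parameter_spec() -> dict:
    """A toy plugin-side view of the same parameter, built from the same source."""
    return {
        "name": "summary_interval",
        "default": DEFAULTS["summary-interval"],
        "description": DESCS["summary-interval"],
    }


args = build_standalone_parser().parse_args([])
print(args.summary_interval)               # default taken from DEFAULTS
print(plugin_parameter_spec()["default"])  # same value, same source
```

With this layout, changing a default means editing one dictionary entry, and both interfaces pick it up.
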

OK, I think that's the bulk of it. Please let me know if you'd like any clarification or for me to change anything. In sum, I think these changes will make using Songbird a lot more pleasant (and understandable) for many people. Thanks!

This should make it clearer that you need to actually use this for validating that your model is fitting properly. Even though this is an important step, I've seen many people download Songbird, run it once, and just ignore the tensorboard stuff because it doesn't look important. (I was also guilty of this for a while, so something something glass houses.) Making this more visible will help indicate to people how to use this tool properly.

The fact that "n" is now optional in _summarize() means that there's ostensibly a weird case where _summarize() could be called *with* baseline stats but without n. So I made _summarize() throw an error in that case (should never happen but might as well be defensive), and also added a corresponding test case. This closes biocore#76.

fedarko · 2019-09-20T05:13:47Z

Also, it'd probably be a good idea for multiple people to review this since this is a lot of changes, and it's important that they're correct. @lisa55asil if you'll be around tomorrow, would you mind sitting down and going over this with me? A lot of the documentation changes here involved your FAQs, so you'd probably be the most qualified person to comment on that :)

fedarko · 2019-09-23T22:16:21Z

Ok, Lisa's changes have been integrated in. We've both made some changes (I fixed the --num-random-test-examples thing, as well as tidied up the discussions of # of iterations). Also tried to clean up the "baseline model" description in the summarize-paired help here.

@mortonjt, pending your final approval I think this should be good to merge in.

fedarko · 2019-09-23T22:31:01Z

...made a few last-minute commits showing off using qiime metadata tabulate on a differentials file, but i think that's it from me for right now :)

lisa55asil · 2019-09-23T23:03:38Z

That's awesome! If it's easy and you already have it, a link to the output of tabulating the differentials.qza from the Red Sea data would be helpful, I think.

fedarko · 2019-09-23T23:13:25Z

Ok! I'm not sure adding a link to a QZV is the best idea, since I guess that'd have to be periodically updated if we also update Songbird... but I added a screenshot of the QZV output. I think that's a good advertisement for that functionality :) (here's a link to the screenshot)

mortonjt · 2019-09-23T23:18:06Z

Noting qiime metadata tabulate is a great idea!

mortonjt · 2019-09-24T16:00:54Z

@fedarko are you finished with edits?

fedarko · 2019-09-24T18:59:39Z

Going to make one more pass over everything today. I'll try to have it done by 5pm EST. Thanks! Marcus

fedarko · 2019-09-24T21:01:35Z

@mortonjt I'm done! Added a small subsection clearly describing setting the reference for categorical formulas, since this has confused a lot of people (me included) in the past.

Should be ready to merge in—thanks @lisa55asil + @mortonjt for your help with this :)
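
For readers unfamiliar with what a "reference" means for a categorical variable, here is a small sketch of treatment (dummy) coding in plain Python. It does not use Songbird or its formula parser; `treatment_code` and the example categories are made up for illustration.

```python
from typing import Dict, List


def treatment_code(values: List[str], reference: str) -> List[Dict[str, int]]:
    """Dummy-encode a categorical column against a chosen reference level.

    Each non-reference level gets its own 0/1 column; the reference level
    gets no column, so its rows are all zeros and every other level is
    interpreted relative to it.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not found in values")
    columns = [lvl for lvl in levels if lvl != reference]
    return [{col: int(v == col) for col in columns} for v in values]


samples = ["healthy", "sick", "sick", "healthy"]

# With "healthy" as the reference, the only column is "sick".
print(treatment_code(samples, reference="healthy"))

# Switching the reference flips which level the column describes.
print(treatment_code(samples, reference="sick"))
```

In a regression fit on such an encoding, the coefficient for the `sick` column describes sick samples relative to healthy ones, which is why the choice of reference changes how the results read.
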

lisa55asil · 2019-09-24T22:57:12Z

This is looking dope Marcus! I really like the clarification about the reference for categorical formulas. I added a short sentence that I think helps make this description more tangible.

fedarko · 2019-09-26T22:13:10Z

@mortonjt I'm done with editing, so provided you're cool with it I think this is ready to go. Thanks!

mortonjt · 2019-09-30T14:29:06Z

Awesome. Thanks @fedarko !

fedarko added 29 commits September 17, 2019 16:24

MAINT: Mention summary-interval in summary HTML

ee088d9

Relates to ideas brought up in biocore#72.

ENH: Change "Loglikehood" to "Loss" in summary viz

915b2f6

As proposed in biocore#72.

ENH: Switch ordering of cv/loss plots in q2 summ.s

2c4cf01

Another item from biocore#72.

ENH: Fix styling of pseudo-Q2 value display biocore#72

c881a1f

shouldn't cause the plots to get shifted to the right any more. also highlights the label ("Pseudo Q-squared:") in <strong> tags.

DOC: Drastically reorganize + clean up README

663e7b2

should hopefully be a lot more usable now, but we'll see if i still feel that way tomorrow morning

Merge branch 'master' of https://github.com/biocore/songbird

91be9fc

my fork's README took into account lisa's change, so no need to do anything fancy when resolving the merge

DOC: to be consistent, replace tabs with spaces

9dc9e82

just in the command-line examples! this makes the indentation levels consistent. ...as it should be ;)

DOC: use redsea for both q2+non-q2 examples

c5e52e7

also added a "what will this cover" section

DOC: use a better results dir name than <results>

8a81648

... i think many computers would complain if you tried to name a directory <results>

DOC: Minor README improvements

6b84689

DOC: mention need to have conda installed

0131164

DOC: clean install instrs

3707795

MNT: Unify + abstract out param descs/defaults

26d8120

Will incorporate this into the standalone script in a sec

MNT/STY: Integrate DESCS/DEFAULTS with standalone

7ee6269

I also ran the standalone script through black, since I didn't want to have to reformat the entire dang thing by hand. Can run the other files through black (or add this to the Makefile) if desired.

STY: appease flake8

e5a793c

ENH: Lower default summary interval: 60s -> 10s

d6c73c8

Closes biocore#77. This should make the "default" run of Songbird produce legible-ish diagnostic plots!

DOC: explicitly specify .qzvs

125a84b

DOC: update README re: biocore#76 being addressed

595d15d

DOC: Polish README; use proper params; screenshots

8870c8a

getting close! biocore#72

TST: Add links pointing from q2summaries to README

4c9ebe4

also updated the README summary screenshot accordingly. I think now is kosher to submit a pr for biocore#72!

DOC: Improve 'adjusting' section a lot

8e8d157

DOC: "Adjusting models" -> "Adjusting parameters"

225b636

DOC: Clean up adjusting params tldr

9d09d8c

DOC: Tidying up a bunch of prior documentation

812257f

DOC: Redsea -> Red Sea

4b87f16

DOC: Update old stuff in README

36db28f

the interpreting graphs stuff pertained to the old tensorboard graph; updated to reference redsea graphs

DOC: Clean up q2 summary command documentation

2eb3020

fedarko added 6 commits September 23, 2019 14:50

Merge branch 'lisa55asil-patch-8'

25db791

Thanks @lisa55asil for the feedback!

DOC: fix default # of random samples biocore#80

67cd0ae

DOC: Clear up "iterations" contradictions

13fe2a1

This opts to specify things in less detail, in favor of being *correct*. Should close biocore#79.

DOC: More context re: the baseline model stuff

a51bdb8

DOC: use bash highlighting [ciskip]

943c8c8

DOC: make epochs info clearer [ciskip]

1d19159

fedarko added 2 commits September 23, 2019 15:26

DOC: describe using "tabulate" on differentials

7495307

DOC: formatting for metadata tabulate example

86e71de

DOC: add screenshot of tabulated differentials

a9f3c81

based on feedback from @lisa55asil

fedarko added 3 commits September 24, 2019 13:38

DOC: Add section on setting refs for cat variables

5ee9af6

Since a lot of people (me included) misunderstand this.

DOC: clean up implicit references section

da25936

DOC: reorganize formula stuff

f39ccc0

added quick sentence about implicit reference

73aa01a

fedarko added 5 commits September 24, 2019 16:22

DOC: small wording changes

26ddfe6

DOC: punctuation [ci skip]

41f4012

DOC: add context re inputs [ci skip]

0f17485

DOC: repeat note about #q2: in more visible place

6d80919

DOC: clear up ambiguities re: feature types

0675b21

-Replace remaining uses of "microbe" with "feature" -Mention that red sea dataset's features are KOs -various other small changes

mortonjt merged commit 00df924 into biocore:master Sep 30, 2019
