Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify additional configuration #4

Open
ccunningham101 opened this issue Mar 15, 2022 · 7 comments
Open

Specify additional configuration #4

ccunningham101 opened this issue Mar 15, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@ccunningham101
Copy link

The current system will generate a decile chart for every file matching this measure regex
Users may want to be able to provide

  1. A more granular regex/glob specification (such as only measures files that contain the word "practice" in them)
  2. Configuration information for individual or groups of charts (i.e. all measures with qof in the filename will have a ylabel of "Percentage" whereas others will have "Rate")
    • Some of this information may not be easily parsable from the measures file name, so we could provide it in the yaml configuration

Per file:

qof:
    input_file: "measure_qof_practice_rate.csv"
    title: "QOF Achievement by Deciles"
    ylabel: "Percentage"
ssri:
    input_file: "measure_ssri_practice_rate.csv"
    title: "SSRI Use by Decile"
    ylabel: "Rate"

or

To allow a combo of per-file and group with regex
For regex could

  1. Have internal code to parse the measure filename to determine parameters that would not be the same for the whole group (slightly different titles)
  2. If that is not possible, it may be easier to have code to generate the yaml configuration or just require per-file specification
qof:
    input_file: "measure_qof_*_rate.csv"
    ylabel: "Percentage"
ssri:
    input_file: "measure_ssri_practice_rate.csv"
    title: "SSRI Use by Decile"
    ylabel: "Rate"
@iaindillingham
Copy link
Member

I think there are three sub-issues within this issue:

  1. Specify an input glob pattern (e.g. --input-files) rather than an input directory (--input-dir)
  2. Specify additional configuration e.g. title, ylabel, etc. Values would be the same for all files matched by the input glob pattern or the input directory
  3. Per-group invocation. Each group would consist of an input glob pattern or an input directory and additional configuration. deciles-charts would be invoked for each group

I think the first sub-issue is the same as #7. Let's move discussion from here to there. I think the second sub-issue needs more information about what additional configuration we'd like to specify. We could use Vega-Lite's config object for inspiration. Let's keep discussion here; I'll change the name of ths issue. Finally, I think the third sub-issue deserves its own issue, possibly: if I've understood this sub-issue correctly, then the intention is to replace two calls to deciles-charts with one call to deciles-charts, but for two groups' worth of additional configuration. For now, I think we could ask users call deciles-chart twice.

What do you think, @ccunningham101? 🙂

@iaindillingham iaindillingham changed the title Specification of which decile charts to generate Specify additional configuration Apr 5, 2022
@ccunningham101
Copy link
Author

ccunningham101 commented Apr 6, 2022

I agree with all of the above, thanks for the clear explanation!
Two more configuration options that we have seen in repositories that use deciles charts are vertical plotlines to denote important dates and allow for scaling of the y-axis. I did very quick implementations on Milan's blood pressure repository

  1. Add vertical plotlines to denote important dates
  • I went with manually user specified dates rather than anything programmatic as I don't think it would be easy to decipher a plot with many more than 4 or so lines
  • It is currently missing any checks to see whether the values are within the visible range of the x-axis
  • It currently assumes that the x axis is a date and casts the user input to a date- a more flexible option could require the user to specify the type in the config, and check that the type specified matches the x axis
  1. Allow for scaling of the y-axis
  • I started with 1.05*max val until Milan pointed out that the autoscale function existed, but we were seeing strange results on the server. Could have been an issue with where/how I was invoking it, but may require further debugging

@LFISHER7
Copy link
Contributor

#22 describes an additional configuration option. I think this along with the title and axis labels would be a good first step for some customisation

@iaindillingham iaindillingham self-assigned this Apr 20, 2022
@iaindillingham iaindillingham moved this to Todo in Overview Apr 21, 2022
@iaindillingham iaindillingham added the enhancement New feature or request label Apr 21, 2022
@iaindillingham
Copy link
Member

This is a good opportunity to think about how the config key looks. #18 and #21 propose/add configuration. With this configuration, an invocation of deciles-charts could look like this:

generate_deciles_charts:
  run: >
    python:latest analysis/deciles_charts.py
      --input-files output/measure_*.csv
      --output-dir output
  config:
    show_outer_percentiles: true
    outputs:
      - tables
      - charts
  needs: [generate_measures]
  outputs:
    moderately_sensitive:
      deciles_tables: output/deciles_table_*.png
      deciles_charts: output/deciles_chart_*.png

However, this invocation could be confusing: do we show outer percentiles on the tables and charts? If the table isn't for presentation, then is "show" the best verb?

An alternative invocation of deciles-charts could look like this:

generate_deciles_charts:
  run: >
    python:latest analysis/deciles_charts.py
      --input-files output/measure_*.csv
      --output-dir output
  config:
    tables:
      output: true
      outer_percentiles: false
    charts:
      output: true
      outer_percentiles: false
      encoding:
        x:
          title: "Date"
          scale:
            domain: ["min", "max"]
        y:
          title: "Value"
          scale:
            domain: [0, "max"]
  needs: [generate_measures]
  outputs:
    moderately_sensitive:
      deciles_tables: output/deciles_table_*.png
      deciles_charts: output/deciles_chart_*.png

I've given default values, so the config key is unnecessary. You're probably wondering about the encoding key. Well, I'd like to propose using a subset of Vega-Lite for configuring deciles charts. Essentially, the charts key could contain a subset of a single view specification.

I'd really appreciate your feedback on ☝🏻, because although we can refactor configuration, this gets harder and harder as we add more. Ideally, we'd like to make the config key:

  • unnecessary for the majority of cases - "no config" tending to "low config"
  • expressive - it should allow us to express many possible configurations, even those we haven't thought of yet 🙂
  • "intuitive" - you shouldn't have to think too hard about what the configuration is configuring

I'm going to create a new issue for "Add vertical plotlines to show important dates", because this feels like a non-trivial amount of work. It would be useful to discuss examples, too.

@LFISHER7
Copy link
Contributor

I think we can be consistent between outer percentiles across tables and charts but agree if we are, something like include_outer_percentiles makes more sense.

I agree with the three requirements of the config and the single view spec looks like it would achieve that. Could we achieve something similar with matplotlib stylesheets? I'm not familiar enough with vega-lite to understand the advantages/disadvantages!

@iaindillingham
Copy link
Member

I think I agree with you, @LFISHER7 🙂 Ultimately, a deciles chart should be a rendering of a deciles table. This would be good for redaction; if you saw deciles_chart_has_sbp_event_by_stp_code.png and deciles_table_has_sbp_event_by_stp_code.csv, then you could be confident that the former is a rendering of the latter, so you could direct your redaction efforts accordingly. Having table-specific and chart-specific percentiles configuration would make redaction harder.

I've not investigated Matplotlib stylesheets, but would suggest not coupling deciles-charts to an underlying library, such as Matplotlib, too tightly. Hence, I'm not suggesting that we use Vega-Lite (the library), just a subset of the spec. I'm confident that the spec is expressive and intuitive, given the history of the project; more expressive and intuitive than matplotlib.RcParams, anyhow.

@LFISHER7
Copy link
Contributor

Given that, the spec looks like a good option for what we want!

@iaindillingham iaindillingham removed their assignment May 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: Todo
Development

No branches or pull requests

3 participants