Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API proposal for bridging {ospsuite} and {tlf} packages #674

Closed
IndrajeetPatil opened this issue Dec 6, 2021 · 36 comments · Fixed by #690
Closed

API proposal for bridging {ospsuite} and {tlf} packages #674

IndrajeetPatil opened this issue Dec 6, 2021 · 36 comments · Fixed by #690

Comments

@IndrajeetPatil
Copy link
Member

IndrajeetPatil commented Dec 6, 2021

Original Proposal

Based on discussion with @msevestre, @PavelBal, @pchelle, @Yuri05

Adopt a functional approach to create plots using {tlf} with a few arguments and sensible internal defaults.

The primary argument would be an R6 object encapsulating both simulated and observed data from {ospsuite}.

plotProposal

As to which arguments these plots should have remains to be decided (see the comment below with possible arguments).

Implementation note

Having already implemented a few functions, here are some notes on how to add new plotting functions in {ospsuite}.

unnamed

  • All {ospsuite}-based plotting functions are wrappers around {tlf}-based plotting functions with a one-to-one correspondence.
  • All {ospsuite}-based plotting functions require DataCombined (can't be missing) and DefaultPlotConfiguration (if missing, defaults are used) objects.
  • DataCombined provides data to visualize, while DefaultPlotConfiguration provides plot's aesthetic properties.
  • Each {tlf}-based plotting function requires a specific variant of PlotConfiguration object (e.g. tlf::plotObsVsPred() needs tlf::ObsVsPredPlotConfiguration object).
  • {ospsuite}-based plotting functions internally convert user-specified DefaultPlotConfiguration object to the specific variant of PlotConfiguration object needed by {tlf}-based plotting function (using helper ospsuite:::.convertGeneralToSpecificPlotConfiguration()).
@msevestre

This comment was marked as outdated.

@IndrajeetPatil
Copy link
Member Author

IndrajeetPatil commented Dec 7, 2021

Steps needed:

  • implement DataCombined objects (containing simulation results, observed data, and grouping information)
  • implement plot*() functions

@PavelBal
Copy link
Member

PavelBal commented Dec 7, 2021

  • DataCombined should allow to set scale factors and offsets for data (x and y values)
  • DataCombined should have a method toDataFrame that returns a data frame with all data sets and additional information specified (like dimension, unit, offsets, scale factors, meta data)

Following plot types are prio one:

- individualTimeProfilePlot()

image

Optional arguments:

  • Main title
  • x limit
  • y limit
  • x lin/log
  • y lin/log
  • x axis label
  • y axis label
  • x unit
  • y unit
  • legend on/off
  • legend position
  • for each data set:
    • type (symbols, lines, both)
    • symbol type
    • line type
    • color

- aggregatedTimeProfilePlot()

image

Optional arguments:

  • Main title
  • x limit
  • y limit
  • x lin/log
  • y lin/log
  • x axis label
  • y axis label
  • x unit
  • y unit
  • legend on/off
  • legend position
  • for each data set:
    • type (symbols, lines, both)
    • symbol type
    • line type
    • color
    • Plot individual or aggregated
      • By default, simulation results are plotted aggregated, data as individual points. But the user might want to also plot data as quantiles
  • quantiles

- predictedVsObservedPlot()

image

Optional arguments:

  • Main title
  • axes limit
  • axes lin/log
  • x axis label
  • y axis label
  • values unit
  • legend on/off
  • legend position
  • for each data set:
    • symbol type
    • color
  • foldDistance - numerical values for the fold-distance lines to be drawn. Default is 2, multiple fold-distance lines can be specified
  • timeDiffThreshold - Allowed difference between observed and simulated time values in minutes. Default is 10 minutes. If for a certain observed point no simulated time point exists within the defined threshold, the value is not considered. Can be discussed if this can be solved differently.

Following plot types are prio two:

- residualsVsTimePlot()

image

Optional arguments:

Will be specified when we are about to implement this. Especially the objective function (how to calculate residuals).

- pkRatioPlot()

This requires other type of input data, DataCombined is not the right concept for this.

- guestPlot()

Open-Systems-Pharmacology/Reporting-Engine#99
This requires other type of input data, DataCombined is not the right concept for this.

multi panel

@PavelBal
Copy link
Member

PavelBal commented Dec 7, 2021

Additional requirements:

  • Output of plots to PNG (incl. providing resolution and file path)
  • Plotting multiple plots in one panel
    image

@Yuri05

This comment was marked as resolved.

@PavelBal

This comment was marked as resolved.

@Yuri05

This comment was marked as resolved.

@IndrajeetPatil IndrajeetPatil pinned this issue Dec 12, 2021
@PavelBal
Copy link
Member

PavelBal commented Dec 13, 2021

@msevestre @IndrajeetPatil

Basically, DataCombined needs to support two scenarios:

  1. Adding DataSet

    • dataCombined$addDataSets(dataSets, groups = NULL), where dataSets is a single DataSet or a list of DataSet objects, and groups is a list of strings defining the groupings. The names under which the DataSet will be stored (and used e.g. in the caption) should be the name property of the DataSet.
  2. Adding simulation results. Now it becomes more tricky. The user would not only provide the SimulationResults object, but also the paths, and maybe even individual Ids.

    • dataCombined$addSimulationResults(simulationResults, paths, individualIds = NULL, groups = NULL). individualResults in this case should be only one object, not a list. Now, what about the names? We could use the path as the name by default, but in that case adding results from different simulation runs will become a problem. The user has to specify the name? E.g.
    • dataCombined$addSimulationResults(simulationResults, paths, individualIds = NULL, names, groups = NULL).

This is also why I would prefer to add data with methods only, and not in the constructor. Otherwise the constructor get way too many arguments. The most simple workflow would then be:

dataCombined <- DataCombined$new()
dataCombined$addDataSets(dataSets)
dataCombined$addSimulationResults(simulationResults, "Organism|VenousBlood|Plasma|CompoundA|Concentration")
  • Every entry (whether it is an added DataSet or simulation result) must have a unique name. With this name, the user can set scale factors and offsets.

And now the most important questions - how to store simulation results internally? The easiest way would be to convert them to DataSet on adding, but I see @msevestre point that for population simulation that would create just too many DataSet objects. We could create a data frame for each output path and populate it with all necessary information (dimension, unit, molecular weight, maybe even organ/compartment).

@PavelBal PavelBal mentioned this issue Dec 16, 2021
22 tasks
@IndrajeetPatil

This comment was marked as resolved.

@PavelBal

This comment was marked as resolved.

@IndrajeetPatil IndrajeetPatil linked a pull request Jan 19, 2022 that will close this issue
22 tasks
@IndrajeetPatil

This comment was marked as resolved.

@PavelBal

This comment was marked as resolved.

@IndrajeetPatil

This comment was marked as resolved.

@PavelBal

This comment was marked as resolved.

@IndrajeetPatil

This comment was marked as resolved.

@msevestre

This comment was marked as resolved.

@PavelBal
Copy link
Member

DataSet can have an empty name (e.g. create a new one with dataSet <- DataSet$new(). Not sure what to do in this situation:

@msevestre
Copy link
Member

is this a problem ? name can be null (or "") not sure what makes more sense

@IndrajeetPatil
Copy link
Member Author

Currently, in this situation, empty dataframes will be returned:

library(ospsuite)
#> Loading required package: rClr
#> Loading the dynamic library for Microsoft .NET runtime...
#> Loaded Common Language Runtime version 4.0.30319.42000

dataset <- DataSet$new()
myCombDat <- DataCombined$new()
myCombDat$addDataSets(dataset)

myCombDat$toDataFrame()
#> # A tibble: 0 x 14
#> # ... with 14 variables: name <chr>, group <chr>, dataType <chr>,
#> #   xValues <dbl>, xUnit <chr>, xDimension <chr>, yValues <dbl>, yUnit <chr>,
#> #   yDimension <chr>, yErrorValues <dbl>, yErrorType <lgl>, yErrorUnit <lgl>,
#> #   molWeight <lgl>, lloq <lgl>

myCombDat$dataTransformations
#> # A tibble: 0 x 5
#> # ... with 5 variables: name <chr>, xOffsets <dbl>, yOffsets <dbl>,
#> #   xScaleFactors <dbl>, yScaleFactors <dbl>

myCombDat$groupMap
#> # A tibble: 0 x 3
#> # ... with 3 variables: group <chr>, name <chr>, dataType <chr>

myCombDat$names
#> character(0)

Created on 2022-01-24 by the reprex package (v2.0.1.9000)

@PavelBal
Copy link
Member

All data added to a DataCombined must have some unique identifier (we call it name). This allows e.g. to set data transformations for this data set.
We use the $name property from DataSet when adding a DataSet as the default name, for simulation results these are the paths of the added outputs. The user can also set custom names.

So adding a DataSet with an empty name will now results with it`s name being empty and therefore... Well, not a unique identifier.

@IndrajeetPatil
Copy link
Member Author

So adding a DataSet with an empty name will now results with it`s name being empty and therefore... Well, not a unique identifier.

I think I am confused because I don't understand the context in which this might occur, or what's the use case for this. But why should something that doesn't exist get a unique identifier?

My understanding is that if there is no name associated with a DataSet object, it just means that there is no data in the object to be uniquely identified. Am I wrong in my understanding?

@msevestre
Copy link
Member

My understanding is that if there is no name associated with a DataSet object, it just means that there is no data in the object to be uniquely identified. Am I wrong in my understanding?

I agree with this. This is how I understand the object

@PavelBal
Copy link
Member

PavelBal commented Jan 25, 2022

> myDs <- DataSet$new()
> myDs$setValues(xValues = c(1,2,3), yValues = c(2,3,4))

A perfectly valid DataSet without a name, no? Maybe we should make the name a requirement when creating DataSet from scratch?

Also - do all DataRepository coming from PKSim/MoBi always have a name?

@msevestre
Copy link
Member

msevestre commented Jan 25, 2022

Also - do all DataRepository coming from PKSim/MoBi always have a name?

YEs. This is what is being displayed in the tree. If it's emopty, someone messed around with the pkml file

Maybe we should make the name a requirement when creating DataSet from scratch?

Yes! but how can you enforce it if you do not want to force the user to specify a name in the constructor.
Alternatively, the name is assigned as a GUID when created and overwritten when created via repo or simresults

@PavelBal
Copy link
Member

Yes! but how can you enforce it if you do not want to force the user to specify a name in the constructor.
Alternatively, the name is assigned as a GUID when created and overwritten when created via repo or simresults

I would be OK forcing the user to specify the name when creating an empty DataSet.

@IndrajeetPatil
Copy link
Member Author

As requested by @PavelBal, I am creating separate discussion threads for each plot specification.

This makes the current issue outdated and relevant only for reference. So I am closing to reduce its interference.

The issue will continue to be pinned until all plotting functions have been implemented.

@IndrajeetPatil
Copy link
Member Author

IndrajeetPatil commented Mar 10, 2022

I will keep updating references in this comment to the discussions as they are opened:

@IndrajeetPatil
Copy link
Member Author

IndrajeetPatil commented Jul 21, 2022

@PavelBal, @Yuri05, @StephanSchaller

Out of possible scatter plots, we currently have:

  • Observed versus Predicted
  • Residuals versus Time

But not

  • Residuals versus Predicted

Is it not useful?

tlf does support this plot (plotResVsPred()), so it should be straightforward to implement in ospsuite as well.

@Yuri05
Copy link
Member

Yuri05 commented Jul 21, 2022

I think it is useful :)

@PavelBal
Copy link
Member

Yes, but very low prio.

@IndrajeetPatil
Copy link
Member Author

It's already implemented in #1036.

@IndrajeetPatil
Copy link
Member Author

I have updated my top-level comment in this issue with an implementation note, which should be helpful for whoever takes over implementing new plots in {ospsuite}.

@IndrajeetPatil
Copy link
Member Author

Currently, plots which are supported in {tlf}, but have no equivalent in {ospsuite} (either because they were low priority or because they don't belong to the DataCombined-based workflow):

@IndrajeetPatil IndrajeetPatil pinned this issue Aug 18, 2022
@PavelBal PavelBal unpinned this issue Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants