Use somewhat normalized "experiments" table instead of conditions/timecourses #585
I can see the advantages and I think that it is a viable solution that should be explored further. In general the long format for the condition table means that conditions with many parameters are not as simple and clearly arranged as they are in the wide format, but I think that the flexibility in the long format, also w.r.t. further columns, is the stronger argument. (It should also be trivial to create the conditions table in wide format and then switch using the What I am critical about is the potential ambiguity. I am also considering whether too much flexibility could deter new/basic users because it would be more difficult to figure out (1.) which way to correctly specify a PEtab problem and (2.) understand if there are undesired consequences of specifying a PEtab problem in one way, e.g.: If Example 2 |
Thanks @dilpath , great to see efforts to make the tables more normalized! But similar to @m-philipps , I think putting rows with different meaning in the same table should be avoided. My suggestion would be something like
Here, we have four different experiments. One problem with this table is that you might have to change the model if you want to do infusion dosing (i.e. the model might not have a
One issue I see is that you could accidentally do
which should not switch off the infusion imo, just set drugA to zero, while the infusion keeps going. Of course, to save rows you could also add an optional Lmk what you think. I have not been involved with v2 yet, so maybe I'm missing crucial bits here. |
Thanks for the quick feedback. I changed the first post to address some of it.
This will be available in the v1<->v2 converter that we will supply in libpetab-python.
Yes, this is a discussion point -- should we have a default
Yes, the flexibility makes things a little more complicated. However, the examples the users see in the docs can be rather basic and in two "separate" tables, e.g. Example 2 can be specified like
This is equivalent to separate normalized formats for the conditions and timecourses table. However, they would be combined into a single experiments table in the PEtab YAML, because they're all just timecourses to the tool. ...
problems:
  - experiment_files:
      - normalized_conditions.tsv
      - normalized_timecourses.tsv
    measurement_files:
      - ....tsv
...
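As a rough sketch of what "combined into a single experiments table" could mean for a tool: read every file listed under experiment_files and stack the rows into one long table. The column names (experimentId, time, inputId, inputValue), the file contents, and the helper names here are illustrative assumptions, not part of the proposal.

```python
import csv
import io

# Hypothetical file contents; the column names are assumptions for illustration.
NORMALIZED_CONDITIONS = (
    "experimentId\tinputId\tinputValue\n"
    "condition1\tk1\t1.0\n"
    "condition2\tk1\t2.0\n"
)
NORMALIZED_TIMECOURSES = (
    "experimentId\ttime\tinputId\n"
    "timecourse1\t0\tcondition1\n"
    "timecourse1\t10\tcondition2\n"
)


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def combine_experiment_tables(*tables):
    """Stack long-format tables; columns absent from a row become ''."""
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    return [
        {name: row.get(name, "") for name in columns}
        for table in tables
        for row in table
    ]


experiments = combine_experiment_tables(
    read_tsv(NORMALIZED_CONDITIONS), read_tsv(NORMALIZED_TIMECOURSES)
)
```

Because both inputs share one format, the tool only ever sees the stacked table; whether the user split it across files is invisible downstream.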
It's currently up to the tool whether it chooses to perform simulations that have no measurements... |
To me, the rows do not have a different meaning. Every row describes the (piecewise-)constant input function of a dynamical system. The only difference between rows is whether they update a single, or multiple, model parameters, but hopefully this is made clear through the use of
This will be supported. i.e. your suggestion
can be expressed like so in PEtab v2 (regardless of which table format we go with, expressions will be supported...)
We opted for expressions over a column like
Would be fine for me to include this
I guess instead of
or we use a
|
Ok. I see what you mean now. But I also think that PEtab should stick to the classic relational database model, where foreign keys point to rows in other tables. Then you can just go with object-relational mapping (table=class, column=class attribute, row=instance) to represent everything in (Python) objects. Otherwise the format specification would break the principle of least astonishment. But I'm definitely not a database expert, so I might be wrong here. The problem with the relational model is that n levels of nesting require n tables, and you don't know how deeply a user wants to nest. But the last table before the "Nesting" headline looks good to me. And tbh, for most reasonable biological applications the number of rows will still be manageable.
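To make the table=class, column=attribute, row=instance mapping concrete, here is a minimal sketch using standard-library dataclasses (a stand-in chosen for self-containment, not a particular ORM). The column names, the ExperimentRow class, and both helper functions are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical long-format table; column names are assumptions.
EXAMPLE_TSV = (
    "experimentId\tinputId\tinputValue\n"
    "condition1\tk1\t1.0\n"
    "condition2\tk1\t2.0\n"
)


@dataclass(frozen=True)
class ExperimentRow:
    """One table row: each column becomes one attribute."""
    experiment_id: str
    input_id: str
    input_value: float


def load_experiment_rows(tsv_text):
    """Turn every row of the table into an ExperimentRow instance."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        ExperimentRow(
            experiment_id=row["experimentId"],
            input_id=row["inputId"],
            input_value=float(row["inputValue"]),
        )
        for row in reader
    ]


def missing_experiment_ids(measurement_experiment_ids, experiment_rows):
    """Foreign-key check: ids used by measurements but defined nowhere."""
    known = {row.experiment_id for row in experiment_rows}
    return sorted(set(measurement_experiment_ids) - known)


rows = load_experiment_rows(EXAMPLE_TSV)
dangling = missing_experiment_ids(["condition1", "condition3"], rows)
```

The foreign-key check is the part that gets harder with nesting: an id may then refer to a row in the same table rather than in a separate one.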
OK, that's a good point.
Hmm. That's a problem. I think three cases need to be supported:
So perhaps instead of using a
|
Also not an expert, but I think this is already supported. I am currently working with this proposed table and importing it into Python objects using pydantic without issue.
Agreed, IMO the nice thing about the proposed format is that it supports the very intuitive "condition-only long table" and "timecourse-only long table" (see updated Example 2), while also supporting the arbitrary nesting that avoids multiple, or very long, tables.
Sounds good to me! |
Thanks for bringing up this discussion. For someone joining this discussion late, it would have been helpful to have some kind of markup in the post to see the changes. |
Agreed, but I didn't make any changes to the format of the tables in the first post yet (e.g. all columns and their meaning have remained the same so far). I only added explanatory text or re-arranged the order of some things so people can hopefully understand the proposal better. I now "quote" any tables in the old formats (e.g. the old conditions and timecourses tables) in the first post, so it's clear which tables are in the proposed format. |
I like this proposal quite a bit as it would resolve some of the ambiguity in the assignment that we currently have, which requires pretty detailed understanding of SBML semantics. Would be great to also differentiate between I am a bit unhappy about the |
To me, it simply means "input function", but I probably misunderstood your point...
Yes, I left an explanation of these corner cases out of the first post so far 🙈 But this is how I would do it: To your question: although a user specifies Consider the
e.g. the first This is useful for me to be able to concisely define a radiation therapy involving "radiation on" then "radiation off" repeatedly until some end time point. It denests into something that can be easily verified to have the intended timecourse, and I would include a plotting function in
I haven't thought too much about these additional columns that mean things like SBML But I don't see an issue with |
This is because it nests three different concepts, as @dilpath mentioned:
Of course, they can be considered as one concept ("input function", as Dilan suggested), but input functions are so diverse that they almost deserve separate tables. That said, we would not have these problems if we simply did not support nesting. Dilan's denested table above looks very clean and easy to understand, and I don't see big disadvantages. Pandas, DataFrames.jl and even Excel let you create such denested tables easily. File size could become a minor pain if you exceed the GitHub file size limit and have to start using Git LFS at some point, but this should be very rare. On the other hand, the advantages of the denested form seem much larger to me:
(And if anyone wants to have a more concise notation, maybe simply allow both, scalars and start:step:end strings in the time column -> should do 90% of the conciseness with 10% of the confusion) |
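A minimal sketch of how a tool might expand such a time column entry. The function name is hypothetical, and treating the end value as inclusive is an assumption; the thread does not settle that.

```python
import math


def expand_time_spec(spec):
    """Expand '7' to [7.0] and 'start:step:end' to evenly spaced times.

    Assumption: `end` is inclusive when it falls exactly on the grid.
    """
    parts = str(spec).split(":")
    if len(parts) == 1:
        return [float(parts[0])]
    if len(parts) != 3:
        raise ValueError(f"expected a scalar or start:step:end, got {spec!r}")
    start, step, end = (float(part) for part in parts)
    if step <= 0:
        raise ValueError("step must be positive")
    # Small tolerance so that an end value on the grid is not lost to rounding.
    count = max(0, math.floor((end - start) / step + 1e-9) + 1)
    return [start + index * step for index in range(count)]
```

For example, expand_time_spec("0:5:20") gives five time points from 0 to 20, so one row with that string stands in for five denested rows.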
Well, we probably want something a bit more biology oriented. In this thread we often refer to "experiment", but with one-layer nesting, it's probably more appropriate to call respective entries something like "experimental phase" and I am not sure what to call 3-layer nesting or why it would be necessary. But as Paul mentioned, if we already introduce such notation why not make our lives easier and just separate them in individual files?
I am not worried about combinations of assignment, but using this in combination with nesting. |
Agreed, this should be allowed regardless if we adopt the long format; this means users do not need to define conditions, because they can specify model parameters in this experiments table directly.
Thanks, nice points, I can see the benefits. No ambiguity is definitely a big advantage.
Makes sense. I have been thinking in the context of drug regimens for a couple of applications so far. e.g. maybe one only applies a radiation therapy between 9 am and 5 pm (first level timecourse); and only on weekdays (second level timecourse); and only on the first week of the month (third level timecourse); and only for six months (fourth level timecourse); and then measures tumor response at the seventh month. Some of this I made up -- in a current application I am only looking at a third level timecourse, but I think these larger nesting applications are plausible. The nesting allows me to define one "covariate condition" per patient, and define the "radiation therapy" nested timecourse once, and then combine "covariate condition"+"radiation therapy timecourse" to create patient-specific "experiments". I am not sure of a suitable biology term here though, apart from "experiment" or "protocol". This makes for a neat specification for my problem, though.
If we end up only supporting "one-layer nesting" (i.e. equivalent to a conditions and timecourses file), then individual files is completely fine for me. But I think there is a similar complexity cost (in terms of user comprehension) from introducing too many tables, compared to the complexity of understanding nested experiments. Alright, one last attempt to see if I can make this nesting intuitive for new users. What if we say that, if any row in the experiments table is missing a e.g.
Here, If we can agree that this is sensible, then I would try to make a case for nested But if you think it's still too confusing/ambiguous, then I'll open a new issue and we can move forward with long-format versions of the conditions and timecourses tables, since I guess we will agree on those. I can then design a third table for nested timecourses, since this has been requested by a couple of users to implement repeating timecourses like
I don't think |
A bit late to the game, but I agree with Paul here that we should avoid nesting. I find the nesting to be confusing, while the long format is more intuitive and I think overall easier to understand (and |
Well, this did not go the way I expected! But there were some good additional suggestions, thanks for the feedback 🙂 Unless someone says otherwise, I'll make a suggestion for two simpler "conditions" and "timecourses" long-format tables in the next week, with the additional columns like |
Thanks Dilan, that sounds good to me. However, since you mentioned earlier
I think we have three options of how the value could be interpreted:
So instead of |
@matthiaskoenig and others have suggested we make our tables more normalized.
Instead of a timecourses table (#581), I would suggest an experiments table, which merges the conditions and timecourses tables into a single table. The main idea is that conditions/timecourses all describe the "input" function of the dynamical system. Combining this "input function information" into a single table enables some additional operations.
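To illustrate the "input function" view: a timecourse for one model parameter is a piecewise-constant function of time, defined by the rows that change it. The sketch below evaluates such a function; the (time, value) representation and the function name are assumptions made for illustration.

```python
from bisect import bisect_right


def input_value_at(breakpoints, t):
    """Evaluate a piecewise-constant input at time t.

    `breakpoints` is a list of (time, value) pairs sorted by time; the
    value set at a breakpoint holds until the next breakpoint.
    """
    times = [time for time, _ in breakpoints]
    index = bisect_right(times, t) - 1
    if index < 0:
        raise ValueError("t lies before the first breakpoint")
    return breakpoints[index][1]


# A condition is the special case with a single breakpoint at t=0.
condition = [(0.0, 1.0)]
# A timecourse simply has more breakpoints.
timecourse = [(0.0, 1.0), (10.0, 0.0), (20.0, 2.0)]
```

Seen this way, conditions and timecourses differ only in the number of breakpoints, which is the motivation for describing both in one table.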
All "unquoted" tables in this post are in the new proposed format.
Conditions table -> experiments table
The following columns are sufficient to define a "normalized" PEtab v1 conditions table.
Example 1: classic conditions table as experiments table
This PEtab v1 conditions table
is now this PEtab v2 experiments table
This enables additional optional columns, e.g. for units.
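The wide-to-long conversion itself is mechanical. Below is a minimal sketch in plain Python; with pandas, DataFrame.melt performs the same reshaping. The long-format column names (experimentId, inputId, inputValue) and the example parameters are assumptions for illustration; conditionId is the PEtab v1 identifier column.

```python
def wide_to_long(wide_rows, id_column="conditionId"):
    """Convert wide condition rows into one long row per (condition, input)."""
    long_rows = []
    for row in wide_rows:
        for name, value in row.items():
            if name == id_column:
                continue
            long_rows.append(
                {
                    "experimentId": row[id_column],
                    "inputId": name,
                    "inputValue": value,
                }
            )
    return long_rows


# Hypothetical PEtab v1 style conditions table with two parameters.
wide_conditions = [
    {"conditionId": "condition1", "k1": 1.0, "k2": 0.5},
    {"conditionId": "condition2", "k1": 2.0, "k2": 0.5},
]
long_conditions = wide_to_long(wide_conditions)
```

Each long row carries exactly one assignment, which is what makes room for optional per-assignment columns such as units.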
Timecourses table -> experiments table
This experiments table can be extended to support timecourses like #581, with the following optional column:
Example 2: timecourses table as experiments table
This timecourse in the currently-proposed format (#581)
is now specified in these long formats for the conditions and timecourses
normalized_conditions.tsv
normalized_timecourses.tsv
which are specified in the PEtab YAML like
Here, you might notice the trick. The two tables are combined into a single experiments table, i.e., those two long tables, and the joint table below, are equivalent tables in the exact same format -- all are valid tables in the proposed format.
This joint table enables a lot more flexibility, e.g. the following two features.
(1) Timecourses can be specified in terms of model parameters directly, e.g. the above joint table is equivalent to
(2) Nesting is now possible, for easier specification of periodic timecourses.
Nested timecourses
We already agreed that repeating timecourse specification is useful. I would add nested timecourses too, since I already have a use case. Hence the following optional column:
Example 3: Nested and repeating timecourse
This describes an experiment where a switch is toggled on/off every 5 time units until t=100.
- switchOn and switchOff are like PEtab v1 conditions
- switchSequence is like a timecourse as in WIP: Specification of Timecourses #581
- experiment1 is a nested timecourse where switchSequence is repeated every 10 time units to simulate the repeated toggling of the switch, until t=100.
Pros
timecourse1 = 0:condition1
) timecourse table to convert their PEtab v1 problems into v2, and can instead use any condition/timecourse/nested timecourse experimentId in the measurements table. I think this is more intuitive for users.
cond1 with 1000 input variables at just one of its input variables like
i.e., I think this format future-proofs PEtab v2 by supporting many features/operations on conditions. In the end, these can all be "denested" easily into things that look like PEtab v1 conditions applied at specific time points (or, SBML events), so it makes no difference to PEtab-compatible tools.
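As a sketch of what "denesting" could look like for Example 3: the in-memory layout below (dictionaries with assignments, steps, repeat_every and until keys), the parameter name switch, and the denest function are all hypothetical. It also assumes that events at or after the until time are dropped, which the proposal does not specify.

```python
import math

# Hypothetical in-memory form of Example 3; keys and values are assumptions.
EXPERIMENTS = {
    "switchOn": {"assignments": {"switch": 1.0}},
    "switchOff": {"assignments": {"switch": 0.0}},
    "switchSequence": {"steps": [(0.0, "switchOn"), (5.0, "switchOff")]},
    "experiment1": {
        "steps": [(0.0, "switchSequence")],
        "repeat_every": 10.0,
        "until": 100.0,
    },
}


def denest(experiment_id, experiments, offset=0.0):
    """Flatten a nested experiment into (time, parameter, value) events."""
    entry = experiments[experiment_id]
    if "assignments" in entry:
        # Leaf: a PEtab v1 style condition applied at the current offset.
        return [(offset, name, value) for name, value in entry["assignments"].items()]
    starts = [0.0]
    if "repeat_every" in entry:
        period = entry["repeat_every"]
        starts = [k * period for k in range(math.ceil(entry["until"] / period))]
    events = []
    for start in starts:
        for time, child_id in entry["steps"]:
            events.extend(denest(child_id, experiments, offset + start + time))
    if "until" in entry:
        # Assumption: the end time is exclusive for events.
        events = [event for event in events if event[0] < offset + entry["until"]]
    return sorted(events, key=lambda event: event[0])


flat_events = denest("experiment1", EXPERIMENTS)
```

The flat list alternates switch=1.0 and switch=0.0 every 5 time units, which is the form a simulator would consume (for instance as SBML events) regardless of how deeply the user nested the specification.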
Cons