Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimising molecular model parameters #3

Open
johnlees opened this issue Sep 20, 2024 · 6 comments
Open

Optimising molecular model parameters #3

johnlees opened this issue Sep 20, 2024 · 6 comments

Comments

@johnlees
Copy link
Member

Between topology optimisation, want to optimise the rate matrix parameters.

Ultimately we want:

  • GTR model for SNPs
  • Fast binary rate for genes (MGEs etc)
  • Slow binary rate for other accessory genes

Maybe a helpful reference: https://github.com/4ment/phylostan

Also see models here: http://www.iqtree.org/doc/Substitution-Models#dna-models and http://www.iqtree.org/doc/Substitution-Models#binary-and-morphological-models

@jhellewell14
Copy link
Collaborator

I've made a start on this here: 362e3db

Now Trees (the struct) initialise with default rate parameters

impl Default for RateParam {
fn default() -> Self {
RateParam(4.0 / 3.0, 4.0 / 3.0, 4.0 / 3.0, 4.0 / 3.0, 4.0 / 3.0, 4.0 / 3.0, vec![0.25, 0.25, 0.25, 0.25])
}
}
and these rate parameters are used to construct a matrix using the GTR model
pub fn update_rate_matrix_GTR(&mut self) {
let RateParam(a, b, c, d, e, f, pv) = &self.rate_param;
// pv = pivec defined as (piA, piC, piG, piT)
let mut q = na::Matrix4::new(
-(a * pv[1] + b * pv[2] + c * pv[3]),
a * pv[1],
b * pv[2],
c * pv[3],
a * pv[0],
-(a * pv[0] + d * pv[2] + e * pv[3]),
d * pv[2],
e * pv[3],
b * pv[0],
d * pv[1],
-(b * pv[0] + d * pv[1] + f * pv[3]),
f * pv[3],
c * pv[0],
e * pv[1],
f * pv[2],
-(c * pv[0] + e * pv[1] + f * pv[2]));
self.rate_matrix = q;
}
}

Now the update.likelihood method uses the internal tree rate matrix so we can make changes to the rate paramters, update the matrix, and then update the likelihood.

A problem I might need your help with is the phyML test. If you run the test now it passes, even though the call to phyML specifies for it to use the JC69 model

.args(&["-i", input_alignment_phylip.as_str(), "-u", output_tr_file.1.as_str(), "-m", "JC69", "-o", "n", "-a", "1", "-c", "1"])

If you swap the JC69 to GTR then the test fails because the likelihoods are different. Any ideas why? As far as I can tell I have constructed the default rate matrix to be the same as the one that phyML uses.

@johnlees
Copy link
Member Author

So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?

Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon

@johnlees
Copy link
Member Author

A rust thing: I wonder if we might want a RateMatrix trait, which then JC69 and GTR structs have impl of

@jhellewell14
Copy link
Collaborator

So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?

Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon

I think what I don't quite understand is why this should change the likelihood at all. GTR vs JC69 are ways of parameterising the matrix right? So if I choose parameters in each model that produce a matrix with the same values then the likelihood of the tree should be the same?

Before I was explicitly defining a rate matrix q that equals

(-1, 1/3, 1/3, 1/3, 
1/3, -1, 1/3, 1/3,
.... and so on

Then I calculate the tree likelihood as normal by exponentiating this rate matrix. This produce a likelihood that matched with phyML set to JC69 mode so I'd imagine phyML uses this as the default (I think this corresponds to the 1 parameter in JC69 = 4/3 ?).

Now I've added a function so that we no longer define q separately and it produces a matrix in the Tree struct that has the exact same value as q (by having specific default values for the a, b, c, d, e, f etc parameters in the GTR model). So my code would produce the same likelihood.

However, when you set phyML to use GTR it produces a different likelihood to my code. So possibly it just has a different default set of GTR parameters that produce a different rate matrix and therefore different likelihood.

@jhellewell14
Copy link
Collaborator

A rust thing: I wonder if we might want a RateMatrix trait, which then JC69 and GTR structs have impl of

Implemented this here: https://github.com/bacpop/trees_rs/blob/main/src/rate_matrix.rs

@jhellewell14
Copy link
Collaborator

Shout out @mattjohnruss

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants