Working with continuous expression data? #94

stevenagl12 · 2024-02-19T20:01:13Z

I have a potentially dumb question. As I understand it, we need to discretize continuous biological data, such as gene expression or cytometry data, before we can use it with this package. However, the built-in bn.discretize function takes an already built graph as input. With our data, we can't infer which nodes and edges exist, so we have no graph to start from. How can we use this package with such continuous data? As I understand it, the R bnlearn library comes with iamb and the hartemink discretization option, but I don't see those in this package.

erdogant · 2024-02-21T15:30:23Z

When you only have data and want to start without a structure, try structure learning. However, the methods in bnlearn do require the data to be discrete.

Two suggestions for how to approach this:

1. Discretize your data based on your domain knowledge and/or in combination with other statistics. For example, for your gene expression profiles you could do a t-test against a control group and set a threshold (alpha of 0.05), with or without multiple-testing correction. This would return three states for each gene (up, baseline, down). If you don't have a control group, try fitting the data to a theoretical distribution (check out distfit) and make a cut at the 95% CI or so. Do this on both sides of the distribution and you would again have three states per gene. This comes close to constraint-based structure learning: https://erdogant.github.io/bnlearn/pages/html/Structure%20learning.html#constraint-based
2. Try using the built-in functionality of bnlearn to automatically discretize and create states based on the continuous expression profiles. This is again a starting point towards structure learning. See the documentation for more details.

https://erdogant.github.io/bnlearn/pages/html/Continuous%20Data.html
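A minimal sketch of the first suggestion, in case it helps as a starting point. It swaps the fitted theoretical distribution for plain empirical quantiles per gene, which is a simplification and not what distfit does. The function name `three_state_discretize`, the gene names, and the simulated values are all made up for illustration.

```python
import numpy as np
import pandas as pd


def three_state_discretize(expr, lower_q=0.025, upper_q=0.975):
    """Map each continuous column to 'down', 'baseline' or 'up'.

    Assumption: cut points are the empirical quantiles of each column,
    used here as a stand-in for a fitted theoretical distribution.
    """
    states = pd.DataFrame(index=expr.index)
    for gene in expr.columns:
        lo, hi = expr[gene].quantile([lower_q, upper_q])
        states[gene] = np.select(
            [expr[gene] < lo, expr[gene] > hi],  # tails of the distribution
            ["down", "up"],
            default="baseline",                  # everything in between
        )
    return states


# Simulated expression values (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
expr = pd.DataFrame(rng.normal(size=(200, 3)), columns=["geneA", "geneB", "geneC"])

states = three_state_discretize(expr)
print(states["geneA"].value_counts())
```

The resulting DataFrame holds only categorical states, so it should be usable as input for structure learning (for example `bn.structure_learning.fit(states)`), without needing a DAG up front.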

There are no methods like iamb. However, check out what's available in pgmpy. If there is something that could help you, I am open to merging commits.

Asking questions makes you smart btw. Keep it up 👍🏻

stevenagl12 · 2024-02-21T15:34:54Z

So, while I understand the first part, I was wondering about the second. The discretize function takes a DAG as an argument. In the example, this DAG is created from prior knowledge of the connections between the variables. How do we create one without knowing which variables might be connected?


erdogant · 2024-02-26T23:06:18Z

You are right. The second part does need a DAG at start. Unfortunately there is no other implementation yet.

akshatakarjun · 2024-07-16T21:38:26Z

Hi,

By continuous biological data, did you mean continuous data such as arbitrary numbers (for example 103.2, 102, 99, 2.5, etc.) or time-series data?
If it isn't either of these, could you please explain what the data you mentioned looks like?

Also, if it is different, is this package applicable for continuous data like the kind I mentioned above?

stevenagl12 · 2024-07-16T22:36:02Z

I was talking about various numbers of RNAseq fold changes.

erdogant · 2024-07-22T16:54:15Z

If you would like to know some comparison with other causal packages, you can read it in my blog over here. The last time I checked, only CausalImpact can model continuous values but that is for time series data. So, it is not applicable when you are using RNAseq data.

Loominarty · 2024-07-25T07:25:22Z

I also have a dumb question:

I have a dataset that mixes continuous and discrete data. I noticed the bn.discretize function takes a lot of time (my dataset is 11000 points roughly, 9 columns, among which 4 are continuous).
Is it possible to discretize outside of bnlearn, or is that not compatible?

I tried using the pandas functions to circumvent the issue and generate Interval Indexes in my dataset but with very little success.

akshatakarjun · 2024-07-25T11:30:08Z

Unsure what kind of continuous data you have, but if possible, you can manually put the values into discrete ranges. For example, if a feature called BloodPressure takes various values, we know which values of BP are considered normal, high, and low. You can loop over the rows with an if condition: if the value falls in a given range, replace that row's value with the categorical value you want.

Just a thought!!
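One possible way to do this without writing the loop by hand is `pd.cut` with explicit labels. This is only a sketch: the cut-offs below are placeholders, not clinical thresholds, and the column name `BloodPressure_state` is made up. Passing `labels` also avoids ending up with Interval objects in the column.

```python
import pandas as pd

df = pd.DataFrame({"BloodPressure": [85, 118, 121, 135, 160]})

# Hypothetical cut-offs for illustration only; use thresholds from your own domain.
bins = [-float("inf"), 90, 120, float("inf")]
labels = ["low", "normal", "high"]

# right=False makes the bins [-inf, 90), [90, 120), [120, inf).
# astype(str) turns the categorical result into plain strings.
df["BloodPressure_state"] = pd.cut(
    df["BloodPressure"], bins=bins, labels=labels, right=False
).astype(str)

print(df)
```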

Loominarty · 2024-07-26T01:32:36Z

Hi @akshatakarjun ,

I found something that works alright, but is not very convenient in terms of user comfort. I have discretized outside of the library and used bn.df2onehot to encode the indexes into integers.
Then I just translate my new incoming data into one of these numbers.
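For anyone trying the same thing, here is a rough sketch of one way to do it with pandas alone. It is not necessarily what I ended up with: the quartile binning, the column names, and the helper `encode_new` are assumptions for illustration. The idea is to learn the bin edges once, keep them, and reuse them for incoming data so the integer codes stay consistent.

```python
import numpy as np
import pandas as pd

# Simulated training column (hypothetical data).
rng = np.random.default_rng(1)
train = pd.DataFrame({"x": rng.normal(size=1000)})

# Learn quartile edges once; labels=False returns integer codes 0..3
# and retbins=True also returns the edges that were used.
train["x_bin"], edges = pd.qcut(train["x"], q=4, labels=False, retbins=True)

# Widen the outer edges so values outside the training range still get a bin.
edges[0], edges[-1] = -np.inf, np.inf


def encode_new(values, edges):
    """Translate incoming continuous values into the stored integer bins."""
    return pd.cut(pd.Series(values), bins=edges, labels=False)


new_codes = encode_new([-10.0, 0.0, 10.0], edges)
print(new_codes.tolist())
```

Storing `edges` alongside the model is what makes the translation step repeatable; the integer columns can then be used in place of the original continuous ones.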

erdogant · 2024-08-02T08:00:26Z

You can indeed manipulate your data as you wish. The df2onehot was included in bnlearn to provide one of the steps from start-to-results. So you are right, it brings some comfort but at the same time it is generally slow.

erdogant · 2024-10-08T18:12:58Z

I implemented LiNGAM methods (Direct and ICA) to model datasets with continuous variables (without discretizing!). See docs here.

Update to the latest version with:

pip install -U bnlearn

erdogant · 2024-10-20T10:03:45Z

This functionality is added with the last update! Re-open if needed.

erdogant mentioned this issue Jul 22, 2024

What kind of data type does the library handle? #101

Closed

erdogant closed this as completed Oct 20, 2024
