Working with continuous expression data? #94
Comments
When you only have data and want to start without a structure, try structure learning. However, the methods in bnlearn do require the data to be discrete. Two suggestions on how to approach this:
https://erdogant.github.io/bnlearn/pages/html/Continuous%20Data.html There are no methods like iamb. However, check out what is available in pgmpy. If there is something that could help you, I am open to merging commits. Asking questions makes you smart btw. Keep it up 👍🏻 |
So, while I understand the first part, I was wondering about the second.
The discretize function takes a DAG as an argument. The DAG in
the example is built from prior knowledge of the connections between the variables.
How do we create one without knowing which variables might be connected?
…On Wed, Feb 21, 2024, 10:30 AM Erdogan ***@***.***> wrote:
When you only have data and want to start without a structure, try
structure learning. However, the methods in bnlearn do require the data to be
discrete.
Two suggestions how to approach this:
1. Discretize your data based on your domain knowledge and/or in
combination with other statistics. For example, for your gene expression
profiles you could do a t-test against a control group and set a threshold
(alpha of 0.05), with or without multiple-testing correction. This would return
three states for each gene (up, baseline, down). If you don't have a control
group, try fitting the data to a theoretical distribution and make
a cut at the 95% CI or so. Do both sides of the distribution and you would
again have three states per gene.
2. Try using the built-in functionality of bnlearn to automatically
discretize and create states based on the continuous expression profiles.
This is again a starting point towards structure learning. See the
documentation for more details.
https://erdogant.github.io/bnlearn/pages/html/Continuous%20Data.html
Asking questions makes you smart btw. Keep it up 👍🏻
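For anyone reading along, the "fit a distribution and cut both tails" idea from suggestion 1 could look roughly like the sketch below. It is only an illustration, not bnlearn functionality: it assumes the values for one gene are approximately normal, uses only the Python standard library, and the function name `three_state` and the sample values are made up for this example.

```python
from statistics import NormalDist, fmean, stdev

def three_state(values, alpha=0.05):
    """Label each value 'down', 'baseline', or 'up' using two-sided normal cut-offs."""
    # Assumption: the values are roughly normal, so a fitted normal is a fair reference.
    dist = NormalDist(mu=fmean(values), sigma=stdev(values))
    lower = dist.inv_cdf(alpha / 2)        # cut-off for the lower tail
    upper = dist.inv_cdf(1 - alpha / 2)    # cut-off for the upper tail
    states = []
    for v in values:
        if v < lower:
            states.append("down")
        elif v > upper:
            states.append("up")
        else:
            states.append("baseline")
    return states

# Hypothetical log fold changes for a single gene across 20 samples.
log_fc = [0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.3, -0.3, 0.0,
          0.2, -0.2, 0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 5.0, -5.0]
states = three_state(log_fc)
print(states)
```

Note that extreme values inflate the fitted standard deviation, so with few samples this kind of cut is conservative. A t-test against a control group, as described in suggestion 1, is the better option when a control group exists.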
|
You are right. The second part does need a DAG at the start. Unfortunately, there is no other implementation yet. |
Hi, by continuous biological data, did you mean continuous data such as arbitrary numbers (e.g. 103.2, 102, 99, 2.5) or time-series data? Also, if the two are different, is this package applicable to continuous data like the kind I mentioned above? |
I was talking about various numbers of RNAseq fold changes.
|
If you would like a comparison with other causal packages, you can read about it on my blog over here. The last time I checked, only CausalImpact could model continuous values, but that is for time-series data. So it is not applicable when you are using RNAseq data. |
I also have a dumb question: I have a dataset that mixes continuous and discrete data. I noticed the bn.discretize function takes a lot of time (my dataset is 11000 points roughly, 9 columns, among which 4 are continuous). I tried using the pandas functions to circumvent the issue and generate Interval Indexes in my dataset but with very little success. |
Unsure what kind of continuous data you have, but if possible, you can manually put the values into discrete ranges. For example, if a feature called BloodPressure takes various values, we know which values of BP are considered normal, high, and low. You can use an if statement: if the value falls in a given range, replace it with the categorical value you want. Just a thought!! |
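A minimal sketch of that manual approach, in plain Python. The function name `bp_category` and the cut-offs of 90 and 130 are placeholders chosen for illustration only, not clinical guidance; substitute whatever ranges your domain knowledge supports.

```python
def bp_category(systolic, low=90, high=130):
    """Map a systolic reading to 'low', 'normal', or 'high'.

    The default cut-offs are illustrative placeholders, not clinical guidance.
    """
    if systolic < low:
        return "low"
    if systolic >= high:
        return "high"
    return "normal"

# Hypothetical readings standing in for a BloodPressure column.
readings = [85, 118, 142, 90, 129.5]
categories = [bp_category(r) for r in readings]
print(categories)  # ['low', 'normal', 'high', 'normal', 'normal']
```

If the data lives in a pandas DataFrame, the same function can be applied to a column with `df["BloodPressure"].map(bp_category)`.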
Hi @akshatakarjun, I found something that works alright but is not very convenient for the user. I discretized the data outside of the library and used bn.df2onehot to encode the indexes into integers. |
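For reference, discretizing outside the library and going straight to integer codes (rather than interval objects) might look like the sketch below. This is not the code used above and not part of bnlearn: it is an assumed equal-frequency binning written with the standard library only, and `quantile_codes` and the sample column are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_codes(values, n_bins=3):
    """Return an integer bin code (0 .. n_bins - 1) per value, using equal-frequency bins."""
    # quantiles() returns the n_bins - 1 interior cut points of the data.
    edges = quantiles(values, n=n_bins)
    # bisect_right counts the cut points that are <= the value, which is its bin index.
    return [bisect_right(edges, v) for v in values]

# Hypothetical continuous column; discrete columns can be left untouched.
column = [2.5, 99.0, 102.0, 103.2, 10.0, 55.0, 71.3, 8.8, 120.4]
codes = quantile_codes(column)
print(codes)
```

In pandas, `pd.qcut(column, q=3, labels=False)` and `pd.cut(column, bins, labels=False)` likewise return integer bin codes instead of intervals, which may sidestep the Interval Index trouble mentioned earlier in the thread.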
You can indeed manipulate your data as you wish. The df2onehot was included in bnlearn to provide one of the steps from start-to-results. So you are right, it brings some comfort but at the same time it is generally slow. |
I implemented the LiNGAM methods (Direct and ICA) to model datasets with continuous variables (without discretizing!). See the docs here. Update to the latest version with:
|
This functionality is added with the last update! Re-open if needed. |
I have a potentially dumb question. As I understand it, we need to discretize the data to work with this package on continuous biological data, such as gene expression or cytometry data. The built-in bn.discretize function, however, takes a built graph as an input. With our data, we can't infer which nodes and edges we have, so we cannot even start with a random graph. How can we use this package with such continuous data? As I understand it, the R bnlearn library comes with the iamb and hartemink options, but I don't see those in this package.