Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

neatmap / neatsort #146

Open
antagomir opened this issue Feb 6, 2022 · 25 comments
Open

neatmap / neatsort #146

antagomir opened this issue Feb 6, 2022 · 25 comments
Assignees

Comments

@antagomir
Copy link
Member

Moving here miaViz issue #40
Row/col sorting based on angle on MDS projection is implemented in SEtools R package.

The vignette says "By default, rows are sorted not with hierarchical clustering, but from the angle on a MDS plot, which tends to give nicer results than bottom-up hierarchical clustering."

Add an example on how to use this to construct so-called neatmaps.

Cite the neatmap publication as well.

@antagomir
Copy link
Member Author

This could go, for instance, in ch. 15 (visualization) / heatmaps.

@RiboRings
Copy link
Member

After some reading, It seems that sechm does order rows from the angle of the MDS plot by default (just like NeatMap, which is no longer an independent package and it works under the hood of sechm).

So OMA chapter 15.3 is already showing how to do that. According to this sechm tutorial section 2.2, It is also possible to change the ordering criterion back to standard hierarchical clustering, which could be added to OMA. However, I personally find ComplexHeatmap good enough for hierarchical clustering.

What are your thoughts?

@antagomir
Copy link
Member Author

I agree that sechm is good for neatmaps and ComplexHeatmap for hclust.

In fact ch 15.3 could be simplified in this regard.

  • explicitly mention that sechm does neatmap (and cite the publication if relevant)
  • remove the base R hclust example and instead cite ComplexHeatmap, or show an example (avoid duplicating examples)
  • perhaps explain the difference between these very briefly
  • there is also too much housekeeping code for transformations etc.; to check if this part could be simplified somehow. On the other hand using clr+z for few top taxa is practically very useful so I am not sure if it can be shortened. At least data preparation could be done in its own step and shortened if possible so that the example is better organized.

@antagomir
Copy link
Member Author

This topic is now under construction in miaViz PR microbiome/miaViz#128 - in fact I forgot we had this issue.. Let us finalize this OMA issue after the miaViz PR is complete.

@SHillman836 I gather two points from above:

  1. Row/col sorting based on angle on MDS projection is implemented in SEtools R package. Our case is more general, miaViz::getNeatOrder can use PCA, MDS, or any other ordination method. I would cite this SEtools method in the documentation of miaViz::getNeatOrder just for the record. It would be good to make a quick comparison to see if there are any notable differences in performance/outcomes or speed.

  2. We could consider contributing the expanded functionality (miaViz::getNeatOrder) to the SEtools package instead (from miaViz). It could fit there naturally (perhaps even more naturally than to miaViz) and support collaborative ecosystem development. Downside is that we might need to comply with the naming conventions in that package.

  3. The sechm package (see above) supports SE objects and it has a similar radial angle method already implemented. I still think it is useful to have our own method because sechm does this directly as part of heatmap plotting but we have separated these functions and that has added value. In any case it would be useful to mention in the getNeatOrder function documentation (the @details slot or or in the function @example comments) that the sechm implements a similar method and includes support for SE. In OMA, we could show both, or just the miaViz implementation (sechm has other uses that will be demonstrated anyway). It would be good to have a brief look how these compare: do we get similar heatmaps with both miaViz::getNeatOrder and sechm?

@SHillman836
Copy link
Contributor

SHillman836 commented Jun 17, 2024

OK thanks - I had a few follow up questions

  1. To your first point, where exactly is this SETools method? The SETools docs are pretty limited - https://www.bioconductor.org/packages/release/bioc/vignettes/SEtools/inst/doc/SEtools.html - and I might be wrong but I thought SETools got amalgamated into sechm.... and that MDS sorting method was only in sechm

  2. So I've done some testing of how the sechm sorted heatmaps differ from the getNeatOrder sorted heatmaps.

When you perform an MDS on the z-transformed assay data, then order with the getNeatOrder function, then plot the ordered original z-transformed data, with the scaling parameter in sechm as FALSE, you get this -
Screenshot 2024-06-18 at 13 37 25

Then when you sort via sechm directly, passing in the relative abundance data, using the sechm scaling parameter, you get this -
Screenshot 2024-06-18 at 13 38 05

So there is some difference.

@antagomir
Copy link
Member Author

antagomir commented Jun 18, 2024

  1. If neatsorting is not available in SEtools then we can forget about that

  2. Could you show the code that does this?

@SHillman836
Copy link
Contributor

SHillman836 commented Jun 19, 2024

Code for heatmap 1, with MDS is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

tse <- transformAssay(tse, assay.type = "counts", method = "relabundance", MARGIN = "samples", name = "relabundance")

tse <- transformAssay(tse, assay.type = "relabundance", method = "z", MARGIN = "features", name = "z")

tse <- runMDS(tse,
              FUN = vegan::vegdist,
              method = "bray",
              assay.type = "z",
              name = "MDS_bray")

mds_coords <- reducedDim(tse, "MDS_bray")

sorted_order <- getNeatOrder(mds_coords, dimensions = c(1, 2), centering.method = "mean")
tse <- tse[, sorted_order]

features <- rownames(assay(tse, "z"))
sechm_plot <- sechm(tse, assayName = "z", features=features, do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

print(sechm_plot_custom_sorting)

Code for heatmap 2, direct via sechm is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

tse <- transformAssay(tse, assay.type = "counts", method = "relabundance", MARGIN = "samples", name = "relabundance")

tse <- transformAssay(tse, assay.type = "relabundance", method = "z", MARGIN = "features", name = "z")

features <- rownames(assay(tse, "relabundance"))
sechm_plot <- sechm(tse, assayName = "relabundance", features = features, do.scale = TRUE, sortRowsOn = NULL, cluster_rows = FALSE)

print(sechm_plot)

@SHillman836
Copy link
Contributor

(Though I'm adding your changes over params and method calling now)

@antagomir
Copy link
Member Author

antagomir commented Jun 19, 2024

For final examples in the R documentation it will better to have just once the parts in data preparation that are common for both methods (agglom, transf).

Code 1: I would simplify this:

reducedDim(tse, "MDS_bray") <- runMDS(tse,
              FUN = vegan::vegdist,
              method = "bray",
              assay.type = "z",
              name = "MDS_bray")

ord <- getNeatOrder(reducedDim(tse, "MDS_bray"), centering.method = "mean")

sechm_plot <- sechm(tse[, ord], assayName = "z", features=rownames(tse), do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

Some comments could be added to explain the code.

@TuomasBorman
Copy link
Contributor

Use calculateMDS https://rdrr.io/bioc/scater/man/runMDS.html

@antagomir
Copy link
Member Author

antagomir commented Jun 19, 2024

Usually we would recommend the following transformations for this kind of heatmap:

  • clr (or log10p) for samples
  • z (or now, "scale") for features

The reason: clr will logarithmize the data (-> becomes more normally distributed -> nice for symmetric visualization) and aims to remove the compositionality bias; then scaling on features will put center on zero and scale variance to unit for all features -> more direct visual comparability

-> Check how the plots look after this.

@SHillman836
Copy link
Contributor

SHillman836 commented Jun 19, 2024

Ah OK - just to clarify sorry should've made it clearer, so that code wasn't code to be put into the documentation or published anywhere, that was just me exploring differences between sechm and getNeatOrder heatmaps. Which is why it's abit messy and there are no comments - it was just to look at differences in heatmaps. In fact, my code to be put into the documentation uses PCA not MDS - the reason I had to do MDS for this was cause sechm only creates MDS heatmaps.

Ok yes, noted about clr. will retest the plots with this. Would you use clr not rel-abundance for PCAs too? Yeah it's not pushed yet but my updated miaViz method example uses scale for the transformAssay method

@antagomir
Copy link
Member Author

Yep I realized afterwards. I think those transformation updates might help

@antagomir
Copy link
Member Author

CLR can be done for both counts or relabundances.

If you use CLR + Euclidean distance, then PCA would be the common option. Otherwise, relabundance + Bray-Curtis -> MDS / NMDS

@SHillman836
Copy link
Contributor

SHillman836 commented Jun 19, 2024

Ah ok, thanks.

So for PCA -

clr for samples -> z-transformation for features-> Euclidean distance/PCA

Then for MDS -

relabundances -> clr for samples -> z transformations for features -> bray-curtis -> MDS

Let me know if I'm missing something

@antagomir
Copy link
Member Author

yes ok - although clr you can do directly on counts as well, I think it shouldnt drastically change the outcome (or ideally, you could check that). If it doesn't change the results then I would prefer the simpler sequence (counts -> clr -> z -> ...

@SHillman836
Copy link
Contributor

OK so retested and there are definitely some differences.

So when ordering and plotting with sechm you get this heatmap -

Screenshot 2024-06-20 at 13 31 33

The code for this is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

# Plot with sechm
features <- rownames(assay(tse, "z"))
sechm_plot <- sechm(tse, assayName = "z", features = features, do.scale = TRUE, sortRowsOn = NULL, cluster_rows = FALSE)

## Display the plot
print(sechm_plot)

Although, when you create the MDS with mia, apply the getNeatOrder method, then plot that with sechm you get something quite different

Screenshot 2024-06-20 at 13 35 30

The code for this is -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Run MDS on the dataset with Bray-Curtis distances using relative abundance data
reducedDim(tse, "MDS_bray") <- calculateMDS(tse,
                           FUN = vegan::vegdist,
                           method = "bray",
                           assay.type = "z")

## Sort by radial theta using MDS coordinates
sorted_order <- getNeatOrder(reducedDim(tse, "MDS_bray")[, c(1,2)], centering = "mean")
tse <- tse[, sorted_order]

## Create the heatmap with sechm whilst retaining this radial theta ordering
sechm_plot <- sechm(tse, assayName = "z", features=rownames(assay(tse, "z")), do.scale=FALSE, cluster_rows=FALSE, sortRowsOn = NULL)

## Display the plot
print(sechm_plot_custom_sorting)

@antagomir
Copy link
Member Author

Any chance to troubleshoot where the difference appears exactly? is it the plotting command or some of the preceding steps (ordination, sorting, color palette..)

@antagomir
Copy link
Member Author

@SHillman836 is this issue/PR ready or something still pending?

@SHillman836
Copy link
Contributor

@antagomir ah sorry I actually completely forgot about this one! My bad, just got so caught up in all the other stuff. Will add this to the to do list for this week.

@antagomir
Copy link
Member Author

Great! Let's make sure that there are no near-finished pending tasks..!

@SHillman836
Copy link
Contributor

SHillman836 commented Jul 26, 2024

Screenshot 2024-07-26 at 14 04 32

Image on the left is sorting with sechm directly -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Verify the assay data before plotting
test_unsorted <- assay(tse, "z")
View(test_unsorted)

# Plot with sechm using its default MDS angle sorting
sechm_plot_mds_angle_sorting <- sechm(tse, assayName = "z", features = rownames(assay(tse, "z")), 
                                      do.scale = FALSE, cluster_rows = FALSE, breaks = 0.985)

## Display the plot
print(sechm_plot_mds_angle_sorting)

image on right is sorting with getneatorder -

library(mia)
library(scater)
library(sechm)
data(peerj13075)

## Group data by taxonomic order
tse <- agglomerateByRank(peerj13075, rank = "order", onRankOnly = TRUE)

## Add a pseudocount to the counts data
assay(tse, "counts") <- assay(tse, "counts") + 1

## Transform the samples into relative abundances
tse <- transformAssay(tse, assay.type = "counts", method = "clr", MARGIN = "samples", name = "clr")

## Transform the features (taxa) into zero mean, unit variance (z transformation)
tse <- transformAssay(tse, assay.type = "clr", method = "z", MARGIN = "features", name = "z")

## Run MDS on the dataset with Bray-Curtis distances using relative abundance data
reducedDim(tse, "MDS_bray") <- calculateMDS(tse, FUN = vegan::vegdist, method = "bray", assay.type = "z")

## Sort by radial theta using MDS coordinates
sorted_order <- getNeatOrder(reducedDim(tse, "MDS_bray")[, c(1, 2)], centering = "mean")
tse <- tse[, sorted_order]

## Verify sorted assay data
test_sorted <- assay(tse, "z")
View(test_sorted)

## Create the heatmap with sechm whilst retaining this radial theta ordering
sechm_plot_custom_sorting <- sechm(tse, assayName = "z", features = rownames(assay(tse, "z")), cluster_rows = FALSE, 
                                   do.scale = FALSE, sortRowsOn = NULL, breaks = 0.985)

## Display the plot
print(sechm_plot_custom_sorting)

it's the sortRowsOn parameter that makes the difference. And it's because sechms MDS sorting affects rows. They use cols to alter the row order. Whereas the getNeatOrder miaViz method sorts samples, not rows. so when you apply this and disable the sechm sorting, you get something different. Sechm MDS sorts rows. getNeatOrder MDS sorts cols.

@antagomir
Copy link
Member Author

It should be possible to enable sorting both rows and columns, and the user should be able to decide if they sort neither, one, or both.

@SHillman836
Copy link
Contributor

SHillman836 commented Jul 26, 2024

In terms of the user deciding for what - this issue was just about differences between sechm and neatOrder. With neatOrder you can just transpose the matrix to sort the rows I guess. But the neat order method has been merged in, and just takes in ordinated data to sort. It's very specific. Then in terms of sechm - you can't sort cols, only rows.

@antagomir
Copy link
Member Author

Great - a quickly checked the code and noticed the following at least.

  1. Your example code for getNearOrder is using "bray" method for dissimilarity calculations. This method does not work for negative values, so I am not sure how it can be applied for z-transformed values, also a warning is thrown.

  2. The sechm package calls seriate::get_order for the sorting get_order(seriate(dist(y), method=method)) on line 53 here. And this uses base R function dist with default value, the default is euclidean distance. Therefore your example code seems to rely on euclidean distance in the sechm example and bray dissimilarity on the getNeatSort example. This is expected to cause notable differences.

  3. If we just want to compare these two methods, then can we get the comparable ordering if we just run getNeatSort for the transposed data? Then it should we an advantage of getNeatSort that user can choose to run it for rows, cols, or both, whereas in sechm only rows are available?

  4. However.. sechm is calling the seriation CRAN package which provides many different linear sorting methods, not just MDS. Noticing this, we might like to use this directly because there is larger variety of optimized and actively maintained methods for the same purpose.

Therefore my suggestion for this issue is that we should experiment a bit with seriation and see if that provides the simple solution for our heatmap sorting needs. If yes, then I would remove getNeatOrder function from mia (and apologize for the extra hassle with that) and add examples on heatmap sorting with seriation in OMA.

@antagomir antagomir assigned antagomir and Daenarys8 and unassigned RiboRings Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

5 participants