UCSC Xena Roadmap
UCSC Xena is a tool for biologists to view and explore both public and private cancer genomics data using a web browser. If you have questions or feedback about this roadmap, please submit an issue (https://github.com/ucscXena/ucsc-xena-client/issues) on GitHub. Please note: this roadmap is subject to change.
Last updated: December 19, 2023
Our vision is to make the Xena Browser a single website for visualizing both single-cell and bulk sample data. Production Xena currently supports bulk sample data, so supporting single-cell data is a priority. This will require significant performance upgrades for viewing matrix data; our goal is to handle at least 1 million cells per dataset.
- Use compressed data representation (finished)
- Use binary data transfer (finished)
- Use radix sort (finished)
- Use web assembly (finished)
- Use sparse matrix data representation in both hub and browser (finished)
- Upgrade development hubs with new hub software (finished)
- Browser backward compatible with existing hubs (finished)
- Upgrade production public hubs with new hub software (v2)
- Release new installable Xena hub software (v2)
- Interview users to create designs (finished)
- Finish preliminary designs for single cell visualizations with user feedback (finished)
- Add ability to visualize spatial genomics data (finished)
- Add ability to visualize image data (finished)
- Add ability to visualize UMAP/tSNE data (finished)
- Add tests for the new single-cell branch client code (ongoing)
- Add sample/cell search and filter, subgroup, chart, pdf, and bookmark functionalities to the single-cell branch (ongoing)
- Upgrade production Xena Browser with single-cell branch client (v2)
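Single-cell expression matrices are mostly zeros, which is why a sparse representation matters at the 1-million-cell scale. The sketch below shows compressed sparse row (CSR) storage, one common sparse layout, in plain Python. It is a generic illustration of the technique under that assumption, not the encoding Xena's hub or browser actually uses; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CSRMatrix:
    """Compressed sparse row (CSR) storage for a genes-by-cells matrix.

    Only non-zero entries are stored:
      data[k]    -> value of the k-th non-zero entry
      indices[k] -> column (cell) index of data[k]
      indptr[r]  -> offset where row r starts; row r occupies
                    data[indptr[r]:indptr[r + 1]]
    """

    n_cols: int
    data: List[float]
    indices: List[int]
    indptr: List[int]

    @classmethod
    def from_dense(cls, rows: Sequence[Sequence[float]]) -> "CSRMatrix":
        """Build a CSR matrix by scanning a dense row-major matrix once."""
        data: List[float] = []
        indices: List[int] = []
        indptr: List[int] = [0]
        n_cols = len(rows[0]) if rows else 0
        for row in rows:
            for col, value in enumerate(row):
                if value != 0:
                    data.append(float(value))
                    indices.append(col)
            indptr.append(len(data))  # end of this row = start of the next
        return cls(n_cols, data, indices, indptr)

    def row(self, r: int) -> List[float]:
        """Expand one row (e.g. one gene across all cells) back to dense form."""
        dense = [0.0] * self.n_cols
        for k in range(self.indptr[r], self.indptr[r + 1]):
            dense[self.indices[k]] = self.data[k]
        return dense


# Toy 3-gene x 4-cell count matrix: 2 non-zeros out of 12 entries.
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
m = CSRMatrix.from_dense(counts)
print(m.data, m.indices, m.indptr)  # [3.0, 1.0] [2, 0] [0, 1, 2, 2]
print(m.row(0))  # [0.0, 0.0, 3.0, 0.0]
```

Storage grows with the number of non-zero values rather than with genes times cells, and reading one gene's row touches only that row's slice of `data`.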
- Update data from HCA Data Portal (ongoing)
  - Supported data types: gene expression estimates, images, spatially resolved gene expression data, metadata
- Update data from HTAN Data Portal (ongoing)
  - Supported data types: gene expression estimates, images, spatially resolved gene expression data, metadata
- Add cancer scRNA-seq data from publications (ongoing)
We plan to build utilities that directly connect Xena with popular genomics data analysis tools. Right now, visualizing analysis results is tedious because users must 1) reformat their data, 2) start a private Xena hub, and 3) load the data into the hub before they can view their results. Steps 2 and 3 were recently simplified (see use case: https://ucsc-xena.gitbook.io/project/local-xena-hub), but step 1, converting the output of analysis tools into Xena-compatible data files, remains difficult for non-computational users. We aim to ease this problem by directly connecting Xena with a few popular analysis tools.
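Much of step 1 is a reshaping problem. Tools such as Scanpy hold expression as a cells-by-genes matrix, while a Xena genomic matrix file is, as we understand the format, tab-delimited with genes in rows and samples (here, cells) in columns. The sketch below performs that transpose in plain Python. The function name and the `sample` label in the top-left header cell are illustrative assumptions, not a documented Xena converter.

```python
import csv
import io
from typing import Sequence


def to_xena_matrix(
    cell_ids: Sequence[str],
    gene_ids: Sequence[str],
    cells_by_genes: Sequence[Sequence[float]],
    corner_label: str = "sample",  # assumed header for the ID column
) -> str:
    """Transpose a cells-by-genes matrix into genes-by-cells, tab-delimited text."""
    if len(cells_by_genes) != len(cell_ids):
        raise ValueError("expected one row per cell")
    if any(len(row) != len(gene_ids) for row in cells_by_genes):
        raise ValueError("expected one value per gene in every row")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([corner_label, *cell_ids])  # header: cell/sample IDs
    for g, gene in enumerate(gene_ids):
        # Column g of the input becomes one output row for that gene.
        writer.writerow([gene, *(row[g] for row in cells_by_genes)])
    return out.getvalue()


tsv = to_xena_matrix(
    cell_ids=["cell1", "cell2"],
    gene_ids=["TP53", "EGFR", "MYC"],
    cells_by_genes=[[0.0, 2.5, 1.0], [3.0, 0.0, 0.5]],
)
print(tsv)
```

A real converter would also need to handle sparse input and cell-level metadata, which in Xena go into a separate phenotype file.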
We will start by focusing on integration with scRNA-seq analysis tools:
- Scanpy (Wolf et al. 2018) (finished)
- Paste (Zeira et al. 2022) (finished)
- scBind (Dou et al. 2022) (finished)
- Seurat (Butler et al. 2018)
- Cell Ranger (10x Genomics)
We will also focus on integration with transcriptome analysis tools:
- Salmon (Patro et al. 2017)
- Sailfish (Patro, Mount, and Kingsford 2014)
- kallisto (Bray et al. 2016)
- RSEM (Li and Dewey 2011)
- HTSeq (Anders 2015)
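For bulk transcriptome quantifiers the reshaping is different: each sample usually produces its own output table, and the per-sample value columns must be merged into one matrix. A minimal sketch, assuming Salmon's `quant.sf` columns (`Name`, `TPM`) by default, or kallisto's `abundance.tsv` columns (`target_id`, `tpm`) when passed explicitly. The function name and the `sample` header label are hypothetical, and the resulting IDs are transcript-level unless they are aggregated to genes beforehand.

```python
import csv
import io
from typing import Dict, List


def merge_quant_tables(
    tables: Dict[str, str], id_col: str = "Name", value_col: str = "TPM"
) -> str:
    """Merge per-sample quantification tables into one features-by-samples TSV.

    `tables` maps sample ID -> tab-delimited text of that sample's output file.
    """
    samples = list(tables)
    values: Dict[str, Dict[str, str]] = {}
    order: List[str] = []  # first-seen order of feature IDs
    for sample in samples:
        reader = csv.DictReader(io.StringIO(tables[sample]), delimiter="\t")
        for rec in reader:
            feature = rec[id_col]
            if feature not in values:
                values[feature] = {}
                order.append(feature)
            values[feature][sample] = rec[value_col]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", *samples])  # corner label is an assumption
    for feature in order:
        # Features absent from a sample are written as empty cells, not zeros.
        writer.writerow([feature, *(values[feature].get(s, "") for s in samples)])
    return out.getvalue()


# Toy Salmon-style tables with made-up transcript IDs.
quant_a = ("Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
           "ENST01\t1000\t850.0\t12.5\t40\n"
           "ENST02\t500\t350.0\t0\t0\n")
quant_b = ("Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
           "ENST01\t1000\t850.0\t7.25\t22\n")
merged = merge_quant_tables({"S1": quant_a, "S2": quant_b})
print(merged)
```

Values are copied through as text, so the numbers in the merged file match each tool's output exactly.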
These usability improvements and expansions are in response to user research and feedback.
Through user feedback and usability studies, we have identified improvements to our Chart user interface, including dot plots and ridge plots, that will help users explore data with many categories. Wireframes are still in development.
Related GitHub issues:
- https://github.com/ucscXena/ucsc-xena-client/issues/444
- https://github.com/ucscXena/ucsc-xena-client/issues/558
- https://github.com/ucscXena/ucsc-xena-client/issues/556
- https://github.com/ucscXena/ucsc-xena-client/issues/557
- https://github.com/ucscXena/ucsc-xena-client/issues/555
- https://github.com/ucscXena/ucsc-xena-client/issues/377
- https://github.com/ucscXena/ucsc-xena-client/issues/389
- https://github.com/ucscXena/ucsc-xena-client/issues/440
- https://github.com/ucscXena/ucsc-xena-client/issues/193
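Dot plots summarize many categories at once: for each group of cells and each gene, dot size typically encodes the fraction of cells expressing the gene and color encodes mean expression, which is how the dot plots in tools such as Scanpy are usually read. The sketch below computes those two numbers for one gene. Since the wireframes are still in development, this is an assumption about what a Xena dot plot might show; the function name and the `threshold` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dot_plot_stats(
    groups: Sequence[str], expression: Sequence[float], threshold: float = 0.0
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (fraction_expressing, mean_expression)} for one gene."""
    if len(groups) != len(expression):
        raise ValueError("need one group label per cell")
    by_group: Dict[str, List[float]] = defaultdict(list)
    for label, value in zip(groups, expression):
        by_group[label].append(value)
    stats = {}
    for label, values in by_group.items():
        expressing = sum(1 for v in values if v > threshold)
        # Dot size <- fraction expressing; dot color <- mean over all cells.
        stats[label] = (expressing / len(values), sum(values) / len(values))
    return stats


stats = dot_plot_stats(
    groups=["T cell", "T cell", "B cell", "B cell", "B cell", "NK"],
    expression=[2.0, 0.0, 0.0, 0.0, 3.0, 1.5],
)
print(stats["T cell"], stats["NK"])  # (0.5, 1.0) (1.0, 1.5)
```

Because each group reduces to two numbers per gene, the plot stays legible even with many categories, which is the problem the text describes.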
Currently, users can run and generate only one Kaplan-Meier analysis and graph at a time. Through user research, we have found that users frequently want to run multiple analyses at once, such as on multiple genes or across multiple cancer types. We have a prototype of this functionality that has passed user acceptance testing, and it is ready for implementation.
Related GitHub issues:
- https://github.com/ucscXena/ucsc-xena-client/issues/559
- https://github.com/ucscXena/ucsc-xena-client/issues/356
- https://github.com/ucscXena/ucsc-xena-client/issues/396
- https://github.com/ucscXena/ucsc-xena-client/issues/414
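Running several analyses at once amounts to looping one Kaplan-Meier estimate over many sample groupings. The sketch below implements the product-limit estimator, S(t) = ∏ over event times t_i ≤ t of (1 − d_i / n_i), and repeats it per gene. Splitting samples at each gene's median is an assumption made for illustration, not necessarily how Xena groups samples; the gene name and data are toy values, and the log-rank test is omitted.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] steps of the KM estimate.

    events[i] is 1 if the event (e.g. death) was observed, 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Count every subject (event or censored) sharing this time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored subjects leave the risk set too
    return curve


def km_by_gene(expr: Dict[str, Sequence[float]], times, events):
    """One KM analysis per gene, splitting samples at that gene's median."""
    results = {}
    for gene, values in expr.items():
        cut = median(values)
        high = [i for i, v in enumerate(values) if v > cut]
        low = [i for i, v in enumerate(values) if v <= cut]
        results[gene] = {
            "high": kaplan_meier([times[i] for i in high], [events[i] for i in high]),
            "low": kaplan_meier([times[i] for i in low], [events[i] for i in low]),
        }
    return results


times = [5, 8, 12, 20]      # follow-up time per sample (toy values)
events = [1, 1, 0, 1]       # 1 = event observed, 0 = censored
expr = {"GENE_A": [1.0, 4.0, 2.0, 5.0]}  # hypothetical gene
results = km_by_gene(expr, times, events)
print(results["GENE_A"])  # {'high': [(8, 0.5), (20, 0.0)], 'low': [(5, 0.5)]}
```

Adding more genes to `expr` (or more cohorts) produces one pair of curves per entry without re-running anything by hand.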
Currently, we cluster genes/probes in our data matrix view. This allows users to see which groups of genes/probes are regulated similarly. We want to extend this functionality to also cluster samples/cells, allowing users to perform 2D hierarchical clustering of samples/cells and genes based on the genomic data in a Xena column. The time and compute needed to perform the clustering increase with the number of samples or cells. Thus, the functionality will likely be available only below a certain sample/cell count threshold and will work together with the sample/cell search and filter functionality.
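The sketch below illustrates 2D clustering as running the same agglomerative clustering twice, once on the rows (genes) and once on the transposed matrix (samples/cells), and using each dendrogram's leaf order to reorder the heatmap. It uses naive average linkage with Euclidean distance, an assumption rather than Xena's actual method. Its O(n²) distance table and O(n³) running time are why a size threshold is needed; `MAX_ITEMS` is a hypothetical value.

```python
import math
from typing import Dict, List, Sequence, Tuple

MAX_ITEMS = 2000  # hypothetical cutoff; this naive method is O(n^3) time


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaf_order(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Average-linkage agglomerative clustering; returns the dendrogram leaf order."""
    n = len(vectors)
    if n > MAX_ITEMS:
        raise ValueError(f"{n} items exceeds the clustering limit of {MAX_ITEMS}")

    def pair(x: int, y: int) -> Tuple[int, int]:
        return (x, y) if x < y else (y, x)

    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}  # id -> leaves
    dist = {pair(i, j): euclidean(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}
    new_id = n
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of active clusters
        leaves_a, leaves_b = clusters.pop(a), clusters.pop(b)
        del dist[(a, b)]
        for k in clusters:
            d_a, d_b = dist.pop(pair(a, k)), dist.pop(pair(b, k))
            # Average linkage: size-weighted mean of the two old distances.
            dist[pair(k, new_id)] = (len(leaves_a) * d_a + len(leaves_b) * d_b) / (
                len(leaves_a) + len(leaves_b))
        clusters[new_id] = leaves_a + leaves_b  # merged children stay adjacent
        new_id += 1
    return next(iter(clusters.values()), [])


def cluster_2d(matrix: Sequence[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Order rows (genes) and columns (samples/cells) of a genes-by-samples matrix."""
    row_order = leaf_order(matrix)
    col_order = leaf_order([list(col) for col in zip(*matrix)])  # transpose
    return row_order, col_order


matrix = [
    [1, 9, 1, 9],
    [9, 1, 9, 1],
    [1, 8, 1, 9],
]
print(cluster_2d(matrix))  # ([1, 0, 2], [0, 2, 1, 3])
```

Filtering to a subset of samples/cells before calling `cluster_2d` is what keeps `n` under the threshold, which matches the plan to pair clustering with search and filtering.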
Our PDF downloads are great for users looking to use our visualizations in publications or presentations. Currently, we only allow users to download the canvas, or the 'spreadsheet' part of the visualization. We want to expand this to include the SVG graphics at the top of each column that give genomic context to the visualization. Related GitHub issue: https://github.com/ucscXena/ucsc-xena-client/issues/142
Users often download our data, either as whole files from our S3 bucket or as slices through our Python package or the UCSCXenaTools R package. A REST API would provide a more consistent interface to our data hubs and further increase access to our data. API development will be driven by collaborations with the Treehouse Childhood Cancer Initiative and the Childhood Cancer Data Initiative (CCDI) Data Federation. These collaborations will ensure that the API serves real scientific use cases and is compatible with other resources in the field.
- Develop scientific use cases to drive requirements for data and the API
- Define a minimal set of demographic and clinical phenotype data
- Define the minimal harmonization required to support cross-resource queries of the data defined above
- Harmonize Treehouse data according to the data requirements defined above
- Survey and decide on strategy for the API
- Define and document a standard CCDI Federation open API as a means of accessing the data defined above
- Implement the REST API according to the CCDI Federation open API standard
- Deploy server to support the REST API
- Develop user documentation, test site, query examples for the REST API
- Make Treehouse shareable data available via the REST API
Many of the newest GDC datasets include longitudinal data on patients. Our goal is to support these data so that users can fully explore them.
We will develop a new visualization for this type of data. It will likely be a new tab in our browser (as opposed to being integrated into the current Visual Spreadsheet paradigm).
- MMRF-COMMPASS
We are continually updating and adding new datasets to our portal. Below are our current priorities:
- Add data from KidsFirst (finished)
  - open-access data
  - enable comparison with GTEx and TCGA
- Release PCAWG data after marker paper publication (finished)
- Update data from GDC Data Portal (ongoing)
  - open-access data
  - explore the possibility of viewing controlled-access data through the NCI CRDC authentication mechanism
- Add data from PDC Data Portal (mass spectrometry)
  - protein abundance
  - phosphoprotein abundance
  - ensure that these data can be visualized alongside GDC data from the same cohort
- Add data from TCPA Data Portal (reverse phase protein array)
  - protein abundance
  - phosphoprotein abundance
- Update data from GTEx Data Portal
- Update data from CCLE Data Portal
- Add Sanger Cell Line data