jingchunzhu edited this page Dec 20, 2023 · 19 revisions

UCSC Xena is a tool that lets biologists view and explore public cancer genomics data, as well as their own private data, in a web browser. If you have questions or feedback about this roadmap, please submit an issue (https://github.com/ucscXena/ucsc-xena-client/issues) on GitHub. Please note: this roadmap is subject to change.

Code of Conduct

Last updated: December 19, 2023

Single-cell genomics

Our vision is to make Xena Browser a single website for visualizing both single-cell and bulk sample data. Production Xena currently supports bulk sample data, so adding single-cell support is a priority. This will require significant performance upgrades for viewing matrix data. Our goal is to handle at least 1 million cells per dataset.

Xena data hubs

  • Use compressed data representation (finished)
  • Use binary data transfer (finished)
  • Use radix sort (finished)
  • Use WebAssembly (finished)
  • Use sparse matrix data representation in both hub and browser (finished)
  • Upgrade development hubs with new hub software (finished)
  • Browser backward compatible with existing hubs (finished)
  • Upgrade production public hubs with new hub software (v2)
  • Release new installable Xena hub software (v2)
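
To illustrate why a sparse representation helps with single-cell matrices, where most entries are zero, here is a minimal sketch of compressed sparse row (CSR) storage. This roadmap does not describe the format the hub and browser actually use, so CSR is an assumed stand-in, and the function names `to_csr` and `csr_row` are hypothetical.

```python
from typing import List, Sequence, Tuple

Csr = Tuple[List[float], List[int], List[int]]


def to_csr(dense: Sequence[Sequence[float]]) -> Csr:
    """Store only the non-zero entries of a row-major matrix (assumed CSR layout)."""
    values: List[float] = []   # non-zero values, row by row
    col_idx: List[int] = []    # column index of each stored value
    row_ptr: List[int] = [0]   # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_row(csr: Csr, i: int, n_cols: int) -> List[float]:
    """Rebuild dense row i from the CSR arrays."""
    values, col_idx, row_ptr = csr
    row = [0.0] * n_cols
    for k in range(row_ptr[i], row_ptr[i + 1]):
        row[col_idx[k]] = values[k]
    return row
```

In this layout, storage grows with the number of non-zero entries rather than with cells × genes, which is what makes datasets on the order of a million cells more tractable.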

Xena Browser

  • Interview users to create designs (finished)
  • Finish preliminary designs for single cell visualizations with user feedback (finished)
  • Add ability to visualize spatial genomics data (finished)
  • Add ability to visualize image data (finished)
  • Add ability to visualize UMAP/tSNE data (finished)
  • Add testing code for the new single cell branch client code (ongoing)
  • Add sample/cell search and filter, subgroup, chart, PDF, and bookmark functionality to the single-cell branch (ongoing)
  • Upgrade production Xena Browser with single-cell branch client (v2)

Single-cell data ingestion

  • Update data from HCA Data Portal (ongoing)

    • Supported data types: gene expression estimates, images, spatially resolved gene expression data, metadata
  • Update data from HTAN Data Portal (ongoing)

    • Supported data types: gene expression estimates, images, spatially resolved gene expression data, metadata
  • Add cancer scRNA-seq data from publications (ongoing)

Connect to analysis tools

We plan to build utilities that connect Xena directly with popular genomics data analysis tools. Right now it is tedious for users to visualize their analysis results, because they need to 1) reformat their data, 2) start a private Xena hub, and 3) load the data into the hub before they can view anything. Steps 2 and 3 were recently simplified (see use case: https://ucsc-xena.gitbook.io/project/local-xena-hub), but step 1, converting the output of analysis tools to Xena-compatible data files, remains difficult for non-computational users. We aim to ease this problem by directly connecting Xena with a few popular analysis tools.
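
As a rough illustration of the reformatting step (step 1), the sketch below writes a cells-by-genes expression matrix as a tab-delimited file with genes as rows and cells as columns. That orientation is an assumption about the Xena genomic matrix layout and should be checked against the Xena data format documentation. The function name `to_xena_matrix_tsv` and the `id_label` header are hypothetical and are not part of any Xena utility.

```python
import csv
import io
from typing import Sequence


def to_xena_matrix_tsv(
    cell_ids: Sequence[str],
    gene_ids: Sequence[str],
    cells_by_genes: Sequence[Sequence[float]],
    id_label: str = "sample",  # hypothetical label for the top-left header cell
) -> str:
    """Transpose a cells x genes matrix into a genes x cells TSV string."""
    if len(cells_by_genes) != len(cell_ids):
        raise ValueError("expected one matrix row per cell")
    if any(len(row) != len(gene_ids) for row in cells_by_genes):
        raise ValueError("expected one value per gene in every row")

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([id_label, *cell_ids])  # header: one column per cell
    for g, gene in enumerate(gene_ids):
        # One output row per gene, reading down column g of the input.
        writer.writerow([gene, *(row[g] for row in cells_by_genes)])
    return buf.getvalue()
```

For example, `to_xena_matrix_tsv(["cellA", "cellB"], ["TP53", "MYC"], [[1.5, 0.0], [2.0, 3.25]])` yields a header line followed by one line per gene. A direct connection to an analysis tool would replace this manual step.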

scRNA-seq data analysis tools

We will start by integrating with scRNA-seq analysis tools:

Bulk-tumor data analysis tools

We will focus on integrating with transcriptome analysis tools:

Improve the usability and expansion of browser functionality

These usability improvements and expansions are in response to user research and feedback.

Chart user interface

Through user feedback and usability studies, we have identified improvements to our Chart user interface, including dot plots and ridge plots, that will help users explore data with many categories. Wireframes are still in development.

Related GitHub issues:

Multi-panel KM plot

Currently, users can run one Kaplan-Meier analysis and generate one graph at a time. Through user research we have found that users frequently want to run multiple analyses at once, such as on multiple genes or across multiple cancer types. We have a prototype of this functionality that has passed user acceptance testing and is ready for implementation.
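
For readers unfamiliar with the underlying estimate, here is a minimal sketch of the standard Kaplan-Meier product-limit calculation, applied to several groups at once to mimic a multi-panel layout. It is illustrative only and is not Xena's implementation; the names `kaplan_meier` and `km_panels` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Return (time, survival) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


def km_panels(
    groups: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> Dict[str, Curve]:
    """Compute one curve per named panel, e.g. per gene or cancer type."""
    return {name: kaplan_meier(t, e) for name, (t, e) in groups.items()}
```

Each entry of the dictionary returned by `km_panels` would correspond to one panel in a multi-panel display.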

Related GitHub issues:

Cluster in the sample/cell dimension

Currently we cluster genes/probes for our data matrix view. This allows users to see how different groups of genes/probes are regulated similarly. We want to expand this functionality to also cluster samples/cells. This will allow users to perform 2D hierarchical clustering of samples/cells and genes based on the genomic data in a Xena column. The time and compute needed to perform the clustering will increase with the number of samples or cells. The functionality will therefore likely be available only below a certain sample/cell count, and will work together with the sample/cell search and filtering functionality.
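
The idea of gating clustering behind a size threshold can be sketched as follows. This is a naive average-linkage example under assumed names: `MAX_ITEMS` and `cluster_order` are hypothetical, the threshold value is arbitrary, and none of this reflects the clustering code Xena actually uses.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence

MAX_ITEMS = 2000  # hypothetical cutoff; the roadmap does not state a number


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(
    vectors: Sequence[Sequence[float]], max_items: int = MAX_ITEMS
) -> Optional[List[int]]:
    """Return a leaf order from average-linkage clustering, or None if too large."""
    if len(vectors) > max_items:
        return None  # caller should ask the user to filter samples/cells first
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            total = sum(
                euclidean(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0] if clusters else []
```

Running `cluster_order` once on the rows of a matrix and once on its columns gives the two orderings needed for a 2D clustered view. The naive loop scales poorly with item count, which is the reason for the threshold.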

Add gene annotations to PDF download

Our PDF downloads are useful for anyone who wants to use our visualizations in publications or presentations. Currently users can download only the canvas, the 'spreadsheet' part of the visualization. We want to expand this to include the SVG graphics at the top of a column that give genomic context to the visualization. Related GitHub issue: https://github.com/ucscXena/ucsc-xena-client/issues/142

Provide REST API for Xena data hubs

Users often download our data, either as files from our S3 bucket or as slices of the data through our Python package or the UCSCXenaTools R package. A REST API would provide a more consistent interface to our data hubs and further increase access to our data. API development will be driven by collaborations with the Treehouse Childhood Cancer Initiative and the Childhood Cancer Data Initiative (CCDI) Data Federation. These collaborations will ensure that the API serves real scientific use cases and is compatible with others in the field.

Define Data and API Requirements

  • Develop scientific use cases to drive requirements for data and the API
  • Define a minimal set of demographic and clinical phenotype data
  • Define the minimal harmonization required to support cross-resource queries of the data defined above

Data Harmonization

  • Harmonize Treehouse data according to the data requirements defined above

API Development

  • Survey and decide on strategy for the API
  • Define and document a standard CCDI Federation open API as a means of accessing the data defined above
  • Implement the REST API according to the CCDI Federation open API standard
  • Deploy server to support the REST API
  • Develop user documentation, test site, query examples for the REST API
  • Make Treehouse shareable data available via the REST API

Support new longitudinal GDC datasets

Many of the newest GDC datasets include longitudinal data on patients. Our goal is to support this data so that users can fully explore it.

New visualization

Develop a new visualization for this type of data. This will likely be a new tab in our browser, rather than an attempt to integrate it into the current Visual Spreadsheet paradigm.

GDC longitudinal data ingestion

Public data resource updates

We are continually updating and adding new datasets to our portal. Below are our current priorities:

  • Add data from KidsFirst (finished)
    • open-access data
    • enable ability to compare to GTEx and TCGA
  • Release PCAWG data after marker paper publication (finished)
  • Update data from GDC Data Portal (ongoing)
    • open-access data
    • explore possibility of viewing controlled-access data through NCI CRDC authentication mechanism
  • Add data from PDC Data Portal (mass spec)
    • protein abundance
    • phosphoprotein abundance
    • ensure the data can be visualized next to GDC data from the same cohort
  • Add data from TCPA Data Portal (reverse phase protein array)
    • protein abundance
    • phosphoprotein abundance
  • Update data from GTEx Data Portal
  • Update data from CCLE Data Portal
  • Add Sanger Cell Line data
