feat: Make BED v1 a primitive data format #583

Closed
manzt opened this issue Nov 16, 2021 · 4 comments · Fixed by #877
Assignees
Labels
enhancement New feature or request

Comments

@manzt
Member

manzt commented Nov 16, 2021

Motivation

The BED (Browser Extensible Data) format provides a flexible way to define the data lines displayed in an annotation track. It was recently formalized in the v1 specification.

Gosling currently supports BED via CSV, but the definition is quite verbose, and users can choose arbitrary names for the standard BED fields:

Specifying BED12+1 in Gosling as CSV
{
  "type": "csv",
  "url": "https://localhost:8080/data.bed",
  "headerNames": ["chrom", "chromStart", "chromEnd", "name", "score", "strand", "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts", "myField"],
  "chromosomeField": "chrom",
  "genomicFields": ["chromStart", "chromEnd"],
  "quantitativeFields": ["score", "thickStart", "thickEnd", "blockCount"],
  "separator": "\t"
}

Proposal

Add BED as a new data-type in Gosling. BED is designed for this exact use case, and should be the preferred format for representing text-based genomic annotation data (over a custom CSV capturing identical information). Using BED will make specifications less verbose and more reusable. Using BED has the additional side-effect of ensuring datasets behind a Gosling visualization are more likely to be interoperable with other genomics tools.

interface BED {
  type: "bed";
  url: string;
  customFields?: string[];
  separator?: string;
}
Specifying BED12+1 in Gosling as BED
{
  "type": "bed",
  "url": "https://localhost:8080/data.bed",
  "customFields": ["myField"]
}
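
To make the saving concrete, here is a rough sketch of how a `bed` definition could be expanded into roughly the CSV definition from the Motivation section. Everything in it is an assumption for illustration: `bedToCsv`, `STANDARD_FIELDS`, `QUANTITATIVE`, and the `standardCount` parameter are hypothetical names, `customFields` is taken to be a list of names as in the JSON example, and the choice of quantitative fields simply mirrors the CSV example rather than anything Gosling defines.

```typescript
// Shape proposed in this issue, with customFields as a list of names.
interface BedData {
  type: "bed";
  url: string;
  customFields?: string[];
  separator?: string;
}

// Subset of the CSV data definition used in the Motivation example.
interface CsvData {
  type: "csv";
  url: string;
  headerNames: string[];
  chromosomeField: string;
  genomicFields: string[];
  quantitativeFields: string[];
  separator: string;
}

// Standard BED field names, in column order.
const STANDARD_FIELDS: string[] = [
  "chrom", "chromStart", "chromEnd", "name", "score", "strand",
  "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
];

// Standard fields treated as quantitative (assumption: same set as the CSV example).
const QUANTITATIVE: string[] = ["score", "thickStart", "thickEnd", "blockCount"];

// Hypothetical helper, not part of Gosling: expand a BED definition into an
// equivalent CSV definition, given how many standard columns the file has.
function bedToCsv(bed: BedData, standardCount: number): CsvData {
  const standard = STANDARD_FIELDS.slice(0, standardCount);
  return {
    type: "csv",
    url: bed.url,
    headerNames: [...standard, ...(bed.customFields ?? [])],
    chromosomeField: "chrom",
    genomicFields: ["chromStart", "chromEnd"],
    // Keep only the quantitative fields that are actually present in this file.
    quantitativeFields: QUANTITATIVE.filter((f) => standard.indexOf(f) >= 0),
    separator: bed.separator ?? "\t",
  };
}
```

The point of the sketch is that every CSV property except `url` and the custom names follows from the BED layout itself, which is why the `bed` definition can stay so short.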
@manzt manzt added the enhancement New feature or request label Nov 16, 2021
@sehilyi
Member

sehilyi commented Nov 16, 2021

Thank you for creating this issue! This will be a helpful update to make our grammar more genomic-specific.

One quick clarification - By the length of customFields, we will infer the number of standard and custom fields, i.e., if the length is 1, then we consider the last column to be the custom one while the other fields are standard ones.

@manzt
Member Author

manzt commented Nov 16, 2021

One quick clarification - By the length of customFields, we will infer the number of standard and custom fields, i.e., if the length is 1, then we consider the last column to be the custom one while the other fields are standard ones.

Yes, exactly. We can determine BEDn+m from the custom fields alone (n = total # of columns - m). Custom fields can only follow standard fields, so the order of customFields matters, and the number of custom fields tells us how many standard fields are present.

e.g.

For a TSV with 4 columns

{
  "type": "bed",
  "url": "https://localhost:8080/data.bed"
}

Interpretation is BED4 (chrom, chromStart, chromEnd, name)

{
  "type": "bed",
  "url": "https://localhost:8080/data.bed",
  "customFields": ["custom"]
}

Interpretation is BED3+1 (chrom, chromStart, chromEnd, custom)
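
The rule above (n = total # of columns - m) could be sketched roughly as follows. This is only an illustration: `describeBed` is a hypothetical helper, not a Gosling function, the sample line is made up, and the check assumes a file has between 3 and 12 standard fields without testing any further constraints the specification may place on n.

```typescript
// Hypothetical helper, not part of Gosling: classify a file as BEDn or BEDn+m
// from its first data line and the declared custom fields.
function describeBed(
  firstLine: string,
  customFields: string[] = [],
  separator: string = "\t",
): { n: number; m: number; label: string } {
  const total = firstLine.split(separator).length; // total # of columns
  const m = customFields.length;                   // custom fields always come last
  const n = total - m;                             // remaining columns are standard
  if (n < 3 || n > 12) {
    throw new Error(`${total} columns with ${m} custom fields is not a valid BED layout`);
  }
  return { n, m, label: m === 0 ? `BED${n}` : `BED${n}+${m}` };
}

// Made-up four-column line, matching the two cases above.
const sampleLine = "chr1\t1000\t5000\tfeatureA";
const withoutCustom = describeBed(sampleLine).label;          // "BED4"
const withCustom = describeBed(sampleLine, ["custom"]).label; // "BED3+1"
```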

@manzt
Member Author

manzt commented Nov 16, 2021

The final thing here is whether types need to be defined for the custom fields. This is similar to part of the discussion in #579, and I'd argue for a similar reason they are not necessary.

@sehilyi
Member

sehilyi commented Nov 16, 2021

This is similar to part of the discussion in #579, and I'd argue for a similar reason they are not necessary.

I assume the custom fields will be either nominal or quantitative. If so, I agree with not requiring users to specify the field types.
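
If custom fields are only ever nominal or quantitative, one way the type could be guessed without user input is sketched below. This is an assumed heuristic for illustration, not how Gosling does or will do it: `inferFieldType` and `FieldType` are hypothetical names, and the rule (quantitative only when every non-empty sampled value parses as a finite number) is one simple choice among several.

```typescript
type FieldType = "nominal" | "quantitative";

// Hypothetical heuristic, not Gosling's implementation: treat a column as
// quantitative only if every non-empty sampled value parses as a finite number.
function inferFieldType(values: string[]): FieldType {
  const nonEmpty = values.filter((v) => v.trim() !== "");
  const allNumeric =
    nonEmpty.length > 0 && nonEmpty.every((v) => isFinite(Number(v)));
  return allNumeric ? "quantitative" : "nominal";
}
```

A column with no usable values falls back to nominal here, which errs on the side of not treating labels as numbers.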

@manzt manzt changed the title feat: first class support for BED feat: Make BED v1 a data-primative Nov 16, 2021
@manzt manzt changed the title feat: Make BED v1 a data-primative feat: Make BED v1 a primitive data format Nov 16, 2021
@sehilyi sehilyi pinned this issue Jul 26, 2022
@etowahadams etowahadams self-assigned this Apr 18, 2023
@sehilyi sehilyi unpinned this issue May 8, 2023