Parses biom files
For use in node
this module is tested with nodejs version 4.2 or higher.
Specifically versions 0.x are not working (still standard in Ubuntu prior to 16.04).
Included libraries are:
Install the module with:
npm install biojs-io-biom
Then you can use it in node
like this:
var Biom = require('biojs-io-biom').Biom;
biom = new Biom(); // "creates new biom object"
To use the biojs-io-biom module in the browser you need the build/biom.js
file:
<!-- ... -->
<script src="biom.js"></script>
<script type="text/javascript">
var Biom = require('biojs-io-biom').Biom;
biom = new Biom();
</script>
<!-- ... -->
You can also use bower to install the biojs-io-biom component for use in the browser:
bower install biojs-io-biom
The file you need will be under bower_components/biojs-io-biom/build/biom.js
in this case.
See the biom format specification (version 1.0) for more details on the file format.
Please cite our article at f1000 Research that describes this module:
Markus J. Ankenbrand, Niklas Terhoeven, Sonja Hohlfeld, Frank Förster, and Alexander Keller.
biojs-io-biom, a BioJS component for handling data in Biological Observation Matrix (BIOM) format[version 2; referees: 1 approved, 2 approved with reservations].
F1000Research 2017, 5:2348. doi: 10.12688/f1000research.9618.2
You can cite the current version of this software repository using the Zenodo
Please cite the biom-format project (in addition to this module) as:
The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome.
Daniel McDonald, Jose C. Clemente, Justin Kuczynski, Jai Ram Rideout, Jesse Stombaugh, Doug Wendel, Andreas Wilke, Susan Huse, John Hufnagle, Folker Meyer, Rob Knight, and J. Gregory Caporaso.
GigaScience 2012, 1:7. doi:10.1186/2047-217X-1-7
Parameter: object
Type: Object
Example: {}
The 'constructor' method is responsible for creating an object call via new Biom()
.
How to use this method
biom = new Biom();
// or with a (partial) biom json object
biom = new Biom({
id: "Table ID",
shape: [2,2],
rows: [
{id: "row1", metadata: null},
{id: "row2", metadata: null}
],
columns: [
{id: "col1", metadata: null},
{id: "col2", metadata: null}
]
// ...
});
// type validation is performed so assigning illegal types will result in a TypeError:
// would throw a TypeError:
//new Biom({id: 7});
The getter methods are called implicitly when reading properties with the dot notation.
biom = new Biom();
id = biom.id;
data = biom.data;
The setter methods are called implicitly when assigning to properties with the dot notation (some basic type checks are performed).
biom = new Biom();
biom.id = "New ID";
// would throw a TypeError:
//biom.id = 17;
The matrix_type
setter also updates internal representation of data
:
biom = new Biom({matrix_type: 'sparse', shape: [2,4], data: [[0,1,12],[1,2,7],[1,3,17]]});
biom.matrix_type = 'dense';
biom.data;
// [[0,12,0,0],
// [0,0,7,17]]
The rows
and columns
setter update the internal data by id.
Consider the following example:
biom = new Biom({
matrix_type: 'dense',
rows: [{id: 'row1'},{id: 'row2'},{id: 'row3'}],
columns: [{id: 'col1'},{id: 'col2'}],
data: [[0,1],[2,7],[5,0]]
});
This will result in the following data
matrix:
col1 | col2 | |
---|---|---|
row1 | 0 | 1 |
row2 | 2 | 7 |
row3 | 5 | 0 |
Now setting the rows
will also update the data
accordingly:
biom.rows = [{id: 'row2'},{id: 'row1'},{id: 'row4'},{id: 'row5'}];
console.log(biom.data);
This results in the following table (row1 and row2 are swapped, row3 is removed and two new rows, row4 and row5 are added):
col1 | col2 | |
---|---|---|
row2 | 2 | 7 |
row1 | 0 | 1 |
row4 | 0 | 0 |
row5 | 0 | 0 |
This all happens in the background by simply assigning to rows
. The same applies to columns
.
This way data integrity is preserved.
Parameter: object
Type: Object
Example: {dimension: 'rows', attribute: 'taxonomy'}
Throws Error
if attribute is not set or dimension is none of "rows", "observation", "columns" or "sample"
This method extracts metadata with a given attribute from rows or columns (dimension).
Default value for object.dimension
is "rows".
biom = new Biom({
id: "Table ID",
shape: [2,5],
rows: [
{id: "row1", metadata: null},
{id: "row2", metadata: null}
],
columns: [
{id: "col1", metadata: {pH: 7.2}},
{id: "col2", metadata: {pH: 8.1}},
{id: "col3", metadata: {pH: null}},
{id: "col4", metadata: {pH: 6.9}},
{id: "col5", metadata: {}}
]
// ...
});
meta = biom.getMetadata({dimension: 'columns', attribute: 'pH'});
// [7.2, 8.1, null, 6.9, null]
The attribute is used as path as in the lodash get function (https://lodash.com/docs/4.16.6#get) so multiple levels can be searched (string with dots is interpreted as a path ('a.b.c' is equivalent to ['a','b','c']))
biom = new Biom({
id: "Table ID",
shape: [2,2],
rows: [
{id: "row1", metadata: null},
{id: "row2", metadata: null}
],
columns: [
{id: "col1", metadata: {'a': {'b': {'c': 1}}}},
{id: "col2", metadata: {'a': {'b': {'c': 2}}}},
{id: "col3", metadata: {'a': {'b': null}}},
{id: "col4", metadata: {'a': {'b': {'c': null}}}},
{id: "col5", metadata: {'a': {'b': {'c': 5}}}}
]
// ...
});
meta = biom.getMetadata({dimension: 'columns', attribute: ['a', 'b', 'c']});
// [1, 2, null, null, 5]
Parameter: object
Type: Object
Example: {dimension: 'columns', attribute: 'pH', defaultValue: 7}
Example: {dimension: 'columns', attribute: 'pH', values: [6,7,5,5,9]}
Example: {dimension: 'rows', attribute: 'importance', values: {row3id: 5, row7id: 0}}
Throws Error
if attribute is not set
Throws Error
if dimension is none of "rows", "observation", "columns" or "sample"
Throws Error
if not exactly one of defaultValue
or values
is set (setting to null is considered unset)
Throws Error
if values is an array with a length that does not match that of the dimension (rows or columns)
This method adds metadata with a given attribute to rows or columns (dimension).
Default value for object.dimension
is "rows"
biom = new Biom({
id: "Table ID",
shape: [2,5],
rows: [
{id: "row1", metadata: null},
{id: "row2", metadata: null}
],
columns: [
{id: "col1", metadata: {}},
{id: "col2", metadata: {}},
{id: "col3", metadata: {}},
{id: "col4", metadata: {}},
{id: "col5", metadata: {}}
]
// ...
});
biom.addMetadata({dimension: 'columns', attribute: 'pH', defaultValue: 7})
biom.getMetadata({dimension: 'columns', attribute: 'pH'});
// [7, 7, 7, 7, 7]
biom.addMetadata({dimension: 'columns', attribute: 'pH', values: [1,2,null,4,5]})
biom.getMetadata({dimension: 'columns', attribute: 'pH'});
// [1, 2, null, 4, 5]
biom.addMetadata({dimension: 'columns', attribute: 'pH', values: {col2: 7, col3: 9, col4: null}})
biom.getMetadata({dimension: 'columns', attribute: 'pH'});
// [1, 3, 9, null, 5]
The attribute is used as path as in the lodash set function (https://lodash.com/docs/4.16.6#set) so multiple levels can be given (string with dots is interpreted as a path ('a.b.c' is equivalent to ['a','b','c']))
Accessing data
directly returns different results depending on matrix_type
(sparse
or dense
).
Therefore a couple of helper functions have been implemented that work independent of matrix_type
:
getDataAt(rowID, colID)
returns a single data pointsetDataAt(rowID, colID, value)
sets a single data pointgetDataRow(rowID)
returns data for a single rowsetDataRow(rowID, values)
sets data for a single rowgetDataColumn(colID)
returns data for a single columnsetDataColumn(colID, values)
sets data for a single columngetDataMatrix()
returns the full data matrix in dense format (independent of internal representation)setDataMatrix(values)
sets the full data matrix in dense format (independent of internal representation)
biom = new Biom({
rows: [{id: 'r1'},{id: 'r2'},{id: 'r3'},{id: 'r4'},{id: 'r5'}],
columns: [{id: 'c1'},{id: 'c2'},{id: 'c3'},{id: 'c4'},{id: 'c5'}],
matrix_type: 'sparse',
data: [[0,1,11],[1,2,13],[4,4,9]]
});
biom.getDataAt('r1','c2');
// 11
biom.getDataRow('r2');
// [0,0,13,0,0]
biom.getDataColumn('c5');
// [0,0,0,0,9]
biom.getDataMatrix();
// [[0,11, 0, 0, 0],
// [0, 0,13, 0, 0],
// [0, 0, 0, 0, 0],
// [0, 0, 0, 0, 0],
// [0, 0, 0, 0, 9]]
biom.setDataAt('r3','c4',99);
biom.setDataRow('r4',[1,2,3,4,5]);
biom.setDataColumn('c1',[10,9,8,7,6]);
biom.getDataMatrix();
// [[10,11, 0, 0, 0],
// [ 9, 0,13, 0, 0],
// [ 8, 0, 0,99, 0],
// [ 7, 2, 3, 4, 5],
// [ 6, 0, 0, 0, 9]]
// internal data remains to be sparse
biom.data
// [[0,0,10],[0,1,11],[1,0,9],[1,2,13],[2,0,8],[2,3,99],[3,0,7],[3,1,2],[3,2,3],[3,3,4],[3,4,5],[4,1,6],[4,4,9]]
Parameter: inPlace
Type: boolean
Returns Array
This function returns the presence/absence matrix as an array of arrays. If inPlace
is true
the internal data matrix is replaced.
biom = new Biom({
rows: [{id: 'o1'}, {id: 'o2'}],
columns: [{id: 's1'}, {id: 's2'}, {id: 's3'}],
matrix_type: 'dense',
data: [[0, 0, 1], [1, 3, 42]]
});
biom.getDataMatrix();
// [[0, 0, 1], [1, 3, 42]]
biom.pa(false);
// [[0, 0, 1], [1, 1, 1]]
biom.getDataMatrix();
// [[0, 0, 1], [1, 3, 42]]
biom.pa(true);
// [[0, 0, 1], [1, 1, 1]]
biom.getDataMatrix();
// [[0, 0, 1], [1, 1, 1]]
Parameter: options
Type: Object
Example: {f: function(data, id, metadata){return data.map(x => x*2)}, dimension: 'columns', inPlace: true}
Returns Array
This function returns the transformed matrix as an array of arrays using the provided function.
If inPlace
is true
the internal data matrix is replaced.
biom = new Biom({
rows: [{id: 'o1'}, {id: 'o2'}],
columns: [{id: 's1'}, {id: 's2'}, {id: 's3'}],
matrix_type: 'dense',
data: [[0, 0, 1], [1, 3, 42]]
});
biom.getDataMatrix();
// [[0, 0, 1], [1, 3, 42]]
biom.transform({f: function(data, id, metadata){return data.map(x => x*2)}, dimension: 'columns', inPlace: true});
// [[0, 0, 2], [2, 6, 84]]
Parameter: options
Type: Object
Example: {dimension: 'rows', inPlace: false}
Returns Array
This function returns the normalized matrix as an array of arrays using relativation (either by row or by column).
If inPlace
is true
the internal data matrix is replaced.
biom = new Biom({
rows: [{id: 'o1'}, {id: 'o2'}],
columns: [{id: 's1'}, {id: 's2'}, {id: 's3'}],
matrix_type: 'dense',
data: [[0, 0, 8], [3, 5, 42]]
});
biom.getDataMatrix();
// [[0, 0, 8], [3, 5, 42]]
biom.norm({dimension: 'columns', inPlace: false});
// [[0.0, 0.0, 0.16], [1.0, 1.0, 0.84]]
biom.norm({dimension: 'rows', inPlace: false});
// [[0.0, 0.0, 1.0], [0.06, 0.1, 0.84]]
Parameter: options
Type: Object
Example: {f: function(data, id, metadata){return metadata.priority > 1}, dimension: 'columns', inPlace: true}
Returns Array
This function returns the filtered matrix as an array of arrays using the provided function.
If inPlace
is true
the internal data matrix is replaced.
biom = new Biom({
rows: [{id: 'o1'}, {id: 'o2'}],
columns: [{id: 's1'}, {id: 's2'}, {id: 's3'}],
matrix_type: 'dense',
data: [[0, 0, 1], [1, 3, 42]]
});
biom.getDataMatrix();
// [[0, 0, 1], [1, 3, 42]]
biom.filter({f: function(data, id, metadata){return id !== 's2'}, dimension: 'columns', inPlace: false});
// [[0, 1], [1, 42]]
This function transposes the matrix in place.
In the process rows
and columns
are switched as well.
biom = new Biom({
rows: [{id: 'o1'}, {id: 'o2'}],
columns: [{id: 's1'}, {id: 's2'}, {id: 's3'}],
matrix_type: 'dense',
data: [[0, 0, 1], [1, 3, 42]]
});
biom.transpose()
biom.getDataMatrix();
// [[0, 1], [0, 3], [1, 42]]
biom.rows[0].id
// s1
Parameter: biomString
Type: String
Example: '{"id": "test", "shape": [0,0]}'
Parameter: options
Type: Object
Example: {conversionServer: 'https://biomcs.iimog.org/convert.php', arrayBuffer: ab}
Returns Promise
this function returns a promise. In case of success the new Biom object is passed otherwise the Error object is passed.
The conversion server is a simple php application that provides a webservice interface to the official python biom-format utility. You can host your own server using a pre-configured Docker container. A publicly available instance is reachable under https://biomcs.iimog.org/. For this version of the module biom-conversion-server v0.2.0 or later is required.
The promise is rejected:
- if biomString is not valid JSON and no conversionServer is given
- if biomString is JSON that is not compatible with biom specification. Error will be thrown by the Biom constructor
- if there is a conversion error (conversionServer not reachable, conversionServer returns error)
This method parses the content of a biom file either as string or as ArrayBuffer.
// Example: json String
Biom.parse('{"id": "Table ID", "shape": [2,2]}', {}).then(
function(biom){
console.log(biom.shape);
}
);
// Example: raw arrayBuffer from hdf5 file (file is a reference on the hdf5 file)
// Using the public conversionServer on biomcs.iimog.org
var reader = new FileReader();
reader.onload = function(c) {
Biom.parse('', {conversionServer: 'https://biomcs.iimog.org/convert.php', arrayBuffer: c.target.result}).then(
// in case of success
function(biom){
console.log(biom);
},
// in case of failure
function(fail){
console.log(fail);
}
);
};
reader.readAsArrayBuffer(file);
Parameter: options
Type: Object
Example: {conversionServer: 'https://biomcs.iimog.org/convert.php', asHdf5: false}
Returns Promise
this function returns a promise. In case of success the String or ArrayBuffer representation of the biom object is passed otherwise the Error object is passed.
The conversion server is a simple php application that provides a webservice interface to the official python biom-format utility.
You can host your own server using a pre-configured Docker container.
A publicly available instance is reachable under https://biomcs.iimog.org/.
For this version of the module biom-conversion-server v0.2.0 or later is required.
If you just want to get the JSON string representation (i.e. biom-format version 1) you can use .toString()
which works synchronously.
The promise is rejected:
- if there is a conversion error (conversionServer not reachable, conversionServer returns error)
This method generates a String (json) or ArrayBuffer (hdf5) representation of the biom object.
biom = new Biom({
id: "Table ID",
shape: [2,2]
// ...
});
// Example: to json String
biom.write().then(
function(biomString){
console.log(biomString);
}
);
// Example: to raw arrayBuffer (hdf5)
// Using the public conversionServer on biomcs.iimog.org
biom.write({conversionServer: 'https://biomcs.iimog.org/convert.php', asHdf5: true}).then(
// in case of success
function(biomArrayBuffer){
console.log(biomArrayBuffer);
// a Blob can be created
// blob = new Blob([biomArrayBuffer], {type: "application/octet-stream"});
// and saved as file e.g. with https://github.com/eligrey/FileSaver.js/
// saveAs(blob, 'export.hdf5.biom', true);
},
// in case of failure
function(fail){
console.log(fail);
}
);
Parameter: sparseData
Type: Array
Example: [[0,1,1],[1,0,2]]
Parameter: shape
Type: Array
Example: [2,2]
Returns Array
the given sparseData converted to dense, e.g. [[0,1],[2,0]]
var denseData = Biom.sparse2dense([[0,1,1],[1,0,2]], [2,2]);
// denseData = [[0,1],[2,0]]
Parameter: denseData
Type: Array
Example: [[0,1],[2,0]]
Returns Array
the given denseData converted to sparse, e.g. [[0,1,1],[1,0,2]]
var sparseData = Biom.dense2sparse([[0,1],[2,0]]);
// sparseData = [[0,1,1],[1,0,2]]
In general it is possible to assign arbitrary metadata (key/value pairs) to each column and row. BIOM format version 1.0 does not restrict the type of the value so strings, numbers, arrays and objects are all possible. Strings, numbers and arrays (e.g. taxonomy) are commonly used. However BIOM 1.0 files that contain nested metadata (values are themselves objects) can not be converted to BIOM format version 2.1 with the official python command line tool. This is a design decision rather than a bug (see biocore/biom-format#513). As it might be useful to have nested metadata and it is easy to handle in javascript it is automatically converted to and from JSON strings. The same applies for numeric metadata (it is automatically converted from and to string). See the following examples:
Strings as values in metadata are automatically parsed as JSON and unpacked if possible.
var biom = new Biom({
rows: [
{id: 'row1', metadata: {'jsonExample': '{"a": {"b": [1,2,3]}}'}},
{id: 'row2', metadata: {'jsonExample': '{"a": {"b": [2,3,1]}}'}},
{id: 'row3', metadata: {'jsonExample': '{"a": {"b": [3,1,2]}}'}}
]
});
// The string value of jsonExample is automatically unpacked as object
var row1ab = biom.rows[0].metadata.jsonExample.a.b;
// row1ab is the array [1,2,3]
Accordingly metadata objects are converted to JSON when writing the object as string (toString or write)
var biom = new Biom({
columns: [
{id: 'col1', metadata: {'object': {a: {b: [1,2,3]}}}},
{id: 'col2', metadata: {'object': {a: {b: [2,3,1]}}}}
]
});
// The string value of jsonExample is automatically unpacked as object
var biomString = biom.toString();
// biomString contains ... {id: "col1", metadata: {"object": "{\"a\": {\"b\": [1,2,3]}}"}} ...
- Add
transpose
function - Update dependencies
- Add yarn as dependency manager
- Add proper handling of arrays as metadata (replace with empty object, fixes PHPs json decode/encode problem with empty objects)
- Add 'Table' to cv for type. Improves interoperability with python tool.
- Add filter function
- Add norm function
- Add transform function
- Add pa function to convert data to absence/presence
- Export numeric metadata as string (compatibility with BIOM v2.1)
- Handle nested metadata (import/export)
- Override toString function to get JSON
- Add capability of deep attributes in getMetadata
- Add capability of deep attributes in addMetadata
- Add minimal required node version
- Fix installation instructions
- Fix installation via npm
- Fix minfied version of js
- Init metadata in
rows
andcolumns
- Add
matrix_type
agnostic getter/setter fordata
- Add static methods
sparse2dense
anddense2sparse
- Update
data
on setcolumns
- Update
data
on setrows
- Check
data
for correct dimensions - Check
rows
andcolumns
for missing or duplicate ids - Make
shape
property read only - Check
shape
on construction - Add getter for
nnz
(#10) - Add
data
transformation tomatrix_type
setter (#3)
- Add write function
- Add browserified build
- Add parse function
- Add hdf5 conversion capability to parse (via external server)
- Add getMetadata function
- Add addMetadata function
- Bower init
- Initial release
- Constructor
- Getter/Setter for specified fields
- Basic type checking in setters
All contributions are welcome.
If you have any problem or suggestion please open an issue here.
The MIT License
Copyright (c) 2016, Markus J. Ankenbrand
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.