Release notes

v1.3.3

v1.3.2

v1.3.1

v1.3.0

Important

This release drops support for Python 2.7.

v1.2.1

v1.2.0

Important

Use of the allel.stats namespace is deprecated in this release. All functions from the stats modules are available from the root allel namespace; please access them from there.

Important

Python 2.7 has had a stay of execution - this release supports Python 2.7 and 3.5-3.7. However, support for Python 2.7 will definitely be removed in version 1.3.

v1.1.10

v1.1.9

v1.1.8

  • Changed semantics of is_snp computed field when extracting data from VCF to exclude variants where one of the alternate alleles is a spanning deletion ('*') (:issue:`155`).
  • Resolved minor logging bug (:issue:`152`).
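
The changed is_snp semantics can be illustrated with a minimal pure-Python sketch. The helper ``is_snp`` below is hypothetical and is not the scikit-allel implementation; it assumes alleles are plain strings and that only A, C, G and T count as single nucleotides.

```python
def is_snp(ref, alts):
    """Return True if REF and every ALT allele is a single nucleotide.

    Hypothetical helper for illustration only. A spanning deletion ('*')
    among the ALT alleles means the variant is not treated as a SNP.
    """
    nucleotides = set("ACGT")
    # Ignore empty or missing ALT slots, which carry no allele.
    observed = [a for a in alts if a not in ("", ".")]
    if ref.upper() not in nucleotides:
        return False
    return all(a.upper() in nucleotides for a in observed)


print(is_snp("A", ["T"]))       # True
print(is_snp("A", ["T", "*"]))  # False: one ALT is a spanning deletion
print(is_snp("A", ["AT"]))      # False: ALT is an insertion
```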

v1.1.7

v1.1.6

  • Include fixture data in release to aid testing and binary builds.

v1.1.0

Reading Variant Call Format (VCF) files

This release includes new functions for extracting data from VCF files and loading it into NumPy arrays, HDF5 files and other storage containers. These functions are backed by VCF parsing code implemented in Cython, so should be reasonably fast. This is new code, so there may be bugs; please report any issues via GitHub.

For a tutorial and worked examples, see the following article: Extracting data from VCF.

For API documentation, see the following functions: :func:`allel.read_vcf`, :func:`allel.vcf_to_npz`, :func:`allel.vcf_to_hdf5`, :func:`allel.vcf_to_zarr`, :func:`allel.vcf_to_dataframe`, :func:`allel.vcf_to_csv`, :func:`allel.vcf_to_recarray`, :func:`allel.iter_vcf_chunks`.

Reading GFF3 files

Added convenience functions :func:`allel.gff3_to_dataframe` and :func:`allel.gff3_to_recarray`.

Maintenance work

End of support for Python 2

Important

This is the last version of scikit-allel that will support Python 2. The next version of scikit-allel will support Python versions 3.5 and later only.

v1.0.3

Fix test compatibility with numpy 1.10.

v1.0.2

Move cython function imports outside of functions to work around bug found when using scikit-allel with dask.

v1.0.1

Add missing test packages so full test suite can be run to verify install.

v1.0.0

This release includes some subtle but important changes to the architecture of the data structures modules (:mod:`allel.model.ndarray`, :mod:`allel.model.chunked`, :mod:`allel.model.dask`). These changes are mostly backwards-compatible but in some cases could break existing code, hence the major version number has been incremented. Also included in this release are some new functions related to Mendelian inheritance and calling runs of homozygosity, further details below.

Mendelian errors and phasing by transmission

This release includes a new :mod:`allel.stats.mendel` module with functions to help with analysis of related individuals. The function :func:`allel.mendel_errors` locates genotype calls within a trio or cross that are not consistent with Mendelian segregation of alleles. The function :func:`allel.phase_by_transmission` will resolve unphased diploid genotypes into phased haplotypes for a trio or cross using Mendelian transmission rules. The function :func:`allel.paint_transmission` can help with evaluating and visualizing the results of phasing a trio or cross.
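
To make the idea of a Mendelian error concrete, here is a minimal pure-Python sketch for a diploid trio. It is not the library's implementation: the helper ``is_mendelian_consistent`` is hypothetical, it returns only a yes/no answer per call, and it assumes there are no missing genotypes.

```python
def is_mendelian_consistent(mother, father, child):
    """Check one diploid trio call for Mendelian consistency.

    Each genotype is a pair of allele indices, e.g. (0, 1). The child call
    is consistent if one allele could have been transmitted by the mother
    and the other by the father. Hypothetical helper, illustration only.
    """
    a, b = child
    return (a in mother and b in father) or (a in father and b in mother)


# (mother, father, child) genotype calls at four variants.
trio_calls = [
    ((0, 0), (1, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # error: allele 1 is absent from both parents
    ((0, 1), (0, 1), (1, 1)),  # consistent
    ((0, 0), (1, 1), (1, 1)),  # error: mother cannot transmit allele 1
]
errors = [not is_mendelian_consistent(m, f, c) for m, f, c in trio_calls]
print(errors)  # [False, True, False, True]
```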

Runs of homozygosity

A new :func:`allel.roh_mhmm` function provides support for locating long runs of homozygosity within a single sample. The function uses a multinomial hidden Markov model to predict runs of homozygosity based on the rate of heterozygosity over the genome. The function can also incorporate information about which positions in the genome are not accessible to variant calling, and hence where there is no information about heterozygosity, to reduce false calling of ROH in regions where data are patchy. We've run this on data from the Ag1000G project but have not performed a comprehensive evaluation with other species. Feedback is very welcome.

Changes to data structures

The :mod:`allel.model.ndarray` module includes a new :class:`allel.model.ndarray.GenotypeVector` class. This class represents an array of genotype calls for a single variant in multiple samples, or for a single sample at multiple variants. This class makes it easier, for example, to locate all variants which are heterozygous in a single sample.

Also in the same module are two new classes :class:`allel.model.ndarray.GenotypeAlleleCountsArray` and :class:`allel.model.ndarray.GenotypeAlleleCountsVector`. These classes provide support for an alternative encoding of genotype calls, where each call is stored as the counts of each allele observed. This allows encoding of genotype calls where samples may have different ploidy for a given chromosome (e.g., Leishmania) and/or where samples carry structural variation within some genome regions, altering copy number (and hence effective ploidy) with respect to the reference sequence.
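
The allele counts encoding can be sketched in plain Python as follows. The helper ``to_allele_counts`` is hypothetical and only illustrates the encoding; it assumes alleles are integer indices and that a negative index marks a missing allele.

```python
def to_allele_counts(call, n_alleles):
    """Encode one genotype call as per-allele counts.

    Hypothetical helper for illustration. `call` is a sequence of allele
    indices of any ploidy; negative indices are treated as missing.
    """
    counts = [0] * n_alleles
    for allele in call:
        if allele >= 0:
            counts[allele] += 1
    return counts


# Calls with ploidy 2, 3 and 1, plus a fully missing diploid call.
calls = [(0, 1), (0, 0, 2), (1,), (-1, -1)]
encoded = [to_allele_counts(c, n_alleles=3) for c in calls]
print(encoded)  # [[1, 1, 0], [2, 0, 1], [0, 1, 0], [0, 0, 0]]
```

Because every call is encoded with the same number of columns regardless of ploidy, calls from samples with different copy number can sit in one rectangular array.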

There have also been architectural changes to all data structures modules. The most important change is that all classes in the :mod:`allel.model.ndarray` module now wrap numpy arrays and are no longer direct sub-classes of the numpy :class:`numpy.ndarray` class. These classes still behave like numpy arrays in most respects, and so in most cases this change should not impact existing code. If you need a plain numpy array for any reason you can always use :func:`numpy.asarray` or access the .values property, e.g.:

>>> import allel
>>> import numpy as np
>>> g = allel.GenotypeArray([[[0, 1], [0, 0]], [[0, 2], [1, 1]]])
>>> isinstance(g, np.ndarray)
False
>>> a = np.asarray(g)
>>> isinstance(a, np.ndarray)
True
>>> isinstance(g.values, np.ndarray)
True

This change was made because a number of complexities arise when sub-classing :class:`numpy.ndarray`, and these were proving tricky to manage and maintain.

The :mod:`allel.model.chunked` and :mod:`allel.model.dask` modules also follow the same wrapper pattern. For the :mod:`allel.model.dask` module this means a change in the way that classes are instantiated. For example, to create a :class:`allel.model.dask.GenotypeDaskArray`, pass the underlying data directly into the class constructor, e.g.:

>>> import allel
>>> import h5py
>>> h5f = h5py.File('callset.h5', mode='r')
>>> h5d = h5f['3R/calldata/genotype']
>>> genotypes = allel.GenotypeDaskArray(h5d)

If the underlying data is chunked, there is no need to specify the chunks manually when instantiating a dask array; the native chunk shape will be used.

Finally, the allel.model.bcolz module has been removed. Use either the :mod:`allel.model.chunked` or :mod:`allel.model.dask` module instead.

v0.21.2

This release resolves compatibility issues with Zarr version 2.1.

v0.21.1

  • Added parameter min_maf to :func:`allel.ihs` to skip IHS calculation for variants below a given minor allele frequency.
  • Minor change to calculation of integrated haplotype homozygosity to enable values to be reported for first and last variants if include_edges is True.
  • Minor change to :func:`allel.standardize_by_allele_count` to better handle missing values.
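
As a rough illustration of what a minor allele frequency threshold does, the sketch below filters biallelic variants in plain Python. The helper ``minor_allele_frequency``, the variant names and the threshold value are all hypothetical, and the exact comparison used inside :func:`allel.ihs` is not shown here.

```python
def minor_allele_frequency(haplotypes):
    """Minor allele frequency for one biallelic variant.

    Hypothetical helper for illustration. `haplotypes` is a list of 0/1
    alleles, one per haplotype, with no missing values.
    """
    alt_freq = sum(haplotypes) / len(haplotypes)
    return min(alt_freq, 1 - alt_freq)


variants = {
    "v1": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # MAF 0.1
    "v2": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],  # MAF 0.5
    "v3": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # MAF 0.0 (fixed)
}
min_maf = 0.2  # illustrative threshold, chosen for this example
kept = [name for name, h in variants.items()
        if minor_allele_frequency(h) >= min_maf]
print(kept)  # ['v2']
```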

v0.21.0

In this release the implementations of :func:`allel.ihs` and :func:`allel.xpehh` selection statistics have been reworked to address a number of issues:

  • Both functions can now integrate over either a genetic map (via the map_pos parameter) or a physical map.
  • Both functions now accept max_gap and gap_scale parameters to perform adjustments to integrated haplotype homozygosity where there are large gaps between variants, following the standard approach. Alternatively, if a map of genome accessibility is available, it may be provided via the is_accessible parameter, in which case the distance between variants will be scaled by the fraction of accessible bases between them.
  • Both functions are now faster and can make use of multiple threads to further accelerate computation.
  • Several bugs in the previous implementations of these functions have been fixed (:issue:`91`).
  • New utility functions are provided for standardising selection scores, see :func:`allel.standardize_by_allele_count` (for use with IHS and NSL) and :func:`allel.standardize` (for use with XPEHH).
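
The gap adjustment described above can be sketched in plain Python under one common reading of the approach: distances across gaps larger than gap_scale are down-weighted by gap_scale / gap, and gaps larger than max_gap are treated as missing. The helper ``gap_weights`` is hypothetical, the parameter values are chosen for illustration, and the details of the library's own implementation may differ.

```python
import math


def gap_weights(pos, max_gap, gap_scale):
    """Weight to apply to the distance between each pair of adjacent variants.

    Hypothetical helper for illustration only.
    - gap <= gap_scale: weight 1.0 (distance used as-is)
    - gap_scale < gap <= max_gap: weight gap_scale / gap (scaled down)
    - gap > max_gap: NaN (no value is integrated across the gap)
    """
    weights = []
    for left, right in zip(pos, pos[1:]):
        gap = right - left
        if gap > max_gap:
            weights.append(math.nan)
        elif gap > gap_scale:
            weights.append(gap_scale / gap)
        else:
            weights.append(1.0)
    return weights


pos = [100, 150, 40150, 340150]  # physical positions; gaps of 50, 40000, 300000
weights = gap_weights(pos, max_gap=200_000, gap_scale=20_000)
print(weights)  # [1.0, 0.5, nan]
```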

Other changes:

v0.20.3

  • Fixed a bug in the count_alleles() methods on genotype and haplotype array classes that manifested if the max_allele argument was provided (:issue:`59`).
  • Fixed a bug in Jupyter notebook display method for chunked tables (:issue:`57`).
  • Fixed a bug in site frequency spectrum scaling functions (:issue:`54`).
  • Changed behaviour of subset method on genotype and haplotype arrays to better infer argument types and handle None argument values (:issue:`55`).
  • Changed table eval and query methods to make python the default for expression evaluation, because it is more expressive than numexpr (:issue:`58`).

v0.20.2

v0.20.1

v0.20.0

  • Added new :mod:`allel.model.dask` module, providing implementations of the genotype, haplotype and allele counts classes backed by dask.array (:issue:`32`).
  • Released the GIL where possible in Cython optimised functions (:issue:`43`).
  • Changed functions in :mod:`allel.stats.selection` that accept min_ehh argument, such that min_ehh = None should now be used to indicate that no minimum EHH threshold should be applied.

v0.19.0

The major change in v0.19.0 is the addition of the new :mod:`allel.model.chunked` module, which provides classes for variant call data backed by chunked array storage (:issue:`31`). This is a generalisation of the previously available :mod:`allel.model.bcolz` to enable the use of both bcolz and HDF5 (via h5py) as backing storage. The :mod:`allel.model.bcolz` module is now deprecated but will be retained for backwards compatibility until the next major release.

Other changes:

Contributors: :user:`alimanfoo <alimanfoo>`, :user:`hardingnj <hardingnj>`

v0.18.1

  • Minor change to the Garud H statistics to avoid raising an exception when the number of distinct haplotypes is very low (:issue:`20`).

v0.18.0

v0.17.0

  • Added new module for computing and plotting site frequency spectra, see :mod:`allel.stats.sf` (:issue:`12`).
  • All plotting functions have been moved into the appropriate stats module that they naturally correspond to. The :mod:`allel.plot` module is deprecated (:issue:`13`).
  • Improved performance of carray and ctable loading from HDF5 with a condition (:issue:`11`).

v0.16.2

  • Fixed behaviour of take() method on compressed arrays when indices are not in increasing order (:issue:`6`).
  • Minor change to scaler argument to PCA functions in :mod:`allel.stats.decomposition` to avoid confusion about when to fall back to default scaler (:issue:`7`).

v0.16.1

v0.16.0

v0.15.2

v0.15.1

  • Fix missing package in setup.py.

v0.15

v0.14

v0.12

v0.11

v0.10

v0.9

v0.8

v0.7

  • Added function :func:`allel.write_fasta` for writing a nucleotide sequence stored as a NumPy array out to a FASTA format file.
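
For readers unfamiliar with the output format, the following plain-Python sketch shows how a sequence is laid out as a FASTA record. It does not call :func:`allel.write_fasta`; the helper ``format_fasta``, its ``width`` parameter and the sequence name are hypothetical, and the sequence is a plain string rather than a NumPy array.

```python
def format_fasta(name, sequence, width=80):
    """Format one sequence as a FASTA record with fixed-width lines.

    Hypothetical helper for illustration only.
    """
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name] + chunks) + "\n"


record = format_fasta("chr_demo", "ACGTACGTAC", width=4)
print(record, end="")
# >chr_demo
# ACGT
# ACGT
# AC
```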

v0.6