Change Log

v0.8.14

bugfix in read attrition metrics

v0.8.13

dtype for is_control in annotation

v0.8.12

dtype for state should be float to allow NA values from failed cells
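
The dtype change above follows from how missing values are represented: an integer has no NA value, while a float can carry NaN. A minimal stdlib sketch of that idea, assuming a hypothetical `parse_state` helper and an `"NA"` token (neither is taken from the pipeline code):

```python
import math


def parse_state(raw_values, na_token="NA"):
    """Convert raw state strings to floats, mapping the NA token to NaN.

    Hypothetical helper for illustration only; the pipeline's own
    parsing may differ.
    """
    return [math.nan if value == na_token else float(value) for value in raw_values]


# The last entry stands in for a failed cell with no state.
states = parse_state(["2", "3", "NA"])

# NaN survives as a float; int("NA") would raise ValueError instead.
valid = [s for s in states if not math.isnan(s)]
```

Downstream code can then skip or mask the NaN entries rather than failing on a type error.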

v0.8.11

add subsampling prior to museq to remove read sinks

v0.8.10

adding track type header to wig
hmmcopy: updating state to NA, allow failed cells in heatmap

v0.8.9

adding is_control and read attrition metrics

v0.8.8

Changes:

annotation pipeline for variant calling can now be configured in config file

v0.8.7

Changes:

fastqscreen tags limited to [0,1]
fastqscreen now can filter out custom tag combinations
supports list of references per organism

v0.8.6

Changes:

added ref and alt to infer haps output

v0.8.5

changes:

merged development QC pipeline changes

v0.8.4

changes:

updated test datasets
added softclipped read filter to merge bams

v0.8.3

Changes:

updated annotation to handle configurable genomes

v0.8.2

Changes:

added filtering per genome in fastqscreen
more customizable genomes in fastqscreen
update pypeliner to 0.6.2

v0.8.1

Changes:

moved hmmcopy R into repo

v0.8.0

changes:

picard: quiet mode
pypeliner update to v0.6.1
docker: added azure libs
updated input file for snv genotyping
snv genotyping in testing now

bugs:

fastqscreen supports non gzipped fastqs now

v0.7.6

Changes:

updated pypeliner to v0.6.0

v0.7.5

Changes:

updated pypeliner

v0.7.4

Changes:

added cohort_qc to testing
changes from andrew

v0.7.3

Changes:

update destruct to v0.4.19

v0.7.2

Changes:

networkx version for breakpoint docker

v0.7.1

Changes:

added pseudobulk QC to codebuild

v0.7.0

Outputs do not change with this release

Changes:

deprecated conda package
deprecated docker in docker
new docker containers (one per pipeline) and new org in quay.io

v0.6.46:

Changes:

Added sample id and library id to alignment and hmm metrics

v0.6.45:

Changes:

remove unused code in hmmcopy
review of QC codebase
pseudobulk QC is in codebuild

v0.6.44

Changes:

raise exception when reference type is unknown
set all dtypes as str when reading maf

Bug:

errors in cohort QC

v0.6.43

changes:

update to latest pypeliner (v0.6.27)

v0.6.42

Changes

produces all outputs necessary to load cbioportal with cna + an oncoplot from maftools
updated HMMcopy to R=4

v0.6.41

Changes

trim and sequencing center is now a pipeline level flag

Bug

alignment and hmmcopy tarballs were broken

v0.6.40

Bug:

bugfix: trim field in seqinfo incorrect (#119)

Changes:

remove mem_retry_factor overrides in pipeline
deprecation: sequencing instrument in input.yaml is deprecated, replaced with trim (boolean)

v0.6.39

Bugfix:

Error in fastqscreen when removing contaminated reads

v0.6.38

Changes:

destruct updated to remove secondary reads
updated pseudobulk QC on juno (singularity)
updated pseudobulk plots
updated pseudobulk documentation

v0.6.37

Changes:

fixed snpeff call in germline calling
updated docs

v0.6.36

Changes:

updated hmmcopy

v0.6.35

Changes:

updated pypeliner to v0.5.23 to support latest azure python sdk

v0.6.34

Changes:

update pypeliner to v0.5.22 which lowers lsf query volume

v0.6.33

Bugs:

fixed issue with missing contamination table in annotation html

v0.6.32

Bugs:

csvutils couldn't handle tsv

v0.6.31

Bugs:

hmmcopy plots were hardcoded to human genomes

v0.6.30

Bugs:

update cell cycle classifier

v0.6.29

Bugs:

csvutils annotate_csv: annotation was in incorrect order

v0.6.28

Bugs:

added Trim to dtypes

v0.6.27

Changes:

pseudo_bulk_qc: allele data loading is done in chunks to improve memory usage
pseudo_bulk_qc: added rearrangement types for lumpy
pseudo_bulk_qc: additional qcplots
pseudo_bulk_qc: better plot organization
pseudo_bulk_qc: resized report output
pseudo_bulk_qc: code can now handle incomplete data
pseudo_bulk_qc: placeholder text for missing plots

v0.6.26

Changes:

remove bwa aln

v0.6.25

Changes:

update pypeliner
deleted redundant dockerfiles

v0.6.24

Changes:

clean up pseudobulk QC
add pseudobulk QC to master build

BUG:

unused ete3 code

v0.6.23

Changes:

conda packages: added drmaa, simpler recipes

v0.6.22

Changes:

pseudo bulk QC pipeline
remove ete3 dependency
updated conda recipe. removed ete3 added drmaa

v0.6.21

Bug:

KDE plot can't handle bad data

v0.6.20

Changes:

conda packages are now built on top of py 3.8

Bug:

minor heatmap axis labeling bug (assumes chrom1 is starting point)

v0.6.19

Changes:

produces a MT bam file
setup doc also includes information to switch pipeline to production datasets

v0.6.18

Bugs:

missing data error in species classifier
build: pip cannot install from commits in a PR coming from a fork

Changes:

conda: annotation pipeline requires jinja2 dependency
conda: align requires ete3 and R
added a quickstart guide based on conda

v0.6.17

Bugs:

missing fastqscreen training data for mouse in config
data type bug in fastqscreen summary

v0.6.16

Changes:

new build process (AWS codebuild, on PR)

v0.6.15

Bug:

there were no counts for cells with no data in fastqscreen summary

v0.6.14

Changes:

removed joblib files from repo
added conda recipes

v0.6.13

Changes:

updated remixt

v0.6.12

Bug:

.tmp in input.yaml file path in metadata yaml file

Changes:

test data set for count haps
updated infer haps testing
removed hardcoded genomes in HTML report
infer and count haps use proper chromosomes

v0.6.11

Changes:

pypeliner version update to v0.5.19

v0.6.10

Bug:

missing cell id in count haps

v0.6.9

Bug:

fixed snv annotation issues

Changes:

biowrappers upgraded to v0.2.8

v0.6.8

Bug:

incorrect rebase on is_contaminated flag code
fixed issues with tag name in docker build

Changes:

updated QC generation code, added classifier
updated docs: read the docs compatible

v0.6.7

Bug:

issue with dtypes for lumpy

Changes:

updated biowrappers to v0.2.7

v0.6.6

Bug:

missing yaml extension for count haps input

v0.6.5

Bug:

issues with missing yaml files

Changes:

add yaml validators
added docs

v0.6.4:

Bug:

Issue with fastqscreen counts
remove corrupt tree files from metadata
dtype for chr is str not int in lumpy

v0.6.3:

Bug:

issue with yaml missing in extensions for snv file

v0.6.2:

Bug:

collect metrics introduced nan into int columns

v0.6.1

Changes:

per sample genotyping results

v0.6.0

Changes:

jenkins refactor
csvutils refactor

v0.5.21

Changes:

upgraded remixt to v0.5.11

v0.5.20

Changes:

upgraded destruct and remixt to latest pypeliner.

v0.5.19

Changes:

upgraded remixt and cell cycle classifier

v0.5.18

Changes:

upgraded pypeliner to v0.5.18

v0.5.17

Changes:

updated the calculation for is_contaminated
moved contaminated flagging to annotation
pypeliner updated to v0.5.17
removed org from scp docker container names in config

v0.5.16

Changes:

updated pypeliner to v0.5.16
conda build fixed

v0.5.15

bug:

metadata regions are none sometimes

v0.5.14

bug:

hdf to csv was only writing header for first output file

v0.5.13

bug:

splitting the heatmap into 1000 cells per page doesn't account for cases where the next page has only 1 cell, which can't be plotted.
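
One way to avoid a single-cell trailing page is to fold a too-small final page into the previous one. The sketch below illustrates that idea only; `paginate` and `min_last_page` are hypothetical names, and the changelog does not say how the pipeline actually resolved the bug.

```python
def paginate(cells, page_size=1000, min_last_page=2):
    """Split cells into pages of at most page_size items.

    If the final page would hold fewer than min_last_page cells, it is
    merged into the previous page, so that page may exceed page_size.
    Hypothetical helper, not the pipeline's implementation.
    """
    pages = [cells[i:i + page_size] for i in range(0, len(cells), page_size)]
    if len(pages) > 1 and len(pages[-1]) < min_last_page:
        pages[-2] = pages[-2] + pages[-1]
        pages.pop()
    return pages


# 1001 cells would naively give pages of 1000 and 1; here they are merged.
pages = paginate(list(range(1001)))
page_sizes = [len(p) for p in pages]
```

A library with only one cell in total still yields a single one-cell page, so the plotting code would need its own guard for that case.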

v0.5.12

bug:

destruct read indexing was broken

Changes:

flags to selectively run destruct or lumpy

v0.5.11:

Changes:

tempdir is now separated by subcommands
file names updated for breakpoint calling

v0.5.10

Bug:

conversion to csv from hdf was overwriting not appending

Changes:

destruct version updated
docker container is built from tag, not from commit id

v0.5.9

Bug:

excessive memory usage in h5 to csv conversion
fixed config issue for snpeff

v0.5.8

Changes:

updated biowrappers to v0.2.4
updated pypeliner version to v0.5.15
optimized QC VM size for cost savings
snpeff now uses data from the refdata dir instead of downloading
added test datasets

v0.5.7

Changes:

updated biowrappers version to v0.2.2
updated destruct and remixt container versions
added sv genotyping (experimental)

bugs:

missing last line in lumpy parsed output
issues with fastqscreen tags

v0.5.6:

bugs:

fixed an issue with double headers in snv calling output
issues in plotting due to gc data dtypes set to string

Changes:

destruct container does not need single cell pipeline
remixt container does not need single cell pipeline
some updates to documentation
updated to biowrappers v0.2.1
split infer and count haps

v0.5.5:

changes:

updated pypeliner to v0.5.13
updated documentation

v0.5.4:

bugs:

mismatching csv type error in empty fastq files

changes:

refactor errors with empty fastq screen files when fastq is empty

v0.5.3:

bugs:

fixed base64 encode issue for images in qc reports
issue deleting non-existent bam key
missing .gz.csv.yaml extensions for lumpy files
fixed container for lumpy
pd.concat on empty set of dataframes fixed
plotting bugs in hmmcopy
pandas loading issues related to dtypes specified but not names
use new hmmcopy script, better params format

v0.5.2:

Changes:

alignment for all lanes is run in a single job
optimized destruct fastq reindexing

bugs:

removed redundant replace ? by 0 in plotting
annotation works when sample_info is none
fixed paths for annotation low mad yaml in tests
updated dtypes in integrity tests

bugs:

issues with NA handling in csvutils
all median cols are float now

v0.5.1:

bugs:

issues with NA handling in csvutils
all median cols are float now

v0.5.0:

CLI refactor

added:

germline calling mode

Changes:

refactored the CLI
removed: a) QC and b) multi_sample_pseudo_bulk
added in commands for a) alignment, b) hmmcopy, c) annotation, d) merge_cell_bams, e) split_normal, f) variant_calling, g) variant_counting (multi-sample) h) germline_calling i) infer_haps j) breakpoint_calling
add predefined dtypes to all workflows
metadata yaml files are generated within the pipeline
added a sentinel file with some provenance information as a teardown job

removed:

aneufinder
copynumber_calling

v0.4.2

added

Enforced data dtypes in csvutils
added input yaml to output and metadata.yaml
jenkins: added output integrity check

Changes

removed autoscale
fixes to travis builds, travis now builds with python3
merged alignment tasks
removed local indel realignment

v0.4.1

Changes:

variant calling: switch h5 output to csv
containers: picard and hmmcopy containers now use base R container

bugs:

fixed error in QC generation on low quality datasets.
destruct needs more disk
metadata: code didn't upload when storage is set to local.

v0.4.0

Changes:

pipeline now runs on python3. python2 is not supported anymore
pandas call for converting to categoricals
refactor config generation
added docs for LSF and singularity
updated type from alignment to align in metadata
library level snv counting removed from variant calling

bugs:

check fastq screen output directory for files from older runs and delete them
fixed error raised when uploading meta yaml where storage is not specified
readcounter: can handle non tagged bams now
fastqscreen: can handle fastq files with multiple periods in name
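
File names with several periods break any logic that splits on the first period to find the extension. A minimal sketch of a more robust approach is to strip a known suffix from the end of the name instead; the suffix list and the `strip_fastq_suffix` name are assumptions for illustration, not the pipeline's code.

```python
# Longer suffixes come first so ".fastq.gz" is matched before ".gz"-less forms.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def strip_fastq_suffix(filename):
    """Return filename without its fastq suffix, keeping any inner periods.

    Hypothetical helper; raises ValueError for unrecognized extensions.
    """
    for suffix in FASTQ_SUFFIXES:
        if filename.endswith(suffix):
            return filename[:-len(suffix)]
    raise ValueError("not a recognized fastq file name: " + filename)


prefix = strip_fastq_suffix("sample.lane.1.fastq.gz")
```

Matching from the end of the name means periods inside the sample or lane portion are left untouched.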

v0.3.1

added

added a column indicating if a cell is contaminated
added a column indicating if a segment is low mappability
filtered contaminated cells from heatmap
added extensions to metadata yaml files

Changes:

seqdata files from haplotype calling are now temporary files
fastqscreen counts column names begin with fastqscreen_

v0.3.0

added

added fastq screen
- runs fastqscreen with --tag
- all downstream analysis is run on the tagged data.
- bam headers contain required information for parsing fastq screen tag
- by default, pipeline removes all reads that belong to another organism
- generates a detailed table and adds summary metrics to alignment table
- more details at organism filter
added salmon reference to images.
added conda package for corrupt tree. updated docker container to use the conda package
added newick support to heatmap
added cell order based on corrupt tree to output
added this changelog
added metadata yaml files to output directories
added flag to disable corrupt tree

changes:

hmmcopy segments plots have a global max for ylim per run (library)
standardized page size for corrupt tree output, annotated each page.
replaced yaml.load with yaml.safe_load
replaced nan values in QC html with 0
removed biobloom
destruct can now handle empty/small fastq files.
fixed strelka filename issue (missing _)
refactor alignment workflow
added a tarball output with all hmmcopy outputs except autoploidy (multipliers 1-6)
merged all picard based metrics into a single tarball
reorganized reference data
now uses miniconda docker image to delete files in batch
arguments changes for QC
- removed --out_dir
- add --alignment_output
- add --hmmcopy_output
- add --annotation_output
argument changes for pseudo bulk
- removed --out_dir
- added --variants_output
- added --haps_output
- added --destruct_output
- added --lumpy_output

bugs:

fixed missing header issue with destruct outputs
pipeline can now handle tsv files.
fixed issues with missing cell cycle data in outputs

v0.2.25

added

Added Corrupt Tree

changes

Reorganized QC pipeline outputs
updated to newest biobloom container (v0.0.2). biobloom container now runs as root user.
QC html doesn't require reference GC curve data
give more memory to biobloom
load input yaml with safe_load

bugs

bugfix: fixed a merge issue with trim galore running script.

v0.2.24

added

- Added Cell Cycle Classifier

changes

disable biobloom by default

v0.2.23

bugs

bugfix: Destruct was not tagging reads with cell ids

changes

removed reference fasta index from github

v0.2.22

changes

lumpy can now handle empty bams

v0.2.21

added

added Html QC output
added biobloom
added a single 'QC' command to run alignment and hmmcopy

changes

merged the alignment and metrics workflows.
remove hmmcopy multipliers, only use autoploidy downstream
removed option to specify multiple hmmcopy parameter sets

bugs

bugfix: issue with automatic dtype detection in csv yaml files.

v0.2.20

changes

updated input yaml format for pseudowgs. The normal section now follows same schema as tumour (with sample and library id).

v0.2.19

bugs

bug: missing header in allele_counts file.

v0.2.18

added

added travis build.
added smarter dtype merging for csv files.

changes

updated conda recipe
updated destruct output format from h5 to csv
fixed destruct to generate counts from filtered output to remove normal reads
optimized breakpoint calling, normal preprocessing runs only once per run.
cleaned up raw_dir in output folder
updated to conda based hmmcopy and mutationseq containers
updated to latest version of lumpy with correct bed output
now supports multiple libraries per normal in pseudowgs

v0.2.17

changes

updated lumpy bed file parsing.
changed lumpy output file format from h5 to csv.

v0.2.16

changes

added mutationseq parameters to config. now users can override default settings.

v0.2.15

added

added parallelization over libraries in pseudowgs

bugs

fixed an issue with read tagging that caused int overflow in bowtie
some pickling issues due to python compatibility updates in pypeliner
fixes in csv and yaml generation code

v0.2.14

changes

refactored lumpy workflow, only run normal preprocessing once per run
merges in destruct require more disk space
destruct: read indexes are now unique int
destruct: reindex both reads in a single job to reduce number of jobs
destruct: preprocess normal once per run
faster csv file concatenation
updated batch config to match pypeliner v0.5.6. now pool selection also accounts for disk usage.
- Each pool will have available disk space. jobs will be scheduled in a pool based on requirements. production will have smaller disk in standard pool to save on costs.
classifier now supports csv inputs

bugs

bwa couldn't parse readgroup when not running in docker

v0.2.13

changes

split and merge bams only when running snv calling
refactored to make main workflow calling functions standalone subworkflows
revamped destruct workflow for pseudobulk

v0.2.12

changes

switched to gzipped csv from H5 due to compatibility issues
order IGV segs file by the clustering order, filter on quality

v0.2.11

added

added flags to only run parts of pseudowgs workflow

bugs

fixed issue with infer haps where some parameters weren't specified correctly.

v0.2.10

changes

destruct and remixt containers now use the same versioning as single cell pipeline
lumpy accepts normal cells
refactor: haplotype calling workflows
all pseudo wgs commands use the same input format as multi sample pseudo bulk.
bam merge now supports merging larger number of files.

v0.2.9

added

destruct now supports list of cells as normal
separate pools based on disk sizes.

changes

separate docker container for destruct
haplotype calling supports list of cells as normal
parallel runs support more cells now.

bugs

bug: fixed an issue that caused low mappability mask to disappear in the heatmap

v0.2.8

changes

replaced python based multiprocessing with gnu parallel

bugs

bug: remixt path fixed in config, mkdir doesn't cause failures anymore in batch vm startup

v0.2.7

added

added trim galore container
added a flag to switch disk to 1TB for all batch nodes
added a flag to specify whether to trim the fastqs. The flag overrides the sequencer based trimming logic.
row, column, cell_call and experimental condition can now be null
switched to gnu parallel for parallel on node runs
feature: pools are now chosen automatically

changes

updated readgroup string.
renamed total_mapped_reads column in hmmcopy to total_mapped_reads_hmmcopy to avoid clashes with the column of the same name in alignment metrics
snv calling: allow overlaps in vcf files
remove meta yaml file
moved autoploidy segment plot to top of page
added option to launch the pipeline with docker by just adding --run_with_docker
h5 dtype casting uses less memory now
alignment metrics plot: now faster, plots at most 1000 cells per page. extra cells overflow onto the next page.
support for pypeliner auto detect batch pool
updated docker
updated vcfutils to use pypeliner to handle vcf index files.
subworkflow resolution now runs in a docker container on compute nodes.
switched from warnings to logging. the logs from compute now get reported in the main pypeliner log file.
pipeline now uses OS disk in azure to store temporary files
updated docker configuration changes in pypeliner. the container doesn't require the prefix anymore.
added info.yaml file with some metadata per run
merged andrew's pseudobulk changes.

bugs

fix: strelka uses chromosome size instead of genome size

v0.2.6

changes

now supports plain text fastq files

v0.2.5

added

added: heatmap and boxplots for cell quality score

changes

switching to smaller 256GB disks

v0.2.4

changes

use table format in h5 files

bugs

fix: non unique categories error fixed by specifying categories at initialization

v0.2.3

changes

properly deletes file on batch node after task completes
cell_id is now a categorical in output
supports .fq and .fq.gz fastq file extensions
heatmap is generated even if all cells are nan
default for non-azure environments is not singularity anymore.
