- bugfix in read attrition metrics
- dtype for is_control in annotation
- dtype for state should be float to allow NA values from failed cells
- add subsampling prior to museq to remove read sinks
- adding track type header to wig
- hmmcopy: updating state to NA, allow failed cells in heatmap
- adding is_control and read attrition metrics
- annotation pipeline for variant calling can now be configured in config file
- fastqscreen tags limited to [0,1]
- fastqscreen now can filter out custom tag combinations
- supports list of references per organism
- added ref and alt to infer haps output
- merged development QC pipeline changes
- updated test datasets
- added softclipped read filter to merge bams
- updated annotation to handle configurable genomes
- added filtering per genome in fastqscreen
- more customizable genomes in fastqscreen
- update pypeliner to 0.6.2
- moved hmmcopy R into repo
- picard: quiet mode
- pypeliner update to v0.6.1
- docker: added azure libs
- updated input file for snv genotyping
- snv genotyping in testing now
- fastqscreen supports non gzipped fastqs now
- updated pypeliner to v0.6.0
- updated pypeliner
- added cohort_qc to testing
- changes from andrew
- update destruct to v0.4.19
- networkx version for breakpoint docker
- added pseudobulk QC to codebuild
- Outputs do not change with this release
- deprecated conda package
- deprecated docker in docker
- new docker containers (one per pipeline) and new org in quay.io
- Added sample id and library id to alignment and hmm metrics
- remove unused code in hmmcopy
- review of QC codebase
- pseudobulk QC is in codebuild
- raise exception when reference type is unknown
- set all dtypes as str when reading maf
- errors in cohort QC
- update to latest pypeliner (v0.6.27)
- produces all outputs necessary to load cbioportal with cna + an oncoplot from maftools
- updated HMMcopy to R=4
- trim and sequencing center is now a pipeline level flag
- alignment and hmmcopy tarballs were broken
- bugfix: trim field in seqinfo incorrect (#119)
- remove mem_retry_factor overrides in pipeline
- deprecation: sequencing instrument in input.yaml is deprecated, replaced with trim (boolean)
Bugfix:
- Error in fastqscreen when removing contaminated reads
Changes:
- destruct updated to remove secondary reads
- updated pseudobulk QC on juno (singularity)
- updated pseudobulk plots
- updated pseudobulk documentation
Changes:
- fixed snpeff call in germline calling
- updated docs
Changes:
- updated hmmcopy
Changes:
- updated pypeliner to v0.5.23 to support latest azure python sdk
Changes:
- update pypeliner to v0.5.22 which lowers lsf query volume
- fixed issue with missing contamination table in annotation html
- csvutils couldn't handle tsv
- hmmcopy plots were hardcoded to human genomes
- update cell cycle classifier
- csvutils annotate_csv: annotation was in incorrect order
- added Trim to dtypes
- pseudo_bulk_qc: allele data loading is done in chunks to improve memory usage
- pseudo_bulk_qc: added rearrangement types for lumpy
- pseudo_bulk_qc: additional qcplots
- pseudo_bulk_qc: better plot organization
- pseudo_bulk_qc: resized report output
- pseudo_bulk_qc: code can now handle incomplete data
- pseudo_bulk_qc: placeholder text for missing plots
- remove bwa aln
- update pypeliner
- deleted redundant dockerfiles
- clean up pseudobulk QC
- add pseudobulk QC to master build
- unused ete3 code
- conda packages: added drmaa, simpler recipes
- pseudo bulk QC pipeline
- remove ete3 dependency
- updated conda recipe. removed ete3 added drmaa
- KDE plot can't handle bad data
- conda packages are now built on top of py 3.8
- minor heatmap axis labeling bug (assumes chrom1 is starting point)
- produces a MT bam file
- setup doc also includes information to switch pipeline to production datasets
- missing data error in species classifier
- build: pip cannot install from commits in a PR coming from a fork
- conda: annotation pipeline requires jinja2 dependency
- conda: align requires ete3 and R
- added a quickstart guide based on conda
- missing fastqscreen training data for mouse in config
- data type bug in fastqscreen summary
- new build process (AWS codebuild, on PR)
- there were no counts for cells with no data in fastqscreen summary
- removed joblib files from repo
- added conda recipes
- updated remixt
- .tmp in input.yaml file path in metadata yaml file
- test data set for count haps
- updated infer haps testing
- removed hardcoded genomes in HTML report
- infer and count haps use proper chromosomes
- pypeliner version update to v0.5.19
- missing cell id in count haps
- fixed snv annotation issues
- biowrappers upgraded to v0.2.8
- incorrect rebase on is_contaminated flag code
- fixed issues with tag name in docker build
- updated QC generation code, added classifier
- updated docs: read the docs compatible
- issue with dtypes for lumpy
- updated biowrappers to v0.2.7
- missing yaml extension for count haps input
- issues with missing yaml files
- add yaml validators
- added docs
- Issue with fastqscreen counts
- remove corrupt tree files from metadata
- dtype for chr is str not int in lumpy
- issue with yaml missing in extensions for snv file
- collect metrics introduced nan into int columns
- per sample genotyping results
- jenkins refactor
- csvutils refactor
- upgraded remixt to v0.5.11
- upgraded destruct and remixt to latest pypeliner.
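
The pseudo_bulk_qc entry above about loading allele data in chunks can be illustrated with a minimal sketch that uses only the standard library. The function names, the column names (`cell_id`, `allele_count`), and the chunk size are hypothetical and chosen for illustration; the pipeline itself may use a different mechanism.

```python
import csv
import io
from collections import Counter
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def total_counts_per_cell(handle, chunk_size=2):
    """Sum allele counts per cell while holding only one chunk in memory."""
    totals = Counter()
    for chunk in iter_chunks(handle, chunk_size):
        for row in chunk:
            totals[row["cell_id"]] += int(row["allele_count"])
    return dict(totals)


# Hypothetical in-memory data standing in for a large allele counts file.
example = io.StringIO(
    "cell_id,allele_count\n"
    "cell_a,3\n"
    "cell_b,1\n"
    "cell_a,2\n"
)
print(total_counts_per_cell(example))
```

Only the running totals and the current chunk are kept in memory, so peak usage depends on the chunk size rather than on the size of the input file.
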
Changes:
- upgraded remixt and cell cycle classifier
- upgraded pypeliner to v0.5.18
- updated the calculation for is_contaminated
- moved contaminated flagging to annotation
- pypeliner updated to v0.5.17
- removed org from scp docker container names in config
Changes:
- updated pypeliner to v0.5.16
- conda build fixed
- metadata regions are none sometimes
- hdf to csv was only writing header for first output file
- splitting heatmap into 1000 cells per page doesn't account for cases where the next page has only 1 cell, which can't be plotted.
- destruct read indexing was broken
- flags to selectively run destruct or lumpy
- tempdir is now separated by subcommands
- file names updated for breakpoint calling
- conversion to csv from hdf was overwriting not appending
- destruct version updated
- docker container is built from tag, not from commit id
- excessive memory usage in h5 to csv conversion
- fixed config issue for snpeff
- updated biowrappers to v0.2.4
- updated pypeliner version to v0.5.15
- optimized QC VM size for cost savings
- snpeff now uses data from the refdata dir instead of downloading
- added test datasets
- updated biowrappers version to v0.2.2
- updated destruct and remixt container versions
- added sv genotyping (experimental)
- missing last line in lumpy parsed output
- issues with fastqscreen tags
- fixed an issue with double headers in snv calling output
- issues in plotting due to gc data dtypes set to string
- destruct container does not need single cell pipeline
- remixt container does not need single cell pipeline
- some updates to documentation
- updated to biowrappers v0.2.1
- split infer and count haps
- updated pypeliner to v0.5.13
- updated documentation
- mismatching csv type error in empty fastq files
- refactor errors with empty fastq screen files when fastq is empty
- fixed base64 encode issue for images in qc reports
- issue deleting non-existent bam key
- missing .gz.csv.yaml extensions for lumpy files
- fixed container for lumpy
- pd.concat on empty set of dataframes fixed
- plotting bugs in hmmcopy
- pandas loading issues related to dtypes specified but not names
- use new hmmcopy script, better params format
- alignment for all lanes is run in a single job
- optimized destruct fastq reindexing
- removed redundant replace ? by 0 in plotting
- annotation works when sample_info is none
- fixed paths for annotation low mad yaml in tests
- updated dtypes in integrity tests
- issues with NA handling in csvutils
- all median cols are float now
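
One of the fixes above notes that conversion to csv from hdf was overwriting instead of appending. A minimal sketch of the intended behaviour, assuming several tables are written to one CSV file in sequence: the header is written once and later tables are appended. The helper names and the example columns are hypothetical and not taken from the pipeline, and plain lists of dicts stand in for tables read from an HDF store.

```python
import csv
import os
import tempfile


def write_tables(path, tables, fieldnames):
    """Write several tables to one CSV: header once, later tables appended."""
    for index, rows in enumerate(tables):
        # Open in "w" only for the first table; use "a" afterwards so that
        # rows written earlier are kept rather than overwritten.
        mode = "w" if index == 0 else "a"
        with open(path, mode, newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            if index == 0:
                writer.writeheader()
            writer.writerows(rows)


def read_rows(path):
    """Read the CSV back as a list of dicts, for checking the result."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical tables with made-up values.
tables = [
    [{"cell_id": "cell_a", "reads": "10"}],
    [{"cell_id": "cell_b", "reads": "20"}, {"cell_id": "cell_c", "reads": "30"}],
]
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "metrics.csv")
    write_tables(out_path, tables, fieldnames=["cell_id", "reads"])
    print(len(read_rows(out_path)))
```

Opening every table in `"w"` mode would leave only the last table in the file, which matches the overwriting symptom described in the entry.
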
CLI refactor
- germline calling mode
- refactored the CLI
- removed: a) QC and b) multi_sample_pseudo_bulk
- added in commands for a) alignment, b) hmmcopy, c) annotation, d) merge_cell_bams, e) split_normal, f) variant_calling, g) variant_counting (multi-sample), h) germline_calling, i) infer_haps, j) breakpoint_calling
- add predefined dtypes to all workflows
- metadata yaml files are generated within the pipeline
- added a sentinel file with some provenance information as a teardown job
- aneufinder
- copynumber_calling
- Enforced data dtypes in csvutils
- added input yaml to output and metadata.yaml
- jenkins: added output integrity check
- removed autoscale
- fixes to travis builds, travis now builds with python3
- merged alignment tasks
- removed local indel realignment
- variant calling: switch h5 output to csv
- containers: picard and hmmcopy containers now use base R container
- fixed error in QC generation on low quality datasets.
- destruct needs more disk
- metadata: code didn't upload when storage is set to local.
- pipeline now runs on python3. python2 is not supported anymore
- pandas call for converting to categoricals
- refactor config generation
- added docs for LSF and singularity
- updated type from alignment to align in metadata
- library level snv counting removed from variant calling
- check fastq screen output directory for files from older runs and delete them
- fixed error raised when uploading meta yaml where storage is not specified
- readcounter: can handle non tagged bams now
- fastqscreen: can handle fastq files with multiple periods in name
- added a column indicating if a cell is contaminated
- added a column indicating if a segment is low mappability
- filtered contaminated cells from heatmap
- added extensions to metadata yaml files
- seqdata files from haplotype calling are now temporary files
- fastqscreen counts column names begin with fastqscreen_
- added fastq screen
- runs fastqscreen with --tag
- all downstream analysis is run on the tagged data.
- bam headers contain required information for parsing fastq screen tag
- by default, pipeline removes all reads that belong to another organism
- generates a detailed table and adds summary metrics to alignment table
- more details at organism filter
- added salmon reference to images.
- added conda package for corrupt tree. updated docker container to use the conda package
- added newick support to heatmap
- added cell order based on corrupt tree to output
- added this changelog
- added metadata yaml files to output directories
- added flag to disable corrupt tree
- hmmcopy segments plots have a global max for ylim per run (library)
- standardized page size for corrupt tree output, annotated each page.
- replaced yaml.load with yaml.safe_load
- replaced nan values in QC html with 0
- removed biobloom
- destruct can now handle empty/small fastq files.
- fixed strelka filename issue (missing _)
- refactor alignment workflow
- added a tarball output with all hmmcopy outputs except autoploidy (multipliers 1-6)
- merged all picard based metrics into a single tarball
- reorganized reference data
- now uses miniconda docker image to delete files in batch
- argument changes for QC
- removed --out_dir
- add --alignment_output
- add --hmmcopy_output
- add --annotation_output
- argument changes for pseudo bulk
- removed --out_dir
- added --variants_output
- added --haps_output
- added --destruct_output
- added --lumpy_output
- fixed missing header issue with destruct outputs
- pipeline can now handle tsv files.
- fixed issues with missing cell cycle data in outputs
- Added Corrupt Tree
- Reorganized QC pipeline outputs
- updated to newest biobloom container (v0.0.2). biobloom container now runs as root user.
- QC html doesn't require reference GC curve data
- give more memory to biobloom
- load input yaml with safe_load
- bugfix: fixed a merge issue with trim galore running script.
- Added Cell Cycle Classifier
- disable biobloom by default
- bugfix: Destruct was not tagging reads with cell ids
- removed reference fasta index from github
- lumpy can now handle empty bams
- added Html QC output
- added biobloom
- added a single 'QC' command to run alignment and hmmcopy
- merged the alignment and metrics workflows.
- remove hmmcopy multipliers, only use autoploidy downstream
- removed option to specify multiple hmmcopy parameter sets
- bugfix: issue with automatic dtype detection in csv yaml files.
- updated input yaml format for pseudowgs. The normal section now follows same schema as tumour (with sample and library id).
- bug: missing header in allele_counts file.
- added travis build.
- added smarter dtype merging for csv files.
- updated conda recipe
- updated destruct output format from h5 to csv
- fixed destruct to generate counts from filtered output to remove normal reads
- optimized breakpoint calling, normal preprocessing runs only once per run.
- cleaned up raw_dir in output folder
- updated to conda based hmmcopy and mutationseq containers
- updated to latest version of lumpy with correct bed output
- now supports multiple libraries per normal in pseudowgs
- updated lumpy bed file parsing.
- changed lumpy output file format from h5 to csv.
- added mutationseq parameters to config. now users can override default settings.
- added parallelization over libraries in pseudowgs
- fixed an issue with read tagging that caused int overflow in bowtie
- some pickling issues due to python compatibility updates in pypeliner
- fixes in csv and yaml generation code
- refactored lumpy workflow, only run normal preprocessing once per run
- merges in destruct require more disk space
- destruct: read indexes are now unique int
- destruct: reindex both reads in a single job to reduce number of jobs
- destruct: preprocess normal once per run
- faster csv file concatenation
- updated batch config to match pypeliner v0.5.6. now pool selection also accounts for disk usage.
- Each pool will have available disk space. jobs will be scheduled in a pool based on requirements. production will have smaller disk in standard pool to save on costs.
- classifier now supports csv inputs
- bwa couldn't parse readgroup when not running in docker
- split and merge bams only when running snv calling
- refactored to make main workflow calling functions standalone subworkflows
- revamped destruct workflow for pseudobulk
- switched to gzipped csv from H5 due to compatibility issues
- order IGV segs file by the clustering order, filter on quality
- added flags to only run parts of pseudowgs workflow
- fixed issue with infer haps where some parameters weren't specified correctly.
- destruct and remixt containers now use the same versioning as single cell pipeline
- lumpy accepts normal cells
- refactor: haplotype calling workflows
- all pseudo wgs commands use the same input format as multi sample pseudo bulk.
- bam merge now supports merging larger number of files.
- destruct now supports list of cells as normal
- separate pools based on disk sizes.
- separate docker container for destruct
- haplotype calling supports list of cells as normal
- parallel runs support more cells now.
- bug: fixed an issue that caused low mappability mask to disappear in the heatmap
- replaced python based multiprocessing with gnu parallel
- bug: remixt path fixed in config, mkdir doesn't cause failures anymore in batch vm startup
- added trim galore container
- added a flag to switch disk to 1TB for all batch nodes
- added a flag to specify whether to trim the fastqs. The flag overrides the sequencer based trimming logic.
- row, column, cell_call and experimental condition can now be null
- switched to gnu parallel for parallel on node runs
- feature: pools are now chosen automatically
- updated readgroup string.
- renamed total_mapped_reads column in hmmcopy to total_mapped_reads_hmmcopy to avoid clashes with column of same name in alignment metrics
- snv calling: allow overlaps in vcf files
- remove meta yaml file
- moved autoploidy segment plot to top of page
- added option to launch the pipeline with docker by just adding --run_with_docker
- h5 dtype casting uses less memory now
- alignment metrics plot: now faster, plots at most 1000 cells per page. extra cells overflow onto the next page.
- support for pypeliner auto detect batch pool
- updated docker
- updated vcfutils to use pypeliner to handle vcf index files.
- subworkflow resolution now runs in a docker container on compute nodes.
- switched from warnings to logging. the logs from compute now get reported in the main pypeliner log file.
- pipeline now uses OS disk in azure to store temporary files
- updated docker configuration changes in pypeliner. the container doesn't require the prefix anymore.
- added info.yaml file with some metadata per run
- merged andrew's pseudobulk changes.
- fix: strelka uses chromosome size instead of genome size
- now supports plain text fastq files
- added: heatmap and boxplots for cell quality score
- switching to smaller 256GB disks
- use table format in h5 files
- fix: non unique categories error fixed by specifying categories at initialization
- properly deletes file on batch node after task completes
- cell_id is now a categorical in output
- supports .fq and .fq.gz fastq file extensions
- heatmap is generated even if all cells are nan
- default for non-azure environments is not singularity anymore.
- added docker container info to info yaml files
- added info yaml files
- interprets ? in picard tools output as 0
- casts all columns in h5 to their correct dtypes.
- added multisample pseudobulk
- divided alignment into 2 separate workflows
- replaced segments and bias pdf with per cell plots with a tarball of png files.
- added singularity support
- added docker support for whole genome
- added docker support for alignment and hmmcopy
- added test data set
- project wide refactor
- switched to image with a single disk. removed startup mount commands.
- pseudowgs: added infer Haplotype code
- clip copynumber in hmmcopy plots to 40
- added LTM
- added haploid poison to hmmcopy
- yaml files use block style format
- added cell quality classifier
- bwa-mem is now the default aligner. bwa-aln is also supported
- fix: handles empty segments in hmmcopy segment plot
- pseudowgs: added option to merge bams
- pseudowgs: added option to split bam by reads (pairs next to each other)
- smaller segments plots file size, consistent colormap across plots
- added mask for low mappability regions in heatmap
- one pdf file for segments and bias plots per row of cells
- added VM image URI and SKU to batch yaml file
- hmmcopy: added autoploidy
- added titan to pseudowgs
- hmmcopy can now run independently from alignment
- rename pick_met to cell_call and condition to experimental_condition
- chooses VM image based on the pipeline version
- cleanly exit hmmcopy script if data is not enough/missing
- aneufinder, alignment, and hmmcopy output is now h5
- fixed rounding issue in autoscale formula (there is no round method)
- strelka now runs over split bam files
- can now save split bams using a template specified at run time
- switch to 1-based state in hmmcopy
- added filtering to copynumber heatmap
- added classifier
- each lane now has a sequencing centre
- metrics heatmaps are not restricted to 72*72
- exposed all hmmcopy params to config file
- modal correction can run on empty datasets
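
Two entries above concern paginated plots: the alignment metrics plot draws at most 1000 cells per page, and a heatmap fix addressed a final page left with a single cell that cannot be plotted. The changelog does not say how the pipeline resolved this, so the following is only a sketch of one possible approach, which moves cells from the previous page onto a too-small final page. The function name and the `min_last_page` parameter are hypothetical.

```python
def paginate(cell_ids, page_size=1000, min_last_page=2):
    """Split cell ids into pages of at most page_size cells.

    If the final page would hold fewer than min_last_page cells, borrow
    cells from the end of the previous page so every page can be plotted.
    """
    if page_size < 2 * min_last_page:
        raise ValueError("page_size must be at least twice min_last_page")

    pages = [
        cell_ids[start:start + page_size]
        for start in range(0, len(cell_ids), page_size)
    ]

    if len(pages) > 1 and len(pages[-1]) < min_last_page:
        needed = min_last_page - len(pages[-1])
        # Move the trailing cells of the previous page to the last page,
        # preserving the overall cell order.
        pages[-1] = pages[-2][-needed:] + pages[-1]
        pages[-2] = pages[-2][:-needed]

    return pages


# 1001 hypothetical cells: a naive split would leave one cell on page two.
pages = paginate([f"cell_{i}" for i in range(1001)])
print([len(page) for page in pages])
```

A single page with fewer than `min_last_page` cells is returned unchanged, since there is no previous page to borrow from; that case would need separate handling by the plotting code.
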