First PR on nf-core #53
Open. chriswyatt1 wants to merge 165 commits into main from dev.
Commits
6259231
Add a gitpod yml
chriswyatt1 1f695de
gitignore
chriswyatt1 ff92a5c
Update README.md
chriswyatt1 ed77896
Merge pull request #1 from Eco-Flow/chriswyatt1-patch-1
FernandoDuarteF 3450dd9
Added ncbigenomedownload
FernandoDuarteF bfb0680
Fixed output for create_path.nf
FernandoDuarteF 6d8403b
Modified samplesheet.csv
FernandoDuarteF 3d1577b
Looks good, just checked it worked and made a few minor edits
chriswyatt1 3fe1f9a
Merge pull request #7 from Eco-Flow/first_try
chriswyatt1 492503f
Added input validation for sample sheet
FernandoDuarteF 68707c6
Updated error message in sample sheet validation
FernandoDuarteF 0768675
Added busco module
FernandoDuarteF 646568d
Added busco parameters
FernandoDuarteF 9da8887
Added local GFFREAD and bin/ folder
FernandoDuarteF 2851b40
Removed GFFREAD from modules.json
FernandoDuarteF 86dd2c5
Merge pull request #10 from Eco-Flow/busco
FernandoDuarteF 67f8de2
Added orthofinder module
FernandoDuarteF 9f0aff1
Updated modules.json
FernandoDuarteF d74154b
Merge pull request #13 from Eco-Flow/orthofinder
chriswyatt1 500aaf5
Added GFFREAD and longest modules
FernandoDuarteF ef619fa
TIDK subworkflow
chriswyatt1 27814af
Revert "TIDK subworkflow"
chriswyatt1 b6c1fce
Updated main workflow for compressed files
FernandoDuarteF 5c4f54d
Merge pull request #20 from Eco-Flow/longest_isoform
chriswyatt1 64ae1da
Added TIDK subworkflow
FernandoDuarteF 239e64a
Merge pull request #23 from Eco-Flow/tidk_fer
chriswyatt1 c05f5e8
Update test_full.config
chriswyatt1 997aaf9
Merge pull request #24 from Eco-Flow/new_input
chriswyatt1 758c763
Add mod
chriswyatt1 921c70e
Added AGAT spstatistics and Quast modules
FernandoDuarteF 0d2041d
Broken pipeline, need fix of channels
chriswyatt1 8a4dd8a
Fixed up to orthofinder
chriswyatt1 9a130d0
Fixed up to just before working tree script. Need a container
chriswyatt1 d12e27e
Add container for tree build in R
chriswyatt1 616f0e3
Merge pull request #28 from Eco-Flow/treefigure
chriswyatt1 2fde67b
Merge pull request #26 from Eco-Flow/quast_agatstats
chriswyatt1 3d543b4
Added local subworkflows
FernandoDuarteF 1383947
Remove/add versions where needed
chriswyatt1 bc1e6bf
redundant_info2
chriswyatt1 bcc2963
A basic working tree plot and busco module
chriswyatt1 b724c93
Merge pull request #35 from Eco-Flow/redundant_info
FernandoDuarteF 3c8ad5f
Two working tree plots inc pie
chriswyatt1 d65aa69
Resolve conflicts
FernandoDuarteF 1715fcc
Working plots for busco and quast
chriswyatt1 21e4e4e
Merge branch 'dev' into working_tree
chriswyatt1 2ddb427
Merge pull request #36 from Eco-Flow/working_tree
chriswyatt1 c5f4670
Added GFFREAD from excon
FernandoDuarteF 7340ef9
Added tree plot
FernandoDuarteF 2525b8d
Fix species extension removal for plotting
chriswyatt1 f2714c0
pie chart for busco
chriswyatt1 972552e
Merge branch 'dev' into subworkflows
FernandoDuarteF 5349cce
Merge pull request #32 from Eco-Flow/subworkflows
FernandoDuarteF 0cf0dd0
new plots organisation
chriswyatt1 b6f060e
Update README.md
chriswyatt1 63065f3
Merge pull request #47 from Eco-Flow/Better_tree_plots
FernandoDuarteF 94558df
Removed excon scripts
FernandoDuarteF 230e61b
Fixed genome only option not working
FernandoDuarteF 8a0a2df
add merqury and meryl modules to json with nf-core tools genomeqc/#42…
stephenturner 1acf3f1
add merqury and meryl module configs genomeqc#42 genomeqc#58
stephenturner 081ad3f
add merqury module genomeqc#42
stephenturner e43785b
add meryl module genomeqc#58
stephenturner d0e2859
include modules for meryl and merqury
stephenturner 6336ad5
Add files via upload
fperezcobos 7b4b083
add fasta null default and kvalue for meryl default k=21
stephenturner ab1d1c5
typo in outdir
stephenturner 6b32c66
add meryl count to workflow genomeqc#58
stephenturner a34a1cb
update schema with web ui builder
stephenturner b63b944
fix whitespace in schema
stephenturner c193d3c
add meryl unionsum genomeqc#60
stephenturner 56f5e42
merqury skip param
stephenturner 9380cb0
add merqury step in wf
stephenturner 9008ece
remove stray view()
stephenturner 5490ddb
Updated subworkflows and workflows
FernandoDuarteF 22973ef
allow for fastq input in samplesheet see also nf-core/test-datasets#1365
stephenturner 374dc82
update
fperezcobos 8c4757c
push
fperezcobos 813e85f
added agat module
fperezcobos 832e115
Merge pull request #1 from fperezcobos/test
fperezcobos aa13541
Delete test_prunus_dulcis.csv
fperezcobos 9e70870
Delete test_athaliana.csv
fperezcobos a084ff4
Delete felipe_testing.config
fperezcobos 788ae45
Scripts updated
chriswyatt1 b7c4ff2
return fastq in validateinputsamplesheet #62
stephenturner e558904
fix cardinality for CREATE_PATH and branch cardinality on samplesheet
stephenturner f5247e1
Update conf/test_full.config
chriswyatt1 2ef206a
Added AGAT gff checking to genome_and_annotation
fperezcobos 05b062c
Update .gitignore
fperezcobos bd55499
merqury_skip = false in test profile #62 #42
stephenturner 9615c08
run merqury if providing fastq file #62 #42
stephenturner 1611259
Fix Script path issue
chriswyatt1 d8b7cca
test profiles
stephenturner 4c85fb8
Merge pull request #66 from nf-core/Fix-bin-path-removal
FernandoDuarteF 72f2a6e
conditionally file() the fastq if it's present in the sample sheet, f…
stephenturner a6fd2a4
remove views
stephenturner 25f9bbd
Add tidk optional
chriswyatt1 3a29414
Merge pull request #65 from fperezcobos/dev
chriswyatt1 5e88fd9
Merge branch 'dev' into dev
FernandoDuarteF 01d7ced
make test profile use reads, test_nofastq use _nofastq csv
stephenturner 491ba12
add fastq column to samplesheet
stephenturner 6d1c3f9
update readme with info about reads/merqury, different test profiles,…
stephenturner 91f9f5a
add missing test run command in readme for running test data without …
stephenturner 5e5f897
Merge pull request #63 from stephenturner/dev
stephenturner e06b754
busco and quast added to mq report
f3493bb
CREATE_PATH now outputs a tuple
FernandoDuarteF 03d6601
Merge branch 'dev' into Conditional-tidk-flags
chriswyatt1 0159836
Merge pull request #70 from nf-core/Conditional-tidk-flags
FernandoDuarteF fbe093e
Fixed uncompressing not working
FernandoDuarteF 84f5ec8
First commit
chriswyatt1 b04b51f
with gff lineage and busco into the process
chriswyatt1 8532448
Used multiMap instead of map for combined input channels
FernandoDuarteF c86b26e
Merge branch 'dev' into new_input_validation
FernandoDuarteF e45081d
Resolve merge conflicts
FernandoDuarteF 6613139
Merge pull request #64 from nf-core/Organise_script_comments
FernandoDuarteF 336c35c
Working_version
chriswyatt1 2b17f56
Solved conflicts with dev and improved syntax (I think)
FernandoDuarteF d51d5d8
Fix out channels and published correctly
chriswyatt1 6ec7fc1
Updated nextflow_schema.json
FernandoDuarteF c581a37
Added ".pre-commit-config.yaml" for pre-commit
FernandoDuarteF c56b5e6
Changed "merqury_skip" to "skip_merqury"
FernandoDuarteF 49a21b3
Merge pull request #57 from Eco-Flow/new_input_validation
FernandoDuarteF eac2215
Fixed combine() not working when empty ch_fastq
FernandoDuarteF 3c870c9
Added --run_merquery flag
FernandoDuarteF dfe2aaf
Fixed QUAST inputs out of sync
FernandoDuarteF de9a0d7
Merge pull request #84 from Eco-Flow/fix_ch_input
chriswyatt1 45e6f1e
Merge branch 'dev' into busco_ideograms
FernandoDuarteF 111424d
Fixed channel names for ideogram
FernandoDuarteF 5736fa6
Fixed busco ideogram not working
FernandoDuarteF 11e9cb4
Commented some lines
FernandoDuarteF a26730a
Merge pull request #82 from nf-core/busco_ideograms
FernandoDuarteF 9be9c72
Updated README
FernandoDuarteF 64a7fdf
Merge pull request #87 from Eco-Flow/update_readme
FernandoDuarteF d1d6e64
Added longest and nf-core GFFREAD modules
FernandoDuarteF a7f288e
Added AGAT extract sequences
FernandoDuarteF 57df688
Added AGAT extract sequences
FernandoDuarteF af478f3
Added fasta validator
FernandoDuarteF f0a3123
Added nf-core GFFREAD back
FernandoDuarteF 41d689a
Modified ideogram script for better visualization
FernandoDuarteF f97b80b
Update README
FernandoDuarteF 28c4c49
Update README.md
FernandoDuarteF f82c2cc
Update README.md
FernandoDuarteF e381163
Improved readability
FernandoDuarteF 5206882
Merge pull request #88 from Eco-Flow/agat_longest_isoform
FernandoDuarteF 53bc65d
Update README.md
FernandoDuarteF 4d00256
Gene overlap first commit
chriswyatt1 12dba5c
second commit
chriswyatt1 5c7711f
remove installs
chriswyatt1 6399712
third commit working
chriswyatt1 ab3f837
Merge branch 'dev' into dev
FernandoDuarteF 1d560ca
Merge pull request #74 from awanalkoerdi289/dev
FernandoDuarteF 4835f86
Fixed multiqc not working
FernandoDuarteF 520f6d5
Added second table with count stats
chriswyatt1 435368a
Removed meta from output channels for multiqc
FernandoDuarteF e642bee
Fixed Quast results not showing in the multiqc report
FernandoDuarteF e8a6faa
Merge pull request #95 from Eco-Flow/fix_multiqc
FernandoDuarteF 6729de5
Merge branch 'dev' into gene_overlap
chriswyatt1 20bb70f
Merge pull request #94 from chriswyatt1/gene_overlap
chriswyatt1 6385ae8
Decreased size of results folder
FernandoDuarteF f944325
Updated nextflow_schema.json
FernandoDuarteF ae76a10
Merge pull request #101 from nf-core/publish_results
FernandoDuarteF d6350fa
Added genome ideogram local module
FernandoDuarteF a43634a
Update plot_markers scripts
FernandoDuarteF b84b762
Fixed ideogram not working for genome mode
FernandoDuarteF c69d846
More descriptive names for modules and subworkflows
FernandoDuarteF fe07df8
Removed redundant lines and updated modules.config
FernandoDuarteF e7df634
Merge pull request #102 from nf-core/genome_ideogram
FernandoDuarteF
@@ -0,0 +1,14 @@
*.pyc
.DS_Store
.nextflow*
.nf-test.log
data/
nf-test
.nf-test*
results/
test.xml
testing*
testing/
work/
log
out
@@ -0,0 +1,21 @@
image: nfcore/gitpod:latest
tasks:
  - name: Update Nextflow and setup pre-commit
    command: |
      pre-commit install --install-hooks
      nextflow self-update
  - name: unset JAVA_TOOL_OPTIONS
    command: |
      unset JAVA_TOOL_OPTIONS
vscode:
  extensions: # based on nf-core.nf-core-extensionpack
    - codezombiech.gitignore # Language support for .gitignore files
    # - cssho.vscode-svgviewer # SVG viewer
    - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
    - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed
    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors
    # - nextflow.nextflow # Nextflow syntax highlighting
    - oderwat.indent-rainbow # Highlight indentation level
    - streetsidesoftware.code-spell-checker # Spelling checker for source code
@@ -0,0 +1 @@
repository_type: pipeline
@@ -0,0 +1,10 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.2.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
@@ -1,3 +1,5 @@
-sample,fastq_1,fastq_2
-SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
-SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
+species,refseq,fasta,gff,fastq
+Vespula_vulgaris,GCF_905475345.1,,,
+Vespa_velutina,GCF_912470025.1,,,
+Apis_mellifera,GCF_003254395.2,,,
+Osmia_bicornis,GCF_907164935.1,,,
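The new samplesheet has five columns, and in this test sheet only `species` and `refseq` are filled. As a minimal sketch, the snippet below reads the sheet with Python's standard `csv` module and lists which input columns each row supplies. The helper name `filled_columns` is hypothetical and not part of the pipeline, and how the pipeline itself treats empty columns is not shown in this diff.

```python
import csv
import io

# The four rows added in the diff above, copied verbatim.
SAMPLESHEET = """species,refseq,fasta,gff,fastq
Vespula_vulgaris,GCF_905475345.1,,,
Vespa_velutina,GCF_912470025.1,,,
Apis_mellifera,GCF_003254395.2,,,
Osmia_bicornis,GCF_907164935.1,,,
"""

INPUT_COLUMNS = ("refseq", "fasta", "gff", "fastq")


def filled_columns(text):
    """Return (species, [non-empty input columns]) for every samplesheet row.

    Hypothetical helper for illustration only.
    """
    result = []
    for row in csv.DictReader(io.StringIO(text)):
        provided = [col for col in INPUT_COLUMNS if row[col]]
        result.append((row["species"], provided))
    return result


rows = filled_columns(SAMPLESHEET)
for species, provided in rows:
    print(species, provided)
```

Every row in this sheet supplies only a RefSeq accession, so the `fasta`, `gff` and `fastq` lists stay empty.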
@@ -0,0 +1,39 @@
#!/usr/bin/python3

# Written by Chris Wyatt and released under the MIT license.
# Converts a group of busco outputs to a table to plot on a tree

import pandas as pd
import argparse

# Set up the argument parser
parser = argparse.ArgumentParser(description='Extract and merge specific columns from a table.')
parser.add_argument('input_file', type=str, help='Path to the input TSV file.')
parser.add_argument('output_file', type=str, help='Path to save the output TSV file.')

# Parse the arguments
args = parser.parse_args()

# Read the input table into a pandas DataFrame
df = pd.read_csv(args.input_file, sep='\t')

# Select the required columns
df_extracted = df[['Input_file', 'Single', 'Duplicated', 'Fragmented', 'Missing']]

# Merge the columns from 'Single' to 'Missing' into a single column, with values separated by commas
df_extracted['busco'] = df_extracted[['Single', 'Duplicated', 'Fragmented', 'Missing']].astype(str).agg(','.join, axis=1)

# Drop the individual 'Single' to 'Missing' columns
df_extracted = df_extracted[['Input_file', 'busco']]

# Write the header and custom line first
with open(args.output_file, 'w') as f:
    # Write the header
    f.write('species\tbusco\n')
    # Insert 'NA<tab>pie' as the second line
    f.write('NA\tpie\n')

# Append the DataFrame content to the file without the header
df_extracted.to_csv(args.output_file, sep='\t', index=False, mode='a', header=False)

print(f"Extraction completed successfully. Output saved to {args.output_file}.")
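To make the reshaping concrete, here is a minimal sketch of the same column merge on a small in-memory table. It uses the standard `csv` module instead of pandas so that it runs without extra dependencies. The counts in the sample table are invented for illustration, and `to_tree_table` is a hypothetical helper name, not something defined in the script above.

```python
import csv
import io

# Invented two-row BUSCO summary using the column names the script expects.
SAMPLE_TSV = (
    "Input_file\tSingle\tDuplicated\tFragmented\tMissing\n"
    "Apis_mellifera.fasta\t900\t10\t20\t70\n"
    "Vespa_velutina.fasta\t850\t15\t35\t100\n"
)

BUSCO_COLUMNS = ["Single", "Duplicated", "Fragmented", "Missing"]


def to_tree_table(tsv_text):
    """Build the tree-plot table: header, 'NA<tab>pie' line, then one row per input."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = ["species\tbusco", "NA\tpie"]
    for row in reader:
        # Join the four counts with commas, in the same order as the script.
        counts = ",".join(row[col] for col in BUSCO_COLUMNS)
        lines.append(f"{row['Input_file']}\t{counts}")
    return "\n".join(lines) + "\n"


table = to_tree_table(SAMPLE_TSV)
print(table, end="")
```

Each output row carries the input file name in the `species` column and the four counts as one comma-separated field, preceded by the fixed `NA<tab>pie` line.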
@@ -0,0 +1,74 @@
#!/usr/bin/env Rscript

# Load required libraries
suppressMessages(library(dplyr))
suppressMessages(library(readr))
suppressMessages(library(stringr))

# Get command line arguments
args <- commandArgs(trailingOnly = TRUE)
if (length(args) != 3) {
  stop("Usage: Rscript match_busco_gff.R <busco_file> <gff_file> <output_file>")
}

busco_file <- args[1]
gff_file <- args[2]
output_file <- args[3]

# Step 1: Read the BUSCO file line-by-line, filter out comment and "Missing" lines
busco_raw <- readLines(busco_file)
busco_filtered <- busco_raw[!grepl("^#|Missing", busco_raw)]

# Step 2: Parse the remaining lines as a TSV without column names, then rename columns
busco_data <- read_delim(
  I(busco_filtered),
  delim = "\t",
  col_names = FALSE,
  show_col_types = FALSE
)

# Check if the expected 7 columns are present
if (ncol(busco_data) != 7) {
  stop("Expected 7 columns in BUSCO data after filtering, but found ", ncol(busco_data), ". Please check the input file format.")
}

# Rename columns
colnames(busco_data) <- c("Busco_id", "Status", "Sequence", "Score", "Length", "OrthoDB_url", "Description")

# Read the GFF file
gff_data <- read_tsv(
  gff_file,
  comment = "#",
  col_names = FALSE,
  col_types = cols(
    X1 = col_character(), X2 = col_character(), X3 = col_character(),
    X4 = col_integer(), X5 = col_integer(), X6 = col_character(),
    X7 = col_character(), X8 = col_character(), X9 = col_character()
  ),
  show_col_types = FALSE,
  skip_empty_rows = TRUE
)

# Extract the gene name from the 9th column in GFF, looking for ID=<value> up to the first ;
gff_data <- gff_data %>%
  mutate(gene_name = str_extract(X9, "ID=([^;]+)")) %>%
  mutate(gene_name = str_replace(gene_name, "ID=", "")) %>% # Remove the "ID=" prefix
  filter(!is.na(gene_name))

# Perform the join on gene name from both data frames
result <- inner_join(
  busco_data,
  gff_data,
  by = c("Sequence" = "gene_name")
)

# Select and rename the columns we need
output_data <- result %>%
  select(Status, Scaffold = X1, Start = X4, End = X5) %>%
  distinct() # Remove any potential duplicates

# Write the output in the requested format
write.table(output_data, file = output_file, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# Print a message to confirm the output has been written
cat("Output has been written to", output_file, "\n")
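The matching logic of the R script (filter the BUSCO table, pull `ID=` values out of GFF column 9, inner-join on the sequence name, keep distinct rows) can be sketched in plain Python as follows. This is an illustrative port under stated assumptions, not the pipeline's code: the input lines are invented, `match_busco_to_gff` is a hypothetical name, and unlike readr's `comment = "#"` it only skips GFF lines that start with `#`.

```python
import re

# Invented inputs that follow the 7-column BUSCO table and 9-column GFF layouts.
BUSCO_LINES = [
    "# Busco id\tStatus\tSequence\tScore\tLength\tOrthoDB url\tDescription",
    "1001at7399\tComplete\tgene1\t500.0\t300\thttps://example.org/1001\tprotein A",
    "1002at7399\tMissing",
    "1003at7399\tDuplicated\tgene2\t410.5\t250\thttps://example.org/1003\tprotein B",
]
GFF_LINES = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=foo",
    "chr2\tsrc\tgene\t50\t400\t.\t-\t.\tID=gene2",
    "chr2\tsrc\tgene\t500\t800\t.\t-\t.\tName=no_id",
]

ID_PATTERN = re.compile(r"ID=([^;]+)")


def match_busco_to_gff(busco_lines, gff_lines):
    """Return distinct (status, scaffold, start, end) rows for BUSCO hits found in the GFF."""
    # Index GFF features by the ID attribute in column 9; features without ID are dropped.
    positions = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = ID_PATTERN.search(fields[8])
        if match:
            feature = (fields[0], int(fields[3]), int(fields[4]))
            positions.setdefault(match.group(1), []).append(feature)

    matched, seen = [], set()
    for line in busco_lines:
        # Same filter as the R script: drop comment lines and any line containing "Missing".
        if not line.strip() or re.search(r"^#|Missing", line):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 7:
            raise ValueError(f"Expected 7 columns in BUSCO data, found {len(fields)}")
        status, sequence = fields[1], fields[2]
        # Inner join: only sequences that also appear as a GFF ID produce output.
        for scaffold, start, end in positions.get(sequence, []):
            row = (status, scaffold, start, end)
            if row not in seen:
                seen.add(row)
                matched.append(row)
    return matched


matched_rows = match_busco_to_gff(BUSCO_LINES, GFF_LINES)
for row in matched_rows:
    print("\t".join(str(value) for value in row))
```

The "Missing" BUSCO entry and the GFF feature without an `ID=` attribute both drop out, which mirrors the filter and the `inner_join` in the R version.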
This is a personal preference, but I think loose scripts should have a comment at the top. It should either say where the script originated (e.g. "adapted from script at url:path/to/script", or a note that it is part of an existing package and was copied into the workflow) or carry an "authored by" line to show that it is a custom-written script.
Added a link to the original script, or a note on what the script does and who wrote it.
#64