
Community #105

Merged (30 commits, Dec 15, 2022)
Commits
39713ec
move the whole PGGB into a subworkflow
subwaystation Dec 12, 2022
48b172f
we can run with an outside FASTA
subwaystation Dec 12, 2022
bcbd323
elegance
subwaystation Dec 12, 2022
d55b3f8
community skeleton
subwaystation Dec 12, 2022
ef26895
community wfmash map
subwaystation Dec 12, 2022
36fd52f
refactor wfmashMap into a subworkflow
subwaystation Dec 13, 2022
8683da7
we have our FASTA communities!
subwaystation Dec 13, 2022
a0ba63e
step by step
subwaystation Dec 13, 2022
b5c62e7
stuck in the middle with you
subwaystation Dec 14, 2022
8b6a7bb
before seqwish
subwaystation Dec 14, 2022
fb226ca
HOLIDAYgit add subworkflows/*!
subwaystation Dec 14, 2022
91a6139
let's cheat
subwaystation Dec 14, 2022
c156bca
hmm
subwaystation Dec 14, 2022
5370b6e
fix fai,gzi paths
subwaystation Dec 14, 2022
c93c9b9
remove unecessary .view()
subwaystation Dec 14, 2022
fff9e58
fix vg_deconstruct issues
subwaystation Dec 14, 2022
461b2d6
text communities and a more recent NXFv
subwaystation Dec 14, 2022
106fe2d
EOL
subwaystation Dec 14, 2022
a204f7c
remove dangling .view()
subwaystation Dec 14, 2022
0b668eb
update test parameters for faster runtime
subwaystation Dec 14, 2022
b00a322
update test parameters for faster runtime
subwaystation Dec 14, 2022
f893c02
test new CI setup
subwaystation Dec 15, 2022
7a659ed
maybe we can't go latest
subwaystation Dec 15, 2022
c5eff66
la vie est fantastique
subwaystation Dec 15, 2022
77440ef
we want the matrix across parameters
subwaystation Dec 15, 2022
bf7ab00
force older Nextflow version
subwaystation Dec 15, 2022
fa85add
squeeze it!
subwaystation Dec 15, 2022
ce0da3e
fix --wfmash_only bug
subwaystation Dec 15, 2022
490f54e
subworkflow squeeze is ready :)
subwaystation Dec 15, 2022
a426db6
update help text and online parameter docs
subwaystation Dec 15, 2022
271 changes: 251 additions & 20 deletions .github/workflows/ci.yml
@@ -8,19 +8,102 @@ on:
release:
types: [published]

env:
NXF_ANSI_LOG: false

jobs:
test:
name: Run workflow tests
name: Run pipeline with test data
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Nextflow versions: check pipeline minimum and current latest
NXF_VER:
- "20.10.0"
- "22.04.5"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12

no_viz_no_layout:
name: Run pipeline without graph visualizations or graph layouts
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Parameter sets to test
parameters:
- "--no_viz"
- "--no_layout"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}

vg_deconstruct:
name: Run pipeline with vg deconstruct parameter
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
env:
NXF_VER: ${{ matrix.nxf_ver }}
NXF_ANSI_LOG: false
strategy:
matrix:
# Nextflow versions: check pipeline minimum and current latest
nxf_ver: ['20.10.0', '21.04.1', '21.10.3']
parameters:
- "--vcf_spec \"gi|568815561:#,gi|568815567:#\""
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
@@ -43,22 +126,170 @@ jobs:
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
env:
CAPSULE_LOG: none
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}

smoothxg:
name: Run pipeline with smoothxg parameters
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Parameter sets to test
parameters:
- "--smoothxg_write_maf --smoothxg_poa_length 100,200,300 --smoothxg_run_abpoa --smoothxg_run_global_poa"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}

wfmash_chunks:
name: Run pipeline with wfmash chunk parameter
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Parameter sets to test
parameters:
- "--wfmash_chunks 2"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}

wfmash_only:
name: Run only the wfmash part of the pipeline
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Parameter sets to test
parameters:
- "--wfmash_only"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
run: |
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}

communities:
name: Run the pipeline with the communities parameter
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/pangenome') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Parameter sets to test
parameters:
- "--communities --squeeze_gfa"
steps:
- name: Check out pipeline code
uses: actions/checkout@v2

- name: Check if Dockerfile or Conda environment changed
uses: technote-space/get-diff-action@v4
with:
FILES: |
Dockerfile
environment.yml

- name: Build new docker image
if: env.MATCHED_FILES
run: docker build --no-cache . -t nfcore/pangenome:dev

- name: Pull docker image
if: ${{ !env.MATCHED_FILES }}
run: |
docker pull nfcore/pangenome:dev
docker tag nfcore/pangenome:dev nfcore/pangenome:dev

- name: Install Nextflow
uses: nf-core/setup-nextflow@v1
with:
version: "${{ matrix.NXF_VER }}"

- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
# We also test basic visualization and reporting options here
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --no_viz --no_layout
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --smoothxg consensus_spec 10,100,1000
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --vcf_spec "gi|568815561:#,gi|568815567:#"
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --smoothxg_write_maf
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --wfmash_chunks 2
nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 --wfmash_only
NXF_VER=22.04.5 nextflow run ${GITHUB_WORKSPACE} -profile test,docker --n_haplotypes 12 ${{ matrix.parameters }}
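The CI jobs above all share one shape: a per-job `parameters` matrix whose entries are appended to the same base invocation, `nextflow run … -profile test,docker --n_haplotypes 12`, so each matrix entry becomes an independent pipeline run. A minimal Python sketch of that expansion (illustrative only — not part of the PR; the matrix values are copied from the `ci.yml` diff above):

```python
# Illustrative sketch: how the per-job `parameters` matrices in ci.yml
# expand into concrete `nextflow run` command lines, one run per entry.
BASE = "NXF_VER=22.04.5 nextflow run . -profile test,docker --n_haplotypes 12"

# One key per CI job; values copied from the matrices in the diff above.
MATRIX = {
    "no_viz_no_layout": ["--no_viz", "--no_layout"],
    "smoothxg": ["--smoothxg_write_maf --smoothxg_poa_length 100,200,300 "
                 "--smoothxg_run_abpoa --smoothxg_run_global_poa"],
    "wfmash_chunks": ["--wfmash_chunks 2"],
    "wfmash_only": ["--wfmash_only"],
    "communities": ["--communities --squeeze_gfa"],
}

def expand(matrix):
    """Return one full command line per (job, parameter-set) combination."""
    return [f"{BASE} {params}".strip()
            for job in matrix
            for params in matrix[job]]
```

This mirrors why the jobs were split: GitHub Actions fails a whole job on the first failing step, so one job per parameter set keeps the runs independent and parallel.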
2 changes: 2 additions & 0 deletions Dockerfile
@@ -10,6 +10,8 @@ RUN apt-get update \
&& apt-get clean -y && rm -rf /var/lib/apt/lists/*

COPY bin/split_approx_mappings_in_chunks.py /
COPY bin/paf2net.py /
COPY bin/net2communities.py /

# Install miniconda
RUN wget \
83 changes: 83 additions & 0 deletions bin/net2communities.py
@@ -0,0 +1,83 @@
import argparse

# Create the parser and add arguments
parser = argparse.ArgumentParser(
description="Detects communities by applying the Leiden algorithm (Traag et al., 2018).",
epilog='Author: Andrea Guarracino (https://github.com/AndreaGuarracino)'
)
parser.add_argument('-e', '--edge-list', dest='edge_list', help="edge list representing the pairs of sequences mapped in the network", required=True)
parser.add_argument('-w', '--edge-weights', dest='edge_weights', help="list of edge weights", required=True)
parser.add_argument('-n', '--vertice-names', dest='vertice_names', help="'id to sequence name' map", required=True)
parser.add_argument('--output-prefix', dest='output_prefix', default="", help="prefix to add to the output filenames")
parser.add_argument('--accurate-detection', dest='accurate', default=False, action='store_true', help="accurate community detection (slower)")
parser.add_argument('--plot', dest='plot', default=False, action='store_true', help="plot the network, coloring by community and labeling with contig/scaffold names (it assumes PanSN naming)")

# Parse and print the results
args = parser.parse_args()


import igraph as ig

# Read weights
weight_list = [float(x) for x in open(args.edge_weights).read().strip().split('\n')]

# Read the edge list and initialize the network
g = ig.read(filename=args.edge_list, format='edgelist', directed=False)

# Detect the communities
partition = g.community_leiden(
objective_function='modularity',
n_iterations=120 if args.accurate else 60, # -1 would indicate to iterate until convergence
weights=weight_list
)

# Slower implementation
# import leidenalg as la
# partition = la.find_partition(
# g,
# la.ModularityVertexPartition,
# n_iterations=-1 if args.accurate else 30, # -1 indicates to iterate until convergence
# weights=weight_list,
# seed=42
# )

print(f'Detected {len(partition)} communities.')

# Write the communities
id_2_name_dict = {}
with open(args.vertice_names) as f:
for line in f:
id, name = line.strip().split(' ')

id_2_name_dict[int(id)] = name

output_prefix = args.output_prefix if args.output_prefix else args.edge_weights

for id_community, id_members in enumerate(partition):
with open(f'{output_prefix}.community.{id_community}.txt', 'w') as fw:
for id in id_members:
fw.write(f'{id_2_name_dict[id]}\n')

# Write the plot
if args.plot:
print('Plotting on PDF')

# Take contig names (it assumes PanSN naming)
name_list = [x.split(' ')[-1].split('#')[-1] for x in id_2_name_dict.values()]

# Scale edge widths to roughly 0-5
max_weight = max(weight_list) / 5.0

ig.plot(
partition,
target = f'{output_prefix}.communities.pdf',
vertex_size=50,
#vertex_color=['blue', 'red', 'green', 'yellow'],
vertex_label=name_list,
vertex_label_size=20,
#vertex_label_color='black',
edge_width=[x/max_weight for x in weight_list],
#edge_color=['black', 'grey'],
bbox=(2000, 2000),
margin=100
)
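The output step of `net2communities.py` maps each vertex id back to its sequence name via the `--vertice-names` file and writes one `<prefix>.community.<i>.txt` file per community. A stdlib-only sketch of that step (the partition is passed in as plain id lists here; the real script obtains it from igraph's `community_leiden`):

```python
import pathlib

def write_communities(names_text, partition, output_prefix):
    """Write one <output_prefix>.community.<i>.txt file per community.

    names_text: contents of the '--vertice-names' file ("id name" per line).
    partition:  iterable of communities, each a list of vertex ids
                (hardcoded here for illustration; the real script gets
                this from igraph's community_leiden).
    Returns the list of written file paths.
    """
    id_2_name = {}
    for line in names_text.strip().splitlines():
        idx, name = line.strip().split(' ')
        id_2_name[int(idx)] = name

    paths = []
    for i, members in enumerate(partition):
        path = pathlib.Path(f"{output_prefix}.community.{i}.txt")
        path.write_text("".join(f"{id_2_name[m]}\n" for m in members))
        paths.append(path)
    return paths
```

With names following PanSN conventions (e.g. `sampleA#1#chr1`), each output file lists the sequences belonging to one community, which downstream steps can use to partition the input FASTA.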