parent 556e9b9
author Manda Wilson <[email protected]> 1703199176 -0500
committer Robert Sheridan <[email protected]> 1711560265 -0400

upgrade to java 21

switch to genome-nexus-annotation-pipeline that uses new maf repo

updated to spring 6, spring batch 5, spring boot 3 to match cbioportal

fix typos

Updates to AZ-MSKIMPACT to integrate with CDM (#1098)

Fix bug in checking for duplicate Mutation Records (#1099)

* Check if mutationRecord is duplicated before annotating

* Populate mutationMap in loadMutationRecordsFromJson

* add addRecordToMap

* Remove comments, add local vars for debugging

* Remove duplicate MAF variants for AZ

* Fix remove-duplicate-maf-variants call

* revert whitespace change

updates for migrating darwin and crdb to Java 11 (#1080)

pom changes for pulling moved dependencies
changes to java args to silence warnings

Co-authored-by: cbioportal import user <[email protected]>

Remove Annotated MAF before Import (#958)

* remove annotated MAF to prevent duplicate

* Update subset_and_merge_crdb_pdx_studies.py

---------

Co-authored-by: Avery Wang <[email protected]>

Script to combine arbitrary files (#1104)

* Script to combine arbitrary files

* Modify unit tests to work with script changes

* Remove unnecessary column specifier

* Fix syntax bug

Add sophia script (#1105)

* Add sophia script

* rename transpose_cna file

* Add filter-clinical-arg-functions script

* Add az var to correct automation environment

* Add correct path to transpose_cna script

* Call seq_date function

* Add seq_date before filtering columns

* syntax fix

* Fix call to filter out clinical attribute columns

* Fix nonsigned out file path

* Automate folder name

* directory fixes

* remove quotes?

* change date formatting

* output filepath for duplicate variants script

* use az_msk_impact_data_home var

* move sophia_data_home to automation environment

* Add comments

* Change dir structures in sophia script to match new repo structure

* Add git operations

* Remove test file

* Fix dirs for sophia zip command

* remove quotes

* Zip files before cleanup

* move zip step before git push

Add script for merging Dremio/SMILE into cmo-access (#1102)

- adds cfdna clinical and timeline data from dremio/SMILE
- converts patient identifiers using "dmp over cmo" identifier logic from dremio
- dremio patient id mapping table export code called to produce mapping table
- main script then calls update_cfdna_clinical_sample_patient_ids_via_dremio.sh
- merge.py used to combine clinical data from dremio with clinical data from cmo-access
- metadata headers added using new script: merge_clinical_metadata_headers_py3.py
- other import process flow (similar to other import scripts) followed
- error detection step added after debugging for sporadic data loss in results

Co-authored-by: Manda Wilson <[email protected]>

Modify preconsume script to work on one cohort at a time (#1107)

Call correct function name

add options for logging in for different accounts

Preconsume archer-solid-cv4 and add fetch loop (#1129)

* Handle archer-solid-cv4 samples
* Add loop
* move each cohort to its own dir and fix filename

switch to genome-nexus-annotation-pipeline that uses new maf repo

use updated genome-nexus-annotation-pipeline

update version of cmo-pipelines to 1.0.0

Convert BatchConfiguration to new Spring Batch format

drop unneeded dependency from redcap

removed gdd, updated crdb and ddp batch configs to spring batch 5

removed commons-lang

start of converting cvr to spring batch 5

fix cvr fetcher BatchConfiguration

fixed redcap pipeline spring batch 5 configuration

make spring-batch-integration match batch version

Co-authored-by: Manda Wilson <[email protected]>
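The Spring Batch 5 conversion described above replaces the removed `JobBuilderFactory`/`StepBuilderFactory` with builders that take a `JobRepository` and `PlatformTransactionManager` explicitly, as the BatchConfiguration diff in this commit shows. A minimal sketch of the new pattern (class, job, and step names are illustrative, not from this repository):

```java
// Illustrative Spring Batch 5 configuration fragment: JobBuilder/StepBuilder
// are constructed directly, and chunk() now takes the transaction manager
// alongside the chunk size.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class ExampleBatchConfiguration {

    @Bean
    public Step exampleStep(JobRepository jobRepository,
                            PlatformTransactionManager transactionManager,
                            ItemReader<String> reader,
                            ItemWriter<String> writer) {
        // StepBuilder replaces stepBuilderFactory.get(...)
        return new StepBuilder("exampleStep", jobRepository)
                .<String, String>chunk(10, transactionManager)
                .reader(reader)
                .writer(writer)
                .build();
    }

    @Bean
    public Job exampleJob(JobRepository jobRepository, Step exampleStep) {
        // JobBuilder replaces jobBuilderFactory.get(...)
        return new JobBuilder("exampleJob", jobRepository)
                .start(exampleStep)
                .build();
    }
}
```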

drop darwin fetcher (and docs/scripts)
mandawilson authored and sheridancbio committed Mar 27, 2024
1 parent 556e9b9 commit 173f341
Showing 224 changed files with 2,335 additions and 15,806 deletions.
7 changes: 5 additions & 2 deletions .circleci/config.yml
@@ -22,6 +22,9 @@ jobs:
- run:
name: update apt-get
command: sudo apt-get update --fix-missing
- run:
name: update java
command: wget https://download.java.net/java/GA/jdk21.0.2/f2283984656d49d69e91c558476027ac/13/GPL/openjdk-21.0.2_linux-x64_bin.tar.gz; sudo tar -xvf openjdk-21.0.2_linux-x64_bin.tar.gz -C /usr/lib/jvm
- run:
name: Install setuptools in python 2
command: sudo apt-get -y install python-pip
@@ -37,11 +40,11 @@ jobs:
# Use mvn clean and package as the standard maven build phase
- run:
name: Build
command: mvn -B -DskipTests clean package
command: export JAVA_HOME=/usr/lib/jvm/jdk-21.0.2; export PATH=/usr/lib/jvm/jdk-21.0.2/bin:$PATH; mvn -B -DskipTests clean package
# Then run your tests!
- run:
name: Test
command: mvn test
command: export JAVA_HOME=/usr/lib/jvm/jdk-21.0.2; export PATH=/usr/lib/jvm/jdk-21.0.2/bin:$PATH; mvn test

# Invoke jobs via workflows
# See: https://circleci.com/docs/2.0/configuration-reference/#workflows
14 changes: 0 additions & 14 deletions README.md
@@ -10,7 +10,6 @@ There are these Java components and applications:
- common : a java library of helpful utilities used (as a dependency) in other components
- gdd : the "genome directed diagnosis pipeline", which is not currently being maintained (delete?)
- crdb : "crdb_fetcher", a pipeline which fetches data from the clinical research database
- darwin : "darwin_fetcher", a pipeline which fetches data from the Darwin DB2 database. In particular, Caisis timeline data.
- redcap : "redcap_pipeline", a pipeline which uploads data to or downloads data from the redcap clinical database server
- cvr : "cvr_fetcher", a pipeline which downloads samples with identified genomic variants and clinical data from the CVR servers (tumor and germline)
- gene : "gene_data_updater", a pipeline which processes a downloaded NCBI human gene info file and incorporates info into the cBioPortal gene table. No longer maintained. (delete?)
@@ -20,16 +19,3 @@ There is this compiled linux executable:
- src : "import-tool", a program which writes appropriate import trigger files for users who control the running of the import pipelines with import-tool scripts.

There are numerous scripts for fetch / import / monitor / notification / configuration in the "import-scripts" subdirectory. Also included are the current crontab schedule entries.

## Java Versions

The java applications are currently compiled and run under different JAVA releases.

### Applications maintained using Java 8
- crdb
- darwin
- ddp

### Applications maintained using Java 11
- cvr
- redcap
10 changes: 3 additions & 7 deletions common/pom.xml
@@ -4,20 +4,15 @@
<name>MSKCC CMO Pipelines Common</name>
<description>Shared classes</description>
<artifactId>common</artifactId>
<version>0.1.0</version>
<version>1.0.0</version>

<parent>
<groupId>org.mskcc.cmo.ks</groupId>
<artifactId>master</artifactId>
<version>0.1.0</version>
<version>1.0.0</version>
</parent>

<dependencies>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
@@ -31,6 +26,7 @@
<!-- in parent pom and we don't want this -->
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<version>${spring.boot.version}</version>
<configuration>
<skip>true</skip>
</configuration>
Original file line number Diff line number Diff line change
@@ -32,11 +32,12 @@

package org.cbioportal.cmo.pipelines.common.util;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.*;
import java.util.Properties;
import javax.mail.*;
import javax.mail.internet.*;
import org.apache.commons.lang.exception.ExceptionUtils;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@@ -101,7 +102,10 @@ public void sendErrorEmail(String pipelineName, List<Throwable> exceptions, Stri
String body = "An error occurred while running the " + pipelineName + " with job parameters " + parameters + ".\n\n";
List<String> messages = new ArrayList<>();
for (Throwable exception : exceptions) {
messages.add(ExceptionUtils.getFullStackTrace(exception));
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
exception.printStackTrace(pw);
messages.add(pw.toString());
}
body += String.join("\n\n", messages);
sendEmailToDefaultRecipient(subject, body);
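The hunk above drops commons-lang's `ExceptionUtils.getFullStackTrace` in favor of a plain-JDK `StringWriter`/`PrintWriter` pair. A self-contained sketch of that idiom (`StackTraceDemo` and `fullStackTrace` are illustrative names, not from this repository):

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class StackTraceDemo {

    // Plain-JDK replacement for commons-lang's ExceptionUtils.getFullStackTrace:
    // capture the stack trace into an in-memory writer and return it as a String.
    static String fullStackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }

    public static void main(String[] args) {
        String trace = fullStackTrace(new IllegalStateException("boom"));
        // first line reads: java.lang.IllegalStateException: boom
        System.out.println(trace.lines().findFirst().orElse(""));
    }
}
```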
12 changes: 4 additions & 8 deletions crdb/pom.xml
@@ -4,13 +4,13 @@
<name>MSKCC CRDB Pipeline</name>
<description>Clinical Research Database pipeline</description>
<artifactId>crdb</artifactId>
<version>0.1.0</version>
<version>1.0.0</version>

<packaging>jar</packaging>
<parent>
<groupId>org.mskcc.cmo.ks</groupId>
<artifactId>master</artifactId>
<version>0.1.0</version>
<version>1.0.0</version>
</parent>

<dependencies>
@@ -23,11 +23,6 @@
<artifactId>spring-web</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
@@ -100,7 +95,7 @@
<dependency>
<groupId>org.mskcc.cmo.ks</groupId>
<artifactId>common</artifactId>
<version>0.1.0</version>
<version>1.0.0</version>
<type>jar</type>
</dependency>
</dependencies>
@@ -120,6 +115,7 @@
<!-- required to build an executable jar -->
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<version>${spring.boot.version}</version>
<configuration>
<mainClass>org.mskcc.cmo.ks.crdb.CRDBPipeline</mainClass>
</configuration>
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2016 - 2019 Memorial Sloan-Kettering Cancer Center.
* Copyright (c) 2016 - 2024 Memorial Sloan-Kettering Cancer Center.
*
* This library is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY, WITHOUT EVEN THE IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS
@@ -45,13 +45,16 @@
import org.mskcc.cmo.ks.crdb.pipeline.util.CRDBUtils;
import org.springframework.batch.core.*;
import org.springframework.batch.core.configuration.annotation.*;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.SimpleJobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.repository.support.JobRepositoryFactoryBean;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.*;
import org.springframework.batch.support.transaction.ResourcelessTransactionManager;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.*;
import org.springframework.core.io.Resource;
@@ -67,18 +70,11 @@
*/

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

public static final String CRDB_IMPACT_JOB = "crdbImpactJob";
public static final String CRDB_PDX_JOB = "crdbPDXJob";

@Autowired
public JobBuilderFactory jobBuilderFactory;

@Autowired
public StepBuilderFactory stepBuilderFactory;

@Bean
public CRDBUtils crdbUtils() {
return new CRDBUtils();
@@ -90,31 +86,38 @@ public EmailUtil emailUtil() {
}

@Bean
public Job crdbImpactJob() {
return jobBuilderFactory.get(CRDB_IMPACT_JOB)
.start(crdbSurveyStep())
.next(crdbDatasetStep())
public Job crdbImpactJob(JobRepository jobRepository,
@Qualifier("crdbSurveyStep") Step crdbSurveyStep,
@Qualifier("crdbDatasetStep") Step crdbDatasetStep) {
return new JobBuilder(CRDB_IMPACT_JOB, jobRepository)
.start(crdbSurveyStep)
.next(crdbDatasetStep)
.build();
}

@Bean
public Job crdbPDXJob() {
return jobBuilderFactory.get(CRDB_PDX_JOB)
.start(crdbPDXClinicalSampleStep())
.next(crdbPDXClinicalPatientStep())
.next(crdbPDXTimelineStep())
.next(crdbPDXSourceToDestinationMappingStep())
.next(crdbPDXClinicalAnnotationMappingStep())
public Job crdbPDXJob(JobRepository jobRepository,
@Qualifier("crdbPDXClinicalSampleStep") Step crdbPDXClinicalSampleStep,
@Qualifier("crdbPDXClinicalPatientStep") Step crdbPDXClinicalPatientStep,
@Qualifier("crdbPDXTimelineStep") Step crdbPDXTimelineStep,
@Qualifier("crdbPDXSourceToDestinationMappingStep") Step crdbPDXSourceToDestinationMappingStep,
@Qualifier("crdbPDXClinicalAnnotationMappingStep") Step crdbPDXClinicalAnnotationMappingStep) {
return new JobBuilder(CRDB_PDX_JOB, jobRepository)
.start(crdbPDXClinicalSampleStep)
.next(crdbPDXClinicalPatientStep)
.next(crdbPDXTimelineStep)
.next(crdbPDXSourceToDestinationMappingStep)
.next(crdbPDXClinicalAnnotationMappingStep)
.build();
}

/**
* Step 1 reads, processes, and writes the CRDB Survey query results
*/
@Bean
public Step crdbSurveyStep() {
return stepBuilderFactory.get("crdbSurveyStep")
.<CRDBSurvey, String> chunk(10)
@Bean(name = "crdbSurveyStep")
public Step crdbSurveyStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbSurveyStep", jobRepository)
.<CRDBSurvey, String> chunk(10, transactionManager)
.reader(crdbSurveyReader())
.processor(crdbSurveyProcessor())
.writer(crdbSurveyWriter())
@@ -141,10 +144,10 @@ public ItemStreamWriter<String> crdbSurveyWriter() {
/**
* Step 2 reads, processes, and writes the CRDB Dataset query results
*/
@Bean
public Step crdbDatasetStep() {
return stepBuilderFactory.get("crdbDatasetStep")
.<CRDBDataset, String> chunk(10)
@Bean(name = "crdbDatasetStep")
public Step crdbDatasetStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbDatasetStep", jobRepository)
.<CRDBDataset, String> chunk(10, transactionManager)
.reader(crdbDatasetReader())
.processor(crdbDatasetProcessor())
.writer(crdbDatasetWriter())
@@ -168,10 +171,10 @@ public ItemStreamWriter<String> crdbDatasetWriter() {
return new CRDBDatasetWriter();
}

@Bean
public Step crdbPDXClinicalAnnotationMappingStep() {
return stepBuilderFactory.get("crdbPDXClinicalAnnotationMappingStep")
.<CRDBPDXClinicalAnnotationMapping, String> chunk(10)
@Bean(name = "crdbPDXClinicalAnnotationMappingStep")
public Step crdbPDXClinicalAnnotationMappingStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbPDXClinicalAnnotationMappingStep", jobRepository)
.<CRDBPDXClinicalAnnotationMapping, String> chunk(10, transactionManager)
.reader(crdbPDXClinicalAnnotationMappingReader())
.processor(crdbPDXClinicalAnnotationMappingProcessor())
.writer(crdbPDXClinicalAnnotationMappingWriter())
@@ -196,10 +199,10 @@ public CRDBPDXClinicalAnnotationMappingWriter crdbPDXClinicalAnnotationMappingWr
return new CRDBPDXClinicalAnnotationMappingWriter();
}

@Bean
public Step crdbPDXSourceToDestinationMappingStep() {
return stepBuilderFactory.get("crdbPDXSourceToDestinationMappingStep")
.<CRDBPDXSourceToDestinationMapping, String> chunk(10)
@Bean(name = "crdbPDXSourceToDestinationMappingStep")
public Step crdbPDXSourceToDestinationMappingStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbPDXSourceToDestinationMappingStep", jobRepository)
.<CRDBPDXSourceToDestinationMapping, String> chunk(10, transactionManager)
.reader(crdbPDXSourceToDestinationMappingReader())
.processor(crdbPDXSourceToDestinationMappingProcessor())
.writer(crdbPDXSourceToDestinationMappingWriter())
@@ -224,10 +227,10 @@ public ItemStreamWriter<String> crdbPDXSourceToDestinationMappingWriter() {
return new CRDBPDXSourceToDestinationMappingWriter();
}

@Bean
public Step crdbPDXClinicalSampleStep() {
return stepBuilderFactory.get("crdbPDXClinicalSampleStep")
.<CRDBPDXClinicalSampleDataset, String> chunk(10)
@Bean(name = "crdbPDXClinicalSampleStep")
public Step crdbPDXClinicalSampleStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbPDXClinicalSampleStep", jobRepository)
.<CRDBPDXClinicalSampleDataset, String> chunk(10, transactionManager)
.reader(crdbPDXClinicalSampleReader())
.processor(crdbPDXClinicalSampleProcessor())
.writer(crdbPDXClinicalSampleWriter())
@@ -252,10 +255,10 @@ public ItemStreamWriter<String> crdbPDXClinicalSampleWriter() {
return new CRDBPDXClinicalSampleWriter();
}

@Bean
public Step crdbPDXClinicalPatientStep() {
return stepBuilderFactory.get("crdbPDXClinicalPatientStep")
.<CRDBPDXClinicalPatientDataset, String> chunk(10)
@Bean(name = "crdbPDXClinicalPatientStep")
public Step crdbPDXClinicalPatientStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbPDXClinicalPatientStep", jobRepository)
.<CRDBPDXClinicalPatientDataset, String> chunk(10, transactionManager)
.reader(crdbPDXClinicalPatientReader())
.processor(crdbPDXClinicalPatientProcessor())
.writer(crdbPDXClinicalPatientWriter())
@@ -280,10 +283,10 @@ public ItemStreamWriter<String> crdbPDXClinicalPatientWriter() {
return new CRDBPDXClinicalPatientWriter();
}

@Bean
public Step crdbPDXTimelineStep() {
return stepBuilderFactory.get("crdbPDXTimelineStep")
.<CRDBPDXTimelineDataset, String> chunk(10)
@Bean(name = "crdbPDXTimelineStep")
public Step crdbPDXTimelineStep(JobRepository jobRepository, PlatformTransactionManager transactionManager) {
return new StepBuilder("crdbPDXTimelineStep", jobRepository)
.<CRDBPDXTimelineDataset, String> chunk(10, transactionManager)
.reader(crdbPDXTimelineReader())
.processor(crdbPDXTimelineProcessor())
.writer(crdbPDXTimelineWriter())
Original file line number Diff line number Diff line change
@@ -97,11 +97,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
List<String> writeList = new ArrayList<>();
for (String result : items) {
writeList.add(result);
}
flatFileItemWriter.write(writeList);
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}
}
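The `write(List)` to `write(Chunk)` changes in the writer classes above and below follow Spring Batch 5's revised `ItemWriter` contract. A minimal sketch of a delegating writer under that contract (`DelegatingStringWriter` is an illustrative name, not from this repository):

```java
// Illustrative fragment: in Spring Batch 5, ItemWriter#write receives a
// Chunk<? extends T> instead of a List<? extends T>, so a delegating writer
// can pass the chunk straight through without copying it into a List first.
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

public class DelegatingStringWriter implements ItemWriter<String> {

    private final ItemWriter<String> delegate;

    public DelegatingStringWriter(ItemWriter<String> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(Chunk<? extends String> items) throws Exception {
        delegate.write(items); // Chunk is forwarded as-is; no List copy needed
    }
}
```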
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}
}
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}
}
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}
}
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}
}
Original file line number Diff line number Diff line change
@@ -86,7 +86,7 @@ public void close() throws ItemStreamException {
}

@Override
public void write(List<? extends String> items) throws Exception {
public void write(Chunk<? extends String> items) throws Exception {
flatFileItemWriter.write(items);
}

