yaml.scanner.ScannerError when initializing PEP downloaded with geofetch #80

Closed
nleroy917 opened this issue Aug 15, 2022 · 1 comment

nleroy917 (Member) commented Aug 15, 2022

I was trying to initialize a PEP that I downloaded from GEO with geofetch, and peppy threw this error: `yaml.scanner.ScannerError: mapping values are not allowed here`

Here is the project config:

```yaml
# Autogenerated by geofetch

name: GSE96155
pep_version: 2.1.0
sample_table: GSE96155_annotation.csv
subsample_table: null

looper:
  output_dir: GSE96155
  pipeline_interfaces: {pipeline_interfaces}

sample_modifiers:
  append:
    Sample_extract_protocol_ch1: general protocol: https://www.encodeproject.org/documents/be2a0f12-af38-430c-8f2d-57953baab5f5/@@download/attachment/Epigenomics_Alternative_Mag_Bead_ChIP_Protocol_v1.1_exp.pdf
    Sample_characteristics_ch1: link: ENCODE dbxrefs Cellosaurus CVCL_B260; http://web.expasy.org/cellosaurus/CVCL_B260
    Sample_data_processing: See GSM*_README.txt supplementary file linked below
    SRR_files: SRA

  derive:
    attributes: [read1, read2, SRR_files]
    sources:
      SRA: "${SRABAM}/{SRR}.bam"
      FQ: "${SRAFQ}/{SRR}.fastq.gz"
      FQ1: "${SRAFQ}/{SRR}_1.fastq.gz"
      FQ2: "${SRAFQ}/{SRR}_2.fastq.gz"
  imply:
    - if:
        organism: "Mus musculus"
      then:
        genome: mm10
    - if:
        organism: "Homo sapiens"
      then:
        genome: hg38
    - if:
        read_type: "PAIRED"
      then:
        read1: FQ1
        read2: FQ2
    - if:
        read_type: "SINGLE"
      then:
        read1: FQ1

project_modifiers:
  amend:
    sra_convert:
      looper:
        results_subdir: sra_convert_results
      sample_modifiers:
        append:
          SRR_files: SRA
          pipeline_interfaces: ${CODE}/geofetch/pipeline_interface_convert.yaml
        derive:
          attributes: [read1, read2, SRR_files]
          sources:
            SRA: "${SRARAW}/{SRR}.sra"
            FQ: "${SRAFQ}/{SRR}.fastq.gz"
            FQ1: "${SRAFQ}/{SRR}_1.fastq.gz"
            FQ2: "${SRAFQ}/{SRR}_2.fastq.gz"
```

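I think this is an issue with the `sample_modifiers` section: several `append` values themselves contain a colon followed by a space (for example `Sample_extract_protocol_ch1: general protocol: https://...`), which YAML interprets as key-value syntax. If I wrap those values in double quotes, the error goes away. However, I then get a second error.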
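
A minimal sketch of how a config writer could avoid this, assuming the `append` block is emitted as text lines the way the generated file above suggests. The helper names `yaml_quote` and `render_append` are hypothetical, not geofetch functions. The sketch relies on a JSON string literal also being a valid YAML double-quoted scalar, so `json.dumps` takes care of escaping embedded quotes, backslashes and control characters.

```python
import json
from typing import Dict


def yaml_quote(value: str) -> str:
    """Render a string as a YAML double-quoted scalar (hypothetical helper).

    JSON string literals are valid YAML double-quoted scalars, so colons,
    '#' characters, quotes and backslashes in the value can't break parsing.
    """
    return json.dumps(value)


def render_append(attrs: Dict[str, str], indent: int = 4) -> str:
    """Render the entries of a sample_modifiers.append block, quoting every value."""
    pad = " " * indent
    return "\n".join(f"{pad}{key}: {yaml_quote(val)}" for key, val in attrs.items())


attrs = {
    "Sample_characteristics_ch1": "link: ENCODE dbxrefs Cellosaurus CVCL_B260; "
                                  "http://web.expasy.org/cellosaurus/CVCL_B260",
    "SRR_files": "SRA",
}
print("  append:")
print(render_append(attrs))
```

If geofetch can build the config as a Python dict instead of filling a text template, dumping it with PyYAML's `yaml.safe_dump` would pick a safe quoting style for such values automatically.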

This time peppy throws `TypeError: Provided argument has to be a List[str] or a str, got 'NoneType'`. I was able to fix this by removing the `subsample_table: null` line entirely, after which the project initializes properly.
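
On the peppy side, a loader could treat `null` as "no table" before validating the type. The sketch below is hypothetical (`as_path_list` is not peppy's API); it only reuses the error message quoted above to show where a `None` check would go.

```python
from typing import List, Union


def as_path_list(value: Union[str, List[str], None]) -> List[str]:
    """Normalize a sample_table/subsample_table config value to a list of paths.

    Hypothetical helper: None (YAML null) means "no table" and yields an empty
    list instead of raising, which is the behavior proposed in this issue.
    """
    if value is None:
        return []  # null subsample_table: nothing to load
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return list(value)
    raise TypeError(
        f"Provided argument has to be a List[str] or a str, got '{type(value).__name__}'"
    )
```

A caller would then skip loading subsample tables whenever the returned list is empty.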

I think there are two "bugs" here.

  1. geofetch doesn't quote values that contain colons, so those values break the key-value syntax of YAML files.
  2. A `subsample_table` value of `null` breaks `peppy.Project()` instantiation. I am wondering whether we should omit the subsample table key from the config file when there is no subsample table, or change peppy to permit null subsample tables.

nsheff (Contributor) commented Aug 15, 2022

Yes, I think you are exactly right on both of these.

For your second point, I think you should do both:

  1. Raise an issue with peppy to allow null subsample tables.
  2. Change geofetch so it does not write a null subsample table when none is needed.
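
For the geofetch change, a minimal sketch, assuming the top of the config is rendered from Python values (`render_project_header` is a hypothetical helper; geofetch's real template code may be structured differently). Keys whose value is `None` are skipped rather than written as `null`.

```python
from typing import Optional


def render_project_header(
    name: str,
    sample_table: str,
    subsample_table: Optional[str] = None,
    pep_version: str = "2.1.0",
) -> str:
    """Build the top of a PEP project config, leaving out unset keys (hypothetical)."""
    fields = {
        "name": name,
        "pep_version": pep_version,
        "sample_table": sample_table,
        "subsample_table": subsample_table,
    }
    lines = ["# Autogenerated by geofetch", ""]
    for key, value in fields.items():
        if value is None:
            continue  # omit the key instead of emitting "key: null"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


print(render_project_header("GSE96155", "GSE96155_annotation.csv"))
```

Called this way, it reproduces the first lines of the config in the issue, minus the `subsample_table: null` line.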
