filter & trim #16

averagehat · 2016-02-03T16:45:02Z

No description provided.

averagehat · 2016-02-03T16:57:24Z

"By default, the filtering options discard or redirect the read pair if any of the two reads fulfill the criteria."
http://cutadapt.readthedocs.org/en/stable/guide.html#filtering-reads

necrolyte2 · 2016-02-03T17:14:23Z

So essentially that is saying that if either read is filtered, both reads are taken out and put into the "filtered" file.
That is, you will always have properly paired forward and reverse files for output.

So then you have to somehow fetch the filtered-out reads from some file?
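
To make the pairing behaviour concrete, here is a minimal sketch of the rule quoted above. It is not cutadapt's implementation; `filter_pairs` and the length-based `keep` criterion are hypothetical names chosen for illustration.

```python
def filter_pairs(pairs, keep):
    """Split read pairs into (kept, filtered).

    A pair is kept only if BOTH reads pass `keep`; otherwise the whole
    pair is moved to `filtered`, so both outputs stay properly paired.
    """
    kept, filtered = [], []
    for fwd, rev in pairs:
        if keep(fwd) and keep(rev):
            kept.append((fwd, rev))
        else:
            filtered.append((fwd, rev))
    return kept, filtered


# Hypothetical criterion: a read passes if it has at least 5 bases.
pairs = [("ACGTACGT", "TTGACCA"), ("ACG", "TTGACCAGG"), ("GGGTTTAAA", "CC")]
kept, filtered = filter_pairs(pairs, keep=lambda seq: len(seq) >= 5)
print(kept)
print(filtered)
```

Reads whose mate failed the criterion end up in `filtered` together with that mate, which is why they would have to be recovered from the filtered output.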

necrolyte2 · 2016-02-03T17:18:54Z

I'm also trying to figure out why jip is not parsing the docopt string correctly. It seems like it should work to use a mutually exclusive group such as (-j <j> | -x <x>).
You can see here that at least the vendored docopt library is working as expected:

from jip.vendor.docopt import docopt
# Passing both options violates the mutually exclusive group, so docopt rejects it and shows the usage
d=docopt('''\nUsage:\n  test (-j <j> | -x <x>)\n\nOptions:\n   -j <j>\n   -x <x>\n''', ['-j', '5', '-x', '6'])
print(d)

Usage:
  test (-j <j> | -x <x>)

# Passing only one of the two options parses as expected
d=docopt('''\nUsage:\n  test (-j <j> | -x <x>)\n\nOptions:\n   -j <j>\n   -x <x>\n''', ['-j', '5'])
print(d)

{'-j': '5',
 '-x': None}

averagehat · 2016-02-03T17:41:32Z

import json
from jinja2 import Template
# Render the jip template using the options described in the JSON file
print(Template(open("templates/jip.jinja").read()).render(json.loads(open("templates/filter.json").read())))

necrolyte2 · 2016-02-03T17:49:58Z

So here you can see he does some fairly involved parsing of the docstring himself:
https://github.com/thasso/pyjip/blob/master/jip/options.py#L1384
I think he does all of this so that special tags like stdin and stdout are created correctly.

The downside is that you lose the ability to use all the normal docopt features, like mutually exclusive groups.

necrolyte2 · 2016-02-03T18:32:26Z

thasso/pyjip#62

necrolyte2 · 2016-02-03T19:54:31Z

Well, we could investigate running jobs straight from template + json

python -c 'from jip.tools import ScriptTool; import json; from jinja2 import Template; tool_src=Template(open("templates/jip.jinja").read()).render(json.loads(open("templates/filter.json").read())); import sys; t=ScriptTool.from_string(tool_src); t.init(); import jip; jip.jobs.create_jobs(t, args=sys.argv[1:])
' -f f -r r -u u

averagehat · 2016-02-03T20:09:42Z

On a related note, I would recommend doing validation using either
schematic
or JSON schemas ([example](https://github.com/VDBWRAIR/bioframework/blob/8d3f95aa2567830492e8e997838ae36b53828b93/sim/schema.json)).

JSON schemas have the advantage of being very clear and being just JSON, but schematic is more flexible because you can use any Python function (not unlike Python contracts).
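
As a rough illustration of the "any Python function" style of validation, here is a standard-library sketch. It uses neither the schematic API nor JSON Schema; `RULES`, `validate`, and the option names are hypothetical.

```python
import json

# Hypothetical rules: option name -> predicate the value must satisfy.
RULES = {
    "name": lambda v: isinstance(v, str) and v != "",
    "min_quality": lambda v: isinstance(v, int) and 0 <= v <= 40,
}


def validate(config, rules):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []
    for key, check in rules.items():
        if key not in config:
            errors.append("missing key: %s" % key)
        elif not check(config[key]):
            errors.append("invalid value for %s: %r" % (key, config[key]))
    return errors


good = json.loads('{"name": "filter", "min_quality": 20}')
bad = json.loads('{"name": "", "min_quality": 99}')
print(validate(good, RULES))
print(validate(bad, RULES))
```

The trade-off mentioned above shows up here: the predicates can express anything, but unlike a JSON schema the rules are code rather than plain data.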

necrolyte2 · 2016-02-03T20:41:04Z

I just realized that in our template the Usage line is not printing the options from json file.

necrolyte2 · 2016-02-05T14:46:46Z

Discussion:

All stages accepting paired reads should accept interleaved input.
The input arguments should be -s for "single read" and -f -r for "forward" and "reverse".
Stages outputting fastq should support outputting interleaved reads as well.
All stages should output the command they are running?
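
For reference, interleaving simply alternates forward and reverse records in one stream. A minimal sketch, assuming records are already parsed into list items (the function names are hypothetical, not the project's):

```python
def interleave(forward, reverse):
    """Merge two equally long record lists into f1, r1, f2, r2, ..."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse must contain the same number of reads")
    merged = []
    for fwd, rev in zip(forward, reverse):
        merged.extend([fwd, rev])
    return merged


def deinterleave(records):
    """Split an interleaved list back into (forward, reverse)."""
    if len(records) % 2 != 0:
        raise ValueError("interleaved input must contain an even number of reads")
    return records[0::2], records[1::2]


merged = interleave(["f1", "f2"], ["r1", "r2"])
print(merged)
print(deinterleave(merged))
```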

necrolyte2 · 2016-02-05T15:57:45Z

Just to document discussion:

Stages accepting reads should accept interleaved input only (unpaired as well).
[<input>] [-p] would be the docopt string to allow for standard input.
Output will be interleaved as well.
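
An optional positional like [<input>] usually means "read the file if given, otherwise read standard input". A small sketch of that fallback, with a hypothetical `read_input` helper; the -p flag is not modelled here:

```python
import io
import sys


def read_input(path=None, stdin=sys.stdin):
    """Return the text from `path` if given, otherwise from `stdin`."""
    if path is None:
        return stdin.read()
    with open(path) as handle:
        return handle.read()


# Simulate piping a fastq record into the tool without an <input> argument.
fake_stdin = io.StringIO("@r1\nACGT\n+\nIIII\n")
print(read_input(None, stdin=fake_stdin))
```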

necrolyte2 · 2016-02-05T17:20:25Z

marcelm/cutadapt#174

averagehat · 2016-02-10T17:19:11Z

I'm comfortable merging this

necrolyte2 · 2016-02-10T17:32:44Z

The jip tools don't conform to the [<input>] [-p] schema yet:

index_filter_quality.jip
interleaved_index_filter.jip

Should this PR represent the completion of filter_reads and trim_reads?

averagehat · 2016-02-11T19:24:53Z

This is ready for merging
NB: this is the io matrix, and something like that might be more generalized in Biotest

necrolyte2 · 2016-02-11T20:30:50Z

What is that iomatrix doing? Hard to tell by just reading it

averagehat · 2016-02-11T20:54:57Z

Yeah, I will have to rewrite it sometime. Basically it defines a matrix of choices like [stream, file] (where stream is stdin/stdout) as a hypothesis strategy. So the test that uses it will get a random combination of stream/real file for input/output.

Basically it does this by baking the options into sh (equivalent to partial application) and ensuring the actual fastq strings are in whatever stream/file gets baked in for input. Unfortunately that means make_io_matrix needs the sequences that are actually generated by hypothesis in order to put them there. So make_io_matrix is actually flatmapped onto reads_and_indices. The strategy returns functions so that the actual I/O can be delayed (otherwise, if you assume a condition that fails, the example gets thrown away and the I/O time would have been wasted). The I/O functions are used in the test which receives that strategy to get the results.

This is not super simple, but it is fairly general, as it only replaces the input/output. The point is that you can put whatever you want into the body of your test and have it test whatever you want.
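
The idea can be sketched without hypothesis or sh. The version below is an assumption-laden stand-in: it enumerates every combination with itertools.product instead of drawing one at random from a strategy, and uses functools.partial in place of sh's baking. `upper_case`, `run_case`, and this `make_io_matrix` are hypothetical and not the code in the PR.

```python
import io
import itertools
import os
import tempfile
from functools import partial


def upper_case(text):
    """Stand-in for the tool under test."""
    return text.upper()


def run_case(tool, payload, in_kind, out_kind):
    """Feed `payload` through a stream or a real file, then collect the output the same way."""
    with tempfile.TemporaryDirectory() as tmp:
        if in_kind == "file":
            in_path = os.path.join(tmp, "in.txt")
            with open(in_path, "w") as handle:
                handle.write(payload)
            with open(in_path) as handle:
                data = handle.read()
        else:
            data = io.StringIO(payload).read()
        result = tool(data)
        if out_kind == "file":
            out_path = os.path.join(tmp, "out.txt")
            with open(out_path, "w") as handle:
                handle.write(result)
            with open(out_path) as handle:
                return handle.read()
        sink = io.StringIO()
        sink.write(result)
        return sink.getvalue()


def make_io_matrix(tool, payload):
    """Return one delayed-I/O function per (input kind, output kind) combination."""
    kinds = ["stream", "file"]
    return {
        (i, o): partial(run_case, tool, payload, i, o)
        for i, o in itertools.product(kinds, kinds)
    }


matrix = make_io_matrix(upper_case, "acgt\n")
for combo, run in sorted(matrix.items()):
    # No I/O happens until run() is called.
    print(combo, repr(run()))
```

Because each entry is a partial, nothing touches the filesystem until the test body calls it, which mirrors the reason given above for returning functions from the strategy.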

filter & trim

added jip template

abe38fd

averagehat changed the title ~~added jip template~~ filter & trim Feb 3, 2016

averagehat and others added 2 commits February 3, 2016 12:22

templating templates example

6d559f8

moved file

0ddb306

fix few issues

7176b0a

fix mutually exclusive option

e285376

necrolyte2 and others added 2 commits February 3, 2016 15:13

take jinja template and json and combine into output stream

5f71b6e

added index quality filter

9a643a1

averagehat and others added 4 commits February 3, 2016 15:41

refactoring io

ff2cf14

Merge branch 'filter-trim' of https://github.com/VDBWRAIR/bioframework into filter-trim

15b029b

template now includes options in usage

df242b0

added index filter jip file

e6216f5

averagehat added 2 commits February 5, 2016 10:15

added interleaved template

f502f28

write_zip_results now takes variable number of input files as args

d010cd0

averagehat added 3 commits February 5, 2016 11:21

added interleaved index filter

1b451d0

stdin support

4e682f1

better arg names

d108795

transform_reads tool

689ecf3

averagehat and others added 17 commits February 5, 2016 13:16

Merge branch 'filter-trim' of https://github.com/VDBWRAIR/bioframework into filter-trim

ff38983

Renaming transform_reads to trim_reads

cf31ee6

Set execute perms on jip modules. Add shebang to tools. Add toolz dependency to setup.py

d2f8421

adding in generic test libraries and converted trim_reads tests to use them

14c1698

removed redundant cutadapt tool. Updated convert_format.robot to use common. Updated common to support toggable log to console for all scripts

a6f33fb

Fixed all tests to use common keywords and convert and paired to interleave now test usage for input/output

98366b6

fixed interleave index filter and some basic tests just to get biolerplate for being able to easily generate sequence/records

6722d2c

fix missing dependency for tests

797be9e

check dir exist before removing

e14ebf6

index_filter now lazy (uses generators)

a8dcd80

refactor index_filter

3249d91

hypothesis tests

60152ac

filtering--with index--is not idempotent

a9cdb9b

add hypothesis and sh to requirements

ddeaa18

fixed seqio error where output file could be empty string

0bf52b6

more tests, all working locally

2d2efd7

Merge pull request #17 from VDBWRAIR/hypothesis

b370f0e

Hypothesis

averagehat force-pushed the filter-trim branch from f702715 to b370f0e Compare February 10, 2016 19:31

averagehat added 3 commits February 10, 2016 14:35

filterjip now uses [<input>] consistent with others

2ba326f

working on fixing filter jip for tests

1180bca

hypothesis tests working

48dfc4c

necrolyte2 added a commit that referenced this pull request Feb 11, 2016

Merge pull request #16 from VDBWRAIR/filter-trim

bec2f45

filter & trim

necrolyte2 merged commit bec2f45 into dev Feb 11, 2016

averagehat mentioned this pull request Feb 11, 2016

Bam processing (WIP) #19

Open

4 tasks

necrolyte2 deleted the filter-trim branch February 12, 2016 14:24
