-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathmaf-querying.feature
95 lines (86 loc) · 4.05 KB
/
maf-querying.feature
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
@milestone_3
Feature: Filter results from MAF files
In order to work with only relevant data from a MAF file
Such as only species recognized by PhyloCSF
I want to filter the results of MAF queries
Scenario: Return only specified species
Given MAF data:
"""
##maf version=1
a score=10542.0
s mm8.chr7 80082334 34 + 145134094 GGGCTGAGGGC--AGGGATGG---AGGGCGGTCC--------------CAGCA-
s rn4.chr1 136011785 34 + 267910886 GGGCTGAGGGC--AGGGACGG---AGGGCGGTCC--------------CAGCA-
s oryCun1.scaffold_199771 14021 43 - 75077 -----ATGGGC--AAGCGTGG---AGGGGAACCTCTCCTCCCCTCCGACAAAG-
s hg18.chr15 88557580 27 + 100338915 --------GGC--AAGTGTGGA--AGGGAAGCCC--------------CAGAA-
s panTro2.chr15 87959837 27 + 100063422 --------GGC--AAGTGTGGA--AGGGAAGCCC--------------CAGAA-
s rheMac2.chr7 69864714 28 + 169801366 -------GGGC--AAGTATGGA--AGGGAAGCCC--------------CAGAA-
s canFam2.chr3 56030570 39 + 94715083 AGGTTTAGGGCAGAGGGATGAAGGAGGAGAATCC--------------CTATG-
s dasNov1.scaffold_106893 7435 34 + 9831 GGAACGAGGGC--ATGTGTGG---AGGGGGCTGC--------------CCACA-
s loxAfr1.scaffold_8298 30264 38 + 78952 ATGATGAGGGG--AAGCGTGGAGGAGGGGAACCC--------------CTAGGA
s echTel1.scaffold_304651 594 37 - 10007 -TGCTATGGCT--TTGTGTCTAGGAGGGGAATCC--------------CCAGGA
"""
When I open it with a MAF reader
And filter for only the species
| hg18 |
| mm8 |
| rheMac2 |
Then an alignment block can be obtained
And the alignment block has 3 sequences
Scenario: Return only blocks having all specified species
Given a MAF source file "mm8_chr7_tiny.maf"
When I open it with a MAF reader
And build an index on the reference sequence
And filter for blocks with the species
| panTro2 |
| loxAfr1 |
And search for blocks between positions 80082471 and 80082730 of mm8.chr7
Then 1 block is obtained
Scenario: Return only blocks having a certain number of sequences
Given a MAF source file "mm8_chr7_tiny.maf"
When I open it with a MAF reader
And build an index on the reference sequence
And filter for blocks with at least 6 sequences
And search for blocks between positions 80082767 and 80083008 of mm8.chr7
Then 1 block is obtained
# sizes present:
# 55 64 128 148 157 163 165 192
Scenario: Return blocks with a maximum text size
Given a MAF source file "mm8_chr7_tiny.maf"
When I open it with a MAF reader
And build an index on the reference sequence
And filter for blocks with text size at least 150
And search for blocks between positions 0 and 80100000 of mm8.chr7
Then 4 blocks are obtained
Scenario: Return blocks with a minimum text size
Given a MAF source file "mm8_chr7_tiny.maf"
When I open it with a MAF reader
And build an index on the reference sequence
And filter for blocks with text size at most 72
And search for blocks between positions 0 and 80100000 of mm8.chr7
Then 2 blocks are obtained
Scenario: Return blocks within a text size range
Given a MAF source file "mm8_chr7_tiny.maf"
When I open it with a MAF reader
And build an index on the reference sequence
And filter for blocks with text size between 72 and 160
And search for blocks between positions 0 and 80100000 of mm8.chr7
Then 3 blocks are obtained
@no_jruby
Scenario: Parse blocks from a BGZF-compressed file
Given test files:
| mm8.chrM.maf |
| mm8.chrM.maf.bgz |
When I run `maf_extract -m mm8.chrM.maf --interval mm8.chrM:6938-13030 -o m1.maf`
And I run `maf_extract -m mm8.chrM.maf.bgz --interval mm8.chrM:6938-13030 -o m2.maf`
And I run `diff m1.maf m2.maf`
Then the exit status should be 0
@no_jruby
Scenario: One-based indexing with maf_extract
Given test files:
| mm8_chr7_tiny.maf |
| mm8_chr7_tiny.kct |
When I run `sh -c 'maf_extract -d . --one-based --interval mm8.chr7:80082592-80082713 | grep "^a" | wc -l'`
Then it should pass with:
"""
2
"""