Commit

update strides
davek44 committed Oct 17, 2024
1 parent 65d71da commit f83f4d9
Showing 4 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion tutorials/latest/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data

# mini borzoi configuration
LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=131087 # 393216/3 + 15
CROP=0
WIDTH=32
FOLDS=8
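
The stride values in these Makefiles can be checked with a short calculation. The sketch below is illustrative Python and not part of the repository: `window_geometry` and its `offset` parameter are hypothetical names, and it assumes the stride is one third of the uncropped target length plus a 15 bp offset, which is what the TSTRIDE values imply.

```python
def window_geometry(length, crop, width, offset=15):
    """Return (target_bins, stride) for one input window.

    length: input sequence length in bp (LENGTH)
    crop:   bp removed from each side of the target (CROP)
    width:  bin width in bp (WIDTH)
    offset: assumed extra shift added to one third of the target length
    """
    target_len = length - 2 * crop
    if target_len % width != 0:
        raise ValueError("target length must be a multiple of the bin width")
    bins = target_len // width
    stride = target_len // 3 + offset
    return bins, stride


# Configuration without cropping: CROP=0
latest_bins, latest_stride = window_geometry(393216, 0, 32)

# Configuration with cropping: CROP=98304
legacy_bins, legacy_stride = window_geometry(393216, 98304, 32)

print(latest_bins, latest_stride)
print(legacy_bins, legacy_stride)
```

Under these assumptions the function reproduces both TSTRIDE settings in this commit (131087 and 65551) and the bin counts quoted in the two READMEs (12288 and 6144).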
2 changes: 1 addition & 1 deletion tutorials/latest/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
make
```

-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 131087 bp (about one third of the sequence length, plus a 15 bp offset that also shifts the bin boundaries between windows). The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.

*targets_human.txt*:
- (unnamed) => integer index of each track (must start from 0 when training a new model).
2 changes: 1 addition & 1 deletion tutorials/legacy/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data

# mini borzoi configuration
LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=65551 # (393216-2*98304)/3 + 15
CROP=98304
WIDTH=32
FOLDS=8
2 changes: 1 addition & 1 deletion tutorials/legacy/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
make
```

-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 65551 bp (about one third of the cropped sequence length, plus a 15 bp offset that also shifts the bin boundaries between windows). The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.

*targets_human.txt*:
- (unnamed) => integer index of each track (must start from 0 when training a new model).
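
The note that the stride "shifts the bin boundaries" can be made concrete with a small sketch. This is illustrative Python rather than code from the repository: `bin_phases` is a hypothetical helper, and it assumes windows start at integer multiples of the stride from a common origin, which may not match how the actual tooling lays out windows on each contig.

```python
def bin_phases(stride, width, n_windows):
    """Offset of each window start relative to a fixed grid of `width` bp bins.

    Assumes window k starts at k * stride (a simplification for illustration).
    """
    return [(k * stride) % width for k in range(n_windows)]


# Legacy configuration from this commit: TSTRIDE=65551, WIDTH=32
first_phases = bin_phases(65551, 32, 5)
print(first_phases)

# A crop of 98304 bp is a multiple of 32, so cropping does not change the phase.
print(98304 % 32)

# Number of distinct bin alignments seen across 32 consecutive windows
print(len(set(bin_phases(65551, 32, 32))))
```

Because 65551 leaves a remainder of 15 when divided by 32, each successive window is offset by 15 bp relative to the bin grid, so overlapping windows bin the same region with different boundaries. Under the stated assumption, 32 consecutive windows cover all 32 possible alignments.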
