update strides

calico · Oct 17, 2024 · f83f4d9
1 parent 65d71da
commit f83f4d9

Showing 4 changed files with 4 additions and 4 deletions.
diff --git a/tutorials/latest/make_data/Makefile b/tutorials/latest/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data
 
 # mini borzoi configuration
 LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=131087 # 393216/3 + 15
 CROP=0
 WIDTH=32
 FOLDS=8

diff --git a/tutorials/latest/make_data/README.md b/tutorials/latest/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
 make
 ```
 
-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 131087 bp (roughly 1/3 of the sequence length, plus 15 bp so that the bin boundaries shift as well). The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
 
 *targets_human.txt*:
 - (unnamed) => integer index of each track (must start from 0 when training a new model).

diff --git a/tutorials/legacy/make_data/Makefile b/tutorials/legacy/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data
 
 # mini borzoi configuration
 LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=65551 # (393216-2*98304)/3 + 15
 CROP=98304
 WIDTH=32
 FOLDS=8
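
Review note: the two new TSTRIDE values can be checked with a few lines of arithmetic. The sketch below is not part of this commit. The helper name `stride_info` is hypothetical, and it assumes the stride is one third of the (cropped) sequence length plus a 15 bp offset, which reproduces both values above.

```python
# Illustrative check of the TSTRIDE values (not part of the repository).
LENGTH = 393216  # input sequence length in bp
WIDTH = 32       # bin width in bp
OFFSET = 15      # assumed extra bp that moves the stride off a multiple of WIDTH


def stride_info(length, crop, width=WIDTH, offset=OFFSET):
    """Return (stride, stride % width) for a third of the cropped length plus offset."""
    target_length = length - 2 * crop  # bp remaining after cropping both sides
    assert target_length % 3 == 0, "cropped length should divide evenly by 3"
    stride = target_length // 3 + offset
    return stride, stride % width


latest = stride_info(LENGTH, crop=0)      # tutorials/latest: CROP=0
legacy = stride_info(LENGTH, crop=98304)  # tutorials/legacy: CROP=98304
print(latest)  # (131087, 15)
print(legacy)  # (65551, 15)
```

In both cases the remainder modulo the 32 bp bin width is 15. If windows are placed at multiples of the stride, consecutive windows therefore do not share the same bin grid.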

diff --git a/tutorials/legacy/make_data/README.md b/tutorials/legacy/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
 make
 ```
 
-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 65551 bp (roughly 1/3 of the cropped sequence length, plus 15 bp so that the bin boundaries shift as well). The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
 
 *targets_human.txt*:
 - (unnamed) => integer index of each track (must start from 0 when training a new model).
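
Review note: the bin counts quoted in the two READMEs follow directly from LENGTH, CROP and WIDTH in the corresponding Makefiles. A minimal sketch, not part of this commit, with a hypothetical helper `num_bins`:

```python
# Illustrative check of the README bin counts (not part of the repository).
def num_bins(length, crop, width):
    """Number of coverage bins left after cropping `crop` bp from each side."""
    target_length = length - 2 * crop
    if target_length <= 0 or target_length % width != 0:
        raise ValueError("cropped length must be a positive multiple of the bin width")
    return target_length // width


latest_bins = num_bins(393216, crop=0, width=32)      # tutorials/latest
legacy_bins = num_bins(393216, crop=98304, width=32)  # tutorials/legacy
print(latest_bins)  # 12288
print(legacy_bins)  # 6144
```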