diff --git a/tutorials/latest/make_data/Makefile b/tutorials/latest/make_data/Makefile
index c47bb3d..d1c3ae9 100644
--- a/tutorials/latest/make_data/Makefile
+++ b/tutorials/latest/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data
 # mini borzoi configuration
 LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=131087 # 393216/3 + 15
 CROP=0
 WIDTH=32
 FOLDS=8
diff --git a/tutorials/latest/make_data/README.md b/tutorials/latest/make_data/README.md
index 52df292..b879782 100644
--- a/tutorials/latest/make_data/README.md
+++ b/tutorials/latest/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
 make
 ```
-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 131087 bp (which is ~1/3 of the sequence length, but shifts the bin boundaries, too). The output coverage tracks corresponding to each input sequence are not cropped in the latest version of Borzoi models. This results in 12288 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
 *targets_human.txt*:
 - (unnamed) => integer index of each track (must start from 0 when training a new model).
diff --git a/tutorials/legacy/make_data/Makefile b/tutorials/legacy/make_data/Makefile
index f2dce79..0e35f48 100644
--- a/tutorials/legacy/make_data/Makefile
+++ b/tutorials/legacy/make_data/Makefile
@@ -14,7 +14,7 @@ OUT=data
 # mini borzoi configuration
 LENGTH=393216
-TSTRIDE=43691 # (393216-2*131072)/3
+TSTRIDE=65551 # (393216-2*98304)/3 + 15
 CROP=98304
 WIDTH=32
 FOLDS=8
diff --git a/tutorials/legacy/make_data/README.md b/tutorials/legacy/make_data/README.md
index 05a53b6..1e7463a 100644
--- a/tutorials/legacy/make_data/README.md
+++ b/tutorials/legacy/make_data/README.md
@@ -24,7 +24,7 @@ Finally, run the Makefile to create genome-wide binned coverage tracks, stored a
 make
 ```
-In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 43691 bp. The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
+In this example, the Makefile creates 8 cross-validation folds of TFRecords with input sequences of length 393216 bp, generated with a genome-wide stride of 65551 bp (which is ~1/3 of the cropped sequence length, but shifts the bin boundaries, too). The output coverage tracks corresponding to each input sequence are cropped by 98304 bp on each side, before pooling the measurements in 32 bp bins. This results in 6144 coverage bins per 393kb sequence. The specific .w5 tracks to include in the TFRecord generation, and the scales and pooling transforms applied to the bins of each experiment, are given in the targets file 'targets_human.txt'. Below is a description of the columns in this file.
 *targets_human.txt*:
 - (unnamed) => integer index of each track (must start from 0 when training a new model).
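
The stride arithmetic in both Makefiles can be checked with a small Python sketch. It only redoes the arithmetic from the `LENGTH`, `CROP`, `WIDTH` and `TSTRIDE` values in this diff. The helper name `stride_summary` is hypothetical and is not part of the Borzoi tooling. The reading of "shifts the bin boundaries" as "the stride is not a multiple of the bin width" is an assumption based on the README wording.

```python
def stride_summary(length, crop, width, stride):
    """Summarize how a stride relates to the (cropped) target span.

    Hypothetical helper for illustration only; the arguments mirror the
    Makefile variables LENGTH, CROP, WIDTH and TSTRIDE.
    """
    target_length = length - 2 * crop      # bp covered by the output bins
    assert target_length % width == 0      # bins must tile the target span
    third = target_length // 3             # one third of the target span
    return {
        "bins": target_length // width,    # coverage bins per sequence
        "third": third,
        "offset_from_third": stride - third,
        # Nonzero phase means consecutive windows do not share a bin grid
        # (assumed meaning of "shifts the bin boundaries").
        "bin_phase": stride % width,
    }


latest = stride_summary(length=393216, crop=0, width=32, stride=131087)
legacy = stride_summary(length=393216, crop=98304, width=32, stride=65551)

print(latest)
print(legacy)
```

For both configurations the stride comes out as one third of the (cropped) target span plus 15 bp. Because 15 is not a multiple of the 32 bp bin width, each successive window starts 15 bp further into a bin than the previous one.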