Commit

Update EdgeNgramTokenizer.php
fixed EdgeNgramTokenizer split word count
toohamster authored Jan 9, 2024
1 parent c8863c6 commit c6658fe
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/Support/EdgeNgramTokenizer.php
@@ -13,7 +13,7 @@ public function tokenize($text, $stopwords = [])
 $splits = preg_split($this->getPattern(), $text, -1, PREG_SPLIT_NO_EMPTY);

 foreach ($splits as $split) {
-    for ($i = 2; $i <= strlen($split); $i++) {
+    for ($i = 2; $i <= mb_strlen($split); $i++) {
         $ngrams[] = mb_substr($split, 0, $i);
     }
 }
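
The change swaps a byte count (`strlen`) for a character count (`mb_strlen`) as the loop bound, while the prefix itself was already taken by character with `mb_substr`. The sketch below is one way to see the effect of that mismatch on a multibyte word. It is not the library's code: `edgeNgrams` and its `$multibyteAware` flag are hypothetical names introduced only for illustration, and the example assumes UTF-8 input and that the `mbstring` extension is loaded.

```php
<?php
// Hypothetical standalone helper that mirrors the loop in the diff.
// $multibyteAware = false reproduces the old bound (strlen, bytes);
// $multibyteAware = true reproduces the new bound (mb_strlen, characters).
function edgeNgrams(string $word, bool $multibyteAware = true): array
{
    $length = $multibyteAware ? mb_strlen($word, 'UTF-8') : strlen($word);

    $ngrams = [];
    for ($i = 2; $i <= $length; $i++) {
        // Prefixes are always cut by character, as in the original loop.
        $ngrams[] = mb_substr($word, 0, $i, 'UTF-8');
    }

    return $ngrams;
}

// "über": 4 characters, but 5 bytes in UTF-8 because "ü" takes 2 bytes.
$word = "\u{00FC}ber";

// Old bound: the loop runs one step too far and the full word is emitted twice.
print_r(edgeNgrams($word, false)); // üb, übe, über, über

// New bound: one prefix per character position from 2 up to the word length.
print_r(edgeNgrams($word, true));  // üb, übe, über
```

For pure ASCII input both bounds give the same result, so the difference only shows up on words that contain multibyte characters.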