[tools] fix compile_lexicon_token_fst.sh to avoid duplicate symbols in words.txt (wenet-e2e#1445)
kakashidan authored Sep 14, 2022
1 parent 8308c1f commit c6391c0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion tools/fst/compile_lexicon_token_fst.sh
@@ -60,7 +60,7 @@ tools/fst/ctc_token_fst_compact.py $dir/tokens.txt | \
fstarcsort --sort_type=olabel > $dir/T.fst || exit 1;

# Encode the words with indices. Will be used in lexicon and language model FST compiling.
-cat $tmpdir/lexiconp.txt | awk '{print $1}' | sort | awk '
+cat $tmpdir/lexiconp.txt | awk '{print $1}' | sort | uniq | awk '
BEGIN {
print "<eps> 0";
}
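
The effect of the added `uniq` can be reproduced with a minimal sketch. It assumes a hypothetical three-line `lexiconp.txt` in which one word has two pronunciations. It also assumes the part of the awk program not shown in this hunk numbers each input line with `NR`. The `number_words` helper is an illustrative stand-in, not the script's actual code.

```shell
#!/bin/sh
# Hypothetical lexicon: "hello" appears twice because it has two pronunciations.
tmpdir=$(mktemp -d)
cat > "$tmpdir/lexiconp.txt" <<'EOF'
hello 1.0 h e l l o
hello 1.0 h a l o
world 1.0 w o r l d
EOF

# Illustrative stand-in for the awk program in the script (assumption:
# each word is assigned its input line number as the symbol index).
number_words() {
  awk 'BEGIN { print "<eps> 0"; } { printf("%s %d\n", $1, NR); }'
}

# Old pipeline: sorted but not deduplicated, so "hello" gets two indices.
before=$(awk '{print $1}' "$tmpdir/lexiconp.txt" | sort | number_words)

# New pipeline: uniq collapses adjacent duplicates left by sort.
after=$(awk '{print $1}' "$tmpdir/lexiconp.txt" | sort | uniq | number_words)

echo "--- without uniq ---"
echo "$before"
echo "--- with uniq ---"
echo "$after"

rm -rf "$tmpdir"
```

Without `uniq`, the symbol table in this sketch lists `hello` under both index 1 and index 2. With `uniq`, each word maps to exactly one index.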