Support decoding with ranges to avoid a copy #54

Open
jerinphilip opened this issue Sep 7, 2021 · 0 comments
Labels
enhancement New feature or request


For browsermt/bergamot-translator#202, it was suggested that a view over a vector/range backed by a binary representation be substituted, to eliminate conversions between the vector and binary representations.

However, decodeWithByteRanges assumes a vector to be provided:

  void decodeWithByteRanges(const Words& sentence,
                            std::string &decoded,
                            std::vector<string_view> &byteRanges,
                            bool ignoreEOS) const override {
    sentencepiece::SentencePieceText spt;
    std::vector<int> spmSentence;
    spmSentence.reserve(sentence.size());
    for(auto&& word : sentence)
      spmSentence.push_back(word.toWordIndex());
    spm_->Decode(spmSentence, &spt);
    decoded = spt.text(); // Creates copy of string.
    string_view decoded_view(decoded);
    for(auto piece : spt.pieces()) {
      string_view byteRange = decoded_view.substr(piece.begin(), piece.end() - piece.begin());
      byteRanges.push_back(byteRange);
    }
    if(ignoreEOS){
      byteRanges.pop_back();
    }
  }

We need to provide something that works with ranges/iterators instead, to avoid the additional copy. Since this function is only used by bergamot, we need not worry about breaking backwards compatibility. Consistency with the rest of the interface may still be a concern, in which case we can provide a vector-based overload that internally calls the range-based method.
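A rough sketch of what the range-based method plus the forwarding overload could look like. This is not the actual implementation: `ToyDecoder`, its `vocab` table, and the `Word` struct are hypothetical stand-ins so the example compiles on its own, and the SentencePiece call is replaced by a simple lookup that concatenates pieces (with EOS assumed to map to an empty piece). Only `decodeWithByteRanges`, `Words`, and `toWordIndex()` come from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real Word type; only toWordIndex() is
// taken from the snippet in this issue.
struct Word {
  int id;
  int toWordIndex() const { return id; }
};
using Words = std::vector<Word>;

// Hypothetical stand-in for the SentencePiece-backed vocabulary: each id
// maps to one text piece, and decoding concatenates the pieces.
struct ToyDecoder {
  std::vector<std::string> vocab;

  // Range-based variant: accepts any iterator pair whose elements expose
  // toWordIndex(), e.g. pointers into a binary buffer of Word.
  template <class It>
  void decodeWithByteRanges(It first, It last,
                            std::string& decoded,
                            std::vector<std::string_view>& byteRanges,
                            bool ignoreEOS) const {
    decoded.clear();
    byteRanges.clear();
    // Record (offset, length) first; views are created only once decoded
    // has stopped growing, since appending may reallocate its buffer.
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (; first != last; ++first) {
      const int index = (*first).toWordIndex();
      assert(index >= 0);  // ids are expected to be non-negative
      const std::string& piece = vocab.at(static_cast<std::size_t>(index));
      spans.emplace_back(decoded.size(), piece.size());
      decoded += piece;
    }
    const std::string_view view(decoded);
    for (const auto& span : spans)
      byteRanges.push_back(view.substr(span.first, span.second));
    if (ignoreEOS && !byteRanges.empty())
      byteRanges.pop_back();
  }

  // Vector overload kept for consistency; it only forwards to the
  // range-based method.
  void decodeWithByteRanges(const Words& sentence,
                            std::string& decoded,
                            std::vector<std::string_view>& byteRanges,
                            bool ignoreEOS) const {
    decodeWithByteRanges(sentence.begin(), sentence.end(),
                         decoded, byteRanges, ignoreEOS);
  }
};
```

With this shape, a caller holding a contiguous buffer can pass `ptr, ptr + n` directly without first building a `Words` vector. Note that the real `spm_->Decode` call shown above takes a `std::vector<int>`, so the intermediate `spmSentence` would still have to be filled from the range unless that step changes as well.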
