Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Tokens in Swift and Kotlin #227

Merged
merged 16 commits into from
Aug 5, 2023

Conversation

w11wo
Copy link
Contributor

@w11wo w11wo commented Jul 28, 2023

Implemented tokens in Swift and Kotlin API. Tested on iOS and Android.


// Pointer to continuous memory which holds string based tokens
// which are seperated by \0
const char *tokens;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we define

const char *_tokens;
const char *const*tokens;

where _tokens is your current tokens

And the new char **tokens is a pointer array.
tokens[0] contains the address of the first token in
_tokens. tokens[1] contains the address of the second
token in _tokens.

In this way, it simplifies users' life as they only need to iterate
char **tokens.

for (int32 i = 0; i != count; ++i) {
  const char*t = tokens[i];
  // process this token
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csukuangfj I added const char *const *tokensArr; for easy to implement, and I also tested

} else {
r->count = 0;
r->timestamps = nullptr;
r->tokens = nullptr;
r->tokensArr = nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add code to free it to avoid memory leak.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csukuangfj sorry, what do you want to free? This line is just to initialize null for pointer incase there is no token from SherpaOnnxOnlineRecognizerResult

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change

void DestroyOnlineRecognizerResult(const SherpaOnnxOnlineRecognizerResult *r) {
delete[] r->text;
delete r;
}

to

void DestroyOnlineRecognizerResult(const SherpaOnnxOnlineRecognizerResult *r) {
  delete[] r->text;
  delete[] r->json;
  delete[] r->tokensArr;
  delete r;
}

By the way, please change the variable names:

  • totalLength -> total_length
  • tokensTemp -> tokens_temp
  • tokensArr -> tokens_arr
    to follow the code convention in sherpa-onnx.

r->timestamps[i] = result.timestamps[i];
}

r->tokensArr = const_cast<const char *const *>(tokensTemp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r->tokensArr = const_cast<const char *const *>(tokensTemp);
r->tokensArr = tokensTemp;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm done

return r;
}

void DestroyOnlineRecognizerResult(const SherpaOnnxOnlineRecognizerResult *r) {
delete[] r->text;
delete[] r->json;
delete[] r->tokens;
delete[] r->tokens_arr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also delete

delete[] r->timestamps;

char **tokens_temp = new char*[r->count];
int pos = 0;
for (int32_t i = 0; i < r->count; ++i) {
tokens_temp[i] = const_cast<char*>(r->tokens) + pos;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tokens_temp[i] = const_cast<char*>(r->tokens) + pos;
tokens_temp[i] = r->tokens + pos;

Copy link
Contributor

@ductranminh ductranminh Aug 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this casting const_cast<char*> is build error:

error: assigning to 'char *' from 'const char *' discards qualifiers
tokens_temp[i] = r->tokens + pos;

total_length);
r->timestamps = new float[r->count];
char **tokens_temp = new char*[r->count];
int pos = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int pos = 0;
int32_t pos = 0;

const char *const *tokens_arr;

// Pointer to continuous memory which holds timestamps which
// are seperated by \0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the comment. It is not separated by \0 for timestamps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@csukuangfj
Copy link
Collaborator

Thanks for your contribution!

@csukuangfj csukuangfj merged commit 64efbd8 into k2-fsa:master Aug 5, 2023
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants