ankiteciitkgp/bertTokenizer


Bert Tokenizer

This repository contains a Java implementation of the BERT tokenizer. The implementation is based on the tokenizer in the Hugging Face Transformers library.

https://huggingface.co/transformers/main_classes/tokenizer.html

Usage

To get tokens from text:

String text = "Text to tokenize";
BertTokenizer tokenizer = new BertTokenizer();
List<String> tokens = tokenizer.tokenize(text);
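
For readers unfamiliar with what a BERT tokenizer does to each word, the following is a minimal sketch of greedy longest-match-first WordPiece splitting, the scheme BERT tokenizers use. It is not this repository's code: the class name WordPieceSketch, the toy vocabulary, and the whitespace-only pre-splitting are assumptions made for illustration (a real tokenizer also handles punctuation, accents, and a vocabulary of roughly 30,000 entries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of this repository.
final class WordPieceSketch {
    // Toy vocabulary (assumption). "##" marks a piece that continues a word.
    static final Set<String> TOY_VOCAB = new HashSet<>(Arrays.asList(
            "[UNK]", "un", "##aff", "##able", "text", "to", "token", "##ize"));

    // Split one word by repeatedly taking the longest vocabulary entry
    // that matches at the current position.
    static List<String> wordPiece(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            while (start < end) {
                String candidate = word.substring(start, end);
                if (start > 0) {
                    candidate = "##" + candidate; // continuation piece
                }
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--; // try a shorter candidate
            }
            if (match == null) {
                // No piece matches here: the whole word becomes [UNK].
                return new ArrayList<>(Arrays.asList("[UNK]"));
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    // Lowercase, split on whitespace, then apply WordPiece to each word.
    static List<String> tokenize(String text, Set<String> vocab) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.addAll(wordPiece(word, vocab));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [text, to, token, ##ize]
        System.out.println(tokenize("Text to tokenize", TOY_VOCAB));
    }
}
```

With this toy vocabulary, "tokenize" is split into "token" and "##ize" because neither the full word nor any longer prefix is in the vocabulary.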

To convert the tokens to ids using the BERT vocabulary:

List<Integer> tokenIds = tokenizer.convert_tokens_to_ids(tokens);
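
The id lookup can be pictured as a map from each vocabulary entry to its position in the vocabulary file. In a BERT vocab.txt, each line holds one token and the zero-based line number is its id. The sketch below assumes that layout and falls back to the id of "[UNK]" for tokens that are missing. The class VocabLookupSketch, its method names, and the six-line toy vocabulary are hypothetical and are not taken from this repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of this repository.
final class VocabLookupSketch {
    // Toy stand-in for the lines of a vocab.txt file (assumption).
    static final List<String> TOY_VOCAB_LINES = Arrays.asList(
            "[PAD]", "[UNK]", "text", "to", "token", "##ize");

    // The id of a token is its zero-based line number.
    static Map<String, Integer> buildVocab(List<String> lines) {
        Map<String, Integer> vocab = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            vocab.put(lines.get(i), i);
        }
        return vocab;
    }

    // Look up each token, using the [UNK] id for tokens not in the vocabulary.
    static List<Integer> convertTokensToIds(List<String> tokens, Map<String, Integer> vocab) {
        Integer unkId = vocab.get("[UNK]");
        if (unkId == null) {
            throw new IllegalArgumentException("vocabulary has no [UNK] entry");
        }
        List<Integer> ids = new ArrayList<>();
        for (String token : tokens) {
            ids.add(vocab.getOrDefault(token, unkId));
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = buildVocab(TOY_VOCAB_LINES);
        // prints [2, 3, 4, 5]
        System.out.println(convertTokensToIds(Arrays.asList("text", "to", "token", "##ize"), vocab));
    }
}
```

The resulting id list has the same length and order as the token list, which is what a model's embedding lookup expects.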
