Hi,
Thanks for the great work! My question is: how do I use the trained model to split a sentence? For example:
sentence: last month we went on vacation the trip was very hard but it was worth doing because we had a lot of fun however on the way I lost my favorite shoes
output: (the split sentences produced by the trained model)
Also, I couldn't see the part where the sentences are vectorized for training; I see you split them and save them as dataset.sentences.
Thanks a lot
Sorry for the confusion here. This project was kind of a disorganized tool I was using for a research paper I was working on, so it didn't really get the love it needed. I ended up abandoning this idea, though, and went down a different path.
The overall strategy for turning character streams into sentences was this:
Model 1 is a binary classifier that takes a window of characters/words and decides whether a punctuation mark needs to go between any of them. If model 1 says "yes", we pass the data to model 2.
Model 2 is a multiclass classifier that decides where the punctuation mark goes.
These two models were intended to be trained separately.
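To make the two-stage idea concrete, here is a minimal, hypothetical Keras sketch of that pipeline. None of this is copied from the repo: the window length, vocabulary size, layer sizes, and function names are all assumptions for illustration only.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 20   # characters per input window (assumed value)
VOCAB = 128   # size of the character vocabulary (assumed value)

def build_model_1():
    """Binary classifier: does this window need a punctuation mark at all?"""
    inputs = keras.Input(shape=(WINDOW,))
    x = layers.Embedding(VOCAB, 16)(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def build_model_2():
    """Multiclass classifier: which position in the window gets the mark?"""
    inputs = keras.Input(shape=(WINDOW,))
    x = layers.Embedding(VOCAB, 16)(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(WINDOW, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

model_1, model_2 = build_model_1(), build_model_2()

def predict_split(window_ids, threshold=0.5):
    """Only ask model 2 *where* once model 1 says a mark is needed."""
    batch = np.asarray([window_ids])
    if model_1.predict(batch, verbose=0)[0, 0] > threshold:
        return int(model_2.predict(batch, verbose=0)[0].argmax())
    return None  # no punctuation needed in this window
```

The point of the split is that model 2 only gets consulted on the relatively rare windows model 1 flags, since most windows need no punctuation at all.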
I know some others who worked on this (https://github.com/jaggzh/nn-punk), and they tried the straight single-model method with a char-CNN, but I found that approach too sparse with a higher-dimensional output, since most of the time you don't need any punctuation in a sentence.
RE: Vectorization: vectorization happens in load_data.py, in the precompute function (https://github.com/brandonrobertz/sentence-autosegmentation/blob/master/load_data.py#L8).
Currently it's a character-based model (which failed). It wouldn't be difficult to change it to a word-embedding-based model, but I just didn't get that far. (But I suspect it would work a lot better.)
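If it helps, here's a rough illustration of what character-window vectorization amounts to. This is not a copy of precompute; the window size, punctuation set, and character-id scheme are assumptions made for the sake of the example.

```python
import numpy as np

WINDOW = 20        # characters per window (assumed, not the repo's value)
END_MARKS = ".?!"  # end-of-sentence marks to detect (assumed)

def vectorize_chars(text, window=WINDOW):
    """Turn raw text into (X, y): X holds integer-coded character windows,
    y marks whether each window contains an end-of-sentence mark."""
    X, y = [], []
    for i in range(len(text) - window + 1):
        chunk = text[i:i + window]
        X.append([ord(c) % 128 for c in chunk])          # crude char -> id map
        y.append(int(any(m in chunk for m in END_MARKS)))
    return np.array(X), np.array(y)

X, y = vectorize_chars("last month we went on vacation. the trip was very hard.")
print(X.shape, y.shape)  # (n_windows, 20) and (n_windows,)
```

A word-embedding variant would tokenize into words instead of characters and look each token up in an embedding matrix before feeding the window to the models above.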
Does that answer it?