
Using the trained model to get sentences split #1

Open

genyunus opened this issue Aug 17, 2018 · 1 comment

@genyunus

Hi,

Thanks for the great work! My question is: how do I use the trained model to split a sentence? For example:

sentence: last month we went on vocation the trip was very hard but it was worth doing because we had a lot of fun however on the way I lost my favorite shoes

output: (the split sentences produced by the trained model)

Also, I couldn't find the part where the sentences are vectorized for training; I only see where you split them and save them as dataset.sentences.

Thanks a lot

@brandonrobertz
Owner

Hey there,

Sorry for the confusion here. This project was kind of a disorganized tool I was using for a research paper, so it didn't really get the love it needed. I ended up abandoning this idea, though, and went a different path.

The overall strategy for turning character streams into sentences was this:

- Model 1 is a binary classifier that takes a window of characters/words and decides whether a punctuation mark is needed anywhere in it. If model 1 says "yes", we pass the data to model 2.
- Model 2 is a multiclass classifier that decides where in the window the punctuation mark goes.

These two models were intended to be trained separately; a rough sketch of the inference loop follows below.
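In rough pseudocode, the two-stage inference loop would look something like this. This is a minimal sketch, not code from the repo: `binary_model`, `position_model`, and `vectorize` are stand-ins for the two trained Keras-style classifiers and the featurizer, and padding of the final short window is elided.

```python
import numpy as np

WINDOW = 20  # assumed sliding-window width, in characters

def segment(text, binary_model, position_model, vectorize):
    """Two-stage segmentation: model 1 flags windows that need a
    punctuation mark, model 2 picks the position within the window."""
    out = []
    i = 0
    while i < len(text):
        window = text[i:i + WINDOW]
        x = vectorize(window)[np.newaxis, ...]  # batch of one window
        # Stage 1: binary decision -- does this window need a mark at all?
        if binary_model.predict(x)[0, 0] > 0.5:
            # Stage 2: multiclass decision -- which position gets the mark?
            pos = int(np.argmax(position_model.predict(x)[0]))
            # Insert a period (the only mark handled in this sketch).
            out.append(window[:pos + 1] + ". ")
            i += pos + 1
        else:
            out.append(window)
            i += WINDOW
    return "".join(out)
```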

I know some others who worked on this (https://github.com/jaggzh/nn-punk), and they tried the straight single-model approach with a char-CNN, but I found the signal too sparse with that higher-dimensional output, since most of the time a window doesn't need any punctuation at all.
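For reference, the single-model version would look something like this. Layer sizes and constants here are hypothetical, not taken from either repo; the point is that the `N_CLASSES`-way softmax is dominated by the "no punctuation" class.

```python
from tensorflow.keras import layers, models

N_CHARS = 64            # assumed alphabet size
WINDOW = 20             # assumed window width
N_CLASSES = WINDOW + 1  # one class per position, plus "no punctuation"

model = models.Sequential([
    # One-hot character windows in, conv features out.
    layers.Conv1D(64, 3, activation="relu",
                  input_shape=(WINDOW, N_CHARS)),
    layers.GlobalMaxPooling1D(),
    # Sparse target space: most training examples hit the "none" class.
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```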

The dual-model approach was derived from some previous work, as described in the comment here: https://github.com/brandonrobertz/sentence-autosegmentation/blob/master/classifier.py#L17

RE: vectorization: it happens in the precompute function in load_data.py (https://github.com/brandonrobertz/sentence-autosegmentation/blob/master/load_data.py#L8).
Currently it's a character-based model (which failed). It wouldn't be difficult to change it to a word-embedding-based model, but I just didn't get that far. (I suspect it would work a lot better.)
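If it helps, character-level one-hot encoding in the spirit of precompute would look roughly like this. The alphabet, window size, and function name are assumptions for illustration, not the repo's actual code.

```python
import numpy as np

# Assumed alphabet; the real precompute() may use a different one.
ALPHABET = "abcdefghijklmnopqrstuvwxyz .,;:!?'"
CHAR_IDX = {c: i for i, c in enumerate(ALPHABET)}
WINDOW = 20

def vectorize(window):
    """One-hot encode a character window into (WINDOW, len(ALPHABET)),
    zero-padding short windows and skipping unknown characters."""
    x = np.zeros((WINDOW, len(ALPHABET)), dtype=np.float32)
    for i, c in enumerate(window.lower()[:WINDOW]):
        if c in CHAR_IDX:
            x[i, CHAR_IDX[c]] = 1.0
    return x
```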

Does that answer it?
