Build the language model based on n-grams frequencies #31

iosadchiy · 2018-05-27T09:14:17Z

This PR is not intended to be merged yet; I opened it to discuss whether this feature would be useful.

I was experimenting with n-gram frequencies from Ruscorpora. The idea is to load the frequency files directly into the model:

jamspell load_ngrams alphabet_ru.txt 1grams.csv 2grams.csv 3grams.csv ru.bin

You can see some short samples of the .csv files here
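
As a rough illustration of the idea (not the code in this PR), the sketch below reads an n-gram frequency file into a count table. The CSV layout is an assumption: each row holds the n words followed by an integer count. The `load_ngram_counts` helper is hypothetical and not part of JamSpell, and the sample data is made up.

```python
import csv
import io
from collections import Counter


def load_ngram_counts(lines, order):
    """Read rows of `order` words followed by a count (assumed layout)."""
    counts = Counter()
    for row in csv.reader(lines):
        if len(row) != order + 1:
            continue  # skip empty or malformed rows
        *words, freq = row
        try:
            counts[tuple(words)] += int(freq)
        except ValueError:
            continue  # skip header rows or non-numeric counts
    return counts


# Tiny in-memory sample standing in for 2grams.csv (made-up data).
sample = io.StringIO("в,доме,12\nв,лесу,7\nв,доме,3\n")
bigrams = load_ngram_counts(sample, order=2)
print(bigrams[("в", "доме")])
```

Repeated n-grams are summed rather than overwritten, so the same function could in principle be fed several files of the same order.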

Let me know if this feature can be useful.

bakwc · 2018-05-27T09:53:36Z

Thanks for the PR, good feature! Let me know when you finish; I'll be glad to merge it.
BTW, could you please upload your model somewhere? I'd like to compare it to a model trained on wikipedia+news.

bakwc · 2018-05-27T09:59:30Z

convert_corpora.rb

@@ -0,0 +1,19 @@
+# encoding: UTF-8


It would be better to rewrite this in Python and put it in the evaluate folder; all useful scripts are stored there for now.
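
A Python version of such a conversion script might look roughly like the sketch below. The input layout is an assumption (tab-separated lines with the count first, then the words), and the `convert_line` and `convert` names are hypothetical; the actual Ruscorpora files and the script in this PR may differ.

```python
import csv
import io


def convert_line(line):
    """Turn 'count<TAB>w1<TAB>w2...' into [w1, w2, ..., count], or None."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 2 or not parts[0].strip().isdigit():
        return None  # not a data line under the assumed layout
    count = parts[0].strip()
    words = [w.strip().lower() for w in parts[1:]]
    return words + [count]


def convert(src, dst):
    """Write one CSV row per valid input line; return the number written."""
    writer = csv.writer(dst, lineterminator="\n")
    written = 0
    for line in src:
        row = convert_line(line)
        if row is not None:
            writer.writerow(row)
            written += 1
    return written


# Made-up input standing in for a raw n-gram file.
src = io.StringIO("12\tВ\tдоме\nbad line\n7\tв\tлесу\n")
dst = io.StringIO()
rows_written = convert(src, dst)
print(dst.getvalue())
```

Lines that do not match the assumed layout are skipped silently here; a real script would probably want to report them.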

iosadchiy · 2018-05-27T10:22:03Z

Yep, sure, the model is here

rprilepskiy · 2020-01-27T13:51:45Z

@iosadchiy, did you have time to fix the failing PR checks?

iosadchiy added 3 commits May 27, 2018 11:56

WIP Add 'load_ngrams' command

70d1d2e

Convert ruscorpora's ngrams to csv

cf49cd0

Load ngrams as lang model

0b48589

bakwc reviewed May 27, 2018


iosadchiy mentioned this pull request Jun 12, 2018

WIP: Update model at runtime #21

Closed

iosadchiy added 2 commits July 5, 2018 17:15

Train model both from corpora ngrams and text

3b72124

Rewrite convert script to python

a3e0663
