The underlying model seems wrong to me #28

Closed
adrienball opened this issue Jan 27, 2016 · 1 comment
Comments

@adrienball

Hi, could you explain this function a little?

function getDistance(trigrams, model) {
    var distance = 0;
    var index = trigrams.length;
    var trigram;
    var difference;

    while (index--) {
        trigram = trigrams[index];

        if (trigram[0] in model) {
            difference = trigram[1] - model[trigram[0]];

            if (difference < 0) {
                difference = -difference;
            }
        } else {
            difference = MAX_DIFFERENCE;
        }

        distance += difference;
    }

    return distance;
}

In particular, I don't understand why you compute difference = trigram[1] - model[trigram[0]];
As far as I can tell, this compares the number of occurrences of a specific trigram in the input string, trigram[1], with that trigram's weight in a specific language model, model[trigram[0]]. Those are two different kinds of quantity, so subtracting one from the other doesn't make much sense to me. Am I getting something wrong here?

For instance, I tested it with the simple input "de ", which contains the two trigrams "de " and " de". Based on the language models defined in data.json, I expected the output to be "spa", since those two trigrams rank 1st and 3rd there. However, the result is "por", where the same two trigrams rank only 2nd and 3rd.
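
To make the concern concrete, here is a minimal sketch that runs the same logic as the function quoted above on made-up data. It assumes, hypothetically, that each model maps a trigram to its zero-based rank. The names modelA and modelB, their values, and the value of MAX_DIFFERENCE are illustrative assumptions and are not taken from data.json or from the library.

```javascript
// Sketch only: every value below is hypothetical, not taken from data.json.
var MAX_DIFFERENCE = 300; // assumed penalty for a trigram missing from a model

// Same logic as the quoted getDistance, condensed.
function getDistance(trigrams, model) {
    var distance = 0;

    trigrams.forEach(function (trigram) {
        // trigram[0] is the trigram text, trigram[1] its count in the input.
        distance += trigram[0] in model
            ? Math.abs(trigram[1] - model[trigram[0]])
            : MAX_DIFFERENCE;
    });

    return distance;
}

// The input "de ": each trigram occurs once.
var input = [['de ', 1], [' de', 1]];

// Assumed zero-based ranks: modelA places the two trigrams 1st and 3rd,
// modelB places them 2nd and 3rd.
var modelA = {'de ': 0, ' de': 2};
var modelB = {'de ': 1, ' de': 2};

console.log(getDistance(input, modelA)); // 2
console.log(getDistance(input, modelB)); // 1
```

Under these assumptions, modelB gets the smaller distance only because the count 1 happens to equal the rank value 1, even though modelA ranks the trigrams higher. That would match the "por" over "spa" result I am seeing.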

Thanks!

Adrien

@wooorm
Owner

wooorm commented Jan 27, 2016

Thanks, will push a fix in a second

@wooorm wooorm closed this as completed in 0152d0a Jan 27, 2016