Merge branch 'feature/add-more-languages'
This merge adds support for more languages: in total, franc now
supports 168 languages out of the box (almost every language with
1,000,000 or more speakers is now supported).

It is now trivial to add support for every language, or languages
with 100,000 or more speakers.

This change does, however, remove support for Klingon, Latin, Welsh,
Basque, Hawaiian, Southern Ndebele, and Venda.

Additionally, this removes the unique treatment of Brazilian and
European Portuguese. Currently, the generic `por` is returned.
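
For callers that stored codes from 0.1.x, the change in return values can be sketched as a lookup table. This is only an illustration: `LEGACY_TO_ISO_639_3` and `toIso6393` are hypothetical names, not part of franc, and the table covers only the codes that appear in this commit's Readme.md diff. Returning `null` for anything else (including languages this merge drops, such as Klingon) is an assumption of the sketch.

```js
// Hypothetical migration table: old franc 0.1.x codes mapped to the
// ISO 639-3 codes shown in this commit's Readme.md diff. Not part of franc.
var LEGACY_TO_ISO_639_3 = {
    'af': 'afr',
    'bn': 'ben',
    'no': 'nob',
    'ca': 'cat',
    'pt': 'por',
    'pt-BR': 'por', // Brazilian and European Portuguese are no longer
    'pt-PT': 'por', // distinguished; both collapse to the generic `por`.
    'und': 'und'
};

/**
 * Translate a legacy code into the code returned after this merge.
 * Codes missing from the table yield `null`, so the caller decides
 * how to handle them (an assumption of this sketch).
 */
function toIso6393(legacyCode) {
    var has = Object.prototype.hasOwnProperty;

    if (has.call(LEGACY_TO_ISO_639_3, legacyCode)) {
        return LEGACY_TO_ISO_639_3[legacyCode];
    }

    return null;
}

console.log(toIso6393('pt-BR')); // "por"
console.log(toIso6393('tlh')); // null
```

The own-property check keeps inherited keys such as `toString` from being mistaken for language codes.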
wooorm committed Oct 2, 2014
2 parents fb21764 + c6d7322 commit a26b4c7
Showing 110 changed files with 2,784 additions and 1,090 deletions.
1 change: 0 additions & 1 deletion .jscs.json
@@ -126,7 +126,6 @@
"maximumLineLength": 78,
"requireCapitalizedConstructors": true,
"safeContextKeyword": "self",
"requireDotNotation": true,
"disallowYodaConditions": true,
"validateJSDoc": {
"checkParamNames": true,
28 changes: 28 additions & 0 deletions History.md
@@ -1,4 +1,32 @@

0.2.0-rc.2 / 2014-09-30
==================

* Update Readme.md and benchmark for recent changes
* Update spec for b8bb1a6
* Refactor API to return ISO-639-3 codes instead of 639-2
* Add generation of ISO-639-3 codes, remove 639-2 support
* Add ISO-639-3 names file, remove ISO-639-2 names file

0.2.0-rc.1 / 2014-09-30
==================

* Update Supported-Languages.md for recent changes
* Update spec for recent API changes
* Remove support for three languages
* Update bower.json, component.json for recent directory refactor
* Update fixtures
* Update data to new and current trigrams
* Move ISO codes to data directory
* Add Portuguese to ISO codes
* Update npm scripts to lint new files, handle renames
* Update fixture generation script
* Add trigram generation script
* Add mapping between ISO and UDHR keys
* Add trigrams and udhr as dev-dependencies
* Move scripts and API files
* Remove fixtures

0.1.1 / 2014-09-19
==================

43 changes: 17 additions & 26 deletions Readme.md
@@ -24,40 +24,31 @@ $ bower install franc
```js
var franc = require('franc');

franc('Alle menslike wesens word vry'); // "af"
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট'); // "bn"
franc('Alle mennesker er født frie og'); // "no"
franc('Alle menslike wesens word vry'); // "afr"
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট'); // "ben"
franc('Alle mennesker er født frie og'); // "nob"
franc(''); // "und"

franc.all('O Brasil caiu 26 posições em');
/*
* [
* [ 'pt-BR', 4342 ],
* [ 'pt-PT', 6393 ],
* [ 'pt', 5281 ],
* [ 'ca', 6091 ],
* [ 'cs', 6137 ]
* [ 'por', 5507 ],
* [ 'lat', 6384 ],
* [ 'lav', 6391 ],
* [ 'cat', 6432 ],
* [ 'spa', 6481 ]
* ...
* ]
*/

franc.all('Heghlu\'meH QaQ jajvam').slice(0, 3);
/*
* [
* [ 'tlh', 4253 ], // 'eH, tlhIngan, 'e' H*'t*gh QaQ!
* [ 'haw', 5472 ],
* [ 'az', 5537 ]
* ]
*/

franc.all(''); // [ [ 'und', 1 ] ]
```

> Note!: **franc** returns the `"und"` language code for an undetermined language. This happens when the input value is to short to give a significant answer.
> Note!: **franc** returns the `"und"` language code for an undetermined language. This happens when the input value is too short to give a significant answer.
## Supported languages

**franc** supports 86 languages. For a complete list, check out [Supported-Languages.md](Supported-Languages.md).
**franc** supports 82 languages. For a complete list, check out [Supported-Languages.md](Supported-Languages.md).

## Other Language detection libraries

@@ -74,16 +65,16 @@ $ npm run install-benchmark # Just once of course.
$ npm run benchmark
```

On a MacBook Air, it runs 86 tests, 11 times per second (total: 946 op/s).
On a MacBook Air, it runs 169 tests, 2 times per second (total: 338 op/s).

```
benchmarks * 86 paragraphs in different languages
11 op/s » franc -- this module
7 op/s » guesslanguage
6 op/s » languagedetect
6 op/s » vac
benchmarks * 169 paragraphs in different languages
2 op/s » franc -- this module
2 op/s » guesslanguage
2 op/s » languagedetect
2 op/s » vac
```

## License

LGPL
LGPL © Titus Wormer
263 changes: 172 additions & 91 deletions Supported-Languages.md

Large diffs are not rendered by default.

45 changes: 25 additions & 20 deletions benchmark/index.js
@@ -47,31 +47,36 @@ function forEveryLanguage(callback) {
}
}

suite('benchmarks * 86 paragraphs in different languages', function () {
set('iterations', 30);
set('type', 'static');
suite(
'benchmarks * ' +
Object.keys(fixtures).length +
' paragraphs in different languages',
function () {
set('iterations', 10);
set('type', 'static');

bench('franc -- this module', function () {
forEveryLanguage(function (language, fixture) {
franc(fixture);
bench('franc -- this module', function () {
forEveryLanguage(function (language, fixture) {
franc(fixture);
});
});
});

bench('guesslanguage', function () {
forEveryLanguage(function (language, fixture) {
guessLanguage(fixture);
bench('guesslanguage', function () {
forEveryLanguage(function (language, fixture) {
guessLanguage(fixture);
});
});
});

bench('languagedetect', function () {
forEveryLanguage(function (language, fixture) {
languageDetect(fixture);
bench('languagedetect', function () {
forEveryLanguage(function (language, fixture) {
languageDetect(fixture);
});
});
});

bench('vac', function () {
forEveryLanguage(function (language, fixture) {
vac(fixture);
bench('vac', function () {
forEveryLanguage(function (language, fixture) {
vac(fixture);
});
});
});
});
}
);
6 changes: 5 additions & 1 deletion bower.json
@@ -26,11 +26,15 @@
".*",
"*.log",
"*.md",
"*.yml",
"component.json",
"package.json",
"benchmark",
"components",
"coverage",
"data",
"node_modules",
"benchmark",
"script",
"spec"
]
}
30 changes: 0 additions & 30 deletions build-fixtures-file.js

This file was deleted.

35 changes: 0 additions & 35 deletions build-supported-languages.js

This file was deleted.

9 changes: 6 additions & 3 deletions component.json
@@ -1,6 +1,6 @@
{
"name": "franc",
"version": "0.1.1",
"version": "0.2.0-rc.2",
"description": "Detect the language of text",
"license": "LGPL",
"keywords": [
@@ -17,9 +17,12 @@
},
"repository": "wooorm/franc",
"scripts": [
"index.js"
"index.js",
"lib/franc.js",
"lib/expressions.js",
"lib/singletons.js"
],
"json": [
"data.json"
"lib/data.json"
]
}