v12 Release Notes
compromise is a modest library that does natural-language processing in javascript.
it was built to make searching and transforming human-text easy and playful.
v12 is the biggest (and most-proud) release in its 8-year history. It involves 500 commits over 11 months of work.
You can read about some of the design-decisions for this update here.
Although the release is a near-complete rewrite, most compromise v11 scripts will continue to work in v12.
There are many subtle changes, and this document is intended as an upgrade guide.
- v12 is considerably faster. In most cases it is 50% faster than v11.
- v12 is considerably smaller. It is 170kb, instead of 235kb. (~28% smaller)
- pass-by-reference issues are gone. Most github issues will close.
- the new plugin scheme makes customization cleaner and simpler
- `.export()`, `.import()` to serialize and compress a document object
- new `@termMethod` match-syntax feature - to query non-tag info
- `.json()` and `.text()` outputs are configurable now
- paragraph support!
- better unicode support
- moved all documentation to observablehq
- cleaned-up internal handling of whitespace/punctuation
These methods have been removed:
- `.getPunctuation()` - use `.pre()` or `.post()`
- `.setPunctuation()` - use `.pre(str)` or `.post(str)`
- `.whitespace()` - use `.json({terms:{whitespace:true}})`
- `.flatten()` - use `.join()`
- `.lump()` - this was an anti-pattern.
- `.insertAt()` - using term indexes is not fun!
- `.reduce()` - not sure if this ever even worked?
- `.normal()` - use `.text('normal')`
- `nouns().articles()` - use `nouns().json()`
- `nlp.clone()` - removed, now that `nlp.extend()` is more-direct.
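
For call sites that still rely on the removed v11 methods, the replacements listed above could be wrapped in small helpers. This is only a sketch: the helper names are made up for illustration, and `doc` is assumed to be a v12 document returned by `nlp(text)`. Only the `.text('normal')`, `.json({terms:{whitespace:true}})`, `.pre()`, `.post()` and `.join()` calls come from the list above.

```javascript
// Hypothetical helpers (names are illustrative, not part of compromise).
// `doc` is assumed to be a compromise v12 document, e.g. nlp('Hello, world!')

// v11: doc.normal()
const normalText = (doc) => doc.text('normal')

// v11: doc.whitespace()
const whitespaceJson = (doc) => doc.json({ terms: { whitespace: true } })

// v11: doc.getPunctuation()
const punctuation = (doc) => ({ pre: doc.pre(), post: doc.post() })

// v11: doc.flatten()
const flattened = (doc) => doc.join()
```

Keeping the shims in one place makes it easier to delete them once the old call sites have been updated.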
the v12 feature - `@termMethod` allows you to query things that are not in the term's tags. This allows us to clear-up the following tags:
- `#Comma` - use `@hasComma`
- `#Quotation` - `.quotations()` has been improved
- `#ClauseEnd` - `.clauses()` has been heavily-improved
- `#NumberRange` - the `compromise-numbers` plugin cleans these things up.
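
As a rough illustration of moving off the removed tags, the sketch below swaps tag-based matches for the replacements named above. The helper names are hypothetical, and `doc` is assumed to be a v12 document returned by `nlp(text)`.

```javascript
// Hypothetical helpers; `doc` is assumed to be a compromise v12 document,
// e.g. nlp('she said "hello", then left')

// instead of matching on the removed #Comma tag
const termsWithComma = (doc) => doc.match('@hasComma')

// instead of matching on the removed #Quotation tag
const quotedParts = (doc) => doc.quotations()

// instead of splitting on the removed #ClauseEnd tag
const clauseList = (doc) => doc.clauses()
```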
our new plugin scheme allows us to easily add all sorts of behaviour to compromise classes. This has allowed us to separate some functionality into plugins. These are very easy to include (promise!):
- `.values()` - number parsing has been moved to `compromise-numbers`
- `.ngrams()` - ngram functionality has been moved to `compromise-ngrams`
- `.dates()` - date parsing has been moved to `compromise-dates`
- `.adjectives()` - adjective conjugation has been moved to `compromise-adjectives`
- `.contractions()` - now returns only contractions, and not possible-contractions. `.contract()` is now a stand-alone method.
- `.out('html')` - html output has been moved to `compromise-output`
These plugins can just be applied like this:
```javascript
const nlp = require('compromise')
nlp.extend(require('compromise-plugin-foo'))
```
Once the plugin is applied, things should work just as normal.
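
To give a feel for what such a plugin might look like on the inside, here is a minimal sketch. It assumes that `nlp.extend()` calls the plugin function with the `Doc` class as its first argument and a `world` object as its second, which these notes do not spell out. The `shoutPlugin` name and the `.shout()` method are made up for illustration.

```javascript
// A hypothetical plugin (not a real package): adds a .shout() method to documents.
// Assumption: nlp.extend() calls this function with the Doc class and a `world` object.
const shoutPlugin = function (Doc, world) {
  Doc.prototype.shout = function () {
    // .text() returns the document as a plain string
    return this.text().toUpperCase() + '!'
  }
}

// usage, with compromise v12 installed:
//   const nlp = require('compromise')
//   nlp.extend(shoutPlugin)
//   nlp('hello there').shout()
```

Because the plugin only touches `Doc.prototype`, it can be tried out against any class that has a `.text()` method before wiring it into compromise.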
Changed behaviour:
- `.map()`, `.forEach()`, `.filter()`, `.some()` all return full Doc objects of length 1 (instead of an undocumented internal object)
- results of `.canBe()` are more like `.match()`
- `.normalize()` doesn't transform numbers anymore - use `compromise-numbers`
- more consistent behaviour for `.replace('foo [bar]', 'baz')`
- `.numbers()` results no longer include Units, by default. Get them with `.numbers().units()`
- `.verbs()` results no longer include leading/trailing Adverbs. Get them with `.verbs().adverbs()`
- the internal compromise api has changed considerably. If you were 'reaching in' to the internal objects in v11, you'll see many changes.
- removed no-longer-needed `prefix_` and `_suffix` operators from match syntax
- `.toCamelCase()` no longer capitalizes char[0]. Run `.toCamelCase().toTitleCase()` for this.
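
Two of the changes above are likely to need small adjustments in existing scripts. The sketch below shows one way to recover the v11 results. The helper names are hypothetical, and `doc` is assumed to be a v12 document returned by `nlp(text)`.

```javascript
// Hypothetical helpers; `doc` is assumed to be a compromise v12 document,
// e.g. nlp('she quickly ran home')

// v11 .verbs() included neighbouring adverbs; v12 returns them separately
const verbsWithAdverbs = (doc) => {
  const verbs = doc.verbs()
  return { verbs: verbs.text(), adverbs: verbs.adverbs().text() }
}

// v11 .toCamelCase() also capitalized the first character; v12 needs a second step
const upperCamelCase = (doc) => doc.toCamelCase().toTitleCase().text()
```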
New in v12:
- `.reverse()`
- `.unique()` - remove duplicates using 'root'
- `.cache()` - speed-up matches and lookups
- `.uncache()` - manually disable the cache
- `.join()` - search between sentences, for example
- `.lookAhead()` - match through the terms after your current match
- `.lookBehind()` - match through the terms before your current match
- `.lists()` - find all comma-separated natural-language lists
- `.matchOne()` - return the first `.match()`
- `.segment()` - split a document according to a given label
- `.export()` - serialize and compress the document for saving/moving
- `.extend()` - change any internal compromise data
- `.load()` - create a new document from `.export()` results
- limited AND support in match syntax: `.match('(foo && bar)')`
- `.hash()` via `compromise-output`
- `.syllables()` via `compromise-syllables`
- `.paragraphs()` via `compromise-paragraphs`
- improved handling of slashed terms - like `he is/was fun.`
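
The `.export()` and `.load()` pair can be sketched as a simple round trip. This assumes that `.load()` is called on the top-level `nlp` object (it is listed next to `.extend()` above, but the notes do not say so explicitly). The `roundTrip` helper is hypothetical.

```javascript
// Hypothetical helper; `nlp` is assumed to be the compromise v12 function,
// e.g. const nlp = require('compromise')
// Assumption: .load() lives on the top-level `nlp` object, next to .extend()
const roundTrip = (nlp, text) => {
  const saved = nlp(text).export() // serialized, compressed form of the document
  return nlp.load(saved) // a new document built from the export
}
```

In practice the exported data would be stored or sent elsewhere between the two calls, rather than loaded straight back in.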