Keep infixes of punctuation + hyphens as one token (see #801)
ines committed Feb 2, 2017
1 parent 1219a5f commit 012f482
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion spacy/language_data/punctuation.py
@@ -106,7 +106,7 @@
r'(?<=[0-9])[+\-\*^](?=[0-9-])',
r'(?<=[{al}])\.(?=[{au}])'.format(al=ALPHA_LOWER, au=ALPHA_UPPER),
r'(?<=[{a}]),(?=[{a}])'.format(a=ALPHA),
- r'(?<=[{a}])(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS),
+ r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS),
r'(?<=[{a}"])[:<>=](?=[{a}])'.format(a=ALPHA)
]
)
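
The effect of the changed pattern can be checked in isolation with a minimal sketch like the one below. It compares only the old and new hyphen infix rules, not the full tokenizer. `ALPHA` and `HYPHENS` here are simplified stand-ins (ASCII letters and a handful of dash variants) assumed for illustration; the real constants in `punctuation.py` cover far more characters. The helper `infix_matches` is hypothetical and not part of spaCy.

```python
import re

# Simplified stand-ins (assumptions): the real ALPHA and HYPHENS in
# spacy/language_data/punctuation.py are much broader.
ALPHA = "a-zA-Z"
HYPHENS = "-|\u2013|\u2014|--|---"

# Old rule: a hyphen counts as an infix only when directly between letters.
OLD = re.compile(r'(?<=[{a}])(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS))

# New rule: optional punctuation immediately before the hyphen is
# absorbed into the same infix match.
NEW = re.compile(
    r'(?<=[{a}])[?";:=,.]*(?:{h})(?=[{a}])'.format(a=ALPHA, h=HYPHENS)
)


def infix_matches(pattern, text):
    """Return every substring of `text` that `pattern` treats as an infix."""
    return [m.group(0) for m in pattern.finditer(text)]


for text in ("well-known", "yes,-no", "what?--really"):
    print(text, infix_matches(OLD, text), infix_matches(NEW, text))
```

With these stand-ins, both rules behave the same on a plain hyphenated word such as `well-known`. On `yes,-no` the old rule finds no match, because the character before the hyphen is not a letter, while the new rule matches `,-` as a single infix.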
