Problem:
The current code for detecting quotes is pretty unsophisticated. It just sequentially pairs every token that `token.is_quote` deems a quotation mark and assumes their indexes are the quote boundaries. If there is an odd number of quotation marks, it throws an error.
Solution:
I've been doing quote detection on some unreliably formatted text lately, which has things like "»" used as bullet points and lots of unpredictable stray characters, so I came up with a workaround. I updated the quote detection functionality to only return quotes whose starting and ending code points match one of a set of pre-determined pairs.
For example: Bill told me I "shouldn‘t wear those pants" but I will.
In the current version, running quote detection here would raise an error because there are three quotation mark-like tokens in the sentence. Even if it didn't, it would return "shouldn" as a quote because textacy assumes sequential quotation marks are quote boundaries.
My version takes the first quotation mark (`q`) and iterates through all the later quotation marks until it finds one (`q_`) where `(ord(q.text), ord(q_.text))` is in the list of acceptable pairs.
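Here is a minimal sketch of that matching idea. It is not the actual patch: it works on a plain string rather than spaCy tokens so it can run on its own, and `QUOTE_PAIRS` and `find_quotes` are hypothetical names with an illustrative pair table.

```python
# Hypothetical table of acceptable (opening, closing) code point pairs.
# Note: pairing ' with ' can misfire on apostrophes; it is here only for illustration.
QUOTE_PAIRS = {
    (34, 34),      # " ... "
    (39, 39),      # ' ... '
    (8220, 8221),  # left/right double curly quotes
    (8216, 8217),  # left/right single curly quotes
    (171, 187),    # « ... »
}
# Every character that appears in some pair counts as "quote-like" here
# (a stand-in for what token.is_quote would flag).
QUOTE_CHARS = {chr(cp) for pair in QUOTE_PAIRS for cp in pair}


def find_quotes(text):
    """Return (start, end) index pairs of quotation marks that form an acceptable pair."""
    marks = [i for i, ch in enumerate(text) if ch in QUOTE_CHARS]
    spans = []
    pos = 0
    while pos < len(marks):
        start = marks[pos]
        # Scan the later quote-like characters for an acceptable closing mark.
        for later in range(pos + 1, len(marks)):
            end = marks[later]
            if (ord(text[start]), ord(text[end])) in QUOTE_PAIRS:
                spans.append((start, end))
                pos = later  # continue the search after the closing mark
                break
        # If no partner was found, the mark is skipped instead of raising an error.
        pos += 1
    return spans


text = 'Bill told me I "shouldn\u2018t wear those pants" but I will.'
quotes = [text[start + 1:end] for start, end in find_quotes(text)]
```

On the example sentence, the stray ‘ is skipped because `(34, 8216)` is not an acceptable pair, so the opening `"` is matched with the closing `"` instead. A lone "»" used as a bullet point simply produces no quote.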