This repository has been archived by the owner on Sep 18, 2021. It is now read-only.
I'm writing a new twitter-text library and it already passes all tests in 'twitter-text-conformance', but I'm not really confident it's correct. I say that because I feel a lot of unit tests are missing.
Take the username validation in validate.yml, for example. It has a test that says "valid username: a-z < 20 characters" and another one saying "All numeric username are allowed". How about mix-and-match? Is 20 the overall username size limit? How about Unicode? Is "@" a valid username? Should the Unicode "@" also be considered a valid username marker?
I had a lot of questions like that while I coded, and I still do. I'd love to volunteer and write all those tests, but I'm not the authority here, so I can't decide what's valid and what's not off the top of my head. Nor am I willing to try posts on my own Twitter account just for testing purposes (my followers would go crazy :) ).
tl;dr - Is there an implementation I can use as "correct"? This way I can use it as authoritative and see whether the new tests pass or fail.
Thanks!
They're being updated frequently so we cannot say the current implementations define the "correct" behaviors, but they are the ones currently used in the productions.
Can you please clarify what you mean by "they are the ones currently used in the productions"? If they are used in production on Twitter itself, then isn't that the de facto "correct" behavior?
I ask because I've found a scenario where the twitter-text-java implementation behaves differently than the text box on Twitter.com. If you enter " http://google.com ", where the spaces before and after the URL are UTF-8 non-breaking-space characters (\u00A0 in Java), then the text box on Twitter.com will find the link, count it as 20 characters, and display "Link will appear shortened." But when you pass that same string through the twitter-text-java library, it won't find the URL when you call Extractor.extractURLs().
Yes they are used in production on Twitter itself.
And thank you for reporting a bug. Ideally they should have the same behavior (and twitter-text-conformance is there to help verify their consistency), but as you pointed out there are still some inconsistencies. I'll fix the bug in twitter-text-java.
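For anyone trying to reproduce the non-breaking-space report, here is a minimal Java sketch of one way such a mismatch can arise. It does not use twitter-text-java at all: the class `NbspWhitespaceDemo`, the method `firstUrl`, and the simplified URL pattern are hypothetical and written only for illustration, and the thread does not say that this is the actual cause of the bug. What it relies on is standard `java.util.regex` behavior: by default `\s` matches only ASCII whitespace, so `\u00A0` is not treated as a separator unless `Pattern.UNICODE_CHARACTER_CLASS` is set.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NbspWhitespaceDemo {
    // Hypothetical, heavily simplified pattern: a URL at the start of the text
    // or preceded by whitespace. This is NOT the pattern used by twitter-text-java.
    private static final String URL_REGEX = "(?:^|\\s)(https?://\\S+)";

    // Default flags: \s matches only [ \t\n\x0B\f\r].
    private static final Pattern ASCII_WS = Pattern.compile(URL_REGEX);

    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property,
    // which includes U+00A0 (no-break space).
    private static final Pattern UNICODE_WS =
            Pattern.compile(URL_REGEX, Pattern.UNICODE_CHARACTER_CLASS);

    // Returns the first URL found, or null when nothing matches.
    static String firstUrl(String text, boolean unicodeWhitespace) {
        Matcher m = (unicodeWhitespace ? UNICODE_WS : ASCII_WS).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String plain = " http://google.com ";
        String nbsp = "\u00A0http://google.com\u00A0";

        System.out.println(firstUrl(plain, false)); // ordinary spaces, default \s
        System.out.println(firstUrl(nbsp, false));  // no-break spaces, default \s
        System.out.println(firstUrl(nbsp, true));   // no-break spaces, Unicode-aware \s
    }
}
```

If the goal is a conformance test, the same input string (a URL wrapped in `\u00A0` on both sides) could be proposed as a new extraction case, with the expected result left for the maintainers to decide.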