-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Font change #262
base: main
Are you sure you want to change the base?
Font change #262
Conversation
Hi @pithysr I will leave some comments in the code. Correctness:All checks have failed, please fix this (maybe fetch upstream is enough) |
|
||
class FontChange(SentenceOperation): | ||
tasks = [TaskType.TEXT_CLASSIFICATION] | ||
languages = ["en"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just English? I believe that this transformation works well for many other languages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can work with other languages in its current form but the dictionary I used (reference in the code and in the README file) does not cover the accented letters or characters outside the English alphabet. If such a resource is added, we can use this for other languages without skipping those characters.
for ttc in tokens_to_change: | ||
while True: | ||
font = random.sample(list(fonts.keys()), 1)[0] | ||
if font != "normal": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just removing "normal" from the dictionary in the JSON file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was by choice. I did not want any discrepancy between my dictionary and the original resource. Of course, it can be removed.
Hello @marco-digio |
from tasks.TaskTypes import TaskType | ||
|
||
nltkdl("stopwords") | ||
nltkdl("punkt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be moved inside the constructor I think..
nltkdl("maxent_ne_chunker") | ||
nltkdl("punkt") | ||
nltkdl("averaged_perceptron_tagger") | ||
nltkdl("stopwords") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even this could go inside the constructor..
|
||
def __init__(self, seed=666, max_outputs=1): | ||
super().__init__(seed, max_outputs=max_outputs) | ||
self.nlp = spacy.load("en_core_web_sm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of loading new spacy, please use global spacy like below:
self.nlp = spacy_nlp if spacy_nlp else spacy.load("en_core_web_sm")
return perturbed_texts | ||
|
||
|
||
class HashtagifyTransformation(SentenceOperation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring for all the classes and functions with a proper description of params and return type.
return perturbed_texts | ||
|
||
|
||
class FontChange(SentenceOperation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, add docstrings for classes and functions.
TaskType.TEXT_CLASSIFICATION, | ||
TaskType.TEXT_TAGGING, | ||
] | ||
languages = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the list is very long, better to create a separate file for this and import it here.
The conflicts are with my other submission that is already approved. Should I open a new PR? |
Hi @pithysr: You can continue fixing the conflict in this same PR. Thanks! |
## Target Tasks | ||
|
||
This transformation can be used for data augmentation in text classification tasks. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a Data And Code Provenance section to point out the correct source of all the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaustubhdhole Hi, I had this information in the README file but I added a separate section. However, now I cannot commit due to an error (No module named 'torchtext'
). I did not have this error previously when submitting the code. Can I submit with git commit --no-verify
?
Inspired by social media posts, this transformation add noise to an input sentence by randomly changing the font of words
in a sentence.