Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify project's emoji keyword functionality #359

Closed
2 tasks done
andrewtavis opened this issue Oct 15, 2024 · 30 comments
Closed
2 tasks done

Simplify project's emoji keyword functionality #359

andrewtavis opened this issue Oct 15, 2024 · 30 comments
Assignees
Labels
feature New feature or request hacktoberfest Included as a part of Hacktoberfest help wanted Extra attention is needed

Comments

@andrewtavis
Copy link
Member

Terms

Description

Something that should be changed about the project is the way that the emoji keyword functionality works. Basically all of the files in question are the same except for a few variables, and there are already CLI arguments being passed to these files. We do want the structure of the package to determine the functionality of the project, but then this is a case where there's really no benefit of repetition as there is for the queries where they serve has a record for how to get data from Wikidata via SPARQL.

Some ideas:

  • We could leave the __init__.py files in the emoji keyword directories that would serve to still include the this functionality for those languages that it works for
  • We could then move all of the functionality to the src/scribe_data/unicode directory, and this one file would be called from the CLI

Contribution

Thoughts on this would be very appreciated! Happy tor review and work with people on this 😊

@andrewtavis andrewtavis added feature New feature or request help wanted Extra attention is needed hacktoberfest Included as a part of Hacktoberfest labels Oct 15, 2024
@andrewtavis
Copy link
Member Author

CC @DeleMike, @catreedle, @KesharwaniArpita and @VNW22 for the initial discussion 😊

@KesharwaniArpita
Copy link
Contributor

Sounds good to me> The emoji keyword functionality is essentially the same for all languages, so centralizing it makes sense. @andrewtavis Can I be assigned?

@DeleMike
Copy link
Contributor

This is a great issue! @andrewtavis. I imagine we want to update something related to emojis and we would have to go through all the directories!

As you suggested, having a single point of call for the emojis is important. I went through the files, I believe gen_emoji_lexicon in src/scribe_data/unicode/process_unicode.py will be an important function.

This is just a shallow thought for now. I would love to contribute in any way to resolve this issue.

@andrewtavis
Copy link
Member Author

Let's leave this issue for a bit and continue the conversation on it. We all have a lot being worked on right now, so let's close some current issues and then we can plan the work from there :)

Thanks for your interest in helping!

@catreedle
Copy link
Contributor

I agree with centralizing the functionality, it’ll certainly save a lot of repetitive work!
I'm curious, will the emoji generation remain uniform across languages, or is there a plan to account for different linguistic forms, such as gender variations?

@andrewtavis
Copy link
Member Author

Gendered emojis should be coming out in the current setup as it's Unicode's words that are associated with a given emoji ordered by their usage and then the top X - usually 3 - are selected :) We can keep this in mind for later though 😊

@catreedle
Copy link
Contributor

I see. Then I think there shouldn't be any issue with centralizing it :)

@Ekikereabasi-Nk
Copy link
Contributor

Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team

@catreedle
Copy link
Contributor

Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team

Welcome! I think we're still very much in the discussion phase. Looking forward to collaborating with you! :)

@Ekikereabasi-Nk
Copy link
Contributor

Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team

Welcome! I think we're still very much in the discussion phase. Looking forward to collaborating with you! :)

Thank you so much. Has the discussion started? Is the discussion on the element app?

@andrewtavis
Copy link
Member Author

I was going to suggest this to you, @Ekikereabasi-Nk :) We're discussing it in the issue right now. Can you look at the emoji keyword functionality and make a suggestion on how to centralize this functionality into the src/scribe_data/unicode directory? :)

@Ekikereabasi-Nk
Copy link
Contributor

Ekikereabasi-Nk commented Oct 15, 2024

To achieve a centralize functionality I suggest the steps:

  • move the shared logic generate_emoji_keyword.py to src/scribe_data/unicode use all the code, and remove variable definition LANGUAGE = "English", emojis_per_keyword = 3 and DATA_TYPE = "emoji-keywords. Instead we will have a centralized function will take these language-specific parameters ( LANGUAGE and emojis_per_keyword) as arguments.
  • Next, we will need to modify the emoji language file for each language to import the centralized function from step 1 and create a simplify code
  • We will also need to do some modification in the process_unicode.py

So, how do you all see this suggestion? @andrewtavis

@KesharwaniArpita
Copy link
Contributor

Next, we will need to modify the emoji language file for each language to import the centralized function from step 1 and create a simplify code

Hi @andrewtavis, @DeleMike , @catreedle. @Ekikereabasi-Nk, Do you think we should modify the __init__.py files to import and call centralized function, passing in the appropriate variables as arguments (e.g., language and emoji-specific variables)? It will be able to cater the grouped languages (SA Hindustani and Norweign etc) too and any other specific required customization.

@andrewtavis
Copy link
Member Author

I'm generally thinking that we follow @Ekikereabasi-Nk's suggestions here and maybe keep the empty __init__.py files as a means of keeping the functionality from the project structure, but more the full process is done in the unicode directory. I'm actually not sure what languages Unicode has support for, so maybe that's something that we could explore a bit - i.e. what languages are included in the CLDR dataset. There's no better source of this information, and with this we'd know to just put an __init__.py file in the directories for those languages that we find have emoji support. What's more, another check could be written to find which languages do have support and make sure that each of them and only them have an __init__.py file :)

@andrewtavis
Copy link
Member Author

A basic thing is that the __init__.py files should remain empty as this is Python packaging convention. They should make it easier to load something with a different name or do nothing, as I understand it.

@KesharwaniArpita
Copy link
Contributor

Thanks for the feedback! I agree with the idea of keeping the init.py files as a Python packaging convention, especially to maintain the project's structure and potentially assist with language-specific functionality loading.

Regarding the suggestion of using the CLDR dataset to check which languages have emoji support, that sounds like a great idea. It will ensure we're only including relevant languages in the directories.

@VNW22
Copy link
Contributor

VNW22 commented Oct 15, 2024

heyy, I'm kinda late but i'd like to join in the discussion :)

@andrewtavis
Copy link
Member Author

By all means, @VNW22! Let's try to get to this soon :) @Ekikereabasi-Nk, do you want to open a PR for this and the others can review?

@VNW22
Copy link
Contributor

VNW22 commented Oct 15, 2024

I fully support the plan to centralize the emoji-keyword functionality by moving the shared logic to src/scribe_data/unicode—this will streamline the process and reduce redundancy. It seems like a solid solution has emerged from the discussion so far, but I’d be happy to assist with any part of the refactoring or the exploration of the CLDR dataset to ensure we cover all relevant languages.

@andrewtavis
Copy link
Member Author

Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).

Let us know!

@Ekikereabasi-Nk
Copy link
Contributor

By all means, @VNW22! Let's try to get to this soon :) @Ekikereabasi-Nk, do you want to open a PR for this and the others can review?

Alright @andrewtavis

@KesharwaniArpita
Copy link
Contributor

@Ekikereabasi-Nk and @andrewtavis , I wanted to rewrite the code for the language emoji files. I think we can start collaborating on the code. While @Ekikereabasi-Nk is working on the centralized script, is it alright that I start working on the function call for the languages? We can make the minor changes later too?

@Ekikereabasi-Nk
Copy link
Contributor

@Ekikereabasi-Nk and @andrewtavis , I wanted to rewrite the code for the language emoji files. I think we can start collaborating on the code. While @Ekikereabasi-Nk is working on the centralized script, is it alright that I start working on the function call for the languages? We can make the minor changes later too?

Sure @KesharwaniArpita I'm also through with the centralize function

@andrewtavis
Copy link
Member Author

Feel free to send along PRs and we'll see on both ends :)

@VNW22
Copy link
Contributor

VNW22 commented Oct 16, 2024

Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).

Let us know!

is it possible on mac?

@VNW22
Copy link
Contributor

VNW22 commented Oct 16, 2024

Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).

Let us know!

okay, I'll be working on it

@andrewtavis
Copy link
Member Author

Sorry I was planning on sending along an explanation here, @VNW22, but got caught up with things :)

Is much easier on Mac and Linux. Specifically we to have a guide for this here. Let me know if anything is confusing and we can update the guide!

Thanks for looking into this 😊

@Ekikereabasi-Nk
Copy link
Contributor

Ekikereabasi-Nk commented Oct 17, 2024

Thanks @KesharwaniArpita for the work here #397

@VNW22
Copy link
Contributor

VNW22 commented Oct 17, 2024

Sorry I was planning on sending along an explanation here, @VNW22, but got caught up with things :)

Is much easier on Mac and Linux. Specifically we to have a guide for this here. Let me know if anything is confusing and we can update the guide!

Thanks for looking into this 😊

no worries :) looking into it

@andrewtavis
Copy link
Member Author

Closed via #440 😊 Really appreciate everyone collaboration here to get this functionality to be dramatically more streamlined 🚀

@github-project-automation github-project-automation bot moved this from Todo to Done in Scribe Board Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request hacktoberfest Included as a part of Hacktoberfest help wanted Extra attention is needed
Projects
Archived in project
Development

No branches or pull requests

6 participants