Add functionality to convert file inputs to either json, csv or tsv. #338

Merged
merged 14 commits into from
Oct 24, 2024

Conversation

john-thuo1
Contributor

@john-thuo1 john-thuo1 commented Oct 13, 2024

Contributor Checklist

  • [✔️] This pull request is on a separate branch and not the main branch.

Description

This pull request adds functionality to convert data into CSV, TSV, and JSON formats. The changes include:

  • Extension of the convert_to_json and convert_to_csv_or_tsv functions in convert.py.
  • Updates to the CLI in main.py to support new conversion commands.
  • Updates to the get function to support file formats other than JSON.
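
For readers who want a feel for the round trip described above, here is a minimal sketch. It is not the code in convert.py: the function names, the `word`/`value` column names, and the assumption that the JSON is a flat key-to-string mapping are all hypothetical.

```python
import csv
import io
import json

# Output type -> field delimiter.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def json_to_delimited(json_text: str, output_type: str) -> str:
    """Convert a flat JSON object into CSV or TSV text with a header row."""
    data = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=DELIMITERS[output_type], lineterminator="\n"
    )
    writer.writerow(["word", "value"])  # hypothetical column names
    for key, value in data.items():
        writer.writerow([key, value])
    return buffer.getvalue()


def delimited_to_json(text: str, input_type: str) -> str:
    """Convert CSV or TSV text produced above back into a JSON object."""
    reader = csv.DictReader(io.StringIO(text), delimiter=DELIMITERS[input_type])
    data = {row["word"]: row["value"] for row in reader}
    return json.dumps(data, ensure_ascii=False, indent=2)


sample = '{"chat": "cat", "chien": "dog"}'
csv_text = json_to_delimited(sample, "csv")
tsv_text = json_to_delimited(sample, "tsv")
round_trip = json.loads(delimited_to_json(tsv_text, "tsv"))
```

Going through the `csv` module rather than joining strings by hand means values containing the delimiter or quotes are escaped correctly.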

Example Commands:

  1. From JSON to CSV:
    scribe-data convert --lan French --data-type translations --input-file ./fli/French/translations.json --output-type csv --output-dir ./converted/
  2. From CSV to JSON:
    scribe-data convert --lan French --data-type translations --input-file ./converted/French/translations.csv --output-type json --output-dir ./converted/
  3. From JSON to TSV:
    scribe-data convert --lan French --data-type translations --input-file ./fli/French/translations.json --output-type tsv --output-dir ./converted/ 
  4. From TSV to JSON:
    scribe-data convert --lan French --data-type translations --input-file ./converted/French/translations.tsv --output-type json --output-dir ./converted/ 
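
A rough sketch of how a `convert` subcommand with these options could be declared with argparse. The real definitions in main.py are not shown in this thread, so the option names below are inferred from the example commands. In particular, `--language` is an assumption: if the option is declared that way, argparse's default prefix matching would accept `--lan` as an unambiguous abbreviation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser mirroring the example commands."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command")

    convert = subparsers.add_parser("convert")
    convert.add_argument("--language", required=True)  # assumed long name
    convert.add_argument("--data-type", required=True)
    convert.add_argument("--input-file", required=True)
    convert.add_argument(
        "--output-type", choices=["json", "csv", "tsv"], required=True
    )
    convert.add_argument("--output-dir", default=".")
    return parser


# "--lan" resolves to "--language" because no other option shares the prefix.
args = build_parser().parse_args(
    [
        "convert",
        "--lan", "French",
        "--data-type", "translations",
        "--input-file", "./fli/French/translations.json",
        "--output-type", "csv",
        "--output-dir", "./converted/",
    ]
)
```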
    

image

Tests for the convert methods have been implemented in test_convert.py

Related issue


github-actions bot commented Oct 13, 2024

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

  • The linting and formatting workflow within the PR checks do not indicate new errors in the files changed

  • The CHANGELOG has been updated with a description of the changes for the upcoming release and the corresponding issue (if necessary)

@andrewtavis andrewtavis requested review from mhmohona and andrewtavis and removed request for mhmohona October 13, 2024 15:05
@andrewtavis andrewtavis added the hacktoberfest-accepted Accepted as a part of Hacktoberfest label Oct 13, 2024
@john-thuo1 john-thuo1 changed the title feat : Functionality to convert json/csv&tsv files Add functionality to convert file inputs to either json, csv or tsv. Oct 14, 2024
Member

@mhmohona mhmohona left a comment

It looks good. Great work @john-thuo1

@john-thuo1
Contributor Author

john-thuo1 commented Oct 15, 2024

It looks good. Great work @john-thuo1

Thanks for the review, @mhmohona! I will proceed to look into the remaining tests for the functions, then.

@andrewtavis
Member

Yes that'd be great, @john-thuo1! I'll review once those are in :)

@andrewtavis
Member

Checking in here, @john-thuo1 :) Do you want to do the tests in a separate PR, or continue on this one? Whatever would be best for you 😊

@john-thuo1
Contributor Author

john-thuo1 commented Oct 19, 2024

Checking in here, @john-thuo1 :) Do you want to do the tests in a separate PR, or continue on this one? Whatever would be best for you 😊

I had some challenges figuring out how to mock the input file, output directory, and output file and patching some functionalities, hence the delay, but it is all good now!

I have finished the tests. You can go ahead with the review @andrewtavis and let me know of other test scenarios I can add.
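
For anyone hitting the same hurdle, one way to mock the input file, output directory, and output file is sketched below with `mock_open` and `patch.object`. The function `convert_json_to_csv` is a hypothetical stand-in written for this example, not the function in convert.py, and the tests in test_convert.py may be structured differently.

```python
import json
from pathlib import Path
from unittest.mock import mock_open, patch


def convert_json_to_csv(input_file: str, output_dir: str) -> Path:
    """Hypothetical stand-in for the function under test."""
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    output_file = out_dir / (Path(input_file).stem + ".csv")

    with open(output_file, "w", encoding="utf-8") as f:
        for key, value in data.items():
            f.write(f"{key},{value}\n")
    return output_file


# One mock serves both open() calls: it returns fake JSON on read
# and records whatever is written.
mocked_open = mock_open(read_data='{"chat": "cat"}')

with patch("builtins.open", mocked_open), patch.object(Path, "mkdir") as mock_mkdir:
    result = convert_json_to_csv("translations.json", "converted")

# Nothing touched the real filesystem; only the calls are inspected.
mock_mkdir.assert_called_once_with(parents=True, exist_ok=True)
mocked_open.assert_any_call(
    Path("converted") / "translations.csv", "w", encoding="utf-8"
)
mocked_open.return_value.write.assert_called_once_with("chat,cat\n")
```

An alternative that avoids patching `open` altogether is to write real files under a `tempfile.TemporaryDirectory()` and read the output back.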

@andrewtavis
Member

Sorry for asking, @john-thuo1, but would you be able to fix the tests given the changes that went through to the project? I'll review directly after that. I can also look into this if you'd prefer :)

@john-thuo1
Contributor Author

john-thuo1 commented Oct 24, 2024

Sorry for asking, @john-thuo1, but would you be able to fix the tests given the changes that went through to the project? I'll review directly after that. I can also look into this if you'd prefer :)

I have just pulled the changes. Going through the tests that have failed!

@andrewtavis
Member

Thanks, @john-thuo1!

@andrewtavis
Member

Merged the emoji work in as well to get rid of a merge conflict for you, @john-thuo1! Let me know if there's anything else I can do to support!

@john-thuo1
Contributor Author

john-thuo1 commented Oct 24, 2024

Merged the emoji work in as well to get rid of a merge conflict for you, @john-thuo1! Let me know if there's anything else I can do to support!

@andrewtavis The rest of the tests are working apart from this one: test_get_emoji_keywords. It raises an AssertionError and creates a directory locally. The test as implemented does not seem to call subprocess.run(). Regarding the PyICU warning raised, perhaps you can try running it on your end and see whether you face the same issue.
image

@andrewtavis
Member

I'll take a look at this, @john-thuo1! Thanks so much :)

@andrewtavis
Member

Test is fine now, @john-thuo1, at least locally. I fixed it before and there was a change overwriting the fix. Checking the rest now 😊

@@ -224,7 +259,9 @@ def main() -> None:
         return
 
     if args.command in ["list", "l"]:
-        list_wrapper(args.language, args.data_type, args.all)
+        list_wrapper(
+            language=args.language, data_type=args.data_type, all_bool=args.all
Member

Thanks for adding in the args! Really appreciate explicit function calls to make the functionality more clear.

args.output_dir,
args.overwrite,
)
convert(
Member

Wonderful :)

Member

Functioning just like the others now 😊 I'll do a rename to convert_wrapper for consistency and so people know that this function conditionally calls sub-functions. It's more of a switcher, but then total_switcher sounds a bit weird.

@john-thuo1
Contributor Author

john-thuo1 commented Oct 24, 2024

Test is fine now, @john-thuo1, at least locally. I fixed it before and there was a change overwriting the fix. Checking the rest now 😊

@andrewtavis It seems the emoji branch in get.py never calls subprocess.run(), hence the test fails. Is this something you are observing as well? If so, would this alternative test case suffice?

In get.py:


    elif data_type in {"emoji-keywords", "emoji_keywords"}:
        generate_emoji(language=language, output_dir=output_dir)

Proposed alternative test case:

    @patch("scribe_data.cli.get.Path")
    @patch("scribe_data.cli.get.generate_emoji")
    def test_get_emoji_keywords(self, mock_generate_emoji, mock_path):
        mock_path_instance = mock_path.return_value
        mock_path_instance.mkdir.return_value = None
        mock_path_instance.exists.return_value = True
        mock_path_instance.__str__.return_value = "scribe_data_json_export"

        get_data(language="English", data_type="emoji-keywords")

        self.assertTrue(mock_generate_emoji.called)

        mock_generate_emoji.assert_called_with(language="English", output_dir="scribe_data_json_export")

Member

@andrewtavis andrewtavis left a comment

Really really nice work here, @john-thuo1 :) Some tests needed to be commented out as I'd like to get this in, but overall this is a monumental addition to the project 😊 Really appreciate all the hard work and care you put into this!

@andrewtavis andrewtavis merged commit c13f50e into scribe-org:main Oct 24, 2024
7 checks passed
@john-thuo1
Contributor Author

Really really nice work here, @john-thuo1 :) Some tests needed to be commented out as I'd like to get this in, but overall this is a monumental addition to the project 😊 Really appreciate all the hard work and care you put into this!

@andrewtavis @mhmohona Thank you. The numerous back-and-forths were very helpful in grasping the nuances of the task.

@andrewtavis
Member

Thank you, @john-thuo1! You were a perfect person to work through this :)

Labels
hacktoberfest-accepted Accepted as a part of Hacktoberfest
3 participants