
Python library for dealing with duplicated training data.


khasbilegt/sanitizer


Sanitizer

Sanitizer is a Python library for dealing with duplicated training data. It uses the concurrent.futures module, part of the standard library since Python 3.2, to run work concurrently and reduce the overall processing time.
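The README does not describe how Sanitizer detects duplicates internally. As a rough illustration of how concurrent.futures can speed up this kind of task, the sketch below hashes files in parallel with a ThreadPoolExecutor and groups files whose SHA-256 digests match. The content-hash approach and the names file_digest and find_duplicates are assumptions for illustration, not Sanitizer's API.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def file_digest(path: Path) -> tuple[Path, str]:
    """Return the path together with the SHA-256 hex digest of its contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 64 KiB chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return path, h.hexdigest()


def find_duplicates(folder: Path, max_workers: int = 8) -> dict[str, list[Path]]:
    """Group files under `folder` by content hash, keeping only groups of 2+ files.

    Hypothetical helper: content hashing is an assumed duplicate criterion.
    """
    files = [p for p in folder.rglob("*") if p.is_file()]
    groups: dict[str, list[Path]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs file_digest concurrently and yields results in input order.
        for path, digest in pool.map(file_digest, files):
            groups.setdefault(digest, []).append(path)
    return {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}
```

Threads suit this I/O-bound hashing. For CPU-heavy per-file work (for example, decoding images), ProcessPoolExecutor is usually a drop-in alternative.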

Usage

First, edit the labels.json and config.json files to suit your data.

labels.json - maps each input folder name (key) to its associated id and ASCII symbol (value), in JSON format.

config.json - maps each symbol from labels.json (key) to the name of the result folder (value), in JSON format.
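The exact JSON schema is not spelled out above, so the following sketch assumes each labels.json value is an object with "id" and "symbol" fields, and that config.json maps those symbols to result folder names. It shows how the two files relate: each input folder is routed to a result folder through its symbol. The example data and the functions load_json and resolve_output_folders are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical example contents; the project's real schema may differ.
EXAMPLE_LABELS = {
    "folder_a": {"id": 0, "symbol": "a"},
    "folder_b": {"id": 1, "symbol": "b"},
}
EXAMPLE_CONFIG = {"a": "output_a", "b": "output_b"}


def load_json(path: Path) -> dict:
    """Read a JSON file such as labels.json or config.json into a dict."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def resolve_output_folders(labels: dict, config: dict) -> dict[str, str]:
    """Map each input folder name to its result folder via the folder's symbol."""
    return {folder: config[info["symbol"]] for folder, info in labels.items()}
```

A symbol present in labels.json but missing from config.json raises a KeyError here, which surfaces configuration mistakes early rather than silently dropping a folder.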

Then run the tool through Poetry:

poetry run sanitizer


Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
