Build the same classification with different categories and dataset #7

Open
Baha2Odeh opened this issue Sep 7, 2019 · 4 comments

Comments

@Baha2Odeh

Hello,

I want to build the same model, but with my own dataset and categories.

I have a list of questions with categories, but I could not find the code you used to prepare the dataset in the form the model expects.

Can you share the code with us?

Thank you.

@saidziani
Owner

saidziani commented Sep 7, 2019

Hi, there's a file made especially for preprocessing: helper.py.
The libraries used during this phase are mostly free/open source (NLTK).
For the stemming (which is the most important step) I used Farasa, which is written in Java, but you can use system calls to run the JARs.
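
As a rough illustration of running a Farasa JAR through a system call from Python, here is a minimal sketch. It is not the code from helper.py. The JAR file name, the `-i`/`-o` input and output flags, and both function names are assumptions and should be checked against the Farasa distribution you downloaded.

```python
import subprocess
import tempfile
from pathlib import Path


def build_farasa_command(jar_path, input_path, output_path):
    """Build the java command line (hypothetical helper).

    Assumption: the JAR accepts -i <input file> and -o <output file>.
    """
    return ["java", "-jar", str(jar_path),
            "-i", str(input_path), "-o", str(output_path)]


def segment_with_farasa(text, jar_path="FarasaSegmenterJar.jar"):
    """Write text to a temp file, run the JAR on it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        dst = Path(tmp) / "output.txt"
        src.write_text(text, encoding="utf-8")
        # check=True raises CalledProcessError if java exits with an error
        subprocess.run(build_farasa_command(jar_path, src, dst), check=True)
        return dst.read_text(encoding="utf-8")
```

Calling `segment_with_farasa(...)` requires Java and the JAR to be available locally; starting a new JVM per call is slow, so batching many articles into one input file is likely preferable.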

@Baha2Odeh
Author

I got the FarasaSegmenter file, but something is wrong.
I used the pipeline method from helper to preprocess the text.
This is the input:
أمرت السلطات القطرية الأسواق والمراكز التجارية في البلاد برفع وإزالة السلع الواردة من السعودية والبحرين والإمارات ومصر في الذكرى الأولى لإعلان هذه الدول الحصار عليها.
After running getLemmaArticle, the output is:
امر+ت ال+سلط+ات ال+قطري+ه ال+اسواق و+ال+مراكز ال+تجاري+ه ال+بلاد ب+رفع و+ازال+ه ال+سلع ال+وارد+ه ال+سعودي+ه و+ال+بحرين و+ال+امار+ات و+مصر ال+ذكري ال+اولي ل+اعلان ال+دول ال+حصار
And this is the final output:
امرت السلطات القطريه الاسواق والمراكز التجاريه البلاد برفع وازاله السلع الوارده السعوديه والبحرين والامارات ومصر الذكري الاولي لاعلان الدول الحصار
Only the stop words were removed.
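
For comparison, one way to reduce `+`-segmented tokens like those in the getLemmaArticle output to stems is sketched below. This is a simple heuristic under stated assumptions, not the method used in helper.py: it assumes the longest segment of each token is the stem, and the function names are hypothetical.

```python
def stem_from_segments(segmented_token):
    """Return the longest '+'-separated segment, treating it as the stem.

    Assumption: prefixes and suffixes are shorter than the stem.
    If several segments share the maximum length, the first one wins.
    """
    segments = [s for s in segmented_token.split("+") if s]
    return max(segments, key=len) if segments else segmented_token


def stem_text(segmented_text):
    """Apply stem_from_segments to every whitespace-separated token."""
    return " ".join(stem_from_segments(tok) for tok in segmented_text.split())
```

With this heuristic, `ال+سلط+ات` reduces to `سلط` and `و+ال+مراكز` reduces to `مراكز`. Short stems with long affixes would be handled wrongly, so an explicit list of clitics may be more reliable.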

@itsani4u

itsani4u commented Apr 8, 2020

Where can I get train_data.pkl? I want to create my own train_data.pkl. Please guide me.
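
The thread does not say how train_data.pkl is structured, so the following is only a sketch of writing and reading a pickle file with Python's standard `pickle` module. The dictionary layout (`texts`, `labels`) and the function names are assumptions for illustration; the real file may use a different structure.

```python
import pickle


def save_training_data(texts, labels, path="train_data.pkl"):
    """Pickle parallel lists of preprocessed texts and category labels.

    Assumption: a dict with 'texts' and 'labels' keys; the layout the
    model actually expects is not documented in this thread.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    with open(path, "wb") as f:
        pickle.dump({"texts": list(texts), "labels": list(labels)}, f)


def load_training_data(path="train_data.pkl"):
    """Load the object previously written by save_training_data."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Loading the repository's existing train_data.pkl with `pickle.load` and inspecting its type is probably the quickest way to find the layout to reproduce.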

@sbkgith

sbkgith commented Nov 26, 2021

Try a different input like this; it works:
امرت السلطات القطريه الاسواق والمراكز التجاريه البلاد برفع وازاله السلع الوارده السعوديه والبحرين والامارات ومصر الذكري الاولي لاعلان
الدول المقاطعة بسبب دعم الارهاب


4 participants