How to handle inappropriate word segmentation in English data? #1273
-
Version number of KH Coder: 3.Beta.07b

Your Question: When I generate a word frequency list from my English data, some entries are fragments rather than complete words. For example, the list contains “r”, “ek”, and “Ho” instead of the correct words “your”, “week”, and “house”. I have already tried the [Select Word for Analysis] function: I added “your”, “week”, and “house” to the [Force Pick-up] column and then ran [Run Pre-Processing]. However, the incorrect forms “r”, “ek”, and “Ho” still appear.
Replies: 2 comments 7 replies
-
Hmm, could you click “r”, “ek”, and “Ho” in the word list screen to open KWIC? Then we will be able to see where the “r” comes from. Or, if you can attach the input data file here, that would be very helpful.
-
Sorry for my poor English. Can you reproduce the problem with Sample.xlsx? It would be helpful if you could provide a file that reproduces the problem. Also, how about cleaning your data? A good place to start is the CLEAN function in Excel. Please try creating a new Excel file with the CLEANed text and then making a new KH Coder project from that new file.
https://www.educba.com/clean-in-excel/
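If you would rather check the text outside Excel, here is a minimal Python sketch of the same cleaning idea. It rests on an assumption that this thread has not confirmed: that the fragments are caused by invisible control or formatting characters sitting inside words. The function names `find_hidden_chars` and `clean_text` and the sample string are hypothetical and are not part of KH Coder. Note also that the sketch is broader than Excel's CLEAN, which removes only the non-printing ASCII characters 0 to 31: it also drops Unicode format characters such as the zero-width space, and it turns tabs and line breaks into spaces so that neighbouring words are not glued together.

```python
import unicodedata

# Whitespace-like control characters are replaced with a space instead of
# being deleted, so that words on either side stay separated.
WHITESPACE_CONTROLS = {"\t", "\n", "\r"}

# Unicode categories treated as "hidden": Cc (control) and Cf (format).
HIDDEN_CATEGORIES = ("Cc", "Cf")


def find_hidden_chars(text):
    """Return (index, code point) pairs for every control/format character."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in HIDDEN_CATEGORIES:
            found.append((i, f"U+{ord(ch):04X}"))
    return found


def clean_text(text):
    """Drop hidden characters; turn tabs and line breaks into spaces."""
    out = []
    for ch in text:
        if ch in WHITESPACE_CONTROLS:
            out.append(" ")
        elif unicodedata.category(ch) in HIDDEN_CATEGORIES:
            continue  # remove the invisible character entirely
        else:
            out.append(ch)
    return "".join(out)


# Made-up sample: a zero-width space, a bell character, a tab and a
# soft hyphen hidden inside or between otherwise ordinary words.
sample = "you\u200br we\x07ek\tHo\u00aduse"
print(find_hidden_chars(sample))
print(clean_text(sample))
```

If `find_hidden_chars` reports nothing for the cells where “r”, “ek”, and “Ho” appear in KWIC, hidden characters are probably not the cause, and a file that reproduces the problem would still be the best next step.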