Korean word frequency (counting philosophy as a content analysis tool) #1270
-
Version number of KH Coder: 3.Beta.08e
Your Question: I apologize for disturbing you. While analyzing Korean data, I noticed that many Korean words (verbs and adjectives) are composed of "N (noun) + 하다." However, in the word frequency results, "N (noun)" and "하다" appear separately rather than as a single unit "N하다," which makes the analysis inaccurate. Additionally, even if I specify "N하다" as a single string to be picked up as a "TAG" word, the result is still incorrect because the word inflects. Do you have any suggestions on how to solve this issue? Thank you for your response.
Replies: 7 comments
-
Can you tell me what "하다" means? Is there such a big difference between "N하다" and "N"?
-
If you want to count "N" and "N하다" separately, (A1) you can use a coding rule like:
If necessary, please force pick up "하다" as a word. Also, please select "sentences" as the coding unit.
(A2) Or you can simply look at the "Collocation Stats" of "N". You can open it by clicking the "Stats" button in the lower left corner of the KWIC window.
(B) If you want to count or analyze other things regarding "N하다", please specify your exact goal. Then we may be able to find a way to achieve it.
-
In Korean, "하다" can function as a standalone verb, but it is also a suffix that combines with many nouns to form compound words. For example, the verb "운동하다" ("to exercise") is a compound made up of "운동" and "하다". However, in KH Coder's word frequency analysis, "운동" and "하다" are counted separately as two different words. This causes a discrepancy, because I need the word frequency of "운동하다". I also can't "force pick up" "운동하다" as a whole, because it does not always appear in the text in the form "운동하다". For instance, it can appear in the past tense as "운동했". I'm not sure if my explanation is clear. Thank you for your response.
-
Then you can use the "Collocation Stats" of "운동".
https://khcoder.net/en/gallery/pages/image/imagepage4.html
Find the "하다" line and its "R1" number. If the R1 number is 10, "운동하다" appeared 10 times.
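To make the meaning of the "R1" number concrete, here is a minimal Python sketch of the same idea. It is not KH Coder's implementation. It assumes the text has already been split into base forms (so "운동했" is represented as "운동" followed by "하다"), and the function name `right_neighbor_counts` and the sample token list are hypothetical.

```python
from collections import Counter


def right_neighbor_counts(tokens, node):
    """Count the base forms found immediately to the right (R1) of `node`."""
    counts = Counter()
    # Pair every token with the token that directly follows it.
    for current, following in zip(tokens, tokens[1:]):
        if current == node:
            counts[following] += 1
    return counts


# Hypothetical, already-lemmatized token stream (illustration only).
tokens = ["나", "운동", "하다", "어제", "운동", "하다", "운동", "좋다"]

r1 = right_neighbor_counts(tokens, "운동")
print(r1["하다"])  # how many times "하다" directly follows "운동"
```

Because the count is taken on base forms, inflected occurrences are included without listing every conjugation.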
-
Thank you for the valuable feedback. However, as a content analysis tool, KH Coder is currently developed with the following philosophy.
Counting "Publication (출간)" as a whole is the primary goal, and distinguishing between "Doing Publication (출간하다)" and plain "Publication (출간)" is a secondary step. It doesn't matter how the word is conjugated or how it forms a compound word. We just want to count how many times the concept "Publication (출간)" appears. That number is more important for statistically understanding the content of the text. If you break the number down into the frequency of "Publication (출간)" and the frequency of "Doing Publication (출간하다)," it may become more difficult to statistically understand the content of the text. Again, content analysis of the text is the goal of KH Coder.
If you need to distinguish between "Publication (출간)" and "Doing Publication (출간하다)", please use Collocation Stats or the coding rules mentioned above. To make this distinction for the top 100 nouns, for example, you just need to repeat the same operation 100 times: enter the new word in the KWIC window and press the Enter key, and the Collocation Stats window will update automatically. Alternatively, make 100 entries in the coding rule file. (If necessary, you can develop a Perl plugin that operates KH Coder automatically. If you do not want to develop it yourself, you can outsource it to a software company.)
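The two-level counting described here (the concept total first, the "하다" breakdown second) can be sketched for several nouns at once. This is a Python illustration only, not a KH Coder plugin and not its API. It assumes an already-lemmatized token list, and the name `noun_and_hada_counts` and the sample data are hypothetical.

```python
def noun_and_hada_counts(tokens, nouns, suffix="하다"):
    """For each noun, return (total occurrences, occurrences directly followed by `suffix`)."""
    result = {}
    for noun in nouns:
        # Primary number: every appearance of the concept, compound or not.
        total = sum(1 for token in tokens if token == noun)
        # Secondary number: appearances where the suffix is the right neighbor.
        followed = sum(
            1
            for current, following in zip(tokens, tokens[1:])
            if current == noun and following == suffix
        )
        result[noun] = (total, followed)
    return result


# Hypothetical, already-lemmatized token stream (illustration only).
tokens = ["책", "출간", "하다", "출간", "소식", "운동", "하다", "출간", "하다"]

table = noun_and_hada_counts(tokens, ["출간", "운동"])
for noun, (total, with_hada) in table.items():
    print(noun, total, with_hada)
```

The first number in each pair corresponds to the concept frequency that the word frequency list reports, and the second to the R1 figure for "하다".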
-
Thank you for your response. Have a good time!