The ndash characters in word_count.txt cause an error when following the "Run your first Spark Job" tutorial. There are only two occurrences of this character, here: "from 1913–74." and here: "near–bankruptcy".
To Recreate:
Using spark-2.3.2-bin-hadoop2.7 on Ubuntu 18, pyspark/Python 2.7, installed following the instructions from lecture 5. Go to the directory where you cloned python-spark-tutorial and run the following from lecture 6:
spark-submit ./rdd/WordCount.py
The execution halts about halfway through the frequency counter with the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 4: ordinal not in range(128)
Spoiler: it's the dash. I'm not sure whether the non-ASCII en dash (U+2013) was intentional, so I'm posting.
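For anyone curious, the failure is easy to reproduce outside Spark: encoding the en dash with the ASCII codec raises the same error. (A minimal sketch, written in Python 3 syntax; Python 2.7 hits the same exception implicitly when it tries to print a unicode string on an ASCII-encoded stdout.)

```python
# One of the offending fragments from word_count.txt; \u2013 is the en dash.
snippet = u"from 1913\u201374."

try:
    # The en dash is outside the ASCII range (ordinal 8211 >= 128),
    # so the 'ascii' codec rejects it.
    snippet.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)

# Encoding as UTF-8 handles it fine:
print(snippet.encode("utf-8"))
```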
Work-Around:
I changed the two ndash characters to plain hyphens ("from 1913-74." and "near-bankruptcy"), which solved the issue for me. There is a related Stack Overflow thread where someone else ran into a similar problem with Python 2.7 and used the same solution.
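If you would rather not edit word_count.txt by hand, the same substitution can be done in code before counting. A sketch (the helper name `normalize_dashes` is my own, not part of the tutorial):

```python
def normalize_dashes(line):
    """Replace en dashes (U+2013) with ASCII hyphens so Python 2's
    implicit ASCII encoding on print does not fail."""
    return line.replace(u"\u2013", u"-")

# The two offending fragments from word_count.txt:
print(normalize_dashes(u"from 1913\u201374."))    # from 1913-74.
print(normalize_dashes(u"near\u2013bankruptcy"))  # near-bankruptcy
```

Mapping this over the RDD of lines before splitting into words would avoid the crash without touching the data file.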