IDE Setup for PySpark
melanie edited this page Feb 2, 2018
IntelliJ IDEA is a complete IDE with, among others, Java, Scala and Python plugins. PyCharm is an equivalent IDE, but with Python as its only plugin (and therefore lighter).
Download one of these two IDEs (Community Edition):
- PyCharm: https://www.jetbrains.com/pycharm/download/
- IntelliJ IDEA: https://www.jetbrains.com/idea/download/
If you choose IntelliJ IDEA, you must install the Python plugin, which is not included by default.
First, add the PySpark-specific paths to those of the Anaconda interpreter:
- Open your chosen IDE
- Open the cloned Python project with the Anaconda Interpreter
- (IntelliJ only) File -> Project Structure -> SDKs -> your Anaconda interpreter
- (PyCharm only) File -> Default Settings -> Project Interpreter -> your Anaconda interpreter
- (PyCharm only) Click on the "..." icon on the right of your interpreter path, then on "More...", your project interpreter, and finally on the last icon on the bottom right ("Show paths for the selected interpreter")
- Click on "+"
- Select your_path_to_spark/spark-X.X.X-bin-hadoopX.X/python
- "OK"
- Click once again on "+"
- Select your_path_to_spark/spark-X.X.X-bin-hadoopX.X/python/lib/py4j-X.X-src.zip
- Click "OK"
- OK -> Apply -> OK
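The two entries added through the IDE dialog can also be computed programmatically, which helps when checking that the install directory has the expected layout. A minimal sketch (the `pyspark_paths` helper and the install location in the comment are illustrative, not part of any Spark API):

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return the two entries PySpark needs on the interpreter path:
    the python/ directory and the bundled py4j source zip."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version varies between Spark releases, so match it with a glob.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if not py4j_zips:
        raise FileNotFoundError("no py4j-*-src.zip found under %s" % python_dir)
    return [python_dir, py4j_zips[0]]


# Example (hypothetical install location - substitute your own):
# sys.path.extend(pyspark_paths("/home/me/spark-2.2.1-bin-hadoop2.7"))
```

These are exactly the two paths selected in the steps above; the glob avoids hard-coding the py4j version.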
Finally, set the PySpark-specific environment variables so that PySpark can run locally.
- Run -> Edit Configurations -> Defaults -> Python
- In the "Environment variables" section, click on "...", then on the "+" icon
- Name: PYTHONPATH
- Value: your_path_to_spark/spark-X.X.X-bin-hadoopX.X/python:your_path_to_spark/spark-X.X.X-bin-hadoopX.X/python/lib/py4j-X.X-src.zip
- Click again on "+"
- Name: SPARK_HOME
- Value: your_path_to_spark/spark-X.X.X-bin-hadoopX.X
- OK -> Apply
- Add the same environment variables to the default configuration of each test module you will use (for example, Python tests - Unittests); setting them for every test module avoids problems later
- OK
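For reference, the same two variables can be set from a script, which makes the intended values explicit. A sketch, where the install path and py4j version are assumptions to be replaced with your own:

```python
import os

# Hypothetical Spark install location - replace with your own path.
spark_home = "/home/me/spark-2.2.1-bin-hadoop2.7"

os.environ["SPARK_HOME"] = spark_home
# PYTHONPATH entries are joined with os.pathsep (":" on Linux/macOS, ";" on Windows).
os.environ["PYTHONPATH"] = os.pathsep.join([
    os.path.join(spark_home, "python"),
    os.path.join(spark_home, "python", "lib", "py4j-0.10.4-src.zip"),
])
```

Note that the ":"-separated value shown above is the Linux/macOS form; on Windows the separator is ";".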
The PySpark imports in your code should now be recognized, and the code should run without errors.
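To confirm the configuration, a quick check from the configured interpreter is to ask whether the `pyspark` package can be resolved at all; a minimal sketch (the helper name is illustrative):

```python
import importlib.util


def pyspark_importable():
    """Return True if the pyspark package can be resolved on the current path."""
    return importlib.util.find_spec("pyspark") is not None


if __name__ == "__main__":
    if pyspark_importable():
        print("pyspark found - the interpreter paths are set up correctly")
    else:
        print("pyspark not found - re-check PYTHONPATH and the interpreter paths")
```

If the check fails, the usual culprits are a missing py4j zip entry or the environment variables not being set on the test run configurations.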