
Using Python With Zeppelin Under The Spark 2 Interpreter

I have deployed HDP 2.6.4 on a virtual machine. I can see that the spark2 interpreter is not pointing to the correct Python folder. My questions are: 1) How can I find where my Python is located?

Solution 1:

For a Sandbox environment like yours, the sandbox image is built on a Linux OS (CentOS). The Zeppelin Notebook, in all probability, points to the Python installation that ships with every Linux OS. If you wish to have your own installation of Python and your own set of libraries for data analysis, like those in the SciPy stack, you need to install Anaconda on your virtual machine. Your VM needs to be connected to the internet so that you can download and install the Anaconda package for testing.
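As a rough sketch, installing Anaconda from the VM's terminal might look like the following; the installer version shown is only an example, so pick a release from the Anaconda archive yourself:

    # Download an Anaconda installer (the version here is an assumption;
    # browse https://repo.anaconda.com/archive/ for the release you want)
    wget https://repo.anaconda.com/archive/Anaconda3-5.1.0-Linux-x86_64.sh
    # Run the installer and accept an install prefix such as /home/user/anaconda3
    bash Anaconda3-5.1.0-Linux-x86_64.sh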

You can then point Zeppelin to Anaconda's Python at the following path: /home/user/anaconda3/bin/python, where user is your username.
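As a sketch, in the Zeppelin UI under Interpreter > spark2, the relevant property would be set roughly like this (the path assumes a default Anaconda install under your home directory):

    zeppelin.pyspark.python = /home/user/anaconda3/bin/python

After saving, restart the interpreter so that new PySpark sessions pick up the change.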

The Zeppelin configuration also confirms that it uses the default Python installation at /usr/bin/python. You can go through its documentation for more information.
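To confirm which Python a notebook is actually using, a quick check in a Zeppelin paragraph works; it uses only the standard library, so it runs under any Python:

    %pyspark
    import sys
    # Prints the path of the Python binary driving this PySpark session,
    # e.g. /usr/bin/python for the default installation
    print(sys.executable)
    print(sys.version)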

UPDATE

Hi Joseph, Spark installations, by default, use the Python interpreter and the Python libraries that are installed on your OS. The folder structure you have shown only tells you the location of the PySpark module. This module is a library, like Pandas or NumPy.

What you can do is install the SciPy stack [NumPy, Pandas, Matplotlib, etc.] via the command pip install <package-name> and import those libraries directly into your Zeppelin Notebook, as sketched below.
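For instance, using the usual PyPI package names, the install step in the sandbox terminal would be:

    pip install numpy pandas matplotlib

and then, in a Zeppelin paragraph, the libraries import directly:

    %pyspark
    import numpy as np
    import pandas as pd
    # A trivial check that the libraries import and work
    print(pd.DataFrame({"x": np.arange(3)}))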

Use the command whereis python in the terminal of your sandbox; the result will be something like the following: /usr/bin/python /usr/bin/python2.7 ...

In your Zeppelin configuration, for the property zeppelin.pyspark.python, you can set the first value from the output of the previous command, i.e. /usr/bin/python. Now all the libraries you installed via pip install will be available to you in Zeppelin.
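A minimal way to verify the setting took effect, assuming NumPy was installed with pip as above:

    %pyspark
    import numpy as np
    # If zeppelin.pyspark.python points at the same Python that pip installed into,
    # this import succeeds and prints the library's location on disk
    print(np.__file__)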

This process will only work for your Sandbox environment. In a real production cluster, your administrator needs to install all of these libraries on all the nodes of your Spark cluster.
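As a rough sketch only, with hypothetical hostnames, such a cluster-wide installation might be scripted like this (in practice a configuration tool like Ansible is the more common choice):

    # Install the same libraries on every Spark worker so executors can import them
    for host in worker1 worker2 worker3; do
        ssh "$host" "pip install numpy pandas matplotlib"
    done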
