Chapter 9: Running Python Spark Programs in IPython Notebook


9.1 Installing Anaconda
Step 1. Copy the Anaconda download URL
Open the Continuum archive page:
https://repo.continuum.io/archive/index.html
Step 2. Download Anaconda2-2.5.0-Linux-x86_64.sh

wget https://repo.continuum.io/archive/Anaconda2-2.5.0-Linux-x86_64.sh
Step 3. Install Anaconda (the -b flag runs the installer in silent batch mode with default settings)
bash Anaconda2-2.5.0-Linux-x86_64.sh -b
Step 4. Edit ~/.bashrc to add the module paths
Open ~/.bashrc for editing:
sudo gedit ~/.bashrc
Add the following lines:
export PATH=/home/hduser/anaconda2/bin:$PATH
export ANACONDA_PATH=/home/hduser/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
Step 5. Make the changes to ~/.bashrc take effect
source ~/.bashrc
Step 6. Check the Python version
python --version
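If the version reported is not the Anaconda build, it can help to check which interpreter is actually being picked up. A minimal check, assuming the /home/hduser/anaconda2 install path from Step 4:

import sys
print(sys.version)      # should mention an Anaconda build of Python 2.7
print(sys.executable)   # should point under /home/hduser/anaconda2/bin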
9.2 Using Spark in IPython Notebook
Step 1. Create the ipynotebook working directory
mkdir -p ~/pythonwork/ipynotebook
cd ~/pythonwork/ipynotebook
Step 2. Launch pyspark with the IPython Notebook interface

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
Step 6. Run code in IPython Notebook
sc.master shows which master the SparkContext is attached to; started as above without a MASTER setting it typically reports local[*].
sc.master
Step 8. Read a local file
textFile=sc.textFile("file:/usr/local/spark/README.md")
textFile.count()
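Besides counting lines, you can preview a few of them with the standard RDD take() action; this short sketch is not part of the book's listing:

textFile.take(3)   # returns the first 3 lines of README.md as a list of strings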
Step 9. Read a file from HDFS
textFile=sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
textFile.count()
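The same RDD can also drive a classic word count. A minimal sketch, assuming textFile still holds the LICENSE.txt RDD loaded above (the counts depend on the file contents):

wordCounts = textFile.flatMap(lambda line: line.split(" ")) \
                     .map(lambda word: (word, 1)) \
                     .reduceByKey(lambda a, b: a + b)
wordCounts.take(5)   # preview a few (word, count) pairs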
For the complete contents of ch09.ipynb, see the book's appendix (Appendix A: Downloading and Installing the Book's Sample Programs) to download this chapter's IPython Notebook sample file.
9.7 Running IPython Notebook in Hadoop YARN-client mode
Start Hadoop, change to the working directory, then launch pyspark in YARN-client mode:

start-all.sh
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
Step 5. View the PySpark application in the Hadoop web UI
http://localhost:8088/
9.8 Running IPython Notebook in Spark Standalone mode
Step 1. Start the Spark Standalone cluster
/usr/local/spark/sbin/start-all.sh
Step 2. Launch IPython Notebook in Spark Standalone mode
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 2 --executor-memory 512m
Step 5. View the Spark Standalone Web UI
http://master:8080/
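To confirm the Standalone executors are actually doing work, a throwaway job can be run from the notebook. A minimal sketch with arbitrary data, not from the book:

rdd = sc.parallelize(range(1000))   # distribute a small dataset across the executors
rdd.sum()                           # should return 499500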
9.9 Command summary for running IPython Notebook in different modes
9.9.1 Launching IPython Notebook in local mode
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
9.9.2 Launching IPython Notebook in Hadoop YARN-client mode
start-all.sh

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
9.9.3 Launching IPython Notebook in Spark Standalone mode
Start Hadoop (for HDFS access):
start-all.sh

Start the Spark Standalone cluster:
/usr/local/spark/sbin/start-all.sh

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 3 --executor-memory 512m
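Whichever mode was started, a quick cell in the notebook shows what the SparkContext was actually given; a minimal sketch whose values depend on the flags used above:

print(sc.master)               # local[*], yarn-client/yarn, or spark://master:7077
print(sc.defaultParallelism)   # roughly reflects the cores available to the application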


(Figure from the Spark official website: https://spark.apache.org/)
2 comments:

  1. mkdir -p ~/pythonwork/ipynotebook, pythonwork -> pythonsparkexample

  2. mkdir -p ~/pythonsparkexample/ipynotebook