wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.6.tgz

tar -xvf spark-2.1.0-bin-hadoop2.6.tgz -C ./



Install Spark on a YARN cluster

Spark DataFrame Operations

DataFrame Column Operations

To select a column from the DataFrame:

val ageCol = people("age")

The Column type can also be manipulated through its various functions.

// The following creates a new column that increases everybody's age by 10.

people("age") + 10 // in Sc......

Common Spark SQL Commands

Create a Dummy DataFrame

Prepare a sample JSON file called employee.json:


{"id" : "1201", "name" : "satish", "age" : "25"}

{"id" : "1202", "name" : "krishna", "age" : "28"}
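
One way to create the file from the command line is a heredoc. This is a minimal sketch that writes only the two records shown above into employee.json in the current directory (the location is an assumption; put the file wherever Spark can read it). Each record sits on its own line because Spark's JSON reader expects one JSON object per line by default.

```shell
# Write the two sample records shown above, one JSON object per line.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > employee.json <<'EOF'
{"id" : "1201", "name" : "satish", "age" : "25"}
{"id" : "1202", "name" : "krishna", "age" : "28"}
EOF

# Sanity check: count the records that were written.
wc -l < employee.json
```

Note that every value here, including age, is a quoted string, so Spark will most likely infer string columns for all three fields.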