Apache Pig script example

posted on Nov 20th, 2016

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language.

Prerequisites

1) A machine running the Ubuntu 14.04 LTS operating system

2) Apache Hadoop 2.6.4 preinstalled (How to install Hadoop on Ubuntu 14.04)

3) Apache Pig preinstalled (How to install Pig on Ubuntu 14.04)

Pig Script Example

We will see how to run Apache Pig scripts in batch mode.

Comments in Pig Script

While writing a script in a file, we can include comments in it as shown below.

a) Multi-line comments

/* ...... */ - multiline comment

b) Single-line comments

-- single-line comment
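As a minimal sketch of both comment styles in one file, the shell snippet below writes a small commented script. The pig_demo directory and the commented_script.pig file name are assumptions made for illustration and are not part of this post's setup; the LOAD statement mirrors the one used later in the post.

```shell
#!/bin/sh
# Hypothetical working directory for this illustration (not from the post).
DEMO_DIR="${DEMO_DIR:-$PWD/pig_demo}"
mkdir -p "$DEMO_DIR"

# Quoted heredoc delimiter: the shell copies the text through unchanged.
cat > "$DEMO_DIR/commented_script.pig" <<'EOF'
/* Multi-line comment:
   load the comma-separated employee file
   and print every record. */
employee = LOAD 'employee.txt' USING PigStorage(',')
           AS (id:int, name:chararray, age:int);  -- single-line comment
DUMP employee;  -- print the relation to the console
EOF

echo "wrote $DEMO_DIR/commented_script.pig"
```

Everything between /* and */ is ignored by Pig, as is everything from -- to the end of the line.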

Executing Pig Script in Batch mode

1) Local Mode

Step 1 - Create an employee.txt file.

employee.txt

Step 2 - Add the following lines to the employee.txt file.

1201,satish,25
1202,krishna,28
1203,amith,39
1204,javed,23
1205,prudvi,23
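Steps 1 and 2 can also be scripted. The sketch below writes the same five records and then counts rows that do not have exactly three comma-separated fields, since a malformed row would load with missing values. The pig_demo directory is an assumed location for illustration; the post itself keeps the file in /home/hduser/Desktop/.

```shell
#!/bin/sh
# Hypothetical working directory for this illustration (not from the post).
DEMO_DIR="${DEMO_DIR:-$PWD/pig_demo}"
mkdir -p "$DEMO_DIR"

# Write the five sample records: id,name,age
cat > "$DEMO_DIR/employee.txt" <<'EOF'
1201,satish,25
1202,krishna,28
1203,amith,39
1204,javed,23
1205,prudvi,23
EOF

# Count rows that do not have exactly three comma-separated fields.
bad_rows=$(awk -F',' 'NF != 3 { n++ } END { print n + 0 }' "$DEMO_DIR/employee.txt")
echo "malformed rows: $bad_rows"
```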

Step 3 - Create a sample pig script.

sample_script.pig

Step 4 - Add the following lines to sample_script.pig. Save and close.

employee = LOAD '/home/hduser/Desktop/employee.txt' USING
PigStorage(',') as (id:int,name:chararray,age:int);
Dump employee;
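If the data file lives somewhere else on your machine, one option is to generate the script with the path filled in. This is only a sketch: the DATA_FILE and DEMO_DIR variables and the pig_demo directory are assumptions introduced here, and the default path is the one used in this post.

```shell
#!/bin/sh
# Hypothetical variables for this illustration (not from the post).
DEMO_DIR="${DEMO_DIR:-$PWD/pig_demo}"
DATA_FILE="${DATA_FILE:-/home/hduser/Desktop/employee.txt}"
mkdir -p "$DEMO_DIR"

# Unquoted heredoc delimiter: the shell expands $DATA_FILE before writing.
cat > "$DEMO_DIR/sample_script.pig" <<EOF
-- Load the comma-separated file and declare a schema for its three fields.
employee = LOAD '$DATA_FILE' USING PigStorage(',')
           AS (id:int, name:chararray, age:int);
-- Print every tuple of the relation to the console.
DUMP employee;
EOF

# Show the generated script so the substituted path can be checked.
cat "$DEMO_DIR/sample_script.pig"
```

PigStorage(',') splits each line on commas, and the AS clause names and types the resulting fields.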

Step 5 - Change the directory to /usr/local/pig/bin

$ cd /usr/local/pig/bin

Step 6 - Run sample_script.pig. In my case, the script is saved in the /home/hduser/Desktop/PIG/ directory.

$ pig -x local /home/hduser/Desktop/PIG/sample_script.pig

OR

You can also execute it from the Grunt shell using the exec command. Start the shell, then pass exec the location of the script, either on the local file system or in HDFS.

$ pig -x local

grunt> exec /home/hduser/Desktop/PIG/sample_script.pig

grunt> exec hdfs://localhost:9000/user/hduser/pig/sample_script.pig
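The batch-mode command from Step 6 can be wrapped in a small shell function that checks the execution mode and the script path before calling pig. The function name run_pig and its exit codes are assumptions made for this sketch; only the pig -x MODE SCRIPT invocation comes from the post.

```shell
#!/bin/sh
# run_pig MODE SCRIPT: run a Pig script in batch mode (hypothetical helper).
# MODE must be "local" or "mapreduce"; SCRIPT is a path on the local file system.
run_pig() {
    mode=$1
    script=$2

    # Reject anything other than the two modes covered in this post.
    case "$mode" in
        local|mapreduce) ;;
        *) echo "unsupported mode: $mode" >&2; return 2 ;;
    esac

    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi

    if ! command -v pig >/dev/null 2>&1; then
        echo "pig is not on PATH (this post installs it in /usr/local/pig/bin)" >&2
        return 127
    fi

    pig -x "$mode" "$script"
}

# Example call, using the script location from Step 6.
run_pig local /home/hduser/Desktop/PIG/sample_script.pig ||
    echo "run_pig exited with status $?"
```

Checking the inputs first gives a short error message instead of a long Pig stack trace when the path is mistyped.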

2) MapReduce Mode

employee.txt
1201,satish,25
1202,krishna,28
1203,amith,39
1204,javed,23
1205,prudvi,23

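Step 7 - Copy employee.txt from the local file system to HDFS. In my case, the employee.txt file is stored in the /home/hduser/Desktop/ directory. Create the target directory first; if /user/hduser/emp does not exist, the copy creates a file named emp instead of placing employee.txt inside it.

$ hdfs dfs -mkdir -p /user/hduser/emp

$ hdfs dfs -copyFromLocal /home/hduser/Desktop/employee.txt /user/hduser/emp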
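Step 7 can also be wrapped in a helper that makes the upload repeatable. This is a sketch under assumptions: the function name upload_to_hdfs and its exit codes are invented for illustration, and it uses -put -f (overwrite) in place of -copyFromLocal so that rerunning it does not fail when the file is already in HDFS.

```shell
#!/bin/sh
# upload_to_hdfs LOCAL_FILE HDFS_DIR: copy a local file into an HDFS directory,
# creating the directory first (hypothetical helper).
upload_to_hdfs() {
    local_file=$1
    hdfs_dir=$2

    if [ ! -f "$local_file" ]; then
        echo "local file not found: $local_file" >&2
        return 1
    fi
    if ! command -v hdfs >/dev/null 2>&1; then
        echo "hdfs is not on PATH" >&2
        return 127
    fi

    # -p creates missing parent directories and succeeds if the directory exists.
    hdfs dfs -mkdir -p "$hdfs_dir" || return
    # -f overwrites an existing copy, so the step can be repeated.
    hdfs dfs -put -f "$local_file" "$hdfs_dir/" || return
    # List the directory to confirm the file arrived.
    hdfs dfs -ls "$hdfs_dir"
}

# Example call, using the paths from Step 7.
upload_to_hdfs /home/hduser/Desktop/employee.txt /user/hduser/emp ||
    echo "upload_to_hdfs exited with status $?"
```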

Step 8 - Create a sample pig script.

sample_script.pig

Step 9 - Add the following lines to sample_script.pig. Save and close.

employee = LOAD 'hdfs://localhost:9000/user/hduser/emp/employee.txt' USING
PigStorage(',') as (id:int,name:chararray,age:int);
Dump employee;

Executing a Pig Script from HDFS

Step 10 - Run sample_script.pig. In my case, the script is stored in HDFS under /user/hduser/pig/, so it has to be copied there before it can be run from that location.

$ pig -x mapreduce hdfs://localhost:9000/user/hduser/pig/sample_script.pig


OR

You can also execute it from the Grunt shell using the exec command. Start the shell, then pass exec the location of the script, either on the local file system or in HDFS.

$ pig -x mapreduce

grunt> exec /home/hduser/Desktop/PIG/sample_script.pig

grunt> exec hdfs://localhost:9000/user/hduser/pig/sample_script.pig
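Running the script from HDFS assumes it has already been copied there, a step this post does not show. The sketch below stages the local script in HDFS and then runs it in MapReduce mode. The NAMENODE, HDFS_SCRIPT_DIR, and LOCAL_SCRIPT variables are assumptions introduced for illustration, with defaults taken from the paths used above; the run is skipped when hdfs, pig, or the script is missing.

```shell
#!/bin/sh
# Hypothetical variables; the defaults are the locations used in this post.
NAMENODE="${NAMENODE:-hdfs://localhost:9000}"
HDFS_SCRIPT_DIR="${HDFS_SCRIPT_DIR:-/user/hduser/pig}"
LOCAL_SCRIPT="${LOCAL_SCRIPT:-/home/hduser/Desktop/PIG/sample_script.pig}"

# Full HDFS URI of the script, as passed to pig in Step 10.
SCRIPT_URI="$NAMENODE$HDFS_SCRIPT_DIR/$(basename "$LOCAL_SCRIPT")"
echo "script URI: $SCRIPT_URI"

if command -v hdfs >/dev/null 2>&1 &&
   command -v pig >/dev/null 2>&1 &&
   [ -f "$LOCAL_SCRIPT" ]; then
    # Stage the script in HDFS, then run it from there in MapReduce mode.
    hdfs dfs -mkdir -p "$HDFS_SCRIPT_DIR" &&
    hdfs dfs -put -f "$LOCAL_SCRIPT" "$HDFS_SCRIPT_DIR/" &&
    pig -x mapreduce "$SCRIPT_URI"
else
    echo "skipping run: hdfs, pig, or $LOCAL_SCRIPT is not available"
fi
```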
