Apache Pig diagnostic operators

posted on Nov 20th, 2016

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts away the low-level Java MapReduce idiom into a higher-level notation, much as SQL does for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs), which the user can write in Java, Python, JavaScript, Ruby, or Groovy and then call directly from the language.
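To give a sense of how compact Pig Latin is compared to hand-written Java MapReduce, here is a minimal word-count sketch (the input path and alias names are placeholders for illustration only, not part of this post):

-- load each line of a text file as a single chararray
lines  = LOAD 'input.txt' AS (line:chararray);
-- split each line into words and flatten the resulting bag
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- group identical words and count each group
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group AS word, COUNT(words) AS total;
DUMP counts;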

Prerequisites

1) A machine with the Ubuntu 14.04 LTS operating system

2) Apache Hadoop 2.6.4 pre-installed (How to install Hadoop on Ubuntu 14.04)

3) Apache Pig pre-installed (How to install Pig on Ubuntu 14.04)

Pig Diagnostic Operators

The LOAD statement simply loads data into the specified relation in Apache Pig; it does not show you anything by itself. To verify what a LOAD statement produced, you use the diagnostic operators. Pig Latin provides four diagnostic operators:

1) Dump operator

2) Describe operator

3) Explain operator

4) Illustrate operator

Step 1 - Change the directory to /usr/local/pig/bin

$ cd /usr/local/pig/bin

Step 2 - Enter the Grunt shell in MapReduce mode.

$ ./pig -x mapreduce
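If you just want to experiment against the local file system without HDFS, Pig can also be started in local mode (standard Pig usage, not a step in this tutorial):

$ ./pig -x local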

Step 3 - Create a pig directory in HDFS. Make sure the Hadoop daemons are running.

$ hdfs dfs -mkdir /user/hduser/pig
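If the /user/hduser parent directory does not exist yet, the -p flag creates the whole path, and an ls confirms the directory is there:

$ hdfs dfs -mkdir -p /user/hduser/pig
$ hdfs dfs -ls /user/hduser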

Step 4 - Create a student_data.txt file.

student_data.txt

Step 5 - Add the following lines to the student_data.txt file. Save and close.

001,Rajiv,Reddy,9848022337,Hyderabad
002,siddarth,Battacharya,9848022338,Kolkata
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar
006,Archana,Mishra,9848022335,Chennai

Step 6 - Copy the student_data.txt file from the local file system to HDFS. In my case, the student_data.txt file is stored in the /home/hduser/Desktop/PIG/ directory.

$ hdfs dfs -copyFromLocal /home/hduser/Desktop/PIG/student_data.txt /user/hduser/pig/

Step 7 - Verify the copy by using the cat command.

$ hdfs dfs -cat hdfs://localhost:9000/user/hduser/pig/student_data.txt

Step 8 - Load the data into a relation named student.

student = LOAD 'hdfs://localhost:9000/user/hduser/pig/student_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
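Because /user/hduser is the HDFS home directory of the hduser account, the same statement can usually be written with a relative path and without the scheme and port; this is a sketch assuming fs.defaultFS is picked up from your Hadoop configuration:

student = LOAD 'pig/student_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);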

Dump Operator - The Dump operator is used to run the Pig Latin statements and display the results on the screen. It is generally used for debugging purposes.

dump student;
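With the sample file above, the tuples printed by dump should look roughly like this (MapReduce progress logs omitted; the leading zeros are dropped because id is declared as int):

(1,Rajiv,Reddy,9848022337,Hyderabad)
(2,siddarth,Battacharya,9848022338,Kolkata)
(3,Rajesh,Khanna,9848022339,Delhi)
(4,Preethi,Agarwal,9848022330,Pune)
(5,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(6,Archana,Mishra,9848022335,Chennai)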

Describe Operator - The describe operator is used to view the schema of a relation.

describe student;
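For the relation loaded above, describe should print the schema roughly as follows:

student: {id: int,firstname: chararray,lastname: chararray,phone: chararray,city: chararray}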

Explain Operator - The explain operator is used to display the logical, physical, and MapReduce execution plans of a relation.

explain student;
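The printed plans can be long. Explain can also write them to files or explain an entire script; these are standard Pig options, and the output directory and script name below are only examples:

explain -out /home/hduser/Desktop/PIG/plans student;
explain -script myscript.pig;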

Illustrate Operator - The illustrate operator shows the step-by-step execution of a sequence of statements on a small sample of the data.

illustrate student;
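Illustrate is most useful on a multi-step pipeline, because it shows a sampled row at every stage. A minimal sketch (the FILTER below is only an illustration, not part of the original steps):

-- keep only the students from Hyderabad, then trace the sample through both steps
hyd_students = FILTER student BY city == 'Hyderabad';
illustrate hyd_students;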

