Apache Pig and Apache Hive for data processing

I'm new to Hadoop, and I'm currently researching Hive and Pig for data processing.

I found that Hive is good for the data warehouse (data presentation) and Pig is good for the data factory (data preparation).

I would like to clearly understand the use cases for both Hive and Pig.

Thank you!

-------------Problems Reply------------


Pig Latin is procedural, while SQL is declarative. Pig allows the developer to directly select specific operator implementations, and it supports splits in the pipeline. If you are well versed in SQL, go ahead with Hive, but writing that part in Pig lets you apply your own optimizations and gives you better control over the data. Fundamentally, both Hive and Pig code translate into MapReduce jobs. Hive is like SQL, so for any SQL developer the learning curve for Hive will be almost negligible. Hive has also gained popularity because it is supported by Hue.
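To make the procedural vs. declarative point concrete, here is a rough sketch in plain Python (not actual Pig Latin or HiveQL). The sample data, the column names and both function names are made up for illustration. The first function spells out each step in order, the way a Pig script does; the second hands a single SQL statement to an engine (SQLite here, standing in for Hive) and lets it decide how to execute it.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (visitor, url, seconds_on_page)
VISITS = [
    ("alice", "/home", 12),
    ("bob", "/home", 3),
    ("alice", "/docs", 40),
    ("carol", "/docs", 25),
    ("bob", "/docs", 8),
]


def procedural_totals(rows, min_seconds):
    """Pig-like style: every step is written out and runs in that order."""
    # Step 1: filter (roughly what FILTER ... BY does in Pig)
    filtered = [r for r in rows if r[2] >= min_seconds]
    # Step 2: group by url (roughly GROUP ... BY)
    grouped = defaultdict(list)
    for _visitor, url, seconds in filtered:
        grouped[url].append(seconds)
    # Step 3: aggregate each group
    return {url: sum(secs) for url, secs in grouped.items()}


def declarative_totals(rows, min_seconds):
    """Hive-like style: describe the result, the engine picks the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (visitor TEXT, url TEXT, seconds INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT url, SUM(seconds) FROM visits "
        "WHERE seconds >= ? GROUP BY url",
        (min_seconds,),
    )
    result = dict(cursor)  # each row is a (url, total) pair
    conn.close()
    return result
```

Both functions return the same totals. The difference is only in who decides the order of the steps: you, or the query engine.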

Generally, the ETL part of an application is written in Pig, while ad hoc queries go well with Hive, although you can use either one for your data transformations. Pig follows a multi-query approach, which cuts down on the number of times the data is scanned. Pig also has a wide user base: for instance, reportedly 90% of Yahoo’s MapReduce is done by Pig, 80% of Twitter’s MapReduce is also done by Pig, and various other companies such as Salesforce, LinkedIn, AOL and Nokia also employ Pig.
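The multi-query idea can be sketched in plain Python as well. This is only an illustration of why sharing one scan between several outputs saves reads, not how Pig implements it; the log records, the `scan` helper and both function names are hypothetical.

```python
# Hypothetical log records standing in for a large input file.
LOGS = [
    {"status": 200, "bytes": 512},
    {"status": 500, "bytes": 128},
    {"status": 404, "bytes": 64},
    {"status": 503, "bytes": 256},
]


def scan(records, counter):
    """Yield records one by one, counting every read from 'storage'."""
    for rec in records:
        counter["reads"] += 1
        yield rec


def separate_passes(records):
    """Two independent queries: the input is scanned once per output."""
    counter = {"reads": 0}
    errors = [r for r in scan(records, counter) if r["status"] >= 500]
    bytes_total = sum(r["bytes"] for r in scan(records, counter))
    return errors, bytes_total, counter["reads"]


def shared_pass(records):
    """Multi-query style: one scan feeds both outputs (a pipeline split)."""
    counter = {"reads": 0}
    errors, bytes_total = [], 0
    for r in scan(records, counter):
        if r["status"] >= 500:
            errors.append(r)
        bytes_total += r["bytes"]
    return errors, bytes_total, counter["reads"]
```

Both versions produce the same two outputs, but the shared pass reads each record once instead of twice. On inputs that live on HDFS and run to terabytes, that saving is the point of the approach.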

You can also read this blog post, which is a good comparison: https://developer.yahoo.com/blogs/hadoop/comparing-pig-latin-sql-constructing-data-processing-pipelines-444.html

Category:hadoop Views:1 Time:2019-01-11
