Thursday, 4 September 2014

Lateral View in Hive


Lateral view is used in conjunction with user-defined "table generating functions"(UDTF) such as explode(), parse_url_tuple. A UDTF generates zero or more output rows for each input row. 
A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

1. Syntax

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*

For example:
SELECT b.*
FROM hw LATERAL VIEW parse_url_tuple("http://us-u.openx.net/w/1.0/sd?id=537072399", 
'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id

2. explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. 
UDTF's can be used in the SELECT expression list and as a part of LATERAL VIEW.



pageid
adid_list
contact_page[3, 4, 5]
front_page[1, 2, 3]

Let's say the user would like to count the total number of times an ad appears across all pages.

A lateral view with explode() can be used to convert adid_list into separate rows using the query:


SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

The resulting output will be
pageid
adid
"front_page"1
"front_page"2
"front_page"3
"contact_page"3
"contact_page"4
"contact_page"5

Then in order to count the number of times a particular ad appears, count/group by can be used:


SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;


adidcount(1)
11
21
32
41
51


Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode

No comments:

Post a Comment