Lateral view is used in conjunction with user-defined "table generating functions"(UDTF) such as
explode(), parse_url_tuple
. A UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
1. Syntax
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (
','
columnAlias)*
fromClause: FROM baseTable (lateralView)*
For example:
SELECT b.* FROM hw LATERAL VIEW parse_url_tuple("http://us-u.openx.net/w/1.0/sd?id=537072399", 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id
2. explode()
takes
in an array (or a map) as an input and outputs the elements of the
array (map) as separate rows.
UDTF's can be used in the SELECT
expression list and as a part of LATERAL VIEW.
pageid
|
adid_list
|
---|---|
contact_page | [3, 4, 5] |
front_page | [1, 2, 3] |
Let's say the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list
into separate rows using the query:
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid; |
The resulting output will be
pageid
|
adid
|
---|---|
"front_page" | 1 |
"front_page" | 2 |
"front_page" | 3 |
"contact_page" | 3 |
"contact_page" | 4 |
"contact_page" | 5 |
Then in order to count the number of times a particular ad appears, count/group by can be used:
SELECT adid, count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
adid | count(1) |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 1 |
5 | 1 |
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
No comments:
Post a Comment