Lateral view is used in conjunction with user-defined "table generating functions"(UDTF) such as
explode(), parse_url_tuple. A UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.
1. SyntaxlateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*fromClause: FROM baseTable (lateralView)*
For example:SELECT b.*
FROM hw LATERAL VIEW parse_url_tuple("http://us-u.openx.net/w/1.0/sd?id=537072399",
'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id
2. explode() takes
in an array (or a map) as an input and outputs the elements of the
array (map) as separate rows. UDTF's can be used in the SELECT
expression list and as a part of LATERAL VIEW.
pageid
|
adid_list
|
|---|---|
| contact_page | [3, 4, 5] |
| front_page | [1, 2, 3] |
Let's say the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list into separate rows using the query:
SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid; |
The resulting output will be
pageid
|
adid
|
|---|---|
| "front_page" | 1 |
| "front_page" | 2 |
| "front_page" | 3 |
| "contact_page" | 3 |
| "contact_page" | 4 |
| "contact_page" | 5 |
Then in order to count the number of times a particular ad appears, count/group by can be used:
SELECT adid, count(1) FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid GROUP BY adid;
| adid | count(1) |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
No comments:
Post a Comment