To achieve this, we collect two columns (Category as the key, Metric as the value) into a single map per ID, then look each category up in that map to build the pivoted columns. Brickhouse provides a collect UDAF for building such maps.
Example. Before:

| ID | Category | Metric |
|----|----------|--------|
| 1  | A        | 3      |
| 1  | C        | 5      |
| 2  | B        | 7      |
| 3  | A        | 2      |
| 3  | B        | 1      |
| 4  | D        | 4      |
After:

| ID | A | B | C | D |
|----|---|---|---|---|
| 1  | 3 | 0 | 5 | 0 |
| 2  | 0 | 7 | 0 | 0 |
| 3  | 2 | 1 | 0 | 0 |
| 4  | 0 | 0 | 0 | 4 |
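Conceptually, the inner collect step gathers each ID's Category/Metric pairs into one map, so the sample data above becomes roughly:

| ID | group_map          |
|----|--------------------|
| 1  | {"A":3, "C":5}     |
| 2  | {"B":7}            |
| 3  | {"A":2, "B":1}     |
| 4  | {"D":4}            |

Looking up each category key in that map, with missing keys defaulting to 0, yields the pivoted rows in the After table. The Hive statements below do exactly that (group_map is simply the alias used in the query).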
add jar brickhouse-0.7.0-SNAPSHOT.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

create table category_members_vector
row format delimited fields terminated by ','
stored as textfile
as
select Id,
       COALESCE(group_map['A'], 0) as A,
       COALESCE(group_map['B'], 0) as B,
       COALESCE(group_map['C'], 0) as C,
       COALESCE(group_map['D'], 0) as D
from (
  select Id, collect(Category, Metric) as group_map
  from category_members
  group by Id
) gm;
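To verify the result (using the table name and column aliases from the statement above), a simple select should match the After table row for row:

select Id, A, B, C, D
from category_members_vector
order by Id;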
Notes on the CREATE TABLE AS SELECT (CTAS) statement:
- The target cannot be a partitioned table.
- The target cannot be an external table.
- It creates the table structure from the query and copies the data into it. (If you do need a partitioned or external target, see the sketch after this list.)
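One possible workaround for the partitioned/external restrictions is to create the target table explicitly and load it with INSERT ... SELECT instead of CTAS. This is only a sketch; the table name, column types, and HDFS location below are illustrative assumptions, not from the original example:

-- Illustrative only: create the external table up front,
-- then populate it from the same collect()-based query.
create external table category_members_vector_ext (
  Id int, A int, B int, C int, D int
)
row format delimited fields terminated by ','
stored as textfile
location '/tmp/category_members_vector_ext';

insert overwrite table category_members_vector_ext
select Id,
       COALESCE(group_map['A'], 0),
       COALESCE(group_map['B'], 0),
       COALESCE(group_map['C'], 0),
       COALESCE(group_map['D'], 0)
from (
  select Id, collect(Category, Metric) as group_map
  from category_members
  group by Id
) gm;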
References:
https://github.com/klout/brickhouse/wiki/Collect-UDFs
http://brickhouseconfessions.wordpress.com/2013/03/05/use-collect-to-avoid-the-self-join/
http://stackoverflow.com/questions/23025380/how-to-transpose-pivot-data-in-hive
http://www.qubole.com/hive-function-cheat-sheet/