Cassandra 2.2 and later allows users to define aggregate functions that can be applied to data stored in a table as part of a query result.
- The function must be created before it is used in a SELECT statement, and the query must include only the aggregate function itself, with no other columns.
- The state function is called once for each row, and the value returned by the state function becomes the new state.
- After all rows are processed, the optional final function is executed with the last state value as its argument.
Aggregation is performed by the coordinator. If the query does not restrict the partition key, all the results are brought back to the coordinator before your function is executed, so a UDF/UDA that runs over a full table scan will not be fast on a huge table.
User-defined aggregates work by calling your user-defined function on every row returned by the query. They differ from a plain function in that the first argument is state that is passed from row to row, much like a fold.
Creating an aggregate is a two- or three-step process:
- Create a state function that takes the state as its first parameter, followed by any number of additional parameters
- (Optionally) Create a final function that is called once, after the state function has been called on every row
- Refer to these in an aggregate. Its initial state (INITCOND) is null unless you specify one, so by default it returns null for an empty table
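The three steps above can be modelled in plain Java to make the fold explicit. This is only a sketch of the semantics, not Cassandra's API: the class `AggregateSketch`, the methods `aggregate`, `avgState` and `avgFinal`, and the sample values are all hypothetical. Returning null from the final function when the state is still null is a choice made in this sketch.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical model of how an aggregate folds over rows; not Cassandra code.
final class AggregateSketch {

    // Step 3: the aggregate ties the state function, the final function
    // and the initial state (INITCOND) together.
    static <S, V, R> R aggregate(List<V> rows, S initCond,
                                 BiFunction<S, V, S> stateFunc,
                                 Function<S, R> finalFunc) {
        S state = initCond;
        for (V row : rows) {
            // The value returned by the state function becomes the new state.
            state = stateFunc.apply(state, row);
        }
        // The final function runs once, on the last state value.
        return finalFunc.apply(state);
    }

    // Step 1: state function for an average; the state is {count, sum}.
    static long[] avgState(long[] state, Integer value) {
        if (state == null) state = new long[] {0, 0}; // first row after a null INITCOND
        if (value == null) return state;              // skip null column values
        state[0] += 1;
        state[1] += value;
        return state;
    }

    // Step 2: final function that turns {count, sum} into the average.
    static Double avgFinal(long[] state) {
        if (state == null || state[0] == 0) return null; // no rows were seen
        return (double) state[1] / state[0];
    }

    public static void main(String[] args) {
        Double avg = aggregate(List.of(10, 20, 60), null,
                AggregateSketch::avgState, AggregateSketch::avgFinal);
        System.out.println(avg); // prints 30.0

        List<Integer> noRows = List.of();
        Double empty = aggregate(noRows, null,
                AggregateSketch::avgState, AggregateSketch::avgFinal);
        System.out.println(empty); // prints null
    }
}
```

With a null initial state and no rows, the state function is never called, which is why the result for an empty input is null.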
Some examples:
-- State function: adds amount to the running total stored under key type.
CREATE FUNCTION state_group_and_total( state map<text, int>, type text, amount int )
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
  Integer count = (Integer) state.get(type);
  if (count == null) count = amount;
  else count = count + amount;
  state.put(type, count);
  return state;
';

-- Aggregate: starts from an empty map and has no final function.
CREATE OR REPLACE AGGREGATE group_and_total(text, int)
SFUNC state_group_and_total
STYPE map<text, int>
INITCOND {};

SELECT group_and_total(customer_id, amount) FROM customer_purchases;
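To see what the aggregate above computes, the same state function body can be run outside Cassandra. The following is a minimal sketch in plain Java. The class `GroupAndTotalSketch`, the method names and the sample rows (customer id, amount) are made up for illustration and do not come from a real table.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone model of group_and_total; not Cassandra code.
final class GroupAndTotalSketch {

    // Same logic as the body of state_group_and_total.
    static Map<String, Integer> stateGroupAndTotal(Map<String, Integer> state,
                                                   String type, int amount) {
        Integer count = state.get(type);
        if (count == null) count = amount;
        else count = count + amount;
        state.put(type, count);
        return state;
    }

    // Folds the state function over the rows, starting from an empty map
    // (the counterpart of INITCOND {}).
    static Map<String, Integer> groupAndTotal(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> state = new HashMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            state = stateGroupAndTotal(state, row.getKey(), row.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        // Made-up rows: (customer_id, amount).
        List<Map.Entry<String, Integer>> rows = List.of(
                Map.entry("alice", 10),
                Map.entry("bob", 5),
                Map.entry("alice", 7));
        // The result maps each customer id to the sum of its amounts.
        System.out.println(groupAndTotal(rows));
    }
}
```

Because the state is a single map that grows with the number of distinct keys, the result for a table with many customers is one large value built on the coordinator.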
Reference:
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDA.html
http://christopher-batey.blogspot.ca/2015/05/cassandra-aggregates-min-max-avg-group.html