attack on the Internet. Here are field definition and example:
User ID TimeStamp Activity Count
123 9:45am 10
234 9:46am 12
234 9:50am 20
456 9:53am 100
123 9:55am 33
456 9:56am 312
123 10:03am 110
123 10:16am 312
234 10:20am 201
456 10:23am 180
123 10:25am 393
456 10:27am 312
Please design an algorithm to identify user IDs that have more than 500
activities within any given 10 minutes.
1. Hive Solution:
SELECT distinct a.id FROM (
SELECT a.id, a.time FROM table a JOIN table b ON a.id=b.id WHERE TIMEDIFF(a.time, b.time)<=10 AND TIMEDIFF(a.time, b.time)>=0 GROUP BY a.id, a.time HAVING SUM(b.count)>500
)
2. Another approach to select data is a time window(the last 30 days) is:
where to_date(timestamp) < from_unixtime(unix_timestamp(), 'yyyy-MM-dd') and to_date(timestamp) >= date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),30)Here, to_date converts string to yyyy-MM-dd format. unix_timestamp() gets the current sys timestamp in seconds.
No comments:
Post a Comment