Sunday, 4 January 2015

Class Constructor in Scala

1. Visibility of constructor fields
  • If a field is declared as a var, Scala generates both getter and setter methods for that field.
  • If the field is a val, Scala generates only a getter method for it.
  • If a field doesn’t have a var or val modifier, Scala gets conservative, and doesn’t generate a getter or setter method for the field.
  • Additionally, var and val fields can be modified with the private keyword, which keeps the getters and setters from being accessible outside the class.
  • Case class constructor parameters are val by default. 
2. The primary constructor of a Scala class is a combination of:
  • The constructor parameters
  • Methods that are called in the body of the class
  • Statements and expressions that are executed in the body of the class
Anything defined within the body of the class other than method declarations is a part of the primary class constructor.

3. Define one or more auxiliary constructors for a class to give consumers of the class different ways to create object instances.

  • Auxiliary constructors are defined by creating methods named this.
  • Each auxiliary constructor must begin with a call to a previously defined constructor.
  • Each constructor must have a different signature.
  • One constructor calls another constructor with the name this.
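
The rules above can be put together in a small sketch. Person and Point are hypothetical classes made up for this illustration, and the commented-out lines mark accesses that would not compile.

```scala
// Hypothetical classes for illustration only.
class Person(var name: String, val id: Int, nickname: String, private var secret: String) {
  // Statements in the class body run as part of the primary constructor.
  println(s"constructing $name")

  // A val in the body is also initialized by the primary constructor.
  val label: String = s"$name ($nickname)"

  // Auxiliary constructor: a method named this that first calls another constructor.
  def this(name: String) = this(name, 0, "n/a", "none")

  // Private fields remain usable inside the class.
  def secretLength: Int = secret.length
}

// Case class constructor parameters are vals by default.
case class Point(x: Int, y: Int)

object ConstructorDemo {
  def main(args: Array[String]): Unit = {
    val p = new Person("Ann", 1, "annie", "s3cret")
    p.name = "Anna"        // var: getter and setter are available
    println(p.id)          // val: getter only; `p.id = 2` would not compile
    // println(p.nickname) // no val/var: no accessor, would not compile
    // println(p.secret)   // private: not accessible from outside the class
    println(p.label)       // computed once, during construction
    println(new Person("Bob").id)
    println(Point(1, 2).x)
  }
}
```

Note that label keeps the value computed during construction, even after name is reassigned.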

Saturday, 3 January 2015

Read a file then Write another file in Scala

import java.io._

object CopyBytes extends App {

  var in = None: Option[FileInputStream]
  var out = None: Option[FileOutputStream]

  try {
    in = Some(new FileInputStream("/tmp/Test.class"))
    out = Some(new FileOutputStream("/tmp/Test.class.copy"))
    var c = 0
    while ({c = in.get.read; c != -1}) {
      out.get.write(c)
    }
  } catch {
    case e: IOException => e.printStackTrace
  } finally {
    println("entered finally ...")
    if (in.isDefined) in.get.close
    if (out.isDefined) out.get.close
  }

}
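
As an alternative sketch, and assuming Scala 2.13 or later, scala.util.Using can close both streams automatically. The object name CopyBytesUsing and the temporary files are made up for this example; it also reads into a buffer instead of one byte at a time.

```scala
import java.io.{FileInputStream, FileOutputStream}
import java.nio.file.Files
import scala.util.{Try, Using}

object CopyBytesUsing {
  // Copies src to dest and returns the number of bytes copied.
  // Any exception ends up in the returned Try as a Failure.
  def copy(src: String, dest: String): Try[Long] =
    Using.Manager { use =>
      // Resources registered with `use` are closed when the block ends.
      val in = use(new FileInputStream(src))
      val out = use(new FileOutputStream(dest))
      val buffer = new Array[Byte](8192)
      var total = 0L
      var n = in.read(buffer)
      while (n != -1) {
        out.write(buffer, 0, n)
        total += n
        n = in.read(buffer)
      }
      total
    }

  def main(args: Array[String]): Unit = {
    // Temporary files stand in for the /tmp paths used in the original example.
    val src = Files.createTempFile("copy-src", ".bin")
    val dest = Files.createTempFile("copy-dest", ".bin")
    Files.write(src, "hello".getBytes("UTF-8"))
    println(copy(src.toString, dest.toString))
  }
}
```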

Pattern Matching in Match Expressions

Different Patterns:

1. Constant patterns

A constant pattern can only match itself.
case 0 => "zero"

2. Variable patterns

A variable pattern matches any object, just like the _ wildcard character, and binds the matched value to the variable name.
case foo => s"Hmm, you gave me a $foo"

3. Constructor patterns
The constructor pattern lets you match a constructor in a case statement.
case Person(first, "Alexander") => s"An Alexander, first name = $first"

4. Sequence patterns
Use the _ character to stand for one element in the sequence, and use _* to stand for “zero or more elements”.
case List(1, _*) => "a list beginning with 1, having any number of elements"

5. Tuple patterns
Match tuple patterns and access the value of each element in the tuple. Use the _ wildcard if you’re not interested in the value of an element:
case (a, b, c, _) => s"4-elem tuple: got $a, $b, and $c"

6. Type patterns
A type pattern matches on the type of the value. Here, list is the pattern variable, which can be accessed in the expression.
case list: List[_] => s"thanks for the List: $list"

7. Class patterns



trait Animal
case class Dog(name: String) extends Animal
case class Cat(name: String) extends Animal

  def determineType(x: Animal): String = x match {
    case Dog(moniker) => "Got a Dog, name = " + moniker
    case _:Cat => "Got a Cat (ignoring the name)"
    case _ => "That was something else"
  }
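
To try the individual case lines from patterns 1 to 6 together, they can be combined into one match expression. This is only a sketch: the object name PatternDemo and the Person case class are assumptions made for this example. The variable pattern goes last because it matches everything.

```scala
object PatternDemo {
  // Hypothetical case class, needed for the constructor pattern.
  case class Person(first: String, last: String)

  def describe(x: Any): String = x match {
    case 0 => "zero"                                                      // constant pattern
    case Person(first, "Alexander") => s"An Alexander, first name = $first" // constructor pattern
    case List(1, _*) => "a list beginning with 1, having any number of elements" // sequence pattern
    case (a, b, c, _) => s"4-elem tuple: got $a, $b, and $c"              // tuple pattern
    case list: List[_] => s"thanks for the List: $list"                   // type pattern
    case foo => s"Hmm, you gave me a $foo"                                // variable pattern
  }

  def main(args: Array[String]): Unit = {
    println(describe(0))
    println(describe(Person("Al", "Alexander")))
    println(describe(List(1, 2, 3)))
    println(describe((1, 2, 3, 4)))
    println(describe(List(2, 3)))
    println(describe("hi"))
  }
}
```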


How a for loop is translated



1. How a for loop is translated under various conditions.

  1. A simple for loop that iterates over a collection is translated to a foreach method call on the collection.
  2. A for loop with a guard is translated to a sequence of a withFilter method call on the collection followed by a foreach call.
  3. A for loop with a yield expression is translated to a map method call on the collection.
  4. A for loop with a yield expression and a guard is translated to a withFilter method call on the collection, followed by a map method call.

For example, the following two statements are equivalent.
scala> val out = for (e <- fruits) yield e.toUpperCase
scala> val out = fruits.map(_.toUpperCase)
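
The four translation rules can be compared side by side in a short sketch. The contents of fruits and the object name ForTranslationDemo are assumed here for illustration.

```scala
object ForTranslationDemo {
  val fruits = List("apple", "banana", "cherry") // sample data, assumed

  // Rule 3: for + yield corresponds to map.
  val viaFor: List[String] = for (e <- fruits) yield e.toUpperCase
  val viaMap: List[String] = fruits.map(_.toUpperCase)

  // Rule 4: for + guard + yield corresponds to withFilter followed by map.
  val guardedFor: List[String] = for (e <- fruits if e.startsWith("b")) yield e.toUpperCase
  val guardedMap: List[String] = fruits.withFilter(_.startsWith("b")).map(_.toUpperCase)

  def main(args: Array[String]): Unit = {
    // Rule 1: a simple for loop corresponds to foreach.
    for (e <- fruits) println(e)
    fruits.foreach(println)

    // Rule 2: a for loop with a guard corresponds to withFilter followed by foreach.
    for (e <- fruits if e.length > 5) println(e)
    fruits.withFilter(_.length > 5).foreach(println)

    println(viaFor == viaMap)
    println(guardedFor == guardedMap)
  }
}
```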

One way to check these translations is to compile a small class and have scalac print the code after the parser phase:

class Main {
  for (i <- 1 to 10) println(i)
}

$ scalac -Xprint:parse Main.scala



2. Use a yield statement with a for loop and your algorithm to create a new collection from an existing collection.


  • When it begins running, the for/yield loop immediately creates a new, empty collection that is of the same type as the input collection. For example, if the input type is a Vector, the output type will also be a Vector. You can think of this new collection as being like a bucket.
  • On each iteration of the for loop, a new output element is created from the current element of the input collection. When the output element is created, it’s placed in the bucket.
  • When the loop finishes running, the entire contents of the bucket are returned.
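
A quick sketch of the "bucket" behaviour described above, with made-up input values. The result generally has the same kind of collection as the input, though there are exceptions: a Range, for example, yields an IndexedSeq.

```scala
object YieldTypeDemo {
  // Vector in, Vector out.
  val fromVector: Vector[Int] = for (n <- Vector(1, 2, 3)) yield n * 2

  // List in, List out. The body can be a multi-line algorithm.
  val fromList: List[String] = for (s <- List("a", "b")) yield {
    val upper = s.toUpperCase
    upper + upper
  }

  // A Range cannot hold arbitrary elements, so the result is an IndexedSeq.
  val fromRange: IndexedSeq[Int] = for (i <- 1 to 3) yield i * i

  def main(args: Array[String]): Unit = {
    println(fromVector)
    println(fromList)
    println(fromRange)
  }
}
```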

Friday, 2 January 2015

Find User Activities in a Time Window in Hive Table

Suppose a website tracks user activities in order to detect robot
attacks. Here are the field definitions and some example data:
User ID    TimeStamp    Activity Count
123        9:45am       10
234        9:46am       12
234        9:50am       20
456        9:53am       100
123        9:55am       33
456        9:56am       312
123        10:03am      110
123        10:16am      312
234        10:20am      201
456        10:23am      180
123        10:25am      393
456        10:27am      312

Please design an algorithm to identify user IDs that have more than 500 
activities within any given 10 minutes.

1. Hive Solution:


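-- activity_log is a placeholder table name (table itself is a reserved word).
-- Assumes time is stored as a 'yyyy-MM-dd HH:mm:ss' string; Hive has no built-in TIMEDIFF.
SELECT DISTINCT t.id FROM
(
 SELECT a.id, a.time FROM activity_log a
 JOIN activity_log b ON a.id = b.id
 -- keep rows of b that fall in the 10 minutes (600 seconds) ending at a.time
 WHERE unix_timestamp(a.time) - unix_timestamp(b.time) BETWEEN 0 AND 600
 GROUP BY a.id, a.time
 HAVING SUM(b.count) > 500
) t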
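
Outside Hive, the same idea can be sketched in Scala with a sliding window per user. This is a minimal sketch under stated assumptions: the names Activity, heavyUsers and exceeds are made up, timestamps are converted to minutes since midnight (9:45am = 585), and two events exactly 10 minutes apart count as being in the same window.

```scala
object WindowDemo {
  // minute = minutes since midnight, e.g. 9:45am = 585 (an assumed encoding).
  final case class Activity(userId: Int, minute: Int, count: Int)

  // The example rows from the table above.
  val sample: Seq[Activity] = Seq(
    Activity(123, 585, 10), Activity(234, 586, 12), Activity(234, 590, 20),
    Activity(456, 593, 100), Activity(123, 595, 33), Activity(456, 596, 312),
    Activity(123, 603, 110), Activity(123, 616, 312), Activity(234, 620, 201),
    Activity(456, 623, 180), Activity(123, 625, 393), Activity(456, 627, 312)
  )

  // True if some window of `window` minutes has a total count above `threshold`.
  private def exceeds(sorted: Vector[Activity], window: Int, threshold: Int): Boolean = {
    var start = 0
    var end = 0
    var sum = 0
    var found = false
    while (end < sorted.length && !found) {
      sum += sorted(end).count
      // Drop events that are more than `window` minutes older than the newest one.
      while (sorted(end).minute - sorted(start).minute > window) {
        sum -= sorted(start).count
        start += 1
      }
      if (sum > threshold) found = true
      end += 1
    }
    found
  }

  // Groups events by user, sorts each group by time, and slides the window over it.
  def heavyUsers(events: Seq[Activity], window: Int, threshold: Int): Set[Int] =
    events.groupBy(_.userId).collect {
      case (user, evs) if exceeds(evs.sortBy(_.minute).toVector, window, threshold) => user
    }.toSet

  def main(args: Array[String]): Unit =
    println(heavyUsers(sample, 10, 500))
}
```

On the sample data only user 123 is reported: the events at 10:16am and 10:25am add up to 705 activities within 9 minutes.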


2. Another approach, for selecting data in a time window (here, the last 30 days), is:

where to_date(timestamp) < from_unixtime(unix_timestamp(), 'yyyy-MM-dd')
and to_date(timestamp) >= date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),30)

Here, to_date extracts the date part (yyyy-MM-dd) from a timestamp string, and unix_timestamp() returns the current system time in seconds since the epoch.


Thursday, 1 January 2015

Regular Expression in Scala


Create a Regex object by invoking the .r method on a String. For example, this pattern matches a sequence of one or more numeric characters:


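scala> val numPattern = "[0-9]+".r
scala> val address = "123 Main Street"  // sample input, assumed for illustration
scala> val matches = numPattern.findAllIn(address).toArray
scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)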

A method defined to return an Option[String] will either return a Some(String), or a None.
The normal way to work with an Option is to use one of these approaches:
  • Call getOrElse on the value.
  • Use the Option in a match expression.
  • Use the Option in a foreach loop.

scala> val result = numPattern.findFirstIn(address).getOrElse("no match")

With the getOrElse approach, you attempt to “get” the result, while also specifying a default value to use if the Option turns out to be None.
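
The three approaches listed above, side by side in a minimal sketch. The object name OptionDemo, the helper names and the sample strings are assumptions made for this example.

```scala
object OptionDemo {
  val numPattern = "[0-9]+".r

  // Approach 1: getOrElse supplies a default when there is no match.
  def firstNumberOrDefault(s: String): String =
    numPattern.findFirstIn(s).getOrElse("no match")

  // Approach 2: a match expression handles Some and None explicitly.
  def describe(s: String): String = numPattern.findFirstIn(s) match {
    case Some(n) => s"found $n"
    case None    => "nothing found"
  }

  def main(args: Array[String]): Unit = {
    val address = "123 Main Street" // sample input, assumed
    println(firstNumberOrDefault(address))
    println(describe("no digits here"))
    // Approach 3: foreach runs its body only when a value is present.
    numPattern.findFirstIn(address).foreach(n => println(s"foreach saw $n"))
  }
}
```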


String Operations in Scala


1. Scala treats a string as a sequence of characters. A Scala String is a Java String, so you can use all the normal Java string methods.

2. But because Scala offers the magic of implicit conversions, String instances also have access to all the methods of the StringOps class, so you can do many other things with them, such as treating a String instance as a sequence of characters.

scala> "hello".foreach(println)

3. In Scala, you test object (value) equality with the == method. This is different from Java, where you use the equals method to compare two objects.


In Scala, the == method defined in the AnyRef class first checks for null values, and then calls the equals method on the first object (i.e., this) to see if the two objects are equal. As a result, you don’t have to check for null values when comparing strings.

scala> val s1: String = null
scala> val s2 = "Hello"
scala> s1 == s2
res3: Boolean = false

4. Imagine that Scala doesn’t even have a null keyword. Any time you feel like using a null, use an Option instead. 
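
A small sketch of that advice: wrapping a possibly-null reference with Option(...) gives None for null and Some(value) otherwise. The object and method names here are hypothetical.

```scala
object NullToOptionDemo {
  // Option(s) is None when s is null, so no explicit null check is needed.
  def lengthOf(s: String): Int = Option(s).map(_.length).getOrElse(0)

  def main(args: Array[String]): Unit = {
    println(lengthOf(null))
    println(lengthOf("Hello"))
  }
}
```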

5. Sometimes a string is too long to write on one line. The example below writes the string across multiple lines in the source, but the result is a single line.


val speech = """Four score and
               |seven years ago
               |our fathers""".stripMargin.replaceAll("\n", " ")

6. Adding yield to a for loop essentially places the result from each loop iteration into a temporary holding area. When the loop completes, all of the elements in the holding area are returned as a single collection.


scala> val upper = for (c <- "hello, world") yield c.toUpper

7. The map method is used to transform one collection into another. It treats a String as a sequential collection of Char elements. The map method has an implicit loop, and in that loop, it passes one Char at a time to the algorithm it’s given.


scala> val toLower = (c: Char) => (c.toByte+32).toChar
scala> "HELLO".map(toLower)
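
Adding 32 only works for the ASCII letters A to Z; any other character, such as a space or a lowercase letter, would be shifted to something unintended. A slightly more defensive sketch, with made-up names, checks the range first. The built-in toLower method on Char is shown for comparison.

```scala
object StringMapDemo {
  // Shift only ASCII uppercase letters; leave every other character unchanged.
  val toLowerAscii: Char => Char =
    c => if (c >= 'A' && c <= 'Z') (c + 32).toChar else c

  val lowered: String = "HELLO, World".map(toLowerAscii)

  // The same result using the built-in toLower method on Char.
  val builtIn: String = "HELLO, World".map(_.toLower)

  def main(args: Array[String]): Unit = {
    println(lowered)
    println(builtIn)
  }
}
```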