Sunday 5 June 2016

DataFrame Column Contains an Array

How do you check whether a value exists in a DataFrame column whose type is an array (list)?

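Spark SQL provides array_contains in org.apache.spark.sql.functions. It returns true if the array contains the given value:

Column array_contains(Column column, java.lang.Object value)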

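import org.apache.spark.sql.functions.array_contains

// Keep only the rows whose "tags" array contains "storm"
val report = df.select("*").where(array_contains(df("tags"), "storm"))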

To read this list column from each row of the result:

report.map { row => row.getAs[Seq[String]]("tags") }

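Here we use Seq[String] instead of Array or List. Spark returns the values of an array column as a WrappedArray, which is a Seq but is neither an Array nor a List.
Otherwise the cast fails. With Array[String], for example, you will see
java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String;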
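
The cast behaviour can be reproduced without Spark. The following is only a sketch in plain Scala: the object TagsCastSketch and its helper methods are hypothetical names, and the value built with Predef.wrapRefArray is an assumed stand-in for what row.getAs hands back for an array column, not Spark's own code. It shows that reading the value as a Seq works, while casting it directly to Array or List throws.

```scala
import scala.util.Try

object TagsCastSketch {
  // Assumption: stand-in for the value Spark stores for an array column,
  // i.e. a Scala wrapper around a JVM array (a WrappedArray on Scala 2.10-2.12).
  val stored: Any = Predef.wrapRefArray(Array("storm", "spark"))

  // Works: the wrapper is a Seq, so the cast succeeds.
  def asSeq(value: Any): scala.collection.Seq[String] =
    value.asInstanceOf[scala.collection.Seq[String]]

  // Fails: the wrapper is not itself a String[]; calling .length forces the cast.
  def asArrayLength(value: Any): Try[Int] =
    Try(value.asInstanceOf[Array[String]].length)

  // Fails: the wrapper is not a List either.
  def asListLength(value: Any): Try[Int] =
    Try(value.asInstanceOf[List[String]].length)

  def main(args: Array[String]): Unit = {
    val tags = asSeq(stored)
    println(tags.contains("storm"))
    println(asArrayLength(stored).isFailure)
    println(asListLength(stored).isFailure)

    // If an Array or List is really needed, convert after reading as a Seq.
    val tagArray: Array[String] = tags.toArray
    val tagList: List[String] = tags.toList
    println(tagArray.length + tagList.length)
  }
}
```

The same pattern should carry over to the Spark code above: read the column with getAs[Seq[String]] first, then call toArray or toList on the result if another collection type is required.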


Reference:
https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#array_contains(org.apache.spark.sql.Column,%20java.lang.Object)
