Some models support save() and load() methods.
model.save(sc, "myModelPath")
val sameModel = RandomForestModel.load(sc, "myModelPath")
The models listed on the following page support PMML export:
https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
For example,
// Export the model to a String in PMML format
clusters.toPMML
// Export the model to a local file in PMML format
clusters.toPMML("/tmp/kmeans.xml")
// Export the model to a directory on a distributed file system in PMML format
clusters.toPMML(sc, "/tmp/kmeans")
2. Save ML Model
Currently, model export/import for ML Pipelines is not supported.
There is a Jira ticket tracking it: https://issues.apache.org/jira/browse/SPARK-6725
As a general workaround, we can save the model as a serialized Java object with RDD.saveAsObjectFile and load it back with SparkContext.objectFile:
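// Save: wrap the model in a single-partition RDD and write it out as serialized Java objects
sc.parallelize(Seq(model), 1).saveAsObjectFile("hdfs:///user/root/linReg.model")
// Load: read the RDD back and take its only element
// (the relative path assumes it resolves to the directory written above, e.g. the HDFS home /user/root)
val linRegModel = sc.objectFile[LinearRegressionModel]("linReg.model").first()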
RDD.saveAsObjectFile and SparkContext.objectFile support saving an RDD in a simple format consisting of serialized Java objects. While this is not as efficient as specialized formats like Avro, it offers an easy way to save any RDD.
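Because this approach stores serialized Java objects, the model class has to be Serializable. The round trip can be sketched without Spark at all. The snippet below is only an illustration under assumptions: ToyLinearModel and JavaSerializationSketch are hypothetical names invented for this example, not Spark classes, and the bytes stay in memory instead of going to HDFS.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a trained model (not a Spark class).
// Scala case classes are Serializable by default.
case class ToyLinearModel(weights: Vector[Double], intercept: Double) {
  // Dot product of weights and features, plus the intercept
  def predict(features: Vector[Double]): Double =
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
}

object JavaSerializationSketch {
  // Turn any serializable object into bytes using standard Java serialization
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Rebuild an object from bytes; the caller states the expected type
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }
}
```

For example, JavaSerializationSketch.deserialize[ToyLinearModel](JavaSerializationSketch.serialize(model)) should give back a model equal to the original. One consequence of relying on Java serialization is that the class definition must be available, in a compatible version, wherever the object is loaded.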
Reference:
https://phdata.io/exploring-spark-mllib-part-4-exporting-the-model-for-use-outside-of-spark/