Running your first Apache Spark app

The Spark Getting Started guide is pretty good, but it’s not immediately obvious that you don’t run an app built on the Spark API as a standalone executable. If you try, you’ll get an error like this:

17/11/07 19:15:20 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at kh.textanalysis.spark.SparkWordCount.workCount(SparkWordCount.java:16)
at kh.textanalysis.spark.SparkWordCount.main(SparkWordCount.java:10)
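
For reference, here’s a minimal sketch of the kind of code that triggers this error. The package, class, and method names are taken from the stack trace above; everything else is an assumption about what the skeleton might look like:

package kh.textanalysis.spark;

import org.apache.spark.sql.SparkSession;

public class SparkWordCount {

    public static void main(String[] args) {
        new SparkWordCount().workCount();
    }

    private void workCount() {
        // Without .master(...) here, and without spark-submit supplying
        // a master URL, getOrCreate() throws the SparkException above.
        SparkSession spark = SparkSession.builder()
                .appName("SparkWordCount")
                .getOrCreate();
        spark.stop();
    }
}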

Instead, if you’re using Maven, package the app with ‘mvn package’.
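Your pom.xml needs the Spark SQL module (which provides SparkSession) on the compile classpath; marking it provided keeps it out of the packaged jar, since spark-submit supplies it at runtime. A minimal sketch, assuming Spark 2.2 with Scala 2.11 to match the stack trace above:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>

With the jar built, start a local master node: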

./sbin/start-master.sh

and then submit it to Spark for processing:

./bin/spark-submit \
  --class "MyApp" \
  --master local[1] \
  target/MyApp-1.0.jar
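
Note that --master local[1] runs Spark in-process with a single worker thread, so this particular invocation doesn’t actually use the standalone master started above. To run against that master instead, pass its spark:// URL, which is shown on the master’s web UI at http://localhost:8080 (this example assumes the default port 7077 and at least one worker registered, e.g. via ./sbin/start-slave.sh spark://localhost:7077):

./bin/spark-submit \
  --class "MyApp" \
  --master spark://localhost:7077 \
  target/MyApp-1.0.jar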
