[EXPERIMENTAL] This repo contains deployment instructions for running HDFS/Spark inside Docker containers. It also includes spark-notebook and HDFS FileBrowser.
To start an HDFS/Spark Workbench:
docker-compose up -d
docker-compose cannot be used to scale up spark-workers; for a distributed setup, see the swarm folder.
To start the workbench with Hive support, run the commands below one at a time. Before running each command, check that the previous service is running correctly (with docker logs servicename).
docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
docker-compose -f docker-compose-hive.yml up -d hive-server
docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue
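The "check before continuing" step can be partly scripted. The following is a minimal sketch, not part of this repo: wait_for and is_running are hypothetical helper names, and the container name namenode in the commented usage line is an assumption (the actual name depends on your compose file and project name). A container in the running state is not necessarily healthy, so reading its docker logs output is still recommended.

```shell
#!/bin/sh
# wait_for RETRIES DELAY CMD...
# Hypothetical helper: rerun CMD until it succeeds, at most RETRIES times,
# sleeping DELAY seconds after each failed attempt.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for() {
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# is_running NAME
# Hypothetical helper: succeeds if Docker reports container NAME as running.
is_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

# Example usage (container name is an assumption):
#   wait_for 30 2 is_running namenode && docker logs namenode
```

Passing the check as a command keeps wait_for generic, so the same loop could wrap any other readiness probe you prefer.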
When opening Hue, you might encounter a
NoReverseMatch: u'about' is not a registered namespace
error after login. I disabled the 'about' page (which is the default one) because it caused the Docker container to hang. To access Hue when you get this error, append /home to your URI:
http://docker-host-ip:8088/home
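import org.apache.spark.sql.SparkSession

// Create (or reuse) a Spark session for this example
val spark = SparkSession
  .builder()
  .appName("Simple Count Example")
  .getOrCreate()

// Read /data.csv as lines of text and count them
val tf = spark.read.textFile("/data.csv")
tf.count()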
Note: this repository was part of the BDE H2020 EU project and is no longer actively maintained by the project participants.