by big-data-europe

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside docker conta...

How to use HDFS/Spark Workbench

To start an HDFS/Spark Workbench:

    docker-compose up -d

Note that docker-compose cannot be used to scale up the spark-workers. For a distributed setup, see the swarm folder.

Starting workbench with Hive support

Run the commands below one at a time. Before running the next command, check that the services started by the previous one are running correctly (with docker logs servicename).

    docker-compose -f docker-compose-hive.yml up -d namenode hive-metastore-postgresql
    docker-compose -f docker-compose-hive.yml up -d datanode hive-metastore
    docker-compose -f docker-compose-hive.yml up -d hive-server
    docker-compose -f docker-compose-hive.yml up -d spark-master spark-worker spark-notebook hue
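
The staged startup above can be wrapped in a small script. The following is only a sketch, not part of the repository: the function names, the COMPOSE and STAGE_DELAY variables, and the fixed delay are all assumptions made for illustration. It uses docker-compose logs rather than docker logs to print recent output for each stage, and a fixed sleep is not a real health check, so you still need to read the logs yourself.

```shell
#!/bin/sh
# Hypothetical helper: start the Hive workbench stage by stage.
# COMPOSE, COMPOSE_FILE_HIVE and STAGE_DELAY are illustrative names.
COMPOSE="${COMPOSE:-docker-compose}"
COMPOSE_FILE_HIVE="${COMPOSE_FILE_HIVE:-docker-compose-hive.yml}"
STAGE_DELAY="${STAGE_DELAY:-15}"   # seconds to wait per stage (assumed value)

# Start the given services, wait, then print their most recent log lines.
start_stage() {
    $COMPOSE -f "$COMPOSE_FILE_HIVE" up -d "$@" || return 1
    sleep "$STAGE_DELAY"
    $COMPOSE -f "$COMPOSE_FILE_HIVE" logs --tail=20 "$@"
}

# Run the four stages in the order given in this README.
# A failing stage stops the chain so later services are not started.
start_hive_workbench() {
    start_stage namenode hive-metastore-postgresql &&
    start_stage datanode hive-metastore &&
    start_stage hive-server &&
    start_stage spark-master spark-worker spark-notebook hue
}
```

After sourcing the script, call start_hive_workbench and inspect the printed logs of each stage before relying on the next one.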


Once the services are up, their web interfaces are available at:

  • Namenode: http://localhost:50070
  • Datanode: http://localhost:50075
  • Spark-master: http://localhost:8080
  • Spark-notebook: http://localhost:9001
  • Hue (HDFS Filebrowser): http://localhost:8088/home
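
To confirm that the interfaces listed above respond, a quick probe with curl can help. This is a sketch under assumptions: check_ui, check_all_uis and the CURL variable are hypothetical names, and treating any 2xx or 3xx response as "UP" is a simplification (Hue, for instance, may redirect to a login page).

```shell
#!/bin/sh
# Hypothetical helper: probe the workbench web interfaces.
CURL="${CURL:-curl}"

# Print "UP" or "DOWN" for one interface, based on the HTTP status code.
check_ui() {
    name=$1
    url=$2
    # -w '%{http_code}' prints only the status code; the body is discarded.
    code=$($CURL -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || code=000
    case "$code" in
        2*|3*) echo "UP   $name $url (HTTP $code)" ;;
        *)     echo "DOWN $name $url (HTTP $code)" ;;
    esac
}

# Probe every interface from the list in this README.
check_all_uis() {
    check_ui Namenode       http://localhost:50070
    check_ui Datanode       http://localhost:50075
    check_ui Spark-master   http://localhost:8080
    check_ui Spark-notebook http://localhost:9001
    check_ui Hue            http://localhost:8088/home
}
```

Calling check_all_uis prints one line per interface. A "DOWN" line right after startup may only mean that the service is still initialising.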


When opening Hue, you might encounter the following error after login:

    NoReverseMatch: u'about' is not a registered namespace

I disabled the 'about' page (which is the default one) because it caused the docker container to hang. To access Hue when you see this error, append /home to your URI, for example http://localhost:8088/home.


Count Example for Spark Notebooks

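    import org.apache.spark.sql.SparkSession

    // Create (or reuse) the Spark session for this notebook
    val spark = SparkSession.builder()
      .appName("Simple Count Example")
      .getOrCreate()

    // Read /data.csv as lines of text and count them
    val tf = spark.read.textFile("/data.csv")
    tf.count()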
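
The count example reads /data.csv, so that file has to exist in HDFS first. One possible way to put it there is sketched below. It assumes that the namenode container is named namenode and that the hdfs command is on the PATH inside it; neither is confirmed by this README. The function name upload_to_hdfs and the DOCKER and NAMENODE_CONTAINER variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: copy a local file into HDFS via the namenode container.
DOCKER="${DOCKER:-docker}"
NAMENODE_CONTAINER="${NAMENODE_CONTAINER:-namenode}"   # assumed container name

# Usage: upload_to_hdfs LOCAL_FILE [HDFS_PATH]
# HDFS_PATH defaults to /<file name>, e.g. ./data.csv -> /data.csv
upload_to_hdfs() {
    local_file=$1
    hdfs_path=${2:-/$(basename "$local_file")}
    # 1. Copy the file into the container's local filesystem.
    $DOCKER cp "$local_file" "$NAMENODE_CONTAINER:/tmp/upload.tmp" &&
    # 2. Put it into HDFS, overwriting an existing file (-f).
    $DOCKER exec "$NAMENODE_CONTAINER" hdfs dfs -put -f /tmp/upload.tmp "$hdfs_path" &&
    # 3. List the result so the upload can be verified.
    $DOCKER exec "$NAMENODE_CONTAINER" hdfs dfs -ls "$hdfs_path"
}
```

For example, upload_to_hdfs ./data.csv would place the file at /data.csv, the path used in the count example. Check the actual container name with docker ps before running it.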


Maintainers

  • Ivan Ermilov @earthquakesan

Note: this repository was part of the BDE H2020 EU project and is no longer actively maintained by the project participants.
