PySpark with Apache Spark


This two-day course, PySpark with Apache Spark, costs € 995.

Group discount: if you register several students for this course, you receive a discount of 25% for the 2nd student, 50% for the 3rd student and 75% for each of the 4th through 8th students.

Apache Spark is a powerful, open-source engine for processing Big Data, for example on a Hadoop cluster. With Spark it is possible to process datasets that differ in nature and source. Its biggest advantages are speed, ease of use, the ability to combine SQL, streaming and complex analytics, and the fact that Spark can run anywhere. With the Python API (PySpark), all of this can be done from Python. Some basic knowledge of Python is desirable but not required.


The course starts with installing Spark, followed by an introduction to the framework. You will then learn how to work with RDDs (Resilient Distributed Datasets) and HDFS (the Hadoop Distributed File System). We also cover parallel processing and building Spark applications. Finally, you will learn more about Spark Streaming, Spark algorithms and improving the performance of the framework. Throughout this training you will learn how to develop Spark applications using Python, including how to test and deploy them to a cluster and how to monitor that cluster afterwards.
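
To give a first impression of the parallel processing covered in the course, here is a minimal sketch of the pattern Spark applies to RDDs: split the data into partitions, process each partition independently, then merge the partial results. Note that this is plain Python using only the standard library, not the Spark API. The function names (`split_into_partitions`, `count_words`, `parallel_word_count`) are hypothetical and chosen for illustration, and a thread pool stands in for the workers of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: count the words of one partition, independent of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, num_partitions=4):
    """Count words per partition concurrently, then merge the partial counts."""
    partitions = split_into_partitions(lines, num_partitions)
    # Assumption: threads stand in for cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Reduce step: adding two Counters merges their per-word totals.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    sample = ["spark is fast", "Spark runs anywhere", "python is easy"]
    print(parallel_word_count(sample, num_partitions=2))
```

Because each partition is counted without looking at the others, the same logic can in principle be spread over many machines, which is the idea behind Spark's distributed datasets.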

What you will learn in this course:

  • You will be familiar with the basics of Apache Spark.
  • You will know how to work with RDDs, HDFS and Spark algorithms.
  • You will understand how parallel processing works and how to build applications.
  • You will know how to improve Apache Spark performance and detect issues.