Spark Streaming with Scala

In this Spark Streaming course you will learn how to stream big data in real time with Spark and integrate any data source, from Kafka to Twitter.

 

Nothing static, all in motion.

You probably know by now: Spark is the most popular computing engine for big data, the most actively maintained, and has a proven track record of performance. It can run up to 100 times faster than the old MapReduce paradigm on in-memory workloads, and can easily be extended with machine learning, streaming capabilities, and much more.

In this Spark Streaming course, we'll take the natural step forward: process big data as it arrives.

What's in it for you:

  • You'll learn where Spark Structured Streaming and "normal" Spark batch operations are similar and where they differ
  • You'll work with new streaming abstractions (DStreams) for low-level, high-control processing
  • You'll integrate Kafka, JDBC, Cassandra and Akka Streams (!) so that you can later integrate anything you like
  • You'll work with powerful stateful APIs that only a few know how to properly use

And some extra perks:

  • You'll have access to the entire code I write on camera (2200+ LOC)
  • You'll be invited to our private Slack room where I'll share latest updates, discounts, talks, conferences, and recruitment opportunities
  • (soon) You'll have access to the takeaway slides
  • (soon) You'll be able to download the videos for offline viewing
 

Skills you'll get:

  • Same comfort with Spark Structured Streaming APIs as with "normal" Spark batch:
    • projections
    • joins
    • aggregations
    • sums
    • groups
  • High control over how data is processed with DStreams:
    • map, flatMap, filter
    • transform
    • by-key operations
    • process each RDD individually
  • Ability to work with time columns and window functions, both on structured and low-level streams
    • sliding windows
    • tumbling windows
    • reduce by window
    • reduce by window and key
  • Integration between Spark and other data sources, including
    • Kafka (structured and low-level)
    • JDBC
    • NoSQL
    • and something that's not "natural" to Spark, like Akka
  • Ability to manually manage stateful data processing in ways SQL is incapable of
    • mapGroupsWithState
    • flatMapGroupsWithState
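
To make the difference between tumbling and sliding windows concrete, here is a minimal sketch in plain Scala, with no Spark dependency. It is not the Spark API and not code from the course: `Event`, `windowStarts` and `sumByWindow` are hypothetical names made up for this illustration, and it assumes non-negative timestamps in seconds with windows aligned to time 0.

```scala
// Plain Scala illustration (no Spark): assign events to time windows and sum per window.
// All names here are hypothetical; timestamps are assumed to be non-negative seconds.
final case class Event(timestampSec: Long, value: Int)

object WindowSketch {
  // Start times of every window [start, start + windowSec) that contains `ts`,
  // when a new window opens every `slideSec` seconds starting at time 0.
  def windowStarts(ts: Long, windowSec: Long, slideSec: Long): List[Long] = {
    val lastStart = ts - (ts % slideSec) // latest window start that is <= ts
    Iterator
      .iterate(lastStart)(_ - slideSec)
      .takeWhile(start => start > ts - windowSec && start >= 0)
      .toList
      .reverse
  }

  // "Reduce by window": sum the values of all events falling in each window.
  def sumByWindow(events: Seq[Event], windowSec: Long, slideSec: Long): Map[Long, Int] =
    events
      .flatMap(e => windowStarts(e.timestampSec, windowSec, slideSec).map(start => start -> e.value))
      .groupBy(_._1)
      .map { case (start, pairs) => start -> pairs.map(_._2).sum }
}

object WindowSketchDemo extends App {
  val events = Seq(Event(1, 10), Event(7, 20), Event(12, 30))

  // Tumbling: slide == window size, so every event lands in exactly one window.
  println(WindowSketch.sumByWindow(events, windowSec = 10, slideSec = 10))

  // Sliding: slide < window size, so windows overlap and an event can count more than once.
  println(WindowSketch.sumByWindow(events, windowSec = 10, slideSec = 5))
}
```

With these three events, the tumbling case groups them into the windows starting at 0 and 10, while the sliding case adds an overlapping window starting at 5 that contains both the second and the third event.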

This course is for Scala and Spark programmers who need to process streaming data rather than one-time or batch data. If you've never done Scala or Spark, this course is not for you.

 

Project 1: Twitter

In this project we will integrate live data from Twitter. We will create a custom data source that we use with Spark, and we will do various analyses: tweet lengths, most used hashtags in real time. You will be able to use this project as a blueprint for any data source that you might want to integrate. At the very end, we will use an NLP library from Stanford to do sentiment analysis on tweets and find the general state of social media.

You will learn:

  • how to set up your own data receiver, which you manage yourself and use to "pull" new data
  • how to create a DStream from your custom code
  • how to pull data from Twitter
  • how to aggregate tweets
  • how to use Stanford's coreNLP library for sentiment analysis
  • how to apply sentiment analysis on tweets in real time

Project 2: A Science Project

In this project we will write a full-stack web application which will support multiple users that are test subjects of a scientific test. We will investigate the effects of alcohol/substances/insert_your_addictive_drug_like_Scala on reflexes and response times. We will send the data through a web UI connected to a REST endpoint, then the data will flow through a Kafka broker and finally to a Spark Streaming backend which will do the data crunching. You can use this application as a blueprint for any full-stack application that aggregates and processes data with Spark Streaming in real time, from any number of concurrent users.

You will learn:

  • how to set up an HTTP server in minutes with Akka HTTP
  • how to manually send data through Kafka
  • how to aggregate data in a way that's almost impossible in SQL
  • how to write a full-stack application with a web UI, Akka HTTP, Kafka and Spark Streaming

Course Overview


  Prologue

 

Get started now!



 

Take the proven path.

As with the other Rock the JVM courses, Spark Streaming will take you through a battle-tested path to Spark proficiency as a data scientist and engineer.

As always, I've

  • deconstructed the complexity of Spark into bite-sized chunks that you can practice in isolation
  • selected the essential concepts and exercises with the appropriate complexity
  • sequenced the topics in increasing order of difficulty so that they "click" along the way
  • applied everything in live code

Risk-free: 100% money back guarantee.


If you're not happy with this course, I want you to have your money back. If that happens, email me at [email protected] with a copy of your welcome email and I will refund the course.

Fewer than 1.4 percent of students have asked for a refund, and every payment was returned in less than 72 hours.

Your Instructor


Daniel Ciocîrlan

I'm a software engineer and the founder of Rock the JVM. I started the Rock the JVM project out of love for Scala and the technologies it powers - they are all amazing tools and I want to share as much of my experience with them as I can.

As of February 2024, I've taught Java, Scala, Kotlin and related tech (e.g. Cats, ZIO, Spark) to 100000+ students at various levels and I've held live training sessions for some of the best companies in the industry, including Adobe and Apple. I've also taught university students who now work at Google and Facebook (among others), I've held Hour of Code for 7-year-olds and I've taught more than 35000 kids to code.

I have a Master's Degree in Computer Science and I wrote my Bachelor's and Master's theses on Quantum Computation. Before starting to learn programming, I won medals at international Physics competitions.


Frequently Asked Questions


How long is the course? Will I have time for it?
The course is a full 11 hours in length, with lessons 20-30 minutes each, and we write 2200 lines of code. For a complex topic like Spark Streaming, I don't believe in 5-minute lectures or in fill-in-the-blanks quizzes. To learn Spark Streaming in the most effective way, I recommend chunks of 1 hour of learning at a time.
How will I learn Spark Streaming in this course?
Code is king, and we write from scratch. In a typical lesson I'll explain some concepts in short, then I'll dive right into the code. We'll write it together, and at every topic I will give you exercises. You'll usually pause the video to try them yourself, after which I will also solve them on camera.
Can I expense this at my company?
Of course! Most (wise) companies will reimburse employees taking courses like this, and it's really cheap training for them.
Is Spark Streaming difficult to learn?
It could be, if you're learning on your own, but I've designed this course with a clear learning path that you can follow step by step. The course was designed to give you a challenge so you're not bored, but not so much that you flip the table in anger. In case you struggle, we have a community willing to help, and I'm responsive to questions!
What if I'm not happy with the course?
If you're not 100% happy with the course, I want you to have your money back. It's a risk-free investment.
Daniel, I can't afford the course. What do I do?
For a while, I told everyone who could not afford the course to email me and I gave them discounts. But then I looked at the stats. Almost all the people who actually took the time and completed the course had paid for it in full. So I'm not offering discounts anymore. This is an investment in yourself, which will pay off 100x if you commit.
I have very little Scala or Spark experience. Can I learn Spark Streaming?
I don't recommend diving into Spark Streaming without any previous Spark experience. We have two recap lessons at the beginning, but they're not a crash course in Scala or Spark. You should take at least the Scala beginners course and the Spark Essentials course.
What is Spark Streaming anyway?
Spark Streaming is an extension to the popular big data computing engine Apache Spark. While the "classical" Spark engine allows you to process massive data of any scale in a "static" way (data at rest), Spark Streaming gives you the opportunity to process data at scale immediately as it arrives, hence allowing you to process potentially infinite amounts of data.
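
To picture the "data at rest" versus "data as it arrives" distinction, here is a small sketch using only the Scala standard library. It is a conceptual illustration, not the Spark API: `batchTotal` and `runningTotals` are hypothetical names, and the lazy `Iterator` stands in for a potentially infinite source.

```scala
// Plain Scala illustration (no Spark): batch vs. incremental processing.
// The function names are hypothetical and exist only for this sketch.
object BatchVsStreaming {
  // Batch ("data at rest"): the whole dataset exists before we compute one final answer.
  def batchTotal(data: Seq[Int]): Int = data.sum

  // Streaming ("data in motion"): emit an updated total after every new element.
  // The source may be infinite; Iterator is lazy, so elements are pulled only on demand.
  def runningTotals(source: Iterator[Int]): Iterator[Int] =
    source.scanLeft(0)(_ + _).drop(1) // drop the initial 0 seed
}

object BatchVsStreamingDemo extends App {
  // A finite dataset: one result at the end.
  println(BatchVsStreaming.batchTotal(Seq(1, 2, 3, 4)))

  // An infinite source (1, 2, 3, ...): we can still read the first few running totals.
  println(BatchVsStreaming.runningTotals(Iterator.from(1)).take(4).toList)
}
```

The batch version can only answer once all the data is there, while the incremental version keeps a small piece of state (the total so far) and produces a fresh result for every element, which is why it can cope with a source that never ends.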