Spark Optimization with Scala
Go fast or go home. Learn the ins and outs of Spark and get the best out of your code.
Why the $&*(# is my job running so slow?
You and I have both been there. Let me describe it, then tell me if it sounds like you: you run a 4-line job on a gig of data, with two innocent joins, and it takes a bloody hour to run. Or another one: you have an hour-long job which was progressing smoothly, until task 1149/1150 where it hangs, and after two more hours you decide to kill it because you don't know if it's you, a bug in Spark, or some big data god that's angry at you right when you were praying for your job to finish.
Then you go like, "hm, maybe my Spark cluster is too small, let me bump some CPU and mem". Then... same thing. Amazon's probably laughing now. So this has to be the million dollar question.
You are looking at the only course on the web on Spark optimization. With the techniques you learn here you will save time, money, energy and massive headaches.
Let's fix it.
In this course, we cut the weeds at the root. We dive deep into Spark and understand why jobs are taking so long before we get to touch any code, or worse, waste compute money. And then we bring the guns. You will learn 20+ techniques and optimization strategies. Each of them individually can give at least a 2x perf boost for your jobs, and I show it on camera.
What's in it for you:
- You'll understand Spark internals well enough to tell whether you're writing good code or not
- You'll be able to predict in advance if a job will take a long time
- You'll read query plans and DAGs while the jobs are running, to understand if you're doing anything wrong
- You'll optimize DataFrame transformations way beyond the standard Spark auto-optimizer
- You'll do fast custom data processing with efficient RDDs, in a way SQL is incapable of
- You'll diagnose hanging jobs, stages and tasks
- You'll spot and fix data skews
- Plus you'll fix a few memory crashes along the way
And some extra perks:
- You'll have access to the entire code I write on camera (2200+ LOC)
- You'll be invited to our private Slack room where I'll share the latest updates, discounts, talks, conferences, and recruitment opportunities
- (soon) You'll have access to the takeaway slides
- (soon) You'll be able to download the videos for offline viewing
Skills you'll get:
- Deep understanding of Spark internals so you can predict job performance
  - stage & task decomposition
  - reading query plans before jobs run
  - reading DAGs while jobs are running
  - performance differences between the different Spark APIs
  - packaging and deploying a Spark app
  - configuring Spark in 3 different ways
- DataFrame and Spark SQL optimizations
  - understanding join mechanics and why they are expensive
  - writing broadcast joins, or what to do when you join a large and a small DataFrame
  - writing pre-join optimizations: column pruning, pre-partitioning
  - bucketing for fast access
  - fixing data skews, "straggling" tasks and OOMs
- Optimizing RDDs
  - using broadcast joins "manually"
  - cogrouping RDDs in multi-way joins
  - fixing data skews
  - writing optimizations that Spark doesn't generate for us
- Optimizing key-value RDDs, as most useful transformations need them
  - using the different _byKey methods intelligently
  - reusing JVM objects for when performance is critical and even a few seconds count
  - using the powerful iterator-to-iterator pattern for arbitrary efficient processing
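To give a flavor of the last item, here is a minimal sketch of the iterator-to-iterator idea in plain Scala, with no Spark dependency. The `Reading` type, the `parseValid` function and the sample lines are hypothetical, made up for this illustration; they are not taken from the course code. The point is that the function takes an `Iterator` and returns an `Iterator`, so records stream through one at a time instead of being collected into memory first.

```scala
import scala.util.Try

// Hypothetical record type, used only for this illustration.
final case class Reading(sensorId: String, value: Double)

object IteratorToIterator {
  // Iterator in, Iterator out: flatMap on an Iterator is lazy, so each line
  // is parsed only when the consumer asks for the next element, and the
  // input is never materialized as a full collection.
  def parseValid(lines: Iterator[String]): Iterator[Reading] =
    lines.flatMap { line =>
      line.split(",") match {
        case Array(id, raw) =>
          // Drop lines whose second field is not a number.
          Try(raw.trim.toDouble).toOption.map(v => Reading(id.trim, v))
        case _ =>
          // Drop lines that do not have exactly two fields.
          None
      }
    }

  def main(args: Array[String]): Unit = {
    val partition = Iterator("a,1.5", "broken line", "b,2.0", "c,not-a-number")
    parseValid(partition).foreach(println)
  }
}
```

In a Spark job, a function with this shape is what you would hand to `mapPartitions` on an `RDD[String]`, assuming each partition holds raw text lines like the ones above.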
This course is for Scala and Spark programmers who need to improve the run time of their jobs. If you've never done Scala or Spark, this course is not for you.
I'm a software engineer and the lead instructor for Rock the JVM. I started the Rock the JVM project out of love for Scala and the technologies it powers - they are all amazing tools and I want to share as much of my experience with them as I can.
For the last 7 years, I've taught a variety of Computer Science topics to 27000+ students at various levels and I've held live trainings for some of the best companies in the industry, including Adobe and Apple. I've also taught university students who now work at Google and Facebook (among others), I've held Hour of Code for 7-year-olds and I've taught 6000 kids to code.
I have a Master's Degree in Computer Science and I wrote my Bachelor and Master theses on Quantum Computation. Before starting to learn programming, I won medals at international Physics competitions.
Get started now!
Take the proven path.
As with the other Rock the JVM courses, Spark Optimization will take you through a battle-tested path to Spark proficiency as a data scientist and engineer.
As always, I've
- deconstructed the complexity of Spark into bite-sized chunks that you can practice in isolation
- selected the essential concepts and exercises with the appropriate complexity
- sequenced the topics in increasing order of difficulty so that they "click" along the way
- applied everything in live code
The value of this course is in showing you different techniques with their direct and immediate effect, so you can later apply them in your own projects.
Risk-free: 100% money back guarantee.
If you're not happy with this course, I want you to have your money back. If that happens, email me at [email protected] with a copy of your welcome email and I will refund the course.
Fewer than 0.3% of students have refunded a course on the entire site, and every payment was returned in less than 72 hours.