Apache Spark Performance Tuning with Scala
Learn how to optimize Apache Spark with Scala for peak performance in this comprehensive course. Master Spark internals and configurations to improve speed and memory efficiency on your cluster.
- Duration: 8h of 4K content
- Lessons: 24 lessons
By Daniel Ciocîrlan
Money-back guarantee · Unlimited access · Free updates
Skills You'll Learn
- Understand Spark internals and predict job performance
- Read query plans and DAGs to diagnose performance problems
- Diagnose hanging jobs, stages, and tasks
- Tune Spark executor memory zones
- Use caching for fast data reuse
- Make tradeoffs between speed, memory usage, and fault tolerance
- Use checkpoints when jobs fail or are expensive to recompute
- Leverage Catalyst and Tungsten for massive performance gains
- Use repartitions and coalesce effectively
- Pick the right partition count at shuffles
- Use custom partitioners for specialized jobs
- Allocate the right cluster resources for optimal throughput
- Fix data skews and straggling tasks with salting
- Fix serialization problems and pick the right serializers
Goal
They say Spark is fast. How do I make the best out of it?
I’ve written a lot of Spark jobs over the past few years. Some of my old data pipelines are probably still running as you’re reading this. However, my journey with Spark was painful. You’ve probably seen this too.
- You run 3 big jobs with the same DataFrame, so you try to cache it - but then you look in the UI and it’s nowhere to be found.
- You’re finally given the cluster you’ve been asking for… and then you’re like “OK, now how many executors do I pick?”.
- You have a simple job with 1GB of data that takes 5 minutes for 1149 tasks… and 3 hours on the last task.
- You have a big dataset and you know you’re supposed to partition it right, but you can’t pick a number between 2 and 50000 because you can find good reasons for either extreme!
- You search for “caching”, “serialization”, “partitioning”, “tuning” and you only find obscure blog posts and narrow StackOverflow questions.
Unless you have some massive experience or you’re a Spark committer, you’re probably using 10% of Spark’s capabilities.
In the Spark Optimization course you learned how to write performant code. It’s time to kick into high gear and tune Spark to be the best it can be. You are looking at the only course on the web that leverages Spark features and capabilities for the best performance. With the techniques you learn here you will save time, money, energy and massive headaches.
Let’s rock.
In this course, we cut the weeds at the root. We dive deep into Spark and understand what tools you have at your disposal - and you might just be surprised at how much leverage you have. You will learn 20+ techniques for boosting Spark performance. Each of them individually can give at least a 2x perf boost for your jobs (some of them even 10x), and I show it on camera.
What Our Students Say
-
My team is expanding the use of Akka in our products so I needed a quick introduction on this topic. I have tried a couple of courses but the introduction to Akka was always too abrupt, too hard to comprehend. I blamed Akka for this as being too hard to explain. This was until I was exposed to the Rock The JVM courses which were an absolute delight when it comes to presenting such complex topics in such an easy to understand way. And Daniel has not stopped at Akka but has added to his portfolio amazing courses on Scala and Spark too. It seems like he is quite enjoying taking such challenges like complex technologies and making them so simple for everyone. I have instantly recommended Daniel’s work to my team, which helped them immensely with taking their skills to a new level, and I do recommend these courses to anyone who wants to have the fastest ramp-up in these tough but popular technologies.
Mihai Fecioru, Adobe · California
-
From Scala, to Akka, to Spark, Daniel delivers exceptional material in each and every one of these technologies. I’ve been using them for a long time and there is always something new I will discover from him. The level of detail he gets into as well as the way he delivers material is mindblowing. I personally find his latest course Spark Optimization pure gold and one of a kind. I’ve been using Spark for a year now and I haven’t even thought how much you can leverage query plans to make such optimizations. I can’t stop thinking every time, how he manages to go so deep - because using a technology is one thing, but knowing its internals so well and how everything works behind the scenes is another story when it comes to distributed systems. Long story short Daniel is definitely the best instructor I’ve come across and each one of his courses is the best resource you can find online. Kudos for all your work and knowledge sharing.
Giannis Polyzos, Ververica · Greece
-
Daniel’s courses on Scala and Big Data are the best in class. I’ve been in touch with Daniel’s teaching and courses since early 2018. The first course that I took from him was Scala & Functional Programming; I was skeptical about it because over the internet there are many courses you can find, but few really worthy. I remember the very first day when Daniel started to speak and shared his examples - I started to love Scala, and then more as we went on. I am with Scala for the last 5 years now, but never ever has anyone explained to me or gave me comparable resources to Rock the JVM. Daniel gave me a shift in life and helped me crack top tech company interviews. His courses on big data are a must for any aspiring big data developer or data enthusiast. I highly recommend Daniel as an educator both online and on campus.
Anirban Goswami, Apple · California
Meet Rock the JVM
Daniel Ciocîrlan
Founder, Rock the JVM
I'm a software engineer and the founder of Rock the JVM.
I started Rock the JVM out of love for Scala and the technologies it powers. They are amazing tools, and I want to share as much of my experience with them as I can.
I've taught Java, Scala, Kotlin and related technologies such as Cats, ZIO and Spark to 100,000+ students at various levels. I've held live training sessions for companies including Adobe and Apple, taught university students who now work at Google and Facebook, run Hour of Code for 7-year-olds, and taught more than 50,000 kids to code.
I have a Master's Degree in Computer Science and I wrote my Bachelor's and Master's theses on Quantum Computation. Before learning programming, I won medals at international Physics competitions.
Enroll now!
All-Access Membership
Full (and growing) catalog
$195 billed yearly · Save 54%
Unlimited access to every Rock the JVM course
- 348 hours of 4K content
- All Scala courses
- All Kotlin courses
- All Typelevel courses
- All ZIO courses
- All Apache Spark courses
- All Apache Flink courses
- All Akka/Pekko courses
- Access to the private Rock the JVM community
- New courses included automatically
The Apache Spark Bundle with Scala
4 courses, one price
$180 · All courses in this bundle with a one-time payment
- 4 courses included
- 38 hours of 4K content
- All PDF slides
- Free updates
- Lifetime access
- Access to the private Rock the JVM community
Apache Spark Performance Tuning with Scala
Lifetime license
$75 · Just this course with a one-time payment
- 8 hours of 4K content
- All PDF slides
- Free updates
- Lifetime access
- Access to the private Rock the JVM community
100% Money Back Guarantee
If you're not happy with this course, I want you to have your money back. Contact me with a copy of your welcome email and I will refund you.
Fewer than 0.05% of students have ever asked for a refund, and every payment was returned in under 72 hours.