Why the $&*(# is my Spark job running so slow?
I’ve had my fair share of pain with Spark, and if you’re reading this, you’ve probably seen this too: you run a 4-line job on a gig of data, with two innocent joins, and it takes a bloody hour to run. Or another one: you have an hour-long job that was progressing smoothly until task 1149/1150, where it hangs, and after two more hours you decide to kill it because you don’t know if it’s you or a bug in Spark. Usually it’s PIBKAC - problem is between keyboard and chair - but in desperation, the only idea you have is to turn it off and on again.
Then you go like, “hm, maybe my Spark cluster is too small, let me bump some CPU and mem”. Then… same thing. Amazon’s probably laughing now and you’re paying for it. So this has to be the million-dollar question.
This is the only course on the web where you can learn how to optimize Spark jobs and master Spark optimization techniques. With the strategies you learn in this Spark optimization course you will save yourself time, headaches and money.
In this Spark optimization course, we cut the weeds at the root. We dive deep into Spark performance optimization and you will learn how Spark works under the hood. We’ll see that we have incredible leverage, IF we write intelligent code, and you will do exactly that. You will learn 20+ Spark optimization techniques and strategies. Each of them individually can give at least a 2x perf boost for your jobs (some of them even 10x), and I show it on camera.