Welcome to

pySparkReference.com

Your technical destination for pySpark

Tools

Spark Memory Calculator - WORK IN PROGRESS - Get the max partition size your executors can handle
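To give a flavour of what the calculator will do (a rough sketch of my own, not the tool's actual logic), the core arithmetic is roughly: take the executor heap, keep the fraction Spark reserves for execution and storage (spark.memory.fraction, 0.6 by default in recent Spark versions), and divide it among the tasks running concurrently:

    # Back-of-envelope sketch (mine, not the calculator's actual logic):
    # estimate the largest partition a single task can comfortably hold.
    # It ignores Spark's fixed reserved memory and any overhead settings.

    def max_partition_size_gb(executor_memory_gb: float,
                              cores_per_executor: int,
                              memory_fraction: float = 0.6) -> float:
        # Unified region shared by execution and storage
        # (spark.memory.fraction, 0.6 by default in recent Spark versions).
        unified_memory_gb = executor_memory_gb * memory_fraction
        # Each core runs one task at a time, and tasks share that region.
        return unified_memory_gb / cores_per_executor

    # Example: 8 GB executors with 4 cores -> roughly 1.2 GB per partition.
    print(max_partition_size_gb(8, 4))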

Why this website?

I'm slowly building this website to make the experience of getting up to speed with pySpark less painful.

Right now, if you want to find anything about pySpark beyond the official documentation, the experience is painful and time-consuming:

  • Articles are scattered across personal blogs, Medium and the like. There is no authoritative place to refer to, and hunting down good articles is a pain.
  • The websites lack structure. Most articles live on blogs, which are terrible for finding related material.
  • In pySpark there are too many ways to do the same thing. Articles rarely map out the alternatives or tell you which one you should actually prefer (see the sketch after this list).
  • There are plenty of courses for learning. But suppose you're at work and need to quickly confirm something - you aren't going to open a course, find the right lecture and rewind to the right point. You need a written article you can refer to in seconds.
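As a small illustration of the "too many ways" problem, here are four equivalent ways to filter rows in pySpark. This sketch is mine, assuming a toy DataFrame df with an age column; the column-expression form is the one most style guides lean towards:

    # Illustration: four equivalent ways to filter the same rows.
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([("a", 30), ("b", 15)], ["name", "age"])

    df.filter(df.age > 18)        # attribute access; breaks on awkward column names
    df.filter(df["age"] > 18)     # bracket access; safer than attribute access
    df.filter(F.col("age") > 18)  # column expression; needs no reference to df
    df.filter("age > 18")         # SQL string; concise but harder to compose

    df.filter(F.col("age") > 18).show()  # all four select the same single row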

This is why I'm working on this website. I just want one authoritative place. It started with me sending friends copies of my personal notes, and it has since matured into a dedicated website.

I'm not a great technical writer or anything. I'm learning as I go, so this website will improve over time.

And obviously, everything here is free. Just become a good engineer and bring better solutions into the world.
Enjoy.