Spark notes
From Simson Garfinkel
Spark on macOS
1. Install Anaconda.
2. pip install pyspark
Adding files to the nodes (distributes a local file to every worker):
sc.addFile(filename)
Tutorials
Tuning
- https://stackoverflow.com/questions/37871194/how-to-tune-spark-executor-number-cores-and-executor-memory
- https://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
- http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
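The knobs these posts discuss are usually set on spark-submit. A sketch of the worked example from the Cloudera post above (6 nodes x 16 cores x 64 GB each; the job name my_job.py is a placeholder, and the numbers are illustrative, not recommendations):

```shell
# Leave ~1 core and ~1 GB per node for the OS and Hadoop daemons,
# use ~5 cores per executor for good HDFS throughput, and subtract
# ~7% per executor for spark.yarn.executor.memoryOverhead:
#   executors: 15 usable cores / 5 = 3 per node -> 6*3 = 18, minus 1 for the driver = 17
#   memory: 63 GB / 3 executors = 21 GB, minus overhead -> ~19 GB
spark-submit \
  --master yarn \
  --num-executors 17 \
  --executor-cores 5 \
  --executor-memory 19G \
  my_job.py
```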
Good Blog Entries
- https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f (mostly about packaging and sharing a SparkContext)