O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Sentiment Analysis and Data Visualization
Ophelian On Mars! More than a simple framework.
PySpark RDD, DataFrame, and Dataset examples in Python
All in one
PySpark RDD and DataFrame Examples
Compares execution times of RDDs (Resilient Distributed Datasets) and DataFrames in Apache Spark, also accounting for the input file format (CSV or Parquet).
PageRank - Pig vs PySpark comparison https://madoc.univ-nantes.fr/mod/assign/view.php?id=1511791
This project aims to more closely represent what happens in the brain by simulating a spiking neural network. It uses RDD to try to learn from the edge cases too!
ECE NTUA Assignment
This repository contains projects and exercises I completed during my "Big Data Architecture" course. It reflects the concepts I’ve learned about data processing using Apache Spark and PySpark.
Project: Spark SQL & DataFrames - Course: Advanced Topics in Databases (9th semester) NTUA
Repo to contain the assignments for DSCI 553: Foundations and Applications of Data Mining course at USC
[ECE NTUA] Advanced Topics in Databases - Course project (2022-2023)