The term Big Data is usually applied when datasets grow so large that conventional systems and techniques for managing them fail. Analysis of such data cannot be done the same way it is done for small datasets. With the rise of AI and Deep Learning, and with data being produced faster than ever, datasets have become so large that traditional ways of handling them quickly break down.
In this project I have performed data analysis and drawn inferences from the datasets, using an approach designed to scale to very large data.
PySpark is used as the tool for managing and analyzing the datasets, and MongoDB is used as the database.
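As a rough illustration of how this stack fits together, the sketch below creates a SparkSession configured to read a collection from MongoDB via the MongoDB Spark Connector and runs a simple aggregation. The database name, collection name, and connection URI are placeholders, and the connector package version is an assumption; this is not the project's actual configuration, just a minimal example assuming PySpark and the connector are available.

```python
from pyspark.sql import SparkSession

# Hypothetical connection details -- replace with your own deployment's values.
MONGO_URI = "mongodb://localhost:27017"
DATABASE = "bigdata_db"      # placeholder database name
COLLECTION = "records"       # placeholder collection name

# Build a SparkSession with the MongoDB Spark Connector on the classpath.
# The connector coordinates below are an assumed version; pick the one
# matching your Spark and Scala versions.
spark = (
    SparkSession.builder
    .appName("mongo-analysis-sketch")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    .config("spark.mongodb.read.connection.uri", MONGO_URI)
    .getOrCreate()
)

# Load the collection as a Spark DataFrame; schema is inferred by sampling.
df = (
    spark.read.format("mongodb")
    .option("database", DATABASE)
    .option("collection", COLLECTION)
    .load()
)

# Example analysis step: count records per category (assumes a
# hypothetical "category" column exists in the collection).
df.groupBy("category").count().show()

spark.stop()
```

The advantage of this setup is that Spark partitions the MongoDB collection and processes it in parallel, so the same aggregation code works whether the collection holds thousands or billions of documents.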
All the steps are explained in detail in the accompanying Python file.