Big-Data-Analytics

The term Big Data is used when datasets grow so large that conventional systems and techniques for managing them fail. Data analysis on such datasets cannot be done the same way it is done for small datasets. With the rise of AI and Deep Learning, and data being produced faster than ever, datasets have become so large that traditional ways of handling data no longer scale.

In this project I have performed data analysis and drawn inferences from text and CSV datasets, treating the datasets as very large.

PySpark has been used as the tool for management and analysis of the datasets, and MongoDB is used as the database for storage.
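The sketch below illustrates this kind of pipeline: reading CSV and text data with PySpark, running a simple aggregation and word count, and persisting results to MongoDB. It is a minimal example, not the project's exact code; it assumes the MongoDB Spark Connector (10.x) is on the classpath, and the connection URI, file paths, database, collection, and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal sketch: assumes the MongoDB Spark Connector (10.x) is available.
# URI, paths, database/collection, and column names are placeholders.
spark = (
    SparkSession.builder
    .appName("big-data-analytics")
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# CSV analysis: load with an inferred schema and compute a simple aggregate.
csv_df = spark.read.csv("data/records.csv", header=True, inferSchema=True)
summary = csv_df.groupBy("category").count()  # "category" is a placeholder column

# Text analysis: word count over a large text file.
words = (
    spark.read.text("data/corpus.txt")
    .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
    .filter(F.col("word") != "")
)
word_counts = words.groupBy("word").count()

# Persist the results to MongoDB for storage.
(word_counts.write.format("mongodb")
    .mode("append")
    .option("database", "analytics")       # placeholder database
    .option("collection", "word_counts")   # placeholder collection
    .save())

spark.stop()
```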

All the steps are explained in detail in the Python file.
