The term Big Data is usually applied when datasets grow so large that conventional systems and techniques for managing them fail. Analysis of such data cannot be done the same way it is done for small datasets. With the rise of AI and Deep Learning, and with data being produced faster than ever, datasets have become so large that traditional ways of handling them quickly break down.
In this project I have performed data analysis and drawn inferences from the datasets, using an approach designed to scale to very large data.
PySpark is used as the tool for managing and analyzing the datasets, and MongoDB is used as the database.
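As a rough illustration of how this stack fits together, the sketch below creates a SparkSession configured to read a collection from MongoDB via the MongoDB Spark Connector and runs a simple aggregation. The database name, collection name, and connection URI are placeholders, and the connector package version is an assumption; this is not the project's actual configuration, just a minimal example assuming PySpark and the connector are available.

```python
from pyspark.sql import SparkSession

# Hypothetical connection details -- replace with your own deployment's values.
MONGO_URI = "mongodb://localhost:27017"
DATABASE = "bigdata_db"      # placeholder database name
COLLECTION = "records"       # placeholder collection name

# Build a SparkSession with the MongoDB Spark Connector on the classpath.
# The connector coordinates below are an assumed version; pick the one
# matching your Spark and Scala versions.
spark = (
    SparkSession.builder
    .appName("mongo-analysis-sketch")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    .config("spark.mongodb.read.connection.uri", MONGO_URI)
    .getOrCreate()
)

# Load the collection as a Spark DataFrame; schema is inferred by sampling.
df = (
    spark.read.format("mongodb")
    .option("database", DATABASE)
    .option("collection", COLLECTION)
    .load()
)

# Example analysis step: count records per category (assumes a
# hypothetical "category" column exists in the collection).
df.groupBy("category").count().show()

spark.stop()
```

The advantage of this setup is that Spark partitions the MongoDB collection and processes it in parallel, so the same aggregation code works whether the collection holds thousands or billions of documents.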
All the steps are explained in detail in the accompanying Python file.