In this project, we parallelized the KMeans machine learning algorithm and benchmarked its performance with strong-scaling and weak-scaling experiments. We also developed a distributed data loader that loads data in parallel into each MPI (Message Passing Interface) process.
Understanding the Algorithm and Parallel Traits:
We analyzed the KMeans algorithm and identified the steps that can be parallelized: the cluster assignment is independent for every data point, and the centroid update can be expressed as per-cluster sums and counts that are computed locally and then combined.
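As a reference point, here is a minimal serial KMeans sketch in NumPy with the two parallelizable steps marked in comments. The function and variable names are illustrative only and are not taken from the project code.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iters):
        # (1) Assignment step: each point independently finds its nearest
        #     centroid -- embarrassingly parallel across data points.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # (2) Update step: per-cluster means reduce to per-cluster sums and
        #     counts, which can be accumulated locally and combined later.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

    return centroids, labels
```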
Distributed Data Loader:
We developed a distributed data loader that supports the following scenarios:
- Loading data from a single file with balanced loading among MPI processes (a sketch of this case follows the list).
- Loading data from multiple files with varying row counts while maintaining load balance.
- Optional: Loading data from a message broker in batches. Documentation and examples were provided for each scenario.
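The sketch below illustrates the single-file, balanced-loading case using mpi4py, assuming a raw binary file of float64 values with a known row and column count; the file format, function name, and parameters are assumptions for illustration, not the project's actual API.

```python
import numpy as np
from mpi4py import MPI

def load_balanced(path, n_rows, n_cols, comm=MPI.COMM_WORLD):
    """Each rank reads only its own contiguous block of rows."""
    rank, size = comm.Get_rank(), comm.Get_size()

    # Distribute rows as evenly as possible: the first (n_rows % size)
    # ranks receive one extra row each.
    base, extra = divmod(n_rows, size)
    local_rows = base + (1 if rank < extra else 0)
    start_row = rank * base + min(rank, extra)

    # Read this rank's slice starting at its byte offset in the file.
    itemsize = np.dtype(np.float64).itemsize
    offset = start_row * n_cols * itemsize
    data = np.fromfile(path, dtype=np.float64,
                       count=local_rows * n_cols, offset=offset)
    return data.reshape(local_rows, n_cols)
```

The multi-file case follows the same pattern, except that the global row offsets are first computed from the per-file row counts before assigning ranges to ranks.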
Parallelizing the Algorithm:
We parallelized the KMeans algorithm, distributing its computation across multiple MPI processes to improve performance.
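The following sketch shows one common way to parallelize a KMeans iteration with mpi4py: each rank assigns its local points, accumulates partial sums and counts, and an Allreduce gives every rank identical updated centroids. It is a sketch of the general technique under these assumptions, not the project's exact implementation; all names are illustrative.

```python
import numpy as np
from mpi4py import MPI

def parallel_kmeans_step(X_local, centroids, comm=MPI.COMM_WORLD):
    """One iteration: local assignment, then a global centroid update."""
    k, d = centroids.shape

    # Assignment step: nearest centroid for each local point.
    dists = np.linalg.norm(X_local[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Local partial sums and counts per cluster.
    local_sums = np.zeros((k, d))
    local_counts = np.zeros(k)
    for j in range(k):
        members = X_local[labels == j]
        local_sums[j] = members.sum(axis=0)
        local_counts[j] = len(members)

    # Global reduction: every rank ends up with identical new centroids.
    global_sums = np.empty_like(local_sums)
    global_counts = np.empty_like(local_counts)
    comm.Allreduce(local_sums, global_sums, op=MPI.SUM)
    comm.Allreduce(local_counts, global_counts, op=MPI.SUM)

    new_centroids = centroids.copy()
    nonempty = global_counts > 0
    new_centroids[nonempty] = global_sums[nonempty] / global_counts[nonempty, None]
    return new_centroids, labels
```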
Running and Benchmarking:
We executed the code in distributed mode and performed strong scaling and weak scaling experiments:
- We plotted charts to visualize performance across process counts and problem sizes, taking system limitations and parallelism choices into account.
- We evaluated and documented the following attributes (timed as sketched after this list):
  - Data loading time
  - Algorithm computation time
  - Communication time
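A hedged sketch of how these phases could be timed with MPI.Wtime(), reusing the illustrative `load_balanced` and `parallel_kmeans_step` functions from the earlier sketches; the file name, dimensions, and iteration count are placeholders.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Data loading time (hypothetical file name and dimensions).
t0 = MPI.Wtime()
X_local = load_balanced("data.bin", n_rows=1_000_000, n_cols=16, comm=comm)
load_time = MPI.Wtime() - t0

# Common initial centroids, broadcast from rank 0.
k = 8
centroids = X_local[:k].copy() if rank == 0 else np.empty((k, X_local.shape[1]))
comm.Bcast(centroids, root=0)

# Algorithm computation time; communication time would be isolated by
# timing the Allreduce calls inside the step function separately.
t0 = MPI.Wtime()
for _ in range(20):
    centroids, labels = parallel_kmeans_step(X_local, centroids, comm)
compute_time = MPI.Wtime() - t0

# Report the slowest rank, since that bounds the end-to-end runtime.
max_load = comm.reduce(load_time, op=MPI.MAX, root=0)
max_compute = comm.reduce(compute_time, op=MPI.MAX, root=0)
if rank == 0:
    print(f"load: {max_load:.3f}s  compute: {max_compute:.3f}s")
```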