AdaBoost, short for Adaptive Boosting, is a boosting algorithm that combines several weak classifiers into one strong classifier. Because AdaBoost's rounds are inherently sequential, the weak classifiers cannot be found in parallel across rounds; we instead present an approach that parallelizes the work within each round and achieves nearly a 22.14x speedup over a serial implementation. In this project, we develop a parallel AdaBoost algorithm that exploits the multiple cores of a CPU via lightweight threads. We propose different algorithms for different types of datasets and machines.
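For reference, a sketch of one standard AdaBoost round (standard notation from the literature, not project-specific symbols): at round $t$ the weak classifier $h_t$ with the lowest weighted error $\epsilon_t$ is chosen, its vote $\alpha_t$ is computed, and the example weights are updated before the next round can start:

```math
\epsilon_t = \sum_{i=1}^{n} w_i^{(t)}\,\mathbf{1}\!\left[h_t(x_i) \neq y_i\right],
\qquad
\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t},
\qquad
w_i^{(t+1)} = \frac{w_i^{(t)}\exp\!\left(-\alpha_t\, y_i\, h_t(x_i)\right)}{Z_t}
```

Since $w^{(t+1)}$ depends on $h_t$, the rounds form a sequential chain; the parallelism in this project therefore lives inside each round (the search for the best weak classifier, the weight update, and so on).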
- Python3: to generate the data set for experimentation
- C++ with OpenMP: for the parallel implementation
Refer to this for learning more about OpenMP and multithreading in C++: https://bisqwit.iki.fi/story/howto/openmp/
- Run `c++/create_data.sh` to create the data set.
- Import the implementation you would like to use. There are 2 header files (details in the report):
  - Parallelization of only the best feature-threshold search: `adaboost.h` (see the sketch after this list)
  - Parallelization everywhere: `adaboost_best.h`
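As a rough illustration of the first strategy, here is a minimal OpenMP sketch of a parallel threshold search; the names (`stump_error`, `find_best_threshold`) and signatures are hypothetical, not the actual API of `adaboost.h`. Each thread scans a slice of the candidate thresholds, keeps its local best, and the per-thread bests are merged at the end:

```cpp
#include <omp.h>
#include <cstdio>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper: weighted error of a decision stump that predicts
// +1 when x[feature] >= threshold and -1 otherwise.
double stump_error(const std::vector<std::vector<double>>& X,
                   const std::vector<int>& y,
                   const std::vector<double>& w,
                   int feature, double threshold) {
    double err = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        int pred = (X[i][feature] >= threshold) ? 1 : -1;
        if (pred != y[i]) err += w[i];
    }
    return err;
}

// Evaluate all candidate thresholds of one feature in parallel; each
// thread tracks its own minimum, and the minima are merged in a critical
// section so no two threads write the shared best at once.
std::pair<double, double> find_best_threshold(
        const std::vector<std::vector<double>>& X,
        const std::vector<int>& y,
        const std::vector<double>& w,
        int feature,
        const std::vector<double>& candidates) {
    double best_err = std::numeric_limits<double>::max();
    double best_thr = 0.0;
    #pragma omp parallel
    {
        double local_err = std::numeric_limits<double>::max();
        double local_thr = 0.0;
        #pragma omp for nowait
        for (int c = 0; c < (int)candidates.size(); ++c) {
            double e = stump_error(X, y, w, feature, candidates[c]);
            if (e < local_err) { local_err = e; local_thr = candidates[c]; }
        }
        #pragma omp critical
        if (local_err < best_err) { best_err = local_err; best_thr = local_thr; }
    }
    return {best_err, best_thr};
}

int main() {
    std::vector<std::vector<double>> X = {{1.0}, {2.0}, {3.0}, {4.0}};
    std::vector<int> y = {-1, -1, 1, 1};
    std::vector<double> w(4, 0.25);
    std::pair<double, double> best =
        find_best_threshold(X, y, w, 0, {1.5, 2.5, 3.5});
    std::printf("best threshold %.1f, error %.2f\n", best.second, best.first);
    return 0;
}
```

`adaboost_best.h` applies the same pattern to the remaining loops as well (over features, and over examples during the weight update), which is where the extra speedup comes from.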
To import, simply write:

```cpp
#include "adaboost_best.h"
```
Fit function:

```cpp
clf.fit(X, labels, t);
```

Predict function:

```cpp
vector<int> predictions = clf.predict(X);
```

Here `X` is a vector of vectors of dimension n*m, where n is the number of examples and m is the number of dimensions (features).
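Putting the pieces together, a minimal end-to-end sketch. The classifier type name `AdaBoost`, the element type of `X`, and the reading of `t` as the number of boosting rounds are assumptions; check the header or the report for the real signatures:

```cpp
#include "adaboost_best.h"
#include <vector>
using namespace std;

int main() {
    // n = 4 examples, m = 2 features; labels are assumed to be +1 / -1.
    vector<vector<double>> X = {{1.0, 2.0}, {2.0, 1.0},
                                {3.0, 4.0}, {4.0, 3.0}};
    vector<int> labels = {1, 1, -1, -1};
    int t = 10;  // assumed: number of boosting rounds

    AdaBoost clf;                // assumed class name
    clf.fit(X, labels, t);
    vector<int> predictions = clf.predict(X);
    return 0;
}
```

Remember to compile with OpenMP enabled, e.g. `g++ -fopenmp -O2 main.cpp`.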
- We also time different transpose implementations in `c++/time_transpose.cpp` (a sketch of the idea follows below).
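For context, here is a hedged sketch of the kind of comparison such a file might make; it is not the project's actual code. It times a naive transpose against a cache-blocked one with `std::chrono`:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>
using namespace std;

// Naive transpose: the strided writes to B thrash the cache for large n.
void transpose_naive(const vector<vector<double>>& A, vector<vector<double>>& B) {
    size_t n = A.size(), m = A[0].size();
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            B[j][i] = A[i][j];
}

// Blocked transpose: works on small tiles that fit in cache.
void transpose_blocked(const vector<vector<double>>& A, vector<vector<double>>& B,
                       size_t block = 32) {
    size_t n = A.size(), m = A[0].size();
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t jj = 0; jj < m; jj += block)
            for (size_t i = ii; i < min(ii + block, n); ++i)
                for (size_t j = jj; j < min(jj + block, m); ++j)
                    B[j][i] = A[i][j];
}

int main() {
    size_t n = 2048;
    vector<vector<double>> A(n, vector<double>(n, 1.0)), B(n, vector<double>(n));

    auto t0 = chrono::steady_clock::now();
    transpose_naive(A, B);
    auto t1 = chrono::steady_clock::now();
    transpose_blocked(A, B);
    auto t2 = chrono::steady_clock::now();

    printf("naive:   %lld ms\n",
           (long long)chrono::duration_cast<chrono::milliseconds>(t1 - t0).count());
    printf("blocked: %lld ms\n",
           (long long)chrono::duration_cast<chrono::milliseconds>(t2 - t1).count());
    return 0;
}
```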
- We also have a Python implementation in `final-adaboost.ipynb`.
Project report: `parallelizing-adaboost.pdf`
TODO: change the naive formula used for the error rate to the optimized one (with weight rescaling) mentioned in the MIT lecture video.
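For reference, the rescaled update in its standard form (worth double-checking against the lecture): correctly and incorrectly classified examples are scaled so that each group's weights sum to 1/2, which is algebraically identical to the exponential update with $Z_t$ expanded, but avoids a separate normalization pass:

```math
w_i^{(t+1)} =
\begin{cases}
\dfrac{w_i^{(t)}}{2\,(1 - \epsilon_t)} & \text{if } h_t(x_i) = y_i,\\[6pt]
\dfrac{w_i^{(t)}}{2\,\epsilon_t} & \text{if } h_t(x_i) \neq y_i.
\end{cases}
```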
- Vibhu Jawa
- Praateek Mahajan