What factors should I consider when choosing a predictive model technique?

This is a very broad question, and the answer could easily fill an entire book. In a nutshell, I would start with the following questions:

1. What does your target variable look like?

continuous target variable? -> regression
categorical (nominal) target variable? -> classification
ordinal target variable? -> ordinal regression (ordered classification)
no target variable and want to find structure in data? -> cluster analysis, projection
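
As a rough illustration of this first question, scikit-learn has a small helper, `type_of_target`, that guesses the kind of target from its values. The example arrays below are made up for illustration, and the helper cannot tell an ordinal target from a nominal one, so I would treat it as a first sanity check rather than a decision rule:

```python
from sklearn.utils.multiclass import type_of_target

# Made-up example targets (assumptions for illustration only).
house_prices = [0.1, 0.6, 2.5]                    # real-valued
spam_flags = [1, 0, 0, 1]                         # two distinct labels
species = ["setosa", "virginica", "versicolor"]   # more than two labels

kinds = {
    "house_prices": type_of_target(house_prices),
    "spam_flags": type_of_target(spam_flags),
    "species": type_of_target(species),
}
print(kinds)
```

A "continuous" result points towards regression, while "binary" and "multiclass" point towards classification.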

2. Is computational performance an issue?

use "cheaper" models/algorithms
dimensionality reduction
feature selection
lazy learner (e.g., k-nearest neighbors) if training time is the bottleneck; predictions become more expensive in exchange
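
To make the difference between feature selection and dimensionality reduction concrete, here is a minimal sketch on synthetic data. The dataset sizes, the choice of `k=10`, and the use of logistic regression as the "cheaper" model are all my assumptions for the example, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic data: 500 samples, 100 features, only 10 of them informative.
X, y = make_classification(
    n_samples=500, n_features=100, n_informative=10, random_state=0
)

# Feature selection keeps a subset of the original columns.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Dimensionality reduction projects the data onto new axes
# (here: the first 10 principal components).
X_projected = PCA(n_components=10, random_state=0).fit_transform(X)

# Either step can sit in front of a cheap linear model in a pipeline.
model = make_pipeline(
    SelectKBest(f_classif, k=10), LogisticRegression(max_iter=1000)
)
model.fit(X, y)
predictions = model.predict(X)

print(X.shape, X_selected.shape, X_projected.shape)
```

Both variants shrink the 100 input columns to 10, so whatever model comes next has less work to do.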

3. Does my dataset fit into memory? If not:

out of core learning
distributed systems
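
A minimal sketch of out-of-core learning, assuming scikit-learn's `SGDClassifier` and its `partial_fit` method. The `iter_chunks` generator is a hypothetical stand-in for reading mini-batches from disk; here it just slices a synthetic in-memory array so that the example is runnable:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic two-class data (an assumption for this example): each class is
# a Gaussian blob, shifted by -1 or +1 along every feature.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
X = rng.normal(size=(10_000, 20)) + (2 * y[:, None] - 1)

def iter_chunks(X, y, chunk_size=1_000):
    """Yield consecutive mini-batches; a stand-in for reading from disk."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]

clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # all class labels must be given on the first call

n_chunks = 0
for X_chunk, y_chunk in iter_chunks(X, y):
    # Only one chunk at a time has to be held in memory.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)
    n_chunks += 1

accuracy = clf.score(X, y)
print(n_chunks, accuracy)
```

In a real setting the generator would read from a file or database, and the accuracy would be checked on held-out data rather than on the training set.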

4. Is my data linearly separable?

hard to know the answer upfront
always a good idea to compare different models
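
One way to do such a comparison is cross-validation over a linear and a nonlinear candidate. The sketch below assumes the synthetic "two moons" dataset, which is not linearly separable, and two arbitrarily chosen models with default hyperparameters:

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-circles: a linear boundary cannot separate them well.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

candidates = {
    "linear": LogisticRegression(),
    "nonlinear": SVC(kernel="rbf"),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

If the nonlinear model is clearly ahead, as it should be on this dataset, that is a hint that the classes are not linearly separable.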

5. Finding a good bias-variance tradeoff. Does my model overfit?

increase regularization strength if supported by the model
dimensionality reduction or feature selection otherwise
collect more training data if possible (check via learning curves first)
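
The learning-curve check can be sketched with scikit-learn's `learning_curve` function. The synthetic dataset is an assumption, and the unpruned decision tree is used only because it overfits readily:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Fit the model on growing subsets of the training folds and record
# training and validation accuracy for each subset size.
train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for n, tr, va in zip(train_sizes, train_mean, val_mean):
    print(f"n={n:4d}  train={tr:.2f}  validation={va:.2f}  gap={tr - va:.2f}")
```

A large gap between training and validation accuracy indicates overfitting. If the validation score is still climbing at the largest training size, more data will probably help; if it has flattened out, regularization or fewer features are the better bet.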

6. Are you planning to update your model with new data on the fly?

one option is a lazy learner (e.g., k-nearest neighbors): it needs to keep the training data around; no training step is necessary, but predictions are more expensive
it's generally cheap to update generative models (e.g., naive Bayes)
another option is stochastic gradient descent for online learning
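
As a small sketch of updating a model on the fly, scikit-learn's `GaussianNB` (a generative model) can absorb new data through `partial_fit` without being refit from scratch. The data and the split into "old" and "new" batches are made up for the example; `SGDClassifier` offers the same `partial_fit` interface for the stochastic gradient descent route:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Pretend the first 1500 samples were available up front and
# the remaining 500 arrive later.
X_old, y_old = X[:1500], y[:1500]
X_new, y_new = X[1500:], y[1500:]

model = GaussianNB()
# All class labels have to be declared on the first partial_fit call.
model.partial_fit(X_old, y_old, classes=np.array([0, 1]))
seen_before = model.class_count_.sum()

# Later: update the per-class statistics with the new batch only.
model.partial_fit(X_new, y_new)
seen_after = model.class_count_.sum()

print(seen_before, seen_after)
```

The update only touches the per-class counts, means, and variances, so the old training data does not have to be kept around.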

...

The list goes on and on :). I think Andreas Mueller's scikit-learn algorithm "cheat-sheet" is an excellent resource; the original, interactive version is available on the scikit-learn website.

[Source: http://scikit-learn.org/dev/tutorial/machine_learning_map/index.html]
