Machine learning theory encompasses the core ideas that underpin how algorithms learn from data. Here are some of the key concepts:
In supervised learning, the algorithm learns from labeled data, where each training example is paired with an output label (a minimal sketch follows this list). Common algorithms include:
- Linear Regression: For predicting continuous values based on linear relationships.
- Logistic Regression: For binary classification, estimating probabilities using a logistic function.
- Decision Trees: Tree-like models that make decisions based on feature values.
- Support Vector Machines (SVM): Find the maximum-margin hyperplane that separates data points of different classes.
- K-Nearest Neighbors (KNN): Classifies data points based on the majority label of their nearest neighbors.
- Neural Networks: Composed of layers of interconnected nodes, effective for complex patterns.
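As a minimal sketch of the supervised setting, the snippet below fits a logistic regression classifier on synthetic labeled data. The use of scikit-learn and the generated dataset are illustrative assumptions; the text names no specific library.

```python
# Minimal supervised-learning sketch (illustrative; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data: 500 samples, 10 features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit on labeled training pairs, then evaluate on held-out data.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```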
Unsupervised learning algorithms work with unlabeled data to find patterns or structures (see the sketch after this list). Key algorithms include:
- K-Means Clustering: Partitions data into K clusters by minimizing within-cluster variance.
- Hierarchical Clustering: Builds a tree of clusters based on similarity.
- Principal Component Analysis (PCA): A dimensionality reduction technique that preserves variance.
- Autoencoders: Neural networks that learn compact representations by compressing inputs and reconstructing them.
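As a hedged illustration of the unsupervised setting, the sketch below reduces unlabeled data with PCA and then clusters it with k-means; scikit-learn and the blob-shaped synthetic data are assumptions made for the example.

```python
# Unsupervised-learning sketch: PCA then k-means (illustrative; scikit-learn assumed).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Unlabeled data: 300 samples drawn from 3 blobs in 8 dimensions.
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=0)

# Project onto the 2 directions of highest variance, then partition into 3 clusters.
X_2d = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])
```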
Semi-supervised learning combines a small amount of labeled data with a large amount of unlabeled data. Techniques often adapt supervised learning algorithms to incorporate the unlabeled examples.
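One concrete adaptation is self-training, sketched below with scikit-learn's SelfTrainingClassifier; the technique and dataset are assumed choices for illustration, since the text names neither. By scikit-learn's convention, unlabeled examples are marked with the label -1.

```python
# Self-training sketch (illustrative; scikit-learn assumed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=400, random_state=0)

# Keep labels for only ~10% of the data; mark the rest as unlabeled (-1).
rng = np.random.default_rng(0)
y_semi = np.where(rng.random(len(y)) < 0.1, y, -1)

# The base classifier is retrained as confident pseudo-labels are added.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_semi)
print("examples labeled during self-training:", (model.transduction_ != -1).sum())
```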
Reinforcement learning involves an agent learning to make decisions by interacting with an environment and receiving rewards or penalties (a tabular sketch follows this list). Key algorithms include:
- Q-Learning: A value-based method that learns the value of actions in given states to maximize cumulative rewards.
- Deep Q-Networks (DQN): Combines Q-learning with deep learning for high-dimensional input spaces.
- Policy Gradients: Directly optimizes the policy (decision-making strategy) for complex action spaces.
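The sketch below implements tabular Q-learning on a tiny corridor environment; the environment, reward structure, and hyperparameters are all invented for illustration.

```python
# Tabular Q-learning sketch on a toy 5-state corridor (environment is invented).
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.ones((n_states, n_actions))    # optimistic initialization encourages exploration
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes
    s = 0
    for _ in range(100):              # cap episode length
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else s + 1
        done = s_next == n_states - 1                 # rightmost state is terminal
        r = 1.0 if done else 0.0
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        if done:
            break
        s = s_next

print(Q.argmax(axis=1)[:-1])          # greedy policy in non-terminal states: all 1 (right)
```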
Ensemble methods combine multiple models to improve performance (both techniques are sketched after this list). Techniques include:
- Bagging: Trains multiple models in parallel on bootstrap samples of the data and aggregates their predictions.
- Boosting: Trains models sequentially, with each new model correcting the errors of its predecessors.
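As a hedged side-by-side, the sketch below compares a bagged ensemble with a boosted one on the same synthetic data; scikit-learn's BaggingClassifier and AdaBoostClassifier are assumed choices, not prescribed by the text.

```python
# Bagging vs. boosting sketch (illustrative; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: 50 trees trained independently on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
# Boosting: 50 weak learners trained sequentially, each reweighting past errors.
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```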
A popular ensemble method is the Random Forest algorithm (sketched in code after this list):
- Overview: Constructs many decision trees during training and outputs the majority class (classification) or the mean prediction (regression) of the individual trees.
- Key Features:
  - Bagging: Uses bootstrap aggregating to create a different training subset for each tree.
  - Feature Randomness: Randomly selects a subset of features at each split, which reduces correlation among the trees.
  - Robustness: Less prone to overfitting than an individual decision tree and handles large, high-dimensional datasets well.
- Applications: Widely used in medical diagnosis, credit scoring, fraud detection, and recommendation systems.
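A minimal random forest sketch, again assuming scikit-learn and synthetic data; `max_features="sqrt"` makes the per-split feature randomness described above explicit.

```python
# Random forest sketch (illustrative; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 200 trees, each grown on a bootstrap sample; each split considers only
# sqrt(n_features) candidate features, which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```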
Meta-learning, or "learning to learn," involves algorithms that adapt their learning strategies based on experience from previous tasks, which helps improve model performance across different tasks.
Two common failure modes are worth distinguishing:
- Overfitting: When a model learns the training data too well, capturing noise and performing poorly on new data.
- Underfitting: When a model is too simple to capture the underlying trend, performing poorly on both training and unseen data.
The bias-variance tradeoff is the balance between two types of error (see the sketch after this list):
- Bias: Error due to overly simplistic assumptions, leading to underfitting.
- Variance: Error due to excessive complexity, leading to overfitting.
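The sketch below makes the tradeoff concrete by fitting polynomials of increasing degree to noisy data; the dataset, degrees, and scikit-learn pipeline are invented for illustration.

```python
# Under/overfitting sketch: polynomial degree vs. train/test fit (illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)   # noisy sine curve
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # Degree 1 underfits (high bias); degree 15 overfits (high variance).
    print(degree, model.score(X_train, y_train), model.score(X_test, y_test))
```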
Generalization is the ability of a model to perform well on unseen data, assessed using techniques like cross-validation.
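A brief cross-validation sketch, assuming scikit-learn and a synthetic dataset:

```python
# 5-fold cross-validation sketch (illustrative; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
# Each fold is held out once while the model trains on the other four.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("mean accuracy:", scores.mean())
```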
Feature engineering is the practice of selecting, modifying, or creating features from raw data to improve model performance.
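As one hedged example of creating features, the sketch below expands raw inputs with polynomial and interaction terms; PolynomialFeatures is an assumed illustration, not the only approach.

```python
# Feature-engineering sketch: derive interaction features (illustrative).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0], [3.0, 4.0]])     # two raw features per row

# Adds x1^2, x1*x2, x2^2 alongside the originals (bias column omitted).
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))
# [[ 1.  2.  1.  2.  4.]
#  [ 3.  4.  9. 12. 16.]]
```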
Evaluation metrics assess the performance of machine learning models; common choices include accuracy, precision, recall, F1-score, and AUC-ROC.
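A hedged sketch of computing these metrics with scikit-learn; the labels, predictions, and scores are fabricated purely for illustration.

```python
# Evaluation-metrics sketch (illustrative; scikit-learn assumed).
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]                    # hypothetical labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]                    # hypothetical hard predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.6, 0.7, 0.95]   # hypothetical scores

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))   # uses scores, not hard labels
```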
Model selection and hyperparameter tuning involve choosing the right algorithm for a specific task and optimizing its settings to improve performance.
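A minimal grid-search sketch, assuming scikit-learn; the estimator and grid values are arbitrary examples.

```python
# Hyperparameter-tuning sketch with grid search (illustrative; scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Evaluate every combination of C and kernel with 5-fold cross-validation.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```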
Data preprocessing prepares raw data for training, including normalization, handling missing values, and encoding categorical variables.
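A hedged sketch covering those three steps; the toy table, column names, and the scikit-learn/pandas tooling are all assumptions made for the example.

```python
# Preprocessing sketch: impute, scale, and encode (illustrative; scikit-learn assumed).
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["NY", "LA", "NY"]})  # toy data

pre = ColumnTransformer([
    # Numeric column: fill missing values with the mean, then normalize.
    ("num", make_pipeline(SimpleImputer(strategy="mean"), StandardScaler()), ["age"]),
    # Categorical column: one-hot encode.
    ("cat", OneHotEncoder(sparse_output=False), ["city"]),
])
print(pre.fit_transform(df))
```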
Regularization techniques such as L1 (Lasso) and L2 (Ridge) prevent overfitting by penalizing large coefficients.
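A brief sketch contrasting the two penalties, assuming scikit-learn and a synthetic regression task; note how the L1 penalty drives the coefficients of uninformative features exactly to zero, while the L2 penalty only shrinks them.

```python
# Regularization sketch: Lasso (L1) vs. Ridge (L2) coefficients (illustrative).
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 features, only 3 of which actually influence the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)    # L1: sparse solution, zeros out weak features
ridge = Ridge(alpha=1.0).fit(X, y)    # L2: shrinks all coefficients smoothly
print("Lasso nonzero coefficients:", (lasso.coef_ != 0).sum())
print("Ridge nonzero coefficients:", (ridge.coef_ != 0).sum())
```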