This is just a brain dump at this point; nothing here has been tested.
Integration of Mamba and Transformer (MAT) for Long-Short Range Time Series Forecasting using Rust
This is an implementation of the paper arXiv:2409.08530v1.
Abstract: Long-short range time series forecasting is essential for predicting future trends and patterns over extended periods. While deep learning models such as Transformers have made significant strides in advancing time series forecasting, they often encounter difficulties in capturing long-term dependencies and effectively managing sparse semantic features. The state space model, Mamba, addresses these issues through its adept handling of selective input and parallel computing, striking a balance between computational efficiency and prediction accuracy. This article examines the advantages and disadvantages of both Mamba and Transformer models, and introduces a combined approach, MAT, which leverages the strengths of each model to capture unique long-short range dependencies and inherent evolutionary patterns in multivariate time series. Specifically, MAT harnesses the long-range dependency capabilities of Mamba and the short-range characteristics of Transformers. Experimental results on benchmark weather datasets demonstrate that MAT outperforms existing comparable methods in terms of prediction accuracy, scalability, and memory efficiency.
The implementation consists of the following components:

- Reversible Instance Normalization (RevIN): Normalizes each input instance before the model and reverses the normalization on the model's output (sketch below).
- Two-Stage Embedding: Reduces the dimensionality of the input data in two linear steps to improve efficiency (see the shared embedding/projection sketch below).
- Multi-Scale Context Extraction: Convolves the input sequence at multiple temporal scales to capture context (sketch below).
- Mamba Module: Handles long-range dependencies using a basic state-space model (sketch below).
- Transformer Module: Captures short-range dependencies with multi-head self-attention (sketch below).
- Residual Connections: Preserve gradient flow and allow modules to be stacked more deeply (sketch below).
- Two-Stage Projection: Projects the fused outputs back to the original time series dimension, mirroring the embedding (see the shared embedding/projection sketch below).
- Custom Loss Function (Optional): A weighted combination of MSE and MAE to balance short-term and long-term prediction error (sketch below).
- Selective Scanning in Mamba (Optional): The Mamba module currently implements a basic state-space model; the selective scanning mechanism, which modulates interactions contextually and filters out irrelevant information, is still missing. It could be approximated by adding a gating mechanism or a learned selection layer to the Mamba module (sketch below).
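
A minimal, untested sketch of RevIN in plain Rust (no external crates): statistics are computed per instance, stored, and reused to denormalize the forecast. The learnable affine parameters of the original RevIN formulation are omitted, and all names here are illustrative.

```rust
/// Reversible instance normalization: remembers per-instance statistics so
/// the normalization can be undone on the model's output.
pub struct RevIn {
    eps: f32,
    mean: f32,
    std: f32,
}

impl RevIn {
    pub fn new(eps: f32) -> Self {
        Self { eps, mean: 0.0, std: 1.0 }
    }

    /// Normalize one univariate instance and store its statistics.
    pub fn normalize(&mut self, x: &[f32]) -> Vec<f32> {
        let n = x.len() as f32;
        self.mean = x.iter().sum::<f32>() / n;
        let var = x.iter().map(|v| (v - self.mean).powi(2)).sum::<f32>() / n;
        self.std = (var + self.eps).sqrt();
        x.iter().map(|v| (v - self.mean) / self.std).collect()
    }

    /// Reverse the normalization on the model's output.
    pub fn denormalize(&self, y: &[f32]) -> Vec<f32> {
        y.iter().map(|v| v * self.std + self.mean).collect()
    }
}
```

In the full pipeline, `normalize` would run on each input window before embedding and `denormalize` on the corresponding forecast.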
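
A sketch of the two-stage embedding and its mirror-image projection, assuming a hand-rolled `Linear` layer (weights stored row-major, no training logic). The dimension names L, H, D, and T in the comments are assumptions.

```rust
/// Plain linear layer: y = W x + b, with W stored row-major as [out][in].
pub struct Linear {
    w: Vec<Vec<f32>>,
    b: Vec<f32>,
}

impl Linear {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.w
            .iter()
            .zip(&self.b)
            .map(|(row, b)| row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + b)
            .collect()
    }
}

/// Two-stage embedding: input length L -> hidden H -> model dimension D.
pub struct TwoStageEmbedding {
    stage1: Linear, // L -> H
    stage2: Linear, // H -> D
}

impl TwoStageEmbedding {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.stage2.forward(&self.stage1.forward(x))
    }
}

/// Two-stage projection, the mirror of the embedding:
/// model dimension D -> hidden H -> forecast length T.
pub struct TwoStageProjection {
    stage1: Linear, // D -> H
    stage2: Linear, // H -> T
}

impl TwoStageProjection {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        self.stage2.forward(&self.stage1.forward(x))
    }
}
```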
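
A sketch of multi-scale context extraction as same-padded 1-D convolutions at several kernel widths whose outputs are averaged; the kernel sizes and the averaging fusion are assumptions, not taken from the paper.

```rust
/// Same-padded 1-D convolution of a sequence with one kernel (zero padding).
fn conv1d_same(x: &[f32], kernel: &[f32]) -> Vec<f32> {
    let pad = (kernel.len() / 2) as isize;
    (0..x.len() as isize)
        .map(|t| {
            kernel.iter().enumerate()
                .map(|(j, w)| {
                    let idx = t + j as isize - pad;
                    if idx < 0 || idx >= x.len() as isize { 0.0 } else { w * x[idx as usize] }
                })
                .sum::<f32>()
        })
        .collect()
}

/// Convolve at several temporal scales and average the per-scale outputs.
fn multi_scale_context(x: &[f32], kernels: &[Vec<f32>]) -> Vec<f32> {
    let mut out = vec![0.0f32; x.len()];
    for kern in kernels {
        for (o, c) in out.iter_mut().zip(conv1d_same(x, kern)) {
            *o += c / kernels.len() as f32;
        }
    }
    out
}
```

Short kernels pick up local context while wider ones summarize longer windows; whatever scales the final model settles on would replace the caller's choice of `kernels`.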
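
A sketch of the basic (non-selective) state-space recurrence standing in for the Mamba core: h_t = A * h_{t-1} + B * x_t and y_t = C * h_t, with a diagonal A for simplicity. Real Mamba additionally discretizes continuous parameters, makes them input-dependent, and uses a parallel scan; all of that is omitted here.

```rust
/// Basic diagonal state-space model applied as a sequential scan:
/// h[i] <- a[i] * h[i] + b[i] * x_t, then y_t = sum_i c[i] * h[i].
pub struct Ssm {
    a: Vec<f32>, // diagonal state transition
    b: Vec<f32>, // input projection
    c: Vec<f32>, // output projection
}

impl Ssm {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        let mut h = vec![0.0f32; self.a.len()];
        x.iter()
            .map(|&xt| {
                for i in 0..h.len() {
                    h[i] = self.a[i] * h[i] + self.b[i] * xt;
                }
                h.iter().zip(&self.c).map(|(hi, ci)| hi * ci).sum::<f32>()
            })
            .collect()
    }
}
```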
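
A sketch of the attention core of the Transformer module: single-head scaled dot-product self-attention over a sequence of feature vectors. The learned Q/K/V projections and the multi-head split are left out; the sequence itself serves as queries, keys, and values.

```rust
/// In-place softmax with the usual max-subtraction for numerical stability.
fn softmax(v: &mut [f32]) {
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in v.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in v.iter_mut() {
        *x /= sum;
    }
}

/// Scaled dot-product self-attention; assumes a non-empty sequence of
/// equal-length feature vectors.
fn self_attention(seq: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let d = seq[0].len() as f32;
    seq.iter()
        .map(|q| {
            // score this query against every key, scaled by sqrt(d)
            let mut scores: Vec<f32> = seq.iter()
                .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
                .collect();
            softmax(&mut scores);
            // weighted sum of the values
            let mut out = vec![0.0f32; seq[0].len()];
            for (w, v) in scores.iter().zip(seq) {
                for (o, x) in out.iter_mut().zip(v) {
                    *o += w * x;
                }
            }
            out
        })
        .collect()
}
```

Multi-head attention would split each vector into per-head chunks, run this per chunk, and concatenate the results.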
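
The residual connections amount to adding a module's input back to its output; a generic helper might look like this, assuming the wrapped module preserves sequence length:

```rust
/// Wrap a sequence-to-sequence module `f` in a skip connection:
/// output = f(x) + x. Assumes f(x) has the same length as x.
fn residual<F: Fn(&[f32]) -> Vec<f32>>(f: F, x: &[f32]) -> Vec<f32> {
    f(x).iter().zip(x).map(|(y, xi)| y + xi).collect()
}
```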
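
A sketch of the optional weighted MSE/MAE loss; the name `alpha` and the convex-combination form are assumptions.

```rust
/// Weighted combination of mean squared error and mean absolute error:
/// alpha * MSE + (1 - alpha) * MAE.
fn weighted_mse_mae(pred: &[f32], target: &[f32], alpha: f32) -> f32 {
    let n = pred.len() as f32;
    let (mut mse, mut mae) = (0.0f32, 0.0f32);
    for (p, t) in pred.iter().zip(target) {
        let e = p - t;
        mse += e * e;
        mae += e.abs();
    }
    alpha * mse / n + (1.0 - alpha) * mae / n
}
```

With `alpha = 1.0` this reduces to plain MSE, and with `alpha = 0.0` to plain MAE.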
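
Finally, one way the missing selective scanning could be approximated, as suggested in the last list item: an input-dependent sigmoid gate scales how much of each new input enters the state, letting the model suppress irrelevant time steps. This gate is a stand-in assumption; Mamba's actual selection mechanism also makes the state-space parameters themselves input-dependent.

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Diagonal SSM with a learned input gate: gate_i(t) = sigmoid(g[i] * x_t)
/// scales how strongly x_t is written into state dimension i.
pub struct GatedSsm {
    a: Vec<f32>, // diagonal state transition
    b: Vec<f32>, // input projection
    c: Vec<f32>, // output projection
    g: Vec<f32>, // gate weights
}

impl GatedSsm {
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        let mut h = vec![0.0f32; self.a.len()];
        x.iter()
            .map(|&xt| {
                for i in 0..h.len() {
                    // the gate decides how much of the new input is admitted
                    let gate = sigmoid(self.g[i] * xt);
                    h[i] = self.a[i] * h[i] + gate * self.b[i] * xt;
                }
                h.iter().zip(&self.c).map(|(hi, ci)| hi * ci).sum::<f32>()
            })
            .collect()
    }
}
```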