
Time Series / Anomaly Detection / Tabular Data

tags
Database , Prometheus , Machine Learning , Statistics , Probability , Representing Time and Date

See TS section in Machine Learning

FAQ

What are the different methods?

Comparison Table

| Method | Category | Description / Features | Use Cases / Strengths | Weaknesses | Introduced |
| --- | --- | --- | --- | --- | --- |
| **Classical / Statistical** | | | | | |
| Naive / Seasonal Naive | Statistical | Forecast = last / last seasonal value | Baseline; simple; fast | Assumes persistence; often inaccurate | Foundational |
| Simple Exp. Smoothing (SES) | Statistical | Weighted average; models level | Univariate; no trend/season; simple | No trend/season handling | ~1950s |
| Holt’s Linear Trend | Statistical | SES + linear trend | Univariate; trend; simple | Assumes linear trend; no seasonality | ~1957 |
| Holt-Winters | Statistical | Holt + seasonality (additive/multiplicative) | Univariate; trend & season; good benchmark | Assumes fixed patterns | ~1960 |
| ETS | Statistical | State-space framework for exponential smoothing; auto-selects | General univariate; robust; auto-select; probabilistic | Univariate only; assumes state-space form | ~2002 |
| ARIMA/SARIMA | Statistical | Models autocorrelation (AR + I + MA); SARIMA adds seasonality | Univariate; models autocorrelation; benchmark; probabilistic | Requires stationarity; parameter tuning | ~1970 |
| Theta Method | Statistical | Decompose + damped linear extrapolation | Univariate; strong M3/M4 performance; simple | Less intuitive; mainly univariate | ~2000 |
| VAR | Statistical | Multivariate AR; models linear interdependence | Multivariate linear interactions; interpretable | Assumes linearity; needs stationarity | ~1980 |
| TAR/SETAR/STAR | Statistical | Threshold AR; regime-switching; nonlinear | Nonlinear univariate with regimes | Complex thresholds; mainly univariate | ~1978 |
| INLA | Bayesian Stat. | Approximate Bayesian inference; latent Gaussian models | Complex models; hierarchy; uncertainty (probabilistic) | Approximate method; learning curve | ~2009 |
| Prophet | Statistical / Curve Fit | Decomposes trend/season/holidays; Bayesian | Univariate; strong season/holidays; robust; probabilistic | Less accurate on some benchmarks | ~2017 |
| **Machine Learning & DL** | | Often need more data; less interpretable | Can model complex nonlinearity/interactions | Compute intensive; tuning crucial | |
| Tree-based (RF, XGBoost, …) | ML | Uses lagged/derived features in trees/ensembles | Nonlinearity/interactions; feature importance; robust | Needs feature engineering; no trend extrapolation | ~1984+ |
| SVR | ML | SVM for regression; uses tolerance margin | Robust to outliers; high-dimensional features | Less intuitive; kernel/parameter sensitive | ~1996 |
| Gaussian Processes (GP) | Bayesian ML | Non-parametric; models a distribution over functions | Probabilistic; complex nonlinearities; flexible | Slow (cubic); kernel tuning difficult | ~2006 |
| MLP | DL | Feedforward NN; needs lagged features | General nonlinear; covariates | Needs features; tuning; can overfit | ~1980s |
| RNN | DL | NN with loops for sequence processing | Sequential data; time dependencies | Vanishing gradients; often outperformed | ~1980s |
| LSTM | DL | RNN with gates for long dependencies | Complex sequences; long dependencies; multivariate | Needs data; slow; tuning; can overfit | ~1997 |
| GRU | DL | Simpler LSTM variant; similar performance | Like LSTM; potentially faster | Like LSTM; needs data; tuning | ~2014 |
| CNN (1D) | DL | Uses convolutions for sequence feature extraction | Feature extraction; fast pattern recognition | Less natural for long dependencies | ~1989/2012 |
| DeepAR/DeepVAR | DL | Autoregressive RNN that outputs distribution parameters | Probabilistic forecasts; covariates; global | Needs lots of data; complex; slow to train | ~2017 |
| N-BEATS | DL | Non-recurrent NN; basis expansion; interpretable | Univariate; state-of-the-art on M4/M3; interpretable | Mainly univariate; compute intensive | ~2019 |
| Transformer variants | DL | Self-attention mechanism; parallel processing | Long dependencies; parallel; multivariate | Data hungry; quadratic complexity | ~2017+ |
| Samformer | DL | Transformer variant | (Specific capabilities TBD) | (Likely transformer limitations) | Recent |
| TabPFN (Time Series) | DL | Transformer for small tabular data; zero-shot TS | Small datasets; little tuning needed | Newer; focused on a specific niche | ~2024 |
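As a concrete anchor for the first few rows of the table, the naive, seasonal-naive, and SES baselines each fit in a few lines of NumPy. This is an illustrative sketch (function names and the toy series are mine, not from any library):

```python
import numpy as np

def naive_forecast(y, h):
    # Forecast = last observed value, repeated h steps ahead.
    return np.full(h, y[-1], dtype=float)

def seasonal_naive_forecast(y, h, m):
    # Forecast = value at the same phase in the last full season of length m.
    return np.resize(y[-m:], h)

def ses_forecast(y, h, alpha=0.3):
    # Simple exponential smoothing: a recursively updated level,
    # which becomes the (flat) forecast at every horizon.
    level = float(y[0])
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level)

y = np.array([10., 12., 14., 11., 13., 15., 17., 14.])   # period-4 toy series
print(naive_forecast(y, 4))               # [14. 14. 14. 14.]
print(seasonal_naive_forecast(y, 4, 4))   # [13. 15. 17. 14.]
```

These are worth implementing even when you plan to use heavier models: if a DL model can’t beat seasonal naive, the extra complexity isn’t paying for itself.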

Additional notes

For time-series forecasting, we can use either:

  • Deep learning

    “In my projects, DL models outperform both statistical and ML methods in datasets with higher frequencies (hourly or more). I use TFT, NHITS, and a customized TSMixer. The most underrated statistical model that I often use is DynamicOptimizedTheta.”

  • Traditional ML/stats methods
  • LLM-based

    The fundamental challenge is that LLMs like o1 and Claude 3.5 simply aren’t built for the unique structure of tabular data. The inefficiencies quickly become apparent: serializing a 10,000 × 100 table as a token sequence, with every numerical value split into tokens, is massively wasteful.

    There’s some interesting work on using LLMs for tabular data (TabLLM: Few-shot Classification of Tabular Data with Large Language Models), but this only works for datasets with tens of samples rather than the thousands of rows needed in real-world applications.

    What o1 and other LLMs typically do is wrap around existing tabular tools like XGBoost or scikit-learn. While this works, they’re ultimately constrained by these tools' limitations. We’re taking a fundamentally different approach - building foundation models that natively understand tabular relationships and patterns. Our approach combines the benefits of foundation models with architectures specifically designed for tabular data structures.
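The scale mismatch in the tokenization point above is easy to make concrete with back-of-envelope arithmetic. The ~3 tokens-per-cell figure below is an assumption for illustration; real tokenizers vary by number formatting and separators:

```python
# Rough token budget for serializing a table into an LLM context window.
# Assumption (illustrative): a numeric cell costs ~3 tokens once digits
# and separators are counted.
rows, cols, tokens_per_cell = 10_000, 100, 3
total_tokens = rows * cols * tokens_per_cell
print(f"{total_tokens:,}")  # 3,000,000
```

Three million tokens for a single modest table is far beyond typical context windows, before any instructions or output are accounted for.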

Things people say

  • An aha moment for me was realizing that anomaly models can be thought of as effectively forecasting the next N steps, then noticing when the actual measured values are “different enough” from the expected ones. This is simple to draw on a whiteboard for one signal; when it’s multivariate, it’s pretty neat that it still works.
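A minimal sketch of that forecast-then-compare idea, assuming a seasonal-naive forecaster and a z-score-style threshold (the function name, the noise-estimation step, and the toy data are all illustrative, not from any specific library):

```python
import numpy as np

def flag_anomalies(history, actual, m, k=3.0):
    """Forecast the next len(actual) steps, then flag points that are
    "different enough" from the forecast.

    history: past values assumed clean; actual: newly measured values;
    m: seasonal period; k: how many noise-sigmas count as anomalous.
    """
    # 1. Forecast by repeating the last full season (seasonal-naive model).
    forecast = np.resize(history[-m:], len(actual))
    # 2. Estimate typical noise from season-apart differences in the history.
    sigma = (history[m:] - history[:-m]).std() + 1e-9
    # 3. "Different enough" = residual larger than k times the typical noise.
    return np.where(np.abs(actual - forecast) > k * sigma)[0]

history = np.array([10., 12., 14., 11.] * 4)   # four clean seasons, period 4
actual  = np.array([10., 12., 22., 11.])       # one injected spike at step 2
print(flag_anomalies(history, actual, m=4))    # [2]
```

For multivariate signals the same recipe applies per dimension, or on a joint residual score; only the forecaster and the distance measure need to change.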
