When You Need ML Algorithms Implemented Clearly in Just NumPy
You know the feeling. You're trying to understand how a machine learning algorithm actually works under the hood, but every implementation you find is either a black-box library call or a tangled mess of framework-specific code. The math makes sense on paper, but the code? Not so much. Enter numpy-ml: a collection of machine learning algorithms written exclusively in NumPy, designed for readability over raw performance.
This isn't another ML framework trying to compete with TensorFlow or PyTorch. It's something rarer: a teaching tool that doubles as a prototyping sandbox. If you've ever wished you could step through the forward pass of a Transformer attention mechanism line by line, or trace the exact math behind a Wasserstein GAN loss, this project is for you.
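To give a sense of what stepping through attention line by line looks like, here is a minimal single-head scaled dot-product attention forward pass in plain NumPy. It is a sketch written for this article, not code copied from numpy-ml: the function names are illustrative, and masking, multiple heads, and the learned query/key/value projections are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V, weights          # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 2)
```

Dividing by the square root of the key dimension keeps the dot products from growing with `d_k`, which would otherwise push the softmax toward nearly one-hot weights.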
What It Does
numpy-ml is a Python library that implements a wide range of machine learning algorithms using only NumPy as its numerical backend. No TensorFlow, no PyTorch, no JAX: just NumPy arrays and the algorithms built on top of them. The project covers everything from classical statistical models to modern deep learning architectures.
The available models include Gaussian mixture models (trained with EM), hidden Markov models (with Viterbi decoding and Baum-Welch parameter estimation), and latent Dirichlet allocation for topic modeling. There is also a substantial neural networks module. That section alone covers layers (LSTM, Elman RNN, convolutional layers with padding, dilation, and stride), modules (bidirectional LSTM, Transformer-style multi-headed attention, WaveNet-style residual blocks), optimizers (SGD with momentum, AdaGrad, RMSProp, Adam), weight initializers (Xavier, He), and full models like variational autoencoders and Wasserstein GANs.
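The optimizers in that list show how little code these algorithms need. Below is a minimal Adam update written as an independent sketch from the standard Adam update rule (Kingma and Ba), not numpy-ml's own optimizer class; `adam_step` and its argument names are hypothetical, and the defaults are the commonly used values.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias from zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = ||w - 3||^2 (gradient 2 * (w - 3)); the minimum is at w = [3, 3]
w = np.zeros(2)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)
```

Bias correction matters mostly in the first few steps, when `m` and `v` are still close to their zero initialization.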
Tree-based models are here too: CART decision trees, random forests, and gradient-boosted decision trees. Linear models cover ridge regression, logistic regression, ordinary least squares, and Bayesian linear regression with conjugate priors. There are also reinforcement learning agents that can train on OpenAI Gym environments, though Gym support is an optional install.
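Ridge regression is a good example of how compact these linear models can be. Here is a minimal closed-form sketch in NumPy, assuming an L2 penalty strength `alpha` and an unpenalized intercept; those are our choices for illustration and not necessarily numpy-ml's conventions, and `ridge_fit` is a hypothetical name.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0, fit_intercept=True):
    """Closed-form ridge regression: w = (X^T X + alpha * P)^(-1) X^T y.

    P is the identity, except that the intercept (if any) is left unpenalized.
    """
    if fit_intercept:
        # Prepend a column of ones so the first weight acts as the bias term
        X = np.column_stack([np.ones(X.shape[0]), X])
    penalty = alpha * np.eye(X.shape[1])
    if fit_intercept:
        penalty[0, 0] = 0.0  # conventionally, the bias is not shrunk
    # Solving the linear system is more stable than forming an explicit inverse
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 4.0 + X @ np.array([1.5, -2.0, 0.5]) + 0.01 * rng.normal(size=100)
w = ridge_fit(X, y, alpha=0.1)  # first entry is the intercept
```

With `alpha=0` this reduces to ordinary least squares, which is one way to see how the two linear models are related.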
Why It's Cool
The value proposition here is unusual, and that's what makes it interesting.
- Readability is the primary design goal. Most ML codebases optimize for speed or memory efficiency. numpy-ml optimizes for legibility. If you're trying to understand how the forward-backward algorithm works, you can read the actual implementation rather than deciphering optimized C++ bindings.
- It covers the full spectrum. You get classical models (HMMs, LDA, Gaussian mixtures) alongside modern deep learning components (Transformer attention, WaveNet blocks, GANs). This makes it useful as a reference for both traditional and contemporary ML techniques.
- The neural network module is surprisingly comprehensive. It includes not just standard layers but also sparse evolutionary connections, restricted Boltzmann machines, and noise contrastive estimation loss. The weight initializers, regularizers, and learning rate schedulers are all implemented from scratch.
- It works as a prototyping starting point. The installation instructions explicitly encourage cloning the repo and hacking on the code. You're meant to modify it, experiment with it, and learn from it.
- The documentation exists. There's a full project documentation site at readthedocs, which suggests the code is accompanied by explanations beyond just the source.
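Taking the forward-backward algorithm from the first point as an example, its forward half fits in a few lines of NumPy. The sketch below is our own illustration for a hidden Markov model with discrete emissions, not numpy-ml's implementation; `hmm_forward` and its argument names are hypothetical. It rescales the forward variables at every step so long sequences don't underflow.

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Forward pass of the forward-backward algorithm with per-step scaling.

    pi:  (N,) initial state distribution
    A:   (N, N) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:   (N, M) emissions, B[i, k] = P(symbol k | state i)
    obs: sequence of integer symbols in [0, M)
    Returns the normalized forward variables and log P(obs).
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t == 0:
            a = pi * B[:, o]
        else:
            # Sum over previous states, then weight by the emission probability
            a = (alpha[t - 1] @ A) * B[:, o]
        c = a.sum()          # scaling factor = P(o_t | o_1, ..., o_{t-1})
        alpha[t] = a / c     # now alpha[t, j] = P(state j at t | o_1, ..., o_t)
        log_lik += np.log(c)
    return alpha, log_lik

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 1]
alpha, log_lik = hmm_forward(pi, A, B, obs)
```

Summing the log of each scaling factor gives the sequence log-likelihood directly, which is exactly the quantity Baum-Welch tries to increase.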
How to Try It
You have two options depending on your goals.
For experimentation and learning, clone the repository and set up a virtual environment:
git clone https://github.com/ddbourgin/numpy-ml.git
cd numpy-ml && virtualenv npml && source npml/bin/activate
pip3 install -r requirements-dev.txt
For just using the library, install it as a package:
pip3 install -U numpy_ml
If you want the reinforcement learning agents that train on OpenAI Gym environments, add the optional dependency:
pip3 install -U 'numpy_ml[rl]'
From there, you can explore the models in the source code or check the project documentation for API details. The README lists all available models with expandable sections, so you can see exactly what's implemented before diving in.
Final Thoughts
numpy-ml isn't going to replace PyTorch for production training or scikit-learn for quick pipelines. But that's not the point. If you're a developer who learns best by reading code, or someone who needs to verify their understanding of an algorithm against a clean reference implementation, this project is a genuine resource. It's the kind of codebase you keep open in a second window while you're reading a textbook or paper, cross-referencing the math against the NumPy operations. For that purpose alone, it's worth having in your toolkit.
Repository: https://github.com/ddbourgin/numpy-ml