mgrep: The CLI Tool That Greps Everything, Semantically
Ever found yourself grepping through a codebase for a specific concept, only to realize you need to search through PDFs, images, or documents too? Or maybe you've tried to find "that one function" but can only remember what it does, not its exact name. Traditional grep hits a wall when the search needs to be about meaning, not just raw text.
Enter mgrep from mixedbread-ai. It's a CLI-native tool that brings semantic search to your terminal, letting you grep not just code but virtually any file type, using the meaning behind your words.
What It Does
mgrep is a command-line tool that performs semantic search across your files. You give it a natural language query (like "function that validates user login"), and it finds relevant content in your code, text files, PDFs, images, and more. It works by generating embeddings (vector representations) of both your query and your file contents, then finding the closest matches. It's grep, but for ideas and concepts.
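To make the "closest matches" idea concrete, here is a minimal sketch of embedding-based ranking. It is not mgrep's actual implementation: the vectors are hand-written toy values standing in for model output, the file names are made up, and cosine similarity is assumed as the distance measure because it is a common choice for this kind of search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, indexed, top_k=3):
    """Return the top_k (name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings (hand-written, not produced by any model).
indexed_chunks = {
    "auth/login.py": [0.9, 0.1, 0.0],
    "docs/schema.pdf": [0.1, 0.9, 0.2],
    "assets/logo.png": [0.0, 0.2, 0.9],
}
# Stands in for the embedding of "function that validates user login".
query_embedding = [0.8, 0.2, 0.1]

results = rank_by_similarity(query_embedding, indexed_chunks, top_k=2)
for name, score in results:
    print(f"{score:.3f}  {name}")
```

In a real tool the vectors would have hundreds or thousands of dimensions and come from an embedding model, but the ranking step has the same shape: score every indexed chunk against the query and keep the best few.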
Why It's Cool
The magic of mgrep isn't just that it searches semantically; it's that it does so across mixed modalities from your terminal. You can point it at a directory and it will intelligently process different file types using the appropriate models.
- Truly Multi-Format: It uses dedicated models for different content. Code, text, PDF text, and images are all encoded into the same vector space, so you can search across all of them with one query.
- CLI-Native: It feels like a classic Unix tool. Pipe it, redirect it, use it in your scripts. It slots right into a developer's existing workflow.
- Offline-First (mostly): While it can use cloud embedding APIs for high performance, it also supports local, offline models, keeping your data private.
- Smart Chunking: It breaks down large documents and images into meaningful chunks before creating embeddings, so your results are precise and relevant, not just a whole-file match.
Imagine searching your project for "database schema diagram" and having it return both the schema.sql file and the whiteboard screenshot you saved in your docs/ folder. That's the power mgrep unlocks.
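The "Smart Chunking" point above is easier to picture with a small example. How mgrep actually splits documents is not described here, so the sketch below assumes the simplest common strategy: fixed-size word windows that overlap, so a sentence straddling a boundary still appears whole in at least one chunk. The function name and default sizes are illustrative only.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into windows of chunk_size words that overlap by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Twelve placeholder words, split into windows of 5 with 2 words of overlap.
sample = " ".join(f"w{i}" for i in range(12))
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for piece in pieces:
    print(piece)
```

Each chunk would then be embedded separately, which is what lets a search point at the relevant passage rather than at a whole file.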
How to Try It
Getting started is straightforward. You'll need Python (3.9+).
- Install it:
    pip install mgrep
- Run your first semantic search: The simplest way is to use the default, free API (requires an internet connection).
    mgrep "query about error handling" /path/to/your/project
- Configure for power use: For offline work or higher limits, you can set up local models or use your own API keys for services like OpenAI or mixedbread-ai. Check the mgrep config command and the GitHub repository for detailed setup.
The repo has excellent documentation on all the options, from model choices to output formatting.
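If you want to call mgrep from a script, a thin wrapper is enough to get started. The sketch below relies only on the basic invocation shown above (a query followed by a path). The helper names are hypothetical, the output is returned as raw text because its format is not described here, and the meaning of the exit code is an assumption to verify against the repository.

```python
import shutil
import subprocess


def build_mgrep_command(query, path):
    """Build the argument list for the basic `mgrep "<query>" <path>` form."""
    return ["mgrep", query, path]


def run_mgrep(query, path, timeout=60):
    """Run mgrep and return (exit_code, stdout_text).

    Raises FileNotFoundError if no mgrep executable is on PATH.
    The exit code is passed back untouched because its meaning
    (for example, "no matches") is not documented in this article.
    """
    if shutil.which("mgrep") is None:
        raise FileNotFoundError("mgrep executable not found on PATH")
    completed = subprocess.run(
        build_mgrep_command(query, path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,
    )
    return completed.returncode, completed.stdout


# Example call (needs mgrep installed):
# code, output = run_mgrep("query about error handling", "/path/to/your/project")
```

Passing the arguments as a list rather than a shell string means queries containing quotes or spaces reach the tool unchanged.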
Final Thoughts
mgrep feels like a glimpse into a next-generation developer workflow. It won't replace classic grep for exact string matching, but it solves a different, often frustrating problem: finding things when you don't know the exact words you're looking for. The ability to seamlessly pull relevant results from a PDF spec, a code comment, and a diagram with one intuitive query is a genuine productivity boost.
It's a tool that respects the Unix philosophy while leveraging modern AI in a practical, non-gimmicky way. If you've ever lost time trawling through folders for a specific concept, mgrep is worth adding to your toolkit.
Repository: https://github.com/mixedbread-ai/mgrep