One Prompt, 50+ Models, One Winner. All Open Source.
Intro
You know that feeling when you have a task, you're pretty sure an LLM can handle it, but you're not sure which one? Maybe you want to try Claude, Gemini, Mistral, Llama, and a dozen others on the same input, then pick the best result. That's the itch G0DM0D3 scratches, at least on the open source side: it sends one prompt to over 50 open source models, runs them all in parallel, and picks the winner based on your criteria.
No vendor lock-in. No manual copy-pasting. Just one command and you get the best answer from a swarm of models.
What It Does
G0DM0D3 is a CLI tool that takes a single prompt and broadcasts it to a curated list of open source LLMs. It collects all responses, scores them (by default using a small judge model), and returns the top result. You can also choose to see all responses, rank them manually, or set your own evaluation metrics.
Under the hood, it uses a combination of Hugging Face's inference endpoints, local models via llama.cpp, and a few other open APIs. No proprietary cloud lock-in. The "winner" is determined by a separate judge model that compares each output against the original prompt for relevance, coherence, and helpfulness.
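To make the flow concrete, here's a minimal sketch of the fan-out pattern, assuming Hugging Face's public Inference API. The model IDs, field names, and overall structure are illustrative guesses, not G0DM0D3's actual code.

import requests  # pip install requests
from concurrent.futures import ThreadPoolExecutor

PROMPT = "Explain the difference between a transformer and an RNN in one sentence"

# Hypothetical subset of the model list; the real tool ships its own.
MODELS = [
    "mistralai/Mistral-7B-Instruct-v0.2",
    "Qwen/Qwen2.5-7B-Instruct",
    "HuggingFaceH4/zephyr-7b-beta",
]

def query(model_id):
    # POST the prompt to one model's public inference endpoint.
    url = f"https://api-inference.huggingface.co/models/{model_id}"
    resp = requests.post(url, json={"inputs": PROMPT}, timeout=60)
    resp.raise_for_status()
    # Text-generation endpoints usually return [{"generated_text": ...}]
    return model_id, resp.json()[0]["generated_text"]

# Fan out: every request runs concurrently, not one after another.
with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
    responses = dict(pool.map(query, MODELS))

for model_id, text in responses.items():
    print(f"--- {model_id} ---\n{text}\n")

Once the responses are collected, the judge step (below) decides which one wins.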
Why It's Cool
The clever bit is the "judge" mechanism. Instead of piping every output through a fixed rubric, G0DM0D3 uses a smaller, faster model to score each response. That gives you a nuanced, context-aware ranking without needing a human in the loop each time.
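In sketch form, the judge step amounts to prompting a small model with a rubric and parsing a score out of its reply. The rubric wording and parsing below are my assumptions, not the tool's actual prompt; ask_judge stands in for whatever small model you point it at.

import re

def judge_score(ask_judge, prompt, answer):
    # ask_judge: any callable that sends text to the judge model and
    # returns its reply as a string (e.g. a small local model via llama.cpp).
    rubric = (
        "Rate the ANSWER to the PROMPT from 1 to 10 for relevance, "
        "coherence, and helpfulness. Reply with the number only.\n\n"
        f"PROMPT: {prompt}\nANSWER: {answer}\nSCORE:"
    )
    reply = ask_judge(rubric)
    match = re.search(r"\d+(\.\d+)?", reply)
    return float(match.group()) if match else 0.0

def pick_winner(ask_judge, prompt, responses):
    # responses: {model_id: answer}; return the model the judge scored highest.
    return max(responses, key=lambda m: judge_score(ask_judge, prompt, responses[m]))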
Other standout features:
- Parallel execution. All 50+ models run simultaneously. You get the winner in seconds, not minutes.
- Customizable model lists. Add or remove models via a YAML config (see the sketch after this list). Want to test only Mixtral, Command R+, and Qwen? Easy.
- No API keys for most models. The default list uses publicly accessible endpoints. You can run it immediately.
- Output formats. Raw text, JSON, or even a side-by-side comparison for manual review.
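As an example of the customizable model list, a trimmed-down config might look like the following. The file name models.yaml and the key names are assumptions, so check the repo for the actual schema.

import yaml  # pip install pyyaml

# Hypothetical models.yaml limiting the run to three models.
CONFIG = """
models:
  - mistralai/Mixtral-8x7B-Instruct-v0.1
  - CohereForAI/c4ai-command-r-plus
  - Qwen/Qwen2.5-72B-Instruct
"""

model_list = yaml.safe_load(CONFIG)["models"]
print(model_list)  # ['mistralai/Mixtral-8x7B-Instruct-v0.1', ...]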
Use Cases
- Prompt debugging. Is your prompt weak? See how different models interpret it. The winner often reveals phrasing you'd never think of.
- Benchmarking. Run a set of prompts across all models to find the best general purpose open model for your specific use case.
- Creative tasks. Sometimes you want variety. G0DM0D3 can return all responses so you can pick the most interesting, not just the "best."
How To Try It
Clone the repo, install dependencies, and run:
git clone https://github.com/elder-plinius/G0DM0D3
cd G0DM0D3
pip install -r requirements.txt
python g0dm0d3.py --prompt "Explain the difference between a transformer and an RNN in one sentence"
That's it. It will fetch responses from 50+ models, pick the winner, and print it. For more options, check the --help flag.
Final Thoughts
G0DM0D3 is a neat tool if you've ever wished for a "best of" bot for open models. It's not a production-grade evaluation framework; the judge model can be inconsistent, and some endpoints may be flaky. But for rapid experimentation, it's brilliant. I'd use it for prompt engineering, quick A/B testing, or just satisfying that curiosity about which model really is "best" for a given task.
If you're tired of model FOMO and just want the best open source answer right now, this is a solid ally.
Shared by @githubprojects
Repository: https://github.com/elder-plinius/G0DM0D3