Nanobanana: Generate AI Training Data Without the Cost
If you've ever built a machine learning model, you know the hardest part often isn't the architecture; it's getting enough good, clean data to train it. Services that generate synthetic training data exist, but they can get expensive fast, especially when you're iterating or just starting out. What if you could generate that data yourself, locally, for free?
Enter Nanobanana. It's a tool that lets you generate structured, synthetic datasets right on your machine. Think of it as your own private data factory, no API keys or monthly subscriptions required.
What It Does
Nanobanana is a Python-based tool designed to create synthetic datasets for AI training and testing. You define the structure and rules for the data you need—like customer profiles, transaction records, or sensor readings—and Nanobanana generates a dataset (in formats like CSV or JSON) that fits your specifications. It handles the messy work of creating realistic, varied data so you can focus on building your model.
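The schema-driven idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Nanobanana's actual configuration format or API (the repository's README documents those). The dictionary `CUSTOMER_SCHEMA` and the functions `generate_rows`, `to_csv`, and `to_json` are hypothetical names, and each field is simply a function that draws a value from a seeded random number generator.

```python
import csv
import io
import json
import random

# Hypothetical schema format: each field maps to a function that draws a value
# from a seeded random.Random instance. This is NOT Nanobanana's configuration
# syntax; it only illustrates the "define the structure, get a dataset" idea.
CUSTOMER_SCHEMA = {
    "customer_id": lambda rng, i: f"C{i:05d}",
    "age": lambda rng, i: rng.randint(18, 90),
    "country": lambda rng, i: rng.choice(["US", "DE", "JP", "BR"]),
    "monthly_spend": lambda rng, i: round(rng.uniform(5.0, 500.0), 2),
}


def generate_rows(schema, n, seed=0):
    """Generate n records by calling every field generator once per row."""
    rng = random.Random(seed)  # seeded so the same seed yields the same dataset
    return [{field: gen(rng, i) for field, gen in schema.items()} for i in range(n)]


def to_csv(rows):
    """Serialize rows to a CSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize rows to a JSON array string."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    rows = generate_rows(CUSTOMER_SCHEMA, n=5, seed=42)
    print(to_csv(rows))
```

Seeding the generator makes the output reproducible, which helps when a test or experiment needs the same dataset on every run.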
Why It's Cool
The real appeal here is control and cost. Because it runs locally, you have complete ownership over the data generation process. There's no sending your schema to a third-party service, no worrying about usage limits, and no surprise bills. This makes it well suited to prototyping, to creating data for testing pipelines, or to generating supplementary data that helps balance a real dataset.
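Balancing a real dataset with supplementary synthetic rows might look like the following sketch. It assumes the data is a labeled list of dictionaries and uses a hypothetical `make_synthetic` callback to stand in for whatever generator produces new rows; it is not how Nanobanana itself does this. The toy fraud data and the `fake_transaction` helper are likewise illustrative assumptions.

```python
import random
from collections import Counter


def balance_with_synthetic(rows, label_key, make_synthetic, seed=0):
    """Top up each under-represented class until it matches the largest one.

    make_synthetic(label, rng) is a hypothetical callback that returns one new
    synthetic row for the given label; swap in whatever generator you use.
    """
    rng = random.Random(seed)
    counts = Counter(row[label_key] for row in rows)
    target = max(counts.values())
    balanced = list(rows)  # copy the list; the real rows are kept as-is
    for label, count in counts.items():
        for _ in range(target - count):
            balanced.append(make_synthetic(label, rng))
    return balanced


# Toy example (hypothetical data): 8 legitimate and 2 fraudulent transactions.
real = [{"amount": 12.0, "is_fraud": False} for _ in range(8)] + [
    {"amount": 950.0, "is_fraud": True} for _ in range(2)
]


def fake_transaction(label, rng):
    """Hypothetical generator: in this toy setup, fraud skews toward large amounts."""
    low, high = (500.0, 2000.0) if label else (1.0, 200.0)
    return {"amount": round(rng.uniform(low, high), 2), "is_fraud": label, "synthetic": True}


balanced = balance_with_synthetic(real, "is_fraud", fake_transaction, seed=1)
print(Counter(row["is_fraud"] for row in balanced))  # both classes now have 8 rows
```

Keeping the real rows untouched and tagging the synthetic ones (here with a hypothetical `synthetic` field) makes it easy to drop or down-weight them later.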
It's also built with developers in mind. You configure your datasets in a straightforward way, and the tool takes care of generating coherent data that respects the relationships and constraints you define. It’s a pragmatic solution to a very common bottleneck.
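As an illustration of what respecting relationships and constraints can mean in practice, here is a minimal sketch of an order generator with three cross-field rules. The function `generate_orders`, its field names, and the rules themselves are assumptions made for this example, not Nanobanana's configuration.

```python
import random
from datetime import date, timedelta


def generate_orders(customer_ids, n, seed=0):
    """Generate order rows whose fields respect simple cross-field rules.

    The rules below are illustrative assumptions, not Nanobanana's:
      * customer_id must reference an existing customer (foreign key)
      * total == quantity * unit_price (derived field)
      * ship_date falls 1 to 7 days after order_date (ordering constraint)
    """
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    orders = []
    for i in range(n):
        quantity = rng.randint(1, 10)
        unit_price = round(rng.uniform(1.0, 100.0), 2)
        order_date = start + timedelta(days=rng.randint(0, 364))
        orders.append({
            "order_id": i,
            "customer_id": rng.choice(customer_ids),  # only valid IDs are used
            "quantity": quantity,
            "unit_price": unit_price,
            # Derived from other fields instead of being drawn independently.
            "total": round(quantity * unit_price, 2),
            "order_date": order_date,
            "ship_date": order_date + timedelta(days=rng.randint(1, 7)),
        })
    return orders


orders = generate_orders(["C00000", "C00001", "C00002"], n=20, seed=7)
```

The key design choice is that dependent fields such as `total` and `ship_date` are computed from already-generated values rather than sampled independently, so every row is internally consistent by construction.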
How to Try It
Getting started is straightforward. Head over to the GitHub repository to clone it and check the README for setup.
git clone https://github.com/xiguapiwork/nanobanana
cd nanobanana
# Follow the setup instructions in the README.md
The repository contains the source code and examples to get you going. You'll need Python installed. From there, you can start defining your own data templates and generating datasets locally in minutes.
Final Thoughts
For developers, tools that remove friction and cost from the development loop are always a win. Nanobanana tackles a specific, expensive pain point in the ML workflow with a simple, self-hosted alternative. It won't cover every data need, but for generating synthetic training data, testing data pipelines, or bootstrapping a project, it's an incredibly useful tool to have in your kit. Give it a spin the next time you need data to test an idea; your wallet will thank you.
Follow for more cool projects: @githubprojects
Repository: https://github.com/xiguapiwork/nanobanana