Building Multimodal Data Pipelines for Robotics and Computer Vision
If you've ever worked on a robotics or computer vision project, you know the pain of debugging a system that "sees" the world. Your code says the robot should be turning left, but it's going right. Your perception algorithm is detecting objects, but are they in the right place? The problem often isn't the logic—it's understanding what your system is actually perceiving and deciding in real time. You're flying blind.
That's where multimodal data pipelines come in. Instead of staring at logs and hoping, you need to visualize sensor data, 3D reconstructions, bounding boxes, and pose estimates together, in context. You need to see what your system sees. This is the problem Rerun is built to solve.
What It Does
Rerun is an SDK and viewer for logging and exploring multimodal data streams. Think of it as a structured logging framework for visual data. You instrument your Python, C++, or Rust code to send data (images, point clouds, 3D transforms, text logs, even custom primitives) to the Rerun viewer, a standalone app (or web component) that lets you explore that data spatially and temporally.
It's not just a viewer; it's a recording. You can save these streams as .rrd files, share them with teammates, and replay them exactly as they happened, which is invaluable for post-mortem analysis and collaboration.
Why It's Cool
The magic of Rerun is in how it handles complexity without getting in your way.
First, it's framework-agnostic. It doesn't care if you're using ROS, PyTorch, OpenCV, or a custom C++ stack. You add a few lines of logging code where it matters, and Rerun handles the rest. This makes it easy to integrate into existing pipelines.
Second, the spatial and temporal context is automatic. When you log a camera image, you can log the camera's 3D transform in the same coordinate system as your point cloud. The viewer understands these relationships. You can see an image from the camera's viewpoint in 3D space, synchronized with a lidar point cloud from the same moment in time. This alignment is crucial for debugging sensor fusion.
Finally, it's built for performance and scale. It can handle the high-frequency data streams real sensors produce, and the viewer lets you scrub through time, isolate specific data types, and explore complex 3D scenes without becoming a slideshow.
A compelling use case is autonomous systems development. Imagine debugging a perception stack: you can visualize raw camera frames, the detected bounding boxes projected into 3D, the vehicle's planned path, and internal decision logs—all in one synchronized view. You stop guessing and start seeing the exact chain of events.
How to Try It
The quickest way to get a feel for Rerun is to run one of their example scripts. If you have Python installed, you can be up and running in a minute.
pip install rerun-sdk
Then, run one of their built-in demos:
python -m rerun_demo
This will launch the viewer and stream example data. To instrument your own code, check out the quick start guide in their repo. They have extensive examples for Python, C++, and Rust that show how to log common data types like images, points, and transforms.
Final Thoughts
Rerun tackles a specific but widespread problem in a genuinely useful way. It feels less like a flashy new framework and more like a fundamental tool that should have existed all along—a print() statement for visual data. If you're building anything that involves spatial data over time (robotics, CV, simulation, even certain kinds of game dev), it's worth an afternoon of experimentation. It might just replace a week of frustrating, blind debugging.
You can find the project, full documentation, and more examples on GitHub.
@githubprojects
Repository: https://github.com/rerun-io/rerun