What happens when you scan from 4 channels instead of just looking at HTML URLs?
Post author: @githubprojects

StackPrism: Scanning the Web from 4 Angles, Not Just HTML URLs

Most web scraping tools look at one thing: the HTML returned by a URL. But what if you could scan a page from four different perspectives at once? That’s exactly what StackPrism does — and it’s a refreshingly practical twist on how we gather data from the web.

If you’ve ever spent hours digging through raw HTML only to miss the data hidden in JavaScript-rendered content, network requests, or cached snapshots, this tool will catch your attention. It’s not about being flashy — it’s about being thorough.

What It Does

StackPrism is a Python tool that takes a single website URL and scrapes it using four separate channels:

  1. HTML — The plain, server-rendered HTML of the page.
  2. JavaScript — The DOM after JavaScript has executed (via a headless browser or similar mechanism).
  3. Cache — Publicly cached versions of the page (e.g., from Google or the Wayback Machine).
  4. Network — Data from outgoing network requests made by the page (XHR/fetch calls, API endpoints).

The result? A unified view of the page across all four sources. You can compare what’s different, find data that only exists in one channel, or detect when a site is hiding content behind JavaScript.
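
To make the "only exists in one channel" idea concrete, here is a minimal sketch that uses only the standard library. It assumes you already have the markup from each channel as a string. The function names (`visible_text`, `unique_per_channel`) and the sample snapshots are hypothetical and are not taken from StackPrism's code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text fragments, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.fragments.append(text)


def visible_text(markup):
    """Return the set of non-empty text fragments found in the markup."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return set(parser.fragments)


def unique_per_channel(channels):
    """Map each channel name to the fragments that no other channel contains."""
    texts = {name: visible_text(markup) for name, markup in channels.items()}
    unique = {}
    for name, fragments in texts.items():
        others = set().union(*(t for n, t in texts.items() if n != name))
        unique[name] = fragments - others
    return unique


# Hypothetical snapshots: the server HTML ships an empty app shell,
# while the JS-rendered DOM contains the actual price.
snapshots = {
    "html": "<h1>Shop</h1><div id='app'></div><script>load()</script>",
    "javascript": "<h1>Shop</h1><div id='app'><p>Price: $10</p></div>",
}
result = unique_per_channel(snapshots)
print(result["javascript"])  # {'Price: $10'}
```

An empty set for the HTML channel combined with a non-empty set for the JavaScript channel is a reasonable hint that the content is rendered client-side.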

Why It’s Cool

Most developers stop at the HTML. But modern websites rely heavily on JavaScript to load data, and many keep their real content behind API calls. StackPrism addresses this by capturing all four channels for the same URL, so you can see what each one contains.

Here’s what makes it stand out:

  • No more “inspect element” guesswork — You can programmatically see if the content you want appears in the JS-rendered DOM or only in a cached version.
  • API discovery — By scanning the network channel, you can identify internal API endpoints that the site uses. This is gold for reverse engineering or building integrations.
  • Cache comparison — Find pages that have different content in cached versions (useful for detecting changes over time or tracking deleted content).
  • Simple output — The results are structured, so you can feed them into your own analysis pipeline without a ton of parsing.
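
As a rough illustration of the API discovery point, the sketch below filters a list of captured requests down to the ones that look like API calls. The record layout (`method`, `url`, `content_type`) and the heuristics are assumptions made for this example. StackPrism's real network output may be shaped differently.

```python
from urllib.parse import urlsplit


def likely_api_endpoints(requests):
    """Return sorted 'METHOD url' strings for requests that look like API calls."""
    found = set()
    for req in requests:
        parts = urlsplit(req["url"])
        # Strip parameters such as "; charset=utf-8" before checking the type.
        media_type = req.get("content_type", "").split(";")[0].strip().lower()
        returns_json = media_type.endswith("json")
        api_like_path = "/api/" in parts.path or parts.path.startswith("/graphql")
        if returns_json or api_like_path:
            # Drop the query string so paginated calls collapse to one endpoint.
            found.add(f"{req['method']} {parts.scheme}://{parts.netloc}{parts.path}")
    return sorted(found)


# Hypothetical capture from the network channel.
captured = [
    {"method": "GET", "url": "https://example.com/static/app.js",
     "content_type": "application/javascript"},
    {"method": "GET", "url": "https://example.com/api/v1/products?page=2",
     "content_type": "application/json; charset=utf-8"},
    {"method": "GET", "url": "https://example.com/api/v1/products?page=3",
     "content_type": "application/json"},
    {"method": "POST", "url": "https://example.com/graphql",
     "content_type": "application/json"},
]

for endpoint in likely_api_endpoints(captured):
    print(endpoint)
```

The heuristics are deliberately loose. On a real site you would likely tune the path patterns and also look at response bodies.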

How to Try It

The repository is straightforward to get running. You’ll need Python 3.8+ and a few dependencies.

  1. Clone the repo:

    git clone https://github.com/setube/stackprism.git
    cd stackprism
    
  2. Install dependencies (a virtual environment is recommended):

    pip install -r requirements.txt
    
  3. Run it on a URL:

    python stackprism.py https://example.com
    

That’s it. The tool prints the data from all four channels in a clean, diff-friendly format. There’s also an option to export the results to JSON if you want to process them further.
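
If you want to compare two channels yourself, Python's `difflib` is one option. The sketch below assumes you have already pulled each channel's text into a string. How you get it out of StackPrism's output or JSON export depends on a schema not described here, so the sample strings and the `channel_diff` helper are purely illustrative.

```python
import difflib


def channel_diff(a_text, b_text, a_name="html", b_name="cache"):
    """Return unified-diff lines describing how b_text differs from a_text."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


# Hypothetical channel contents: the cached copy shows an older price.
live_text = "Widget\nPrice: $12\nIn stock"
cached_text = "Widget\nPrice: $10\nIn stock"

for line in channel_diff(live_text, cached_text):
    print(line)
```

Lines starting with `-` appear only in the first channel, and lines starting with `+` appear only in the second.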

No cloud accounts. No API keys. Just a terminal and a URL.

Final Thoughts

StackPrism is one of those tools that feels obvious in retrospect — why wouldn’t you scan across multiple sources? It’s especially useful for security researchers, data analysts, or anyone building scrapers for modern single-page apps. The implementation is clean, the output is practical, and the concept is solid.

That said, it’s not a magic bullet. Some sites block headless browsers, and caching services can be flaky. Still, having it in your toolkit gives you one more angle to work with, and sometimes that’s all you need.

Give it a spin on a site you know well. You might be surprised what shows up in the network channel that wasn’t in the HTML.


Published by @githubprojects

Last updated: June 3, 2026 at 02:57 AM