
Why I Built TokenFlow: A Chrome Extension to See How AI Thinks

TokenFlow is a Chrome extension that visualizes tokenization and embeddings locally in your browser, helping you understand how AI models process text.

Tags: AI tokens, text embeddings, 3D visualization, semantic search, browser AI, privacy-first, offline AI, machine learning education, NLP demo, Chrome extension
Date: February 10, 2026

Over the last few years, AI has moved from research labs into our browsers, chats, and daily tools. We ask ChatGPT to explain code, we let translators handle entire documents, and we talk to voice assistants as if they “understand” us. But behind all this magic is a surprisingly concrete process: your text is turned into numbers, organized in a huge mathematical space, and then interpreted by a model. Models like all-MiniLM-L6-v2 map sentences into compact 384‑dimensional vectors that capture meaning for search, clustering, and more.

I built TokenFlow, a Chrome extension, because I wanted a way to see this process in action—directly in the browser, on any page, without spinning up servers or sending data to external APIs. That goal shaped the design from the start: an interactive, privacy‑first visualization that runs entirely on your own machine.


Why I Decided to Build TokenFlow

The idea for TokenFlow came from three recurring frustrations I kept seeing in my own work and in conversations with students and developers:

  • AI feels like a black box: People hear terms like “tokens”, “embeddings”, “vector space,” but rarely get to see them in a concrete, visual way.

  • Learning NLP is too abstract: Most explanations live in slides, textbooks, or Jupyter notebooks—not in the context where people actually read and write text every day: the browser.

  • Privacy is an afterthought: Many demo tools send your text to remote servers. I wanted something that could run completely offline in Chrome using local models via Transformers.js and ONNX Runtime Web.

At the same time, compact models like all-MiniLM-L6-v2 proved that we can get high‑quality embeddings with relatively few parameters (~22.7M) and 384‑dimensional vectors, fast enough for real‑time use in a browser. That combination—performance, quality, and size—made it the perfect backbone for TokenFlow.

So TokenFlow started as a personal experiment: Can I take the concepts I explain in my sessions, and turn them into an interactive experience that anyone can run on any webpage?


What TokenFlow Actually Does

TokenFlow is a Chrome extension that shows you, step by step, how your text is transformed inside a modern language model. Under the hood, it uses a sentence-transformers model (all-MiniLM-L6-v2) to produce 384‑dimensional embeddings for text, just like you’d use for semantic search or clustering.

When you select text and run TokenFlow, you walk through four main stages:

  1. Tokenization – breaking text into pieces
    The extension shows how your sentence is split into tokens and subwords, explaining why something like “tokenization” becomes pieces instead of staying as one unit. This mirrors standard transformer tokenization workflows used in NLP.

  2. Token IDs – turning text into numbers
    Each token maps to an integer ID from the model’s vocabulary. This is the “language” that the model actually understands and feeds into its internal layers.

  3. Vector embeddings – capturing meaning in numbers
    Using all-MiniLM-L6-v2, each token (or sentence) is converted into a 384‑dimensional vector that captures semantic meaning—exactly what’s used in applications like semantic search, clustering, and recommendation systems.

  4. 3D vector space – visualizing meaning
    Because 384 dimensions are impossible to visualize directly, TokenFlow applies PCA to reduce them down to 3 dimensions and plots them interactively. PCA is a standard technique for projecting high‑dimensional data like word embeddings into 2D/3D while keeping as much structure as possible, making the relationships between words visible.

The result: you can literally rotate, zoom, and explore how your text is represented in the model’s internal space.
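As a side note on stage 3: sentence embeddings like all-MiniLM-L6-v2's are produced by mean-pooling the per-token vectors and then L2-normalizing the result. A minimal sketch, with 4 dimensions standing in for the real 384 and made-up token vectors:

```javascript
// How per-token vectors become one sentence embedding (a sketch).
// all-MiniLM-L6-v2 mean-pools its token vectors and L2-normalizes the
// result; 4 dimensions here stand in for the real 384.
function meanPool(tokenVectors) {
  const dim = tokenVectors[0].length;
  const out = new Array(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) out[i] += vec[i];
  }
  return out.map((x) => x / tokenVectors.length);
}

function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map((x) => x / norm);
}

// Three made-up token vectors for one sentence:
const tokenVectors = [
  [1, 0, 2, 1],
  [3, 2, 0, 1],
  [2, 1, 1, 1],
];
// Mean pool → [2, 1, 1, 1], then normalize to unit length.
const sentenceEmbedding = l2Normalize(meanPool(tokenVectors));
```

Normalizing to unit length is what makes cosine similarity between sentence embeddings a simple dot product downstream.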


Benefits: Who TokenFlow Is For and Why It Helps

Although TokenFlow started as a learning project, it turned out to be useful for several different groups.

1. Students and Self‑Learners

If you’re studying NLP or machine learning, it’s one thing to read that “embeddings capture semantic similarity,” and another to see related words form clusters in 3D space. Visualizing embeddings with PCA is a common technique in research and teaching because it makes otherwise abstract high‑dimensional vectors understandable.

With TokenFlow, you can:

  • Highlight sentences from your textbook and see how they are tokenized and embedded.

  • Test ambiguous words (“bank”, “bat”, “lead”) in different contexts and watch their positions change in vector space.

  • Build intuition for how models handle context, rare words, and punctuation.

2. Developers and Engineers

If you build AI‑powered features, you’ve probably run into issues like unexpected token counts or strange tokenization behavior. Models like all-MiniLM-L6-v2 are widely used for embeddings in real applications, so understanding how they break down inputs can directly help with debugging and optimization.

TokenFlow helps you:

  • Inspect how your prompts or user inputs are tokenized before sending them to APIs.

  • Understand why a “short” prompt may still produce a high token count.

  • Experiment with small text changes and immediately see their effect on tokens and embeddings.

3. Educators and Trainers

For teachers explaining NLP, showing live visualizations is far more effective than static slides. The same PCA‑based visualizations used in blog posts and research demos can now be triggered on any text in the browser.

With TokenFlow, you can:

  • Project a browser window in class and let students suggest sentences to analyze.

  • Demonstrate tokenization differences between languages.

  • Show how semantically similar phrases cluster together in 3D.

4. Researchers and Curious Practitioners

If you already know embeddings, TokenFlow becomes a lightweight playground for qualitative analysis. Compact models like all-MiniLM-L6-v2 are often used to prototype semantic search or clustering, and interactive visualizations make it easier to spot patterns and anomalies.

You can:

  • Compare how slight wording changes move points in embedding space.

  • Observe clusters for domain-specific terms like tech jargon or product names.

  • Use the visualization as a sanity check before deploying models in production.
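A common way to quantify how far a wording change moves a point is cosine similarity between the two embedding vectors. Here is a sketch, with toy 3‑dimensional vectors standing in for real 384‑dimensional embeddings (the specific numbers are invented for illustration):

```javascript
// Cosine similarity between two embedding vectors:
// 1 means same direction, 0 means unrelated, -1 means opposite.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy stand-ins for embeddings of "bank" in two different contexts,
// plus a word from the same (river) context:
const riverBank = [0.9, 0.1, 0.2];
const moneyBank = [0.2, 0.9, 0.1];
const riverShore = [0.8, 0.2, 0.3];

const sameSense = cosineSimilarity(riverBank, riverShore);  // high
const crossSense = cosineSimilarity(riverBank, moneyBank);  // low
```

In a real experiment you would feed the actual 384‑dimensional embeddings into the same function; points that barely move under a wording change will keep a similarity close to 1.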


Privacy‑First and Fully Local

A core design requirement for TokenFlow was privacy. Thanks to libraries like Transformers.js and ONNX Runtime Web, it’s now possible to run transformer models entirely in the browser using WebAssembly, without sending text to external servers.

That means:

  • Your selected text never leaves your machine.

  • All inference runs locally in the browser tab.

  • You get the benefits of semantic embeddings and visualization without sacrificing privacy.

The model itself is compact enough to load quickly and produce embeddings in milliseconds, which is exactly why all-MiniLM-L6-v2 is popular for real‑time applications.


How to Use TokenFlow (Step by Step)

TokenFlow is designed to feel as natural as highlighting text in your browser.

  1. Install the extension

    • Go to the Chrome Web Store and install the TokenFlow extension.

  2. Open any webpage

    • Read an article, documentation, a research paper, a blog—anything with text you care about.

  3. Select text

    • Highlight a word, sentence, or paragraph that you want to inspect.

  4. Trigger TokenFlow

    • Right‑click and choose the TokenFlow option, or click the extension icon in the toolbar.

  5. Explore the pipeline

    • Walk through the stages: tokens, token IDs, embeddings, and the 3D PCA plot.

    • Rotate the 3D visualization, hover for details, and read the inline explanations about each step and why it matters.

  6. Experiment with variations

    • Change a single word and observe how the embedding point moves.

    • Try multiple languages—note that all-MiniLM-L6-v2 itself is trained primarily on English, but multilingual variants such as paraphrase-multilingual-MiniLM-L12-v2 map sentences from different languages into the same vector space.

    • Add or remove punctuation and see how tokenization reacts.


Under the Hood: The Tech Stack

For those who enjoy peeking behind the curtain, TokenFlow relies on a modern in‑browser ML stack:

  • Model: all-MiniLM-L6-v2, a sentence-transformers model that maps text to 384‑dimensional embeddings for tasks like clustering and semantic search.

  • Runtime: Transformers.js + ONNX Runtime Web, which make it possible to run transformer models in JavaScript via WebAssembly or WebGPU for fast in‑browser inference.

  • Visualization: PCA for dimensionality reduction and 3D plotting, a common combination used in many embedding visualization projects to turn high‑dimensional vectors into intuitive visual structures.
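For the curious, the PCA step can be sketched in plain JavaScript: center the points, then find the top principal components by power iteration with deflation, and project each point onto them. This is an illustrative implementation under simplified assumptions, not necessarily how TokenFlow computes it:

```javascript
// Minimal PCA sketch: project centered points onto the top-k principal
// components, found by power iteration with deflation on the covariance.
function pca(points, k) {
  const n = points.length, d = points[0].length;
  // Center the data.
  const mean = new Array(d).fill(0);
  for (const p of points) for (let i = 0; i < d; i++) mean[i] += p[i] / n;
  const X = points.map((p) => p.map((x, i) => x - mean[i]));

  // Covariance matrix (d x d).
  const cov = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const row of X)
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) cov[i][j] += (row[i] * row[j]) / n;

  const components = [];
  for (let c = 0; c < k; c++) {
    // Power iteration: repeated multiplication converges to the
    // dominant eigenvector of the (deflated) covariance matrix.
    let v = new Array(d).fill(0).map(() => Math.random());
    for (let iter = 0; iter < 200; iter++) {
      const w = cov.map((row) => row.reduce((s, x, i) => s + x * v[i], 0));
      const norm = Math.sqrt(w.reduce((s, x) => s + x * x, 0));
      v = w.map((x) => x / norm);
    }
    components.push(v);
    // Deflate: subtract this component's contribution from cov.
    const lambda = v.reduce(
      (s, vi, i) => s + vi * cov[i].reduce((t, x, j) => t + x * v[j], 0), 0);
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) cov[i][j] -= lambda * v[i] * v[j];
  }
  // Project each centered point onto the components.
  return X.map((row) =>
    components.map((comp) => comp.reduce((s, x, i) => s + x * row[i], 0)));
}

// Demo: 4 points whose variance lies mostly along the first axis,
// reduced from 3 dimensions to 2. In TokenFlow the input would be
// 384-dimensional embeddings and k would be 3.
const demo = pca([[10, 1, 0], [-10, -1, 0], [9, -1, 0], [-9, 1, 0]], 2);
```

Real visualization libraries use more numerically robust routines (typically SVD), but the idea is the same: keep the directions with the most variance and discard the rest.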

The goal isn’t just to “use AI,” but to give people a direct window into how this stack actually operates on their text.


What’s Next

TokenFlow is just the beginning. There’s a lot of room to grow:

  • Support for more models and comparison views.

  • Additional visualization modes beyond PCA.

  • Export options for screenshots or embedding data.

But the core mission will stay the same: make the inner workings of language models concrete, visual, and accessible—right where people already live on the web.

If you’re curious about how AI “understands” language, or you teach, build, or research NLP, I’d love for you to try TokenFlow, break it, and tell me what you discover along the way.