Python Space: we built a Jupyter kernel from scratch so notebooks feel native in Yee

Tyler Garrett · July 3, 2026

#update
#software

Yee has been unapologetically web-first. React, Vite, Astro, Tailwind — that's where the live preview, the token-efficiency tooling, and every measured number live. If you write web apps, the differentiated machinery is all pointed at you.

Python was the awkward middle child. The agent could edit Python beautifully — grep, ranged reads, symbol maps, traceback detection, the command interceptor that already knew uvicorn from flask — but there was no preview. No ▶. A data scientist opening a Python repo got a competent text editor and a chat box, which is exactly what Cursor and Claude Code already do. We were honest about it: in our own LANGUAGES.md tier list, Python sat at A−, and the one thing holding it back was written down in plain language — no preview surface.

So we built one. And the shape it wanted to take was obvious the moment we said it out loud: a notebook is Python's preview.

The idea: a notebook that is files first

The pitch was a "huge Python space you can just start using with your Python or Jupyter files" — but with a twist that matters. It shouldn't be a notebook app bolted onto a code editor. It should work as regular file tooling first and feel like Jupyter on top.

Concretely: Python Space is a third body for the same sliding pane that hosts the web preview. Where a React repo shows a <webview>, a Python repo shows a dark-glass cell list. But there's no hidden notebook database — the cells are your .ipynb (or a .py with # %% markers) on disk. The human runs cells with Shift+Enter; the agent edits the same file with its normal tools; the pane live-reloads the instant the agent's edit lands. Two views of one file. Nothing to sync, nothing to diverge.
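Here is a minimal sketch of turning a percent-format script into cells, which makes "two views of one file" concrete for the `.py` case. It assumes the common `# %%` / `# %% [markdown]` convention. `Cell` and `split_percent_cells` are hypothetical names, and Yee's actual parser (`lib/notebook.js`) is not reproduced here.

```python
import re
from dataclasses import dataclass


@dataclass
class Cell:
    kind: str        # "code" or "markdown"
    source: str
    title: str = ""  # text after the marker, e.g. "# %% Load data"


# "# %%" opens a cell; an optional "[markdown]" tag and a title may follow.
MARKER = re.compile(r"^#\s*%%(?:\s*\[(?P<tag>\w+)\])?\s*(?P<title>.*)$")


def split_percent_cells(text: str) -> list[Cell]:
    """Split a percent-format .py file into cells (a sketch, not Yee's parser)."""
    cells: list[Cell] = []
    kind, title, body = "code", "", []

    def flush() -> None:
        source = "\n".join(body).strip("\n")
        if kind == "markdown":
            # Markdown cells live in comments; drop the leading "# " on each line.
            source = "\n".join(re.sub(r"^# ?", "", ln) for ln in source.splitlines())
        if source or title:  # skip blank cells, e.g. nothing before the first marker
            cells.append(Cell(kind, source, title))

    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            flush()  # close the previous cell before starting a new one
            kind = "markdown" if match.group("tag") in ("markdown", "md") else "code"
            title, body = match.group("title").strip(), []
        else:
            body.append(line)
    flush()
    return cells
```

The cells are derived from the file on every read. An agent edit that adds a `# %%` line therefore shows up as a new cell on the next reload, with no second copy to reconcile.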

That framing paid off immediately, because it meant Python Space could inherit the entire preview lifecycle Yee already had — the ▶ button, per-repo state, the honest "here's what's happening" awareness note the agent reads every turn. We didn't build a new subsystem. We taught an existing one a new trick.

The hard part: a kernel, from scratch, over plain `python3`

Here's the decision that shaped everything. We refused to depend on Jupyter.

Not jupyter, not ipykernel, not the ZMQ kernel protocol. If Python Space required a pip install before your first cell ran, we'd have failed the "just start using it" promise for a huge fraction of users — the ones on a locked-down machine, a fresh venv, a corporate environment where installing Jupyter is a ticket. Yee's whole thesis is a high-quality JavaScript ecosystem that hosts Python gracefully; the kernel had to be ours.

So the kernel is a ~250-line pure-standard-library Python script (bootstrap.py) that Node spawns and talks to over stdin/stdout with a line-delimited JSON protocol. Simple in outline. The interesting part is what happens in its first ten lines.
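The outline really is simple. Below is a minimal sketch of such a loop. It assumes one JSON object per line and a hypothetical request shape (`id`, `op`) that the post does not specify.

```python
import io
import json
from typing import Callable, TextIO

Handler = Callable[[dict], dict]


def serve(inp: TextIO, out: TextIO, handle: Handler) -> None:
    """Read one JSON request per line and write one JSON reply per line."""
    for line in inp:
        if not line.strip():
            continue  # tolerate blank lines
        try:
            request = json.loads(line)
            if not isinstance(request, dict):
                raise ValueError("frame is not a JSON object")
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            reply = {"kind": "error", "error": str(exc)}
        else:
            reply = handle(request)
            reply.setdefault("id", request.get("id"))  # echo the id so replies can be matched
        # json.dumps escapes newlines inside strings, so each reply is exactly one line.
        out.write(json.dumps(reply) + "\n")
        out.flush()


def ping_handler(request: dict) -> dict:
    """Hypothetical handler; a real kernel would dispatch on ops such as 'execute'."""
    if request.get("op") == "ping":
        return {"kind": "pong"}
    return {"kind": "error", "error": f"unknown op {request.get('op')!r}"}


if __name__ == "__main__":
    # Against real pipes this would be serve(sys.stdin, sys.stdout, ...); here, in-memory streams.
    demo_out = io.StringIO()
    serve(io.StringIO('{"id": 1, "op": "ping"}\n'), demo_out, ping_handler)
    print(demo_out.getvalue(), end="")
```

The framing rests on one property: `json.dumps` escapes newlines inside strings, so a reader only ever has to split on `\n`.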

Making the protocol unforgeable

A naive version of this has a gaping hole: if the kernel talks to Node over stdout, and user code can also print() to stdout, then a cell that runs print('{"kind": "done"}') can forge a protocol message. Worse, a subprocess the user spawns inherits that same stdout. You cannot trust a single byte.

The fix is a small piece of Unix plumbing that runs before any user code can execute. The kernel duplicates the real stdin/stdout to private handles, then points the public file descriptors somewhere harmless:

fd 0 (stdin) → /dev/null. User input() gets a clean, immediate EOFError — it can never read a protocol request meant for the kernel.

fd 1 & 2 (stdout/stderr) → capture pipes. Everything user code writes — print(), C-extension output, and crucially, subprocess output — flows into pipes that a reader thread forwards to Node as JSON stream messages. The text lands inside a JSON string, structurally incapable of forging a frame.

The real stdout, now private, carries the protocol. Every message is also prefixed with a random per-session nonce, so Node drops anything that isn't ours as a belt-and-suspenders second line of defense.
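The nonce check can be sketched as follows. The sketch assumes a hypothetical `<nonce> <json>` line layout, since the post does not give the exact format. It shows both defenses at the frame level:

- The reader drops any line without this session's nonce.
- Captured output rides inside a frame as a quoted JSON string, so text that looks like a message stays text.

```python
import json
import secrets
from typing import Optional

# Hypothetical frame layout "<nonce> <json>\n"; only the per-session nonce
# itself comes from the post, the exact layout is an assumption.
SESSION_NONCE = secrets.token_hex(16)


def encode_frame(msg: dict, nonce: str = SESSION_NONCE) -> str:
    """Kernel side: serialize one protocol message for the private channel."""
    return f"{nonce} {json.dumps(msg)}\n"


def decode_frame(line: str, nonce: str = SESSION_NONCE) -> Optional[dict]:
    """Reader side: accept only well-formed frames carrying this session's nonce."""
    prefix = nonce + " "
    if not line.startswith(prefix):
        return None  # not ours: drop it
    try:
        msg = json.loads(line[len(prefix):])
    except json.JSONDecodeError:
        return None
    return msg if isinstance(msg, dict) else None


def stream_frame(text: str, name: str = "stdout") -> str:
    """Captured user output travels as data inside a JSON string, never as a frame."""
    return encode_frame({"kind": "stream", "name": name, "text": text})
```

`secrets.token_hex` draws from the operating system's CSPRNG, so user code has no practical way to guess the prefix. With the fd redirection in place, that output never reaches the private channel in the first place.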

We wrote adversarial tests that try to forge frames — a nonce-prefixed print, a raw os.write(1, ...), a subprocess echoing a fake message — and confirmed every one arrives as honest, quoted text. That test file spawns a real python3 and runs 43 assertions against it; it's the load-bearing proof that this thing is safe.
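Here is a compact, self-contained version of that kind of forgery test. It is not Yee's test file. A parent spawns `sys.executable` with a toy kernel that:

1. duplicates its real stdout for the protocol,
2. points fd 0 at `/dev/null` and fds 1 and 2 at a capture pipe,
3. attempts the three forgeries.

The message shapes are hypothetical, the nonce is left out for brevity, and POSIX semantics are assumed.

```python
import json
import subprocess
import sys

# Child: a toy kernel that isolates its stdio before any "user code" runs.
CHILD = r'''
import json, os, subprocess, sys, threading

# Private protocol channel: a duplicate of the real stdout, made before redirection.
proto_out = os.fdopen(os.dup(1), "w", encoding="utf-8", buffering=1)
lock = threading.Lock()

def send(msg):
    with lock:
        proto_out.write(json.dumps(msg) + "\n")

# fd 0 -> /dev/null: input() now hits EOF instead of reading protocol requests.
null_in = os.open(os.devnull, os.O_RDONLY)
os.dup2(null_in, 0)
os.close(null_in)

# fds 1 and 2 -> one capture pipe; a reader thread re-wraps every line as data.
r, w = os.pipe()
os.dup2(w, 1)
os.dup2(w, 2)
os.close(w)

def pump():
    with os.fdopen(r, "r", encoding="utf-8", errors="replace") as pipe:
        for line in pipe:
            send({"kind": "stream", "text": line})

reader = threading.Thread(target=pump)
reader.start()

# "User code": three attempts to forge a protocol message.
print('{"kind": "done"}', flush=True)
os.write(1, b'{"kind": "done"}\n')
subprocess.run([sys.executable, "-c", "print('{\"kind\": \"done\"}')"])

# Swap /dev/null over fds 1 and 2, closing the last write ends so pump() sees EOF.
null_out = os.open(os.devnull, os.O_WRONLY)
os.dup2(null_out, 1)
os.dup2(null_out, 2)
os.close(null_out)
reader.join()
send({"kind": "done", "real": True})
'''


def run_child() -> list:
    """Spawn the toy kernel and decode every line it wrote to its real stdout."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        text=True,
        timeout=30,
        check=True,
    )
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

The sketch leans on an asymmetry:

- `os.dup` returns a non-inheritable descriptor.
- `os.dup2` makes its target inheritable by default.

So a spawned subprocess inherits the capture pipe on fds 1 and 2 but not the private protocol handle.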

Persistent state, REPL semantics, and the little niceties

Because it's one long-lived process, your variables survive between cells — the entire point of a kernel versus a script runner. Load a dataframe in cell 2, use it in cell 8. We parse each cell's AST so that a trailing bare expression gets evaluated and its repr shown, exactly like a real Python prompt, with _ bound to the last result. matplotlib figures are captured to inline PNGs after each cell if you imported pyplot — and cost nothing if you didn't.
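Both niceties fit in a few lines of the standard library. This sketch assumes cells share one dict namespace. `run_cell` and `capture_figures` are hypothetical names; the pyplot calls are standard matplotlib API.

```python
import ast
import base64
import io
import sys
from typing import Optional


def run_cell(source: str, namespace: dict) -> Optional[str]:
    """Run a cell like the interactive prompt: if it ends in a bare expression,
    evaluate it and return its repr (None when there is nothing to show)."""
    tree = ast.parse(source, filename="<cell>", mode="exec")
    tail = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated for its value.
        tail = ast.Expression(body=tree.body.pop().value)
    exec(compile(tree, "<cell>", "exec"), namespace)
    if tail is None:
        return None
    value = eval(compile(tail, "<cell>", "eval"), namespace)
    if value is None:
        return None  # the prompt shows nothing for None and leaves _ alone
    namespace["_"] = value
    return repr(value)


def capture_figures() -> list:
    """Return open pyplot figures as base64 PNGs, then close them.
    Costs one dict lookup when no cell imported pyplot."""
    plt = sys.modules.get("matplotlib.pyplot")
    if plt is None:
        return []
    images = []
    for num in plt.get_fignums():
        buf = io.BytesIO()
        plt.figure(num).savefig(buf, format="png")
        images.append(base64.b64encode(buf.getvalue()).decode("ascii"))
    plt.close("all")
    return images
```

The figure check is free because it looks pyplot up in `sys.modules` rather than importing it. If no cell imported it, there is nothing to capture and nothing gets loaded.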

Then we went looking for the walls — before users did

Here's where the process matters as much as the code. Shipping a notebook surface and hoping real notebooks work is a great way to have data scientists bounce on their first file. So before calling it done, we ran an adversarial audit: 85 agents across five "hardcore Python user" personas — someone opening a real Jupyter export, someone deep in conda/poetry hell, someone running tqdm loops and multiprocessing, someone with a 50MB notebook, someone who lives in Django. Each finding was checked by two independent verifiers reading the actual code, so we fixed real bugs, not hypotheticals.

They found 40 confirmed walls. A few were exactly the kind of thing that ends a first impression:

The %matplotlib inline wall. The single most common first line in notebooks in the wild is an IPython magic — and plain Python treats %matplotlib inline as a syntax error. Cell 1 of every real notebook would have failed. We added a compatibility layer that only activates when a cell fails to parse (so pure-Python cells are never touched), transforming magics, !shell commands, %%bash blocks, and obj? help syntax, and seeding display() and get_ipython() shims that exported notebooks call. Now cell 1 just runs.
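Here is a minimal sketch of that parse-failure fallback. The rewrite targets borrow IPython's method names (`run_line_magic`, `run_cell_magic`, `system`). They are called on a `get_ipython()` shim assumed to be seeded into the namespace and not shown here. Mapping `obj?` to plain `help()` is a simplification of IPython's own inspector.

```python
import ast
import re

# obj? help syntax for plain or dotted names, e.g. "df.head?"
HELP = re.compile(r"^([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\?$")


def transform_magics(source: str) -> str:
    """Rewrite IPython-only syntax into plain Python, only if the cell fails to parse."""
    try:
        ast.parse(source)
        return source  # pure-Python cells are never touched
    except SyntaxError:
        pass
    lines = source.splitlines()
    if lines and lines[0].startswith("%%"):  # cell magic: the rest of the cell is its body
        name, _, arg = lines[0][2:].partition(" ")
        body = "\n".join(lines[1:])
        return f"get_ipython().run_cell_magic({name!r}, {arg!r}, {body!r})"
    out = []
    for line in lines:
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        stripped = stripped.rstrip()
        if stripped.startswith("%"):  # line magic, e.g. %matplotlib inline
            name, _, arg = stripped[1:].partition(" ")
            out.append(f"{indent}get_ipython().run_line_magic({name!r}, {arg!r})")
        elif stripped.startswith("!"):  # shell escape, e.g. !pip list
            out.append(f"{indent}get_ipython().system({stripped[1:]!r})")
        elif HELP.match(stripped):  # obj? becomes help(obj)
            out.append(f"{indent}help({stripped[:-1]})")
        else:
            out.append(line)
    return "\n".join(out)
```

Keying off `SyntaxError` keeps pure-Python cells untouched, because the transform only sees cells that could not have run anyway.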

The "my imports fail" wall. When you launch a Mac app from the Dock instead of a terminal, it gets a bare PATH with none of your shell's environment — no conda, no pyenv, no Homebrew. So a conda user's import pandas would fail against Apple's stock Python while the exact same import worked in the integrated terminal. Maximally confusing. We now capture your login shell's environment at startup (the well-worn "fix path for shell" trick) and detect poetry/pipenv virtualenvs that live outside the repo. Your env is your env, however you launched.
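The "fix path for shell" trick boils down to asking your login shell for its environment once. Here is a Python sketch of the idea. It assumes a POSIX-style shell that accepts `-l -i -c` (bash and zsh do). Yee's own version lives in `lib/shellEnv.js`, and the marker string and helper names here are hypothetical.

Two details make it robust:

- Running the current interpreter by absolute path sidesteps the very PATH being repaired.
- Fencing the dump with a marker ignores any banner an rc file prints.

```python
from __future__ import annotations

import json
import os
import shlex
import subprocess
import sys

# Hypothetical marker: rc files may print banners to stdout, so the JSON dump
# is fenced by this string and anything outside it is ignored.
MARK = "__LOGIN_ENV_DUMP__"
# Runs inside the login shell; the marker arrives as sys.argv[1] to avoid nested quoting.
DUMP = "import json, os, sys; m = sys.argv[1]; sys.stdout.write(m + json.dumps(dict(os.environ)) + m)"


def parse_marked_env(stdout: str) -> dict[str, str] | None:
    """Extract the JSON environment dump from between the two markers."""
    start, end = stdout.find(MARK), stdout.rfind(MARK)
    if start == -1 or end <= start:
        return None
    try:
        env = json.loads(stdout[start + len(MARK):end])
    except json.JSONDecodeError:
        return None
    return env if isinstance(env, dict) else None


def capture_login_env(shell: str | None = None, timeout: float = 10.0) -> dict[str, str]:
    """Ask the user's login shell for its environment; fall back to our own."""
    shell = shell or os.environ.get("SHELL") or "/bin/sh"
    # Use this interpreter's absolute path: the PATH we have may be the broken one.
    cmd = f"{shlex.quote(sys.executable)} -c {shlex.quote(DUMP)} {shlex.quote(MARK)}"
    try:
        proc = subprocess.run(
            [shell, "-l", "-i", "-c", cmd],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return dict(os.environ)
    env = parse_marked_env(proc.stdout)
    return env if env is not None else dict(os.environ)
```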

The torch.save wall. An ML engineer defines a model class in a cell and pickles it — and it breaks, because the cell namespace wasn't a real __main__ module. We made it one. Now pickle/torch.save/joblib of cell-defined classes resolve correctly, and multiprocessing spawn stops re-executing the kernel into its own pipes.
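Pickle stores classes by reference (module name plus qualified name) and resolves them through `sys.modules` when saving and loading. Suppose cells run in a dict whose `__name__` is `"__main__"` while `sys.modules["__main__"]` is still the bootstrap script. Then that lookup fails. Below is a minimal sketch of the fix, with hypothetical helper names; it covers the pickle half only, not multiprocessing.

```python
import contextlib
import pickle
import sys
import types


@contextlib.contextmanager
def cell_main_module():
    """Register a fresh module as __main__ so cell-defined classes can be pickled.

    A long-lived kernel would install this once for its whole life; the context
    manager restores the previous __main__ only so this demo stays tidy."""
    previous = sys.modules.get("__main__")
    main = types.ModuleType("__main__")
    sys.modules["__main__"] = main
    try:
        yield main
    finally:
        if previous is not None:
            sys.modules["__main__"] = previous
        else:
            sys.modules.pop("__main__", None)


def run_in(main: types.ModuleType, source: str) -> None:
    # Classes defined here get __module__ == "__main__" from the namespace's __name__.
    exec(compile(source, "<cell>", "exec"), main.__dict__)


def roundtrip(source: str, name: str):
    """Run a cell, then pickle and unpickle the object bound to `name`."""
    with cell_main_module() as main:
        run_in(main, source)
        original = main.__dict__[name]
        return pickle.loads(pickle.dumps(original)), original
```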

Plus a long tail: top-level await, pandas HTML tables and interactive-plot handling, tqdm progress bars, sys.exit() that doesn't kill the kernel, atomic crash-safe saves, a file-watcher that survives git/black/atomic-rename saves, and — importantly for backend developers — a Django repo is no longer hijacked into a notebook, and clicking a plain utils.py references it in chat instead of spawning a kernel. (One of the hunt's findings was even a real UI bug in code we'd already shipped: the output drain could chop a line mid-word right before a cell finished.)
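Of that long tail, crash-safe saving is the easiest to sketch. The common pattern is:

1. Write a sibling temp file.
2. fsync it.
3. `os.replace` it over the notebook.

A crash then leaves either the old file or the new one on disk, never half of each. The helper name `atomic_write_text` is hypothetical, not Yee's code.

```python
import contextlib
import os
import stat
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Replace `path` with `text` so readers see the old file or the new one, never half."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory (same filesystem), where POSIX
    # guarantees that the final rename is atomic.
    fd, tmp = tempfile.mkstemp(prefix=".", suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename makes it visible
        with contextlib.suppress(FileNotFoundError):
            # mkstemp creates 0600 files; keep the existing file's permissions.
            os.chmod(tmp, stat.S_IMODE(os.stat(path).st_mode))
        os.replace(tmp, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```

The rename is also why the file-watcher has to survive atomic-rename saves. The path stays the same, but the file behind it is a new inode.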

We fixed roughly 36 of the 40 and wrote down the rest — an explicit interpreter picker, big-notebook virtualization, a couple of edge cases — because "we know about it and it's logged" is a very different posture than "we didn't think of it." None of the deferred items block the first run.

Where Python sits now

Python moves from A− to A in our tier list: a real preview surface, wall-hardened for the notebooks and environments people actually have. We're deliberately not claiming the top S− tier yet, because our own rule says a tier claim needs a measured eval behind it, and the "edit a Flask route while the kernel's live" benchmark doesn't exist yet. That's the next honest step.

But the thing we set out to build is real and running: open your Python or Jupyter files and just start using them — cells that run, plots that render, state that persists, an agent that watches and fixes, and not a single pip install jupyter in sight.

Under the hood: lib/pybridge/bootstrap.py (the kernel), lib/pyKernel.js (lifecycle), lib/pythonEnv.js + lib/shellEnv.js (environment detection), lib/notebook.js (the one parser), app/pySpace.js (the surface). ~130 assertions across five test suites, plus live smoke tests against a real python3. Full engineering log in the root [CHANGELOG.md](../CHANGELOG.md) → "Deep engineering log".