This language is an exploration on many fronts: my personal programming language aesthetics, a programming language that fits how I think, and an attempt to reduce the unreliability of LLM-based programming.
Introduction
Some years ago I wrote about Exploratory Programming and how a fast feedback loop is essential for me to understand a problem and build intuition and a robust mental model.
At that time I started working on Rekishi, an exploration of what a programming environment for exploratory programming could look like on top of Common Lisp.
I think that essay is still fairly relevant in the two or three years since I wrote it, even with the immense change brought by the improvement of LLMs. A couple of months ago I started thinking that the same principles that made Exploratory Programming interesting for me could improve the output of coding agents. If they are able to interact, explore, prod and iterate fast, the output could be better.
I am personally worried about the current state of LLM-based programming. Setting aside the ethical concerns around LLMs, they are clearly here to stay. Wouldn’t it be good if we had better tools to develop software with them? Wouldn’t it be good to offer the tooling to make the software generated by them more reliable, more trustworthy, more accurate and better suited to be used in production?
At the same time I want a language that I can hand-code and that brings me the iterative process that I wanted with Rekishi.
Background
This actually started as a language I’ve been thinking about for a long time, meant to make game development for game jams more interactive between all participants. I took a lot of inspiration from CEPL, a library for Common Lisp that allows you to operate on a graphical OpenGL context in a REPL-oriented way. I was also very inspired by the Tomorrow Corporation Tech Demo. They have a custom language that provides things like deterministic computation of the game loop, branchable and shareable time-traveling debugging sessions, etc.
I’d really recommend looking at these two to get inspired by what’s possible when we build tools as first-class citizens instead of afterthoughts, and when cohesive trade-off design allows things that look like magic.
When I started building this language it was fully game focused. It had an integrated ECS, an embedded graphical game loop, entities, sprites, etc. At some point one day I integrated a small MCP server and asked Claude Code to test some games with it.
I was shocked at how fast it was finding and solving issues. The same reasons I thrived in an Exploratory Programming environment were helping Claude thrive as well. Instead of guessing, it can test: it can build state incrementally, modify it, prod at it, go back in time, redefine a function and run it again. We spend so much time tearing down and rebuilding our testing state.
That first version was based on Rust, Cranelift and Bevy.
I then thought, okay, this may be bigger than games. I went through a phase where I wanted to build a product on this. I still do, but I think the language itself should be open source so it’s available to reduce the effect of slop on the world.
Coordinates in the PL Design Space
Yabu is a programming language heavily inspired by four programming languages:
- From Lisps I take the interactivity, the REPL-driven development, the malleability. Everything I wrote about in Exploratory Programming.
- From Elm I take the strong but very simple type system and The Elm Architecture.
- From Unison I take the content-addressable store and their own take on interactivity.
- From Elixir I take the immutability and in particular I take Phoenix.LiveView. The process model, the GenServer approach to reacting to events and the idea that everything is simple data that can be passed around.
I think for me all of these puzzle pieces are needed to fix the state of agentic programming.
There is one last piece that I’m also adding to the language, an attempt at improving the reliability of the language by construction. I want programs in Yabu to have automatic deterministic simulation testing (similar to Antithesis) without the user having to do anything about it. I am also exploring compilation into formal methods languages like TLA+ or Quint.
I want to elaborate a bit more on each of these. I’ll skip the interactivity and REPL-driven development one since I think it’s already fairly well explained in my Exploratory Programming essay.
Elm and Elm Architecture
The Elm Architecture (TEA) intersects very well with the LiveView idea as well as with the simulation testing and compilation to formal methods.
The shape of a LiveView, where you get events (Msg in TEA) that update a state,
is basically the exact same idea. However, in the case of TEA it goes even
deeper, as any side effect needs to go through the Cmd architecture. This
forces the code to deal with errors, loading states and all kinds of
intermediate states at the type-system level.
For example, this simple counter LiveView:
defmodule Some.LiveView do
  use Phoenix.LiveView

  def render(assigns) do
    ~H"""
    Counter: {@counter}
    <button phx-click="inc">+</button>
    """
  end

  def mount(_params, _session, socket) do
    {:ok, assign(socket, :counter, 0)}
  end

  def handle_event("inc", _params, socket) do
    {:noreply, update(socket, :counter, &(&1 + 1))}
  end
end
In Elm:
import Html exposing (Html, button, div, text)
import Html.Events exposing (onClick)

type alias Model =
    { counter : Int }

init : Model
init =
    { counter = 0 }

type Msg
    = Inc

update : Msg -> Model -> Model
update msg model =
    case msg of
        Inc ->
            { model | counter = model.counter + 1 }

view : Model -> Html Msg
view model =
    div []
        [ text ("Counter: " ++ String.fromInt model.counter)
        , button [ onClick Inc ] [ text "+" ]
        ]
This also maps very well onto how you model things in TLA+, where you have an init and an update function. The update function is normally split into the individual actions that can happen, and those define what the next state is. Each action is usually an atomic unit of change.
VARIABLE counter
Init == counter = 0
Inc == /\ counter' = counter + 1
Next == Inc
Spec == Init /\ [][Next]_counter /\ WF_counter(Inc)
The shape of all of these is very similar, so a Yabu program maps almost 1:1 onto TLA+. This of course doesn’t mean that it’s easy to check, and there is work to be done in that direction, but the general translation is very mechanical.
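To make the "init plus actions" shape concrete, here is a minimal Python sketch (Python only because it is easy to run; none of this is Yabu or TLA+ tooling). It treats a spec as an initial state and a list of actions, then walks every state reachable within a bound and checks an invariant, which is roughly what an explicit-state checker does with the counter spec above. The names `explore` and `inc`, and the depth bound, are assumptions made for illustration.

```python
from collections import deque


def inc(state):
    """The Inc action from the spec: counter' = counter + 1."""
    return {"counter": state["counter"] + 1}


def explore(init, actions, invariant, max_depth):
    """Breadth-first walk of every state reachable within max_depth steps.

    Returns (states_seen, violation), where violation is the first state
    that breaks the invariant, or None if no such state was found.
    """
    seen = set()
    queue = deque([(init, 0)])
    while queue:
        state, depth = queue.popleft()
        key = tuple(sorted(state.items()))  # hashable form of the state
        if key in seen:
            continue
        seen.add(key)
        if not invariant(state):
            return len(seen), state
        if depth < max_depth:
            for action in actions:
                queue.append((action(state), depth + 1))
    return len(seen), None
```

Calling `explore({"counter": 0}, [inc], lambda s: s["counter"] >= 0, 5)` visits the six states from 0 to 5 and reports no violation. The counter is unbounded, so a real checker would need a state constraint rather than a fixed depth.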
TEA also enables simulation testing quite easily since you don’t need complex
hypervisors, you can mostly just… emit Msg based on the program you are
simulating. Spawn some processes, explore the program state and you have
simulation testing for free!
Basing Yabu on TEA makes all programs have this shape. The constraints of that shape give us the three benefits above, which I think is a very good tradeoff!
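The "just emit Msg" idea behind simulation testing can be sketched in a few lines of Python. This is an illustration under assumptions, not how Yabu implements it: the `Reset` message, the `simulate` function and its parameters are all hypothetical. The point it shows is that a pure `update` plus a seeded message stream gives runs that can be reproduced exactly.

```python
import random


def init():
    return {"counter": 0}


def update(msg, model):
    """Pure update: the same (msg, model) always gives the same result."""
    if msg == "Inc":
        return {"counter": model["counter"] + 1}
    if msg == "Reset":  # hypothetical extra message, not in the counter above
        return {"counter": 0}
    return model


def simulate(seed, steps, invariant, msgs=("Inc", "Reset")):
    """Drive update with a seeded stream of messages.

    Returns (final_model, history, ok). The same seed always produces the
    same history, so a failing run can be replayed step by step.
    """
    rng = random.Random(seed)
    model = init()
    history = []
    for _ in range(steps):
        msg = rng.choice(msgs)
        history.append(msg)
        model = update(msg, model)
        if not invariant(model):
            return model, history, False
    return model, history, True
```

Running `simulate` over many seeds explores different interleavings, and any seed that returns `ok == False` is a reproducible test case.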
Unison’s Content Addressable Store
This is something that I touched on lightly in the original essay and something I was already doing with Rekishi. The idea of a function-granularity content-addressable store helps with the exploration by making the granularity of changes a lot smaller than the commit. My original idea in Rekishi was that I could get individual function evolution in tree form, being able to go back and forth in the exploration I had been doing.
With Yabu, while this is still important, there is another benefit when dealing with agents. Making the function the individual unit of change and addressing function definitions by content lets us do incremental type checking on function redefinition. One of the goals of Yabu is to make feedback loops as tight as possible.
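As a rough illustration of what "addressed by content" means at function granularity, here is a small Python sketch. It is a simplification under stated assumptions: it hashes whitespace-normalised source text, whereas a real store (Unison's, for instance) hashes a structured form of the definition, and the `DefinitionStore` class and its methods are hypothetical names. The nine-character hash length only mirrors the short hashes shown in the REPL output in this post.

```python
import hashlib


class DefinitionStore:
    """Function definitions keyed by a hash of their source text."""

    def __init__(self):
        self.by_hash = {}  # content hash -> normalised source
        self.history = {}  # name -> list of hashes, oldest first

    def put(self, name, source):
        # Normalise whitespace so formatting alone does not change the hash.
        normalised = " ".join(source.split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:9]
        self.by_hash[digest] = normalised
        versions = self.history.setdefault(name, [])
        if not versions or versions[-1] != digest:
            versions.append(digest)
        return digest

    def current(self, name):
        """Source of the latest version recorded under this name."""
        return self.by_hash[self.history[name][-1]]
```

Because every version stays in `by_hash`, going back to an earlier definition is just a matter of pointing the name at an older hash.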
Here’s an example of a REPL session as I envision it:
(defun some-func ((a int) (b int)) int (+ a b))
# => defined some-func (-> (int int) int)
(defun other-func () int (some-func 4 9))
# => defined other-func (-> () int)
;; Now we try to redefine some-func in a way that is incompatible with other-func
(defun some-func ((a int)) int (+ a 5))
# => redefinition of some-func staged due to type error
# in other-func:
# (some-func 4 9) expected (-> (int int) int)
# redefinition would change type signature to (-> (int) int)
#
# some-func (a8d720eff) staged
Every time you redefine a function you get immediate feedback on the type status, not when
you run a separate build step. Not only that, but the some-func redefinition is
neither lost nor committed. It stays staged. Staged definitions are those that don’t
typecheck, and they will be re-evaluated as new definitions are processed.
For example, this could be a continuation of the REPL session:
(defun other-func () int (some-func 4))
# => redefined other-func (-> () int)
# => merged staged some-func (a8d720eff) unblocked by other-func redefinition
This has a nice side effect: the image always typechecks. If something doesn’t typecheck it’s staged, and you can still call the function as it is now. Once everything typechecks it’s merged.
This of course has repercussions for multi-agent systems: a change by one agent cannot block another agent from interacting with and testing the system. I’m unsure how important that’s going to be, but in an interactive system I think it’s important to have the system always proddable. With Common Lisp this is almost always the case, but with a strong type system some care needs to be taken to make it fully interactable.
More context on the incremental typechecking and compilation can be found at Incremental Computation Graphs and Incremental Type System.
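The staging behaviour from the REPL session above can be approximated with a toy model. This Python sketch is only an assumption about the shape of the idea, not Yabu's type checker: "types" are reduced to argument counts, and `Image`, `define` and the tuple it returns are hypothetical. A definition goes live only if the whole image stays consistent, and staged definitions are retried whenever something new is defined.

```python
class Image:
    """Live definitions always typecheck; anything else waits in `staged`."""

    def __init__(self):
        self.live = {}    # name -> (arity, calls)
        self.staged = {}  # name -> (arity, calls)

    @staticmethod
    def _consistent(defs):
        # Every call must pass exactly as many arguments as the callee takes.
        for _arity, calls in defs.values():
            for callee, nargs in calls:
                if callee not in defs or defs[callee][0] != nargs:
                    return False
        return True

    def define(self, name, arity, calls=()):
        """Return (status, merged) where merged lists staged names unblocked."""
        new = (arity, tuple(calls))
        # Try together with everything staged first, then on its own.
        for extra in (self.staged, {}):
            candidate = {**self.live, **extra, name: new}
            if self._consistent(candidate):
                merged = [n for n in extra if n != name]
                self.live = candidate
                for n in merged + [name]:
                    self.staged.pop(n, None)
                return "defined", merged
        self.staged[name] = new
        return "staged", []
```

Replaying the session: defining `some-func` with one argument while `other-func` still passes two returns `("staged", [])`, and redefining `other-func` to pass one argument returns `("defined", ["some-func"])`. In between, `live` keeps the old two-argument `some-func`, so the image stays callable.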
LiveView
One of my favourite programming environments is Elixir with Phoenix LiveView. The ability to make interactive UIs without touching a line of JavaScript is amazing. Suddenly you don’t need to think about APIs, how to reload things, what entities to expose. You click on a button and you can query the DB directly and get access to everything your backend offers.
With Yabu you’d get the same idea by using TEA to organize LiveViews that run on the backend and interact with the frontend through a websocket.
An interesting future development in this direction is to be able to move the network boundary anywhere we want. One of the problems with LiveView is latency.
In normal LiveView, the network sits between user actions and the LiveView handlers:
User Actions -> Network -> Yabu LiveView + Interactivity -> Database
But what if we could granularly move the network to where we need it?
User Actions -> Interactivity -> Network -> LiveView -> Database
Targeting Common Lisp
This is something relatively new. Yabu started out implemented in Rust + WASM, then moved to Elixir and finally to Common Lisp.
As of right now the language works, even though it lacks a good DX and stdlib. The reasons to target Common Lisp are (non-exhaustive and not ordered):
- A lot of the things I was building in Rust/Elixir around reloading, recompilation, etc. are built in. No need to reinvent the wheel.
- I can reuse the native compilation infrastructure, I just need to emit Common Lisp.
- Emitting Common Lisp is very simple, at the end of the day code is data :D
- Decent ecosystem I can leverage (not as good as Rust/Elixir though, a bit of a loss there).
Brain Dump
Here’s a list of things that have been going through my mind around Yabu in no particular order and without spending too much time fleshing them out.
Production Time-Travel Debugging
Another feature I’m working on is recording production data. If a request fails, we can store it permanently and route it back to agents so that they can time-travel debug the system after the fact.
Each request has a deterministic seed that is saved alongside the trace so we
only need to save the Msg history.
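Here is a minimal Python sketch of why a seed plus the Msg history is enough. Everything in it is hypothetical (the `roll` and `double` messages, the `update` handler, the trace format); the only assumption that matters is that handlers draw all their randomness from the seeded generator, so replaying the same messages rebuilds every intermediate state.

```python
import random


def update(msg, model, rng):
    """Hypothetical handler; all randomness comes from the seeded rng."""
    if msg == "roll":
        return {**model, "total": model["total"] + rng.randint(1, 6)}
    if msg == "double":
        return {**model, "total": model["total"] * 2}
    return model


def run(seed, msgs):
    """Return every intermediate model so any step can be inspected later."""
    rng = random.Random(seed)
    model = {"total": 0}
    timeline = [model]
    for msg in msgs:
        model = update(msg, model, rng)
        timeline.append(model)
    return timeline


def record(seed, msgs):
    """What gets persisted for a failed request: the seed and the Msgs only."""
    return {"seed": seed, "msgs": list(msgs)}


def replay(trace):
    """Rebuild the full timeline from a stored trace."""
    return run(trace["seed"], trace["msgs"])
```

An agent debugging after the fact can index into the replayed timeline to "travel" to the state just before the failure, without the original process being alive.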
Managed Storage
As part of the Cmd interface we would offer managed storage with
safe-by-default schema evolution, similar to how Protobuf’s safe schema
evolution works.
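To give a feel for what "safe by default" could mean, here is a small Python sketch of a compatibility check between two schema versions. The rules are an assumption chosen for illustration (keep existing fields and their types, only add fields that have a default); they are neither Protobuf's exact rules nor a description of what Yabu does, and `unsafe_changes` is a hypothetical name.

```python
def unsafe_changes(old, new):
    """Compare two schemas of the form {field: (type_name, has_default)}.

    Returns a list of human-readable problems. An empty list means the
    change is considered safe under the assumed rules of this sketch.
    """
    problems = []
    for field, (old_type, _) in old.items():
        if field not in new:
            problems.append(f"removed field {field}")
        elif new[field][0] != old_type:
            problems.append(
                f"changed type of {field}: {old_type} -> {new[field][0]}"
            )
    for field, (_, has_default) in new.items():
        if field not in old and not has_default:
            problems.append(f"added field {field} without a default")
    return problems
```

A storage layer could run a check like this when a new definition is submitted and refuse (or stage) any schema change that returns a non-empty list.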
Crons
Crons are also shaped using TEA but without a view. You get init and update, side-effects and traceability. This basically gives you Temporal for free without even having to change your code.
Zero-downtime deploys
The content-addressable store + TEA + schema evolution allows you to have multiple versions of the app in flight at the same time providing zero-downtime deploys by default.
External dependencies in simulation testing
This gets tricky: you need to model real-time external dependencies (think Stripe) as state machines, at least for the parts relevant to your application. This is something the ecosystem could provide.
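As a sketch of what such a model might look like, here is a made-up payment lifecycle written as a transition table in Python. The states and events are invented for illustration and are not Stripe's real API or lifecycle; `TRANSITIONS`, `step` and `possible_events` are hypothetical names. A simulator could use `possible_events` to decide which external events it is allowed to inject at each point.

```python
# Allowed transitions of a made-up payment lifecycle (not Stripe's real one).
TRANSITIONS = {
    ("pending", "authorize"): "authorized",
    ("pending", "decline"): "failed",
    ("authorized", "capture"): "captured",
    ("authorized", "cancel"): "cancelled",
    ("captured", "refund"): "refunded",
}


def step(state, event):
    """Return the next state, or raise if the event is not valid here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state!r}") from None


def possible_events(state):
    """Events a simulator may inject while the payment is in this state."""
    return sorted(event for (s, event) in TRANSITIONS if s == state)
```

Keeping the model as plain data means the same table could drive both the simulated dependency and any checks on how the application reacts to each transition.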
Informal Evals
I did some informal evals a month or so back where I compared Sonnet 4.6 and
gpt-oss-120b running on Cerebras (~1000 tok/s). The very fast inference
offered by Cerebras plus the Yabu REPL tightened the feedback loop enough that
gpt-oss-120b outperformed Sonnet 4.6, just by virtue of having a lot more
inference speed and full introspection of the system.
— marce coll, 2026-04-23