2025-03-31 Patron letter - A startling glimpse of malleable software: LLM-generated Obsidian plugins

One of the great dreams of the personal computing revolution was that we’d all have not only our own computers, but also our own software, exquisitely tuned to our own purposes. Instead, we got the application model. Software is expensive to build, so developers try to capture as large a market as possible, creating one-size-fits-all packages. You can only change what the developer gives you knobs to change; you can only combine apps at junctions the developer has provided. In short, our workflows often don’t work the way we wish they did.

People have been trying for decades to break us free of the siloed application model (for a review, I recommend Marcel Borowski’s doctoral thesis, p31-44). One approach has been to let people make their own computational tools, without any complicated programming. That has turned out to be difficult. Digital spreadsheets’ computed cells are usually cited as the major success story, but it’s hard to find others.

Now we have large language models (LLMs). Everyone can build their own software, right? Well—maybe, eventually. In spring 2025, my impression is that most software built end-to-end by LLMs is either a shiny demo or a low-stakes component at the periphery of the real work. Filmmakers aren’t using LLMs to personalize their editing workflow. Musicians aren’t using LLMs to personalize their composition workflow.

I want to see the personal dynamic media dream fulfilled for experts’ central work in their domains. Part of the trouble here seems to be that those activities often take place in extremely complex “mega-apps”. If people want to customize a few design details for their personal workflow, they usually need to recreate all that complexity. Neither today’s LLMs nor their typical users are prepared to rewrite Photoshop on the fly to fit their personal needs.

So: malleable software needs a more malleable substrate than today’s zero-to-one “text-to-app” workflow. Researchers working on this problem know that, and they’ve proposed many principled architectures for flexibly extending software behavior. Unfortunately, none of these approaches is as versatile and accessible as the “text-to-app” workflow. Can we find some happy middle—enough structure to permit adaptation without rewriting well-honed functionality from scratch, but with enough versatility and accessibility for typical users in serious, real-world contexts?

Some surprising experiments

A startling question occurred to me a few weeks ago: can it be possible that Obsidian is the only deeply extensible WYSIWYG word processor that has ever reached production? By “deeply extensible”, I don’t just mean theming, macros, or even custom sidebars (e.g. Google Apps Script’s). I mean that third parties can alter the interface of the text editor itself—for instance, with custom interactive elements interleaved into the document, custom behavior in response to interactive input, or custom interpretation of the data. Now, we’ve had deeply extensible plaintext editors (like Emacs) for decades. OpenDoc and OLE aspired to some part of this flexibility, but were abandoned before maturity. As far as my limited awareness goes, we got our first deeply extensible WYSIWYG word processor in 2022.

Deep extensibility doesn’t automatically translate into the kind of personal adaptation we might want. Obsidian plugins require programming. Extensions which modify the interaction design of the WYSIWYG text editor itself require surprisingly complex programming. So, in practice, a typical Obsidian user must rely on someone else to create the functionality they want. The standard pressures of software development still exist: because plugins are costly to make, the focus is on plugins which can serve a wide audience; because these coarse-grained plugins become somewhat one-size-fits-all, they often become bloated with settings and complexity.

Yet now: LLMs. As part of my project with Taylor Rogalski, we’ve been building a very complicated Obsidian plugin—basically using Obsidian as a prototyping substrate for a speculative tool for thought. I began with a traditional software engineering approach, augmented by line-level LLM suggestions. Meanwhile, Taylor enthusiastically experimented with having LLM agents build features for our plugin, end-to-end. There are limitations there, of course, but I was surprised by how much they can handle.

Inspired by Taylor’s experiences, I started asking LLM agents to produce simpler Obsidian plugins end-to-end, with no expert intervention on my part. I just cloned a plugin sample project, installed it into a test Obsidian vault, and let the agent do the rest.

My first attempt was actually to fulfill a workflow idea that Taylor had jotted down. His design was based on The Email Game, a gamified and timed two-pass inbox triage. Taylor wanted a sidebar listing all incomplete to-dos in the current file. Once he selects a few to-dos and clicks “Start”, the interface should show one task at a time, with a 2-minute timer and a text box for jotting notes. When the timer’s up, or when a “Done” button is pressed, the view should advance to the next task and add his text notes as a bullet in the document, beneath the task just completed. Cursor’s Agent (with claude-sonnet-3.7-thinking) built that on the first try, without any extra input on my part. I was honestly quite stunned. Taylor’s desired workflow is idiosyncratic enough that no general plugin exists for it, but the agent could make it in just a few minutes. And, critically, this isn’t a separate app; it’s integrated into an environment where Taylor does serious creative work.
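The document side of Taylor’s triage workflow is simple enough to sketch. This is a minimal sketch under stated assumptions, not the agent’s code: it assumes to-dos are Markdown task lines (`- [ ] ...`) and that each note goes in as a nested bullet directly under its task, and it leaves out the sidebar and the 2-minute timer. `findOpenTasks`, `addNoteBelowTask`, and `TriageSession` are hypothetical names, and the sketch works on plain strings rather than through Obsidian’s API.

```typescript
// Hypothetical sketch of the document edits behind a timed triage sidebar.
// Works on plain strings; a real plugin would read and write via Obsidian.

const OPEN_TASK = /^(\s*)[-*] \[ \] (.*)$/;

interface Task {
  line: number; // zero-based line index in the document
  indent: string; // whitespace before the task's bullet
  text: string; // task text after the checkbox
}

// Collect every incomplete to-do in the document.
function findOpenTasks(doc: string): Task[] {
  const tasks: Task[] = [];
  doc.split("\n").forEach((lineText, line) => {
    const m = OPEN_TASK.exec(lineText);
    if (m) tasks.push({ line, indent: m[1], text: m[2] });
  });
  return tasks;
}

// Insert a note as a nested bullet on the line right after the task.
function addNoteBelowTask(doc: string, task: Task, note: string): string {
  const lines = doc.split("\n");
  lines.splice(task.line + 1, 0, `${task.indent}\t- ${note}`);
  return lines.join("\n");
}

// Shows one selected task at a time. advance() is what either the timer
// running out or the "Done" button would call.
class TriageSession {
  doc: string;
  private queue: Task[];
  private index = 0;

  constructor(doc: string, selected: Task[]) {
    this.doc = doc;
    this.queue = selected.map((t) => ({ ...t })); // copy: line numbers will shift
  }

  get current(): Task | undefined {
    return this.queue[this.index];
  }

  advance(note: string): void {
    const task = this.current;
    if (!task) return;
    const trimmed = note.trim();
    if (trimmed !== "") {
      this.doc = addNoteBelowTask(this.doc, task, trimmed);
      // Tasks below the inserted note have moved down by one line.
      for (const t of this.queue) if (t.line > task.line) t.line += 1;
    }
    this.index += 1;
  }
}
```

Shifting the remaining tasks’ line numbers after each insertion keeps later notes attached to the right task as the document grows.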

Next, I asked for a plugin which continuously tallies up all the new words I write, records them in a CSV (updating the current day’s entry as necessary), and shows an unobtrusive celebratory banner every time I add another 250 words. One shot—done. No manual edits or iteration needed.
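A minimal sketch of that bookkeeping, assuming “new words” means the net increase in whitespace-separated word count between edits (deletions are ignored rather than subtracted) and a hypothetical `date,words` CSV with one row per day. `WordTally` and its helpers are illustrative names, not the generated plugin’s.

```typescript
const MILESTONE = 250; // celebrate every 250 new words

// Whitespace-separated word count; an empty note has zero words.
function countWords(text: string): number {
  const words = text.trim().split(/\s+/);
  return words[0] === "" ? 0 : words.length;
}

// Add `added` words to `date`'s row in a "date,words" CSV, creating the
// row if needed. Returns the new CSV text and the day's updated total.
function upsertDay(csv: string, date: string, added: number): { csv: string; total: number } {
  const rows = csv.trim() === "" ? [] : csv.trim().split("\n");
  const i = rows.findIndex((r) => r.startsWith(date + ","));
  const total = added + (i >= 0 ? Number(rows[i].split(",")[1]) : 0);
  if (i >= 0) rows[i] = `${date},${total}`;
  else rows.push(`${date},${total}`);
  return { csv: rows.join("\n") + "\n", total };
}

class WordTally {
  csv: string;
  private lastCount: number;

  constructor(initialText: string, csv = "date,words\n") {
    this.lastCount = countWords(initialText);
    this.csv = csv;
  }

  // Call after each edit. Returns true when today's total crosses a
  // multiple of MILESTONE, i.e. when a banner should appear.
  onEdit(text: string, date: string): boolean {
    const now = countWords(text);
    const added = Math.max(0, now - this.lastCount);
    this.lastCount = now;
    if (added === 0) return false;
    const { csv, total } = upsertDay(this.csv, date, added);
    this.csv = csv;
    return Math.floor(total / MILESTONE) > Math.floor((total - added) / MILESTONE);
  }
}
```

Comparing milestone buckets before and after each edit means a single large paste still triggers one banner rather than none.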

These examples begin to show how far it’s possible to tailor my writing environment without any programming. But I’m interested in malleability at the level of the writing surface itself. You could implement my examples in Emacs, or with a graphical sidebar via Google Apps Script. What’s so interesting about Obsidian is that the editor itself can be modified to include custom behaviors and interface elements. The trouble is that those plugins are much more complicated to build. I’m an experienced engineer, and I find them very difficult to write correctly by hand.

Potluck was one admirable attempt to let users add their own interactive behavior to text editors, without traditional programming. It introduced a shared representation for the desired behavior, riffing on spreadsheet cells.

Of course, since we’re still stuck in the app-centric world, this solution required the authors to build their own custom text editor. You can’t bring Potluck’s behavior to your own preferred editor. You can’t bring your preferred editor’s behavior to Potluck. And you’re constrained by the primitives they provide: a few basics like buttons, sliders, timers, etc.

The Potluck report includes many interesting examples of custom interactive behavior. So I decided to test the AI-generated Obsidian plugin workflow by replicating the first four featured examples. To my great surprise, Claude Code successfully implemented all four examples with zero or minor feedback, and with no expert intervention (see more details in Appendix 1).

Plant tracker lets you track when your plants need to be watered. It colors red any plants which are due for watering. When you’ve watered a plant, you can click a button to update the date in the document.
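The plant tracker’s rule is small enough to state in code. A sketch following the Appendix 1 prompt (a line starting with 🌱 and containing “every N days, last watered on MM/DD/YYYY”); the names are hypothetical, and the CodeMirror underlines, coloring, and 🚿 button widget are left out. It shows only the parsing, the due-date check that would pick red or green, and the date rewrite the button would perform.

```typescript
interface PlantEntry {
  intervalDays: number;
  lastWatered: Date; // local midnight
}

// "every 5 days, last watered on 10/31/2022" (M/D/YYYY, as in the prompt)
const PLANT = /every (\d+) days?, last watered on (\d{1,2})\/(\d{1,2})\/(\d{4})/;
const WATERED_DATE = /(last watered on )\d{1,2}\/\d{1,2}\/\d{4}/;

function parsePlantLine(line: string): PlantEntry | null {
  if (!line.startsWith("🌱")) return null;
  const m = PLANT.exec(line);
  if (!m) return null;
  return {
    intervalDays: Number(m[1]),
    lastWatered: new Date(Number(m[4]), Number(m[2]) - 1, Number(m[3])),
  };
}

// Whole calendar days from a to b, ignoring time of day and DST shifts.
function daysBetween(a: Date, b: Date): number {
  const utc = (d: Date) => Date.UTC(d.getFullYear(), d.getMonth(), d.getDate());
  return Math.round((utc(b) - utc(a)) / 86400000);
}

// Red when N or more days have passed; green otherwise.
function needsWater(entry: PlantEntry, today: Date): boolean {
  return daysBetween(entry.lastWatered, today) >= entry.intervalDays;
}

// What the 🚿 button does: replace the date with today's date.
function markWatered(line: string, today: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const stamp = `${pad(today.getMonth() + 1)}/${pad(today.getDate())}/${today.getFullYear()}`;
  return line.replace(WATERED_DATE, (_match, prefix: string) => prefix + stamp);
}
```

Note that a line written with “weeks” simply fails to match here, and nothing in the interface says why; that is exactly the confusion described later in this letter.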

Agenda lets you plan in terms of agenda item durations, then see how those turn into actual times once you set a start time. Note that the blue text is just an interface decoration—your durations are revealed when you move your cursor into them.
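The Agenda arithmetic is a running clock: the first item starts at the start time, and each later item starts where the previous one ended. A sketch under stated assumptions: the start-time line comes before the items, the names are hypothetical, and plain display strings stand in for CodeMirror replacement decorations.

```typescript
// A line like "Meeting at 19:00" sets the start time (24-hour clock).
const START = /\b(\d{1,2}):(\d{2})\b/;
// A line like "30 minutes Introduction" is an agenda item.
const ITEM = /^(\d+) minutes? (.+)$/;

// Minutes since midnight -> "7:30 PM".
function formatClock(totalMinutes: number): string {
  const m = ((totalMinutes % 1440) + 1440) % 1440; // wrap past midnight
  const h24 = Math.floor(m / 60);
  const h12 = h24 % 12 === 0 ? 12 : h24 % 12;
  const suffix = h24 < 12 ? "AM" : "PM";
  return `${h12}:${String(m % 60).padStart(2, "0")} ${suffix}`;
}

// What each line should display: items become time ranges accumulated
// from the start time; all other lines are returned unchanged.
function previewAgenda(lines: string[]): string[] {
  let clock: number | null = null;
  const shown: string[] = [];
  for (const line of lines) {
    const item = ITEM.exec(line);
    if (item && clock !== null) {
      const end = clock + Number(item[1]);
      shown.push(`${formatClock(clock)} - ${formatClock(end)} ${item[2]}`);
      clock = end;
      continue;
    }
    const start = START.exec(line);
    if (start && clock === null) clock = Number(start[1]) * 60 + Number(start[2]);
    shown.push(line);
  }
  return shown;
}
```

With a 19:00 start, “30 minutes Introduction” followed by “15 minutes Snacks” displays as 7:00 PM - 7:30 PM and then 7:30 PM - 7:45 PM.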

Cash register lets you specify items for sale as emojis with prices, then “ring up” transactions and show a total. Again, this blue text is a non-editable dynamic interface decoration.
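A sketch of the cash register’s arithmetic, assuming prices are declared as `<emoji> = $<price>` lines and that a sales line contains nothing but priced emoji (spaces ignored). Rather than segmenting emoji generically, it greedily matches the declared emoji, which sidesteps emoji made of several code points. The names are hypothetical, and the blue decorations are omitted.

```typescript
// "🍰 = $4" declares a price; the key is whatever precedes " = $".
const PRICE = /^(\S+) = \$(\d+(?:\.\d+)?)$/;

function parsePrices(lines: string[]): Map<string, number> {
  const prices = new Map<string, number>();
  for (const line of lines) {
    const m = PRICE.exec(line.trim());
    if (m) prices.set(m[1], Number(m[2]));
  }
  return prices;
}

// Split a line into priced emoji by longest match. Returns null unless
// the whole line is made of known items, so prose lines are ignored.
function splitSale(line: string, prices: Map<string, number>): string[] | null {
  const keys = Array.from(prices.keys()).sort((a, b) => b.length - a.length);
  let rest = line.replace(/\s+/g, "");
  if (rest === "") return null;
  const items: string[] = [];
  while (rest !== "") {
    const key = keys.find((k) => rest.startsWith(k));
    if (key === undefined) return null;
    items.push(key);
    rest = rest.slice(key.length);
  }
  return items;
}

// Totals per sales line (keyed by line index), plus the grand total
// that would be shown after "Total:".
function ringUp(lines: string[]): { lineTotals: Map<number, number>; grandTotal: number } {
  const prices = parsePrices(lines);
  const lineTotals = new Map<number, number>();
  let grandTotal = 0;
  lines.forEach((line, i) => {
    const items = splitSale(line, prices);
    if (items === null) return;
    const total = items.reduce((sum, item) => sum + (prices.get(item) ?? 0), 0);
    lineTotals.set(i, total);
    grandTotal += total;
  });
  return { lineTotals, grandTotal };
}

// Match the total line with or without a trailing space.
const isTotalLine = (line: string) => line.trim() === "Total:";
```

Run against the test document from Appendix 1, the 🌱🍰🍰 line totals $11, which is also what the “Total:” line should display.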

Timekeeping lets a freelancer track their different projects as separate headings in a text document. It adds buttons to the editor for starting and stopping billable work. Again, this colored text is a non-editable dynamic interface decoration.
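A sketch of Timekeeping’s calculations, following the Appendix 1 prompt: closed ranges like “13:11 - 13:30” show their hours rounded to two decimal places, and a “Total:” line sums the closed ranges since the previous H1. The names are hypothetical, the buttons and blue decorations are left out, and it assumes a range whose end is earlier than its start crossed midnight.

```typescript
// "13:11 - 13:30" is a closed range; an open range ("13:11 -") does not match.
const CLOSED = /(\d{1,2}):(\d{2}) - (\d{1,2}):(\d{2})/;

function pad2(n: number): string {
  return String(n).padStart(2, "0");
}

// Minutes in a closed range on this line, or null if there is none.
function rangeMinutes(line: string): number | null {
  const m = CLOSED.exec(line);
  if (!m) return null;
  const start = Number(m[1]) * 60 + Number(m[2]);
  const end = Number(m[3]) * 60 + Number(m[4]);
  return end >= start ? end - start : end + 1440 - start; // assume past midnight
}

// Hours rounded to at most two decimal places, as displayed.
function toHours(minutes: number): number {
  return Math.round((minutes / 60) * 100) / 100;
}

// For each "Total:" line, the hours in closed ranges since the last H1.
function projectTotals(lines: string[]): Map<number, number> {
  const totals = new Map<number, number>();
  let minutes = 0;
  lines.forEach((line, i) => {
    if (line.startsWith("# ")) minutes = 0;
    else if (line.includes("Total:")) totals.set(i, toHours(minutes));
    else minutes += rangeMinutes(line) ?? 0;
  });
  return totals;
}

// The "start" button for the H1 on line `headingLine` inserts an open
// range on the line directly below that same heading.
function startEntry(lines: string[], headingLine: number, now: Date): string[] {
  const stamp = `${pad2(now.getHours())}:${pad2(now.getMinutes())} -`;
  return [...lines.slice(0, headingLine + 1), stamp, ...lines.slice(headingLine + 1)];
}
```

Binding each button to its own heading’s position (and, in a real CodeMirror plugin, mapping that position through later edits) is one way to avoid the wrong-heading bug noted in Appendix 1.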

Inspired by these successes, I attempted two more difficult plugins, neither of which Claude Code could implement:

Footnotes as marginalia: Display the contents of Markdown footnotes in the margins next to their reference points, adjusting the vertical positions to avoid collisions as necessary.
- Claude Code couldn’t get close on the layout. I think that if it had a tool which let it interactively query the DOM and rendered output, it would have been able to do this. That intuition is based on watching it build the project, look at the build errors, and make appropriate changes to fix those errors.
Book covers: If I add the “book” tag to a note, find its cover (using the filename as the query) and display it as a “cover image” at the top of the editor when the note is opened.
- I deliberately didn’t specify an API to use for finding those images. Claude Code stumbled here a bit until I insisted that it just pick one. But then it didn’t know the correct structure of the response data. I think that it could have succeeded if it had been permitted to search the web for suitable APIs and their documentation.
- The agent’s solution also struggled to interpret the varieties of filenames I might use to specify a book: e.g. with or without an author, with or without a dash to separate the author from the title, maybe using an abbreviation for the author’s first name, etc. I think it could have fixed this with some iteration if I gave it a bunch of examples, or if one could find a “fuzzy” book search API.
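The layout half of the marginalia request is a small, self-contained algorithm. A minimal sketch, assuming each footnote reference’s vertical position and each rendered note’s height have already been measured from the editor; `MarginNote`, `layoutMarginNotes`, and the 8-pixel gap are hypothetical.

```typescript
interface MarginNote {
  id: string;
  anchorTop: number; // y position of the footnote reference, in pixels
  height: number; // rendered height of the note, in pixels
}

interface PlacedNote {
  id: string;
  top: number;
}

// Greedy top-to-bottom placement: each note sits level with its reference
// unless that would overlap the note above, in which case it is pushed down.
function layoutMarginNotes(notes: MarginNote[], gap = 8): PlacedNote[] {
  const sorted = [...notes].sort((a, b) => a.anchorTop - b.anchorTop);
  const placed: PlacedNote[] = [];
  let nextFree = -Infinity; // first y position not occupied by a placed note
  for (const n of sorted) {
    const top = Math.max(n.anchorTop, nextFree);
    placed.push({ id: n.id, top });
    nextFree = top + n.height + gap;
  }
  return placed;
}
```

Greedy pushing can shove the last notes below the end of the document; a common refinement is a second, bottom-up pass that pulls notes back up where there is room.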

Some lessons from Obsidian

I’m not saying that if we rub some AI agents on Obsidian, we’ve suddenly solved this longstanding problem. Rather, I think it’s worth examining what makes Obsidian such a suitable substrate for this kind of idiosyncratic personalization. I hope that can inform future work that’s more explicitly focused on complex surfaces (like Obsidian’s editor) central to serious work.

The most decisive architectural decisions here weren’t made by the Obsidian team. Its extensible editor is an open-source library called CodeMirror; Obsidian exposes that editor for plugins to modify. CodeMirror has an unusually complex—and unusually thoughtful—programming interface for modifying its behavior. Its author, Marijn Haverbeke, has refined that interface over many years of working with developers trying to use it in unusual ways. It’s very difficult to design a system like CodeMirror. You have to find a way to express the behavior of your system in an extremely general and composable way, so that extensions can modify that behavior consistently and without interfering with each other. But now that CodeMirror’s programming interface has been refined, I can absolutely imagine translating its core concepts to, say, a Figma-like canvas or an Excel-like spreadsheet.

Malleable software researcher Clemens Klokmose proposed this list of six properties we might like in our truly personal dynamic media:

malleable: users can adapt tools to fit idiosyncratic preferences and uses
shareable: multiple users can work on the same data, using their own views and tools
distributed: the data and tools can be used on different kinds of devices, in different situations (desktop, tablet, phone)
composable: the user’s preferred tools can be combined into workflows well adapted to their activity
interoperable: different tools can interact with the same data, so that the user can always use the “right tool for the job”
computational: the system behaves and responds dynamically

Obsidian provides key infrastructure which allows it to perform surprisingly well as this kind of substrate:

malleable: An extensive plugin programming interface, combined with Claude Code, provides rich malleability.
shareable: Obsidian has built-in support for collaborative editing, including plugin sharing. Plugins sync, but each user can choose which ones they enable. It’s monolithic, though: custom behavior (e.g. the plant tracker) is attached to the “vault”; there’s no way to “attach” it to the plant tracking document and share just that document. The collaborative editing support is pretty simple—it doesn’t respect any semantics of rich syntax or plugin syntax. But it’s good enough for a lot.
distributed: Obsidian runs on tablets and phones. The programming interface includes ways to customize behavior for those devices, and it applies reasonable defaults. Plugins run on mobile devices without any extra intervention. I was startled when the first plugin I built “just worked” on my phone: we don’t usually see rich user-defined behavior in apps on our locked-down phones.
composable: Obsidian’s interface language of tabs, panes, sidebars, command strips, and so on provides a lot of latitude for composability. But there’s still a significant boundary between Obsidian and the other applications (Figma, Slack, etc) you might use in a single activity. Its support for non-text documents is limited.
interoperable: Obsidian vaults are just folders of Markdown files. A huge ecosystem of tools support that format.
computational: Obsidian plugins can dynamically behave and respond in complex ways, as the examples I’ve given illustrate.

Of course, I want to be able to work this way not just with my text documents, but with my design work, and with my sheet music, and with my books, and so on. I think the interesting thought experiment here is to imagine if one’s whole operating system behaved more like Obsidian in this way. Actually, like its spiritual predecessor Emacs, Obsidian already does many operating-system-like things. Note that it also provides for all plugins:

a shared data layer to read from and write into
syncing across devices
a custom metadata layer for documents (Markdown “frontmatter”)
a themeable style language (semantic colors, type, spacing, etc)
standard interface concepts and implementations (modals, tooltips, popovers, sidebars, notifications, the command palette, keyboard shortcuts, settings)
a system for configuring, updating, and sharing plugins

Important missing conceptual pieces

Potluck includes quite a lot of beautiful conceptual design work to make it plausible for non-expert users to create the examples I showed above. I replicated its featured examples without any of that machinery—but that doesn’t mean all that design work is unnecessary. On the contrary, I believe the biggest problem with the agent-driven route to malleability is the lack of a shared representation of system behavior.

When I was demonstrating the plant tracker to my wife, I created a sample entry which read “every 2 weeks, last watered on 03/01/2025”. The plugin didn’t work. I knew I must have made a mistake in the formatting, but I wasn’t sure where. It turned out that the plugin only supported “every N days”—not “weeks”. I figured that out by copying and pasting the original Potluck example document into my editor and poking around. By contrast, if I’m confused about the behavior of a document in Potluck, I can view this representation of the system’s behavior:

This interface makes the situation much clearer: the plugin looks for “every {number} days”. I can see right away that it won’t support “every 1 day” or “every day”. In the agent-based workflow, if I want to support weeks, I could ask the agent to add that, but I could only understand whether it’s working (and whether the “days” format still works after that change) by trying it out in the editor. It would be better if, as Potluck co-author Geoffrey Litt has suggested, as you asked the model to make that change, you could see the impact directly on a shared representation of the system’s behavior.
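One small way to approximate such a shared representation in code is to keep the accepted format as data that the plugin both executes and displays, so “does it handle weeks?” has a visible answer. A minimal sketch, assuming a hypothetical `IntervalRule` shape that is not taken from Potluck or from the generated plugin:

```typescript
// Hypothetical: the accepted format lives in data, so the same object
// drives parsing and can be shown to the user.
interface IntervalRule {
  units: Map<string, number>; // unit word -> length in days
}

const plantRule: IntervalRule = {
  units: new Map([["day", 1], ["days", 1], ["week", 7], ["weeks", 7]]),
};

// Human-readable summary of what the parser accepts.
function describeRule(rule: IntervalRule): string {
  return `every {number} ${Array.from(rule.units.keys()).join(" | ")}`;
}

// Interval in days, or null if the text doesn't fit the rule.
function intervalDays(text: string, rule: IntervalRule): number | null {
  const m = /every (\d+) ([a-z]+)/.exec(text);
  if (!m) return null;
  const perUnit = rule.units.get(m[2]);
  return perUnit === undefined ? null : Number(m[1]) * perUnit;
}
```

Adding weeks then becomes a visible change to `plantRule`, and `describeRule` reports the new format without anyone reading the parsing code.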

I don’t think this is a nice-to-have. I think that our understanding bounds the complexity of systems we can create—even if an LLM is doing the programming. If we can’t really understand a system’s behavior, except through trial and error every time a change is made, problems and confusions will pile up. It will become more and more difficult to change the system predictably.

In a recent talk at Stanford, Haijun Xia made a similar argument. He proposes that AI-augmented malleable systems should have user-modifiable representations of both the activity (e.g. freelance time tracking with separate projects and totals) and also the interface (e.g. the start/stop buttons, the styling on the computed hours display).

In the talk, Haijun demonstrates a project called Jelly led by Yining (Rima) Cao, one of the students in his lab. Jelly shows the user an explicit shared representation of the activity, visually indicates how it maps onto a generated interface, and highlights changes to that specification over time.

A key challenge here is that if we want our custom interfaces to involve behavior, then in the limit, this specification must be Turing-complete. I can ask for a button that does anything. How can we clearly represent arbitrary behavior to the user? Aren’t programming languages created to do exactly that? If so, are we back to solving the problem of helping non-programmers understand how to program?

Perhaps one meaningful difference is in the precision that these users need. Programming languages are designed to let engineers specify exactly what should happen. “Higher-level” languages leave more details to the computer; “lower-level” languages take control of more details. Perhaps LLMs can enable an even higher-level language which specifies “just the important details”—an admittedly undecidable category—in English. A user would give a short description of what they want; the model would synthesize this more detailed shared English representation; then it would generate actual code which implements that intermediate representation. Conceivably, a user could read, understand, and modify the details of that shared representation. (See Appendix 2 for an example of that for one of the Potluck samples.)

But such a representation is still abstract. For the Agenda example, it might include “Identifies lines formatted as '[duration] minutes [description]' (e.g., '30 minutes Introduction')”. But suppose that you wrote “30-45 minutes for questions”, and you were confused about why that agenda item didn’t get included properly. I think the ideal here is that you could directly manipulate specific instances of system behavior to contextualize changes to the system. For example: selecting that text in your document, and asking “Why doesn’t this show up as an agenda item?” or “Make time ranges work too.”

My instinct here is that if one looks back through the decades of work in end-user programming and reconsiders those ideas with an eye to the new capabilities of LLMs, one would find many interesting possibilities to try.

Thanks most importantly to Taylor Rogalski for enthusiastically trying to make LLMs write Obsidian plugins. I’m so engineer-brained that I doubt I’d have seen this letter’s central insight without his example.

Thanks to Geoffrey Litt for influencing much of how I think about malleability and LLMs, and for his paean to browser extensions, which I’m sure shaped my thinking here quite a lot. Thanks to Haijun Xia and Adam Wiggins for conversations which helpfully prodded my thinking on this topic. Thanks to Marijn Haverbeke for his remarkable work on CodeMirror’s API design, and to the Obsidian team for building a uniquely flexible substrate around it.

Appendix 1: Potluck example replication details

Cursor Agent failed to implement my first two tests, so I switched to Claude Code, which did very well. They’re both using sonnet-3.7-thinking, so it’s possible that this is just a matter of luck. I haven’t tested that thoroughly.

Some minor details and caveats on the replications:

Plant tracker worked perfectly in one shot.
Agenda needed to be told to fix the build errors, to be told that a particular subfeature wasn’t working, and to be given some console logs to help it fix that subfeature. Future models will surely see build errors and console error logs, so the only real user intervention would be saying “this bit doesn’t work”. More speculatively, UI automation could solve that too.
Cash register needed to be told that a subfeature wasn’t working (but no technical detail).
Timekeeping needed to be told about a bug (but no technical detail).

I should also note that Claude’s implementation isn’t styled nearly so well as the original authors’!

For those interested, I’ve reproduced below the Claude Code prompts which produced the Potluck replica interfaces. I’ve also included all of the “bug reports” and feedback I gave. Some of these prompts include a fair amount of detailed guidance. I haven’t tried reducing them to see if that guidance was necessary. It would be interesting to find the minimal guidance needed.

Plant tracker

Implement an Obsidian plugin called "Plant tracker".
  * It will use a CodeMirror editor extension.
  * If a line in the editor begins with the 🌱 emoji, then search that line for a string like "every 5 days, last watered on 10/31/2022". Notice the interval ("every 5 days") and the date ("10/31/2022").
  * Underline the "every N day(s)" string and the date.
  * If the date is recent enough (i.e. it's fewer than N days ago), color it green. If it's too old, color it red.
  * Insert a small inline button widget after the date. The button's text is the 🚿 emoji. When I click the button, it changes the date to be today's date.

Agenda

Implement an Obsidian plugin called "Agenda".
  * It will use a CodeMirror editor extension.
  * Look for a line which contains a time, like "19:00". We'll call this the start time. Underline the time in blue.
  * Look for a line which begins with a duration, like "30 minutes Introduction". We'll call this an agenda item. Without altering the document, visually replace that string with the times it would take place if the meeting started at the start time.
    * Write that replacement in blue text.
    * So, for a start time of 19:00, if the first item is "30 minutes Introduction", it would display as "7:00 PM - 7:30 PM Introduction". If the next item is "15 minutes Snacks", it would display as "7:45 PM - 8:00 PM Snacks". Note that the times accumulate.
  * If the user's cursor moves into an agenda item duration, display the original duration as an underlined string--so the user can edit it directly--and move the timing preview which replaced it into a floating tooltip above the duration.

---

app.js:1 Plugin failure: obsidian-agenda TypeError: Cannot read properties of undefined (reading 'decorations')
    at Object.provide (plugin:obsidian-agenda:216:39)
    at e.define (app.js:1:277778)
    at anonymous (plugin:obsidian-agenda:198:53)
    at e.<anonymous> (app.js:1:2267896)
    at app.js:1:239667
    at Object.next (app.js:1:239772)
    at a (app.js:1:238490)

---

Try to build; fix the errors.

---

OK. The start time and time range previews work. But when I try to move the cursor into a duration, I get this console error:

  Uncaught Error: Ranges must be added sorted by `from` position and `startSide`
      at e.addInner (app.js:1:300017)
      at e.add (app.js:1:299791)
      at createDecorations (plugin:obsidian-agenda:182:15)
      at e.update [as updateF] (plugin:obsidian-agenda:206:14)
      at Object.update (app.js:1:278099)
      at e.computeSlot (app.js:1:288949)
      at We (app.js:1:282043)
      at new e (app.js:1:287918)
      at e.applyTransaction (app.js:1:288904)
      at e.get (app.js:1:284047)
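The `Ranges must be added sorted` error above comes from CodeMirror’s `RangeSetBuilder`, which requires decorations to be added in document order; that is easy to violate when one line gets several decorations, such as an underline plus a tooltip. A sketch of the ordering rule, using a hypothetical stand-in type (`PendingRange`) instead of CodeMirror’s own classes. In CodeMirror itself, passing `true` as the second argument to `Decoration.set` asks the library to sort the ranges for you.

```typescript
// Stand-in for a CodeMirror Range<Decoration>. In CodeMirror the startSide
// comes from the decoration itself (e.g. a widget's `side`); here it is
// just a number. `label` exists only for illustration.
interface PendingRange {
  from: number;
  to: number;
  startSide: number;
  label: string;
}

// The rule the builder enforces: ascending `from`, and at equal `from`,
// ascending `startSide`.
function inBuilderOrder(ranges: PendingRange[]): boolean {
  for (let i = 1; i < ranges.length; i++) {
    const prev = ranges[i - 1];
    const cur = ranges[i];
    if (cur.from < prev.from) return false;
    if (cur.from === prev.from && cur.startSide < prev.startSide) return false;
  }
  return true;
}

// Collect decorations in whatever order is convenient, then sort once
// before handing them to the builder.
function sortForBuilder(ranges: PendingRange[]): PendingRange[] {
  return [...ranges].sort((a, b) => a.from - b.from || a.startSide - b.startSide);
}
```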

Cash register

Implement an Obsidian plugin called "Cash register".
  * It will use a CodeMirror editor extension.
  * Look for strings like "<emoji> = $<price>", e.g. "🍰 = $4". These strings specify the prices of items. Underline these strings and track the mapping between emoji and dollar values.
  * Look for lines consisting of only one or more emoji in sequence. These lines represent sales. Without modifying the document, display the total dollar value of all the emoji on the line in blue text as a decoration at the end of the line.
  * Look for a line which says "Total: ". Without modifying the document, display the total dollar value of all sales lines in blue text as a decoration after that string.

---

Great! The price and sales lines seem to work, but the "Total: " line doesn't show a decoration. My test document:

🌱 = $3
🍰 = $4

🌱🍰🍰

Total:

Timekeeping

Implement an Obsidian plugin called "Timekeeping".
  * It will use a CodeMirror editor extension.
  * It's an interactive timekeeping tool for plaintext documents.
  * Each H1 heading in a document represents a separate project.
  * Without modifying the document, add a "start" button widget to the end of each H1 heading line.
  * When the start button is pressed, it inserts a line immediately below the heading like "13:11 -" (where 13:11 is the current time).
  * Anytime an open time range like that ("HH:MM -") appears, display the current time in blue text as a decoration after the hyphen, without modifying the document. Display a "stop" button after that. When the stop button is pressed, the current time is inserted with a space after the hyphen, so that it will read e.g. "13:11 - 13:30".
  * Anytime a closed time range ("HH:MM - HH:MM") appears, display the total number of hours in the time range (rounded to at most two decimal places) in blue text as a decoration after the time range, without modifying the document.
  * Anytime the string "Total:" appears, display the total amount of time in the closed time ranges between that position and the previous H1 heading--i.e. the total time worked on that project.
  * When you think you're done, build the plugin and fix any errors.

---

Looking good. Just one bug: if I have two projects, and I click the first project's "start" button, the new entry is inserted below the second project's heading.

Appendix 2: Intermediate-level English specification example

For the Agenda example, I asked Claude Code: “Think carefully to produce a detailed outliner-style explanation of this system's behavior. It should specify the behavior clearly enough that a user can understand its functionality and interaction design in detail. But assume the user is non-technical, a non-programmer, so don't surface implementation details. Keep it at the level of conceptual and interaction design.”

I’ve included its result below. It’s perhaps a decent start, in that it seems complete. But it’s quite a lot of fairly undifferentiated text to read, even for this relatively small plugin. I don’t know that this would scale very well to a system with moderate complexity.

  Purpose

  - A visual agenda time calculator that automatically transforms meeting agenda items into a scheduled timeline
  - Helps you plan and visualize meeting schedules without manual time calculations

  Core Functionality

  1. Start Time Detection
    - Recognizes a time in 24-hour format (like "14:30" or "19:00") as the meeting start time
    - Visually highlights the start time with a blue underline
  2. Agenda Item Processing
    - Identifies lines formatted as "[duration] minutes [description]" (e.g., "30 minutes Introduction")
    - Automatically calculates start and end times for each agenda item
    - Builds a sequential timeline based on the durations
  3. Visual Transformation
    - When viewing notes: replaces duration text with calculated time ranges (e.g., "7:00 PM - 7:30 PM")
    - Shows times in 12-hour format with AM/PM indicators
    - Maintains the original description text
  4. Interactive Elements
    - When editing (cursor on a duration): shows the original text (e.g., "30 minutes")
    - Displays a helpful tooltip with the calculated time range
    - Highlights the duration with a blue underline while editing

  User Experience

  1. Creating Agendas
    - Write a start time in 24-hour format anywhere in your note
    - List agenda items with durations (e.g., "45 minutes Discussion")
    - The plugin automatically calculates and displays the time schedule
  2. Editing Agendas
    - Original text remains when editing for easy duration adjustments
    - Timeline automatically recalculates when durations change
    - Non-destructive approach preserves your original markdown
  3. Viewing Benefits
    - Clear visual representation of meeting flow
    - Easy identification of when activities begin and end
    - Helps with time management and meeting planning
