I've been exploring how to describe UI layouts to LLMs efficiently.
The problem: when you ask an AI to generate or modify a UI, how do you describe its current state?
- Natural language ("header on top, form below") is ambiguous
- ASCII art breaks when edited (alignment issues)
- HTML is precise but verbose
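For a sense of scale, even a stripped-down HTML version of a login form needs this much structure (illustrative only, not the exact markup I measured):

    <form>
      <h1>Login</h1>
      <label for="email">Email</label>
      <input type="email" id="email" name="email">
      <button type="submit">Submit</button>
    </form>

Real-world markup, with wrapper divs, classes, and styling attributes, grows quickly from there.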
I ran some measurements. For a simple login form:
- Natural language: 102 tokens
- ASCII art: 84 tokens
- HTML: 330 tokens
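If you want to measure your own layouts, here's a minimal sketch using the tiktoken npm package (counts vary somewhat by tokenizer; this uses a GPT-4-style encoding):

    // Token-count sketch: assumes the `tiktoken` npm package.
    import { encoding_for_model } from "tiktoken";

    const samples: Record<string, string> = {
      natural: 'A page with a "Login" heading, an email input below it, and a submit button.',
      grid: [
        'grid: 4x3',
        'A1..D1: { type: txt, value: "Login" }',
        'A2..D2: { type: input, label: "Email" }',
        'D3: { type: btn, value: "Submit" }',
      ].join("\n"),
    };

    const enc = encoding_for_model("gpt-4");
    for (const [name, text] of Object.entries(samples)) {
      console.log(`${name}: ${enc.encode(text).length} tokens`);
    }
    enc.free(); // the WASM-backed encoder needs an explicit free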
I experimented with a grid-based text format using Excel-like cell references:
    grid: 4x3
    A1..D1: { type: txt, value: "Login" }
    A2..D2: { type: input, label: "Email" }
    D3: { type: btn, value: "Submit" }
This came out to 120 tokens: roughly a third of the HTML version, and more precise than natural language.
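To make the cell references concrete, here's a rough TypeScript sketch (not the actual ktr parser) of how Excel-style refs expand to grid coordinates:

    // Sketch only: expands refs like "A1" and ranges like "A1..D1".
    type Cell = { col: number; row: number };

    function parseRef(ref: string): Cell {
      const m = /^([A-Z]+)(\d+)$/.exec(ref.trim());
      if (!m) throw new Error(`bad cell ref: ${ref}`);
      let col = 0;
      for (const ch of m[1]) col = col * 26 + (ch.charCodeAt(0) - 64);
      return { col: col - 1, row: parseInt(m[2], 10) - 1 };
    }

    function expandRange(range: string): Cell[] {
      const [start, end = start] = range.split("..");
      const a = parseRef(start);
      const b = parseRef(end);
      const cells: Cell[] = [];
      for (let r = a.row; r <= b.row; r++)
        for (let c = a.col; c <= b.col; c++) cells.push({ col: c, row: r });
      return cells;
    }

So A2..D2 expands to four cells in row 2, which is how a single input spans the full width.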
Built a CLI to render it to SVG/PNG:

    npx ktr input.kui -o output.png
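The rendering itself is conceptually simple; something along these lines (a toy version, not the real renderer) gets you boxes and labels in SVG:

    // Toy SVG output with fixed-size cells; the real tool also
    // handles text metrics, widget styling, PNG rasterization, etc.
    type Placed = { col: number; row: number; colSpan: number; label: string };

    function toSvg(cells: Placed[], cols: number, rows: number, size = 80): string {
      const body = cells.map(c =>
        `<rect x="${c.col * size}" y="${c.row * size}" width="${c.colSpan * size}" ` +
        `height="${size}" fill="none" stroke="#333"/>` +
        `<text x="${c.col * size + 8}" y="${c.row * size + size / 2 + 4}">${c.label}</text>`
      ).join("");
      return `<svg xmlns="http://www.w3.org/2000/svg" width="${cols * size}" height="${rows * size}">${body}</svg>`;
    }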
Curious what approaches others have tried for this problem. Is there something
I'm missing that already solves this well?
Code: https://github.com/enlinks-llc/katsuragi