Natural language as UI

Current UIs are mostly graphical (point and click). They have served us very well (compared to the DOS command line). But they are costly to develop and, once built, rigid. Adding a button, option, or new function is a constant fight between a clear, intuitive interface and a cluttered “what-does-this-even-do” UI.

Search Google for "word toolbar cluttered" to see what I mean.

What if we could just ask? "Indent left". "Make it a contingency table". If more information is needed, the UI will ask ("Which column should I use?").

Clicking or using keyboard shortcuts may be faster for some users, but not for most.

This approach has one great feature: you do not have to touch the UI at all when you add new functionality. Yesterday: "Make the background a gradient from red to orange." -> "I cannot do that". Today: the same request, and ... a perfect gradient background is added.

Ah sure... it is not that straightforward, and many important problems still have to be solved and fine-tuned. But it is possible. Early pioneers were the command palettes in IDEs (developer tools).

We have AI today, with a very good understanding of natural language (and without the barrier of different languages). We can tell it which tools are available and how to use them (an oversimplification). And this is exactly what MCP (Model Context Protocol) strives to introduce as a potential standard.

So once your app adds "add_background_gradient" as a feature, you can describe it, add it to the list of available tools, and let an MCP client (like Claude Desktop) use it.
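
In MCP terms, a tool is just a name, a human-readable description, and a JSON Schema for its arguments; the model reads these when deciding what to call. Here is a sketch of what the description for such a gradient tool could look like (the exact parameters are my assumption, not the demo's):

```typescript
// Sketch of an MCP tool description for the hypothetical gradient feature.
// MCP tools are described by a name, a description, and a JSON Schema
// ("inputSchema") for their arguments; the parameters below are illustrative.
const addBackgroundGradientTool = {
  name: "add_background_gradient",
  description: "Fill the area behind the image with a linear color gradient.",
  inputSchema: {
    type: "object",
    properties: {
      from: { type: "string", description: "Start color, e.g. 'red' or '#ff0000'" },
      to: { type: "string", description: "End color, e.g. 'orange'" },
    },
    required: ["from", "to"],
  },
};
```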

I'm working on a demo to show this. It is a simple, very limited image editor that runs in the browser. It has literally no UI controls. You control it via Claude Desktop (or another MCP client).
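
On the editor side, a tool call eventually has to become an operation on the canvas. A minimal sketch of how the gradient call could be handled, assuming an HTML canvas based editor (the function and its signature are mine, not the demo's):

```typescript
// Hypothetical browser-side handler for the "add_background_gradient" call.
// Paints a linear gradient across the canvas, then redraws the image on top.
function addBackgroundGradient(
  ctx: CanvasRenderingContext2D,
  image: HTMLImageElement,
  from: string,
  to: string
): void {
  const { width, height } = ctx.canvas;
  // Build the gradient across the full canvas width.
  const gradient = ctx.createLinearGradient(0, 0, width, 0);
  gradient.addColorStop(0, from);
  gradient.addColorStop(1, to);
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, width, height);
  // Redraw the image centered on top of the gradient background.
  ctx.drawImage(image, (width - image.width) / 2, (height - image.height) / 2);
}
```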

Some actions would be better done “manually” than through chat (like panning or zooming). For the sake of the demo I will not implement this; it will be a pure chat interface. But it would not be hard at all.

There are interesting benefits to using AI even for such a (simple) task as image manipulation.

The first is workflow. You can say "zoom in on the head of the left dog". The AI performs a workflow: it takes a snapshot of what you see, identifies the left dog, and then issues a "pan" command and a "zoom" command.
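
Sketched as the sequence of tool calls the model might issue (the tool names, arguments, and coordinates are illustrative assumptions, not the demo's actual API):

```typescript
// Illustrative tool-call sequence for "zoom in on the head of the left dog".
const calls = [
  { name: "get_snapshot", arguments: {} },                     // look at the current view
  // ...the model finds the left dog's head in the snapshot, then:
  { name: "pan", arguments: { centerX: 310, centerY: 190 } },  // center the viewport on it
  { name: "zoom", arguments: { factor: 3 } },                  // enlarge that region
];
```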

Another is merging the vast knowledge the AI has with this simple tool. You can say "I need a passport photo from this picture". The AI can locate the face, knows the requirements for passport photos, knows the dimensions (or aspect ratio) of a passport photo (it may ask which country it is for), does all of this, and ... vzzzzum, the image is downloaded to your computer. The tool itself never needs to know the passport photo requirements.
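
The division of labor is the interesting part: the model brings the domain knowledge (face location, country-specific size such as the common 35 × 45 mm format), while the tool only needs a generic crop. A sketch of the crop-rectangle arithmetic, with an assumed face box supplied by the model (helper and margin are hypothetical):

```typescript
// Hypothetical helper: given a detected face box and a target aspect ratio
// (e.g. 35:45, used by many countries for passport photos), compute a crop
// rectangle centered on the face with some margin around it.
interface Box { x: number; y: number; width: number; height: number; }

function passportCrop(face: Box, targetRatio = 35 / 45, margin = 1.8): Box {
  // Expand the face box by a margin so head and shoulders fit.
  let cropH = face.height * margin;
  let cropW = cropH * targetRatio;
  if (cropW < face.width * margin) {      // make sure the face also fits horizontally
    cropW = face.width * margin;
    cropH = cropW / targetRatio;
  }
  // Center the crop on the face (clamping to the image bounds is omitted here).
  return {
    x: face.x + face.width / 2 - cropW / 2,
    y: face.y + face.height / 2 - cropH / 2,
    width: cropW,
    height: cropH,
  };
}
```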

I hope to release the demo in a week or so. Interested? Just drop me an email.