Primer
We want to remove the unnecessary user friction that exists in everyday life. Thousands of hours of brainpower are burned doing menial tasks like 'buy me a ticket to the zoo'. The current meta for executing this task is:
1. Have the intention of wanting to go to the zoo
2. Find the zoo website (useless)
3. Find the ticket you want (useless)
4. Create an account (useless)
5. Enter card details (useless)
6. 2FA card (useless)
7. You get the ticket
That's five unnecessary steps that waste your time and keep you from what you want.
There's one way you can automate this today, and that's by having another human that works for you. They can abstract all of the complexity contained in steps 2-6. You tell your assistant, "buy me a ticket to the Taipei zoo," they figure out the rigmarole, and send you step 7.
Why can the human do it? Humans are the API format that the website was designed for. They have an identity (or can pretend to be you), and have access to payments. They also have good error handling so they can figure out anything unexpected that comes up.
It's just now becoming possible to automate away tasks like this with AI + Blockchain.
Some teams like Humane and Rabbit have tried to automate different menial tasks, but the approach they are taking does not work well. They have a hardware device that you interact with and have AI-Agents that navigate the world like humans. So for the zoo example, the AI-Agent would do a Google search for the zoo website, open it, and then analyze each of the pages to decide what to click on next. This approach is slow and unreliable. There was a high level of demand for their products, but their launches failed because the product did not work well.
So... what does a realistic solution to this problem look like?
There are a few things that need to be in place to solve the zoo issue:
A way to turn human intents into commands (possible now with AI)
An abstracted user identity that can be shared (possible with AI agents)
A way to pay in an API call (crypto)
Open backends (blockchain)
The idea I like best is a chat interface that sits in front of applications and executes your intents. An AI agent figures out your intentions and which interactions are needed to carry them out. For this to work, the AI agent would need to communicate directly with different applications.
A few examples of what this could look like:
Find me a ride to the LAX airport
I don’t care what company it is
I want a good price and safety
Your intent is shared with the network (I want a ride from X to Y with whatever other qualifiers)
Solvers compete to execute your intent (who can find a ride that fits your criteria at the best price)
A smart contract call is made (with payment) to the company that fulfills the ride. Any required identity credentials are also shared.
User experience:
The user expresses their intent through an interface (chat, voice, etc.): "Find me a ride to the LAX airport."
They are shown a preview of what will happen if they accept.
They confirm and have a ride coming to pick them up.
(Open question: how do we deliver the experience users get today when they are in the app and can call for help, etc.?)
Ride share companies have a smart contract that can be called with money to request a ride.
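The solver competition in the ride example can be sketched in a few lines. This is only an illustration: the `RideIntent` and `Quote` shapes, the company names, and the "cheapest eligible quote wins" rule are all assumptions, not part of any existing network.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RideIntent:
    """What the user wants, independent of any one company (hypothetical shape)."""
    pickup: str
    dropoff: str
    max_price_usd: float
    min_safety_rating: float  # assumed 0-5 rating scale


@dataclass(frozen=True)
class Quote:
    """A solver's offer to fulfill the intent (hypothetical shape)."""
    solver: str
    company: str
    price_usd: float
    safety_rating: float


def pick_winner(intent: RideIntent, quotes: list[Quote]) -> Optional[Quote]:
    """Return the cheapest quote that meets the user's criteria, or None."""
    eligible = [
        q for q in quotes
        if q.price_usd <= intent.max_price_usd
        and q.safety_rating >= intent.min_safety_rating
    ]
    return min(eligible, key=lambda q: q.price_usd, default=None)


intent = RideIntent("Home", "LAX", max_price_usd=60.0, min_safety_rating=4.5)
quotes = [
    Quote("solver-a", "RideCo", 42.0, 4.8),
    Quote("solver-b", "CheapCabs", 30.0, 4.1),  # cheapest, but below the safety bar
    Quote("solver-c", "MetroRide", 39.5, 4.7),
]
winner = pick_winner(intent, quotes)
print(winner)
```

The winning quote is what would be shown to the user as the preview before any payment is made.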
Here are a few things that can be made better with this process:
Buying any type of ticket:
Buy me a ticket to iFLY for Saturday for 2
Find and book me an escape room for 2
Find and schedule whale watching
Figuring out transportation:
Get me a flight on the 29th from LAX to EWR
Order me a ride to LAX
Figuring out housing:
Get me a hotel near the LAX airport for the 29th
Ordering food:
Order me something veggie and healthy
Booking appointments:
Setup a massage for me
Setup a dentist appointment
Meeting people:
Find a time to meet with X and try and make my meetings back-to-back if possible
Put OOO on my calendar when I have flights
There are a few issues with building some sort of assistant that can execute these for you:
Ticket vendors don't accept it, so you try and it fails because that company is user-toxic
Fewer options are available through the intent executor, which makes it not worth using
So maybe the place to start isn't the AI agents that execute the intents for their users and instead is something to help companies open up their backends.
If we wanted the zoo to open its backend, how would we do it? It's likely staffed by people who don't care about the human attention that is being wasted purchasing tickets. Technically the solution would be simple, but socially...
Technically we can add a system to the zoo booking website that:
Exposes a way to purchase tickets through an API or RPC call
Allows a program to call it passing in the user identity, what they want to purchase, and a payment.
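As a rough sketch of what such an endpoint could accept, here is a single call that carries identity, the item, and the payment together. Everything here is hypothetical: the `PurchaseRequest` fields, the prices, and the response shape are made up for illustration, and a real version would verify the identity and settle the payment rather than trust the numbers passed in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurchaseRequest:
    user_id: str        # identity the agent presents on the user's behalf
    ticket_type: str    # what they want to purchase
    quantity: int
    payment_usd: float  # payment attached to the call


PRICES_USD = {"adult": 20.0, "child": 10.0}  # made-up prices


def purchase_tickets(req: PurchaseRequest) -> dict:
    """Validate the request and return a confirmation, all in one call."""
    if req.ticket_type not in PRICES_USD:
        return {"ok": False, "error": "unknown ticket type"}
    if req.quantity < 1:
        return {"ok": False, "error": "quantity must be at least 1"}
    total = PRICES_USD[req.ticket_type] * req.quantity
    if req.payment_usd < total:
        return {"ok": False, "error": "payment does not cover the total"}
    return {
        "ok": True,
        "user_id": req.user_id,
        "tickets": [f"{req.ticket_type}-{i + 1}" for i in range(req.quantity)],
        "charged_usd": total,
    }


result = purchase_tickets(PurchaseRequest("user-123", "adult", 2, 40.0))
print(result)
```

The point of the shape is that no account creation, card form, or 2FA step sits between the request and the ticket.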
There is a cold-start problem here: the value prop is low if there is no demand from AI agents (opening up won't boost sales), and there are no AI agents because there are no open backends.
However, there is an industry where this cold start problem doesn't exist. Crypto. All of the smart-contract backends are already open by default. Theoretically today you could create an AI assistant that has read in all of the different smart contract backends, can take in a user intent, and create the smart contract calls required to execute on that intent.
Here's a simple example of what the architecture for intentless could look like:
A single user interface where a user can interact with anything in crypto
They express their intent, and an AI assistant that has been trained on all the open backends turns that intent into the smart-contract calls that need to be made
The user sees a preview of what will happen and confirms or denies.
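The three steps above (express, translate, confirm) could be wired together roughly like this. It is a sketch under heavy assumptions: `plan_calls` is a stub standing in for the AI assistant, and `ContractCall`, the contract names, and the `confirm`/`execute` callbacks are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ContractCall:
    contract: str
    method: str
    args: dict


def plan_calls(intent: str) -> list[ContractCall]:
    """Stand-in for the AI assistant; a real version would call a model."""
    if intent.lower().startswith("swap"):
        return [
            ContractCall("TokenIn", "approve", {"spender": "Router"}),
            ContractCall("Router", "swap", {"intent": intent}),
        ]
    return []  # intent not understood


def preview(calls: list[ContractCall]) -> str:
    """Human-readable summary shown to the user before anything executes."""
    return "\n".join(
        f"{i + 1}. {c.contract}.{c.method}({c.args})" for i, c in enumerate(calls)
    )


def handle_intent(
    intent: str,
    confirm: Callable[[str], bool],
    execute: Callable[[ContractCall], None],
) -> bool:
    """Plan, preview, and execute only if the user confirms."""
    calls = plan_calls(intent)
    if not calls or not confirm(preview(calls)):
        return False
    for call in calls:
        execute(call)
    return True


executed: list[ContractCall] = []
ok = handle_intent("swap 5 GGP for USDC", confirm=lambda text: True, execute=executed.append)
print(ok, len(executed))
```

Keeping `confirm` as a callback means nothing reaches `execute` unless the user has seen the preview and accepted it.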
Problems
This works really well for a single protocol, as you can load the ABI and all the context into the agent. The agent can then go from command to transactions pretty well. However, the GPT-4 context window isn't large enough for a single agent to absorb every single DeFi protocol. It seems like we need some intermediary agent that knows all the other agents and what they do, in some sort of AI tree structure ...? An "Orchestrator AI" that interacts with a swap AI, which then interacts with a vault AI..?
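One way to picture the orchestrator idea is a thin router that knows only which specialist handles what, so no single agent has to hold every protocol's context. In the sketch below, the keyword matching is a placeholder for whatever model-driven routing would actually be used, and the agent names and keyword lists are invented.

```python
from typing import Callable, Optional

Agent = Callable[[str], str]


def swap_agent(command: str) -> str:
    # A real specialist would hold the swap protocol's ABI in its context.
    return f"swap agent handles: {command}"


def vault_agent(command: str) -> str:
    # A real specialist would hold the vault protocol's ABI in its context.
    return f"vault agent handles: {command}"


# The orchestrator only needs a short description of each specialist.
SPECIALISTS: list[tuple[tuple[str, ...], Agent]] = [
    (("swap", "buy", "sell"), swap_agent),
    (("vault", "deposit", "withdraw"), vault_agent),
]


def orchestrate(command: str) -> Optional[str]:
    """Route a command to the first specialist whose keywords match."""
    words = command.lower().split()
    for keywords, agent in SPECIALISTS:
        if any(k in words for k in keywords):
            return agent(command)
    return None  # no specialist knows how to handle this


print(orchestrate("swap 5 GGP for USDC"))
print(orchestrate("deposit 10 ETH into the vault"))
```

Commands that need several specialists in sequence (swap, then deposit) are not handled here; that chaining is the open part of the tree-structure question.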
Use Cases
Use Case 1
There are also some verticalized products that could be made a lot better with some sort of intents-based framework and an AI. One obvious example is trading tokens. Some Telegram bots have garnered significant adoption and revenue from trading fees (Banana Gun, Unibot, Maestro). These applications still have a rudimentary user experience: users can't type what they want to do, and instead have to click through different commands in a Telegram bot interface.
Click 'Trade token'
Manually enter the address of the token to buy/sell
Click button on amount
Click button on price
Confirm and trade
That's better than going to an application, doing the same flow, and also signing the transaction, but it leaves a lot of room for improvement. The current flow is much clunkier than:
Type "Buy 1000 USDC worth of ETH at the current market price"
Bot shows preview of the transaction, confirm or decline.
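To make the typed flow concrete, here is a minimal sketch of turning that sentence into a structured order. A regex is used purely as a stand-in for the language model, and the `MarketOrder` fields are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarketOrder:
    side: str         # "buy" or "sell"
    value: float      # how much, denominated in value_token
    value_token: str  # e.g. "USDC"
    asset: str        # the token being bought or sold, e.g. "ETH"


# Matches phrases like "Buy 1000 USDC worth of ETH ..."
_PATTERN = re.compile(
    r"^\s*(buy|sell)\s+(\d+(?:\.\d+)?)\s+(\w+)\s+worth\s+of\s+(\w+)",
    re.IGNORECASE,
)


def parse_order(text: str) -> Optional[MarketOrder]:
    """Return a structured order, or None if the text is not understood."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    side, value, value_token, asset = match.groups()
    return MarketOrder(side.lower(), float(value), value_token.upper(), asset.upper())


order = parse_order("Buy 1000 USDC worth of ETH at the current market price")
print(order)
```

The structured order is what the bot would render as the preview for the user to confirm or decline.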
Use Case 2
Another use-case where this is useful is coordinating across many different applications. For example, I recently wanted to move my assets from Avalanche to Arbitrum. The process was:
Had to search for bridges, eventually found the official Avalanche <> ETH bridge
Had to search for a way to go from ETH to Arbitrum, and found the official Arbitrum bridge
I started a bridge to ETH, which took 10 minutes. I had to be on high alert and wait for it to complete before I could do the next transaction to bridge to Arbitrum
Once I had my assets on ETH, I couldn't bridge them, because the bridge only gave me 'wrapped ETH' and not ETH.
I had to transfer myself ETH from another wallet to do the bridge to Arbitrum
I initiated the bridge from mainnet ETH to Arbitrum
While the process instead should be:
Bridge 1 ETH from AVAX to Arbitrum
Be shown what will happen, and confirm.
Then, in the background, it handles all of the steps automatically. This could work by giving an AI agent the context of all the different bridge contracts so that it can construct the optimal route. For this specific example, the agent would know about the AVAX->ETH bridge and about the ETH->ARB bridge. It would interpret the user's intent and construct two transactions: one for AVAX->ETH, then one for ETH->ARB. It executes the first one immediately. Once the AVAX->ETH transaction completes, it lends the user the ETH needed for the bridge to Arbitrum and automatically executes that transaction as well. Once that succeeds, the lent ETH is paid back on the ARB network.
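The route-construction step can be illustrated with a breadth-first search over a table of known bridges. This sketch assumes a hand-written `BRIDGES` table and treats "optimal" as "fewest hops"; it ignores fees, wait times, and the ETH lending step described above.

```python
from collections import deque
from typing import Optional

# Hypothetical table of which chains have a direct bridge between them.
BRIDGES: dict[str, list[str]] = {
    "Avalanche": ["Ethereum"],
    "Ethereum": ["Avalanche", "Arbitrum"],
    "Arbitrum": ["Ethereum"],
}


def find_route(src: str, dst: str) -> Optional[list[str]]:
    """Breadth-first search: returns the route with the fewest bridge hops."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in BRIDGES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of bridges connects src to dst


def to_hops(path: list[str]) -> list[tuple[str, str]]:
    """Split a route into the individual bridge transactions to execute."""
    return list(zip(path, path[1:]))


route = find_route("Avalanche", "Arbitrum")
hops = to_hops(route) if route else []
print(route, hops)
```

Each hop corresponds to one transaction, and the agent would wait for one to complete before submitting the next.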
Use Case 3
DCA
User wants to sell or buy a certain amount of an asset daily. Currently, the only way to do that is to manually go in every single day and execute the sell transaction.
With intents, you could say 'sell 100 USDC worth of ETH every day' and it would continue until you cancel it.
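A recurring intent like this is mostly a stored command plus a schedule. The sketch below assumes a hypothetical `RecurringIntent` record and only computes which days an execution is due; actually submitting the trade on each day is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RecurringIntent:
    command: str
    start: date
    every_days: int = 1
    cancelled_on: Optional[date] = None  # None means still active

    def cancel(self, on: date) -> None:
        """Stop the intent; nothing runs on or after this date."""
        self.cancelled_on = on

    def due_dates(self, through: date) -> list[date]:
        """All dates up to `through` on which the command should run."""
        end = through
        if self.cancelled_on is not None:
            end = min(end, self.cancelled_on - timedelta(days=1))
        dates = []
        day = self.start
        while day <= end:
            dates.append(day)
            day += timedelta(days=self.every_days)
        return dates


dca = RecurringIntent("sell 100 USDC worth of ETH", start=date(2024, 1, 1))
before_cancel = dca.due_dates(through=date(2024, 1, 5))
dca.cancel(on=date(2024, 1, 4))
after_cancel = dca.due_dates(through=date(2024, 1, 5))
print(len(before_cancel), len(after_cancel))
```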
Our First Stab
We created a simple chat interface for interacting with Uniswap. When the user visits the site, we set up a new wallet for them that is controlled by their AI assistant. This is so we can easily execute smart contract calls on their behalf.
On the backend, the application uses GPT-4 with a 128k context window. We used the majority of that window to give GPT-4 the context it needs to execute correctly: we fed it the methods it was and was not allowed to call, and instructed it to output a list of JSON objects. GPT then takes human language such as "swap 5 GGP for USDC" and turns it into the transactions needed to carry out that intent. We also used Tenderly's simulation framework to simulate what would happen, so the user can see it before they confirm and the transactions execute.
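Since the model is only told which methods are off limits, it seems prudent to also check its output in code before simulating anything. The sketch below shows one way to do that. The JSON shape (`method` and `args` keys) and the contents of `ALLOWED_METHODS` are assumptions for illustration, not the format actually used.

```python
import json

# Hypothetical allowlist; the real set of permitted methods is not specified here.
ALLOWED_METHODS = {"approve", "swapExactTokensForTokens"}


def parse_model_output(raw: str) -> list[dict]:
    """Parse the model's reply and reject anything outside the allowlist."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(calls, list):
        raise ValueError("expected a JSON list of call objects")
    for call in calls:
        if not isinstance(call, dict) or "method" not in call:
            raise ValueError(f"malformed call: {call!r}")
        if call["method"] not in ALLOWED_METHODS:
            raise ValueError(f"method not allowed: {call['method']}")
    return calls


raw_reply = """[
  {"method": "approve", "args": {"token": "GGP", "amount": "5"}},
  {"method": "swapExactTokensForTokens", "args": {"from": "GGP", "to": "USDC", "amount": "5"}}
]"""
calls = parse_model_output(raw_reply)
print([c["method"] for c in calls])
```

Only calls that pass this check would be handed to the simulation step and then shown to the user.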
It worked pretty well, but had a few hiccups:
Seems potentially really expensive, like $1 per intent, but needs more testing to confirm.
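A back-of-the-envelope check on that number, as a sketch only. The token counts and the per-token prices below are assumptions chosen for illustration (roughly 100k tokens of context per request, a short JSON reply, and input/output prices of $0.01 and $0.03 per 1K tokens); actual usage and current pricing should be checked before relying on it.

```python
def cost_per_intent_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float = 0.01,   # assumed price, not from the original notes
    usd_per_1k_output: float = 0.03,  # assumed price, not from the original notes
) -> float:
    """Estimated model cost of handling one intent."""
    return (
        input_tokens / 1000 * usd_per_1k_input
        + output_tokens / 1000 * usd_per_1k_output
    )


# Assumed usage: most of a 128k window as context, plus a short JSON reply.
estimate = cost_per_intent_usd(input_tokens=100_000, output_tokens=500)
print(f"about ${estimate:.2f} per intent")
```

Under these assumptions the cost is dominated by the context loaded on every request, so trimming that context is the obvious lever.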
Next Steps
I think we should start building the product in crypto around a verticalized use case. Memecoin trading is the most interesting and promising at the moment, and it can easily generate cash flow by taking transaction fees. It also fits well into the distribution model we have with intentless.
Miscellaneous Notes
There's a buzzword called "LAM" which stands for large action model. It sucks though.
We have developed a system that can infer and model human actions on computer applications, perform the actions reliably and quickly, and is well-suited for deployment in various AI assistants and operating systems. Our system is called the Large Action Model (LAM). Enabled by recent advances in neuro-symbolic programming, the LAM allows for the direct modeling of the structure of various applications and user actions performed on them without a transitory representation, such as text. The LAM system achieves results competitive with state-of-the-art approaches in terms of accuracy, interpretability, and speed. Engineering the LAM architecture involves overcoming both research challenges and engineering complexities, from real-time communication to virtual network computing technologies. We hope that our efforts can help shape the next generation of natural-language-driven consumer experiences.
There's an idea of 'generative UI', which has AI automatically create a UI based on the backend code.