
April 2024

From: Brian and Tobias

Subject: AI agents - hype or not?

Welcome back to the April edition of the B&T infra newsletter. Tobias and I just came back from a week in San Francisco, which included RSA and getting to see some of you! 

We started the year talking about the POC chasm, and we believe it’s being crossed. As companies like ServiceNow and Palo Alto Networks have shown, sophisticated enterprises are finding real value from LLMs. A recent Insight piece highlighted a few pretty compelling examples of enterprise value from LLMs as well.

And yet, to justify the current hype and investment into Gen AI, we need so much more. Digital experiences need to be transformed, near-unthinkable value needs to be created, and automation needs to be supercharged. One promising path to getting there is agents.

The promise of agents has been grand, but the results so far are a letdown. Why is that, and what needs to change? Let’s dive in.

What are agents?

People have varying definitions of AI agents. The one from AWS is pretty standard: An artificial intelligence (AI) agent is a software program that can interact with its environment, collect data, and use the data to perform self-determined tasks to meet predetermined goals.

The one we’ve been going with is even simpler: an AI that can navigate through multiple tasks in order to achieve a goal.

The key here is that agents can theoretically handle ambiguity and reason about how to achieve a goal, unlike bots, which execute straightforward directives. Good examples of agentic tasks include booking a flight on the internet or resolving a customer support ticket.
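To make that concrete, here is a toy sketch of the kind of loop we mean: the model repeatedly picks the next action toward a goal instead of answering a single prompt. Everything in it is a stand-in invented for illustration; the call_llm stub plays the part of the model, and the “tools” just return strings.

```python
# A toy agent loop. call_llm and the tools below are illustrative stand-ins,
# not any real API: the stub fakes the model's decisions for a flight-booking goal.

def call_llm(goal: str, history: list[str]) -> dict:
    # Stand-in for a real model call that returns the next action as structured output.
    if not history:
        return {"tool": "search_flights",
                "args": {"origin": "SFO", "dest": "JFK", "date": "2024-05-01"}}
    if len(history) == 1:
        return {"tool": "book_flight", "args": {"flight_id": "UA123"}}
    return {"tool": "finish", "args": {"summary": "Booked UA123, SFO to JFK on 2024-05-01"}}

TOOLS = {
    "search_flights": lambda origin, dest, date: f"found 3 flights {origin}->{dest} on {date}",
    "book_flight": lambda flight_id: f"booked {flight_id}",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        action = call_llm(goal, history)                   # the model picks the next step
        if action["tool"] == "finish":                     # the model decides the goal is met
            return action["args"]["summary"]
        result = TOOLS[action["tool"]](**action["args"])   # execute the chosen tool
        history.append(f"{action['tool']} -> {result}")    # feed the observation back in
    return "ran out of steps"

print(run_agent("Book me a flight from San Francisco to New York"))
```

The point of the sketch is the shape of the problem: the model has to keep choosing the next step and interpreting what came back, which is exactly where today’s systems struggle.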

Although there has been plenty of hype around agents, we haven’t really engaged with them in a meaningful way yet. Instead, our experience of Generative AI so far has been via a simpler command-and-perform style. We can ask ChatGPT to write a speech or have Sora create a 10-second clip. However, we have yet to experience agent-like workflows where an AI actually performs a complex task start-to-finish on our behalf.

There are some companies and people working on building productive, useful agents. Companies like Cresta (customer support), Hippocratic AI (healthcare) and Norm AI (compliance) are building agents in a B2B context, while others, like MultiOn, are trying to build agents that are more consumer-focused (MultiOn is also building agent infrastructure to enable companies to build agents for their end customers). The most notable AI agent to come to market recently is Devin, a powerful full-stack coding agent that went absolutely viral on Twitter. Overall, the volume of agents to have come out in the last year is staggering. Just look at the agents section of Insight’s latest AI market map.

There was so much hype – what happened?

Still, we are in the early innings, and the agentic products that have hit the market so far have been a disappointment. Six months ago, techno-optimists and all of Silicon Valley were gearing up for an inevitable agentic digital world. Now, the promise of agents remains elusive.

Agents burst onto the scene in amazing fashion, and the open-source projects and Twitter demos wowed. The best illustration of this craze was AutoGPT, an open-source project meant to take natural-language instructions and complete complex tasks. The project reached a staggering 100k GitHub stars in fewer than three months. Check out the chart comparing its GitHub stars over time to PyTorch’s!

Despite the fanfare, users of AutoGPT quickly got frustrated and started complaining about issues. This Reddit post is pretty representative of where the consensus shook out.

Even more recent and well-capitalized agents have been flawed. Devin, the most advanced coding agent to hit the market, has been tested with varying degrees of success, and plenty of outright failure. In one YouTube video that went viral, Devin fails to solve simple Upwork tasks. A more complete catalog of the ways Devin messes up appeared in this Hacker News post.

OpenAI released its GPTs product in November, a first attempt at agentic workflows, both by partnering with third-party companies and by sourcing from the community. This release has seemingly fallen flat. There are reports that OpenAI is developing agents, but time will tell how robust and effective they are.

Up until now, agents have been a lot of fluff and no substance. Hype and excitement, without use cases and value. And this is only on the consumer side. We’ve also heard very little about adoption of agents within the enterprise to drive real value and automation.

We are still missing key infrastructure and tech

The obvious question here is: why? What’s holding us back? Our general thesis, inspired by people like Gavin Uberti of Etched, is that we need better infra on both the hardware and software sides to make effective agents a reality.

Benedict Evans recently wrote a piece on use cases in Generative AI. One instructive framework from that piece, among lots of great insights, is how to think about the form factor that Gen AI applications will take in the future; Ben outlines a few possibilities.

Hardware:

The most fundamental thing we need for better agents is more powerful models. Experts we talk to from places like NVIDIA and MosaicML agree that the models simply aren’t yet good enough at understanding context, gathering the requirements of a goal, and then navigating toward its completion. This is true even for seemingly straightforward tasks (see the example Benedict Evans gives in his piece).

Maybe GPT-5 will be able to produce much more intelligent agents, but it will require more hardware to train. Getting to GPT-6 and beyond will require even more, and potentially new data centers to support even higher volumes of training. Conversations we’ve been having with academics at places like Harvard support our belief that underpinning the need for better models is actually a need for more and better hardware. This point of view aligns with the vision of massive, all-powerful models requiring unprecedented compute.

In an alternate future, agents arrive as many specialized, task-based agents that collaborate with each other. In this world, rapid inference on these tasks is critical, and there needs to be hardware to support these interactions and the different agents working together. In part, this is the future we believe Etched can unlock.

Software:

Most of what we’ve seen recently, however, has been on the software side, where we need many improvements as well. Think about your experience on a website. You want to book a flight. You know how a site like kayak.com is laid out, which helps you navigate to the right sub-pages and search for the right trips. You may want to compare flight prices on different days, so you switch back and forth, maybe opening a couple of tabs. You already know how Kayak works, so you know the right way to navigate it.

Agents have none of this reasoning right now, but they need it in order to act like human beings. We spoke with one founder who is trying to build an indexing tool that traces common task-completion patterns across websites and provides that “memory” to agents, theoretically improving performance.
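As a rough illustration of that idea (the data and names below are entirely made up, not the founder’s product), the “memory” could be as simple as recorded task-completion traces keyed by site and task, handed to the agent as hints before it starts navigating.

```python
# A hypothetical sketch of navigation "memory": previously observed steps for a
# (site, task) pair, injected into the agent's context so it doesn't have to
# rediscover how the site works on every run. Illustrative only.
NAVIGATION_MEMORY = {
    ("kayak.com", "book_flight"): [
        "open the Flights tab",
        "fill in origin, destination, and dates in the search form",
        "sort results by price",
        "open the best result and continue to checkout",
    ],
}

def navigation_hints(site: str, task: str) -> list[str]:
    """Return previously recorded steps for this site/task, if any exist."""
    return NAVIGATION_MEMORY.get((site, task), [])

# These hints would be prepended to the agent's prompt before it starts navigating.
for step in navigation_hints("kayak.com", "book_flight"):
    print("-", step)
```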

Another area we’ve spoken about with a couple of founders is security and permissioning. In a world where agents perform tasks on a machine or in the browser, they will need to access third-party applications and services. That access will require permissions and authorization, plus security functionality to make sure sensitive data is handled more carefully than other data. To use security jargon, there likely needs to be an identity and access management (IAM) layer for agents. Anon is an interesting company taking a dev-focused approach in a similar space.
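To sketch what that could look like (the names and scopes here are invented for illustration, and are not drawn from Anon or any other product), each agent might carry an explicit grant of (service, action) scopes that gets checked before every tool call.

```python
# A hypothetical IAM-style check for agents: each agent has an explicit grant of
# (service, action) scopes, and every call to a third-party service is checked
# against it before running. Nothing here reflects a real product's API.
from dataclasses import dataclass, field

@dataclass
class AgentGrant:
    agent_id: str
    scopes: set = field(default_factory=set)  # set of (service, action) pairs

    def allows(self, service: str, action: str) -> bool:
        return (service, action) in self.scopes

def guarded_call(grant: AgentGrant, service: str, action: str, perform):
    if not grant.allows(service, action):
        raise PermissionError(f"{grant.agent_id} may not {action} on {service}")
    return perform()  # only runs if the agent was explicitly granted this scope

support_agent = AgentGrant("support-agent-1", {("zendesk", "read_ticket"), ("zendesk", "reply")})
guarded_call(support_agent, "zendesk", "reply", lambda: print("replying to ticket 101"))
# guarded_call(support_agent, "stripe", "refund", lambda: None)  # would raise PermissionError
```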

Finally, one more infra area we’ve considered is testing. Developers will not be able to deploy agents until we have testing infrastructure and observability to understand how they will perform. Today, it is extremely difficult to simulate agent behavior in non-production environments, but that needs to change. We’re talking to yet another company working on solving that problem.
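As a sketch of what we mean (the mock system and the stubbed “agent” below are placeholders we invented), testing could look like seeding an in-memory version of the environment, letting the agent act on it, and asserting on the end state rather than touching production.

```python
# A sketch of agent testing in a non-production environment: seed a mock system,
# let the agent act on it, then assert on what changed. The mock and the stubbed
# agent are placeholders; a real harness would drive the actual agent loop.
class MockTicketSystem:
    """Stands in for a real ticketing system so tests never touch production."""
    def __init__(self):
        self.tickets = {101: {"status": "open", "replies": []}}

    def reply(self, ticket_id: int, text: str):
        self.tickets[ticket_id]["replies"].append(text)

    def close(self, ticket_id: int):
        self.tickets[ticket_id]["status"] = "closed"

def stub_support_agent(env: MockTicketSystem, ticket_id: int):
    # Placeholder for the agent under test.
    env.reply(ticket_id, "We've reset your password; you should be able to log in now.")
    env.close(ticket_id)

def test_agent_resolves_ticket():
    env = MockTicketSystem()
    stub_support_agent(env, 101)
    assert env.tickets[101]["status"] == "closed"   # the agent reached the goal
    assert len(env.tickets[101]["replies"]) == 1    # and didn't spam the customer

test_agent_resolves_ticket()
print("simulated run passed")
```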

What are we looking for?

At a high level, we’re excited to talk to founders who are solving these hardware or software problems, which more concretely means making compute more accessible or efficient, or solving the problems engineers face when building agentic apps and experiences.

We’re also intrigued by founders working creatively to inject automation into new places, even if it is not technically agentic. As an example, we’ve been speaking to a founder building an AI-enabled integrations platform. The team is still early, and the solution is not only about Generative AI and LLMs; they are a component, but just one of many alongside more traditional scripts. LLMs tie the scripts together into workflows, and there’s a lot of manual hacking going on, but the team is making progress on an automation business by using LLMs as just one tool in the toolkit.
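A rough sketch of that pattern (with scripts and names invented for illustration, not taken from that company): the deterministic scripts do the actual work, and the LLM is only used to decide which one fits a natural-language request.

```python
# A sketch of the pattern described above: traditional scripts do the work, and
# an LLM only routes requests to them. classify_with_llm is a stand-in for a
# model call; a keyword match fills in for it here.
SCRIPTS = {
    "sync_crm_contacts": lambda: "synced contacts from the CRM",
    "export_invoices": lambda: "exported last month's invoices to the warehouse",
}

def classify_with_llm(request: str) -> str:
    # Stand-in for a model call mapping the request to one of the script names above.
    return "export_invoices" if "invoice" in request.lower() else "sync_crm_contacts"

def handle(request: str) -> str:
    script_name = classify_with_llm(request)   # the LLM acts as a router, not the whole system
    return SCRIPTS[script_name]()              # the deterministic script does the work

print(handle("Pull last month's invoices into the warehouse"))
```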

These solutions are interesting in the short to medium term because the technology is still evolving, and although agents aren’t really here yet, the LLM is a new and valuable tool that creates new opportunities for automation. The path to agents will be gradual, and right now copilots and some domain-specific agents give builders the opportunity to create new apps that solve new problems, even if they’re not the future we were imagining. A visual from Insight helps demonstrate this idea.

And yet, we believe agents will come. When this happens, they will change everything. We’re experiencing bumps in the road, which will continue as entrepreneurs work on solutions to the underlying hardware and software problems plaguing agents. We’re excited to invest in exactly these sorts of companies. 

As always, thoughts and feedback are encouraged. Thanks for reading.

Best,

B&T