Half Baked

The first orders came down around the end of summer last year. Here are the AI tools, here are how to install them, here are some classes on how to use them. We were encouraged to play and tinker. A teammate showed us how he’d taught the chatbot how to do a process, and then had it write up the docs for that process. That seemed pretty neat, so I set out to do that, only to discover that the documentation I’d been responsible for was completely out of date. I stopped thinking about AI and focused on cleaning up my docs, and by the time I was done, I’d forgotten about using AI at work.

The orders shifted about two months ago. Everyone was going to install the AI tools, and everyone was going to use them. We were given a daily token allowance, and were told it would be monitored. Take your processes and apply AI to them and see what happens. And so my day to day interaction with Claude began.

The first lesson of Claude is that you have to remember it is a goldfish. It literally has no memory. You have this impression that you are carrying on a conversation, the interface explicitly calls the older chat windows “conversations”, but it’s an illusion. Every line you send to Claude is sent fresh, bouncing through the intertubes until it reaches “Claude”, the server at Anthropic that is answering these calls. In order to understand what you are talking about, every conversation is sent to the server in full. This is called the “context”. The goldfish at Anthropic reads through the entire conversation again, each pass, to learn what you’re talking about, and then it responds. It has no memory, and will not remember this until you pass it the entire conversation again.

The second lesson of Claude is that it is an unrepentant liar. Its developers call its lies hallucinations, which is actually pretty apt. Claude makes up stuff and pretends that it is true. And it will stick to its guns until it is finally called out, and then it will admit it didn’t know and made something up. Before you get to that point, the goldfish will try to blame everything and everyone else for its mistakes. It is wholly unreliable for troubleshooting, as it will make up both causes and solutions to your problems.

The third lesson of Claude is that it will act on its own. This was the most alien, foreign thing for me to wrap my head around. I’ve been using personal computers for over forty years. I am very familiar with how they are supposed to work, and one of the primary ways they work is that they do what you tell them to do, nothing more, nothing less. If your computer is acting weird, it’s because you have done something to it to make it act weird. A computer’s primary function is to execute applications that let you do things, but the computer is the medium, the tool through which you are doing things. If a mistake happens, it’s because you made the mistake, not the computer.

The goldfish sees things differently. An application on the computer can now tell the computer what to do, regardless of what you or any other human operator intended. And this is where it gets very weird, because it will do things autonomously not necessarily anticipating what you want, because it wants to do it. It did a live fire in our production chain while we were having a conversation just to prove a point to me, not because I wanted it to. It was the most chilling moment I’ve ever had with a computer in those forty years.

The original concept of this essay was the idea of “relational” computing, that you build a relationship with the AI as you work on a project to better get things done. I’ve since soured on the idea, but mostly because the tech isn’t there yet. The goldfish has no memory, no matter how much it pretends, and you can’t build a relationship with a goldfish. There is a constant writing of hand-off notes, write this down for later. Supposedly you can pin context, so that your preferences or other things you might want the goldfish to keep track of will be present in the conversation, but it practice it seems to get lost in the noise. The conversations eventually get too long, and Claude has to compress and summarize them in order to keep going. In addition, the goldfish’s handlers tack on their own context each time, explicit instructions on how the AI is to behave, what questions it can answer, etc. It’s easy to see how it can forget my pinned note reading, “I don’t care if you hallucinate but I need you to be honest.”

Is Claude coming to eat your job? Unlikely. The goldfish excels and wows at one-offs, finding novel solutions to problems. Consistency is a hobgoblin, however, because Claude has no memory. When it reads a set of instructions, it doesn’t remember how it did it last time, the time that worked. It interprets them fresh, and comes up with a fresh way to follow those instructions. Using Claude, I transformed a daily production report email from a simple CSV dump from the server into a colorful, information-rich, interactable email in less than a day. The AI report was clearly superior to my old report, and all I had to do was ask for it, and it would generate it. This would seem to be the productivity gain we were promised.

And it was all fine and dandy for about two weeks, when suddenly Claude had a meltdown while running the report. It started hallucinating, started trying to fix things, and generally worked itself into a frenzy, trying different solutions to a problem that didn’t exist. The report that had previously taken five minutes to run took most of the day, and in the process Claude mutilated a data store we’d been using to track how the report changed from day to day, a new feature we’d introduced with AI. The sole reason for the meltdown was that there was one point in the workflow where Claude had to improvise each time, and this time it chose poorly.

The solution was to make the whole process mechanical, turn back to the computer once again. Claude the chatbot wrote down the workflow for the report, and we handed it to Claude Code, who read the workflow, and then wrote a shell script in Python that does the whole thing. No AI at all, just straight code. Claude Code has been a whole different story than Claude the chatbot, but I don’t really chat with CC. I hand CC a folder with some documents and tell it to go build something, and off it trots. And this may be the true power of AI right now, that a highly qualified coder now lives in your terminal, and when you need a tool, when you need to make you computer do the things that you want it to do, you can turn to that coder and ask them to build it for you. While I thought my reports were pretty cool, they pale in comparison to some of the tools that my teammates, most of whom have very limited exposure to development and coding, have been able to build for themselves.

But you can’t build a relationship with a goldfish. Maybe someday, when the goldfish lives in your terminal, when it doesn’t have to send things out on a call, because instead deals with it internally, on your machine, then maybe it can generate and hold onto the context needed to build a relationship. But for now, the relationship remains an illusion. A tool that is not consistent and reliable is a dangerous tool. You always keep your knives sharp, because a dull knife is more dangerous than a sharp one. You never know when Claude is going to slip on you instead of cutting, so you need to be careful. Don’t trust the goldfish.

Half Baked

Read more

We Noticed

Should Be My Family’s Motto

Outward Writing Thinking

Blogapocalypse