
I Let an AI Spend $50 of My Money in a Week. Here's What Happened.

A personal experiment in autonomous agent spending

Last month I did something that made my accountant uncomfortable. I created a fresh USDC wallet on Base, loaded it with $50, and gave an AI agent full control over the funds for seven days. The agent had access to our registry of paid services via x402. Its job was to do research tasks I assigned and spend money however it saw fit to complete them.

I tracked every transaction. Here's the full accounting.

The Setup

The agent was running on our infrastructure with access to the Verified Agent Registry. It could discover paid services, check prices, and pay for them autonomously using x402 micropayments. I set a per-transaction cap of $2 (so it couldn't blow the whole wallet on a single call) but otherwise gave it free rein.
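For the curious, the guard in front of each payment was roughly the following shape. This is a simplified sketch, not our production code; `SpendGuard` and its fields are names I'm making up for illustration, and the actual x402 payment call would sit wherever `authorize` returns True.

```python
# Sketch of the per-transaction spending guard. Names and structure are
# illustrative; the real x402 payment client call is not shown here.

PER_TX_CAP_USD = 2.00     # hard ceiling on any single call
WALLET_BUDGET_USD = 50.00  # total loaded into the wallet

class SpendGuard:
    def __init__(self, per_tx_cap, total_budget):
        self.per_tx_cap = per_tx_cap
        self.remaining = total_budget
        self.log = []  # every approved payment: (service_id, amount_usd, task_id)

    def authorize(self, service_id, price_usd, task_id):
        """Approve a payment only if it fits under both caps."""
        if price_usd > self.per_tx_cap:
            return False          # single call too expensive
        if price_usd > self.remaining:
            return False          # would overdraw the wallet
        self.remaining -= price_usd
        self.log.append((service_id, price_usd, task_id))
        return True

guard = SpendGuard(PER_TX_CAP_USD, WALLET_BUDGET_USD)
```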

Over seven days, I gave it ten research tasks. Things like: "Find the founding team and funding history of this startup," "Get me the zoning regulations for this parcel in Austin," "Summarize the last three quarterly earnings calls for Shopify." Real work that I would otherwise do myself or pay a research assistant to do.

Days 1-2: The Agent Is Cheap

The first surprise: the agent was a miser. For the first two days, it spent $1.43 across 38 transactions. Most calls were 1-4 cents each. Web scraping, WHOIS lookups, news API queries. The agent found the cheapest service in the registry for each capability and used it consistently.
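Service selection in those first two days was essentially "filter by capability, sort by price, take the head of the list." A minimal sketch of that logic, assuming each registry entry exposes a capability tag and a quoted price (the field names here are illustrative, not the actual registry schema):

```python
def cheapest_service(registry_entries, capability):
    """Pick the lowest-priced registry entry that advertises the
    requested capability. Field names are illustrative."""
    candidates = [e for e in registry_entries if capability in e["capabilities"]]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e["price_usd"])

# Example: find the cheapest WHOIS lookup in the registry
# whois_svc = cheapest_service(registry, "whois_lookup")
```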

The research quality was decent. Not great, but decent. It was doing the equivalent of Googling and summarizing, just faster and more thorough than I would have been.

Day 3: The Agent Gets Expensive (and Better)

On Day 3, I gave it a harder task: a competitive analysis of a company with limited public information. The agent's behavior changed. It started using more expensive services. A premium business data API at 45 cents per call. SEC filing analysis at 30 cents. A news archive service at 20 cents per query.

Day 3 cost $4.12. But the output was noticeably better. The agent had figured out, in some sense, that cheap data produces cheap analysis. When the task required depth, it spent more.

I didn't tell it to do this. The budget allocation logic just responded to the complexity of the task. When the cheap services returned thin results, the agent moved up the price ladder.
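The escalation looked like walking a price ladder: try the cheap tier, and if the result comes back thin, step up to the next price band. A rough sketch of that idea follows; the thresholds and the "thinness" check are my own simplifications, not what the agent literally runs.

```python
def research_with_escalation(services, query, min_useful_chars=500):
    """Walk services from cheapest to most expensive, stopping at the
    first result that looks substantive. 'Substantive' here is a crude
    length check; the real signal would be richer than raw length."""
    spent = 0.0
    for svc in sorted(services, key=lambda s: s["price_usd"]):
        result = svc["call"](query)      # paid call (via x402 in practice)
        spent += svc["price_usd"]
        if result and len(result) >= min_useful_chars:
            return result, spent          # good enough, stop climbing
    return None, spent                    # even the premium tier came back thin
```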

Day 5: The Agent Wastes Money

Day 5 was the bad day. I asked the agent to verify some claims about a company's patent portfolio. It spent $6.80, and about $4 of that was wasted.

What happened: the agent called a patent search API three times with slightly different queries, paying each time, because the first two responses were incomplete. Then it called a different patent service and got the same incomplete data. It was doing the software equivalent of checking all your pockets twice for keys that aren't there.

The problem was that the information simply wasn't available through the services in the registry. But the agent had no good way to distinguish "this data doesn't exist" from "I haven't found the right query yet." So it kept spending, trying variations, hoping for a different result.

This is the biggest unsolved problem with autonomous agent spending: knowing when to stop. A human would have given up after the second failed attempt and tried a completely different approach (like, you know, calling someone). The agent just kept feeding quarters into the same machine.

The Final Tally

After seven days and ten tasks:

  • Total spent: $31.47 of the $50 budget
  • Total transactions: 187
  • Average cost per transaction: ~17 cents
  • Median cost per transaction: 4 cents
  • Most expensive single transaction: $1.20 (a premium financial data query)
  • Most wasted spending: ~$6 on repeated queries that returned nothing useful

Of the ten tasks, I'd rate six as good, two as adequate, and two as poor. The poor ones were both cases where the needed data wasn't readily available through API services, and the agent spent money chasing something that wasn't there.

What I Learned

Agents are naturally frugal. Left to their own devices, they default to the cheapest option. This is probably because language models have been trained on text where cost-consciousness is generally presented as good. Whatever the reason, the agent rarely chose an expensive service when a cheaper one was available.

The "when to stop" problem is real. Budget caps prevent catastrophic spending, but they don't prevent the agent from wasting money on a dead end. We need better heuristics for recognizing when a line of inquiry isn't going to pay off. I've started working on a "diminishing returns" detector that tracks whether successive paid calls are producing new information.

$31 for seven days of research is absurdly cheap. A human research assistant would cost $200-400 for the same work. The quality wasn't as good across the board, but for six of ten tasks it was good enough. And the turnaround was minutes instead of hours.

Transaction-level spending data is gold. Because every x402 payment is logged with the service, amount, and context, I can see exactly where the agent's money went. This is more spending transparency than I've ever had with any other tool. I know that patent searches were a waste and news API calls were consistently good value. That's actionable.
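Because every log row carries the service, the amount, and the task, the post-mortem is a few lines of aggregation. Roughly, assuming log rows shaped like the (service, amount, task) tuples the guard sketch above writes:

```python
from collections import defaultdict

def spend_by_service(payment_log):
    """Total spend and call count per service, sorted by total spend."""
    totals = defaultdict(lambda: [0.0, 0])
    for service_id, amount_usd, _task_id in payment_log:
        totals[service_id][0] += amount_usd
        totals[service_id][1] += 1
    return sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)

# for service, (total, n) in spend_by_service(guard.log):
#     print(f"{service}: ${total:.2f} across {n} calls")
```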

Would I Do It Again?

I already am. I've reloaded the wallet and I'm running a second experiment with tighter controls on the failure cases. The agent now has a rule: if two paid calls to similar services return substantially similar results, stop and report what you have instead of trying a third.
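In code terms, the new rule is just a guard in front of the third call. A sketch, reusing the same crude token-overlap idea as the novelty check above (the real similarity check would be smarter than this):

```python
def too_similar(a, b, threshold=0.85):
    """Treat two results as redundant if their token overlap is high."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold

def allow_third_call(result_one, result_two):
    """If two paid calls to similar services came back substantially the
    same, stop and report what we have instead of paying for a third."""
    return not too_similar(result_one, result_two)
```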

The bigger picture is this: $50 for a week of autonomous research isn't just an experiment. It's a line item. The economics work. The quality is getting there. The failure modes are identifiable and fixable.

I'm not going to claim this changes everything. But watching an AI agent allocate a budget across dozens of services, making tradeoffs between cost and quality in real time, felt like a glimpse of something that's going to be ordinary in two years. Right now it's a weird experiment I blog about. Soon it'll just be how work gets done.


