
How We Built a Multi-Agent System for Strategic Research
We needed competitive intelligence, financial modeling, market research, and strategic analysis running in parallel: ongoing capabilities that improve over time, building on what they've already learned.
So we built a system. The agents are defined in markdown files. The orchestration runs through conventions and protocols. The memory layer is files on disk. And the whole thing runs on top of our favorite AI development tool: Cursor.
Here's how it works, and what we learned.
The Harness: Cursor
Cursor is an AI-powered development environment, and it's the backbone of the system. It provides the execution layer: tool access, file management, sub-agent delegation, and integrations with external services through MCP (Model Context Protocol) servers. Think of it as the operating system our agents run on.
It also lets us choose the right AI model for the job. Our strategic advisor runs on a state-of-the-art reasoning model (Claude Opus 4.6) because that work requires deep, nuanced thinking. A research sub-agent doing search-and-summarize work runs on a faster, more efficient model because speed matters more than sophistication for that task. Same system, different engines depending on the work. And if the next new bot on the block can beat Claude, we'll swap it out.
Each agent is defined in a markdown file that specifies its role, its domain knowledge, the quality standards it should hit, and the anti-patterns it should avoid. When we invoke an agent, Cursor reads that file and operates within those constraints.
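To make that concrete, here's a simplified sketch of the shape one of those files takes. The headings and rules below are illustrative, not our actual configuration:

```markdown
# Agent: market-research

## Role
Market research analyst for vertical SaaS. Sizes markets,
validates benchmarks, and tracks competitor moves.

## Quality Standards
- Every number carries a source and a confidence level (high/medium/low)
- Flag anything that contradicts the shared findings registry

## Anti-Patterns
- Never present a single-source figure as established fact
- Never pad an answer with generic market commentary
```

The file is the whole interface: change the standards, and the agent's output changes on the next run.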
Persistence: The Part Most People Skip
Most AI tools have amnesia. Every conversation starts from zero. You explain your business, your constraints, your data. Again. You re-establish context that you've already given three times before. It's the single biggest reason people bounce off AI after the initial excitement fades.
We solved this with a simple structure. Each agent maintains three files:
Memories are persistent, domain-specific knowledge. Structured notes about what matters in that agent's domain. Our financial modeling agent remembers spreadsheet IDs, benchmark sources, and the rationale behind specific modeling decisions. The market research agent remembers data source quirks: that IBISWorld numbers diverged from Census data for a particular segment, that a CAC benchmark came from a secondary source with medium confidence.
Progress is what makes multi-session work possible. Our financial model has gone through six iterations across multiple sessions. Expense categorization first, then the core model v1 through v6, each version tracked with key metrics and the decisions that drove changes. When a new session starts, the agent reads its progress file and picks up mid-project. Context preserved. Decisions intact.
Todos function as a prioritized backlog that accumulates over time. The financial modeling agent has future phases queued (cap table modeling, Series A readiness dashboard) waiting until the current work is done. They're a living work queue that gets added to and worked through over weeks.
If you've ever tried to use ChatGPT on your phone to troubleshoot a problem in the field, you know the frustration. We have an agent for motorcycle maintenance (the system works across any domain). Every time we fire it up, it already knows the bike: year, model, mileage, installed modifications, unresolved recalls, what was serviced last, and what's coming up. It connects dots to previous issues on its own. That's what persistence actually gets you.
All of this runs on files on disk, read and written by the agents according to rules defined in their markdown configurations.
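The mechanics are simple enough to sketch in a few lines of Python. This is a hypothetical illustration of the pattern, not our exact setup; the file names and layout are assumptions. Each agent's state lives in plain files, read at session start and appended to as work happens:

```python
from pathlib import Path

# Assumed layout: each agent keeps three plain-text files in its own directory.
AGENT_FILES = ("memories.md", "progress.md", "todos.md")

def load_agent_state(agent_dir: str) -> dict:
    """Read an agent's persistence files at session start.

    Missing files are treated as empty, so a brand-new agent
    starts cleanly instead of failing.
    """
    root = Path(agent_dir)
    state = {}
    for name in AGENT_FILES:
        path = root / name
        state[name.removesuffix(".md")] = path.read_text() if path.exists() else ""
    return state

def append_memory(agent_dir: str, note: str) -> None:
    """Append a durable note to the agent's memories file."""
    path = Path(agent_dir) / "memories.md"
    with path.open("a") as f:
        f.write(note.rstrip() + "\n")
```

That's the entire persistence layer in miniature: no database, no vector store, just files the agent is instructed to read first and write last.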
Shared Knowledge Across Agents
The persistence layer handles what each agent knows individually. We also needed knowledge to flow between agents, and that turned out to be the most valuable part of the system.
We built a shared findings registry. It's a single file that every agent reads when it starts up and writes to when it discovers something worth sharing. Each entry includes the claim, the source, a confidence level (high, medium, or low), which agent established it, and the cross-domain implications.
When our competitive analysis agent finds that most construction SaaS competitors use per-seat pricing models, that finding goes into the registry with a note that it's relevant to financial modeling assumptions. Next time the financial modeling agent runs, it reads the registry and has that context automatically.
The system handles disagreements too. If one agent's research contradicts an existing finding, it flags the conflict explicitly and waits for a human to resolve it. The human stays in the loop on judgment calls.
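Here's a sketch of what a registry entry and the conflict check might look like. The schema and the topic-matching logic are illustrative assumptions (the real registry is a single file the agents read and write), but the shape of the data is the same: claim, source, confidence, originating agent, implications.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """One entry in the shared findings registry (hypothetical schema)."""
    topic: str               # e.g. "competitor-pricing"
    claim: str
    source: str
    confidence: str          # "high" | "medium" | "low"
    agent: str               # which agent established it
    implications: list[str] = field(default_factory=list)

def add_finding(registry: list, new: Finding) -> dict:
    """Add a finding, or flag a conflict for a human to resolve.

    Two findings conflict when they make different claims about the
    same topic. The conflicting entry is NOT merged automatically;
    it's surfaced so a human makes the judgment call.
    """
    for existing in registry:
        if existing.topic == new.topic and existing.claim != new.claim:
            return {"status": "conflict", "existing": existing, "incoming": new}
    registry.append(new)
    return {"status": "added"}
```

The important design choice is the last one: on conflict, the code returns both versions instead of picking a winner.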
We've accumulated over 25 shared findings so far, covering market sizing, pricing benchmarks, competitive positioning, and unit economics. Some have spawned deeper investigations: a question about SaaS valuations and AI disruption turned into a 300-line analysis with sources, confidence ratings, and identified gaps in the available research.
The Sandboxed Workspace
Our financial modeling agent creates real spreadsheets and shares them with real people. It runs on a dedicated Google Workspace account with its own email address, its own Drive, and its own calendar. When it builds a financial model, it creates the spreadsheet in its own Drive and then shares it with the humans who need it.
This is an access control decision. We can see everything the software creates. It operates in its own sandbox. If something goes wrong, we revoke one account. Same logic you'd apply to any service account in any system.
Teaching It to Stop Writing Like a Robot
Here's a small example that shows how the configuration works in practice.
Every AI writing tool has the same tell: em-dashes everywhere. That long horizontal dash that shows up in every other sentence. AI models use them for asides, for emphasis, for lists, for clause separation. Read any AI-generated text and you'll spot the pattern in seconds. Once you see it, you can't unsee it.
So we banned them. The marcom agent's configuration file has a "Banned Formatting" section that says, plainly: never use em-dashes, for any reason. Use periods, commas, colons, or parentheses instead. The rule is also in the agent's memory file, and in its anti-patterns list. Three places, because the models are stubborn about this one.
This is a trivial example, but it illustrates the real point. The difference between AI output that reads like AI output and AI output that reads like your company wrote it comes down to details like this. Dozens of small, specific rules about what "good" looks like. Accumulated configuration, refined over time, that shapes the output into something you'd actually publish.
The em-dash ban took 30 seconds to add. The first draft of this blog post had 14 of them.
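If you'd rather not trust the model to follow the rule, a post-hoc check takes even less effort. This is a hypothetical helper, not part of the system described above, but it shows how cheap mechanical enforcement is:

```python
EM_DASH = "\u2014"  # the long horizontal dash AI models overuse

def em_dash_count(text: str) -> int:
    """Count em-dashes in a draft; anything above zero fails review."""
    return text.count(EM_DASH)

first_draft = "We built a system\u2014agents, memory, orchestration\u2014on Cursor."
print(em_dash_count(first_draft))  # prints 2
```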
The Honest Take
This system produces real competitive analyses, real financial models, real market research. The kind of work we review, challenge, and use to make actual business decisions.
It's software, configured carefully to do specific work to a specific standard.
The value is in the configuration: telling each agent exactly what "good" looks like in its domain, giving it persistence so it accumulates knowledge over time, connecting the agents so findings flow between them with clear provenance and confidence levels, and setting up the right access controls so the software can do real work safely.
That's the actual work of making AI useful in a business. Choosing the right model for each task. Building the structure around the tools so they produce reliable, trustworthy output. And then verifying that they did.
It's also the kind of work we do for our clients. If you're curious what this looks like applied to your operations, we should talk.
