Four Generations of the Rapidflare Agent Harness - Spark, Flame, Blaze and Forge
How the Rapidflare agent harness has evolved across four generations — from a simple RAG pipeline in 2023 to a long-running, broader, deeper harness in 2026.
Since the start of Rapidflare, we have shipped four distinct generations of our agent harness. Each generation was a step-change over the previous in what it could accomplish. Specifically, each harness improved upon the last in:
- What it knows about the target customer’s world
- The set of capabilities it is powered with
- Ultimately, what outcomes and user experiences it enabled
Internally we brand the evolutions as: Spark, Flame, Blaze, and Forge.
The pace of evolution in the agentic world has been breakneck — and we’ve felt every bit of it. Forge is our newest and most capable harness, and it would be easy to talk only about where it’s taking us. But we think the evolution itself is worth telling. Each generation exposed us to hard problems, limitations, and the techniques that proved reliable. That knowledge — together with the decisions we made, and sometimes got wrong — shaped what came next.
v1 — Spark
Late 2023 → mid 2024. Simple RAG done carefully.
Rapidflare started with a retrieval-augmented generation pipeline. The interesting work wasn’t in the diagram — the diagram is well-known — it was in making each stage actually carry weight on real electronics-distribution content.
The pipeline:
- A Query Rewriter / Enricher interprets the user’s latest turn in light of the conversation — expanding abbreviations, canonicalizing part numbers, and pulling forward the conversation context needed for the current turn
- A Retrieval + Rerank stage runs a hybrid (vector + keyword) search across a per-customer knowledge base, followed by a cross-encoder rerank pass
- A Context Formatter arranges the context in a logical fashion
- A final Answer Generator step invokes an LLM to provide a grounded answer that cites the retrieved sources
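The four stages compose linearly. Here is a minimal sketch of that shape — every function name and stub implementation below is illustrative (keyword-overlap scoring standing in for hybrid search + rerank, a string template standing in for the LLM calls), not Rapidflare’s actual code:

```python
# Illustrative sketch of the four-stage Spark pipeline.
# All names and stub implementations are hypothetical.

def rewrite_query(history: list[str], latest: str) -> str:
    # Stand-in for the LLM-based rewriter: expand abbreviations,
    # canonicalize part numbers, fold in conversation context.
    return latest.replace("op volt", "operating voltage").strip()

def retrieve_and_rerank(query: str, kb: list[str], top_k: int = 3) -> list[str]:
    # Stand-in for hybrid (vector + keyword) search plus a
    # cross-encoder rerank: here, naive keyword-overlap scoring.
    scored = sorted(kb, key=lambda doc: -sum(w in doc.lower() for w in query.lower().split()))
    return scored[:top_k]

def format_context(chunks: list[str]) -> str:
    # Arrange retrieved chunks with stable citation markers.
    return "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))

def generate_answer(query: str, context: str) -> str:
    # Stand-in for the grounded LLM call; real code would prompt the
    # model to answer only from `context`, citing the [n] markers.
    return f"Answer to '{query}' grounded in:\n{context}"

kb = ["XC-100 operating voltage: 3.3V", "XC-200 supports BLE 5.2", "Pricing sheet Q3"]
query = rewrite_query([], "what's the op volt of XC-100?")
context = format_context(retrieve_and_rerank(query, kb))
print(generate_answer(query, context))
```

The “carry weight” work lived inside each stage — the diagram itself never changed.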
What Spark got right:
- Tight grounding — every claim backed by a retrieved chunk
- Cheap, fast, predictable — one pass, one answer
Where Spark hit walls:
- A single one-size-fits-all retrieval pipeline doesn’t elegantly handle the diversity of content sources or query-specific nuances. A specification lookup is very different from a competitive comparison, and the approach to answering each can differ significantly. Generation quality plateaued because retrieval was forced to stay generic
- We also hit walls with an off-the-shelf reranker. It was good at ranking text chunks relative to one another, but not in a way that was aligned with the original query.
- Customer-specific needs and nuances were significant, and the very simple pipeline offered no hooks where enough of those concerns could be addressed efficiently
v2 — Flame
Mid 2024 → early 2025. Query-typed static context engineering pipelines.
Learning from Spark, we decided to get more opinionated about query shape and architect purpose-built pipelines, each with multiple extension points for bringing in customer-specific nuances. This was Flame, which replaced the single Spark pipeline with a classify-then-route design. We trained a query classifier to bucket every incoming question into one of seven query types common in our domain, and built a dedicated, static context-engineering pipeline for each. Each pipeline could lean into its query type’s goals in deeper ways — customizing its prompts, retrievers, rerankers, and output generators per step. Instead of trying to be everything for every question, we codified our insights into what each question shape needed, which lifted answer quality across a diverse set of use cases. One more explicit design direction was to tie the UX to the backend handler pipeline: a product comparison, for instance, would emit comparison-specific widgets that the UX understood and rendered. We also started treating incoming human messages as “queries” in a broader sense — directives, commentary, formatting or summarization instructions — rather than just questions.
The routes:
- products_spec: “What’s the operating voltage of X?” → spec extraction + unit normalization + cite
- products_lookup: “Tell me about parts with 300 Mbps bandwidth” → catalog fetch, filter on specifications, answer over the resulting list
- products_comparison: “How does X compare to Y?” → resolve both parts, fetch full specs, compute deltas, render a purpose-built comparison table
- products_by_usecase: “What chip should I use for outdoor BLE?” → parse the use case, reason over application-specific requirements, derive specification filters, answer over shortlisted and ranked candidates
- keyword_lookup: glossary / terminology questions → match + disambiguate + define
- general_qa: treat as generalized retrieval-augmented generation; flexibly handle anything that isn’t one of the earlier query types
- agent_capability: “What can you do?” → describe self based on observable knowledge and configuration context, promoting clarity of capabilities
We built these on Haystack, which gave us clean component models, first class Pipeline abstractions, and a routing framework that can instantiate and render appropriate pipelines on demand.
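The classify-then-route front door can be sketched in a few lines. This is the shape of the design, not actual Haystack or Rapidflare code — the keyword classifier below stands in for the trained query classifier, and only three of the seven routes are stubbed:

```python
# Hypothetical sketch of Flame's classify-then-route design.
# Real code used a trained classifier and Haystack Pipelines;
# this keyword stub is purely for illustration.

ROUTES = {
    "products_spec": lambda q: f"spec pipeline handles: {q}",
    "products_comparison": lambda q: f"comparison pipeline handles: {q}",
    "general_qa": lambda q: f"general RAG pipeline handles: {q}",
}

def classify(query: str) -> str:
    # Stand-in for the trained query-type classifier.
    q = query.lower()
    if "compare" in q or " vs " in q:
        return "products_comparison"
    if "voltage" in q or "spec" in q:
        return "products_spec"
    return "general_qa"

def route(query: str) -> str:
    # Every turn is forced into exactly one bucket — the rigidity
    # that ultimately motivated the move to Blaze.
    return ROUTES[classify(query)](query)

print(route("How does X compare to Y?"))
```

Each route owned its own prompts, retrievers, rerankers, and widgets — which is exactly what made them independently tunable, and exactly what made the catalog grow.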
What Flame unlocked:
- A massive accuracy jump on the question shapes that mattered most for technical sales — comparisons, spec lookups, use-case selection
- Faster iteration: changing the comparison prompt didn’t touch the spec lookup prompt, each was independently testable
- Better UX control — each pipeline could emit its own widget (comparison table, product card, spec table)
Where Flame hit walls:
- Conversations don’t stay in one bucket. A user starts with a product lookup, drifts into a spec question, and ends in troubleshooting. The classifier-router model fights this: every turn becomes a forced, rigid routing decision, and within that rigidness the query-type pipelines and prompts often fought us when handling the richer surrounding conversation.
- Adding a new question shape meant adding a new pipeline. The catalog of pipelines grew faster than the team could keep them all sharp.
Rather than constructing more pipelines, we wanted a different unit of capability to work with. Simultaneously, there was a quantum leap in the ability of LLMs to power agentic approaches.
v3 — Blaze
Mid 2025 → today. An agentic approach that’s seen numerous internal evolutions.
In 2025, reasoning and tool-calling LLMs started crossing capability thresholds, unlocking a far greater ability to orchestrate dynamically. Initial experiments were promising but not fully reliable: tool calling was still unpredictable, hard to tune or constrain, and inefficient. Towards the end of 2025, frontier models such as Sonnet and Opus 4.5, GPT 5.1 and Gemini 3.0 became much more reliable, allowing us to delegate more directly to a powerful orchestrator LLM. Our reliance on this technique steadily grew over 2025 and culminated in Blaze, the architecture all customer-facing Rapidflare agents run on today.
Our first-class agent solutions — Product Selection, Cross-Reference, Proposals and Tech Support — all share this core harness, but discover and load solution-specific skills to accomplish their outcomes.
- There is no mode toggle, no classifier deciding their fate.
- A single front-door agent harness sees every human message, interprets intent in context, and chooses how to respond.
- A small set of base skills — knowledge search, glossary lookup, multiple-choice prompting, citation rules, formatting, tone — is always active. They cover the majority of turns without loading anything more.
- Specialized capability lives in progressively discovered and loaded skills — for example `product_catalog`, `troubleshooting`, `cross_reference`, `self_reflection`. We also package each skill with its own set of colocated, lazily loaded tools.
- The harness runs an agentic loop: interpret → reason → call tools → maybe load a new skill → reason further with the new tools available → act → … → answer.
What landed in Blaze (and is in production now):
- Single front door, no mode switching. The same agent harness fluidly handles product selection in turn 1, troubleshooting in turn 2, comparison in turn 3.
- Progressive skill disclosure. Base context stays lean; we only pay the prompt cost for `product_catalog` once we need it.
- Cross-turn skill rehydration. A skill loaded on turn 3 is automatically re-available on turn 4 without a reload round-trip.
- Customer-configurable skills. The product catalog skill for customer A is different from the skill for customer B.
- Prompt-cache discipline. We carefully orchestrate how the sequence of System prompt, followed by tool definitions, human / AI / tool message turns, is maintained so as to create the best prompt cache performance. We hold the cache warm both within a turn and across turns. (This sounds boring; it’s a load-bearing piece of the unit economics).
- Ambient context. We inject dynamic context via `<system-reminder>` messages — for instance, the current day / time.
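To make the prompt-cache point concrete, here is a toy model of why ordering is load-bearing. Prompt caches match on an exact prefix, so stable parts (system prompt, tool definitions) must come first and never mutate; dynamic context is appended after them. The message shapes below are illustrative, not any provider’s real API:

```python
# Toy model of prompt-cache discipline: caches reuse only the
# longest unchanged message prefix, so stable content must lead
# and dynamic content must trail. Message shapes are illustrative.

def build_messages(system: str, tools: list[str], turns: list[str], reminder: str) -> list[str]:
    stable_prefix = [f"system:{system}"] + [f"tool_def:{t}" for t in tools]
    dynamic = [f"turn:{t}" for t in turns] + [f"system-reminder:{reminder}"]
    return stable_prefix + dynamic

def cached_prefix_len(prev: list[str], cur: list[str]) -> int:
    # Length of the shared prefix — the part a prompt cache can reuse.
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n

# Good: the date rides in a trailing reminder, so turn 2 reuses
# the entire system + tools + prior-turn prefix from turn 1.
turn1 = build_messages("You are a sales agent", ["search", "catalog"], ["hi"], "2026-01-05")
turn2 = build_messages("You are a sales agent", ["search", "catalog"], ["hi", "compare X vs Y"], "2026-01-05")

# Bad: baking the date into the system prompt invalidates the
# whole cache the moment the date changes.
bad1 = build_messages("Agent (today: 2026-01-05)", ["search"], ["hi"], "x")
bad2 = build_messages("Agent (today: 2026-01-06)", ["search"], ["hi"], "x")
```

The same reasoning explains keeping the cache warm across turns: append, never rewrite, anything upstream of the newest message.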
What Blaze taught us:
- A smart, capable harness is crucial precisely because it lets you stop thinking about the harness and focus on the skills.
- Most of the last mile quality wins come from authoring and maintaining good skills.
- Putting tools next to skills (a `tools.py` colocated with `SKILL.md`) keeps each skill self-contained and reviewable as one unit.
- Skills are markdown — anyone on the team can author one. We are opening this capability up to our CS team as well, who are closest to customer needs and asks, and who can over time own each skill as IP.
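The colocated layout also makes skill discovery trivial. A minimal sketch, assuming one directory per skill holding its `SKILL.md` and `tools.py` (the loader, file names aside, is hypothetical):

```python
# Hypothetical sketch of discovering colocated skills: each skill
# directory pairs SKILL.md (the prompt) with tools.py (its tools),
# so the pair ships and is reviewed as one unit.
import pathlib
import tempfile

def discover_skills(root: pathlib.Path) -> dict[str, dict[str, str]]:
    skills = {}
    for d in root.iterdir():
        if not d.is_dir():
            continue
        md, tools = d / "SKILL.md", d / "tools.py"
        if md.exists() and tools.exists():
            skills[d.name] = {"prompt": md.read_text(), "tools_src": tools.read_text()}
    return skills

# Build a throwaway skill on disk to show the discovery shape.
root = pathlib.Path(tempfile.mkdtemp())
skill_dir = root / "product_catalog"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("# product_catalog\nUse for catalog queries.")
(skill_dir / "tools.py").write_text("def catalog_fetch(part): ...")

skills = discover_skills(root)
```

Because a skill is just a directory of markdown plus code, reviewing one is a normal pull request — which is what makes handing authorship to non-engineers plausible.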
Blaze is still actively evolving — sub-agent dispatch, full message-history persistence, memory compaction, bounded skill lifecycle, and lazy just-in-time instructions from tools are all in flight.
v4 — Forge
2026 →. Sophisticated work, deep skills, customer-tailored context — online, offline, or scheduled.
Blaze is excellent at conversational, in-the-moment work — a user asks something, the harness does a number of steps including retrieval and tool use, and answers within seconds or a minute at most.
Now comes the next frontier. Forge is for the work that:
- Doesn’t finish in seconds. Work that can run for hours or days.
- Requires numerous sleep / wake cycles
- Sophisticated, end-to-end tasks that use the file system, computers, browsers, programming languages and cloud execution environments.
- This lets us execute tasks on behalf of our customers that can’t be pre-defined
- It also lets us expose a large library of skills, including skills and tools authored by the community at large
- It supports more sophisticated customer-specific context injection, integrations (connectors), ACL policies and more
- The connectors in particular reach across many sources of capability — MCP tools, internal APIs, browsers, the customer’s CRM, email, Slack, code repos
So we set out to build a powerful, persistent, highly capable agentic harness that runs in private per-invocation sandboxes and blends Rapidflare’s product intelligence with powerful primitives — LLMs, cloud compute, storage, network. The harness can run online in the moment, offline in the background, or on a schedule or trigger.
The harness shape stretches in the following directions:
- Spans many agentic loops, not one
- Persistent file-system access — a real workspace where the harness can write, read, edit, and accumulate state
- Can sleep, wake on triggers, and resume — a webhook fires, a schedule hits, a human hands off, a file changes
- Real plan / act / verify / critique structure rather than a single tool-use loop
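The sleep / wake / resume shape reduces to checkpointing: persist the plan state to the workspace, yield the sandbox, and let any trigger resume from the last step. A minimal sketch — the plan, step names, and trigger are all hypothetical, and a JSON file stands in for the persistent workspace:

```python
# Illustrative sketch of Forge-style sleep/wake: a long-running
# task checkpoints its state to the workspace, sleeps when blocked
# on the world, and a trigger (webhook, schedule, human handoff)
# resumes it from where it left off. All names are hypothetical.
import json
import pathlib
import tempfile

workspace = pathlib.Path(tempfile.mkdtemp())
CHECKPOINT = workspace / "checkpoint.json"

PLAN = ["research_prospect", "draft_outreach", "await_reply", "send_followup"]

def run_until_blocked(state: dict) -> dict:
    # plan / act / verify: execute steps until one must wait on the world.
    while state["step"] < len(PLAN):
        step = PLAN[state["step"]]
        if step == "await_reply" and not state.get("reply_received"):
            state["status"] = "sleeping"  # persist and yield the sandbox
            break
        state["done"].append(step)
        state["step"] += 1
    else:
        state["status"] = "complete"
    CHECKPOINT.write_text(json.dumps(state))  # durable workspace state
    return state

def wake(trigger: str) -> dict:
    # A trigger fires; rehydrate the checkpoint and keep going.
    state = json.loads(CHECKPOINT.read_text())
    if trigger == "reply_webhook":
        state["reply_received"] = True
        state["step"] += 1  # the awaited step is now satisfied
    return run_until_blocked(state)

state = run_until_blocked({"step": 0, "done": [], "status": "running"})
state = wake("reply_webhook")
```

The real harness layers verification and critique over each step and supports many concurrent loops, but the checkpoint-and-trigger skeleton is what lets work span hours or days without holding compute the whole time.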
Forge is exciting! And frankly, it can do so much that users in our customer base can get overwhelmed with the numerous ways to use it. We are taking a dogfooding approach. The combination of our internal business knowledge and Forge’s deep agentic capability is going to reshape how we work — and in turn how our customers work.
Thus, starting this month, we are using Forge on our own business operations first. A few things we are already using it for or looking to do soon:
- Sales motion — prospect research, multi-day outreach sequences, prep for sales calls
- Customer success — onboarding tracking, engagement presentations, data analytics, quality reports
What we are focusing on learning quickly:
- Paved paths over open canvas. Forge’s open-ended power can be its own UX problem. We are figuring out how to package it as a library of well-paved, well-tested tasks so customers can drop in and get value without first having to learn the harness.
- Memory & state at length. Long-running harnesses stress-test memory and compaction much harder than chat-shaped ones do. We are exploring persistent, per-customer context graphs that survive across tasks — so Forge doesn’t have to rediscover a hub on every new run.
- Cost legibility. Forge’s leverage comes with real LLM, compute and storage spend. The ROI is clear to us; we are still working out how to make it equally legible to customers — when to reach for Forge, what each run costs, and what comes back.
- Governance that stays simple. We are learning what enterprise-grade security, audit and policy controls look like for an autonomous harness — without bolting on yet another complex layer for customers to manage.
- Observability for autonomous fleets. Operating a set of microservices is one thing; operating a fleet of autonomous agents acting under customer directives is another. We are figuring out the right shape of observability, operations and remediation tooling for this regime.
Putting it together — every agent, every mode
Forge is not a replacement for Blaze. It’s an additional gear. Every Rapidflare agent — Sales, Support, Proposals, Cross-Reference — can run in Blaze mode for in-the-moment turns, or Forge mode for the long-haul work that the same agent should also be able to take on.
This is what the full Rapidflare stack looks like today, with that gearing baked in:
A few observations to call out from the diagram:
- Agent UX — every customer-facing surface (Web UI, Slack/Teams, mobile, phone, email, API) flows into the same harness. We meet customers where their technical sales journeys already happen.
- Agent Harness and Skills — durable agent sandboxes, agent state and memory, agent orchestration (Blaze and Forge), and a per-domain skills catalog. The harness is the shared substrate; skills are the differentiated capability.
- Product Intelligence and Infra Layer — knowledge graphs, an LLM suite with routing, content DBs, and an enterprise context system. Continuously tuned by AI + humans.
- Enterprise Integrations — read-only knowledge sources (websites, docs, playbooks, support KBs, Slack/Teams, past proposals, CRM) and read/write systems (CRM, CMS, email/Slack, enterprise APIs).
Rapid Sales, Rapid Support, Rapid Proposals, Rapid Cross-Reference sit as agentic capabilities at the top, but underneath, the same harness runs them all. Blaze for the conversational turn. Forge for the work that stretches beyond the turn.
Where do we go from here
This post was about our harnesses. The next leap isn’t — it’s about leaving the single-agent, single-user frame behind. If 2026 was the year of SKILLS.md, we are building towards CHARTER.md. A single harness with skills, even at Forge’s scale, is still a single-player game. The real question: what happens when a customer’s deployment becomes multiple agents — each with its own harness and skills — organized around a shared, governed purpose?
Stay tuned for a post on the evolution from skills to charters next!
Want to see Blaze in action on your own technical content? Or want to see if Forge can handle your most challenging tasks? Come talk to us!