Rapidflare Blog

Four Generations of the Rapidflare Agent Harness - Spark, Flame, Blaze and Forge

How the Rapidflare agent harness has evolved across four generations — from a simple RAG pipeline in 2023 to a long-running, broader, deeper harness in 2026.

Vasanth Asokan
Contents
  1. v1 — Spark
  2. v2 — Flame
  3. v3 — Blaze
  4. v4 — Forge
  5. Putting it together — every agent, every mode
  6. Where do we go from here

Since the start of Rapidflare, we have shipped four distinct generations of our agent harness. Each generation was a step-change over the previous in what it could accomplish. Breaking that down, each harness improved on its predecessor in:

  • What it knows about the target customer’s world
  • The set of capabilities it is powered with
  • Ultimately, what outcomes and user experiences it enabled

Internally we brand the evolutions as: Spark, Flame, Blaze, and Forge.

  • Spark — simple RAG
  • Flame — one pipeline per query type
  • Blaze — single front door harness, on demand skills
  • Forge — persistent, super powerful harness

The pace of evolution in the agentic world has been breakneck — and we’ve felt every bit of it. Forge is our newest and most capable harness, and it would be easy to just talk about where it’s taking us. But we think the evolution itself is worth telling. Each generation exposed us to hard problems, limitations, reliable techniques and approaches. Equipped with that knowledge, as well as the decisions we made, and sometimes got wrong, we were able to shape what came next.


v1 — Spark

Late 2023 → mid 2024. Simple RAG done carefully.

Rapidflare started with a retrieval-augmented generation pipeline. The interesting work wasn’t in the diagram — the diagram is well-known — it was in making each stage actually carry weight on real electronics-distribution content.

v1 — Spark · Linear RAG pipeline
User question → 1. Query Rewriter / Enricher (interpret latest turn, expand abbreviations, canonicalize part numbers, pull conversation context) → 2. Retrieval + Rerank (hybrid vector + keyword search, cross-encoder rerank) → 3. Context Formatter (arrange the context in a logical fashion) → 4. Answer Generator (grounded LLM answer that cites the retrieved sources) → Answer + citations

The pipeline:

  • A Query Rewriter / Enricher interprets the user's latest turn in the context of the conversation, expands abbreviations, canonicalizes part numbers, and pulls forward the conversation context needed for the current turn.
  • A Retrieval + Rerank stage runs a hybrid (vector + keyword) search across a per-customer knowledge base, followed by a cross-encoder rerank pass.
  • A Context Formatter arranges the retrieved context in a logical fashion.
  • A final Answer Generator step invokes an LLM to produce a grounded answer that cites the retrieved sources.
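The whole Spark shape fits in a handful of functions. Here is a minimal sketch of that linear pipeline; the stage names mirror the list above, but the placeholder corpus and function bodies are illustrative stand-ins, not Rapidflare's actual code:

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str
    source: str
    score: float

def rewrite_query(turn: str, history: list[str]) -> str:
    """Expand abbreviations, canonicalize part numbers, pull forward context."""
    # Placeholder: a real implementation would call an LLM with the history.
    return turn.strip()

def retrieve_and_rerank(query: str, top_k: int = 5) -> list[RetrievedChunk]:
    """Hybrid (vector + keyword) search followed by a cross-encoder rerank."""
    # Placeholder corpus stands in for the per-customer knowledge base.
    corpus = [RetrievedChunk("Part X operates at 3.3 V.", "datasheet-x.pdf", 0.9)]
    return sorted(corpus, key=lambda c: c.score, reverse=True)[:top_k]

def format_context(chunks: list[RetrievedChunk]) -> str:
    """Arrange retrieved chunks, tagged with their sources, for the prompt."""
    return "\n".join(f"[{c.source}] {c.text}" for c in chunks)

def answer(turn: str, history: list[str]) -> str:
    """Run the four stages in order: rewrite -> retrieve -> format -> generate."""
    query = rewrite_query(turn, history)
    context = format_context(retrieve_and_rerank(query))
    # A real Answer Generator would prompt an LLM with `context` and cite sources.
    return f"Q: {query}\nContext:\n{context}"
```

The appeal, and the limitation, is visible in the shape itself: one fixed path, with no branch point where a spec lookup and a competitive comparison could diverge.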

What Spark got right:

  • Tight grounding — every claim backed by a retrieved chunk
  • Cheap, fast, predictable — one pass, one answer

Where Spark hit walls:

  • A single one-size-fits-all retrieval pipeline doesn't elegantly handle content-source diversity or query-specific nuances. A specification lookup is very different from a competitive comparison, and the approach for answering each can differ significantly. Generation quality plateaued because retrieval was forced to stay generic.
  • We also hit walls with off-the-shelf rerankers. They were good at scoring each text chunk relative to the other chunks, but not in a way that stayed aligned with the intent of the original query.
  • Customer-specific needs and nuances were significant, and the very simple pipeline did not provide enough hooks to address those concerns efficiently.

v2 — Flame

Mid 2024 → early 2025. Query-typed static context engineering pipelines.

Learning from Spark, we decided to get more opinionated about query shape and architect purpose-built pipelines, each with multiple extension points for bringing in customer-specific nuances. This was Flame, which replaced the single Spark pipeline with a classify-then-route design. We trained a query classifier to bucket every incoming question into one of seven query types common in our domain, and built a dedicated, static context engineering pipeline for each.

Each pipeline could lean into its query type's goals in deeper ways, customizing each step's prompts, retrievers, rerankers, and output generators. Instead of trying to be everything for every question, we codified our insights into what each query type needed, which lifted answer quality across a diverse set of use cases.

One more explicit design direction was to tie the UX to the backend handler pipeline. For instance, a product comparison would emit specific comparison widgets that the UX understood and rendered. We also started treating incoming human messages as "queries" (directives, commentary, formatting or summarization instructions) rather than just questions.

v2 — Flame · One pipeline per query type
User question → Query Classifier → one of seven pipelines: products_spec · products_lookup · products_comparison · products_by_usecase · keyword_lookup · general_qa · agent_capability
Each pipeline owns its own retrievers, prompts, post-processing, citations and rendering widgets.

The routes:

  • products_spec: “What’s the operating voltage of X?” → spec extraction + unit normalization + cite
  • products_lookup: “Tell me about parts with 300 Mbps bandwidth” → catalog fetch, filter based on specification, answer over resulting list
  • products_comparison: “How does X compare to Y?” → resolve both parts, fetch full specs, compute deltas, render a purpose built comparison table
  • products_by_usecase: “What chip should I use for outdoor BLE?” → parse usecase, reason over application-specific requirements, derive specification filters, answer over shortlisted and ranked candidates
  • keyword_lookup: glossary / terminology questions → match + disambiguate + define
  • general_qa: Treat as generalized retrieval augmented generation, flexibly handle anything that’s not one of the earlier query types
  • agent_capability: “What can you do?” → describe self based on observable knowledge and configuration context, promote clarity of capabilities
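To make the classify-then-route shape concrete, here is a minimal sketch. The keyword-based classifier and the three routes shown are hypothetical stand-ins for the trained classifier and the full seven-pipeline catalog:

```python
from typing import Callable

QUERY_TYPES = [
    "products_spec", "products_lookup", "products_comparison",
    "products_by_usecase", "keyword_lookup", "general_qa", "agent_capability",
]

def classify(question: str) -> str:
    """Stand-in for the trained query classifier: bucket a question by keywords."""
    q = question.lower()
    if "compare" in q or " vs " in q:
        return "products_comparison"
    if "what can you do" in q:
        return "agent_capability"
    return "general_qa"  # the flexible fallback route

# Each route owns its own pipeline: prompts, retrievers, rerankers, widgets.
# These lambdas stand in for full Haystack-style pipelines.
PIPELINES: dict[str, Callable[[str], str]] = {
    "products_comparison": lambda q: f"[comparison widget] {q}",
    "agent_capability":    lambda q: "[capability card] I answer product questions.",
    "general_qa":          lambda q: f"[generic RAG] {q}",
}

def route(question: str) -> str:
    """Classify, then dispatch to the dedicated pipeline for that query type."""
    return PIPELINES[classify(question)](question)
```

The widget prefixes hint at the UX tie-in described above: each pipeline emits output the frontend knows how to render.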

We built these on Haystack, which gave us clean component models, first-class Pipeline abstractions, and a routing framework that could instantiate and run the appropriate pipeline on demand.

What Flame unlocked:

  • A massive accuracy jump on the question shapes that mattered most for technical sales — comparisons, spec lookups, use-case selection
  • Faster iteration: changing the comparison prompt didn’t touch the spec lookup prompt, each was independently testable
  • Better UX control — each pipeline could emit its own widget (comparison table, product card, spec table)

Where Flame hit walls:

  • Conversations don’t stay in one bucket. A user starts with a product lookup, drifts into a spec question, and ends in troubleshooting. The classifier-router model fights this: every turn becomes a forced, rigid routing decision, and the per-type pipelines and prompts often fought us when handling the richer surrounding conversation.
  • Adding a new question shape meant adding a new pipeline. The catalog of pipelines grew faster than the team could keep them all sharp.

Rather than constructing more pipelines, we wanted a different unit of capability to work with. Simultaneously, there was a quantum leap in the ability of LLMs to power agentic approaches.

v3 — Blaze

Mid 2025 → today. An agentic approach that has seen numerous internal evolutions.

In 2025, reasoning and tool-calling LLMs started crossing capability thresholds that unlocked a greater ability to orchestrate dynamically. Initial experiments were promising but not 100% reliable — tool calling was still unpredictable, hard to tune or constrain, and inefficient. Towards the end of 2025, frontier models such as Sonnet and Opus 4.5, GPT 5.1 and Gemini 3.0 became much more reliable, allowing us to delegate more directly to a powerful orchestrator LLM. Our reliance on this technique steadily grew over 2025 and culminated in Blaze, the architecture all customer-facing Rapidflare agents run on today.

Our first-class agent solutions — Product Selection, Cross-Reference, Proposals and Tech Support — all share this core harness, but discover and load solution-specific skills to accomplish their outcomes.

v3 — Blaze · Single front door harness, on demand skills
User message → single front-door agent harness: interpret → reason → tools → maybe load skill → reason → act → answer
Always-on base skills: knowledge search · glossary lookup · multiple-choice prompting · citation rules · formatting · tone
On-demand skills (each a SKILL.md + tools.py): product_catalog · troubleshooting · cross_reference · self_reflection
Same harness powers Product Selection, Cross-Reference, Proposals, and Tech Support — different skills, same shape.
  • There is no mode toggle, and no classifier deciding a message’s fate.
  • A single front-door agent harness sees every human message, interprets intent in context, and chooses how to respond.
  • A small set of base skills — knowledge search, glossary lookup, multiple-choice prompting, citation rules, formatting, tone — is always active. They cover the majority of turns without loading anything more.
  • Specialized capabilities live in progressively discovered and loaded skills — for example, product_catalog, troubleshooting, cross_reference, self_reflection. Each skill is packaged with its own colocated, lazily loaded tools.
  • The harness runs an agentic loop — interpret → reason → call tools → maybe load a new skill → reason further with the new tools available → act → … → answer.
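The loop above, with progressive skill loading, can be sketched in a few lines. Everything here is an illustrative stand-in: `llm_step` plays the role of the orchestrator LLM, and the tiny registry stands in for the real skill library:

```python
# Base skills are always active; specialized skills load mid-loop on demand.
BASE_SKILLS = {"knowledge_search", "glossary_lookup", "citation_rules"}

SKILL_LIBRARY = {  # name -> (SKILL.md instructions, colocated tools from tools.py)
    "product_catalog": ("Use the catalog tools to filter parts.", ["catalog_search"]),
    "troubleshooting": ("Walk through symptom -> cause -> fix.", ["kb_search"]),
}

def run_turn(message, loaded_skills, llm_step, max_steps=8):
    """Agentic loop: interpret -> reason -> tools -> maybe load skill -> ... -> answer.

    `loaded_skills` is mutated in place, so a skill loaded on one turn stays
    available on later turns (the cross-turn rehydration described below).
    """
    tools = set(BASE_SKILLS)
    for name in loaded_skills:
        tools.update(SKILL_LIBRARY[name][1])

    for _ in range(max_steps):
        action = llm_step(message, tools)        # orchestrator decides the next step
        if action["type"] == "load_skill":
            name = action["name"]
            loaded_skills.add(name)
            tools.update(SKILL_LIBRARY[name][1])  # new tools join mid-loop
        elif action["type"] == "answer":
            return action["text"]
    return "Reached step budget without an answer."
```

The key property is that the base context stays lean: `product_catalog` costs nothing until the loop actually asks for it.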

What landed in Blaze (and is in production now):

  • Single front door, no mode switching. The same agent harness fluidly handles product selection in turn 1, troubleshooting in turn 2, comparison in turn 3.
  • Progressive skill disclosure. Base context stays lean; we only pay the prompt cost for product_catalog once we need it.
  • Cross-turn skill rehydration. A skill loaded on turn 3 is automatically re-available on turn 4 without a reload round-trip.
  • Customer-configurable skills. The product catalog skill for customer A is different from the skill for customer B.
  • Prompt-cache discipline. We carefully orchestrate how the sequence of System prompt, followed by tool definitions, human / AI / tool message turns, is maintained so as to create the best prompt cache performance. We hold the cache warm both within a turn and across turns. (This sounds boring; it’s a load-bearing piece of the unit economics).
  • Ambient Context - We inject dynamic context via <system-reminder> messages - for instance, the current day / time.
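To illustrate the prompt-cache discipline and ambient-context bullets, here is a hedged sketch of cache-friendly request assembly: keep the prefix (system prompt plus tool definitions) byte-stable, append to history rather than rewriting it, and push volatile context to the tail. The function and field names are assumptions for illustration, not Rapidflare's actual API:

```python
def build_request(system_prompt, tool_defs, history, new_messages, now):
    """Assemble an LLM request so the provider's prompt cache keeps hitting."""
    # 1. Stable prefix: never reorder or rewrite -- any edit here invalidates
    #    the cached span for every subsequent token.
    prefix = [{"role": "system", "content": system_prompt},
              {"role": "system", "content": tool_defs}]

    # 2. Append-only history preserves the cached span both within a turn
    #    (across tool-call round-trips) and across turns.
    body = list(history) + list(new_messages)

    # 3. Volatile ambient context (e.g. current day / time) goes at the tail
    #    as a <system-reminder>, after the cacheable span, never in the prefix.
    body.append({"role": "user",
                 "content": f"<system-reminder>Current time: {now}</system-reminder>"})
    return prefix + body
```

The design choice is simply that everything that changes per request lives after everything that does not.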

What Blaze taught us:

  • A smart and capable harness is crucial precisely so that we can stop thinking about it and focus on the skills.
  • Most of the last mile quality wins come from authoring and maintaining good skills.
  • Putting tools next to skills (a tools.py colocated with SKILL.md) keeps each skill self-contained and reviewable as one unit.
  • Skills are markdown — anyone on the team can author one. We are opening up this capability to our CS team as well, who are closest to customer needs and asks and can, over time, own the skills as IP.
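A skill that is just a SKILL.md next to a tools.py can be loaded generically. This is a sketch under that layout assumption (treating any public function in tools.py as a tool); the loader itself is illustrative, not Rapidflare's implementation:

```python
import importlib.util
from pathlib import Path

def load_skill(skill_dir: Path):
    """Load one self-contained skill directory: SKILL.md + tools.py."""
    # The markdown instructions go into the agent's context when the skill loads.
    instructions = (skill_dir / "SKILL.md").read_text()

    # Import the colocated tools.py as a module named after the skill directory.
    spec = importlib.util.spec_from_file_location(skill_dir.name,
                                                  skill_dir / "tools.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Convention (assumed): every public function in tools.py is a tool.
    tools = {n: f for n, f in vars(module).items()
             if callable(f) and not n.startswith("_")}
    return instructions, tools
```

Because the instructions and the tools travel together, a skill can be reviewed, tested, and versioned as one unit, which is the point of the colocation.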

Blaze is still actively evolving — sub-agent dispatch, full message-history persistence, memory compaction, bounded skill lifecycles, and lazy just-in-time instructions from tools are all in flight.

v4 — Forge

2026 →. Sophisticated work, deep skills, customer-tailored context — online, offline, or scheduled.

Blaze is excellent at conversational, in-the-moment work — a user asks something, the harness does a number of steps including retrieval and tool use, and answers within seconds or a minute at most.

Now comes the next frontier. Forge is for the work that:

  • Doesn’t finish in seconds. Work that can run for hours or days, and requires numerous sleep / wake cycles.
  • Involves sophisticated, end-to-end tasks that use the file system, computers, browsers, programming languages and cloud execution environments.
    • This lets us execute tasks on behalf of our customers that cannot be pre-defined.
    • It also lets us expose a large library of skills, including skills and tools authored by the community at large.
  • Needs more sophisticated customer-specific context injection, integrations (connectors), ACL policies and more.
    • The connectors in particular reach across many sources of capability: MCP tools, internal APIs, browsers, the customer’s CRM, email, Slack, code repos.

So we set out to build a powerful, persistent, highly capable agentic harness that runs in private sandboxes (one per invocation) and blends Rapidflare’s product intelligence with powerful primitives, including LLMs, cloud compute, storage, and network. This harness can run online in the moment, offline in the background, or on a schedule or trigger.

v4 — Forge · Long-running harness, blending product intelligence with the Claude SDK
Triggers: user chat · webhook · schedule · file change · human handoff
Forge harness: persistent · sandboxed · multi-cycle · plan → act → verify → critique
  • Sandbox: private, per invocation
  • Cadence: hours / days · sleep · wake · resume
  • Primitives: LLMs · file system · compute · storage · network
  • Skills library: Rapidflare + community-authored
  • Sub-agents: specialist dispatch
  • Connectors: MCP, APIs, browser, CRM, email, Slack, repos
  • Product intelligence: knowledge graph + LLM suite
  • Customer context + ACL: per-customer policies
Persistent, sandboxed harness — runs for hours or days across many sleep / wake cycles, blending Rapidflare product intelligence with cloud primitives.

The harness shape stretches in the following directions:

  • Spans many agentic loops, not one
  • Persistent file-system access — a real workspace where the harness can write, read, edit, and accumulate state
  • Can sleep, wake on triggers, and resume — a webhook fires, a schedule hits, a human hands off, a file changes
  • Real plan / act / verify / critique structure rather than a single tool-use loop
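That stretched shape can be sketched as a single resumable cycle: state lives in the workspace file system, each trigger wakes the harness, it acts on the next planned step, checkpoints, and sleeps again. All names and the three-step plan are illustrative, not Forge's actual task format:

```python
import json
from pathlib import Path

def run_cycle(workspace: Path, trigger: str) -> str:
    """One wake cycle of a long-running harness: resume, act, checkpoint, sleep."""
    state_file = workspace / "state.json"

    # Resume from the workspace if a previous cycle checkpointed state there;
    # otherwise start with a fresh (illustrative) plan.
    state = (json.loads(state_file.read_text()) if state_file.exists()
             else {"plan": ["research", "draft", "review"], "done": [], "log": []})

    if state["plan"]:
        step = state["plan"].pop(0)              # act on the next planned step
        state["done"].append(step)
        state["log"].append(f"{trigger}: completed {step}")
        status = "sleeping" if state["plan"] else "finished"
    else:
        status = "finished"

    state_file.write_text(json.dumps(state))     # checkpoint before sleeping
    return status
```

Because every cycle reads and writes the same workspace, it does not matter whether the next wake comes from a webhook hours later or a schedule days later; the harness picks up exactly where it left off.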

Forge is exciting! And frankly, it can do so much that users in our customer base can get overwhelmed with the numerous ways to use it. We are taking a dogfooding approach. The combination of our internal business knowledge and Forge’s deep agentic capability is going to reshape how we work — and in turn how our customers work.

Thus, starting this month, we are using Forge on our own business operations first. A few things we are already using it for or looking to do soon:

  • Sales motion — prospect research, multi-day outreach sequences, prep for sales calls
  • Customer success — onboarding tracking, engagement presentations, data analytics, quality reports

What we are focusing on learning quickly:

  • Paved paths over open canvas. Forge’s open-ended power can be its own UX problem. We are figuring out how to package it as a library of well-paved, well-tested tasks so customers can drop in and get value without first having to learn the harness.
  • Memory & state at length. Long-running harnesses stress-test memory and compaction much harder than chat-shaped ones do. We are exploring persistent, per-customer context graphs that survive across tasks — so Forge doesn’t have to rediscover a hub on every new run.
  • Cost legibility. Forge’s leverage comes with real LLM, compute and storage spend. The ROI is clear to us; we are still working out how to make it equally legible to customers — when to reach for Forge, what each run costs, and what comes back.
  • Governance that stays simple. We are learning what enterprise-grade security, audit and policy controls look like for an autonomous harness — without bolting on yet another complex layer for customers to manage.
  • Observability for autonomous fleets. Operating a set of microservices is one thing; operating a fleet of autonomous agents acting under customer directives is another. We are figuring out the right shape of observability, operations and remediation tooling for this regime.

Putting it together — every agent, every mode

Forge is not a replacement for Blaze. It’s an additional gear. Every Rapidflare agent — Sales, Support, Proposals, Cross-Reference — can run in Blaze mode for in-the-moment turns, or Forge mode for the long-haul work that the same agent should also be able to take on.

This is what the full Rapidflare stack looks like today, with that gearing baked in:

Rapid Sales · Rapid Support · Rapid Proposals · Rapid Cross-Reference
  • Agent UX: Web UI · Slack / Teams · Mobile · Phone · Email · API
  • Agent Harness & Skills: durable agent sandboxes · agent state & memory · orchestration (Blaze · Forge) · per-domain skills catalog
  • Product Intelligence & Infra: knowledge graphs · LLM suite + routing · content DBs · enterprise context · continuously tuned by AI + humans
  • Enterprise Integrations:
    • Read-only knowledge: websites · docs & playbooks · support KBs · Slack / Teams · past proposals · CRM (read)
    • Read / write systems: CRM (write) · CMS · email / Slack · enterprise APIs

A few observations to call out from the diagram:

  • Agent UX — every customer-facing surface (Web UI, Slack/Teams, mobile, phone, email, API) flows into the same harness. We meet customers where their technical sales journeys already happen.
  • Agent Harness and Skills — durable agent sandboxes, agent state and memory, agent orchestration (Blaze and Forge), and a per-domain skills catalog. The harness is the shared substrate; skills are the differentiated capability.
  • Product Intelligence and Infra Layer — knowledge graphs, an LLM suite with routing, content DBs, and an enterprise context system. Continuously tuned by AI + humans.
  • Enterprise Integrations — read-only knowledge sources (websites, docs, playbooks, support KBs, Slack/Teams, past proposals, CRM) and read/write systems (CRM, CMS, email/Slack, enterprise APIs).

Rapid Sales, Rapid Support, Rapid Proposals, Rapid Cross-Reference sit as agentic capabilities at the top, but underneath, the same harness runs them all. Blaze for the conversational turn. Forge for the work that stretches beyond the turn.

Where do we go from here

This post was about our harnesses. The next leap isn’t — it’s about leaving the single-agent, single-user frame behind. If 2026 was the year of SKILLS.md, we are building towards CHARTER.md. A single-harness-with-skills, even at Forge’s scale, is still a single-player game. The real question: what happens when a customer’s deployment becomes multiple agents — each with its own harness and skills — organized around a shared, governed purpose.

Stay tuned for a post on the evolution from skills to charters next!


Want to see Blaze in action on your own technical content? Or want to see if Forge can handle your most challenging tasks? Come talk to us!