Four Generations of the Rapidflare Agent Harness - Spark, Flame, Blaze and Forge
How the Rapidflare agent harness has evolved across four generations — from a simple RAG pipeline in 2023 to a long-running, broader, deeper harness in 2026.
Since the start of Rapidflare, we have shipped four distinct generations of our agent harness. Each generation was a step-change over the previous in what it could accomplish. Specifically, each harness improved upon the last in:
- What it knows about the target customer’s world
- The set of capabilities it is powered with
- Ultimately, what outcomes and user experiences it enabled
Internally we brand the evolutions as: Spark, Flame, Blaze, and Forge.
The pace of evolution in the agentic world has been breakneck — and we’ve felt every bit of it. Forge is our newest and most capable harness, and it would be easy to talk only about where it’s taking us. But we think the evolution itself is worth telling. Each generation exposed us to hard problems, limitations, and the techniques that proved reliable. That knowledge — together with the decisions we made, and sometimes got wrong — shaped what came next.
v1 — Spark
Late 2023 → mid 2024. Simple RAG done carefully.
Rapidflare started with a retrieval-augmented generation pipeline. The interesting work wasn’t in the diagram — the diagram is well-known — it was in making each stage actually carry weight on real electronics-distribution content.
The pipeline:
- A Query Rewriter / Enricher interprets the user’s latest turn in light of the conversation — expanding abbreviations, canonicalizing part numbers, and pulling forward the conversation context needed for the current turn
- A Retrieval + Rerank stage runs a hybrid (vector + keyword) search across a per-customer knowledge base, followed by a cross-encoder rerank pass
- A Context Formatter arranges the context in a logical fashion
- A final Answer Generator step invokes an LLM to provide a grounded answer that cites the retrieved sources
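The four stages compose linearly. Here is a minimal sketch of that shape — every function name and stub implementation below is illustrative (keyword-overlap scoring standing in for hybrid search + rerank, a string template standing in for the LLM calls), not Rapidflare’s actual code:

```python
# Illustrative sketch of the four-stage Spark pipeline.
# All names and stub implementations are hypothetical.

def rewrite_query(history: list[str], latest: str) -> str:
    # Stand-in for the LLM-based rewriter: expand abbreviations,
    # canonicalize part numbers, fold in conversation context.
    return latest.replace("op volt", "operating voltage").strip()

def retrieve_and_rerank(query: str, kb: list[str], top_k: int = 3) -> list[str]:
    # Stand-in for hybrid (vector + keyword) search plus a
    # cross-encoder rerank: here, naive keyword-overlap scoring.
    scored = sorted(kb, key=lambda doc: -sum(w in doc.lower() for w in query.lower().split()))
    return scored[:top_k]

def format_context(chunks: list[str]) -> str:
    # Arrange retrieved chunks with stable citation markers.
    return "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))

def generate_answer(query: str, context: str) -> str:
    # Stand-in for the grounded LLM call; real code would prompt the
    # model to answer only from `context`, citing the [n] markers.
    return f"Answer to '{query}' grounded in:\n{context}"

kb = ["XC-100 operating voltage: 3.3V", "XC-200 supports BLE 5.2", "Pricing sheet Q3"]
query = rewrite_query([], "what's the op volt of XC-100?")
context = format_context(retrieve_and_rerank(query, kb))
print(generate_answer(query, context))
```

The “carry weight” work lived inside each stage — the diagram itself never changed.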
What Spark got right:
- Tight grounding — every claim backed by a retrieved chunk
- Cheap, fast, predictable — one pass, one answer
Where Spark hit walls:
- A single one-size-fits-all retrieval pipeline doesn’t elegantly handle the diversity of content sources or query-specific nuances. A specification lookup is very different from a competitive comparison, and the approach to answering each can differ significantly. Generation quality plateaued because retrieval was forced to stay generic
- We also hit walls with an off-the-shelf reranker. It was good at ranking text chunks relative to one another, but not in a way that was aligned with the original query.
- Customer-specific needs and nuances were significant, and the very simple pipeline offered no hooks where enough of those concerns could be addressed efficiently
v2 — Flame
Mid 2024 → early 2025. Query-typed static context engineering pipelines.
Learning from Spark, we decided to get more opinionated about query shape and architect purpose-built pipelines, each with multiple extension points for bringing in customer-specific nuances. This was Flame, which replaced the single Spark pipeline with a classify-then-route design. We trained a query classifier to bucket every incoming question into one of seven query types common in our domain, and built a dedicated, static context-engineering pipeline for each. Each pipeline could lean into its query type’s goals in deeper ways — customizing its prompts, retrievers, rerankers, and output generators per step. Instead of trying to be everything for every question, we codified our insights into what each question shape needed, which lifted answer quality across a diverse set of use cases. One more explicit design direction was to tie the UX to the backend handler pipeline: a product comparison, for instance, would emit comparison-specific widgets that the UX understood and rendered. We also started treating incoming human messages as “queries” in a broader sense — directives, commentary, formatting or summarization instructions — rather than just questions.
The routes:
- products_spec: “What’s the operating voltage of X?” → spec extraction + unit normalization + cite
- products_lookup: “Tell me about parts with 300 Mbps bandwidth” → catalog fetch, filter on specifications, answer over the resulting list
- products_comparison: “How does X compare to Y?” → resolve both parts, fetch full specs, compute deltas, render a purpose-built comparison table
- products_by_usecase: “What chip should I use for outdoor BLE?” → parse the use case, reason over application-specific requirements, derive specification filters, answer over shortlisted and ranked candidates
- keyword_lookup: glossary / terminology questions → match + disambiguate + define
- general_qa: treat as generalized retrieval-augmented generation; flexibly handle anything that isn’t one of the earlier query types
- agent_capability: “What can you do?” → describe self based on observable knowledge and configuration context, promoting clarity of capabilities
We built these on Haystack, which gave us clean component models, first class Pipeline abstractions, and a routing framework that can instantiate and render appropriate pipelines on demand.
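The classify-then-route front door can be sketched in a few lines. This is the shape of the design, not actual Haystack or Rapidflare code — the keyword classifier below stands in for the trained query classifier, and only three of the seven routes are stubbed:

```python
# Hypothetical sketch of Flame's classify-then-route design.
# Real code used a trained classifier and Haystack Pipelines;
# this keyword stub is purely for illustration.

ROUTES = {
    "products_spec": lambda q: f"spec pipeline handles: {q}",
    "products_comparison": lambda q: f"comparison pipeline handles: {q}",
    "general_qa": lambda q: f"general RAG pipeline handles: {q}",
}

def classify(query: str) -> str:
    # Stand-in for the trained query-type classifier.
    q = query.lower()
    if "compare" in q or " vs " in q:
        return "products_comparison"
    if "voltage" in q or "spec" in q:
        return "products_spec"
    return "general_qa"

def route(query: str) -> str:
    # Every turn is forced into exactly one bucket — the rigidity
    # that ultimately motivated the move to Blaze.
    return ROUTES[classify(query)](query)

print(route("How does X compare to Y?"))
```

Each route owned its own prompts, retrievers, rerankers, and widgets — which is exactly what made them independently tunable, and exactly what made the catalog grow.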
What Flame unlocked:
- A massive accuracy jump on the question shapes that mattered most for technical sales — comparisons, spec lookups, use-case selection
- Faster iteration: changing the comparison prompt didn’t touch the spec lookup prompt, each was independently testable
- Better UX control — each pipeline could emit its own widget (comparison table, product card, spec table)
Where Flame hit walls:
- Conversations don’t stay in one bucket. A user starts with a product lookup, drifts into a spec question, and ends in troubleshooting. The classifier-router model fights this: every turn becomes a forced, rigid routing decision, and within that rigidness the query-type pipelines and prompts often fought us when handling the richer surrounding conversation.
- Adding a new question shape meant adding a new pipeline. The catalog of pipelines grew faster than the team could keep them all sharp.
Rather than constructing more pipelines, we wanted a different unit of capability to work with. Simultaneously, there was a quantum leap in the ability of LLMs to power agentic approaches.
v3 — Blaze
Mid 2025 → today. An agentic approach that’s seen numerous internal evolutions.
In 2025, reasoning and tool-calling LLMs started crossing capability thresholds, unlocking a far greater ability to orchestrate dynamically. Initial experiments were promising but not fully reliable: tool calling was still unpredictable, hard to tune or constrain, and inefficient. Towards the end of 2025, frontier models such as Sonnet and Opus 4.5, GPT 5.1 and Gemini 3.0 became much more reliable, allowing us to delegate more directly to a powerful orchestrator LLM. Our reliance on this technique steadily grew over 2025 and culminated in Blaze, the architecture all customer-facing Rapidflare agents run on today.
Our first-class agent solutions — Product Selection, Cross-Reference, Proposals and Tech Support — all share this core harness, but discover and load solution-specific skills to accomplish their outcomes.
- There is no mode toggle, no classifier deciding their fate.
- A single front-door agent harness sees every human message, interprets intent in context, and chooses how to respond.
- A small set of base skills — knowledge search, glossary lookup, multiple-choice prompting, citation rules, formatting, tone — is always active. They cover the majority of turns without loading anything more.
- Specialized capability lives in progressively discovered and loaded skills — for example `product_catalog`, `troubleshooting`, `cross_reference`, `self_reflection`. We also package each skill with its own set of colocated, lazily loaded tools.
- The harness runs an agentic loop: interpret → reason → call tools → maybe load a new skill → reason further with the new tools available → act → … → answer.
What landed in Blaze (and is in production now):
- Single front door, no mode switching. The same agent harness fluidly handles product selection in turn 1, troubleshooting in turn 2, comparison in turn 3.
- Progressive skill disclosure. Base context stays lean; we only pay the prompt cost for `product_catalog` once we need it.
- Cross-turn skill rehydration. A skill loaded on turn 3 is automatically re-available on turn 4 without a reload round-trip.
- Customer-configurable skills. The product catalog skill for customer A is different from the skill for customer B.
- Prompt-cache discipline. We carefully orchestrate how the sequence of System prompt, followed by tool definitions, human / AI / tool message turns, is maintained so as to create the best prompt cache performance. We hold the cache warm both within a turn and across turns. (This sounds boring; it’s a load-bearing piece of the unit economics).
- Ambient context. We inject dynamic context via `<system-reminder>` messages — for instance, the current day / time.
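To make the prompt-cache point concrete, here is a toy model of why ordering is load-bearing. Prompt caches match on an exact prefix, so stable parts (system prompt, tool definitions) must come first and never mutate; dynamic context is appended after them. The message shapes below are illustrative, not any provider’s real API:

```python
# Toy model of prompt-cache discipline: caches reuse only the
# longest unchanged message prefix, so stable content must lead
# and dynamic content must trail. Message shapes are illustrative.

def build_messages(system: str, tools: list[str], turns: list[str], reminder: str) -> list[str]:
    stable_prefix = [f"system:{system}"] + [f"tool_def:{t}" for t in tools]
    dynamic = [f"turn:{t}" for t in turns] + [f"system-reminder:{reminder}"]
    return stable_prefix + dynamic

def cached_prefix_len(prev: list[str], cur: list[str]) -> int:
    # Length of the shared prefix — the part a prompt cache can reuse.
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n

# Good: the date rides in a trailing reminder, so turn 2 reuses
# the entire system + tools + prior-turn prefix from turn 1.
turn1 = build_messages("You are a sales agent", ["search", "catalog"], ["hi"], "2026-01-05")
turn2 = build_messages("You are a sales agent", ["search", "catalog"], ["hi", "compare X vs Y"], "2026-01-05")

# Bad: baking the date into the system prompt invalidates the
# whole cache the moment the date changes.
bad1 = build_messages("Agent (today: 2026-01-05)", ["search"], ["hi"], "x")
bad2 = build_messages("Agent (today: 2026-01-06)", ["search"], ["hi"], "x")
```

The same reasoning explains keeping the cache warm across turns: append, never rewrite, anything upstream of the newest message.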
What Blaze taught us:
- A smart, capable harness is crucial precisely because it lets you stop thinking about the harness and focus on the skills.
- Most of the last mile quality wins come from authoring and maintaining good skills.
- Putting tools next to skills (a `tools.py` colocated with `SKILL.md`) keeps each skill self-contained and reviewable as one unit.
- Skills are markdown — anyone on the team can author one. We are opening this capability up to our CS team as well, who are closest to customer needs and asks, and who can over time own each skill as IP.
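The colocated layout also makes skill discovery trivial. A minimal sketch, assuming one directory per skill holding its `SKILL.md` and `tools.py` (the loader, file names aside, is hypothetical):

```python
# Hypothetical sketch of discovering colocated skills: each skill
# directory pairs SKILL.md (the prompt) with tools.py (its tools),
# so the pair ships and is reviewed as one unit.
import pathlib
import tempfile

def discover_skills(root: pathlib.Path) -> dict[str, dict[str, str]]:
    skills = {}
    for d in root.iterdir():
        if not d.is_dir():
            continue
        md, tools = d / "SKILL.md", d / "tools.py"
        if md.exists() and tools.exists():
            skills[d.name] = {"prompt": md.read_text(), "tools_src": tools.read_text()}
    return skills

# Build a throwaway skill on disk to show the discovery shape.
root = pathlib.Path(tempfile.mkdtemp())
skill_dir = root / "product_catalog"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("# product_catalog\nUse for catalog queries.")
(skill_dir / "tools.py").write_text("def catalog_fetch(part): ...")

skills = discover_skills(root)
```

Because a skill is just a directory of markdown plus code, reviewing one is a normal pull request — which is what makes handing authorship to non-engineers plausible.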
Blaze is still actively evolving — sub-agent dispatch, full message-history persistence, memory compaction, bounded skill lifecycle, and lazy just-in-time instructions from tools are all in flight.
v4 — Forge
2026 →. Sophisticated work, deep skills, customer-tailored context — online, offline, or scheduled.
Blaze is excellent at conversational, in-the-moment work — a user asks something, the harness does a number of steps including retrieval and tool use, and answers within seconds or a minute at most.
Now comes the next frontier. Forge is for the work that:
- Doesn’t finish in seconds. Work that can run for hours or days.
- Requires numerous sleep / wake cycles
- Sophisticated, end-to-end tasks that use the file system, computers, browsers, programming languages and cloud execution environments.
- This lets us execute tasks on behalf of our customers that can’t be pre-defined
- It also lets us expose a large library of skills, including skills and tools authored by the community at large
- It supports more sophisticated customer-specific context injection, integrations (connectors), ACL policies and more
- The connectors in particular reach across many sources of capability — MCP tools, internal APIs, browsers, the customer’s CRM, email, Slack, code repos
So we set out to build a powerful, persistent, highly capable agentic harness that runs in private per-invocation sandboxes and blends Rapidflare’s product intelligence with powerful primitives — LLMs, cloud compute, storage, network. The harness can run online in the moment, offline in the background, or on a schedule or trigger.
The harness shape stretches in the following directions:
- Spans many agentic loops, not one
- Persistent file-system access — a real workspace where the harness can write, read, edit, and accumulate state
- Can sleep, wake on triggers, and resume — a webhook fires, a schedule hits, a human hands off, a file changes
- Real plan / act / verify / critique structure rather than a single tool-use loop
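The sleep / wake / resume shape reduces to checkpointing: persist the plan state to the workspace, yield the sandbox, and let any trigger resume from the last step. A minimal sketch — the plan, step names, and trigger are all hypothetical, and a JSON file stands in for the persistent workspace:

```python
# Illustrative sketch of Forge-style sleep/wake: a long-running
# task checkpoints its state to the workspace, sleeps when blocked
# on the world, and a trigger (webhook, schedule, human handoff)
# resumes it from where it left off. All names are hypothetical.
import json
import pathlib
import tempfile

workspace = pathlib.Path(tempfile.mkdtemp())
CHECKPOINT = workspace / "checkpoint.json"

PLAN = ["research_prospect", "draft_outreach", "await_reply", "send_followup"]

def run_until_blocked(state: dict) -> dict:
    # plan / act / verify: execute steps until one must wait on the world.
    while state["step"] < len(PLAN):
        step = PLAN[state["step"]]
        if step == "await_reply" and not state.get("reply_received"):
            state["status"] = "sleeping"  # persist and yield the sandbox
            break
        state["done"].append(step)
        state["step"] += 1
    else:
        state["status"] = "complete"
    CHECKPOINT.write_text(json.dumps(state))  # durable workspace state
    return state

def wake(trigger: str) -> dict:
    # A trigger fires; rehydrate the checkpoint and keep going.
    state = json.loads(CHECKPOINT.read_text())
    if trigger == "reply_webhook":
        state["reply_received"] = True
        state["step"] += 1  # the awaited step is now satisfied
    return run_until_blocked(state)

state = run_until_blocked({"step": 0, "done": [], "status": "running"})
state = wake("reply_webhook")
```

The real harness layers verification and critique over each step and supports many concurrent loops, but the checkpoint-and-trigger skeleton is what lets work span hours or days without holding compute the whole time.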
Forge is exciting! And frankly, it can do so much that users in our customer base can get overwhelmed with the numerous ways to use it. We are taking a dogfooding approach. The combination of our internal business knowledge and Forge’s deep agentic capability is going to reshape how we work — and in turn how our customers work.
Thus, starting this month, we are using Forge on our own business operations first. A few things we are already using it for or looking to do soon:
- Sales motion — prospect research, multi-day outreach sequences, prep for sales calls
- Customer success — onboarding tracking, engagement presentations, data analytics, quality reports
What we are focusing on learning quickly:
- Paved paths over open canvas. Forge’s open-ended power can be its own UX problem. We are figuring out how to package it as a library of well-paved, well-tested tasks so customers can drop in and get value without first having to learn the harness.
- Memory & state at length. Long-running harnesses stress-test memory and compaction much harder than chat-shaped ones do. We are exploring persistent, per-customer context graphs that survive across tasks — so Forge doesn’t have to rediscover a hub on every new run.
- Cost legibility. Forge’s leverage comes with real LLM, compute and storage spend. The ROI is clear to us; we are still working out how to make it equally legible to customers — when to reach for Forge, what each run costs, and what comes back.
- Governance that stays simple. We are learning what enterprise-grade security, audit and policy controls look like for an autonomous harness — without bolting on yet another complex layer for customers to manage.
- Observability for autonomous fleets. Operating a set of microservices is one thing; operating a fleet of autonomous agents acting under customer directives is another. We are figuring out the right shape of observability, operations and remediation tooling for this regime.
Putting it together — every agent, every mode
Forge is not a replacement for Blaze. It’s an additional gear. Every Rapidflare agent — Sales, Support, Proposals, Cross-Reference — can run in Blaze mode for in-the-moment turns, or Forge mode for the long-haul work that the same agent should also be able to take on.
This is what the full Rapidflare stack looks like today, with that gearing baked in:
A few observations to call out from the diagram:
- Agent UX — every customer-facing surface (Web UI, Slack/Teams, mobile, phone, email, API) flows into the same harness. We meet customers where their technical sales journeys already happen.
- Agent Harness and Skills — durable agent sandboxes, agent state and memory, agent orchestration (Blaze and Forge), and a per-domain skills catalog. The harness is the shared substrate; skills are the differentiated capability.
- Product Intelligence and Infra Layer — knowledge graphs, an LLM suite with routing, content DBs, and an enterprise context system. Continuously tuned by AI + humans.
- Enterprise Integrations — read-only knowledge sources (websites, docs, playbooks, support KBs, Slack/Teams, past proposals, CRM) and read/write systems (CRM, CMS, email/Slack, enterprise APIs).
Rapid Sales, Rapid Support, Rapid Proposals, Rapid Cross-Reference sit as agentic capabilities at the top, but underneath, the same harness runs them all. Blaze for the conversational turn. Forge for the work that stretches beyond the turn.
Where do we go from here
This post was about our harnesses. The next leap isn’t — it’s about leaving the single-agent, single-user frame behind. If 2026 was the year of SKILLS.md, we are building towards CHARTER.md. A single harness with skills, even at Forge’s scale, is still a single-player game. The real question: what happens when a customer’s deployment becomes multiple agents — each with its own harness and skills — organized around a shared, governed purpose?
Stay tuned for a post on the evolution from skills to charters next!
Want to see Blaze in action on your own technical content? Or want to see if Forge can handle your most challenging tasks? Come talk to us!