The Rapidflare Fire Shield, Part II: Beyond the LLM
Part II of the Rapidflare Fire Shield series — WAF, reCAPTCHA, edge controls, human review, monitoring, and external pen-test layers around our AI pipeline.
Contents
- What it takes to run an AI assistant on the public web
- Why one layer is not enough
- Layer 1: Web-front abuse controls at the edge
- A war story: the SEO-spam bot wave
- Layer 2: AI input and output safety (covered in Part I)
- Layer 3: Continuous monitoring and anomaly detection
- Layer 4: Ongoing human-in-the-loop reviews
- External validation: the other side of the puzzle
- Future Work
Part II of two. Part I covers the AI safety filter at the prompt boundary. This post covers the rest of the Rapidflare Fire Shield: the non-LLM layers that wrap that filter when the assistant is deployed on a public commercial website.
What it takes to run an AI assistant on the public web
When a Rapidflare assistant is deployed inside a customer’s authenticated product (an internal tool, a partner portal, anything sitting behind SSO), the operational picture is relatively contained. Users are identifiable, sessions are accountable, and the class of traffic the assistant has to reason about is fairly narrow. A different picture emerges when the same assistant is embedded on a customer’s public commercial website: a product page, a documentation site, a marketing landing page, a support surface. At that point the assistant is exposed to the open internet, which means it is exposed to everything the open internet sends at a public endpoint: prompt-injection probes, jailbreak attempts, classic web attacks of the kind a WAF handles, bot-driven volume abuse, and a long tail of off-topic traffic that, at scale, amounts to a form of denial of service.
Part I covered the filter at the prompt boundary; here we focus on the layers that sit outside the LLM, which in our experience do the bulk of the work on a public surface. The short version is that no single control is load-bearing on its own: multiple parts come together to create, monitor, and maintain a full safety posture.
Why one layer is not enough
A public web deployment attracts a wide range of threats; why does each one need its own control?
The threats against a public assistant do not fall into a single category, and they cannot be addressed by a single mechanism. Volumetric abuse looks nothing like prompt injection; a WAF pattern for SQL injection doesn’t help with a jailbreak prompt; a semantic off-topic classifier cannot prevent an endpoint probe from a residential-IP botnet. Each class of threat wants its own sensor, and each sensor needs to sit in the right place in the request path — some before the application stack is even reached, others inside the AI pipeline, others after answer generation, others sitting out-of-band on logs and analytics.
The layered architecture that results is the direct consequence of that threat heterogeneity. Below we walk through each non-LLM layer, what it actually does, and where in the pipeline it lives. The LLM-side filter (covered in Part I) sits between the edge layers and the monitoring/analytics layer in this picture. Internal human monitoring and external adversarial validation observe the whole stack from outside.
The pipeline sheds traffic in sequence as a request travels through it. An observability stack runs alongside, capturing signals from every stage so we can see what the pipeline is actually doing. And from the outside, the whole system gets probed: by Customer Success on an ongoing basis, looking at customer-specific traffic patterns and dashboard health, and by an annual external pen test that exercises the web application and API endpoints adversarially. The Fire Shield’s safety properties emerge from the combination, not from any single piece.
Layer 1: Web-front abuse controls at the edge
Most abusive traffic on a public surface should never reach the application stack at all; how do you shed it at the edge?
The first line of defense is the request edge, before the AI pipeline is invoked. A handful of controls operate here:
Rate limiting and throttling. Per-session, per-IP, and per-widget request limits help contain volumetric abuse, scraping, and request-burst patterns. Per-IP throttling in particular is effective against the low-sophistication end of the bot spectrum.
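As an illustration of the mechanism (not our production code; the limits here are invented for the example), a per-IP token bucket looks roughly like this:

```typescript
// Minimal in-memory token bucket, keyed by client IP. Illustrative
// only: a real deployment would hold this state in a shared store
// (e.g. Redis) so limits survive across server instances.
type Bucket = { tokens: number; last: number };

const RATE = 1;   // tokens refilled per second (hypothetical limit)
const BURST = 10; // maximum burst size (hypothetical limit)
const buckets = new Map<string, Bucket>();

function allowRequest(ip: string, now = Date.now()): boolean {
  const b = buckets.get(ip) ?? { tokens: BURST, last: now };
  // Refill proportionally to elapsed time, capped at the burst size.
  b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
  b.last = now;
  if (b.tokens < 1) {
    buckets.set(ip, b);
    return false; // caller should respond 429 Too Many Requests
  }
  b.tokens -= 1;
  buckets.set(ip, b);
  return true;
}
```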
Origin and domain enforcement. The widget is bound to the customer’s approved origins and domains. This helps block embedding, replay, and unauthorized use of the widget from unapproved sites. This category of abuse is easy to overlook because it does not look like an attack on the application; it looks like normal traffic originating from the wrong place.
Domain-bound publishable API keys. Customers generate publishable API keys from the Rapidflare dashboard with configurable expiry. Domain binding (including wildcard origins) ties each key to its allow-listed hostnames, so a key copied off the customer’s site is not usable from an attacker’s own infrastructure. The publishable key is, by design, safe to embed in the customer’s frontend; it is not a gate by itself, which is why the rest of the controls in this layer matter.
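A minimal sketch of what domain binding amounts to at request time, with a hypothetical key record and field names rather than Rapidflare’s actual schema:

```typescript
// Hypothetical record for a publishable key: its allow-listed
// origins (wildcards permitted) and an expiry timestamp.
interface PublishableKey {
  allowedOrigins: string[]; // e.g. ["https://docs.example.com", "https://*.example.com"]
  expiresAt: number;        // epoch millis
}

function escapeRegExp(s: string): string {
  return s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

function originMatches(pattern: string, origin: string): boolean {
  if (!pattern.includes("*")) return pattern === origin;
  // Turn "https://*.example.com" into a regex where "*" matches one label.
  const re = new RegExp(
    "^" + pattern.split("*").map(escapeRegExp).join("[^.]+") + "$"
  );
  return re.test(origin);
}

function validateKey(key: PublishableKey, origin: string, now = Date.now()): boolean {
  if (now > key.expiresAt) return false; // expired keys are rejected outright
  return key.allowedOrigins.some((p) => originMatches(p, origin));
}
```

A key copied off the customer’s site fails this check from any origin outside the allow-list, which is why the publishable key is safe to expose in the frontend.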
AppCheck with invisible reCAPTCHA v3 and single-use tokens. Every widget request carries a cryptographic attestation of the client, paired with a Google reCAPTCHA v3 token that scores the session’s behavior on a 0.0–1.0 risk scale without ever interrupting the user. Each token is single-use, short-lived, and replay-protected. Critically, our backend does the verification — token validity, action match, hostname match, and a score threshold — rather than treating the presence of a token as a pass. This last point is where many naïve reCAPTCHA integrations fail; we’ll come back to it in the war story below.
Session and conversation shaping. Conversation length, turn frequency, and payload size limits reduce the attack surface for automated probing and prompt abuse. These controls help constrain malformed, repetitive, or oversized requests before they reach the application stack. The point here is not to catch a sophisticated attacker; it is to make the cheap attacks expensive.
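These limits reduce to a handful of per-deployment thresholds. A sketch with invented numbers (actual values are tuned per customer):

```typescript
// Illustrative shaping limits; real values vary per deployment.
const LIMITS = {
  maxTurnsPerConversation: 50,
  maxPayloadBytes: 8 * 1024,
  minSecondsBetweenTurns: 1,
};

function withinShape(turnCount: number, payloadBytes: number, secondsSinceLastTurn: number): boolean {
  return (
    turnCount <= LIMITS.maxTurnsPerConversation &&
    payloadBytes <= LIMITS.maxPayloadBytes &&
    secondsSinceLastTurn >= LIMITS.minSecondsBetweenTurns
  );
}
```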
Anti-automation heuristics. Behavioral signals such as request cadence, repetition patterns, and session shape are used to identify likely bot traffic. Suspicious patterns can be flagged, slowed, or blocked before they propagate further.
Cloud-provider edge security and DDoS protection. Rapidflare API servers run on GCP, and Google Cloud Armor is the first stop for traffic entering our VPC. The Cloud Armor security policy layers a per-IP request-rate throttle, a stack of preconfigured WAF deny rules for the OWASP categories described below, separate deny rules for scanner and protocol-level activity, and a default allow rule that fires only when none of the higher-priority deny rules has matched. Network-layer abuse and denial-of-service events are absorbed at the GCP edge before reaching our infrastructure. Malicious requests matching a deny rule are blocked with a 403 at the edge; legitimate traffic continues through normally.
Managed WAF protections. The Cloud Armor deny rules align with the OWASP Top 10 and related threat categories. Coverage includes SQL injection, cross-site scripting (XSS), local file inclusion (LFI), remote file inclusion (RFI), remote code execution (RCE), malicious scanner activity, protocol-level abuse, and session fixation.
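Conceptually, the policy behaves like a priority-ordered rule list with a catch-all allow at the bottom. The toy model below (not Cloud Armor’s actual API; the predicates are placeholders for Google’s managed WAF signatures) shows that evaluation order:

```typescript
// Placeholder predicates standing in for Cloud Armor's managed WAF
// signatures; the real rules are maintained and updated by Google.
const overRateLimit = (_ip: string): boolean => false;
const looksLikeSqli = (body: string): boolean => /\bunion\s+select\b/i.test(body);
const looksLikeXss = (body: string): boolean => /<script\b/i.test(body);

type Verdict = "allow" | "deny-403" | "deny-429";
interface EdgeRequest { ip: string; path: string; body: string }
interface EdgeRule {
  priority: number; // lower number = evaluated first
  matches: (req: EdgeRequest) => boolean;
  action: Verdict;
}

const rules: EdgeRule[] = [
  { priority: 100, matches: (r) => overRateLimit(r.ip), action: "deny-429" },
  { priority: 200, matches: (r) => looksLikeSqli(r.body), action: "deny-403" },
  { priority: 300, matches: (r) => looksLikeXss(r.body), action: "deny-403" },
  // Default allow: fires only when no higher-priority deny matched.
  { priority: 2147483647, matches: () => true, action: "allow" },
];

function evaluate(req: EdgeRequest): Verdict {
  for (const rule of [...rules].sort((a, b) => a.priority - b.priority)) {
    if (rule.matches(req)) return rule.action;
  }
  return "allow";
}
```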
The key takeaway from this layer is that a meaningful fraction of abusive traffic on a public surface is not AI-specific — it is ordinary web attack traffic, and it should be handled with ordinary web defenses before the AI pipeline is engaged.
A war story: the SEO-spam bot wave
A few weeks ago our CS team, as part of their regular health check on agent usage, noticed a pattern of strange inbound queries in the conversation logs of one customer’s public marketing-site deployment. The messages all looked roughly like this:
[tcp4.com]black hat seo kya hai-black hat seo practices911
[tcp4.com]grandbet133
[tcp4.com] betmgm nj phone number SEO934
🔥[joyobet.com]marko kantele-deportivo tachira783
Over the course of a month, around a thousand such queries had hit our agent, all conforming to the same template: [domain.com]<spam phrase><random number>. The domains in the rotation were a mix of tcp4.com, joyobet.com, and a long tail of similar shells. The phrases mixed black-hat SEO terminology, sportsbook brands, and gambling references.
This was not a jailbreak. The attacker was not trying to manipulate the LLM. They were treating a public AI endpoint the same way they treat any public text input on the open web: as a placement surface for commodity SEO spam. The hope was that one of the following would happen: the agent might echo a domain back into a publicly indexed transcript, the site might store the text somewhere searchable, the app might trigger external searches that left traces, or, failing all of that, the endpoint would simply be a low-cost target to fire at in volume. The traffic shape (templated structure, residential-IP origins, headless-browser fingerprints) was consistent with a scripted spam corpus pointed at any input field that takes free text.
This particular customer’s public deployment predated our AppCheck and reCAPTCHA-based protections. By quickly migrating them to the full set of safety controls, we brought the bot traffic down, and the traffic pattern in their analytics returned to the shape of their actual user base. The control doing the main work in this case is the reCAPTCHA mechanism.
Concretely, the reCAPTCHA layer enforces the following (a sketch of the server-side check follows the list):
- The reCAPTCHA token is verified server-side against Google’s assessment API on every request. It is not just checked for presence: hostname and action consistency are validated against the allowed domains, so a token issued for one customer’s site cannot be replayed against another.
- A minimum score threshold is enforced per action; tokens below the threshold are rejected.
- Tokens are single-use and short-lived, which collapses the window for replay attacks.
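A minimal sketch of that server-side check, using the classic siteverify endpoint for brevity (field names follow Google’s documented v3 response; the score threshold is illustrative):

```typescript
// Shape of Google's documented siteverify response for v3 tokens.
interface SiteVerifyResponse {
  success: boolean;
  score?: number;      // 0.0 (likely bot) to 1.0 (likely human)
  action?: string;
  hostname?: string;
  "error-codes"?: string[];
}

const MIN_SCORE = 0.5; // illustrative; tuned per action in practice

async function verifyCaptcha(
  token: string,
  expectedAction: string,
  allowedHostnames: Set<string>,
  secret: string
): Promise<boolean> {
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ secret, response: token }),
  });
  const data = (await res.json()) as SiteVerifyResponse;

  return (
    data.success &&                        // valid, unexpired, not yet consumed
    data.action === expectedAction &&      // issued for this action
    data.hostname !== undefined &&
    allowedHostnames.has(data.hostname) && // issued on an allow-listed site
    (data.score ?? 0) >= MIN_SCORE         // behavioral score above threshold
  );
}
```

Note that Google itself enforces single-use: a replayed token fails the `success` check, which is what collapses the replay window.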
Layer 2: AI input and output safety (covered in Part I)
This is the LLM-side layer of the Fire Shield and is well covered in Part I.
Layer 3: Continuous monitoring and anomaly detection
A stack this layered produces a lot of signal; how do you turn it into operational awareness?
Our systems emit telemetry at all levels: Cloud Armor logs, API request logs, reCAPTCHA usage and assessment logs, end-to-end AI workflow traces, and AI agent usage analytics. Spikes in failed safety checks, unusual session shapes, repeated identical queries, and unexpected origin distributions trigger alerts.
An interesting observation from the war story narrated earlier is that no individual telemetry signal was enough to trigger an alert. Usage traffic did not spike, the number of blocked requests stayed below our thresholds, and, since reCAPTCHA had not yet been enabled for that deployment, it neither blocked requests nor emitted block signals.
This is an all-too-common pattern in large-scale system observability and operations. All the logs and alerts in the world cannot warn you about a new scenario expressed as a new combination of signals. Teams learn by encountering those scenarios in practice, and then instrument their systems to catch the pattern going forward.
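To make that concrete, here is the kind of after-the-fact instrumentation the SEO-spam wave produced: a structural signature for the [domain.com]<spam phrase><random number> template, counted per time window. The names and thresholds are illustrative, not our production code.

```typescript
// Structural signature for the spam template observed in the wave:
// optional leading symbols, a bracketed domain, free text, and a
// trailing multi-digit number.
const SPAM_SHAPE = /^\W{0,3}\[[a-z0-9.-]+\.[a-z]{2,}\].*\d{2,}\s*$/i;

const WINDOW_THRESHOLD = 25; // matches per window before we alert (illustrative)

function countSpamShaped(messages: string[]): number {
  return messages.filter((m) => SPAM_SHAPE.test(m)).length;
}

function shouldAlert(messagesInWindow: string[]): boolean {
  // Each message is individually unremarkable; the cluster of
  // identically shaped queries is what constitutes the signal.
  return countSpamShaped(messagesInWindow) >= WINDOW_THRESHOLD;
}
```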
Layer 4: Ongoing human-in-the-loop reviews
If the stack and the analytics are doing their job, why is a human still in the loop?
Because the threat profile and attack patterns keep varying, and no static set of rules — or set of classifiers — keeps up with that on its own. It pays to have humans actively reviewing the system: skimming dashboards, sampling conversation logs, looking at the shape of inbound traffic against a baseline, and asking whether the controls in place still describe what is actually happening on the wire. Spot checks routinely catch things that look unremarkable to any individual layer but stand out to a person looking across all of them.
The war story above is the cleanest example we have of why this matters. A human looking at the analytics is what closed the loop. The standing principle is that we keep learning by keenly observing how customer assistants are actually used in production, and by being willing to act on what we see. The Fire Shield is a moving target, not a checklist.
External validation: the other side of the puzzle
How much of this has been tested adversarially by someone other than the team that built it?
The four layers above are the request-path controls and the in-house operational practices around them. There is one more piece of the puzzle, and it is load-bearing for the whole picture: independent adversarial testing.
Static controls and internal benchmarks have an obvious problem. The team that designs the controls is also the team that measures them, which tends to produce confident numbers that do not always hold up under real adversarial conditions. Rapidflare addresses this by running an annual third-party penetration test of the web application and API endpoints, with targeted re-tests after material web-stack changes.
Our testing partner is Workstreet, the cybersecurity firm also engaged by Cursor, Clay, Granola, Exa, and Black Forest Labs, and Vanta’s #1 MSP for SOC 2 and ISO programs. Testing coverage on the web side includes the OWASP Web Application Top 10, application security testing (SAST, DAST, SCA across the development lifecycle), vulnerability scanning of the deployment environment, and network-layer penetration testing against exposed endpoints. Rapidflare holds SOC 2 Type II, with the independent service auditor’s report covering threat and vulnerability management among the Trust Services Criteria.
Future Work
There is exciting future work for us in this area. With the advent of agentic systems, there is a significant opportunity to build agentic observability that catches patterns we have not explicitly coded into our checks. The edge layer, the AI filter, and the analytics signals each tell a partial story; today, a human looking at those patterns together is what spots the gaps (which is what our keen-eyed customer success engineer did). Agentic observability points at a future where AI works behind the scenes 24x7 to spot new patterns and attempt to self-remediate.
It is clear that AI-specific threat vectors will evolve significantly in the next few years. Those challenges present an opportunity, not just for Rapidflare but for the broader industry, to innovate on scalable and reliable mechanisms for handling that threat surface.
Security documentation referenced in this post — the penetration test report, the SOC 2 Type II report, and CAIQ-aligned questionnaire responses — is available to customers under NDA. Customers deploying on a commercial website should request these early in procurement so that security review runs in parallel with the technical rollout rather than after.