Handle Bounce Back Messages for AI Agents
Detect, parse, & handle bounce back messages for AI agents. This developer guide covers retry logic & suppression lists using Robotomail.

Your agent sent the email. The task moved on. Hours later, nothing happened.
That failure mode is common in AI systems that treat email as a fire-and-forget side effect instead of a stateful channel. A bounced message isn’t just a delivery problem. It can break a handoff, kill a thread, stall an approval, or leave your agent waiting on a reply that will never come.
Traditional advice on bounce back messages assumes a human operator will log into a dashboard, inspect a report, and clean up a list later. That model doesn't fit an autonomous workflow. Agents need machine-readable failure signals, immediate routing, and durable rules for what happens next.
Why Silent Email Failures Cripple AI Agents
A human notices when email goes wrong. An agent usually doesn't.
If your support bot sends a follow-up to confirm a refund, your procurement agent emails a vendor for a quote, or your ops agent reaches out to a customer contact with a scheduling update, a bad address can subtly collapse the workflow. The message bounces. No reply arrives. The agent keeps waiting or keeps retrying the wrong thing.

Most documentation still treats bounce handling like a marketing hygiene chore. That's the wrong mental model for agent systems. Most existing guidance assumes human-managed email systems, while AI agents need automated, code-level handling. Industry data also suggests that 20-30% of bounces are soft and recoverable with automation (Network Solutions on common bounceback messages).
What breaks when the agent never sees the bounce
The damage isn't limited to one failed send.
- Conversation state goes stale: Your agent thinks it contacted someone and may continue reasoning from a false assumption.
- Fallback logic never triggers: If bounce events don't enter your app, the agent can't switch channels, request a corrected address, or escalate.
- Sender reputation erodes: Repeated sends to bad addresses teach providers to distrust your mail.
- Debugging gets ugly: Teams inspect prompts, tools, and memory layers when the actual failure was an undetected delivery rejection.
This problem often appears alongside basic inbound issues. If you're already debugging why an agent can send but not receive, the underlying workflow gap usually extends to bounce handling too. That pattern shows up clearly in this breakdown of send-without-receive email setups.
Silent bounces create fake success. That's worse than an explicit error because your agent keeps planning around a message that never arrived.
The scale makes this infrastructure, not edge-case cleanup
Email volume is too large to treat bounce back messages as a rare exception. In 2024, about 361.1 billion emails were sent per day globally, with an average bounce rate of 2.33%, which works out to roughly 97,000 bounces every second. The same benchmark also notes that keeping bounce rate below 2% is considered excellent, while rates above 5% can put you in risky territory for blacklisting (No2Bounce global bounce analysis).
For agent builders, the practical takeaway is simple. If your system sends email autonomously, bounce handling belongs in core workflow design. Not in ops cleanup. Not in a future dashboard. In the runtime path.
Anatomy of a Bounce Message
A bounce message looks messy if you read it like an email. It becomes usable once you treat it as structured transport data.
The useful parts are usually buried inside a Delivery Status Notification, or DSN. That's the machine-readable section generated by mail servers to explain why delivery failed. Your application shouldn't care much about the decorative text around it. It should care about the action code, status code, recipient, and diagnostic detail.
The two categories that matter
The basic split still matters, but only as the first branch in your logic.
A hard bounce is a permanent failure. The address doesn't exist, the domain is invalid, or the recipient system rejected delivery in a way that won't improve by trying again. In practice, your agent should stop sending to that address.
A soft bounce is temporary. The mailbox may be unavailable, the server may be rate limiting, or the receiving side may be having a transient issue. These are candidates for automated retry.
Read the status line, not the apology text
Mail servers often include vague human prose like "message could not be delivered" or "try again later." Don't build logic on that copy. Build it on the status code.
A few patterns matter most:
| Field | Example | What your code should infer |
|---|---|---|
| SMTP class | 5xx | Permanent failure. Suppress. |
| SMTP class | 4xx | Temporary failure. Retry with policy. |
| DSN code | 5.1.1 | User unknown. Treat as hard bounce. |
| DSN code | 4.4.1 | Persistent transient failure. Treat as soft bounce. |
| SMTP response | 550 | Often maps to invalid recipient or rejection. Inspect context and suppress if permanent. |
A practical way to inspect a bounce
When I review bounce payloads in production systems, I look for this order of trust:
- Bounce classification from the provider
- DSN status code
- SMTP response code
- Diagnostic text for logs and human review
That ordering matters because free-form text varies wildly between providers. Code fields are more stable.
Practical rule: If your parser depends on wording like "mailbox full" or "user unknown" in the message body, it will break across providers. Parse structured fields first.
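As a sketch, that order of trust can be expressed directly in code. The payload shape here (a `bounce` object with `type`, `dsn_status`, and `smtp_code` fields) is an assumption; map it to whatever your provider actually sends.

```python
def classify_with_trust_order(payload):
    """Classify a bounce using the order of trust above.

    The payload keys (a "bounce" object with "type", "dsn_status",
    "smtp_code") are assumptions; adapt them to your provider's schema.
    """
    bounce = payload.get("bounce", {})

    # 1. Provider-supplied classification, when available.
    if bounce.get("type") in ("hard", "soft"):
        return bounce["type"]

    # 2. DSN status code: 5.x.x is permanent, 4.x.x is transient.
    dsn = bounce.get("dsn_status") or ""
    if dsn.startswith("5."):
        return "hard"
    if dsn.startswith("4."):
        return "soft"

    # 3. SMTP reply class: 5xx permanent, 4xx temporary.
    smtp = str(bounce.get("smtp_code") or "")
    if smtp.startswith("5"):
        return "hard"
    if smtp.startswith("4"):
        return "soft"

    # 4. Diagnostic text stays in logs; never branch on it.
    return "unknown"
```

Note that free-form diagnostic text never influences the result; it only travels along for logging and human review.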
What a parsed bounce object should contain
Your normalized event should be boring. That's good.
At minimum, keep these fields:
- Recipient address: The address your system attempted to reach.
- Bounce type: Hard or soft.
- SMTP code: The top-level response category.
- DSN status: More specific machine-readable failure detail.
- Diagnostic message: Useful for logs, support, and edge-case inspection.
- Message identifier: So you can connect the bounce to the original agent action.
- Timestamp: For retry policy, audit trails, and workflow timing.
If you don't normalize these fields into a single internal schema, every downstream component ends up re-parsing provider-specific payloads. That creates brittle code and conflicting behavior between teams.
Why developers should care about the boring details
The distinction between 5.1.1 and 4.4.1 isn't academic. One means your agent should stop and correct the destination. The other means your system should preserve context and try again later.
Treating all bounce back messages the same produces two bad outcomes. Either you churn valid contacts too early, or you keep hammering invalid ones and degrade deliverability. Both are expensive in agent workflows because both feed bad state back into the planner.
Capturing and Parsing Bounces Programmatically
Polling an inbox for nondelivery reports is the old way. It works poorly for autonomous systems.
You want bounce events delivered into your app as they happen, ideally as signed webhook payloads. That lets your runtime update task state immediately, queue retries, suppress invalid recipients, and preserve thread context without a human reading a mailbox.

If you're wiring this into an agent stack, start with a webhook endpoint and make signature verification mandatory. The Robotomail webhooks concept docs are the kind of reference I like for this pattern because they focus on event delivery, integrity, and app-side handling instead of forcing you into mailbox polling.
What your endpoint should do on day one
Keep the first version narrow. It only needs to do four things well:
- Accept the POST
- Verify the HMAC signature
- Parse the JSON payload
- Write a normalized event to durable storage or a queue
Don't trigger complex business logic before you've made event ingestion reliable. Teams often cram suppression, retries, CRM writes, and analytics into the webhook handler itself. Then one downstream outage causes dropped bounce events.
A thin handler with durable handoff is safer.
Example in Python
This example shows the shape of the receiver, not a provider-specific schema. Adjust field names to match your payload.
```python
from flask import Flask, request, abort
import hmac
import hashlib
import json

app = Flask(__name__)

WEBHOOK_SECRET = b"replace-with-your-secret"

def verify_signature(raw_body, provided_signature):
    expected = hmac.new(WEBHOOK_SECRET, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, provided_signature)

@app.post("/webhooks/email")
def handle_email_event():
    raw_body = request.get_data()
    signature = request.headers.get("X-Signature", "")
    if not verify_signature(raw_body, signature):
        abort(401)

    payload = request.get_json(force=True)
    bounce = payload.get("bounce", {})
    event = {
        "event_type": payload.get("type"),
        "recipient": payload.get("recipient"),
        "bounce_type": bounce.get("type"),
        "smtp_code": bounce.get("smtp_code"),
        "dsn_status": bounce.get("dsn_status"),
        "diagnostic": bounce.get("message"),
        "message_id": payload.get("message_id"),
    }

    # Stand-in for a durable handoff: replace with a queue or DB write.
    print(json.dumps(event))
    return {"ok": True}, 200
```
The important part isn't Flask. It's the discipline. Verify first, parse second, enqueue third.
What to normalize immediately
I like to collapse provider payloads into a stable internal event like this:
`recipient`, `message_id`, `mailbox_id`, `bounce_type`, `smtp_code`, `dsn_status`, `diagnostic`, `occurred_at`
That schema is enough to drive retries, suppression, analytics, and incident review without carrying around raw provider quirks in every subsystem.
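One minimal way to pin that schema down is a small dataclass. This is a sketch; the field types are assumptions, and you'd substitute whatever your storage layer expects.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BounceEvent:
    """Stable internal bounce event; fields mirror the schema above."""
    recipient: str     # address the system attempted to reach
    message_id: str    # links the bounce to the original agent action
    mailbox_id: str    # sending mailbox / tenant scope
    bounce_type: str   # "hard" | "soft" | "unknown"
    smtp_code: str     # e.g. "550"
    dsn_status: str    # e.g. "5.1.1"
    diagnostic: str    # free text, for logs and human review only
    occurred_at: str   # ISO 8601 timestamp for retry policy and audit
```

Freezing the dataclass keeps downstream consumers from mutating events in flight, which helps when the same event feeds retries, suppression, and analytics.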
Common mistakes that break capture
The failures here are usually operational, not conceptual.
- Skipping signature verification: If your endpoint trusts unauthenticated POSTs, anyone can inject fake bounce events.
- Parsing only the message body: You'll miss the structured fields that matter.
- Doing too much synchronously: Webhook handlers should acknowledge fast and hand off work.
- Not keeping raw payloads: When classification goes wrong, raw event storage saves you.
Store the normalized event for application logic, but keep the raw payload for forensic debugging. You'll want both the first time a provider changes a field shape.
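One way to keep both copies, sketched here with SQLite for brevity (the table names and single-connection setup are illustrative; a production system would use a real queue or database):

```python
import json
import sqlite3

def store_bounce(conn, raw_body: bytes, event: dict):
    """Persist the raw payload for forensics and the normalized event for logic."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bounce_raw (id INTEGER PRIMARY KEY, body BLOB)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bounce_events (recipient TEXT, payload TEXT)"
    )
    # Raw bytes go in untouched, so provider schema changes stay debuggable.
    conn.execute("INSERT INTO bounce_raw (body) VALUES (?)", (raw_body,))
    # The normalized event is what suppression and retry logic read.
    conn.execute(
        "INSERT INTO bounce_events (recipient, payload) VALUES (?, ?)",
        (event.get("recipient"), json.dumps(event)),
    )
    conn.commit()
```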
Why real-time ingestion changes agent behavior
Once bounce capture is wired correctly, your agent can behave like a real system instead of a hopeful script.
A bounced approval request can open a fallback task. A failed customer reply can trigger address verification. A transient failure can stay attached to the same conversation and re-attempt later. None of that is possible if bounce back messages sit in a mailbox that no code ever reads.
Mapping SMTP Codes to Agent Actions
The parser tells you what happened. Your router decides what to do.
How an agent mailflow handles bounces determines whether it is reliable or sloppy. Teams often stop at logging the bounce. Logging is not handling. Your code needs an explicit action map that turns transport failures into workflow decisions.
Industry analysis puts the average split at about 0.21% hard bounces and 0.70% soft bounces, and the recommended response is straightforward: hard bounces should be suppressed immediately, while soft bounces should enter a retry queue with exponential backoff starting at 1 hour, doubling up to 48 hours, with a maximum of 3 attempts (Mailerio bounce rate benchmark).
SMTP code to agent action mapping
| SMTP Code | Meaning | Bounce Type | Recommended Agent Action |
|---|---|---|---|
| 550 | Recipient rejected or invalid destination | Hard in most cases | Suppress recipient, mark task as undeliverable, request alternate contact if workflow allows |
| 5.1.1 | User unknown | Hard | Immediate suppression and stop future sends to that address |
| 4xx | Temporary delivery issue | Soft | Queue retry with exponential backoff and preserve conversation state |
| 4.4.1 | Persistent transient failure | Soft | Retry up to policy limit, then escalate or quarantine if unresolved |
| 5xx tied to policy or filtering | Sender or content issue | Hard or review case | Pause related sends and alert a human for investigation |
The decision framework I recommend
A good router doesn't try to be clever. It stays conservative.
- Permanent recipient failure: Suppress immediately.
- Temporary remote failure: Retry on schedule.
- Sender reputation or content rejection: Alert and pause the relevant stream.
- Unknown classification: Quarantine for review instead of guessing.
This prevents the two worst mistakes. Repeatedly sending to dead addresses, and deleting recoverable contacts too early.
Sample routing logic
```javascript
function routeBounce(event) {
  const code = String(event.smtp_code || "");
  const dsn = event.dsn_status || "";

  if (dsn === "5.1.1" || code.startsWith("5")) {
    return "SUPPRESS";
  }
  if (dsn === "4.4.1" || code.startsWith("4")) {
    return "RETRY";
  }
  return "ALERT";
}
```
That logic is intentionally plain. Put nuance around it later. The first job is to make sure every bounce back message lands in one of three buckets: suppress, retry, or alert.
If a bounce suggests your sender identity or message policy is the problem, don't let the agent keep improvising. Stop the flow and let a human inspect it.
Implementing Suppression and Retry Logic
Decision logic isn't enough. You need durable state.
If an address hard-bounced yesterday and your system tries it again today because the result lived only in memory, your architecture failed. The same is true for soft bounces. If retries depend on an in-process timer in a worker that restarts overnight, you'll lose recoverable messages.

Properly implemented bounce handling changes outcomes. Automating hard bounce removal and using a 3-retry limit for soft bounces can reduce false positives by 50%. The same source also notes that using a platform with auto-configured SPF/DKIM/DMARC can improve delivery rates by 15-25% before messages ever hit your bounce pipeline (Twilio on email bounce management).
Suppression should be persistent and checked pre-send
A suppression list isn't a reporting artifact. It's a send-time guardrail.
Every outbound email attempt should check suppression state before the message leaves your system. That check should happen close to the send action, not just when the contact record is edited. In agent systems, addresses can come from memory, extracted docs, CRM records, human input, and prior threads. Bad data can re-enter from anywhere.
A minimal suppression record should include:
- Recipient address
- Reason for suppression
- Source event or message ID
- Timestamp
- Mailbox or tenant scope
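A minimal sketch of a persistent, send-time suppression check, using SQLite as a stand-in for your real datastore (schema and function names here are illustrative, not a prescribed API):

```python
import sqlite3
import time

def init_suppression(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS suppressions (
               recipient TEXT,
               mailbox_id TEXT,
               reason TEXT,
               source_event_id TEXT,
               suppressed_at REAL,
               PRIMARY KEY (recipient, mailbox_id)
           )"""
    )

def suppress(conn, recipient, mailbox_id, reason, source_event_id):
    """Record a suppression with its reason and originating bounce event."""
    conn.execute(
        "INSERT OR REPLACE INTO suppressions VALUES (?, ?, ?, ?, ?)",
        (recipient.lower(), mailbox_id, reason, source_event_id, time.time()),
    )
    conn.commit()

def is_suppressed(conn, recipient, mailbox_id):
    """Call this immediately before every send, not only at contact-edit time."""
    row = conn.execute(
        "SELECT 1 FROM suppressions WHERE recipient = ? AND mailbox_id = ?",
        (recipient.lower(), mailbox_id),
    ).fetchone()
    return row is not None
```

Lowercasing addresses at write and read time is one simple guard against the same bad address re-entering from memory, CRM records, or extracted documents in a different case.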
Retry queues need policy, not hope
For soft bounces, use a real job queue. Celery, Bull, Sidekiq, and similar tools all work fine if you treat retry policy as first-class state.
I like this shape:
```python
def schedule_retry(message_id, recipient, attempt):
    if attempt >= 3:
        mark_as_failed(message_id, recipient, reason="soft-bounce-limit")
        return

    # 1h, 2h, 4h between attempts, capped at 48h.
    delay_hours = min(2 ** attempt, 48)
    enqueue_send(message_id=message_id, recipient=recipient, delay_hours=delay_hours)
```
That does three useful things:
- It keeps retries bounded.
- It spaces retries so you don't hammer a struggling remote server.
- It gives the application a clean point to stop and escalate.
What works and what doesn't
Here's the blunt version.
- Works well: Durable queues, idempotent retry jobs, pre-send suppression checks, and explicit failure states on the task.
- Fails in practice: Ad hoc cron resends, manual spreadsheet cleanup, and "just try again later" logic with no max attempt count.
Another common mistake is converting every repeated soft bounce into a hard bounce automatically without context. Sometimes that's right. Sometimes the recipient domain was flaky for a while. Quarantine is often better than outright deletion if the address is high value.
Keep workflow state attached to the message
An agent isn't just sending mail. It's pursuing an objective.
So when a send retries, preserve the thread ID, task ID, and original intent. If you separate bounce handling from agent state, you end up with duplicate threads, confused memory, and retries that no longer match the conversation they came from.
That linkage matters more than most deliverability guides admit. Human operators can infer context. Autonomous systems need it stored.
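One lightweight way to store that context is to build each retry job from both the bounce event and the owning task, so a retried send rejoins the right conversation. The field names below are illustrative, not a fixed schema:

```python
def build_retry_job(event: dict, task: dict) -> dict:
    """Combine bounce event and workflow context into one retry job.

    Field names are hypothetical; use your own event and task schemas.
    """
    return {
        "message_id": event["message_id"],
        "recipient": event["recipient"],
        # Workflow linkage: without these, a retried send can spawn a
        # duplicate thread or land outside the agent's original intent.
        "thread_id": task["thread_id"],
        "task_id": task["task_id"],
        "intent": task["intent"],
        "attempt": event.get("attempt", 0) + 1,
    }
```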
Building Resilient Agent-Native Mailflows
A resilient mailflow is a closed loop. Send, observe, classify, act, persist.
That's the shift that matters for AI agents. Bounce back messages stop being passive error artifacts and become runtime events that influence planning. Hard failures teach the system not to retry dead destinations. Soft failures give it a controlled recovery path. Reputation-related failures tell you when autonomous sending needs human review.
The old model is mailbox-centric. A person reads a nondelivery report and cleans things up later. The better model is event-centric. The application receives the bounce, updates the workflow, and keeps the agent operating inside real delivery constraints.
That design also changes how you evaluate tooling. For agent work, the important primitives are programmatic mailbox creation, signed webhooks, send-and-receive APIs, suppression controls, threading, and rate limits that your application can reason about. If you're exploring options, this curated list of AI email tools is useful because it frames email infrastructure from the automation side instead of the old human inbox side.
The best bounce handling system is the one your agent can actually use at runtime without browser logins, manual triage, or inbox scraping.
When teams get this right, email stops being a fragile plugin bolted onto an agent. It becomes a dependable interface the agent can use, recover, and learn from.
If you're building autonomous email workflows and want infrastructure designed for agents instead of humans, Robotomail is worth a look. It gives agents real mailboxes through an API, supports signed webhooks for inbound handling, preserves conversation threading, and includes the operational controls that make bounce handling practical in production.