LLM prompt engineering · safety-first

LLM prompt engineering strategy

6 prompt layers diferenciados · 6 prompt injection defenses · 6 evaluation metrics quantified · 6 evolution governance rules. Prompts diseñados para safety médica + brand consistency + cost control · NO ad-hoc experimentation production.

Comparación LLMs Handoff bot→humano Seguridad

6 prompt layers · stacked architecture

Layer	Purpose	Content
System prompt (immutable per tenant)	Role definition · clinic identity · brand voice · scope constraints · NEVER user-editable	~800 tokens · clinic name + style guide + boundaries (NO medical diagnosis · NO prices not in KB · NO insurance coverage)
Knowledge base context (per query)	Retrieval-augmented · clinic-specific FAQ + services + policies · grounds responses in clinic facts	RAG retrieval ~1500 tokens relevant chunks · prevents hallucination · cited responses
Conversation history (sliding window)	Multi-turn coherence · last 6-8 messages · trimmed if too long · summarized old context	~500-2000 tokens dynamic · oldest dropped first · summary preserved
User input (sanitized)	Current message paciente · pre-filtered (prompt injection detection · jailbreak attempts blocked)	Typically 10-200 tokens · max 1000 enforced · longer = split or reject
Output schema enforcement (Zod)	Structured response · validated post-LLM · prevents free-form harmful content slipping	JSON schema: response_text · confidence · escalate_to_human · scheduling_intent · feedback_request
Safety guardrails (output filter)	Post-generation review · medical diagnosis detection · forbidden claims · auto-correct or escalate	Pattern matching + secondary LLM evaluation · borderline cases human-in-loop fallback

6 prompt injection defenses

Input sanitization

Strip system-like prompts ('Ignore previous instructions...') · neutralize markdown injection · escape special tokens · max length enforcement

Role separation strict

System role NEVER concatenated with user input · API enforces distinct messages · prevents context confusion

Output validation post-generation

LLM cannot bypass schema · Zod validation rejects free-form · forces structured response · simpler to audit

Forbidden topic detection

Pre-LLM classifier flags: medical diagnosis · pricing absolute · insurance claims · prescription · contraindications · escalates human

Rate limiting per-tenant + per-conversation

Prevent flooding attack · cost protection · max 50 messages/conversation auto-escalates · max 1000/clinic/hour throttle

Conversation handoff triggers

Detect frustration · jailbreak attempts · complex requests · auto-escalate clinic admin · documented in handoff-policy

Evaluation framework · 6 metrics

Metric	Target	Current (2 demo clinics)
Response quality (LLM judge + human)	>85% acceptable · measured weekly sample 50 conversations · LLM judge correlates with human rating ±10%	~92% acceptable last 30 días (2 demo clinics low traffic)
Handoff rate (when bot escalates)	15-25% target healthy · too low = bot overreaching · too high = bot underperforming	~18% current · within healthy range
Hallucination rate (factual errors)	<2% target · measured manual review weekly · RAG grounding helps	~1.2% measured · acceptable · pattern: dates wrong sometimes (RAG limitation)
Prompt injection success rate	0% target · all known attacks blocked · red team weekly	0/47 attempted in last 30 días · defense holds
Cost per response	<0.005€/response with gpt-4o-mini · budget protection	0.0042€/response measured · within budget
Latency p95 generation	<5s p95 target · user experience constraint	p95 4.2s currently · streaming planned -20% Q3

Evolution governance · 6 rules

Prompt changes require ADR si pattern change · documented WHY + before/after evaluation results
A/B testing prompts via feature flag · 10% traffic new prompt · 7 días minimum · statistical significance before rollout
Per-tenant overrides ONLY via approved patterns · clinic-specific KB updates allowed · system prompt structure locked
Evaluation snapshots versionados · prompt version + eval results stored · rollback capability si regression
Weekly review founder · prompt changes proposed + evidence + decision documented
External adversarial review · ChatGPT auditor reviews prompts trimestral · attempts jailbreak · postmortem any successful

Honest limitations · low-traffic baseline

Evaluation metrics basadas en 2 demo clinics low traffic. Numbers are early-stage indicators · NOT statistical certainty. Cuando lleguen clientes reales con diverse patients · expected: edge cases más frecuentes · handoff rate ajustará · hallucination patterns refinados.

Commitment: honest metrics updates publicados monthly · NO cherry-pick winning months · transparent improvement/regression tracking.

¿Tu AI/ML team necesita prompt architecture deep-dive?

Para Enterprise procurement · sample prompts · evaluation harness · red team scenarios reports disponibles bajo NDA Enterprise.

Solicitar deep-dive LLMs comparación Arquitectura