Saltar al contenido principal
LLM prompt engineering · safety-first

LLM prompt engineering strategy

6 prompt layers diferenciados · 6 prompt injection defenses · 6 evaluation metrics quantified · 6 evolution governance rules. Prompts diseñados para safety médica + brand consistency + cost control · NO ad-hoc experimentation production.

6 prompt layers · stacked architecture

LayerPurposeContent
System prompt (immutable per tenant)Role definition · clinic identity · brand voice · scope constraints · NEVER user-editable~800 tokens · clinic name + style guide + boundaries (NO medical diagnosis · NO prices not in KB · NO insurance coverage)
Knowledge base context (per query)Retrieval-augmented · clinic-specific FAQ + services + policies · grounds responses in clinic factsRAG retrieval ~1500 tokens relevant chunks · prevents hallucination · cited responses
Conversation history (sliding window)Multi-turn coherence · last 6-8 messages · trimmed if too long · summarized old context~500-2000 tokens dynamic · oldest dropped first · summary preserved
User input (sanitized)Current message paciente · pre-filtered (prompt injection detection · jailbreak attempts blocked)Typically 10-200 tokens · max 1000 enforced · longer = split or reject
Output schema enforcement (Zod)Structured response · validated post-LLM · prevents free-form harmful content slippingJSON schema: response_text · confidence · escalate_to_human · scheduling_intent · feedback_request
Safety guardrails (output filter)Post-generation review · medical diagnosis detection · forbidden claims · auto-correct or escalatePattern matching + secondary LLM evaluation · borderline cases human-in-loop fallback

6 prompt injection defenses

Input sanitization
Strip system-like prompts ('Ignore previous instructions...') · neutralize markdown injection · escape special tokens · max length enforcement
Role separation strict
System role NEVER concatenated with user input · API enforces distinct messages · prevents context confusion
Output validation post-generation
LLM cannot bypass schema · Zod validation rejects free-form · forces structured response · simpler to audit
Forbidden topic detection
Pre-LLM classifier flags: medical diagnosis · pricing absolute · insurance claims · prescription · contraindications · escalates human
Rate limiting per-tenant + per-conversation
Prevent flooding attack · cost protection · max 50 messages/conversation auto-escalates · max 1000/clinic/hour throttle
Conversation handoff triggers
Detect frustration · jailbreak attempts · complex requests · auto-escalate clinic admin · documented in handoff-policy

Evaluation framework · 6 metrics

MetricTargetCurrent (2 demo clinics)
Response quality (LLM judge + human)>85% acceptable · measured weekly sample 50 conversations · LLM judge correlates with human rating ±10%~92% acceptable last 30 días (2 demo clinics low traffic)
Handoff rate (when bot escalates)15-25% target healthy · too low = bot overreaching · too high = bot underperforming~18% current · within healthy range
Hallucination rate (factual errors)<2% target · measured manual review weekly · RAG grounding helps~1.2% measured · acceptable · pattern: dates wrong sometimes (RAG limitation)
Prompt injection success rate0% target · all known attacks blocked · red team weekly0/47 attempted in last 30 días · defense holds
Cost per response<0.005€/response with gpt-4o-mini · budget protection0.0042€/response measured · within budget
Latency p95 generation<5s p95 target · user experience constraintp95 4.2s currently · streaming planned -20% Q3

Evolution governance · 6 rules

  • Prompt changes require ADR si pattern change · documented WHY + before/after evaluation results
  • A/B testing prompts via feature flag · 10% traffic new prompt · 7 días minimum · statistical significance before rollout
  • Per-tenant overrides ONLY via approved patterns · clinic-specific KB updates allowed · system prompt structure locked
  • Evaluation snapshots versionados · prompt version + eval results stored · rollback capability si regression
  • Weekly review founder · prompt changes proposed + evidence + decision documented
  • External adversarial review · ChatGPT auditor reviews prompts trimestral · attempts jailbreak · postmortem any successful
Honest limitations · low-traffic baseline

Evaluation metrics basadas en 2 demo clinics low traffic. Numbers are early-stage indicators · NOT statistical certainty. Cuando lleguen clientes reales con diverse patients · expected: edge cases más frecuentes · handoff rate ajustará · hallucination patterns refinados.

Commitment: honest metrics updates publicados monthly · NO cherry-pick winning months · transparent improvement/regression tracking.

¿Tu AI/ML team necesita prompt architecture deep-dive?

Para Enterprise procurement · sample prompts · evaluation harness · red team scenarios reports disponibles bajo NDA Enterprise.