LLM prompt engineering strategy
6 distinct prompt layers · 6 prompt injection defenses · 6 quantified evaluation metrics · 6 evolution governance rules. Prompts are designed for medical safety + brand consistency + cost control · NO ad-hoc experimentation in production.
6 prompt layers · stacked architecture
| Layer | Purpose | Content |
|---|---|---|
| System prompt (immutable per tenant) | Role definition · clinic identity · brand voice · scope constraints · NEVER user-editable | ~800 tokens · clinic name + style guide + boundaries (NO medical diagnosis · NO prices not in KB · NO insurance coverage) |
| Knowledge base context (per query) | Retrieval-augmented · clinic-specific FAQ + services + policies · grounds responses in clinic facts | RAG retrieval ~1500 tokens relevant chunks · prevents hallucination · cited responses |
| Conversation history (sliding window) | Multi-turn coherence · last 6-8 messages · trimmed if too long · summarized old context | ~500-2000 tokens dynamic · oldest dropped first · summary preserved |
| User input (sanitized) | Current patient message · pre-filtered (prompt injection detection · jailbreak attempts blocked) | Typically 10-200 tokens · max 1000 enforced · longer = split or reject |
| Output schema enforcement (Zod) | Structured response · validated post-LLM · prevents free-form harmful content from slipping through | JSON schema: response_text · confidence · escalate_to_human · scheduling_intent · feedback_request |
| Safety guardrails (output filter) | Post-generation review · medical diagnosis detection · forbidden claims · auto-correct or escalate | Pattern matching + secondary LLM evaluation · borderline cases human-in-loop fallback |
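As a rough illustration of how the first four layers in the table might be stacked into a single request, the sketch below trims the conversation history with a sliding window and enforces the 1000-token cap on user input. Everything named here is an assumption made for illustration: the `Message` shape, the function names `trimHistory` and `assemblePrompt`, and the 4-characters-per-token estimate. It is not the production implementation.

```typescript
// Hypothetical sketch: names, the message shape and the token estimate
// are illustrative assumptions, not the production implementation.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

// Crude estimate (assumption): roughly 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Sliding window: keep the most recent messages, then drop the oldest
// ones until the window fits the token budget.
function trimHistory(
  history: Message[],
  maxMessages = 8,
  maxTokens = 2000,
): Message[] {
  const recent = history.slice(-maxMessages);
  let total = recent.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (recent.length > 0 && total > maxTokens) {
    const dropped = recent.shift();
    if (dropped) total -= estimateTokens(dropped.content);
  }
  return recent;
}

interface PromptInput {
  systemPrompt: string; // immutable per tenant
  kbChunks: string[]; // retrieved per query
  summary: string; // summary of older turns, "" if none
  history: Message[];
  userInput: string; // assumed to be sanitized upstream
}

function assemblePrompt(input: PromptInput): Message[] {
  // This sketch rejects over-long input; splitting is the other option.
  if (estimateTokens(input.userInput) > 1000) {
    throw new Error("user input exceeds the 1000-token cap");
  }
  const context = [
    input.systemPrompt,
    "Clinic knowledge base:\n" + input.kbChunks.join("\n---\n"),
    input.summary ? "Earlier conversation summary:\n" + input.summary : "",
  ]
    .filter((part) => part.length > 0)
    .join("\n\n");

  return [
    { role: "system", content: context },
    ...trimHistory(input.history),
    { role: "user", content: input.userInput },
  ];
}
```

Keeping the system prompt, knowledge base chunks and summary in one system message is a design choice of this sketch; output schema enforcement and the safety guardrails would run after the model responds and are not shown.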
6 prompt injection defenses
Evaluation framework · 6 metrics
| Metric | Target | Current (2 demo clinics) |
|---|---|---|
| Response quality (LLM judge + human) | >85% acceptable · measured weekly on a sample of 50 conversations · LLM judge correlates with human rating ±10% | ~92% acceptable over the last 30 days (2 low-traffic demo clinics) |
| Handoff rate (when bot escalates) | 15-25% target healthy · too low = bot overreaching · too high = bot underperforming | ~18% current · within healthy range |
| Hallucination rate (factual errors) | <2% target · measured by weekly manual review · RAG grounding helps | ~1.2% measured · acceptable · pattern: dates are sometimes wrong (RAG limitation) |
| Prompt injection success rate | 0% target · all known attacks blocked · weekly red team | 0/47 attempts succeeded in the last 30 days · defense holds |
| Cost per response | <0.005€/response with gpt-4o-mini · budget protection | 0.0042€/response measured · within budget |
| Latency p95 generation | <5s p95 target · user experience constraint | p95 4.2s currently · streaming planned for Q3 · -20% expected |
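Three of the targets in the table (p95 latency, handoff rate, cost per response) can be checked mechanically from per-response logs. The sketch below is one possible way to do that, assuming a hypothetical `ResponseRecord` log shape and a nearest-rank percentile; the field names and the `evaluate` function are illustrative, not the actual evaluation harness.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface ResponseRecord {
  latencyMs: number;
  costEur: number;
  escalated: boolean; // true when the bot handed off to a human
}

// Nearest-rank percentile; `pct` is an integer from 1 to 100.
function percentile(values: number[], pct: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((pct * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1] as number;
}

interface MetricReport {
  p95LatencyMs: number;
  handoffRate: number;
  meanCostEur: number;
  withinTargets: boolean;
}

// Compares a batch of responses against the targets listed in the table:
// p95 latency < 5 s, handoff rate 15-25%, cost < 0.005 EUR per response.
function evaluate(records: ResponseRecord[]): MetricReport {
  const n = records.length;
  const p95LatencyMs = percentile(records.map((r) => r.latencyMs), 95);
  const handoffRate = records.filter((r) => r.escalated).length / n;
  const meanCostEur = records.reduce((sum, r) => sum + r.costEur, 0) / n;
  const withinTargets =
    p95LatencyMs < 5000 &&
    handoffRate >= 0.15 &&
    handoffRate <= 0.25 &&
    meanCostEur < 0.005;
  return { p95LatencyMs, handoffRate, meanCostEur, withinTargets };
}
```

Response quality and hallucination rate depend on human or LLM-judge labels, so they are left out of this sketch.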
Evolution governance · 6 rules
- Prompt changes require an ADR if the pattern changes · documented WHY + before/after evaluation results
- A/B testing prompts via feature flag · 10% traffic to the new prompt · 7 days minimum · statistical significance before rollout
- Per-tenant overrides ONLY via approved patterns · clinic-specific KB updates allowed · system prompt structure locked
- Versioned evaluation snapshots · prompt version + eval results stored · rollback capability if a regression appears
- Weekly founder review · prompt changes proposed + evidence + decision documented
- External adversarial review · ChatGPT auditor reviews prompts quarterly · attempts jailbreak · postmortem on any successful attempt
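The 10% A/B split in the rules above needs a stable assignment, so that a conversation does not flip between prompt versions mid-thread. A minimal sketch of one common approach, deterministic hash bucketing, follows. The function names, the choice of an FNV-1a style hash and the idea of keying on a conversation id are all assumptions for illustration; the source does not say how the feature flag is implemented.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units (matches byte-wise
// FNV-1a for ASCII ids). Chosen here only because it needs no library.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

type PromptVariant = "control" | "candidate";

// Maps an id to a bucket in 0..99; buckets below `rolloutPercent` get the
// candidate prompt. Salting with the experiment name keeps assignments
// independent across experiments.
function assignVariant(
  conversationId: string,
  experiment: string,
  rolloutPercent = 10,
): PromptVariant {
  const bucket = fnv1a(experiment + ":" + conversationId) % 100;
  return bucket < rolloutPercent ? "candidate" : "control";
}
```

The same id always lands in the same bucket, so rolling back is a matter of setting `rolloutPercent` to 0. The significance check required before rollout is a separate step and is not shown here.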
Evaluation metrics are based on 2 low-traffic demo clinics. Numbers are early-stage indicators · NOT statistical certainty. Once real customers with diverse patients arrive · expected: more frequent edge cases · handoff rate will adjust · hallucination patterns refined.
Commitment: honest metric updates published monthly · NO cherry-picking winning months · transparent improvement/regression tracking.
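To make the "not statistical certainty" caveat concrete: a 0/47 result does not show that the true success rate is 0%. Assuming the 47 attempts were independent and comparable (an assumption, since red-team attempts are usually hand-picked), the exact one-sided upper bound for zero observed events is 1 - alpha^(1/n), which for n = 47 at 95% confidence comes to about 6%. The helper name below is hypothetical.

```typescript
// Upper confidence bound on an event rate when zero events were seen in
// `trials` independent attempts: the largest p with (1 - p)^trials >= alpha.
function zeroEventUpperBound(trials: number, confidence = 0.95): number {
  if (!Number.isInteger(trials) || trials <= 0) {
    throw new Error("trials must be a positive integer");
  }
  if (!(confidence > 0 && confidence < 1)) {
    throw new Error("confidence must be between 0 and 1");
  }
  const alpha = 1 - confidence;
  return 1 - Math.pow(alpha, 1 / trials);
}

// 0 successful injections out of 47 attempts (figure from the metrics table).
const injectionUpperBound = zeroEventUpperBound(47);
console.log((injectionUpperBound * 100).toFixed(1) + "% upper bound");
```

Under the same assumptions, tightening the bound to below 1% would take roughly 300 attempts without a single success.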
Does your AI/ML team need a prompt architecture deep-dive?
For Enterprise procurement · sample prompts · evaluation harness · red team scenario reports available under Enterprise NDA.