# promptfoo

Here are 39 public repositories matching this topic...

prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models from providers such as OpenAI, Anthropic (Claude), and Google (Gemini).

  • Updated Dec 4, 2025
  • TypeScript

Multi-turn conversational AI agent for medication adherence, with a CI-gated evaluation harness. Built on LangGraph 1.0 with multi-LLM support (Groq/Cerebras/Anthropic), RAG with a citation gate, and OpenInference observability. A reference implementation using 100% synthetic data; the pattern transfers to other regulated industries.

  • Updated May 14, 2026
  • Python
