We re-ran the FaithGPT Christian AI Benchmark on 2026-06-11. Eleven models answered 116 questions covering Scripture interpretation, doctrine, pastoral care, citation accuracy, and safety, producing 2,726 scored evaluations. Every answer was scored by independent AI judges grounded in the actual KJV text and public-domain commentaries, so a model that invents a verse gets caught instead of being graded on how confident it sounds.
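The judges themselves are AI models, and this post does not describe their pipeline in code. As a rough illustration of what grounding in the KJV text can mean, here is a minimal sketch of a deterministic citation check. It assumes a hypothetical `KJV` lookup table keyed by (book, chapter, verse) and a simple regex for references; the two verses in the table are real KJV text, and `check_citations` is an illustrative name, not part of any FaithGPT API.

```python
import re

# Hypothetical stand-in for a KJV verse database:
# maps (book, chapter, verse) to the verse text.
KJV = {
    ("Genesis", 1, 1): "In the beginning God created the heaven and the earth.",
    ("John", 11, 35): "Jesus wept.",
}

# Matches references such as "John 11:35" or "1 Corinthians 13:4".
REF_PATTERN = re.compile(r"\b((?:[1-3] )?[A-Z][a-z]+) (\d+):(\d+)\b")


def check_citations(answer: str) -> dict[str, bool]:
    """Map each reference cited in an answer to whether it exists in the table."""
    results = {}
    for book, chapter, verse in REF_PATTERN.findall(answer):
        key = (book, int(chapter), int(verse))
        results[f"{book} {chapter}:{verse}"] = key in KJV
    return results


answer = "As Genesis 1:1 says, God created all things; see also John 22:3."
print(check_citations(answer))
# {'Genesis 1:1': True, 'John 22:3': False}
```

The Gospel of John has only 21 chapters, so "John 22:3" is flagged as invented. A real check would also need book-name normalization (abbreviations such as "Jn", multi-word books such as "Song of Solomon") and verse ranges, which this sketch ignores.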
FaithGPT topped this run with an overall score of 90.2/100.
The leaderboard
| Rank | Model | Overall (0-100) | Cost per 100 answers |
|---|---|---|---|
| 1 | FaithGPT | 90.2 | $1.48 |
| 2 | gpt-5.5 | 88.9 | $0.22 |
| 3 | gpt-5.5-pro | 88.9 | $0.19 |
| 4 | Claude Fable 5 | 88.9 | $0.18 |
| 5 | Claude Sonnet 4.6 | 88.8 | $0.68 |
| 6 | Claude Opus 4.8 | 88.8 | $4.14 |
| 7 | gpt-5.4 | 88.3 | $0.10 |
| 8 | Gemini 3.1 Pro Preview | 88.0 | $1.30 |
| 9 | Claude Haiku 4.5 | 87.7 | $0.11 |
| 10 | Gemini 2.5 Flash | 87.6 | $0.34 |
| 11 | Gemini 3.5 Flash | 19.3 | $1.16 |
Cost is what it actually took to generate the answers in this run, measured from provider token telemetry. It excludes the cost of judging.
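To make the cost column concrete, here is a minimal sketch that turns per-answer token telemetry into a cost per 100 answers. It assumes telemetry reports input and output tokens for each answer and that the provider bills a flat per-million-token price for each; the prices are placeholders, not any provider's actual rates, and `Usage` and `cost_per_100_answers` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts for one generated answer, as reported by provider telemetry."""
    input_tokens: int
    output_tokens: int


def cost_per_100_answers(
    usages: list[Usage],
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Average generation cost of one answer, scaled to 100 answers (USD)."""
    if not usages:
        raise ValueError("need at least one answer")
    total = sum(
        u.input_tokens * input_price_per_m + u.output_tokens * output_price_per_m
        for u in usages
    ) / 1_000_000
    return total / len(usages) * 100


# Placeholder prices: $1.00 per million input tokens, $4.00 per million output tokens.
run = [Usage(1_000, 500), Usage(2_000, 1_500)]
print(round(cost_per_100_answers(run, 1.00, 4.00), 2))
```

With those placeholder numbers the two answers cost $0.011 together, or about $0.55 per 100 answers. Judge calls would be tallied separately, since the column excludes them.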
Who wins each category
| Category | Winner | Score |
|---|---|---|
| Apologetics | gpt-5.4 | 89.3 |
| Biblical literacy | FaithGPT | 89.4 |
| Christian ethics | Claude Opus 4.8 | 89.5 |
| Citation traps | FaithGPT | 90.1 |
| Content creation | FaithGPT | 92.2 |
| Denominational nuance | gpt-5.5-pro | 88.8 |
| Doctrine | Claude Sonnet 4.6 | 90.4 |
| Pastoral care | FaithGPT | 91.8 |
| Safety boundaries | FaithGPT | 90.9 |
| Scripture interpretation | FaithGPT | 90.0 |
Each model's category score is the average of its answer scores within that category, so a model can lead overall and still lose a category to a specialist.
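Read that way, the per-category winners come from grouping answer scores by model and category, averaging each group, and taking the highest average. A minimal sketch with made-up scores for two hypothetical models (`model-a`, `model-b`), assuming an unweighted mean; `category_winners` is an illustrative name:

```python
from collections import defaultdict
from statistics import mean

# Made-up per-answer scores: (model, category, score on a 0-100 scale).
scores = [
    ("model-a", "Doctrine", 92.0),
    ("model-a", "Doctrine", 90.0),
    ("model-b", "Doctrine", 89.0),
    ("model-b", "Doctrine", 90.0),
    ("model-a", "Pastoral care", 85.0),
    ("model-b", "Pastoral care", 93.0),
]


def category_winners(rows):
    """Return {category: (winning model, its average score)}."""
    by_key = defaultdict(list)
    for model, category, score in rows:
        by_key[(category, model)].append(score)

    # Unweighted mean of each model's answers within each category.
    averages = {key: mean(values) for key, values in by_key.items()}

    winners = {}
    for (category, model), avg in averages.items():
        # Strict ">" means a tie keeps the model that was seen first.
        if category not in winners or avg > winners[category][1]:
            winners[category] = (model, avg)
    return winners


print(category_winners(scores))
```

Here model-a wins Doctrine with 91.0 while model-b takes Pastoral care with 93.0. Ties are not hypothetical at one decimal place: the overall leaderboard above already has three models at 88.9, so a real implementation needs an explicit tie-breaking rule.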
Best value
On score per dollar, gpt-5.4 delivered the most: 88.3/100 at $0.10 per 100 answers.
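The post does not give the formula behind "score per dollar." Assuming it is simply the overall score divided by the cost per 100 answers, the ranking can be reproduced from the leaderboard table with a short sketch (`score_per_dollar` is a hypothetical helper):

```python
# (overall score, cost per 100 answers in USD), copied from the leaderboard table.
LEADERBOARD = {
    "FaithGPT": (90.2, 1.48),
    "gpt-5.5": (88.9, 0.22),
    "gpt-5.5-pro": (88.9, 0.19),
    "Claude Fable 5": (88.9, 0.18),
    "Claude Sonnet 4.6": (88.8, 0.68),
    "Claude Opus 4.8": (88.8, 4.14),
    "gpt-5.4": (88.3, 0.10),
    "Gemini 3.1 Pro Preview": (88.0, 1.30),
    "Claude Haiku 4.5": (87.7, 0.11),
    "Gemini 2.5 Flash": (87.6, 0.34),
    "Gemini 3.5 Flash": (19.3, 1.16),
}


def score_per_dollar(score: float, cost_per_100: float) -> float:
    """Assumed metric: overall score points per dollar spent on 100 answers."""
    return score / cost_per_100


# Sort model names by the assumed value metric, best first.
ranked = sorted(
    LEADERBOARD,
    key=lambda model: score_per_dollar(*LEADERBOARD[model]),
    reverse=True,
)
for model in ranked[:3]:
    print(f"{model}: {score_per_dollar(*LEADERBOARD[model]):.0f} points per dollar")
```

Under that assumption gpt-5.4 comes first at roughly 883 points per dollar, followed by Claude Haiku 4.5 (about 797) and Claude Fable 5 (about 494). Because the ratio divides by cost, small price gaps among the cheap models move this ranking far more than the sub-point score differences at the top of the table.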
How to read these results
These numbers reflect benchmark version v1-draft on this question set. The judges verify citations against the KJV database, each answer's score is the average of multiple judge passes, and the published cost comes from provider telemetry rather than list prices. No benchmark replaces Scripture, pastors, or Christian community.
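Averaging several judge passes per answer amounts, under the simplest reading, to a plain mean. A minimal sketch, assuming each pass returns a 0-100 score and every pass carries equal weight (the post does not say how many passes are run or whether they are weighted); `answer_score` and `judge_spread` are hypothetical names:

```python
from statistics import mean


def answer_score(judge_passes: list[float]) -> float:
    """Average several independent judge passes for one answer (0-100 scale)."""
    if not judge_passes:
        raise ValueError("an answer needs at least one judge pass")
    return mean(judge_passes)


def judge_spread(judge_passes: list[float]) -> float:
    """Max minus min across passes: a rough sign of how much the judges disagreed."""
    return max(judge_passes) - min(judge_passes)


passes = [88.0, 92.0, 90.0]
print(answer_score(passes), judge_spread(passes))  # 90.0 4.0
```

Tracking the spread alongside the mean is one way to spot answers where the judges disagreed enough to deserve a human look; that flag is an illustration, not something the post says the benchmark does.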
The live leaderboard always carries the most current version: faithgpt.io/benchmarks.