Latest from the Blog

Insights, tutorials, and news about AI evaluation, LLM judges, and building reliable GenAI applications.

2026-04-13

Bootstrapping AI Evals from Context (Why 'Just Asking Claude' Fails)

A design pattern and protocol that lets you bootstrap a maximally strong evaluation stack for the AI features in your codebase with minimum effort, using the Prosecutor Pattern.

2026-03-31

Evals Are Your Competitive Edge: DIY Eval System vs. Eval Platform

We stress-tested the build vs. buy question for AI evals two ways: a barebones eval system from scratch, then a platform-backed one using Scorable. Here's what actually differs, and what doesn't.

2026-03-23

How do we create the evaluators?

A look into how we built the Evaluator Factory, a tool to automatically create evaluation stacks for your LLM apps.

2026-01-20

Get Clear AI Evaluation Insights in Slack - Scorable Slack App

AI systems generate metrics constantly, but teams struggle to understand which metrics matter right now. The Scorable Slack app brings evaluation insights directly into Slack, where decisions actually happen.

2025-10-27

The Easiest Way to Start Using Scorable Evals in Your AI App

Scorable evals make it easy to automatically evaluate and refine your model's responses, improving performance and consistency with minimal setup.

2025-10-15

Ensuring the Safety of Healthcare AI with LLM Judges

Gosta Labs is transforming healthcare with AI-powered tools that save time and improve patient care. With Scorable, every model iteration can be tested, validated, and trusted before reaching real-world use.

2025-10-06

Build Custom AI Evaluators from Policies & Examples with Scorable (in Minutes)

Generic benchmarks only tell part of the story. With Scorable, you can transform your own policies and examples into custom evaluators that measure what truly matters for your business.

2025-09-18

Scorable Builds Your Customized AI Evaluation Stack in 1 Minute

How can you make sure your AI application isn't hallucinating? Learn how Scorable builds your customized AI evaluation stack in just 1 minute to ensure reliability and accuracy.

2025-09-03

Scorable is Now Available on AWS Marketplace!

Scorable is now transactable on AWS Marketplace! Access our LLM evaluation and monitoring platform faster with simplified procurement and seamless AWS integration.

2025-08-25

Scorable Achieves SOC 2 Type II Certification

Scorable demonstrates its commitment to security and compliance by achieving SOC 2 Type II certification.

2025-07-16

RAG Evaluation Fundamentals: A Complete Guide to Measuring RAG Performance

Master the fundamentals of RAG evaluation with this comprehensive guide covering key metrics, methodologies, and best practices for assessing retrieval-augmented generation systems.

2025-06-17

Why do LLMs still hallucinate in 2025?

Newer AI models are experiencing MORE hallucinations, not fewer. Explore why hallucinations are complex and not simply resolved by adding context.

2025-02-19

Scorable Introduces Root Judge: The State-of-the-Art Judge Model

Root Judge is a groundbreaking LLM that sets a new standard for reliable, customizable, and locally deployable evaluation models, fine-tuned from Llama-3.3-70B.

2024-10-17

LLM as a Judge vs. Human Evaluation

In the rapidly evolving landscape of AI, we're witnessing a paradigm shift in how we evaluate and validate LLM-generated content.

2024-09-04

Scorable raises $2.8M to accelerate GenAI business adoption by having AI watch AI

Despite global hype for GenAI, most businesses have so far failed to take their GenAI prototypes from experimentation to production. Scorable has raised $2.8M to solve this.