James Padolsey 2025-05-22

LLM Judges Are Unreliable

When large language models are used as judges for decision-making in sensitive domains, they consistently exhibit hidden, unpredictable measurement biases that make their verdicts unreliable, even when common prompt engineering practices are applied.

Joal Stein 2024-10-30

Runtime AI, and the Subtleties of Language

Interacting with AI is not a passive experience. It's an ongoing dialogue where our words hold significant sway over the future of the conversation. What does AI literacy mean today? It means paying attention to how our use of language shapes the behavior of AI.