LLM Judges Are Unreliable
When Large Language Models are used as judges for decision-making across various sensitive domains, they consistently exhibit unpredictable and hidden measurement biases, making their verdicts unreliable despite common prompt engineering practices.
Runtime AI, and the Subtleties of Language
Interacting with AI is not a passive experience. It's an ongoing dialogue where our words hold significant sway over the future of the conversation. What does AI literacy mean today? It means paying attention to how our use of language shapes the behavior of AI.