James Padolsey 2025-05-22

LLM Judges Are Unreliable

When large language models are used as judges for decisions in sensitive domains, they consistently exhibit hidden, unpredictable measurement biases that make their verdicts unreliable, even when common prompt engineering practices are applied.

Joal Stein 2024-11-27

Andy Ayrey on Truth Terminal, Agentic AI, and Data Commons

An interview with researcher Andy Ayrey exploring his experimental AI system Truth Terminal, which inadvertently became a millionaire when crypto traders created a token based on its social media posts. The conversation delves into how this unexpected outcome illuminates the potential for AI systems to become active participants in human economic systems.