LLM Judges Are Unreliable
When large language models are used as judges for decision-making in sensitive domains, they consistently exhibit hidden and unpredictable measurement biases, which makes their verdicts unreliable even when common prompt engineering practices are applied.
Andy Ayrey on Truth Terminal, Agentic AI, and Data Commons
An interview with researcher Andy Ayrey exploring his experimental AI system Truth Terminal, which inadvertently became a millionaire when crypto traders created a token based on its social media posts. The conversation delves into how this unexpected outcome illuminates the potential for AI systems to become active participants in human economic systems.