Lead Product Engineer
About CIP
The Collective Intelligence Project builds infrastructure that gives people across the globe meaningful input into how AI systems are developed and governed. We combine large-scale deliberation, participatory evaluation, and institutional partnerships in a way no lab, regulator, or civil society organization can achieve alone.
We're a small, high-leverage team backed by leading foundations (Google.org, Omidyar Network, Future of Life Foundation, Robert Wood Johnson Foundation) working with top AI labs and governments to ensure AI development expands democratic capacity rather than undermining it.
About the Role
We are hiring a product engineer to build and maintain full-stack, data-rich platforms with rich visualizations and easily navigable user experiences. The overarching challenge is articulating complex, sometimes overwhelming data in a way that mainstream audiences, journalists, academics, and fellow engineers can all understand.
You'll start by spending most of your time continuing to build out Weval (weval.org), our evaluation platform that's being used by AI labs and governments to assess frontier models on questions automated benchmarks can't answer—like whether a model handles mental health crises safely, gives accurate legal advice in Indian languages, or exhibits political bias.
You'll also work on other CIP projects like Global Dialogues (70+ countries gathering public input on AI), Digital Twin evaluations (testing whether AI agents accurately represent people), and experiments with partners deploying democratic AI governance tools in the real world.
This is a high-impact IC role where you'll own significant parts of the technical infrastructure, work directly with partners at AI labs and governments, and help prove that democratic oversight of AI systems is not just possible but practical.
You'll report to Evan Hadfield, CIP's Head of Projects.
What You'll Do
Build and improve Weval (~60%)
Weval is an open platform where researchers, civil society organizations, and domain experts create evaluations that labs and governments actually use. It’s infrastructure for pluralistic AI evaluation where a mental health professional in the US, an election integrity org in Sri Lanka, and policymakers in India can all rigorously test whether AI systems work safely and well in their contexts.
You'll:
Build core platform features: evaluation authoring tools, leaderboards, data pipelines for collecting and analyzing human judgments
Develop APIs and integrations so labs (Anthropic, OpenAI, Cohere) and governments can easily run Weval evaluations on their models
Design and implement rich data visualizations and interactive user interfaces to articulate complex evaluation data for non-technical audiences, including policymakers and journalists
Create tools that let non-technical users design and deploy evaluations, from configuring evaluation criteria to analyzing results
Improve platform performance, reliability, and UX based on how partners are actually using it
Own key architectural decisions as the platform scales to support more evaluations, more models, and more partners
Support other CIP projects (~30%)
You'll work on CIP's other democratic AI infrastructure:
Global Dialogues: Build tools to analyze and visualize data from 10,000+ participants across 70+ countries on what they want from AI systems
Digital Twins: Develop evaluation infrastructure testing whether AI agents accurately represent diverse groups' values and preferences
New experiments: Prototype new tooling that will let partners design and run their own pilots (e.g. epistemic quality benchmarks or AI-mediated deliberation)
What Will Make You a Good Fit
Required:
3-5 years of software engineering experience, with a strong focus on frontend development and building excellent user interfaces. While this is a full-stack role, significant experience with NextJS, React, and TypeScript is essential.
You've shipped products that people actually use and find valuable.
You have product sensibility. You care about UX, design quality, and building things that work well for real users.
You have genuine facility with AI tools (like Claude, Cursor, or similar) and use them in your daily workflow as a force multiplier: not just for productivity, but to directly build, problem-solve, and ship.
You're comfortable working independently, making pragmatic technical decisions, and moving quickly.
You're genuinely excited about CIP's mission: building democratic infrastructure for AI, ensuring AI development serves people and democracy rather than concentrating power.
Nice-to-haves:
Experience with AI evaluation platforms, survey tools, research infrastructure, or data collection systems
Experience with Supabase/Postgres, Vercel/Netlify
Previous work in mission-driven organizations, civic tech, research environments, or academic settings
Open source contributions, technical writing, or community building
Familiarity with deliberative democracy, collective intelligence, AI governance, or participatory methods
This role in 12 months:
The nature of software engineering is changing fast. We expect that in a year, you'll spend less time writing code line-by-line and more time orchestrating AI agents, making architectural decisions, and reviewing/directing output from increasingly capable coding tools. The engineer's value is shifting from "can you build it" to "do you know what to build, and can you tell whether it was built well." That means stronger emphasis on system design judgment, quality standards, and the ability to manage multiple parallel workstreams where AI is doing much of the execution. We're looking for someone who's excited by that shift — who sees it not as a threat to their craft but as a massive expansion of what a single engineer can ship.
What We Offer
Impact: Your work directly shapes how major AI labs evaluate and align their systems. Evaluations you build will influence model releases, government procurement, and safety decisions.
Ownership and Autonomy: You'll own significant technical decisions and product direction, helping define what we build, how we build it, and how our work can best serve the people who use it.
Interesting problems: You're building infrastructure for pluralistic evaluation at global scale, handling multilingual content, complex human judgment data, and integration with frontier AI systems.
Great team: Our team is committed, mission-driven, and kind. You’ll also work with researchers, policy experts, and practitioners across leading AI labs (Anthropic, OpenAI), governments (UK AISI, Taiwan, India), and civil society organizations worldwide.
Growth: Room to expand responsibilities as our projects and team grow. Many paths forward depending on your interests: deeper technical ownership, product leadership, or expanding into new problem domains.
Compensation: $150k + comprehensive benefits including health/dental/vision, 403(b) [nonprofit version of 401(k)], generous PTO.
Flexibility: We trust people to manage their own time. That means flexible hours, real accommodation for life stuff (appointments, errands, sick days, time off), and a culture that judges output over presence. That said, we work hybrid (in-office/remote) on Pacific time from our office in Japantown, SF, and we have a strong preference for someone who can be in person regularly for the kind of collaborative, whiteboard-it-out work that's hard to replicate over Zoom.
To Apply:
Email us at hiring@cip.org with:
Your resume and GitHub/portfolio
A brief note about why you're interested in this work
A project you're proud of and why
We're looking to hire ASAP and will review applications on a rolling basis.