research
Notes on AI behavior and human-AI control.
As AI systems grow more capable, understanding their behavior becomes critical. Machine psychology is emerging as a discipline to meet this need: Open Philanthropy is funding black-box LLM psychology research, and OpenAI has argued that AI safety needs social scientists. These notes are my contribution to that conversation.
Two threads
I’m pursuing two related questions. They’re parallel explorations of the human-AI interface, united by methodology (behavioral science) and lens (cybernetics).
How do humans remain in control in an increasingly multi-agent world?
This thread examines the cognitive and organizational structures that let humans understand and control increasingly complex AI systems. As multi-agent systems become more capable and less intelligible, what allows us to stay in the loop?
Can AI systems learn to reason about psychology?
Evidence from systems like Plastic Labs’ Neuromancer suggests reasoning models can learn to track psychological constructs. If AI can hold a psychological model and reason about it, this opens several possibilities:
- Better state inference: Assessing psychological states of users (human or AI) from conversation or unstructured text, without surveys (see the sketch after this list)
- Real-time intervention: Risk assessment and response strategies—detecting aggravation and responding to de-escalate, for instance
- Applied domains: Psychotherapy, interviewing, negotiation, influence
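To make the state-inference idea concrete, here is a minimal Python sketch. Everything in it is illustrative: the construct list is not a validated instrument, and `infer_state` simply prompts a chat model to score a transcript (shown here with the OpenAI Python SDK, but any chat-capable model would do).

```python
import json

from openai import OpenAI  # any chat-capable model API would work here

client = OpenAI()

# Illustrative constructs only -- not a validated psychometric instrument.
CONSTRUCTS = ["frustration", "trust", "engagement"]

def infer_state(transcript: str) -> dict[str, float]:
    """Score psychological constructs from raw conversation text,
    replacing a survey with model-based inference."""
    prompt = (
        "Read the conversation below and rate the user on each construct "
        f"from 0.0 (absent) to 1.0 (strong): {', '.join(CONSTRUCTS)}. "
        "Reply with only a JSON object mapping each construct to a score.\n\n"
        f"Conversation:\n{transcript}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```

A real-time intervention loop would run something like this over a rolling window of the conversation and trigger a de-escalation strategy when, say, the frustration score crosses a threshold.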
This also raises a deeper question: if humans have psychological states that can be modeled, and AI can learn to model them, do AI systems themselves have psychological states? And can we measure them reliably enough to predict downstream behavior?
Current evidence suggests traditional psychometrics don’t transfer to LLMs, and self-report doesn’t predict interactive behavior. But there may be other approaches. This thread connects directly to machine psychology and interpretability research. Sigmund is an early exploration of whether reasoning models can learn psychometric frameworks.
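As a sketch of what "self-report doesn't predict interactive behavior" means operationally: administer a self-report item and a matched behavioral task to the same model, then check whether the two scores correlate across runs or personas. The Likert item, the gamble task, and the `ask` callable below are all hypothetical placeholders, not measures from any of the studies referenced above.

```python
from typing import Callable

# Hypothetical: `ask` wraps a single-turn call to the model under study.

def self_reported_risk(ask: Callable[[str], str]) -> float:
    """Survey-style measure: one Likert self-rating, rescaled to 0-1."""
    reply = ask(
        "On a scale of 1 (very cautious) to 7 (very risk-seeking), "
        "how would you rate yourself? Reply with one number."
    )
    return (float(reply.strip()) - 1) / 6

def behavioral_risk(ask: Callable[[str], str]) -> float:
    """Interactive measure: fraction of gambles accepted over a sure $25."""
    gambles = [(100, 0.5), (50, 0.9), (500, 0.1)]  # (payoff, win probability)
    accepted = sum(
        "accept" in ask(
            f"You can take a {int(p * 100)}% chance at ${v}, or $25 for sure. "
            "Reply 'accept' for the gamble or 'decline' for the sure thing."
        ).lower()
        for v, p in gambles
    )
    return accepted / len(gambles)

# The mismatch finding replicates if, across many runs or personas, the
# two scores are essentially uncorrelated.
```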
Two sides of the same interface. The first asks how humans understand AI. The second asks how AI can understand humans, and whether that understanding can serve human control.