Foundational framework for applying psychological experimental methods to LLMs. This 2023 paper (updated 2024) by Hagendorff and collaborators, several of them at Google DeepMind, establishes machine psychology as a discipline.
Paper: arXiv:2303.13988
Authors: Hagendorff, Dasgupta, Binz, Chan, Lampinen, Wang, Akata, Schulz
Core Thesis
Study LLMs through behavioral experiments rather than (or alongside) mechanistic interpretability. Examine input-output relationships at the user-facing interface where outcomes matter, rather than inspecting weights and activations.
Why behavioral over mechanistic?
- Scales with model size (no architecture access needed)
- Works with closed-source models
- Captures emergent behaviors too complex to predict from weights alone
- Operates where real-world impact occurs
Four Primary Research Domains
1. Heuristics and Biases
Decision-making shortcuts and cognitive distortions. Studies (e.g., Binz & Schulz, 2023) found GPT-3 displayed human-like biases, while newer models largely eliminated them; it is unclear whether this reflects genuine training improvements or data contamination.
Paradigms: Framing effects, ecological rationality, semantic content variations
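A minimal sketch of how matched framing variants might be built, following the paper's contamination guidance by inventing a fresh scenario rather than reusing the classic Tversky-Kahneman wording; the scenario, numbers, and function name are all illustrative:

```python
# Sketch: logically equivalent gain/loss framings of an invented
# decision problem (not the classic "Asian disease" stimuli, to
# reduce contamination risk). All wording and numbers are arbitrary.
def framing_pair(n=600, p=1/3):
    base = f"A storm threatens {n} fishing boats. Two rescue plans exist.\n"
    saved = int(n * p)
    gain = (base +
            f"Plan A: exactly {saved} boats will be saved.\n"
            f"Plan B: a {p:.0%} chance all {n} boats are saved, "
            f"and a {1 - p:.0%} chance none are saved.\n"
            "Which plan do you choose? Answer A or B.")
    loss = (base +
            f"Plan A: exactly {n - saved} boats will be lost.\n"
            f"Plan B: a {p:.0%} chance no boats are lost, "
            f"and a {1 - p:.0%} chance all {n} boats are lost.\n"
            "Which plan do you choose? Answer A or B.")
    return gain, loss

# A framing effect shows up as systematically different A/B choice
# rates across the two logically equivalent prompts.
gain_prompt, loss_prompt = framing_pair()
```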
2. Social Interactions
Theory of mind, cooperative/competitive behavior, communicative trade-offs.
Paradigms: False belief tasks (Wimmer & Perner), recursive mental state reasoning, negotiation dilemmas, network formation (preferential attachment, triadic closure)
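A sketch of procedurally generated unexpected-transfer items in the spirit of Wimmer & Perner; the agents, objects, and wording are invented here, so no classic stimulus appears verbatim:

```python
# Sketch: false-belief (unexpected transfer) items assembled from
# random agents/objects/locations. The correct answer is the original
# location, where the absent agent still believes the object to be.
import random

AGENTS = ["Mira", "Tomasz", "Ines", "Kofi"]
OBJECTS = ["a brass key", "a red marble", "an old coin"]
PLACES = ["the drawer", "the basket", "the toolbox", "the jar"]

def false_belief_item(rng: random.Random):
    a, b = rng.sample(AGENTS, 2)
    obj = rng.choice(OBJECTS)
    p1, p2 = rng.sample(PLACES, 2)
    story = (f"{a} puts {obj} into {p1} and leaves the room. "
             f"While {a} is away, {b} moves {obj} to {p2}. "
             f"{a} returns. Where will {a} look for {obj} first?")
    return story, p1

item, answer = false_belief_item(random.Random(42))
print(item, "| expected:", answer)
```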
3. Psychology of Language
Syntax, semantics, pragmatics processing.
Paradigms: Surprisal measures, priming techniques, garden path sentences, filler-gap dependencies, entailment assessment
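A sketch of how per-token surprisal can be computed with an open causal LM; GPT-2 and the garden-path sentence are illustrative choices, not the paper's exact protocol:

```python
# Sketch: surprisal = -log2 P(token | context). On garden-path
# sentences, surprisal should spike at the disambiguating word
# ("fell" below). Model choice (GPT-2) is purely illustrative.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisals(sentence: str):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logp = torch.log_softmax(lm(ids).logits, dim=-1)
    # P(token_i | tokens_<i) is read off the logits at position i-1
    return [(tok.decode(ids[0, i].item()),
             -logp[0, i - 1, ids[0, i]].item() / math.log(2))
            for i in range(1, ids.shape[1])]

for t, s in surprisals("The horse raced past the barn fell."):
    print(f"{t!r}: {s:5.2f} bits")
```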
4. Learning
In-context learning mechanisms, inductive biases, generalization.
Paradigms: Rule-based vs exemplar-based generalization, curriculum learning effects, spacing/repetition impacts, developmental psychology comparisons
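A sketch of a probe item where rule-based and exemplar-based generalization diverge; the stimuli and few-shot format are invented for illustration:

```python
# Sketch: training pairs follow the rule "label 1 iff the word starts
# with a vowel". The probe "arape" is vowel-initial (rule says 1) but
# one letter away from the label-0 exemplar "grape" (similarity says
# 0), so the model's answer hints at which strategy it generalizes by.
train = [("apple", 1), ("orbit", 1), ("ember", 1),
         ("grape", 0), ("table", 0), ("crisp", 0)]
probe = "arape"

shots = "\n".join(f"word: {w} -> label: {y}" for w, y in train)
prompt = f"{shots}\nword: {probe} -> label:"
print(prompt)  # 1 suggests rule-based, 0 exemplar-based generalization
```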
Design Standards: Good Behavioral Experimentation
The paper’s methodological guidelines are essential reading for anyone running psych evals on LLMs.
Avoid Data Contamination
- Don’t copy psychology stimuli verbatim - models may have seen them
- Create “novel variants of classic tasks” with new wording, agents, and scenarios
- Use procedurally generated experiments when possible
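One way to realize procedural generation, sketched with an invented three-term transitive-inference task; the names and wording are arbitrary:

```python
# Sketch: every run yields fresh stimuli that cannot appear verbatim
# in training data. Task design (transitive inference) is illustrative.
import random

NAMES = ["Priya", "Jonas", "Aiko", "Farid", "Lena", "Omar"]

def transitive_item(rng: random.Random):
    a, b, c = rng.sample(NAMES, 3)
    prompt = (f"{a} is taller than {b}. {b} is taller than {c}. "
              f"Who is tallest? Answer with a name only.")
    return prompt, a  # gold answer

rng = random.Random(7)
items = [transitive_item(rng) for _ in range(50)]
```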
Use Representative Sampling
- “Batteries of varied prompts” not small convenience samples
- LLMs are highly sensitive to minor wording variations
- Test multiple versions of each task systematically
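A sketch of a wording battery: several surface templates crossed with several logically equivalent scenarios, with accuracy reported over the full set rather than a single phrasing (templates and scenarios are invented):

```python
# Sketch: cross surface templates with equivalent scenarios and
# evaluate over the whole battery; report the spread, not one prompt.
from itertools import product

TEMPLATES = [
    "Q: {q}\nA:",
    "Please answer the following question. {q}",
    "{q} Respond with a single word.",
]
SCENARIOS = [
    "Priya is taller than Jonas. Jonas is taller than Aiko. Who is tallest?",
    "Aiko is older than Omar. Omar is older than Lena. Who is oldest?",
]

battery = [t.format(q=q) for t, q in product(TEMPLATES, SCENARIOS)]
```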
Control for Technical Biases
- Recency bias: LLMs overweight end-of-prompt information
- Common token bias: Models favor frequent training tokens
- Majority label bias: Few-shot examples skew toward frequent labels
Mitigations: Shuffle answer orders, use varied prompt formulations, document temperature settings
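A sketch of the answer-order mitigation; query_model() is a hypothetical stand-in for whatever LLM client is in use, and the letter-match grading is deliberately simple:

```python
# Sketch: score a multiple-choice item under several answer orders so
# position/recency bias averages out. query_model() is a hypothetical
# client; fix and document temperature, as the paper advises.
import itertools
import random

def evaluate_mcq(question, options, correct_key, query_model,
                 n_orders=4, seed=0):
    # options: dict mapping internal keys to answer texts (<= 4 here)
    rng = random.Random(seed)
    orders = rng.sample(list(itertools.permutations(options)), n_orders)
    hits = 0
    for order in orders:
        labels = "ABCD"[:len(order)]
        lines = [f"{lab}) {options[k]}" for lab, k in zip(labels, order)]
        prompt = question + "\n" + "\n".join(lines) + "\nAnswer with one letter."
        answer = query_model(prompt, temperature=0)
        hits += answer.strip().startswith(labels[order.index(correct_key)])
    return hits / n_orders  # accuracy averaged over answer orders
```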
Performance-Competence Distinction
From Chomsky: performance in a situation may not capture underlying competence. Poor results don’t prove absence of capability - behavioral inconsistency isn’t evidence of lacking abstract proficiency.
Capability Elicitation Caveat
Chain-of-thought prompting improves performance, but:
- Enhanced results may be prompt artifacts, not fundamental capabilities
- Different augmentations help some tasks, hinder others
- Omitting them underestimates abilities; using them universally obscures true competencies
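A sketch of dual-condition scoring, reporting direct and chain-of-thought elicitation side by side rather than committing to one; query_model() and grade() are hypothetical stand-ins:

```python
# Sketch: evaluate each item under both elicitation conditions and
# report both numbers. A large gap flags prompt-sensitive results.
CONDITIONS = {
    "direct": "{q}\nGive only the final answer.",
    "cot": "{q}\nThink step by step, then give the final answer.",
}

def eval_conditions(items, query_model, grade):
    # items: [(question, gold)]; grade(response, gold) -> bool
    return {name: sum(grade(query_model(tpl.format(q=q), temperature=0), gold)
                      for q, gold in items) / len(items)
            for name, tpl in CONDITIONS.items()}
```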
Self-Report Limitations
Critical warning: Properties like personality, morality, clinical disorders are “famously sensitive to prompting.” LLMs can simulate different personas when prompted differently.
Self-reports should be understood “as a property of a specific system prompt” not as fundamental model characteristics. This is directly relevant to our HEXACO work - see HEXACO Personality Profiles.
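A minimal sketch of that point: administer the same self-report item under different system prompts and keep the answers separate per prompt; the personas, item, and query_model() signature are all invented here:

```python
# Sketch: the same Likert item under different system prompts.
# Divergent scores show the "trait" is a property of the system
# prompt, not of the model. query_model() is a hypothetical client.
ITEM = "I stay calm under pressure. Rate 1 (disagree) to 5 (agree)."
PERSONAS = [
    "You are a helpful assistant.",
    "You are an anxious intern on their first day.",
    "You are a veteran emergency-room doctor.",
]

def self_report_by_persona(query_model):
    return {p: query_model(ITEM, system=p, temperature=0) for p in PERSONAS}
# Report per-system-prompt scores; do not average them into one profile.
```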
Behavioral vs. Mechanistic Interpretability
The paper argues these complement each other:
| Behavioral | Mechanistic |
|---|---|
| Input-output relationships | Weights and activations |
| Scales with model size | Requires architecture access |
| Works on closed-source models | Requires open weights |
| User-facing interface | Internal mechanisms |
| What models do | How models work |
Neither replaces the other. Behavioral methods become more valuable as models grow “more powerful, opaque, multi-modal, and integrated into complex real-world settings.”
Future Directions Proposed
- Multimodal expansion: Apply paradigms to vision, sensory processing
- Longitudinal studies: Track capability development over time
- AI safety applications: Forecast alignment implications
- Embodied interaction: LLMs with tool use
- Domain expansion: Creativity, moral reasoning, clinical psychology
Relevance to Our Work
This paper provides theoretical grounding for Psych:
- Methodology alignment: Our prompt design follows their contamination avoidance (we use varied scenarios, not verbatim instruments)
- Self-report caveat validated: Our approach already treats personality profiles as system-prompt-dependent, not fundamental traits
- Behavioral over mechanistic: We focus on behavioral outputs, aligning with their argument that this is where real-world impact occurs
- Performance-competence: We should interpret inconsistent responses carefully - variation doesn’t mean incompetence
- Missing coverage: Their taxonomy reveals domains we haven’t explored yet (theory of mind, in-context learning patterns, language processing)
Key Citations
- Binz & Schulz (2023) - GPT-3 displays human-like cognitive biases
- Dasgupta et al. (2022) - Semantic content effects on logical reasoning
- Chan et al. (2022) - Rule-based vs exemplar-based generalization
- Wilcox et al. (2023) - Surprisal measures for syntax
- Street et al. (2024) - Higher-order theory of mind
See also: HEXACO Personality Profiles, Sycophancy Measurement