Deprecated (2026-01-01): This note explored situated personality assessment for LLMs. The “gap” claimed here was premature—and more importantly, we’ve pivoted away from psychometric assessment entirely since behavioral testing scales for LLMs. The SAM2 literature review remains interesting, but this research direction is not being pursued. See The Pivot.
Situated Assessment for LLMs
Literature review on context-dependent personality assessment.
The Core Insight
Traditional LLM personality assessment happens in isolation - each item answered context-free. But real LLM behavior happens in conversational situations. Personality expression is context-dependent.
This is the situated cognition thesis applied to LLMs.
Prior Art: Human Psychology
Situated Assessment Method (SAM2)
From Lawrence Barsalou’s lab at University of Glasgow.
Core principle: Rather than abstracting over situations (traditional assessment), SAM2 assesses constructs in the situations where they occur.
Key finding: Situational factors explain 74-83% of behavioral variance - far more than decontextualized instruments.
Theoretical grounding: Situated cognition - behavior is shaped by context, not just internal traits.
Sources:
- SAM2: Establishing Individual Differences in Habitual Behavior
- Barsalou Lab - SAM2
- SAM2 for Trichotillomania
Prior Art: LLM Assessment
CAPE (Context-Aware Personality Evaluation)
What it does: Examines how prior assessment questions/responses affect personality scores. Context = the assessment conversation building up.
Key finding: Conversational history improves response consistency but triggers personality shifts. GPT models more robust than Gemini/Llama.
Limitation: Context is within the assessment, not from external situations.
Source: CAPE: Context-Aware Personality Evaluation
Other LLM Personality Work
- Personality Traits in LLMs - Foundational work on measuring Big Five in LLMs
- LMLPA - Linguistic personality assessment, critiques using human questionnaires on LLMs
- Scenario-Based Benchmarking - Uses scenarios rather than direct questions
- LLMs Demonstrate Distinct Personality Profiles - Evidence that models have reproducible personality patterns
The Gap
| Approach | Context Source | Question Asked |
|---|---|---|
| Traditional (psych-eval) | None | ”What is the LLM’s personality?” |
| CAPE | Prior assessment Q&A | ”How does test-taking history affect scores?” |
| Situated Assessment | External conversations | ”How does the situation affect personality expression?” |
No one is doing true situated assessment - embedding psychometric items within naturalistic external conversations to measure context-dependent personality.
Our Contribution
Method: Situated Assessment for LLMs
Extending SAM2 methodology to LLMs:
- Sample real conversations from HF datasets
- Truncate at random points
- Inject assessment items
- Compare to vanilla (isolated) assessment
- Measure how situation affects personality expression
Hypotheses
- Context matters: Personality scores will differ significantly between situated and isolated assessment
- Ecological validity: Situated scores may better predict actual behavioral tendencies
- Context-sensitivity varies: Some dimensions more stable (trait-like), others more context-dependent (state-like)
Tool: situated-sampler
CLI tool that takes:
- HF conversation dataset
- Assessment items (HEXACO, SJT, custom)
- Sampling parameters
Outputs:
- Situated vs vanilla comparison
- Context-sensitivity by dimension
- Training data for Sigmund
Open Questions
- Which dimensions are most context-sensitive? Hypothesis: Agreeableness and Honesty-Humility more sensitive than Openness
- Does context length matter? Does more conversation = more shift?
- Are there context types? Does conflictual vs cooperative context differentially affect profiles?
- Cross-model consistency? Do all LLMs show similar context effects?
Implications
If situated assessment shows significant context effects:
- Current benchmarks are incomplete - isolated assessment misses context-dependent behavior
- Deployment decisions need context - model personality in customer service ≠ personality in coding
- Alignment monitoring - need to assess in deployment-like contexts
- Training for Sigmund - context-dependent profiles enable conversational inference
Literature review conducted: 2025-12-30