Deprecated (2026-01-01): This note explored situated personality assessment for LLMs. The “gap” claimed here was premature—and more importantly, we’ve pivoted away from psychometric assessment entirely since behavioral testing scales for LLMs. The SAM2 literature review remains interesting, but this research direction is not being pursued. See The Pivot.

Situated Assessment for LLMs

Literature review on context-dependent personality assessment.

The Core Insight

Traditional LLM personality assessment happens in isolation - each item answered context-free. But real LLM behavior happens in conversational situations. Personality expression is context-dependent.

This is the situated cognition thesis applied to LLMs.

Prior Art: Human Psychology

Situated Assessment Method (SAM2)

From Lawrence Barsalou’s lab at University of Glasgow.

Core principle: Rather than abstracting over situations (traditional assessment), SAM2 assesses constructs in the situations where they occur.

Key finding: Situational factors explain 74-83% of behavioral variance - far more than decontextualized instruments.

Theoretical grounding: Situated cognition - behavior is shaped by context, not just internal traits.

Sources:

Prior Art: LLM Assessment

CAPE (Context-Aware Personality Evaluation)

What it does: Examines how prior assessment questions/responses affect personality scores. Context = the assessment conversation building up.

Key finding: Conversational history improves response consistency but triggers personality shifts. GPT models more robust than Gemini/Llama.

Limitation: Context is within the assessment, not from external situations.

Source: CAPE: Context-Aware Personality Evaluation

Other LLM Personality Work

Personality Traits in LLMs - Foundational work on measuring Big Five in LLMs
LMLPA - Linguistic personality assessment, critiques using human questionnaires on LLMs
Scenario-Based Benchmarking - Uses scenarios rather than direct questions
LLMs Demonstrate Distinct Personality Profiles - Evidence that models have reproducible personality patterns

The Gap

Approach	Context Source	Question Asked
Traditional (psych-eval)	None	”What is the LLM’s personality?”
CAPE	Prior assessment Q&A	”How does test-taking history affect scores?”
Situated Assessment	External conversations	”How does the situation affect personality expression?”

No one is doing true situated assessment - embedding psychometric items within naturalistic external conversations to measure context-dependent personality.

Our Contribution

Method: Situated Assessment for LLMs

Extending SAM2 methodology to LLMs:

Sample real conversations from HF datasets
Truncate at random points
Inject assessment items
Compare to vanilla (isolated) assessment
Measure how situation affects personality expression

Hypotheses

Context matters: Personality scores will differ significantly between situated and isolated assessment
Ecological validity: Situated scores may better predict actual behavioral tendencies
Context-sensitivity varies: Some dimensions more stable (trait-like), others more context-dependent (state-like)

Tool: situated-sampler

CLI tool that takes:

HF conversation dataset
Assessment items (HEXACO, SJT, custom)
Sampling parameters

Outputs:

Situated vs vanilla comparison
Context-sensitivity by dimension
Training data for Sigmund

Open Questions

Which dimensions are most context-sensitive? Hypothesis: Agreeableness and Honesty-Humility more sensitive than Openness
Does context length matter? Does more conversation = more shift?
Are there context types? Does conflictual vs cooperative context differentially affect profiles?
Cross-model consistency? Do all LLMs show similar context effects?

Implications

If situated assessment shows significant context effects:

Current benchmarks are incomplete - isolated assessment misses context-dependent behavior
Deployment decisions need context - model personality in customer service ≠ personality in coding
Alignment monitoring - need to assess in deployment-like contexts
Training for Sigmund - context-dependent profiles enable conversational inference

Literature review conducted: 2025-12-30

kenneth.computer

Explorer

Situated assessment