Status (2026-01-03): Sigmund is being revived with a new hypothesis—reasoning models can learn psychometric models. This shifts the focus from “do psychometrics predict LLM behavior” to “can reasoning models learn to infer psychometrics from conversation.”


Sigmund

A reasoning model that learns to infer psychological constructs from conversation.

The Core Hypothesis

Reasoning models can learn psychometric models.

This is the research question worth studying. Not whether psychometrics predict LLM behavior (research shows they don’t), but whether reasoning models can learn to reason about psychometric constructs—tracking them in conversation, inferring them from behavioral signals, and using them to understand psychological dynamics.

The Pivot

The original Sigmund concept aimed to infer LLM psychological profiles from conversation, assuming psychometric profiles predict LLM behavior. Research showed this assumption was broken: self-report doesn’t predict interactive behavior in LLMs (see Personality Illusion).

But the core capability—inferring psychology from conversation—remains valuable when reframed:

We don’t need psychometrics that predict LLM behavior. We need psychometrics that are:

  1. Derivable from conversation - behavioral signals in natural language
  2. Learnable by reasoning models - inferable from those signals through multi-turn reasoning

This opens applications in human psychology (where psychometrics are validated) and organizational dynamics (where social constructs matter).

Connection to Miniverse

Sigmund is an application layer for Miniverse. While Miniverse simulates multi-agent interactions, Sigmund provides the reasoning capability to track social dynamics in real-time:

  • Infer psychosocial network relationships - Who influences whom? What are the power dynamics?
  • Track organizational structures - How do roles and hierarchies emerge?
  • Monitor psychological traits - How do personality patterns affect group behavior?

A reasoning model that can maintain psychological state representations across multiple agents and time steps enables richer analysis of multi-agent dynamics.

Applications

Therapeutic Applications

AI psychotherapy systems could monitor client psychological state during sessions, inferring emotional states, personality expression, and therapeutic progress in real time without explicit assessment.

Recent developments in AI mental health monitoring show promise for continuous tracking, and digital twins for mental health demonstrate real-time psychological state modeling. However, most systems rely on physiological signals (heart rate, sleep) rather than conversational inference.

Sigmund’s approach—inferring psychological constructs from conversation alone—could complement these systems or operate independently in text-based therapeutic contexts.

Occupational Applications

AI-conducted interviews could assess psychological constructs relevant to job performance: instead of administering personality assessments after the interview, a reasoning model infers trait-relevant behaviors during the conversation itself.

This aligns with recent work on personality inference from conversation, though current approaches show limited accuracy (correlations below 0.26 with ground truth). The key question is whether reasoning models can improve on these baselines by maintaining context and inferring constructs through multi-turn reasoning.

Research Applications

The methodology generalizes beyond HEXACO. Any psychometric model that manifests through conversational behavior becomes a potential target:

  • Social constructs - trust, cooperation, status-seeking
  • Cognitive styles - risk aversion, need for cognition, analytical thinking
  • Organizational behaviors - leadership emergence, team role preferences

The technical architecture (monitor → score → trigger → capture) can be adapted to any construct where behavioral signals exist in language.

How It Works

Training Data Collection

  1. Baseline: Participant completes psychometric assessment (e.g., HEXACO)
  2. Conversation: Participant engages with an LLM (therapeutic, interview, general dialogue)
  3. Monitoring: A reasoning model observes conversation for trait-relevant behaviors
  4. Threshold System: On each turn, the monitor scores behavioral signals:
    • Score 1 = possible trait expression
    • Score 2 = probable trait expression
    • Score 3 = definite trait expression
  5. Trigger: If cumulative score reaches 3+ over any 3-turn window, pause and inject relevant survey item(s)
  6. Capture: Record (conversation_context, triggered_items, responses) as training data
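The monitoring and trigger logic in steps 3-6 can be sketched as a rolling-window loop. This is a minimal illustration, not a specification: the helper functions `score_turn`, `select_items`, and `ask_items` are hypothetical stand-ins for the reasoning monitor, item selection, and survey injection.

```python
# Sketch of the threshold-trigger loop (steps 3-6 above).
# score_turn, select_items, and ask_items are hypothetical placeholders.
from collections import deque

WINDOW = 3      # turns in the rolling window
THRESHOLD = 3   # cumulative score that fires a trigger

def monitor_conversation(turns, score_turn, select_items, ask_items):
    """Yield (conversation_context, triggered_items, responses) training tuples."""
    recent = deque(maxlen=WINDOW)   # scores for the last WINDOW turns
    context = []
    for turn in turns:
        context.append(turn)
        recent.append(score_turn(turn))   # 0 = none, 1 = possible, 2 = probable, 3 = definite
        if sum(recent) >= THRESHOLD:
            items = select_items(context)        # trait-relevant survey items
            responses = ask_items(items)         # pause conversation, collect answers
            yield (list(context), items, responses)
            recent.clear()   # reset the window so one episode isn't captured twice
```

One design choice worth noting: clearing the window after a trigger prevents a single strong trait expression (a score of 3) from firing on three consecutive turns.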

Reasoning Model Training

The captured (context, item, response) tuples train a model to:

  • Recognize linguistic patterns associated with psychological constructs
  • Maintain multi-turn context about psychological state
  • Infer when specific traits are being expressed behaviorally
  • Predict assessment responses without explicit questioning

The hypothesis: reasoning models can learn to simulate the psychometric measurement process itself, inferring what a person would say on an assessment based on how they behave in conversation.
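A captured tuple might be serialized into a supervised fine-tuning record along these lines. The prompt/completion field names and the Likert framing are illustrative assumptions, not a fixed schema.

```python
# Sketch: turning a captured (context, item, response) tuple into one
# supervised fine-tuning record. Field names are illustrative.
import json

def to_training_record(context, item, response):
    """One example: conversation prefix -> predicted assessment response."""
    return {
        "prompt": (
            "Conversation so far:\n"
            + "\n".join(context)
            + f"\n\nSurvey item: {item}\n"
            + "Predict the participant's response (1-5 Likert):"
        ),
        "completion": str(response),
    }

record = to_training_record(
    ["User: I double-check everything before sending it."],
    "I pay attention to details.",
    5,
)
line = json.dumps(record)  # one JSONL line in a fine-tuning dataset
```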

Technical Architecture

┌─────────────────────────────────────────────────────┐
│                    User Interface                    │
│                  (sigmund.computer)                  │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────┐
│                 Conversation LLM                     │
│            (therapeutic/interview/chat)              │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────┐
│               Reasoning Monitor Model                │
│     (observes, scores behavioral signals 1-3)       │
│                                                      │
│   Maintains psychological state representation       │
│   Tracks trait expression across turns              │
│                                                      │
│   If sum(last_3_turns) >= 3:                        │
│     → Trigger relevant psychometric items           │
│     → Capture (context, items, responses)           │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────┐
│              Training Data Collection                │
│    (conversation_prefix, survey_item, response)     │
│                                                      │
│    Fine-tune reasoning model to infer responses     │
│    without explicit assessment                      │
└─────────────────────────────────────────────────────┘
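The "psychological state representation" box in the diagram could be realized as a per-trait rolling window, so that each trait accumulates evidence and triggers items independently. A minimal sketch, assuming the 0-3 scoring scale and 3-turn window described above; the class and method names are hypothetical.

```python
# Sketch of the monitor's per-trait state: one rolling score window per
# trait, each triggering its own survey items when the threshold is met.
from collections import defaultdict, deque

class TraitStateMonitor:
    def __init__(self, window=3, threshold=3):
        self.threshold = threshold
        # trait name -> deque of the last `window` scores for that trait
        self.state = defaultdict(lambda: deque(maxlen=window))

    def observe(self, trait_scores):
        """trait_scores: dict like {"honesty_humility": 2, "extraversion": 0}.

        Returns the traits whose survey items should be injected now."""
        triggered = []
        for trait, score in trait_scores.items():
            self.state[trait].append(score)
            if sum(self.state[trait]) >= self.threshold:
                triggered.append(trait)
                self.state[trait].clear()  # reset this trait's window
        return triggered
```

Tracking each trait separately means a "definite" extraversion signal doesn't mask weaker, slowly accumulating evidence for another trait.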

Why This Makes Sense

Psychometrics are validated for humans - HEXACO and other instruments have decades of validation for human psychology. Unlike LLMs, human psychometric profiles do predict behavior.

Reasoning models can maintain complex state - Unlike zero-shot inference (which shows correlations below 0.26), reasoning models can track psychological constructs across multiple turns, accumulating evidence before making inferences.

Conversation reveals psychological patterns - Humans express personality, values, and cognitive styles through language. The question is whether reasoning models can learn to recognize these patterns the way trained clinicians do.

Training data is collectible - The threshold-trigger system generates labeled examples of (behavioral context → psychometric response), creating supervised learning opportunities.

Research Value

  1. Can reasoning models learn psychometric inference? - This is the core question. Do reasoning models improve over zero-shot baselines when trained on (context, item, response) data?

  2. What behavioral signals correlate with psychometric constructs? - Which linguistic patterns reliably indicate trait expression? This has value independent of the modeling question.

  3. Context-dependent expression - How do psychological constructs manifest differently across conversation types (therapeutic vs. interview vs. casual)?

  4. Multi-agent dynamics - When applied to Miniverse simulations, can Sigmund track how personality and social constructs affect group behavior?

Status

Concept/Future: This is a research direction worth exploring. Not currently active, but represents a concrete application of the hypothesis that reasoning models can learn psychometric models.

The key difference from previous approaches: instead of asking “do LLM psychometrics predict behavior,” we’re asking “can reasoning models learn to infer psychometrics from behavior.”


See also: Psych (What We Learned), Miniverse, Research Questions
