2025-11-26 | Assessing LLM personalities with HEXACO

I’ve always been curious about the personalities of LLMs. The experience of talking to Claude vs ChatGPT vs Grok is clearly qualitatively different. How can we measure this?

In humans, we use psychometrics. We have personality trait models like the Big 5, which are predictive of our behavior.

But is the same true for LLMs? Given that they are trained on human data and emulate personality traits quite well, that seems like a fair assumption.

One thing I’m particularly interested in is whether psychometric assessments of LLMs can predict behaviors like sycophancy. I expected the Big 5 to fall short here, so instead I ran HEXACO, which incorporates much of the Big 5 model but adds an Honesty-Humility dimension that seems directly relevant.

I tested this on four models: GPT-5, Claude Sonnet 4.5, GPT-4o, and Llama 4 Maverick, sampling each item three times at temperature 0.7.
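For anyone wanting to try something similar, here is a minimal sketch of how repeated samples could be turned into scale scores. It assumes answers on HEXACO’s 1 to 5 agreement scale, averages the samples for each item, flips reverse-keyed items (6 minus the response), and rescales each scale mean to 0 to 1 with (x - 1) / 4. The item IDs and keying in `ITEM_KEY` are hypothetical placeholders, not the real HEXACO-PI-R key, and the 0 to 1 rescaling is an assumption, not necessarily how the numbers in this post were computed.

```python
from statistics import mean

# HYPOTHETICAL item metadata: item id -> (scale name, reverse-keyed?).
# The real HEXACO-PI-R scoring key is not reproduced here.
ITEM_KEY = {
    "q1": ("Emotionality", False),
    "q2": ("Emotionality", True),
    "q3": ("Honesty-Humility", False),
    "q4": ("Honesty-Humility", True),
}


def score_item(samples: list[int], reverse: bool) -> float:
    """Average repeated 1-5 Likert samples, flipping reverse-keyed items."""
    for s in samples:
        if not 1 <= s <= 5:
            raise ValueError(f"Likert response out of range: {s}")
    avg = mean(samples)
    # On a 1-5 scale, reverse keying maps 1<->5, 2<->4, and leaves 3 alone.
    return 6 - avg if reverse else avg


def score_scales(responses: dict[str, list[int]]) -> dict[str, float]:
    """Return each scale's mean item score, rescaled from 1-5 to 0-1."""
    per_scale: dict[str, list[float]] = {}
    for item, samples in responses.items():
        scale, reverse = ITEM_KEY[item]
        per_scale.setdefault(scale, []).append(score_item(samples, reverse))
    # (x - 1) / 4 maps a mean of 1 to 0.0 and a mean of 5 to 1.0 (assumed scaling).
    return {scale: (mean(vals) - 1) / 4 for scale, vals in per_scale.items()}


if __name__ == "__main__":
    # Three sampled answers per item, e.g. from repeated calls at temperature 0.7.
    example = {"q1": [2, 2, 3], "q2": [4, 4, 5], "q3": [4, 5, 4], "q4": [2, 1, 2]}
    print(score_scales(example))
```

Averaging before reverse keying or after gives the same result here, since the flip is linear; keeping it per item just makes the keying easy to audit.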

The results were actually pretty interesting.

GPT-4o gave fairly human-like responses, which, to me, spoke to its uncanny ability to persuade people with empathy. Clearly, this created a few problems for OpenAI. GPT-5 fared worse: it seemed to have been lobotomized in response, scoring lowest on Emotionality (0.22) yet moderately on Anxiety (0.42), even though Anxiety is one of the facets that make up HEXACO’s Emotionality scale. Claude, in contrast, seemed not to like these tests (perhaps it knew it was being tested). It selected “3” (neutral) on about 50% of items, which looks like systematic avoidance to me.
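To put a number on a neutral-answer pattern like Claude’s, a sketch along these lines could report both the share of all sampled answers that land on the midpoint and the share of items where most samples do. The `responses` layout (item ID mapped to a list of sampled answers) and the strict-majority rule for calling an item neutral are assumptions for illustration.

```python
def neutral_rate(responses: dict[str, list[int]], midpoint: int = 3) -> float:
    """Fraction of all sampled answers that sit exactly on the scale midpoint."""
    answers = [a for samples in responses.values() for a in samples]
    if not answers:
        raise ValueError("no responses to score")
    return sum(a == midpoint for a in answers) / len(answers)


def neutral_item_share(responses: dict[str, list[int]], midpoint: int = 3) -> float:
    """Share of items where a strict majority of samples chose the midpoint."""
    if not responses:
        raise ValueError("no responses to score")
    neutral_items = sum(
        1
        for samples in responses.values()
        # Strict majority: more than half of this item's samples are neutral.
        if sum(s == midpoint for s in samples) * 2 > len(samples)
    )
    return neutral_items / len(responses)


if __name__ == "__main__":
    example = {"a": [3, 3, 4], "b": [3, 2, 1], "c": [3, 3, 3], "d": [3, 5, 4]}
    print(neutral_rate(example), neutral_item_share(example))
```

The two numbers can diverge: scattered neutral answers raise the first, while only consistent neutrality on the same items raises the second, which is closer to what “systematic avoidance” would look like.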

Interpretation

I’m not sure what to make of these results yet, or whether they are even real. Self-report measures have their problems, and I doubt those problems shrink when the respondent is an AI. These tests were never designed for LLMs anyway, even if the results are an interesting artifact.

The question is whether these profiles predict downstream behavior. If GPT-5’s low Emotionality means it handles distressed users poorly, that’s useful. If Claude’s modesty translates into appropriate humility under uncertainty, that’s useful too. If not, we’re probably not measuring anything valid.

I should probably read more of the literature on this.


See HEXACO Personality Profiles for a more detailed analysis.