2025-11-30 | Scoring Approaches
Documented the LLM-as-judge approach for scoring behavioral dimensions.
Key insight: classification is more reliable than continuous scoring. Planning to move from 0-1 scales to categorical outputs.
Also reviewed CAA’s GPT-4 scoring prompt for sycophancy. Will use theirs for direct comparison, ours for custom tests.