We publish our methods, findings, and failures. Every article is written by the team that built the system.
We ran 80,433 trials across six models to answer one specific question: does sycophancy increase as the context window fills up? The answer surprised us. Context length matters, but only for small models. The real driver is the conversational pattern, and its effect is reversible.