Hypothesis

Chars-per-token on English dropped from 4.33 to 3.60. TypeScript dropped from 3.66 to 2.69. The vocabulary is representing the same text in smaller pieces.

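This finding challenges the common intuition about tokenizer efficiency. We usually assume that a more efficient tokenizer represents the same content in fewer tokens, yet Claude 4.7's tokenizer actually produces more tokens for the same text. This counter-intuitive change suggests that Anthropic may have deliberately traded token efficiency for finer-grained language processing, running against the conventional NLP rule of thumb that "fewer tokens = more efficient".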
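The "more tokens" claim can be quantified from the two figures in the highlight. Token count is character count divided by chars-per-token, so for a fixed piece of text the new tokenizer needs old_cpt / new_cpt times as many tokens. A minimal sketch, assuming the quoted chars-per-token values are averages over comparable text, which makes the ratio approximate for any single input (the helper name `token_inflation` is illustrative only):

```python
# Chars-per-token figures quoted in the highlight (old tokenizer, new tokenizer).
CHARS_PER_TOKEN = {
    "English": (4.33, 3.60),
    "TypeScript": (3.66, 2.69),
}


def token_inflation(old_cpt: float, new_cpt: float) -> float:
    """Ratio of new token count to old token count for the same text.

    tokens = chars / chars_per_token, so with the character count held fixed:
    new_tokens / old_tokens = old_cpt / new_cpt.
    """
    return old_cpt / new_cpt


if __name__ == "__main__":
    for lang, (old, new) in CHARS_PER_TOKEN.items():
        ratio = token_inflation(old, new)
        print(f"{lang}: {ratio:.2f}x tokens ({(ratio - 1) * 100:.0f}% more)")
```

Running it prints `English: 1.20x tokens (20% more)` and `TypeScript: 1.36x tokens (36% more)`, so under these averages the code-heavy text is hit noticeably harder than English prose.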

Tags: counter-intuitive, tokenizer-efficiency
