Language Models Model Language

20 Oct 2025     3 min read


AI-generated image, based on the article abstract

Quick Insight

Why Chatbots Get Better When We Count Words, Not Just Rules

Ever wondered why a chatbot sometimes sounds just like a friend? Scientists have discovered that the secret isn’t hidden grammar trees but simple word‑frequency patterns. Imagine learning a new language by listening to the most‑used phrases on the street instead of memorizing every rule in a textbook. That’s the fresh view brought by linguist Witold Mańczak, who says language is really the sum of everything we say and write, driven by how often we use each piece. Applying this idea to modern language models means we can build smarter, more natural‑talking AI by focusing on the everyday words people actually use. It’s like teaching a robot to speak by giving it a playlist of popular songs rather than a dense grammar manual. This perspective helps us design, test, and understand AI chatbots in a way that feels more human and less mysterious. As we keep counting the words we love, the future of conversation with machines becomes clearer and more exciting. 🌟


Short Review

Overview: Reconceptualizing Language for Large Language Models

The article critically examines prevailing linguistic commentary on Large Language Models (LLMs), often speculative and unproductive, particularly when influenced by Saussure and Chomsky. It advocates for a fundamental paradigm shift towards the empiricist principles of Witold Mańczak, a distinguished general and historical linguist. Mańczak redefines language not as an abstract system but as the totality of all that is said and written, with frequency of use as its paramount governing principle. This framework provides a robust, quantitative foundation, challenging traditional notions like "deep structure" or "grounding." The authors leverage Mańczak's perspective to refute common critiques of LLMs and offer a constructive guide for their design, evaluation, and interpretation, asserting that LLMs inherently validate this usage-based approach.

Critical Evaluation: Strengths, Weaknesses, and Broader Implications

Strengths: Empirical Foundation and LLM Validation

This analysis offers a compelling re-evaluation of language in the AI era. Introducing Witold Mańczak's empiricist framework, the article provides a robust, data-driven alternative to speculative linguistic theories, especially for Large Language Models. It counters "ungroundedness" by redefining LLM "meaning" as mastery of relational networks within textual data, aligning with Mańczak's axiomatic semantics. The emphasis on frequency of use offers a practical, quantifiable basis for designing and evaluating LLMs. Challenging established linguistic theories with statistical data further demonstrates scientific rigor.
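The "frequency of use" principle the review highlights can be made concrete with a toy sketch: treat a corpus as the totality of what is said, count how often words and word pairs occur, and predict continuations from those counts alone. This is an illustrative assumption of how a purely usage-based model might look, not the paper's actual method; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "the totality of all that is said and written".
corpus = "the cat sat on the mat the cat ate the fish".split()

# Unigram frequencies: "frequency of use" made directly quantifiable.
unigrams = Counter(corpus)

# Bigram counts: how often each word follows another in actual usage.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent continuation of `word` in the corpus."""
    return bigrams[word].most_common(1)[0][0]

print(unigrams.most_common(2))  # [('the', 4), ('cat', 2)]
print(predict_next("the"))      # cat
```

No grammar rules or "deep structure" appear anywhere in this sketch; every prediction falls out of raw usage counts, which is the intuition behind the frequency-based framing of LLMs.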

Weaknesses: Scope and Nuance

While advocating for a radical shift, the article could benefit from discussing potential resistance to Mańczak's framework within mainstream linguistics. The implications of defining language solely as the totality of texts, though powerful for LLMs, might warrant further exploration regarding its applicability to human language acquisition and cognitive processes. Additionally, a deeper dive into the limitations or nuances of purely frequency-based models could strengthen the argument and provide a more balanced perspective.

Implications: Reshaping Linguistic Research and AI Development

The implications of this work are profound for theoretical linguistics and AI development. By proposing Mańczak's framework, the article encourages a fundamental rethinking of language, shifting focus from abstract systems to observable, quantifiable usage patterns. This offers a clear, actionable guide for the future design and evaluation of LLMs, suggesting their success lies in modeling textual structure and relational logic. It also challenges linguists to adopt more statistics-based methodologies, potentially invalidating authority-based theories and fostering a more empirical approach. This analysis paves the way for a more unified, scientifically grounded understanding of language across human and artificial intelligence.

Conclusion: A Paradigm Shift for Language and AI

This article presents a highly impactful contribution to the discourse on Large Language Models and language. Championing Witold Mańczak's empiricist linguistic theory, it offers a compelling alternative to traditional, speculative approaches. The work provides a robust theoretical foundation for understanding LLM capabilities, reframing their "meaning" and "creativity" as mastery of textual patterns and relational logic. Its call for statistics-based validation in linguistics is a significant step towards greater scientific rigor. This analysis is essential reading for researchers in AI, computational linguistics, and theoretical linguistics, offering a fresh perspective that promises to reshape how we design, evaluate, and interpret language models and language itself.

Keywords

  • LLM linguistic analysis
  • Chomsky linguistic theory
  • De Saussure semiotics
  • linguistic competence
  • deep structure in language models
  • language grounding for LLMs
  • Witold Mańczak linguistics
  • empiricist principles of language
  • frequency of use in language
  • language model design principles
  • evaluating language models
  • interpreting language models
  • historical linguistics perspective
  • computational linguistics theory
  • language as totality of utterances

Read the comprehensive review of this article on Paperium.net: Language Models Model Language

🤖 This analysis and review was primarily generated and structured by an AI. The content is provided for informational and quick-review purposes.