Short Review
Overview
This study investigates authorship attribution of JavaScript code generated by Large Language Models (LLMs), aiming to establish reliable methods for identifying the source of AI-generated code. The authors introduce the LLM-NodeJS dataset, comprising 250,000 unique JavaScript samples produced by 20 different LLMs. By benchmarking a range of machine learning classifiers, the research demonstrates high accuracy in distinguishing the outputs of different models, showing that each model leaves a distinctive stylistic signature that supports attribution even after the code is transformed.
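To make the attribution setup concrete, the sketch below illustrates the general idea in miniature (it is not the paper's pipeline): code snippets from two hypothetical "models" with different stylistic habits are featurized with character n-grams and separated by a linear classifier. All sample code and model names here are invented for illustration.

```python
# Illustrative sketch: attributing code snippets to their source "model"
# using character n-gram features and a linear classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus: two hypothetical "LLMs" with distinct stylistic habits
# (arrow functions with const vs. classic function declarations).
samples = [
    ("const add = (a, b) => a + b;", "model_a"),
    ("const mul = (x, y) => x * y;", "model_a"),
    ("const neg = (n) => -n;", "model_a"),
    ("function add(a, b) { return a + b; }", "model_b"),
    ("function mul(x, y) { return x * y; }", "model_b"),
    ("function neg(n) { return -n; }", "model_b"),
]
code, labels = zip(*samples)

# Character n-grams capture surface style: keywords, spacing, punctuation.
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = vectorizer.fit_transform(code)

clf = LogisticRegression().fit(X, labels)

# An unseen snippet written in model_a's style is attributed to model_a.
unseen = "const sub = (a, b) => a - b;"
print(clf.predict(vectorizer.transform([unseen]))[0])  # → model_a
```

The study's stronger results come from learned representations (the CodeT5-JSA architecture) rather than surface n-grams, but the classification framing is the same: code in, predicted source model out.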
Critical Evaluation
Strengths
The study's primary strength is its comprehensive approach to authorship attribution, grounded in a large-scale dataset that lends statistical weight to its findings. The introduction of the LLM-NodeJS dataset is particularly noteworthy, as it provides a robust foundation for future research in this domain. The high accuracy achieved by the custom architecture, CodeT5-JSA, marks a significant advance, suggesting that the model captures deeper structural and semantic features of code rather than relying solely on superficial characteristics.
Weaknesses
Despite these strengths, the study has limitations that warrant consideration. Its exclusive focus on JavaScript may limit the generalizability of the findings to other languages and coding environments. Additionally, while the study evaluates robustness under various code transformations, further work on how the models scale across diverse programming languages and coding styles is needed to fully understand the reach of the findings.
Implications
The implications of this research are significant given the growing reliance on AI-generated code. Accurate attribution of code to specific models can strengthen accountability and security in software development, aiding the detection of vulnerabilities and malicious content. The study also raises important ethical considerations around the use of AI in programming, underscoring the need for responsible practices in model deployment and usage.
Conclusion
Overall, this study significantly contributes to the field of authorship attribution in AI-generated code, providing valuable insights into the unique stylistic signatures of different LLMs. The findings underscore the importance of nuanced identification methods in an era where AI-generated content is becoming increasingly prevalent. As the research community continues to explore these dynamics, the study sets a strong precedent for future investigations into the intersection of AI, programming, and ethical considerations.
Readability
The article is structured in a clear and accessible manner, making it easy for readers to follow the research narrative. Each section logically builds upon the previous one, enhancing comprehension and engagement. The use of concise paragraphs and straightforward language ensures that complex concepts are presented in an understandable way, catering to a professional audience while maintaining scientific rigor.