WithAnyone: Towards Controllable and ID Consistent Image Generation

Hengyuan Xu, Wei Cheng, Peng Xing, Yixiao Fang, Shuhan Wu, Rui Wang, Xianfang Zeng, Daxin Jiang, Gang Yu, Xingjun Ma, Yu-Gang Jiang

17 Oct 2025

Quick Insight

Meet WithAnyone: AI That Draws You in Any Pose Without Losing Your Face

What if you could ask an AI to sketch you smiling, dancing, or under a sunset, and it would still look unmistakably like you? WithAnyone aims to make that possible. Previous text‑to‑image tools often fell into a “copy‑paste” trap: they effectively pasted the same reference face into every new scene, so the picture looked stiff and unrealistic. Imagine a chameleon that only ever shows the same green shade no matter where it hides. That is the problem the researchers faced. The breakthrough comes from a massive new collection called MultiID‑2M, packed with thousands of pictures of the same people in different lights, angles, and moods. By teaching the AI to compare these variations, the new “contrastive identity loss” lets it keep the core identity while freely changing pose, expression, or background. The result? Images that stay true to you yet feel fresh and expressive, opening doors for personalized avatars, creative storytelling, and more. Next time you picture yourself on a distant planet, WithAnyone could keep the adventure authentic and uniquely yours.

Short Review

Overview of Identity-Consistent Image Generation

This article addresses a critical challenge in identity-consistent text-to-image generation: the "copy-paste" artifact, in which a model replicates the reference face almost pixel for pixel instead of preserving the person's identity across natural variations in pose, expression, and lighting, which limits controllability. To improve both identity fidelity and expressive control, the authors contribute three components: WithAnyone, a diffusion-based model; MultiID-2M, a large-scale paired dataset for multi-person scenarios; and MultiID-Bench, a benchmark for evaluating identity preservation. WithAnyone is trained with a contrastive identity loss that balances fidelity with diversity, which reduces copy-paste artifacts and improves controllability.
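The review does not give the formula for the contrastive identity loss, so the following is only a minimal sketch of one common way such a loss can be built: an InfoNCE-style objective over face-identity embeddings. The function names, the temperature value, and the use of precomputed embedding vectors are all assumptions made for illustration, not the authors' implementation. The idea is that the generated face is pulled toward several different photos of the same person (positives) and pushed away from other people (negatives), so matching any single reference photo exactly is not what is rewarded.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_identity_loss(generated, positives, negatives, temperature=0.1):
    """InfoNCE-style loss on identity embeddings (illustrative assumption).

    generated: (d,)   embedding of the generated face
    positives: (P, d) embeddings of other photos of the same person
    negatives: (N, d) embeddings of different people
    """
    g = l2_normalize(generated)
    pos_logits = l2_normalize(positives) @ g / temperature  # shape (P,)
    neg_logits = l2_normalize(negatives) @ g / temperature  # shape (N,)

    # log(sum(exp(neg_logits))), computed in a numerically stable way
    neg_lse = np.logaddexp.reduce(neg_logits)

    # Per positive: -log( exp(p) / (exp(p) + sum_n exp(n)) )
    losses = np.logaddexp(pos_logits, neg_lse) - pos_logits
    return float(losses.mean())

# Toy usage with random vectors standing in for face embeddings.
rng = np.random.default_rng(0)
identity_a = rng.normal(size=64)
identity_b = rng.normal(size=64)
positives = identity_a + 0.1 * rng.normal(size=(4, 64))
negatives = identity_b + 0.1 * rng.normal(size=(8, 64))

loss_same_person = contrastive_identity_loss(identity_a, positives, negatives)
loss_wrong_person = contrastive_identity_loss(identity_b, positives, negatives)
```

In this toy setup the loss is lower when the generated embedding matches the positives' identity than when it matches the negatives' identity. A real training setup would compute the embeddings with a face-recognition network and backpropagate through the generator, which this NumPy sketch does not attempt.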

Critical Evaluation of WithAnyone Model

Strengths: Advancing Identity Fidelity and Control

The article offers a highly comprehensive solution to a significant challenge in identity-consistent image generation. By introducing the MultiID-2M dataset, the authors directly address the critical scarcity of large-scale paired data, a fundamental limitation. This is complemented by MultiID-Bench, which provides novel metrics for evaluating identity fidelity and variation.
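The review does not spell out the benchmark's metrics. As a purely hypothetical illustration of how a copy-paste tendency could be quantified (not MultiID-Bench's actual definition), one can compare how close a generated face embedding is to the reference photo versus a different ground-truth photo of the same person. The function names and the normalization below are assumptions.

```python
import numpy as np

def angular_distance(a, b):
    """Angle in radians between two embedding vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def copy_paste_tendency(generated, reference, target, eps=1e-8):
    """Hypothetical score: positive when the output sits closer to the
    reference photo than to the ground-truth target photo.

    generated: embedding of the generated face
    reference: embedding of the reference photo given to the model
    target:    embedding of a different photo of the same person
    """
    d_ref = angular_distance(generated, reference)
    d_tgt = angular_distance(generated, target)
    d_gap = angular_distance(reference, target)
    # Normalize by the reference-target gap so scores are comparable across pairs.
    return (d_tgt - d_ref) / (d_gap + eps)

# Toy 2-D embeddings: reference and target are 90 degrees apart.
reference = np.array([1.0, 0.0])
target = np.array([0.0, 1.0])

score_copy = copy_paste_tendency(reference.copy(), reference, target)
score_faithful = copy_paste_tendency(target.copy(), reference, target)
```

Under this sketch, an output identical to the reference scores close to 1, and an output identical to the target scores close to -1. A score near the upper end would flag copying even when plain reference similarity looks excellent.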

The proposed WithAnyone model, featuring an innovative contrastive identity loss, effectively balances fidelity and diversity. This robust approach is rigorously validated through extensive quantitative, qualitative, and user studies, providing strong evidence for its superior performance in mitigating copy-paste artifacts and enhancing controllability.

Weaknesses: Unexplored Limitations and Scalability

Although the evaluation is robust, the analysis does not explicitly detail potential limitations of the WithAnyone model, such as its performance under extreme variations or in complex multi-identity interactions. The computational resources required for training and inference of such a large-scale diffusion model are also not discussed, even though they are crucial for assessing practical applicability.

Further elaboration on potential biases within the MultiID-2M dataset, despite its scale, could also strengthen the work. Understanding any demographic or stylistic imbalances is important for ensuring equitable identity generation.

Implications: Advancing Controllable AI Generation

The development of WithAnyone and its accompanying resources significantly advances identity-consistent image generation. By effectively mitigating the "copy-paste" artifact, this work enables far more controllable and expressive image synthesis.

This breakthrough has profound implications for various applications, including realistic digital avatars, personalized content, and virtual try-on experiences. The enhanced ability to preserve identity across diverse poses and expressions opens new avenues for creative industries and future research.

Conclusion: A New Standard for Identity Synthesis

In conclusion, this article presents a highly impactful and valuable contribution to text-to-image generation. By comprehensively addressing the "copy-paste" artifact through the innovative WithAnyone model, the MultiID-2M dataset, and the MultiID-Bench benchmark, the authors have set a new standard for identity-consistent image synthesis.

This work not only addresses a critical limitation but also provides tools and methodologies, in the form of a dataset, a benchmark, and a training objective, that are likely to accelerate future research and development, paving the way for more realistic and versatile AI-generated content.