VoxCPM2 is the latest-generation speech model released by ModelBest in collaboration with the Human-Computer Speech Interaction Lab at Tsinghua University. Since its release, it has gained nearly 70K stars on GitHub. It generates speech in 30+ languages and 9 Chinese dialects, with 48kHz audio output.
Key features of VoxCPM2 include:
Supports 30+ languages, including Chinese, English, Japanese, Korean, French, German, Russian, Arabic, and eight Southeast Asian languages: Vietnamese, Thai, Indonesian, Lao, Burmese, Khmer, Filipino, and Malay.
Supports 9 Chinese dialects: Sichuanese, Cantonese, Wu Chinese, Northeastern Mandarin, Henan dialect, Shaanxi dialect, Shandong dialect, Tianjin dialect, and Hokkien.
Text-based voice creation: you can create a brand-new voice directly through text descriptions, even one that has never existed before.
Voice cloning with emotional replication: upload a voice sample, and the model extracts the speaker’s timbre, speaks any text you specify in that voice, and adjusts the emotion and speaking speed according to your instructions.
48kHz high-fidelity audio quality, delivering voice-over-level expressiveness.
We currently provide VoxCPM2 voice design and voice cloning workflows, both adapted for the DFCine canvas software.
Model Configuration
Download the VoxCPM2 model from Hugging Face and place it in the following directory:
ComfyUI\models\voxcpm
If the voxcpm folder does not exist, create it manually.
The complete VoxCPM2 folder should be placed directly inside this directory. The final path should look like this:
ComfyUI\models\voxcpm\VoxCPM2
Model download link: https://huggingface.co/openbmb/VoxCPM2/tree/main
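If you would rather script the download than copy files by hand, the placement described above can be sketched in Python. This is a minimal sketch, not an official installer: it assumes the huggingface_hub package is installed (pip install huggingface_hub), and the helper names model_dir and download_model, as well as the ComfyUI root path you pass in, are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Repository ID taken from the download link above.
REPO_ID = "openbmb/VoxCPM2"


def model_dir(comfyui_root):
    """Return the folder where the VoxCPM2 files are expected to live."""
    return Path(comfyui_root) / "models" / "voxcpm" / "VoxCPM2"


def download_model(comfyui_root):
    """Download the full repository into ComfyUI/models/voxcpm/VoxCPM2."""
    # Imported here so model_dir() can be used without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = model_dir(comfyui_root)
    # Creates the voxcpm folder (and VoxCPM2 inside it) if missing.
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


# Example call (the root path is an assumption; use your own install location):
# download_model(r"D:\ComfyUI")
```

Calling download_model with your ComfyUI root creates the voxcpm folder when it is missing and places the files under models\voxcpm\VoxCPM2, matching the final path shown above.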








