LTX2.3 Text-to-Video Audio-Visual Sync Basic Edition

By admin

Description:

This workflow is built on the LTX2.3 Distilled v1.1 model and adds the VBVR Spatial Reasoning LoRA. It generates videos from images and produces the audio track automatically as part of the same run.

Model Requirements

ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors Location: models\diffusion_models
Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors Location: models\loras
gemma_3_12B_it_fp8_scaled.safetensors Location: models\clip
ltx-2.3_text_projection_bf16.safetensors Location: models\text_encoders
LTX23_audio_vae_bf16.safetensors Location: models\vae
LTX23_video_vae_bf16.safetensors Location: models\vae
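
Before loading the workflow, it can help to confirm that every file is where the list above says it should be. The following is a minimal sketch, not part of the workflow itself: the function name find_missing_models and the idea of passing in the install root are assumptions made for illustration, and the folder names are simply copied from the list (written with forward slashes so the check works on any OS).

```python
from pathlib import Path

# File names and target folders copied from the model list above.
# Forward slashes are used so the same paths work on Windows, Linux and macOS.
REQUIRED_MODELS = {
    "ltx-2.3-22b-distilled-1.1_transformer_only_fp8_scaled.safetensors": "models/diffusion_models",
    "Ltx2.3-Licon-VBVR-I2V-240K-R32.safetensors": "models/loras",
    "gemma_3_12B_it_fp8_scaled.safetensors": "models/clip",
    "ltx-2.3_text_projection_bf16.safetensors": "models/text_encoders",
    "LTX23_audio_vae_bf16.safetensors": "models/vae",
    "LTX23_video_vae_bf16.safetensors": "models/vae",
}


def find_missing_models(install_root):
    """Return the relative paths of required files not found under install_root.

    install_root is assumed to be the folder that contains the "models"
    directory; this helper is hypothetical and only checks file presence.
    """
    root = Path(install_root)
    missing = []
    for filename, subfolder in REQUIRED_MODELS.items():
        relative = Path(subfolder) / filename
        if not (root / relative).is_file():
            missing.append(relative.as_posix())
    return missing
```

Calling find_missing_models with the path to your install folder returns an empty list when all six files are in place; otherwise it lists each file that still needs to be downloaded or moved.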