Geely Auto, Stepfun open-source multimodal AI models for video, audio generation

Shanghai (Gasgoo) - On February 18, Geely Auto Group and its tech ecosystem partner Stepfun announced the open-sourcing of two multimodal large AI models: Step-Video-T2V for video generation and Step-Audio for voice interaction.

The collaboration leveraged both companies’ strengths in computing power, algorithms, and scenario-based training, significantly enhancing the AI models’ performance. Stepfun stated that the initiative aims to share the latest advancements in multimodal large models with the global open-source community and contribute to its development.

Step-Video-T2V

With 30 billion parameters, Step-Video-T2V can generate high-quality 540p videos of 204 frames, with high information density and strong consistency.

To…
