Meta has recently introduced SeamlessM4T, a multimodal artificial intelligence model for speech recognition and translation across nearly 100 languages.
Speech Recognition and Translation Across Languages
SeamlessM4T is a neural network that recognizes speech and translates it into nearly 100 languages. It can also generate speech from text input in 35 languages. In addition, the model can detect language switches within a single utterance, whether the speaker changes from one language to another or mixes several languages in the same stretch of speech.
A Fusion of Language Projects for Unprecedented Performance
SeamlessM4T builds on several earlier language projects and combines them into a single model for multilingual, multimodal translation. This single model draws on a wide range of spoken sources, which contributes to its translation quality. Its capabilities include:
- Speech recognition for nearly 100 languages
- Speech-to-text translation for nearly 100 input and output languages
- Speech-to-speech translation, supporting nearly 100 input languages and 36 output languages, including English and Russian
- Text-to-text translation for nearly 100 languages
- Text-to-speech translation, supporting nearly 100 input languages and 35 output languages plus English
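The five capabilities above differ only in input modality, output modality, and whether the language changes between input and output. As a rough illustration, they can be laid out as a small lookup table in Python. The names Task, TASKS, and pick_task are hypothetical and are not part of Meta's released code, and the language codes are plain illustrative strings. The sketch only mirrors the list above and does not load or call the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    """One capability from the list above (hypothetical structure)."""
    name: str
    source: str       # input modality: "speech" or "text"
    target: str       # output modality: "speech" or "text"
    translates: bool  # True when input and output languages differ


# The five capabilities, taken directly from the article's list.
TASKS = [
    Task("speech recognition", "speech", "text", False),
    Task("speech-to-text translation", "speech", "text", True),
    Task("speech-to-speech translation", "speech", "speech", True),
    Task("text-to-text translation", "text", "text", True),
    Task("text-to-speech translation", "text", "speech", True),
]


def pick_task(source: str, target: str,
              src_lang: str, tgt_lang: str) -> Optional[Task]:
    """Return the listed task matching the request, or None if none fits."""
    translates = src_lang != tgt_lang
    for task in TASKS:
        if (task.source, task.target, task.translates) == (source, target, translates):
            return task
    return None
```

For example, pick_task("speech", "speech", "rus", "eng") selects speech-to-speech translation, while a same-language text-to-text request returns None because the list contains no such task.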
Accessibility and Availability
Currently, SeamlessM4T is available to researchers and developers under a research license. Those who want to see the model in action can try Meta's public demo.
Meta has also released the SeamlessAlign dataset, notes NIX Solutions. It is the largest open dataset for multimodal translation to date, with 270,000 hours of aligned speech and text data.