MVSEP Logo
  • Home
  • News
  • Plans
  • Demo
  • Create Account
  • Login
  • Theme
    Language
    • English
    • Русский
    • 中文
    • اَلْعَرَبِيَّةُ
    • Polski
    • Portugues do Brasil
    • Español
    • 日本語
    • Français
    • Oʻzbekcha
    • Türkçe
    • हिन्दी
    • Tiếng Việt
    • Deutsch
    • 한국어
    • Bahasa Indonesia
    • Italiano
    • Svenska
    • suomi
    • български език
    • magyar nyelv
    • עִבְֿרִית
    • ภาษาไทย
    • hrvatski
    • Română

MVSep DnR v3 (speech, music, effects)

MVSep DnR v3 is a cinematic model for splitting tracks into 3 stems: music, sfx and speech. It is trained on a huge multilingual dataset DnR v3. The quality metrics on the test data turned out to be better than those of a similar multilingual model Bandit v2. The model is available in 3 variants: based on SCNet, MelBand Roformer architectures, and an ensemble of these two models. See the table below:

Algorithm name SDR Metric on DnR v3 leaderboard
music (SDR) sfx (SDR) speech (SDR)
SCNet Large  9.94 11.35 12.59
Mel Band Roformer 9.45 11.24 12.27
Ensemble (Mel + SCNet) 10.15 11.67 12.81
Bandit v2 (for reference) 9.06 10.82 12.29
🗎 Copy link | Use algorithm | Demo

MVSEP Logo

turbo@mvsep.com

Site information

FAQ

Quality Checker

Algorithms

Full API Documentation

Company

Privacy Policy

Terms & Conditions

Refund Policy

Cookie Notice

Extra

Help us translate!

Help us promote!