MVSep now offers many algorithms. Which one should you choose?
- If you need well-isolated vocals or an instrumental, use one of: Ultimate Vocal Remover HQ, MDX-B, Demucs3 (Model B).
- If you need good bass, drums, or other stems, use Demucs3 (Model B).
To compare algorithms we use the SDR (signal-to-distortion ratio) metric. The larger the value, the better the result of the algorithm.
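To make the metric concrete, here is a minimal sketch of one common way to compute SDR: the energy of the reference stem divided by the energy of the error, in decibels. This plain energy-ratio form is an assumption; the figures in the tables may have been produced with a different variant (for example a BSS Eval style evaluation). The helper name `sdr` and the toy signals are illustrative only.

```python
import math

def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB (higher is better).

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    eps guards against division by zero on silent or perfect inputs.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

# Toy data: one second of a 440 Hz tone sampled at 8 kHz is the "true" stem.
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# A 1000 Hz tone plays the role of the distortion left in the estimate.
distortion = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(sample_rate)]

good_estimate = [r + 0.01 * d for r, d in zip(reference, distortion)]
poor_estimate = [r + 0.1 * d for r, d in zip(reference, distortion)]

print(f"good estimate: {sdr(reference, good_estimate):.2f} dB")
print(f"poor estimate: {sdr(reference, poor_estimate):.2f} dB")
```

With these toy signals the cleaner estimate scores about 40 dB and the noisier one about 20 dB: ten times more distortion amplitude costs 20 dB of SDR.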
Table 1. Comparison on the MUSDB18HQ test set
| Algorithm | SDR bass | SDR drums | SDR other | SDR vocals | SDR instrumental (inverted vocals) | Model link | Demos |
| --- | --- | --- | --- | --- | --- | --- | --- |
| spleeter (2 stems) | --- | --- | --- | 6.8647 | 13.3231 | Link | Demos |
| spleeter (4 stems) | 4.8200 | 6.3390 | 4.5362 | 6.7021 | 13.1434 | Link | Demos |
| spleeter (5 stems) | 4.6376 | 6.1300 | 3.8689 | 6.5027 | 12.9646 | Link | Demos |
| Unmix XL | 5.9577 | 7.7001 | 5.2165 | 7.6852 | 14.1339 | Link | Demos |
| Unmix HQ | 4.6124 | 6.3807 | 3.6915 | 6.0783 | 12.5660 | Link | Demos |
| Unmix SD | 4.7894 | 6.2632 | 3.8281 | 6.1822 | 12.6689 | Link | Demos |
| Demucs 2 | 4.6145 | 6.1588 | 3.1786 | 5.3980 | 11.8388 | Link | Demos |
| MDX-A | 4.9803 | 6.1111 | 4.1430 | 7.1758 | 13.6192 | Link | Demos |
| MDX-B (Default + Demucs2 data) * | 5.2035 | 7.7192 | 5.3624 | 7.9621 | 14.3854 | Link | Demos |
| MDX-B (ONNX Only) * | 6.5687 | 10.2110 | 7.3126 | 9.9084 | 16.3305 | Link | Demos |
| UVR HQ (2 stems) | 4.1616 | 6.1976 | --- | 8.6975 | 14.7872 | Link | Demos |
| Demucs 3 (Model A) | 7.6054 | 8.8748 | 5.5306 | 8.2012 | 14.6347 | Link | Demos |
| Demucs 3 (Model B) * | 11.3270 | 12.0055 | 8.2793 | 9.9202 | 16.2890 | Link | Demos |
| Zero Shot (QBLWLD) | 2.6324 | 3.3939 | 1.4146 | 4.1016 | --- | Link | Demos |
| Danna Sep (CPU) | 6.3462 | 7.8521 | 5.0470 | 7.9611 | 14.4007 | Link | Demos |
| Byte Dance | --- | --- | --- | 8.1485 | 14.5739 | Link | Demos |
| UVR Demucs (Model 1) | --- | --- | --- | 9.0877 | 15.4612 | Link | Demos |
| MVSep Vocal model v2 | --- | --- | --- | 8.8292 | 15.2719 | Link | Demos |
| Demucs4 HT | 8.9770 | 10.0886 | 6.1301 | 9.0252 | 15.4318 | Link | Demos |
* - These numbers are not reliable because the MUSDB18 test set was used to train these models.
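The "SDR instrumental (inverted vocals)" column of Table 1 can be read as scoring an instrumental obtained by subtracting the estimated vocals from the full mixture. The sketch below illustrates that reading; it is an assumption about how the column was produced, and the `invert_vocals` helper, the energy-ratio `sdr` definition, and the toy signals are all hypothetical.

```python
import math

def sdr(reference, estimate, eps=1e-9):
    """Energy-ratio SDR in dB (assumed definition, higher is better)."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))

def invert_vocals(mixture, estimated_vocals):
    """Instrumental estimate = mixture minus estimated vocals, sample by sample."""
    return [m - v for m, v in zip(mixture, estimated_vocals)]

sample_rate = 8000
n_samples = sample_rate  # one second of audio

# Toy stems: a quiet "vocal" tone and an instrumental tone twice as loud.
vocals = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(n_samples)]
instrumental = [2.0 * math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(n_samples)]
mixture = [v + i for v, i in zip(vocals, instrumental)]

# Imperfect vocal estimate: 5% of the instrumental leaks into it.
estimated_vocals = [v + 0.05 * i for v, i in zip(vocals, instrumental)]
estimated_instrumental = invert_vocals(mixture, estimated_vocals)

sdr_vocals = sdr(vocals, estimated_vocals)
sdr_instrumental = sdr(instrumental, estimated_instrumental)

print(f"SDR vocals:       {sdr_vocals:.2f} dB")
print(f"SDR instrumental: {sdr_instrumental:.2f} dB")
print(f"gap:              {sdr_instrumental - sdr_vocals:.2f} dB")
```

Under this reading both stems carry exactly the same error signal, so the two scores differ only by the energy ratio of the reference stems (about 6.02 dB in this toy case, because the instrumental has four times the energy). That would be consistent with the vocals and instrumental columns of Table 1 differing by roughly 6 to 6.5 dB in most rows.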
| Algorithm | Quality (Bass) | Quality (Drums) | Quality (Other) | Quality (Vocals) | Examples |
| --- | --- | --- | --- | --- | --- |
| Spleeter (4 stems) | 5.774 | 5.845 | 4.321 | 6.939 | Example |
| UmxXL | 6.619 | 6.838 | 4.891 | 7.732 | Example |
| MDX A | 7.232 | 7.173 | 5.636 | 8.901 | Example |
| MDX B (Orig) | 7.495 | 7.554 | 5.533 | 8.896 | --- |
| MDX B (UVR) | 7.495 | 7.554 | 5.533 | 9.482 | Example |
| Ultimate Vocal Remover HQ | --- | --- | --- | --- | Example |
| Demucs 3 Model A | 8.115 | 8.037 | 5.193 | 7.968 | Example |
| Demucs 3 Model B | 8.856 | 8.850 | 5.978 | 8.756 | Example |
| Danna Sep | 6.993 | 7.018 | 4.901 | 7.686 | --- |
| Byte Dance | --- | --- | --- | 8.079 | --- |
Table 3. Comparison of algorithms on a synthetic dataset. SDR metric (higher is better)
| Algorithm | Quality (Vocals) | Quality (Instrumental) |
| --- | --- | --- |
| Spleeter (2 stems) | 7.1930 | 6.6612 |
| Spleeter (4 stems) | 7.3168 | 7.0206 |
| Spleeter (5 stems) | 7.1761 | 6.8799 |
| Unmix XL | 8.4581 | 8.1619 |
| Unmix HQ | 6.9301 | 6.6339 |
| Unmix SD | 7.0438 | 6.7476 |
| MDX-A | 8.6540 | 8.3578 |
| MDX-B | 10.8872 | 10.4585 |
| UVR HQ (2 stems) | 9.4008 | 9.0839 |
| Demucs 3 (Model A) | 9.0464 | 8.7502 |
| Demucs 3 (Model B) | 9.7837 | 9.4875 |
| Demucs 2 | 8.5364 | 8.2402 |
| Danna Sep | 8.5975 | 8.3013 |
| Byte Dance | 7.9893 | 7.6931 |
| UVR Demucs (Model 1) | 8.7951 | 8.6191 |
| MVSep Vocal model v2 | 10.4523 | 10.1561 |
| Demucs4 HT | 10.2397 | 9.9435 |
Table 4. Comparison of aggressiveness values for model HP2-4BAND-3090_4band_arch-500m_1 on a synthetic dataset. SDR metric (higher is better)
| Aggressiveness | Quality (Vocals) | Quality (Instrumental) |
| --- | --- | --- |
| 0.0 | 9.3259 | 8.8948 |
| 0.1 | 9.3580 | 8.9277 |
| 0.2 | 9.3824 | 8.9527 |
| 0.3 | 9.4008 | 8.9719 |
| 0.4 | 9.4147 | 8.9864 |
| 0.5 | 9.4250 | 8.9972 |
| 0.6 | 9.4324 | 9.0051 |
| 0.7 | 9.4374 | 9.0106 |
| 0.8 | 9.4404 | 9.0142 |
| 0.9 | 9.4419 | 9.0161 |
| 1.0 | 9.4420 | 9.0167 |