BS PolarFormer (vocals, instrumental)
BS PolarFormer is a transformer-based model. Unlike BSRoformer, it uses embeddings based on polar coordinates, which work well with long contexts, and it operates on longer segments of a track.
This model provides an Overlap option. During processing, a long signal is split into fixed-length segments (“chunks”), and Overlap sets how much neighbouring chunks overlap. A larger overlap smooths the transitions between chunks, gives the model more context for each part of the track, and improves separation quality.
- 50% (default): the optimal choice for most tasks.
- 87.5% (paid users only): improves quality by roughly 0.02 SDR on the Multisong dataset (see the quality table). Separation costs 1.5 times as many credits.
Note: a high Overlap value significantly increases system load while providing only a minor quality improvement.
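The effect of Overlap on system load can be illustrated with a small sketch. It is not the service's actual implementation: the function names, the uniform averaging of overlapping chunks, and the identity stand-in for the model are all assumptions made for illustration. It also assumes that the Overlap values 2 and 8 in the quality table are chunk-to-hop ratios corresponding to the 50% and 87.5% settings.

```python
def overlap_percent(factor):
    """Convert a chunk-to-hop ratio (assumed meaning of Overlap 2 / 8) to a percentage."""
    return 100.0 * (1.0 - 1.0 / factor)


def chunk_starts(n_samples, chunk_size, factor):
    """Start offsets of fixed-length chunks whose hop is chunk_size / factor."""
    hop = max(1, chunk_size // factor)
    starts = list(range(0, max(n_samples - chunk_size, 0) + 1, hop))
    # Add one final chunk if the regular grid does not reach the end of the signal.
    if starts[-1] + chunk_size < n_samples:
        starts.append(n_samples - chunk_size)
    return starts


def separate_with_overlap(signal, chunk_size, factor, model=lambda chunk: chunk):
    """Run `model` on every chunk and average the overlapping outputs.

    `model` is a hypothetical stand-in for the separation network;
    the default simply returns its input unchanged.
    """
    n = len(signal)
    acc = [0.0] * n   # sum of predictions per sample
    hits = [0] * n    # number of chunks covering each sample
    for start in chunk_starts(n, chunk_size, factor):
        out = model(signal[start:start + chunk_size])
        for i, value in enumerate(out):
            acc[start + i] += value
            hits[start + i] += 1
    return [a / h for a, h in zip(acc, hits)]


if __name__ == "__main__":
    for factor in (2, 8):
        n_chunks = len(chunk_starts(n_samples=1_000_000, chunk_size=100_000, factor=factor))
        print(f"overlap {overlap_percent(factor)}% -> {n_chunks} model passes")
```

In this sketch the number of model passes grows roughly in proportion to the overlap factor, so moving from 50% to 87.5% means about four times as many passes for the same track, which is consistent with the note above about increased load for a small SDR gain.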
Quality table
| Algorithm name | Overlap | Multisong SDR Vocals | Multisong SDR Instrumental | Synth SDR Vocals | Synth SDR Instrumental |
|---|---|---|---|---|---|
| BS PolarFormer (vocals, instrumental) | 2 | 11.75 | 18.06 | 14.02 | 13.73 |
| BS PolarFormer (vocals, instrumental) | 8 | 11.77 | 18.08 | 14.05 | 13.76 |
Detailed statistics on the Multisong dataset:
| Model | Vocals fullness | Vocals bleedless | Vocals SDR | Vocals L1Freq | Instrum fullness | Instrum bleedless | Instrum SDR | Instrum L1Freq |
|---|---|---|---|---|---|---|---|---|
| BS PolarFormer (vocals, instrumental) | 17.68 | 35.90 | 11.75 | 39.86 | 28.15 | 47.27 | 18.06 | 40.59 |
Below is the relationship between SDR values and the model’s Chunk Size for the Multisong dataset:
