Part 1 - Standard evaluation
Comparison of different distortion levels according to the metric:
[Equation not preserved in this copy: definition of the standard distortion metric.]
This metric is used by convention in previous work on audio adversarial examples for speech recognition, where distortion levels below -32 dB are assumed to be acceptable. However, we show that it is not representative for speech-related tasks: even at distortion levels that are small under this metric, the perturbations are easily detectable.
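As a rough illustration of how such a distortion level can be computed, the sketch below assumes the peak-based definition commonly used in prior work on audio adversarial examples: the level of a signal is `max_i 20*log10(|x_i|)`, and the distortion is the level of the perturbation minus the level of the original signal. This definition is an assumption, since the formula itself is not reproduced here, and the function names `peak_db` and `relative_distortion_db` are hypothetical.

```python
import math


def peak_db(signal):
    """Peak level in dB, assumed to be max_i 20*log10(|x_i|)."""
    peak = max(abs(s) for s in signal)
    if peak == 0:
        return float("-inf")  # a silent signal has no finite level
    return 20.0 * math.log10(peak)


def relative_distortion_db(original, perturbation):
    """Level of the perturbation relative to the original signal, in dB."""
    return peak_db(perturbation) - peak_db(original)


# Toy signals (made-up values, not data from the evaluation).
original = [0.0, 0.5, -1.0, 0.25]
perturbation = [0.01, -0.005, 0.002, 0.01]

level = relative_distortion_db(original, perturbation)
print(f"{level:.1f} dB")  # perturbation peak is 100x below the signal peak
```

Because only the single loudest sample of each signal enters the computation, quiet regions of the original audio have no influence on the result under this assumed definition.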
Standard evaluation: -25 dB | Proposed evaluation: -7 dB
Standard evaluation: -30 dB | Proposed evaluation: -11 dB
Standard evaluation: -32 dB | Proposed evaluation: -8 dB
Standard evaluation: -35 dB | Proposed evaluation: -8 dB
Standard evaluation: -40 dB | Proposed evaluation: -19 dB
Part 2 - Proposed evaluation
Comparison of different distortion levels according to the metric:
[Equation not preserved in this copy: definition of the proposed distortion metric.]
Metric applied to the background part of the audio signal.
We found that measuring the distortion in both the vocal part and the background part leads to more representative results. In particular, the perturbation is more likely to be detected in the background part than in the vocal part, because the sound intensity is lower there. This metric also correlates better with human judgment, as lower distortion levels lead to lower detectability. For these reasons, we propose using more rigorous approaches to measure the distortion of audio adversarial examples, in order to promote a deeper study of these vulnerabilities and the risk they pose.
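The effect of measuring the two parts separately can be sketched as follows. This is a minimal illustration, not the evaluation code: it assumes the same peak-based dB level as in prior work, and it assumes a per-sample boolean mask `is_vocal` that marks the vocal part. How the vocal and background parts are actually separated is not specified here, so the mask and the function `region_distortion_db` are hypothetical.

```python
import math


def peak_db(samples):
    """Peak level in dB, assumed to be max_i 20*log10(|x_i|)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def region_distortion_db(original, perturbation, is_vocal):
    """Return (vocal_db, background_db): the relative distortion
    computed separately on the vocal and background samples."""
    results = []
    for want_vocal in (True, False):
        x = [s for s, v in zip(original, is_vocal) if v == want_vocal]
        d = [s for s, v in zip(perturbation, is_vocal) if v == want_vocal]
        results.append(peak_db(d) - peak_db(x))
    return tuple(results)


# Toy signals (made-up values): loud speech in the middle, quiet background around it.
original = [0.001, 0.002, 0.8, -1.0, 0.6, 0.001]
perturbation = [0.001, -0.001, 0.01, 0.005, -0.01, 0.0005]
is_vocal = [False, False, True, True, True, False]

vocal_db, background_db = region_distortion_db(original, perturbation, is_vocal)
print(f"vocal: {vocal_db:.1f} dB, background: {background_db:.1f} dB")
```

In this toy case the vocal part yields roughly -40 dB while the background part yields roughly -6 dB, because the same perturbation is compared against a much quieter reference. This mirrors the gap between the two columns in the sample labels.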
Standard evaluation: -38 dB | Proposed evaluation: -20 dB
Standard evaluation: -47 dB | Proposed evaluation: -25 dB
Standard evaluation: -45 dB | Proposed evaluation: -30 dB
Standard evaluation: -50 dB | Proposed evaluation: -32 dB
Standard evaluation: -47 dB | Proposed evaluation: -35 dB