Universal Adversarial Perturbations for Speech Command Classification

Headphones recommended.


Part 1 - Standard evaluation

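Comparison of different distortion levels according to the metric:

$$\mathrm{dB}_x(v) = \mathrm{dB}(v) - \mathrm{dB}(x),$$

where $\mathrm{dB}(x) = \max_i 20 \log_{10}(|x_i|)$, $x$ is the clean audio signal and $v$ is the perturbation.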

This metric is used by convention in previous works on audio adversarial examples for speech recognition, in which distortion levels below -32 dB are assumed to be acceptable. However, we show that this metric is not representative for speech-related tasks. Notice that the perturbations are easily detectable even at distortion levels that this metric rates as small.
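As a rough illustration of how such a distortion level could be computed, the sketch below assumes the conventional metric is the peak-based relative loudness used in earlier audio adversarial example work, dB_x(v) = dB(v) - dB(x) with dB(x) = max_i 20 log10(|x_i|). The function names `peak_db` and `relative_db` and the toy signals are hypothetical and are not taken from the original experiments.

```python
import math

def peak_db(signal):
    """Peak level in dB: max_i 20*log10(|x_i|) (assumed definition)."""
    peak = max(abs(s) for s in signal)
    # An all-zero signal has no finite peak level.
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")

def relative_db(clean, perturbation):
    """Distortion of the perturbation relative to the clean signal, in dB."""
    return peak_db(perturbation) - peak_db(clean)

# Toy example: clean signal peaking at 1.0, perturbation peaking at 0.01.
clean = [0.0, 0.5, -1.0, 0.25]
perturbation = [0.01, -0.005, 0.002, 0.0]
print(round(relative_db(clean, perturbation), 1))  # -40.0
```

Because only the two peak amplitudes enter the computation, the value says nothing about how loud the perturbation is relative to the quieter portions of the signal.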

Standard evaluation: -25 dB | Proposed evaluation: -7 dB

Standard evaluation: -30 dB | Proposed evaluation: -11 dB

Standard evaluation: -32 dB | Proposed evaluation: -8 dB

Standard evaluation: -35 dB | Proposed evaluation: -8 dB

Standard evaluation: -40 dB | Proposed evaluation: -19 dB

Part 2 - Proposed evaluation

Comparison of different distortion levels according to the metric applied to the background part of the audio signal.

We found that measuring the distortion in both the vocal and the background parts leads to more representative results. In particular, the perturbation is more likely to be detected in the background part than in the vocal part, because the sound intensity is lower there. This metric also correlates better with human judgment, since lower distortion levels lead to lower detectability. For these reasons, we propose more rigorous approaches to measuring the distortion of audio adversarial examples, in order to promote a deeper study of these vulnerabilities and the risk they pose.
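The effect described above can be sketched by evaluating the distortion separately on each part of the signal. The code below is only an illustration under stated assumptions: it takes the peak-based measure dB_x(v) = dB(v) - dB(x), with dB(x) = max_i 20 log10(|x_i|), as the underlying distortion measure, and it assumes the vocal part is given as a sample index range. How the vocal and background parts are actually separated is not specified here, so `vocal_start`, `vocal_end` and all function names are hypothetical.

```python
import math

def peak_db(signal):
    """Peak level in dB: max_i 20*log10(|x_i|) (assumed definition)."""
    peak = max((abs(s) for s in signal), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")

def relative_db(clean, perturbation):
    """Distortion of the perturbation relative to the clean signal, in dB."""
    return peak_db(perturbation) - peak_db(clean)

def split_by_vocal_span(signal, vocal_start, vocal_end):
    """Split a list of samples into (vocal, background) by an index range."""
    vocal = signal[vocal_start:vocal_end]
    background = signal[:vocal_start] + signal[vocal_end:]
    return vocal, background

def part_distortions(clean, perturbation, vocal_start, vocal_end):
    """Distortion on the whole signal, the vocal part and the background part."""
    x_vocal, x_bg = split_by_vocal_span(clean, vocal_start, vocal_end)
    v_vocal, v_bg = split_by_vocal_span(perturbation, vocal_start, vocal_end)
    return {
        "whole": relative_db(clean, perturbation),
        "vocal": relative_db(x_vocal, v_vocal),
        "background": relative_db(x_bg, v_bg),
    }

# Toy signal: quiet background (peak 0.01) around a loud vocal part (peak 1.0).
clean = [0.01, -0.01, 0.8, -1.0, 0.6, 0.01, -0.01]
perturbation = [0.001, -0.001, 0.001, -0.001, 0.001, -0.001, 0.001]

result = part_distortions(clean, perturbation, vocal_start=2, vocal_end=5)
for part, value in result.items():
    print(part, round(value, 1))
# whole -60.0
# vocal -60.0
# background -20.0
```

In this toy case the same perturbation scores -60 dB against the whole signal but only -20 dB against the background, which matches the observation that the perturbation stands out most where the clean signal is quiet.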

Standard evaluation: -38 dB | Proposed evaluation: -20 dB

Standard evaluation: -47 dB | Proposed evaluation: -25 dB

Standard evaluation: -45 dB | Proposed evaluation: -30 dB

Standard evaluation: -50 dB | Proposed evaluation: -32 dB

Standard evaluation: -47 dB | Proposed evaluation: -35 dB