Voice conversion from arbitrary speakers based on deep neural networks with adversarial learning

Sou Miyamoto, Takashi Nose, Suzunosuke Ito, Harunori Koike, Yuya Chiba, Akinori Ito, Takahiro Shinozaki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this study, we propose a technique for voice conversion from arbitrary speakers based on deep neural networks, realized by introducing adversarial learning into conventional voice conversion. Adversarial learning is expected to enable more natural voice conversion by using, in addition to a generative model, a discriminative model that classifies input speech as natural or converted. Experiments showed that the proposed method was effective in enhancing the global variance (GV) of the mel-cepstrum, but the naturalness of the converted speech was slightly lower than that of speech converted using the conventional variance compensation technique.

Original language: English
Title of host publication: Advances in Intelligent Information Hiding and Multimedia Signal Processing - Proceedings of the 13th International Conference on Intelligent Information Hiding and Multimedia Signal Processing,
Editors: Junzo Watada, Lakhmi C. Jain, Jeng-Shyang Pan, Pei-Wei Tsai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 97-103
Number of pages: 7
ISBN (Print): 9783319638584
DOIs
Publication status: Published - 2018
Event: 13th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2017 - Matsue, Shimane, Japan
Duration: 2017 Aug 12 to 2017 Aug 15

Publication series

Name: Smart Innovation, Systems and Technologies
Volume: 82
ISSN (Print): 2190-3018
ISSN (Electronic): 2190-3026

Conference

Conference: 13th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2017
Country/Territory: Japan
City: Matsue, Shimane
Period: 17/8/12 to 17/8/15

Keywords

  • Adversarial learning
  • DNN-based voice conversion
  • Model training
  • Spectral differential filter
