TY - GEN
T1 - Conversion of Speaker's Face Image Using PCA and Animation Unit for Video Chatting
AU - Saito, Yuki
AU - Nose, Takashi
AU - Shinozaki, Takahiro
AU - Ito, Akinori
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/2/19
Y1 - 2016/2/19
N2 - Video chat is a good way of personal communication; however, it raises a privacy issue because the speaker must disclose identifying information, such as their face or voice, while chatting. In this paper, we propose two methods that convert the face image of a speaker into that of a different person to conceal the speaker's identity. In the first method, we prepare speech and video data of the original and target speakers to train the conversion model, and the face image features are calculated by applying PCA to all pixels of the image. In the second method, the animation units extracted by Kinect are used as an intermediate feature, and we train a model that converts the animation units into the target speaker's face image. In both methods, a neural network is used as the conversion model. Experiments showed that the first method could convert the overall shape of the speaker's face, but small movements such as mouth motion could not be converted. The second method could convert both the overall shape of the face and the mouth movement; however, the quality of the face image was degraded.
AB - Video chat is a good way of personal communication; however, it raises a privacy issue because the speaker must disclose identifying information, such as their face or voice, while chatting. In this paper, we propose two methods that convert the face image of a speaker into that of a different person to conceal the speaker's identity. In the first method, we prepare speech and video data of the original and target speakers to train the conversion model, and the face image features are calculated by applying PCA to all pixels of the image. In the second method, the animation units extracted by Kinect are used as an intermediate feature, and we train a model that converts the animation units into the target speaker's face image. In both methods, a neural network is used as the conversion model. Experiments showed that the first method could convert the overall shape of the speaker's face, but small movements such as mouth motion could not be converted. The second method could convert both the overall shape of the face and the mouth movement; however, the quality of the face image was degraded.
KW - Face conversion
KW - Kinect v2
KW - Neural network
KW - Principal component analysis
KW - Speaker conversion
UR - http://www.scopus.com/inward/record.url?scp=84963795111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963795111&partnerID=8YFLogxK
U2 - 10.1109/IIH-MSP.2015.85
DO - 10.1109/IIH-MSP.2015.85
M3 - Conference contribution
AN - SCOPUS:84963795111
T3 - Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015
SP - 433
EP - 436
BT - Proceedings - 2015 International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015
A2 - Pan, Jeng-Shyang
A2 - Yang, Ching-Yu
A2 - Huang, Hsiang-Cheh
A2 - Lee, Ivan
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 11th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2015
Y2 - 23 September 2015 through 25 September 2015
ER -