Direct conditional probability density estimation with sparse feature selection

Motoki Shiga, Voot Tangkaratt, Masashi Sugiyama

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)


Regression is a fundamental problem in statistical data analysis, which aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional probability density is multi-modal, asymmetric, and heteroscedastic. To overcome this limitation, various estimators of conditional densities themselves have been developed, and a kernel-based approach called least-squares conditional density estimation (LS-CDE) was demonstrated to be promising. However, LS-CDE still suffers from large estimation error if input contains many irrelevant features. In this paper, we therefore propose an extension of LS-CDE called sparse additive CDE (SA-CDE), which allows automatic feature selection in CDE. SA-CDE applies kernel LS-CDE to each input feature in an additive manner and penalizes the whole solution by a group-sparse regularizer. We also give a subgradient-based optimization method for SA-CDE training that scales well to high-dimensional large data sets. Through experiments with benchmark and humanoid robot transition datasets, we demonstrate the usefulness of SA-CDE in noisy CDE problems.
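The group-sparse penalty and subgradient-based training mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a generic quadratic loss, 0.5 * alpha^T H alpha - h^T alpha, as a stand-in for the LS-CDE criterion, with one coefficient block per input feature. The names `H`, `h`, `lam`, `groups`, and all function names are hypothetical and chosen only for illustration, as are the toy values.

```python
import numpy as np

def group_norm_penalty(alpha, groups):
    """Group-sparse penalty: sum of Euclidean norms of the coefficient blocks."""
    return sum(np.linalg.norm(alpha[g]) for g in groups)

def penalty_subgradient(alpha, groups):
    """Return one valid subgradient of the group-sparse penalty.

    For a block that is exactly zero, the zero vector is a valid choice.
    """
    sub = np.zeros_like(alpha)
    for g in groups:
        norm = np.linalg.norm(alpha[g])
        if norm > 0:
            sub[g] = alpha[g] / norm
    return sub

def objective(alpha, H, h, lam, groups):
    """Assumed quadratic loss plus the group-sparse penalty."""
    return 0.5 * alpha @ H @ alpha - h @ alpha + lam * group_norm_penalty(alpha, groups)

def subgradient_descent(H, h, lam, groups, n_iter=2000, step0=0.1):
    """Plain subgradient descent with step size step0 / sqrt(t).

    Subgradient steps are not monotone, so the best iterate seen is returned.
    """
    alpha = np.zeros(len(h))
    best_alpha = alpha.copy()
    best_val = objective(alpha, H, h, lam, groups)
    for t in range(1, n_iter + 1):
        grad = H @ alpha - h + lam * penalty_subgradient(alpha, groups)
        alpha = alpha - (step0 / np.sqrt(t)) * grad
        val = objective(alpha, H, h, lam, groups)
        if val < best_val:
            best_val, best_alpha = val, alpha.copy()
    return best_alpha

# Toy problem (made-up values): block 0 is informative, block 1 is nearly irrelevant.
H = np.eye(4)
h = np.array([1.0, 1.0, 0.01, 0.01])
groups = [np.array([0, 1]), np.array([2, 3])]
alpha_hat = subgradient_descent(H, h, lam=0.5, groups=groups)
block_norms = [float(np.linalg.norm(alpha_hat[g])) for g in groups]
print(block_norms)
```

In this toy setting the norm of the second block is driven close to zero while the first block keeps a clearly nonzero norm, which is the sense in which a group-sparse regularizer performs feature selection. Plain subgradient steps rarely produce exact zeros, so in practice small blocks would be thresholded or a proximal update used instead.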

Original language: English
Pages (from-to): 161-182
Number of pages: 22
Journal: Machine Learning
Issue number: 2-3
Publication status: Published - 2015 Sept 17


  • Conditional density estimation
  • Feature selection
  • Sparse structured norm

