In this article, we propose a regression model for sparse, high-dimensional data derived from aggregated store-level sales. The modeling procedure comprises two sub-models, a topic model and a hierarchical factor regression, which are applied in sequence to accommodate high dimensionality and sparseness and to facilitate managerial interpretation. First, the topic model is applied to the aggregated data to decompose the daily aggregated sales volume of a product into sub-sales for several topics by allocating each unit sale (a “word” in text analysis) in a day (a “document”) to a topic on the basis of joint-purchase information. This stage reduces the dimensionality of the data within topics because the topic distribution is nonuniform and a product's sales are mostly allocated to a smaller number of topics. Next, the market response regression model for each topic is estimated from information about the items in that topic. The hierarchical factor regression model we introduce, based on canonical correlation analysis for the original high-dimensional sample spaces, further reduces the dimensionality within topics. Feature selection is then performed on the basis of the credible intervals of the parameters' posterior densities. Empirical results show that (i) our model yields managerial implications from topic-wise market responses according to the particular context, and (ii) it outperforms conventional category regressions in both in-sample and out-of-sample forecasts.
- dimension reduction
- feature selection
- hierarchical factor regression
- high-dimensional sparse data
- topic model
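
The first stage described in the abstract, splitting a product's daily sales into topic-level sub-sales, can be illustrated with a minimal sketch. It is not the estimation procedure of the article: it assumes that the day-level topic proportions (`theta`) and the topic-level product distributions (`phi`) have already been estimated by some topic model, and it replaces the unit-by-unit allocation with an expected (soft) allocation. The function name, variable names, and toy numbers are hypothetical.

```python
def allocate_sales_to_topics(counts, theta, phi):
    """Split each day's product sales into expected per-topic sub-sales.

    counts[d][v]: units of product v sold on day d
                  (day = "document", unit sale = "word")
    theta[d][k]:  topic proportions of day d (assumed already estimated)
    phi[k][v]:    product distribution of topic k (assumed already estimated)

    Returns sub[d][v][k]; summing over k recovers counts[d][v].
    """
    n_topics = len(phi)
    sub = []
    for d, day_counts in enumerate(counts):
        day_rows = []
        for v, n in enumerate(day_counts):
            # Weight of topic k for product v on day d, up to normalization.
            weights = [theta[d][k] * phi[k][v] for k in range(n_topics)]
            total = sum(weights)
            if total == 0:
                # No topic supports this product: split evenly
                # (an arbitrary fallback chosen for this sketch).
                shares = [1.0 / n_topics] * n_topics
            else:
                shares = [w / total for w in weights]
            day_rows.append([n * s for s in shares])
        sub.append(day_rows)
    return sub


# Toy data: 2 days, 3 products, 2 topics (all values hypothetical).
counts = [[4, 0, 2], [1, 3, 0]]
theta = [[0.5, 0.5], [0.2, 0.8]]
phi = [[0.8, 0.2, 0.0], [0.0, 0.5, 0.5]]

sub_sales = allocate_sales_to_topics(counts, theta, phi)
```

In this toy case each product's sales fall almost entirely into one topic, which mirrors the sparsity that the abstract relies on: a per-topic regression then only needs to consider the products that carry non-negligible sub-sales in that topic.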