Conventional spoken dialog systems cannot estimate the user's state while waiting for input, because the estimation process is triggered only by observing a user utterance. This becomes a problem when, for some reason, the user cannot produce an utterance in response to the system's prompt. To help such users before they give up, the system should handle the requests they express unconsciously. Based on this idea, we have examined a method for estimating a user's state before the user speaks, using the user's non-verbal behavior. The present paper proposes an automatic discrimination method that uses time-sequential non-verbal information. In this method, the user's internal state is estimated from multi-modal information such as speech, facial expression, and gaze, modeled with Hidden Markov Models (HMMs).
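The discrimination scheme described above can be sketched as follows: train one HMM per user state on time-sequential non-verbal observations, then classify a new sequence by which model assigns it the higher likelihood. The sketch below is a minimal, self-contained illustration in pure Python using discrete observation symbols and the forward algorithm; the two state labels ("ready" vs. "trouble"), the symbol alphabet, and all model parameters are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: discriminate a user's pre-utterance state by comparing
# the likelihood of an observation sequence under two discrete HMMs, one per
# assumed state ("ready" vs. "trouble"). All parameters are illustrative.
import math

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm (log-sum-exp for stability)."""
    n = len(pi)
    log_alpha = [math.log(pi[i]) + math.log(B[i][obs[0]]) for i in range(n)]
    for t in range(1, len(obs)):
        new = []
        for j in range(n):
            terms = [log_alpha[i] + math.log(A[i][j]) for i in range(n)]
            m = max(terms)
            s = m + math.log(sum(math.exp(x - m) for x in terms))
            new.append(s + math.log(B[j][obs[t]]))
        log_alpha = new
    m = max(log_alpha)
    return m + math.log(sum(math.exp(x - m) for x in log_alpha))

# Toy symbol alphabet: 0 = gaze at screen, 1 = gaze averted, 2 = frown
ready_hmm = (
    [0.8, 0.2],                            # initial state distribution
    [[0.9, 0.1], [0.2, 0.8]],              # transition probabilities
    [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]],    # emission probabilities
)
trouble_hmm = (
    [0.3, 0.7],
    [[0.6, 0.4], [0.1, 0.9]],
    [[0.2, 0.3, 0.5], [0.1, 0.4, 0.5]],
)

def classify(obs):
    """Maximum-likelihood discrimination between the two state models."""
    ll_ready = forward_log_likelihood(obs, *ready_hmm)
    ll_trouble = forward_log_likelihood(obs, *trouble_hmm)
    return "ready" if ll_ready > ll_trouble else "trouble"

print(classify([0, 0, 1, 0]))  # mostly on-screen gaze
print(classify([2, 1, 2, 2]))  # frowning, averted gaze
```

In practice the models would be trained with Baum-Welch on labeled recordings, and the continuous speech, facial, and gaze features would typically be handled with Gaussian emission densities rather than a discrete alphabet.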