
how to do prediction? #20

Open
WillSuen opened this issue Nov 14, 2017 · 4 comments

Comments

@WillSuen commented Nov 14, 2017

How do I make predictions on new test data after training the model?

@2014210242 commented

      How do I make predictions on new test data after training the model?

I have the same question. Did you solve it?

@ammar-n-abbas commented Dec 8, 2021

Referring to "https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf" and library "https://hmmlearn.readthedocs.io/en/latest/" I have found this solution:

1- Through log_gamma (posterior distribution):

import numpy as np

# Posterior (gamma) decoding: pick the most probable state at every time step of each sequence.
state_sequences = []
for i in range(100):
    for j in range(lengths[i]):
        state_sequences.append(np.argmax(np.exp(SHMM.log_gammas[i])[j]))

# Split the flat list back into one state sequence per unit.
pred_state_seq = [state_sequences[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1]
                  for i in range(1, df_A['unit'].max() + 1)]

2- Viterbi Algorithm:

from hmmlearn import _hmmc

# Build the transition matrix row by row from the per-state transition models.
transmat = np.vstack([np.exp(SHMM.model_transition[i].predict_log_proba(np.array([[]])))
                      for i in range(num_states)])

# Initial state distribution.
startprob = np.exp(SHMM.model_initial.predict_log_proba(np.array([[]]))).squeeze()


def log_mask_zero(a):
    """
    Compute the log of input probabilities masking divide by zero in log.

    Notes
    -----
    During the M-step of EM-algorithm, very small intermediate start
    or transition probabilities could be normalized to zero, causing a
    *RuntimeWarning: divide by zero encountered in log*.

    This function masks this harmless warning.
    """
    a = np.asarray(a)
    with np.errstate(divide="ignore"):
        return np.log(a)


def _do_viterbi_pass(framelogprob):
    n_samples, n_components = framelogprob.shape
    state_sequence, logprob = _hmmc._viterbi(n_samples, n_components, log_mask_zero(startprob),
                                             log_mask_zero(transmat), framelogprob)
    return logprob, state_sequence


def _decode_viterbi(X):
    framelogprob = SHMM.log_Eys[X]
    return _do_viterbi_pass(framelogprob)


def decode():
    decoder = {"viterbi": _decode_viterbi}["viterbi"]
    logprob = 0
    sub_state_sequences = []
    for sub_X in range(100):
        # XXX decoder works on a single sample at a time!
        sub_logprob, sub_state_sequence = decoder(sub_X)
        logprob += sub_logprob
        sub_state_sequences.append(sub_state_sequence)
    return logprob, np.concatenate(sub_state_sequences)


def predict():
    """
    Find the most likely state sequence for all sequences held by ``SHMM``.

    Returns
    -------
    logprob : float
        Total log probability of the decoded state paths.
    state_sequence : array, shape (n_samples, )
        Most likely state label for each sample.
    """
    logprob, state_sequence = decode()
    return logprob, state_sequence


_, state_seq = predict()

pred_state_seq = [state_seq[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1] for i in
                  range(1, df_A['unit'].max() + 1)]
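
Note that _hmmc._viterbi is a private hmmlearn function whose name and signature have changed across releases, so the call above may fail on other versions. A minimal pure-NumPy Viterbi pass (a sketch, taking the same log-space startprob, transmat, and per-frame log-likelihoods as above) avoids the private API:

def viterbi_numpy(log_startprob, log_transmat, framelogprob):
    # framelogprob: (n_samples, n_components) per-frame log emission likelihoods
    n_samples, n_components = framelogprob.shape
    lattice = np.empty((n_samples, n_components))
    backptr = np.empty((n_samples, n_components), dtype=np.int64)
    lattice[0] = log_startprob + framelogprob[0]
    for t in range(1, n_samples):
        # scores[i, j]: best log probability of ending in state i at t-1 and moving to j
        scores = lattice[t - 1][:, None] + log_transmat
        backptr[t] = np.argmax(scores, axis=0)
        lattice[t] = np.max(scores, axis=0) + framelogprob[t]
    # Backtrack from the best final state.
    states = np.empty(n_samples, dtype=np.int64)
    states[-1] = np.argmax(lattice[-1])
    for t in range(n_samples - 2, -1, -1):
        states[t] = backptr[t + 1, states[t + 1]]
    return lattice[-1].max(), states

It can be dropped into _do_viterbi_pass in place of the _hmmc call, e.g. return viterbi_numpy(log_mask_zero(startprob), log_mask_zero(transmat), framelogprob).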

@ucabqll commented Aug 6, 2022

Referring to "https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf" and library "https://hmmlearn.readthedocs.io/en/latest/" I have found this solution:

1- Through log_gamma (posterior distribution):

state_sequences = []
for i in range(100):
    for j in range(lengths[i]):
        state_sequences.append(np.argmax(np.exp(SHMM.log_gammas[i])[j]))
pred_state_seq = [state_sequences[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1] for i in
                              range(1, df_A['unit'].max() + 1)]

2- Viterbi Algorithm:

from hmmlearn import _hmmc

transmat = np.empty((num_states, num_states))
for i in range(num_states):
    transmat = np.concatenate((transmat, np.exp(SHMM.model_transition[i].predict_log_proba(np.array([[]])))))
transmat = transmat[num_states:]

startprob = np.exp(SHMM.model_initial.predict_log_proba(np.array([[]]))).squeeze()


def log_mask_zero(a):
    """
    Compute the log of input probabilities masking divide by zero in log.

    Notes
    -----
    During the M-step of EM-algorithm, very small intermediate start
    or transition probabilities could be normalized to zero, causing a
    *RuntimeWarning: divide by zero encountered in log*.

    This function masks this unharmful warning.
    """
    a = np.asarray(a)
    with np.errstate(divide="ignore"):
        return np.log(a)


def _do_viterbi_pass(framelogprob):
    n_samples, n_components = framelogprob.shape
    state_sequence, logprob = _hmmc._viterbi(n_samples, n_components, log_mask_zero(startprob),
                                             log_mask_zero(transmat), framelogprob)
    return logprob, state_sequence


def _decode_viterbi(X):
    framelogprob = SHMM.log_Eys[X]
    return _do_viterbi_pass(framelogprob)


def decode():
    decoder = {"viterbi": _decode_viterbi}["viterbi"]
    logprob = 0
    sub_state_sequences = []
    for sub_X in range(100):
        # XXX decoder works on a single sample at a time!
        sub_logprob, sub_state_sequence = decoder(sub_X)
        logprob += sub_logprob
        sub_state_sequences.append(sub_state_sequence)
    return logprob, np.concatenate(sub_state_sequences)


def predict():
    """
    Find most likely state sequence corresponding to ``X``.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Feature matrix of individual samples.
    lengths : array-like of integers, shape (n_sequences, ), optional
        Lengths of the individual sequences in ``X``. The sum of
        these should be ``n_samples``.

    Returns
    -------
    state_sequence : array, shape (n_samples, )
        Labels for each sample from ``X``.
    """
    logprob, state_sequence = decode()
    return logprob, state_sequence


_, state_seq = predict()

pred_state_seq = [state_seq[df[df['unit'] == i].index[0]:df[df['unit'] == i].index[-1] + 1] for i in
                  range(1, df_A['unit'].max() + 1)]

Building on top of this, the log-gammas can be used to decode the sequence of hidden states.

To fit test data, one can set the data to the test set and re-run the E-step to get a new set of log-gammas (this does not update the transition and emission parameters, so the trained transitions and emissions are still used). Using these new log-gammas, re-run the decoding function as above; a short sketch follows the snippet below.

USHMM.set_data([testing])  # initializes a new set of log-gammas for the test data
USHMM.E_step()             # fits the log-gammas to the test set; trained parameters stay fixed
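
For completeness, a minimal sketch of the decoding step that follows, assuming USHMM.log_gammas is indexed per sequence exactly as in the snippets above (with the single test sequence passed to set_data sitting at index 0):

test_log_gamma = USHMM.log_gammas[0]                         # shape (T, num_states)
test_state_seq = np.argmax(np.exp(test_log_gamma), axis=1)   # most probable state per time step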
