Design: Tagger parser merge step 2 #52

Travelinglight · 2016-07-21T07:49:44Z

For 1st order features

Eisner state C[s][t][d][c] is a dict, of which the keys and values are both tuples:

{(POS_of_head, POS_of_dependent): (score, mid_index), ....}
Keep every combination of POSTAG of head and dependent

for q in range(s, t):
    for pos_head in possible_POSTAG_of_head:
        for pos_dept in possible_POSTAG_of_dependent:
            for pos_mid in possible_POSTAG_of_q:
                # complete right triangle
                score = C[s][q][->][1](pos_head, pos_mid)][0] + C[q][t][->][0][(pos_mid, pos_dept)][0]
                if (C[s][t][->][0][(pos_head, pos_dept)][0] < score)
                    C[s][t][->][0][(pos_head, pos_dept)] = (score, q)
                # incomplete right trapezoidal
                score = C[s][q][->][0](pos_head, pos_mid)][0] + C[q][t][<-][0][(pos_mid, pos_dept)][0] + get_score(s, t, pos_head, pos_dept)
                if (C[s][t][->][1][(pos_head, pos_dept)][0] < score)
                    C[s][t][->][0][(pos_head, pos_dept)] = (score, q)
                # complete left triangle
                ....
                # incomplete left trapezoidal
                ....

For 2nd order features (with cube pruning)

Eisner state C[s][t][d][c] is a list of tuples:

[(feature_signature , POS_of_head, POS_of_dependent, score, mid_index), ....]
Keep only top k tuples

for q in range(s, t):
    for pos_head in possible_POSTAG_of_head:
        for pos_mid in possible_POSTAG_of_q:
            for pos_dept in possible_POSTAG_of_dependent:

                # complete right triangle
                list_left = SELECT tuples FROM C[s][q][->][1] WHERE tuple[1] == pos_head AND tuple[2] == pos_mid ORDER BY score
                list_right = SELECT tuples FROM C[q][t][->][0] WHERE tuple[1] == pos_mid AND tuple[2] == pos_dept ORDER BY score
                C[s][t][->][0].push((q, pos_head, pos_mid, explore(list_left, list_right), q))

                # incomplete right trapezoidal
                list_left = SELECT tuples FROM C[s][q][->][0] WHERE tuple[1] == pos_head AND tuple[2] == pos_mid ORDER BY score
                list_right = SELECT tuples FROM C[q][t][<-][0] WHERE tuple[1] == pos_dept AND tuple[2] == pos_mid ORDER BY score
                C[s][t][->][1].push((t, pos_head, pos_dept, explore(list_left, list_right), q))

                # complete left triangle
                ....
                # incomplete left trapezoidal
                ....
            keep_highest_score(C[s][t][->][0], q, pos_head, pos_mid)
            keep_highest_score(C[s][t][<-][0], q, pos_head, pos_mid)
keep_k_scores(C, s, t)

This can also implemented by dict. In this case, the function keep_k_scores would extract items from the dict, sort them, and delete items other than k best from the dict.

The text was updated successfully, but these errors were encountered:

Travelinglight · 2016-08-01T21:06:16Z

Just find out the design may not be applicable. In the design, only the POS tag of s and t are determined, but when the feature generator generates in_between features, it requires that all words between s and t have only one POS tag.

english_1st_fgen.pyx Line #208
for between_index in range(start_index + 1, end_index):
    xb_pos = self.POSTAG[between_index]
    # Add all words between xi and xj into the feature
    local_fv.add( (2,0,xi_pos,xb_pos,xj_pos) )

Thus, in_between features depend on the POS tag of the words between s and t.
Also, surrounding features depend on the POS tag of the surrounding words.

Possible solutions:

generate all possible features (with all possible in_between POS tags), but this may not be good for training
dismiss in_between features and surrounding features, but also not be good for training

I'm still trying to invent a better design. Anyone has a good idea on this problem?

anoopsarkar · 2016-08-02T18:08:35Z

Third solution is to take the single best POS tag for each word between head and modifier for this feature function.

We need to have the single best POS tag computed anyway for unknown words. In this case, we will use it for known words as well for this feature function.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design: Tagger parser merge step 2 #52

Design: Tagger parser merge step 2 #52

Travelinglight commented Jul 21, 2016 •

edited

Loading

Travelinglight commented Aug 1, 2016

anoopsarkar commented Aug 2, 2016

Design: Tagger parser merge step 2 #52

Design: Tagger parser merge step 2 #52

Comments

Travelinglight commented Jul 21, 2016 • edited Loading

For 1st order features

For 2nd order features (with cube pruning)

Travelinglight commented Aug 1, 2016

anoopsarkar commented Aug 2, 2016

Travelinglight commented Jul 21, 2016 •

edited

Loading