You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
for q in range(s, t):
for pos_head in possible_POSTAG_of_head:
for pos_mid in possible_POSTAG_of_q:
for pos_dept in possible_POSTAG_of_dependent:
# complete right triangle
list_left = SELECT tuples FROM C[s][q][->][1] WHERE tuple[1] == pos_head AND tuple[2] == pos_mid ORDER BY score
list_right = SELECT tuples FROM C[q][t][->][0] WHERE tuple[1] == pos_mid AND tuple[2] == pos_dept ORDER BY score
C[s][t][->][0].push((q, pos_head, pos_mid, explore(list_left, list_right), q))
# incomplete right trapezoidal
list_left = SELECT tuples FROM C[s][q][->][0] WHERE tuple[1] == pos_head AND tuple[2] == pos_mid ORDER BY score
list_right = SELECT tuples FROM C[q][t][<-][0] WHERE tuple[1] == pos_dept AND tuple[2] == pos_mid ORDER BY score
C[s][t][->][1].push((t, pos_head, pos_dept, explore(list_left, list_right), q))
# complete left triangle
....
# incomplete left trapezoidal
....
keep_highest_score(C[s][t][->][0], q, pos_head, pos_mid)
keep_highest_score(C[s][t][<-][0], q, pos_head, pos_mid)
keep_k_scores(C, s, t)
This can also implemented by dict. In this case, the function keep_k_scores would extract items from the dict, sort them, and delete items other than k best from the dict.
The text was updated successfully, but these errors were encountered:
Just find out the design may not be applicable. In the design, only the POS tag of s and t are determined, but when the feature generator generates in_between features, it requires that all words between s and t have only one POS tag.
english_1st_fgen.pyx Line #208
for between_index in range(start_index + 1, end_index):
xb_pos = self.POSTAG[between_index]
# Add all words between xi and xj into the feature
local_fv.add( (2,0,xi_pos,xb_pos,xj_pos) )
Thus, in_between features depend on the POS tag of the words between s and t.
Also, surrounding features depend on the POS tag of the surrounding words.
Possible solutions:
generate all possible features (with all possible in_between POS tags), but this may not be good for training
dismiss in_between features and surrounding features, but also not be good for training
I'm still trying to invent a better design. Anyone has a good idea on this problem?
Third solution is to take the single best POS tag for each word between head and modifier for this feature function.
We need to have the single best POS tag computed anyway for unknown words. In this case, we will use it for known words as well for this feature function.
For 1st order features
Eisner state C[s][t][d][c] is a dict, of which the keys and values are both tuples:
{(POS_of_head, POS_of_dependent): (score, mid_index), ....}
Keep every combination of POSTAG of head and dependent
For 2nd order features (with cube pruning)
Eisner state C[s][t][d][c] is a list of tuples:
[(feature_signature , POS_of_head, POS_of_dependent, score, mid_index), ....]
Keep only top k tuples
This can also implemented by dict. In this case, the function
keep_k_scores
would extract items from the dict, sort them, and delete items other than k best from the dict.The text was updated successfully, but these errors were encountered: