5 Matching Annotations
- Apr 2018
-
s3.amazonaws.com
-
features which represent regular expressions; a feature which indicates whether the word appears within quotes; dictionary features for the current word and a window of ±2 words around it; features which indicate if the current word is in or a part of an expression in one of the pre-processed lists.
HMM architecture with multiple features per state
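A minimal sketch of how the feature types listed in this excerpt could be computed for a single token. The regular expressions, the dictionary contents, and the names `PATTERNS`, `DICTIONARY`, `in_quotes`, and `extract_features` are all hypothetical; the excerpt does not give the actual patterns or lists.

```python
import re

# Hypothetical patterns and dictionary; the excerpt does not list the real ones.
PATTERNS = {
    "is_number": re.compile(r"^\d+$"),
    "is_date_like": re.compile(r"^\d{1,2}[./]\d{1,2}[./]\d{2,4}$"),
}
DICTIONARY = {"Jerusalem", "Haifa"}


def in_quotes(tokens, i):
    """True if token i has an odd number of quote marks before it and at least one after."""
    before = sum(1 for t in tokens[:i] if t == '"')
    after = sum(1 for t in tokens[i + 1:] if t == '"')
    return before % 2 == 1 and after >= 1


def extract_features(tokens, i, window=2):
    """Collect regex, quote, and dictionary-window features for token i."""
    word = tokens[i]
    feats = {name: bool(p.match(word)) for name, p in PATTERNS.items()}
    feats["in_quotes"] = in_quotes(tokens, i)
    # Dictionary lookups for the current word and +/- `window` neighbours.
    for off in range(-window, window + 1):
        j = i + off
        feats[f"dict[{off:+d}]"] = 0 <= j < len(tokens) and tokens[j] in DICTIONARY
    return feats
```

Each token yields one feature dictionary, so a state can be given several features rather than a single symbol.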
-
The disadvantage of this definition is that the features are considered as a union and not separately. The simple HMM emits only one symbol per state. A limitation of this model is its reliance on the independence assumption, which does not always hold in the NER problem.
-
States are defined as a product of the set of possible name classes and the set of possible Part of Speech tags. For example, there would be a state for PERSON + NOUN, PERSON + VERB etc. We also define special states for the beginning and end of a sentence. Overall we get a set of 212 states. The intuition for this state definition came from the fact that the syntactic structure of the sentence has great impact on the prediction of name classes. Defining the Part of Speech tags as part of the HMM states emphasizes the structure of the sentence through the transition probabilities.
HMM Architecture
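The product construction described in this excerpt can be sketched as follows. The name-class and Part of Speech inventories are illustrative assumptions and do not reproduce the 212 states reported; `build_states`, `START`, and `END` are hypothetical names.

```python
from itertools import product

# Illustrative inventories only; the excerpt does not enumerate the real ones.
NAME_CLASSES = ["PERSON", "LOCATION", "ORGANIZATION", "OTHER"]
POS_TAGS = ["NOUN", "VERB", "ADJ"]
START, END = "<S>", "</S>"  # special sentence-boundary states


def build_states(name_classes, pos_tags):
    """Return every (name class, POS tag) pair as a state, plus boundary states."""
    states = [f"{c}+{p}" for c, p in product(name_classes, pos_tags)]
    return [START] + states + [END]


states = build_states(NAME_CLASSES, POS_TAGS)
```

With these toy inventories the state set has 4 × 3 + 2 = 14 entries, including `PERSON+NOUN` and `PERSON+VERB`.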
-
Transition and emission probabilities were calculated using maximum likelihood estimators.
HMM architecture training
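As a rough illustration, maximum likelihood estimation for an HMM reduces to relative-frequency counting over a tagged corpus. The sketch below assumes each sentence is a list of `(symbol, state)` pairs and applies no smoothing; the excerpt does not say whether smoothing was used. The function name `mle_estimate` and the boundary markers are hypothetical.

```python
from collections import Counter, defaultdict


def mle_estimate(tagged_sentences, start="<S>", end="</S>"):
    """Estimate transition and emission probabilities by relative frequency."""
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = start
        for symbol, state in sentence:
            trans_counts[prev][state] += 1   # count prev -> state
            emit_counts[state][symbol] += 1  # count state emitting symbol
            prev = state
        trans_counts[prev][end] += 1         # close the sentence

    def normalize(counts):
        # Divide each count by the total for its conditioning state.
        return {
            s: {k: v / sum(c.values()) for k, v in c.items()}
            for s, c in counts.items()
        }

    return normalize(trans_counts), normalize(emit_counts)
```

For example, if a state is followed by two different states once each in the training data, both transitions receive probability 0.5.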
-
We experimented with several HMMs to approach the Hebrew NER problem. In each model we defined the alphabet and the state set differently. The following describes the HMM which produced the best results throughout our experiments.
-