NLP IIT POS tagging
31 Dec 2017
NLP POS-tagging (lecture by Pushpak Bhattacharyya)
- NLP = ambiguity processing
- Lexical ambiguity: e.g., "dog" can be a noun vs. a verb, and as a noun can mean the animal vs. a detestable person; context disambiguates.
- Structural Ambiguity
- Semantic Ambiguity
- Pragmatic Ambiguity
- Main methodology
- From situation A: extract parts and features
- From situation B (which is in correspondence with A): extract parts and features
- Learn the mapping between these parts and features
- Apply the learned mapping to new situations (decoding)
POS tagging
- POS tagging: attach to each word in a sentence a part-of-speech tag drawn from a given set of tags called the tag-set, e.g., "The/DT dog/NN barks/VBZ" (Penn Treebank tags)
- A word can have multiple possible POS tags
- New examples break hand-written rules, so we need a robust (statistical) system.
- Generative: HMM
- Training: Maximize the likelihood of observations
- Testing: search the hypothesis space for the best POS tag sequence
- generate candidate POS tag sequences and score them
- HMM
- Given the observation sequence, find the most likely state sequence (Viterbi algorithm)
- Given the observation sequence, find its probability (forward/backward algorithm)
- Given the observation sequence, find the HMM parameters (Baum-Welch algorithm)
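A minimal Viterbi sketch, assuming Python/NumPy; the two-tag tagset, vocabulary, and all probabilities below are made-up assumptions for illustration, not from the lecture:

```python
import numpy as np

states = ["NN", "VB"]                      # hidden tags (toy tagset)
start = np.array([0.6, 0.4])               # P(first tag)
trans = np.array([[0.7, 0.3],              # trans[i, j] = P(tag_j | tag_i)
                  [0.5, 0.5]])
vocab = {"dog": 0, "barks": 1}
emit = np.array([[0.8, 0.2],               # emit[i, w] = P(word_w | tag_i)
                 [0.3, 0.7]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    n, k = len(obs), len(states)
    delta = np.zeros((n, k))               # best log-score ending in each tag
    back = np.zeros((n, k), dtype=int)     # backpointers
    delta[0] = np.log(start) + np.log(emit[:, obs[0]])
    for t in range(1, n):
        for j in range(k):
            scores = delta[t - 1] + np.log(trans[:, j])
            back[t, j] = np.argmax(scores)
            delta[t, j] = scores[back[t, j]] + np.log(emit[j, obs[t]])
    tags = [int(np.argmax(delta[-1]))]     # best final tag, then walk back
    for t in range(n - 1, 0, -1):
        tags.append(back[t, tags[-1]])
    return [states[i] for i in reversed(tags)]

print(viterbi(["dog", "barks"]))           # -> ['NN', 'VB']
```

Runtime is O(n·k²) for n words and k tags, i.e., linear in sentence length, which ties in to the complexity note at the end.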
- Discriminative
- Training: maximize the entropy of the probability distribution subject to constraints from the data
- Testing: discriminate among hypotheses by scoring them
- MEMM, CRF
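For reference, the maximum-entropy solution has the standard log-linear form, with feature functions f_i over tag t and context c and weights λ_i learned from the data:

```latex
P(t \mid c) = \frac{1}{Z(c)} \exp\Big(\sum_i \lambda_i f_i(t, c)\Big),
\qquad
Z(c) = \sum_{t'} \exp\Big(\sum_i \lambda_i f_i(t', c)\Big)
```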
- Classification
- Training: minimize a loss function (total sum-of-squares error, or cross-entropy with softmax outputs)
- Testing: classify the input into one of the classes
- SVM, neural net
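A minimal sketch of the softmax + cross-entropy loss mentioned above, assuming Python/NumPy; the logits are made-up numbers:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, true_class):
    # negative log-probability assigned to the true class
    return -np.log(softmax(logits)[true_class])

logits = np.array([2.0, 0.5, -1.0])  # raw scores for 3 candidate tags
print(cross_entropy(logits, 0))      # small loss: class 0 has highest score
print(cross_entropy(logits, 2))      # large loss: class 2 has lowest score
```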
- Deep Learning
- Training: minimize the loss function
- DL = feature discovery + classification
- classify the POS tag + discover the required features
- Deep Learning II - sequence to sequence
- An RNN encoder (words -> hidden states) + an RNN decoder (hidden states -> tags)
- gets a richer representation of the sentence
- captures long-range dependencies
- Represent the source sentence by the set of output vectors from the encoder
- Replace the RNN encoder with an LSTM
- An RNN can't see the future -> use a bidirectional RNN/LSTM (a sketch follows this list)
- Due to the sequential nature of RNNs, parallelism is limited.
- Compared to CNNs: they parallelize well but can't model the whole history; they are more local.
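A minimal bidirectional-LSTM tagger sketch, assuming PyTorch; the layer sizes, names, and 45-tag output (Penn Treebank tagset size) are illustrative assumptions, not from the lecture:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # bidirectional=True lets each position see both left and right context
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)  # 2x: both directions

    def forward(self, word_ids):                 # (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))   # (batch, seq_len, 2*hidden)
        return self.out(h)                       # per-token tag scores

model = BiLSTMTagger(vocab_size=10_000, num_tags=45)
scores = model(torch.randint(0, 10_000, (1, 7)))  # one 7-word sentence
print(scores.shape)                               # torch.Size([1, 7, 45])
```

The bidirectional LSTM addresses the "can't see the future" point: each token's representation concatenates a left-to-right and a right-to-left pass.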
- POS tagging vs MT:
- alignment: one-to-one in tagging vs. one-to-none up to one-to-many in MT
- the word order may change in MT
- one-to-many / many-to-one alignments
- Complexity:
- POS with HMM (Viterbi): linear in sentence length
- MT: the full search space is exponential; beam search prunes it to a tractable approximation
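For contrast, a toy beam-search sketch, assuming Python; for clarity the per-step scores here ignore history (a real MT decoder conditions on it), and all words and probabilities are made up:

```python
import math

def beam_search(step_scores, beam_width=2):
    # step_scores[t][w] = log-probability of emitting word w at step t;
    # keep only the `beam_width` best partial hypotheses at each step
    # instead of expanding the full exponential space.
    beams = [([], 0.0)]                          # (sequence, log-score)
    for scores in step_scores:
        candidates = [(seq + [w], lp + s)
                      for seq, lp in beams
                      for w, s in scores.items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]          # prune to the best few
    return beams

steps = [{"the": math.log(0.6), "a": math.log(0.4)},
         {"cat": math.log(0.7), "dog": math.log(0.3)}]
print(beam_search(steps))  # best-scoring sequences with their log-scores
```

With beam width b, vocabulary V, and output length n, the cost is O(n·b·V), versus O(V^n) for exhaustive search over the full space.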