NLP IIT POS tagging

NLP POS-tagging (lecture by Pushpak Bhattacharyya)


  • NLP = Ambiguity Processing
    • Lexical Ambiguity: e.g. "dog" (noun vs. verb; animal vs. detestable person), resolved by context.
    • Structural Ambiguity
    • Semantic Ambiguity
    • Pragmatic Ambiguity
  • Main methodology
    • A: extract parts & features from the input
    • B: extract parts & features from the output that correspond to A
    • Learn a mapping between these parts and features
    • Apply the learned mapping to new situations (decoding)

POS tagging

  • POS tagging attaches to each word in a sentence a part-of-speech tag drawn from a given set of tags called the Tag-Set
  • A word can have multiple possible POS tags; the tagger must pick the right one in context (see the example below)
  • New examples keep breaking hand-written rules, so we need a robust, data-driven system.
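A quick illustration of the ambiguity point above, using NLTK's off-the-shelf tagger as a stand-in (a generic tool chosen for illustration, not the lecture's own system; the example sentences are my own):

```python
# Tag ambiguity illustrated with NLTK's pre-trained tagger. Requires: pip install nltk.
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)  # resource name in newer NLTK releases

for sent in ["The dog barked loudly .", "They dog him at every step ."]:
    print(nltk.pos_tag(sent.split()))
# "dog" should be a noun (NN) in the first sentence and a verb in the second;
# whether the tagger gets both right shows how hard contextual disambiguation is.
```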
  • Generative: HMM
    • Training: maximize the likelihood of the observations (word sequences)
    • Testing: search the hypothesis space for the best POS tag sequence
    • Generate candidate POS tag sequences and score them
    • The three classic HMM problems:
    • Given the observation sequence, find the best state (tag) sequence - Viterbi algorithm (see the sketch below)
    • Given the observation sequence, find its probability - forward/backward algorithm
    • Given the observation sequences, find the HMM parameters - Baum-Welch (EM) algorithm
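A minimal Viterbi decoding sketch for an HMM tagger, with a hand-made three-tag model; the probabilities and tiny vocabulary are made-up illustrations, not the lecture's data:

```python
# Viterbi decoding for an HMM POS tagger: a toy sketch with made-up probabilities.
import numpy as np

tags = ["NOUN", "VERB", "DET"]                      # hidden states
vocab = {"the": 0, "dog": 1, "barks": 2}            # observations

pi = np.array([0.2, 0.2, 0.6])                      # P(tag at position 0)
A = np.array([[0.3, 0.6, 0.1],                      # A[i, j] = P(tag_j | tag_i)
              [0.5, 0.2, 0.3],
              [0.9, 0.05, 0.05]])
B = np.array([[0.1, 0.6, 0.3],                      # B[i, w] = P(word_w | tag_i)
              [0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    n, k = len(obs), len(tags)
    delta = np.zeros((n, k))            # best log-prob of any path ending in tag j at step t
    back = np.zeros((n, k), dtype=int)  # backpointers for path recovery
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[i, j]: come from tag i, go to tag j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Follow backpointers from the best final tag.
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [tags[i] for i in reversed(path)]

print(viterbi(["the", "dog", "barks"]))   # -> ['DET', 'NOUN', 'VERB']
```

Decoding is linear in the sentence length (each position only looks at the previous one), which is the complexity point made at the end of these notes.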
  • Discriminative
    • Training: maximize the entropy of the probability distribution subject to constraints from the data
    • Testing: discriminate amongst hypotheses by scoring them
    • Models: MEMM, CRF (see the sketch below)
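A simplified discriminative sketch: a per-token maximum-entropy (logistic regression) classifier whose features include the previously predicted tag, i.e. a greedy MEMM-style tagger. The toy training data and feature set are assumptions for illustration; a full MEMM or CRF would also do global (Viterbi-style) decoding:

```python
# Greedy MEMM-style tagger: max-ent (logistic regression) over hand-crafted features,
# including the previously predicted tag. Toy data; a real system trains on a corpus.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

def features(words, i, prev_tag):
    return {
        "word": words[i].lower(),
        "suffix2": words[i][-2:],
        "prev_tag": prev_tag,            # Markov dependence on the previous tag (MEMM flavour)
        "is_first": i == 0,
    }

X, y = [], []
for sent in train:
    words = [w for w, _ in sent]
    prev = "<s>"
    for i, (_, tag) in enumerate(sent):
        X.append(features(words, i, prev))
        y.append(tag)
        prev = tag

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

def tag(words):
    prev, out = "<s>", []
    for i in range(len(words)):
        pred = clf.predict(vec.transform([features(words, i, prev)]))[0]
        out.append(pred)
        prev = pred                      # greedy decoding; a full MEMM/CRF would search
    return out

print(tag(["the", "cat", "barks"]))
```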
  • Classification
    • Training: minimize a loss function (total sum-of-squares error, or cross-entropy with softmax)
    • Testing: classify the input into one of the classes
    • SVM, neural net
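To make the loss functions concrete, here is a small NumPy sketch of softmax plus cross-entropy (and the squared-error alternative) for a single token over a tiny tag set; all numbers are illustrative:

```python
# Cross-entropy with softmax: the loss minimized when training a classifier-style tagger.
import numpy as np

tags = ["NOUN", "VERB", "DET"]
logits = np.array([2.0, 0.5, -1.0])      # raw scores for one token (made-up values)
gold = 0                                 # true tag index: NOUN

probs = np.exp(logits - logits.max())    # softmax (shifted for numerical stability)
probs /= probs.sum()

cross_entropy = -np.log(probs[gold])     # small when the gold tag gets high probability
sum_sq_error = ((probs - np.eye(len(tags))[gold]) ** 2).sum()  # squared-error alternative

print(dict(zip(tags, probs.round(3))),
      "CE:", round(float(cross_entropy), 3),
      "SSE:", round(float(sum_sq_error), 3))
```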
  • Deep Learning
    • Training: minimize the loss function
    • DL = feature discovery + classification
    • classify the POS tag while discovering the required features automatically
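A sketch of "feature discovery + classification": an embedding layer learns the word features and a feed-forward softmax layer classifies the tag. This window-based PyTorch tagger is an illustrative assumption (sizes, random stand-in data), not the lecture's exact architecture:

```python
# Feed-forward window tagger: an embedding layer discovers word features, a softmax
# layer classifies the POS tag. Toy sizes; a real model trains on a tagged corpus.
import torch
import torch.nn as nn

VOCAB, TAGS, EMB, WINDOW = 1000, 12, 50, 3   # assumed sizes for illustration

class WindowTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)              # learned word features
        self.ff = nn.Sequential(
            nn.Linear(WINDOW * EMB, 128), nn.ReLU(),
            nn.Linear(128, TAGS),                        # tag scores (logits)
        )

    def forward(self, window_ids):                       # (batch, WINDOW) word ids
        x = self.emb(window_ids).flatten(1)              # concatenate window embeddings
        return self.ff(x)

model = WindowTagger()
loss_fn = nn.CrossEntropyLoss()                          # cross-entropy / softmax loss
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random data standing in for a real batch.
ids = torch.randint(0, VOCAB, (32, WINDOW))
gold = torch.randint(0, TAGS, (32,))
loss = loss_fn(model(ids), gold)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```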
  • Deep Learning II - sequence to sequence
    • An RNN encoder (words -> hidden states) + an RNN decoder (hidden states -> tags)
    • Gets a richer representation of the sentence
    • Long-range dependencies
      • Represent the source sentence by the set of output vectors from the encoder
      • Replace the RNN encoder with an LSTM
    • An RNN can't see the future -> use a bidirectional RNN/LSTM (see the sketch below)
    • Due to the sequential nature of RNNs, parallelism is limited.
    • Compared to CNNs: CNNs parallelize well but can't model the whole history; they are more local.
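A minimal bidirectional LSTM tagger sketch in PyTorch, illustrating the encoder-over-the-sentence idea and the "use a bidirectional RNN/LSTM so each position sees both left and right context" point; the sizes and random inputs are illustrative assumptions:

```python
# BiLSTM tagger: an LSTM reads the sentence in both directions so each position's
# hidden state sees left and right context; a linear layer emits per-token tag scores.
import torch
import torch.nn as nn

VOCAB, TAGS, EMB, HID = 1000, 12, 50, 64   # assumed sizes for illustration

class BiLSTMTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * HID, TAGS)    # forward + backward states per token

    def forward(self, word_ids):               # (batch, seq_len) word ids
        h, _ = self.lstm(self.emb(word_ids))   # (batch, seq_len, 2*HID)
        return self.out(h)                     # (batch, seq_len, TAGS) tag logits

model = BiLSTMTagger()
word_ids = torch.randint(0, VOCAB, (1, 7))     # one 7-word sentence of random ids
logits = model(word_ids)
print(logits.argmax(-1))                       # predicted tag index for each token
```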
  • POS tagging vs MT:
    • Alignment: 1-to-1 in POS tagging vs. 1-to-none up to 1-to-many in MT
    • Word order may change in MT
    • One-to-many / many-to-one mappings occur in MT
  • Complexity:
    • POS tagging with an HMM: Viterbi decoding is linear in the sentence length
    • MT: the search space is exponential; beam search is used to approximate it
