Why we need text summarization
Information is everywhere, and the volume of data keeps growing. With so much irrelevant material around, finding the text we actually need becomes harder, so summarization is in great demand. The objective of automatic text summarization is to condense the source text into a shorter, precise version that preserves its informational content and overall meaning.
There are two main approaches:
1. Extractive Methods.
2. Abstractive Methods.
Extractive Methods
The model extracts keywords, key phrases, and salient sentences directly from the source document. The significance of a sentence is judged mainly from its statistical and linguistic features.
Unsupervised Methods
Graph based approach
Graph-based approach
A graph can effectively represent the document structure, and external knowledge (e.g. Wikipedia) can be incorporated.
Fuzzy logic based approach
Summarization based on fuzzy rules over various sets of features.
Concept-based approach
Importance of sentences is calculated based on the concepts retrieved from external knowledge base.
Text Features
- Content Key Word (e.g. based on TF-IDF)
- Title Word
- Cue Phrase (words indicating structure)
- Biased Word (e.g. domain specific words)
- Sentence Location (e.g. beginning and conclusion part)
- Sentence Length (very long sentences have less chance to be important)
- Paragraph Location (e.g. in peripheral sections)
- Cohesion between Sentences (e.g. similarity)
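As a rough illustration of the content-keyword feature above, a minimal extractive summarizer can score each sentence by the average TF-IDF weight of its words and keep the top k. This is only a sketch: it assumes sentences are already segmented and skips stemming and stop-word removal, which a real system would include.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean TF-IDF weight of its words,
    treating each sentence as a 'document' for the IDF statistics."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        weight = sum((tf[w] / len(doc)) * math.log(n / df[w]) for w in tf)
        scores.append(weight / len(tf) if tf else 0.0)
    return scores

def summarize(sentences, k=2):
    """Return the k highest-scoring sentences, in original order."""
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Note that words occurring in every sentence get an IDF of zero, so only discriminative content words contribute to a sentence's score.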
* See [2] for text summarization using the TextRank algorithm.
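The graph-based idea behind TextRank can be sketched in a few lines: build a graph whose nodes are sentences, weight edges by word overlap, and run PageRank-style power iteration. The similarity measure and the damping factor d = 0.85 follow the original TextRank/PageRank papers; everything else here is a simplified assumption.

```python
import math

def overlap_similarity(a, b):
    """Word-overlap similarity between two token lists, normalized
    by sentence length as in the TextRank paper."""
    shared = len(set(a) & set(b))
    denom = math.log(len(a) + 1) + math.log(len(b) + 1)
    return shared / denom if denom else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Return a centrality score per sentence via power iteration."""
    toks = [s.lower().split() for s in sentences]
    n = len(toks)
    # Edge weights (no self-loops).
    w = [[overlap_similarity(toks[i], toks[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in w]  # avoid division by zero
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n
                  + d * sum(scores[j] * w[j][i] / out_sum[j]
                            for j in range(n))
                  for i in range(n)]
    return scores
```

Sentences that share vocabulary with many others accumulate higher scores; the top-ranked sentences form the extractive summary.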
Supervised Methods
- Machine learning approach
- Neural-network-based approaches
- Attentional encoder-decoder
Extractive summarization is usually framed as a binary classification task: for each sentence, decide whether it belongs in the summary.
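The classification view can be sketched as follows: each sentence is mapped to a feature vector (here, location, length, and title overlap from the feature list above) and passed through a logistic decision rule. The weights and bias below are illustrative placeholders, not learned parameters; a real system would fit them on labeled summary data.

```python
import math

def features(sentence, index, total, title):
    """Map a sentence to a small feature vector: location, length,
    and overlap with the title words."""
    words = sentence.lower().split()
    title_words = set(title.lower().split())
    return [
        1.0 - index / max(total - 1, 1),             # earlier is better
        min(len(words) / 20.0, 1.0),                 # normalized length
        len(set(words) & title_words) / max(len(title_words), 1),
    ]

def classify(sentence, index, total, title,
             weights=(1.2, 0.5, 2.0), bias=-1.0):
    """Logistic decision: True if the sentence should be extracted.
    Weights here are hand-picked for illustration only."""
    x = features(sentence, index, total, title)
    z = sum(w * f for w, f in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z)) > 0.5
```

Neural approaches replace the hand-crafted features with learned sentence encodings, but the include/exclude decision structure is the same.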
Abstractive Methods
The model not only extracts but also concisely paraphrases the important parts of the document via generation; this can avoid the grammatical inconsistencies that extractive methods often produce.
- Recursive Autoencoder
- Neural network based approaches
- Attentional feed-forward network
- RNN-based encoder-decoder models
- Reinforcement learning for sequence generation
Metrics
ROUGE scores (Recall-Oriented Understudy for Gisting Evaluation)
ROUGE simply counts how many n-grams in the generated summary match the n-grams in the reference summary; ROUGE-N recall divides that overlap by the total number of n-grams in the reference.
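The n-gram counting behind ROUGE-N recall fits in a few lines. This is a minimal sketch: official ROUGE implementations additionally handle stemming, stop words, and multiple references.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by the
    number of n-grams in the reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped counts
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For example, the candidate "the cat sat" shares 3 of the 6 unigrams in the reference "the cat sat on the mat", giving a ROUGE-1 recall of 0.5.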
Reference
[1] Moratanch N, Chitrakala S. A survey on extractive text summarization. 2017 International Conference on Computer, Communication and Signal Processing (ICCCSP). IEEE, 2017: 1-6.
[2] An Introduction to Text Summarization using the TextRank Algorithm.