A New Biomedical Text Summarization Method Based on Sentence Clustering and Frequent Itemsets Mining
- Source :
- Smart Innovation, Systems and Technologies ISBN: 9783030210045
- Publication Year :
- 2019
- Publisher :
- Springer International Publishing, 2019.
Abstract
- In this paper, we combine sentence clustering and frequent itemset mining to build a single biomedical text summarization method. Biomedical documents are represented as sets of UMLS concepts, and very generic concepts are discarded. The vector space model is used to represent sentences, and the K-means clustering algorithm is applied to group semantically similar sentences. The most frequent itemsets are extracted from the global cluster and used to calculate a score for each sentence. The top N highest-scoring sentences are selected to form the final summary. The method is evaluated against three summarizers (TextRank, SweSum, and an itemset-based summarizer) on 50 randomly selected full-text biomedical papers from the BioMed Central database. The evaluation compares the generated summaries with the abstracts of these papers using the ROUGE toolkit. Our method ranked first on the ROUGE-1 and ROUGE-2 measures, with an improvement of approximately 3% over the itemset-based summarizer, and ranked second on the ROUGE-SU4 measure, approximately 1% below the itemset-based summarizer.
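The itemset-based scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: UMLS concept extraction and K-means clustering are omitted, the concept identifiers are made up, and the brute-force miner, the `min_support` threshold, and the scoring rule (sum of the supports of the frequent itemsets a sentence contains) are all assumptions chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force mining: count every itemset of up to max_size items and
    keep those that occur in at least min_support transactions."""
    counts = Counter()
    for concepts in transactions:
        items = sorted(concepts)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def score_sentence(concepts, itemsets):
    """Assumed scoring rule: sum the supports of all frequent itemsets
    that are fully contained in the sentence's concept set."""
    return sum(c for s, c in itemsets.items() if s <= concepts)


def summarize(sentences, n, min_support=2):
    """sentences: list of (text, set_of_concept_ids) pairs.
    Returns the n highest-scoring texts, restored to document order."""
    itemsets = frequent_itemsets([c for _, c in sentences], min_support)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: (-score_sentence(sentences[i][1], itemsets), i),
    )
    return [sentences[i][0] for i in sorted(ranked[:n])]


# Hypothetical input: each sentence is already mapped to a set of concept IDs.
sentences = [
    ("s0", {"C1", "C2"}),
    ("s1", {"C1", "C2", "C3"}),
    ("s2", {"C4"}),
    ("s3", {"C1", "C3"}),
]
summary = summarize(sentences, n=2)
```

In a fuller pipeline, the concept sets would come from a UMLS mapping tool and the sentences would first be clustered; here every sentence is treated as one transaction in a single pool.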
- Subjects :
- 0301 basic medicine
Computer science
business.industry
Unified Medical Language System
Sentence clustering
02 engineering and technology
computer.software_genre
Automatic summarization
03 medical and health sciences
030104 developmental biology
Biomedical text
0202 electrical engineering, electronic engineering, information engineering
Vector space model
020201 artificial intelligence & image processing
Artificial intelligence
Cluster analysis
business
Central database
computer
Natural language processing
Details
- ISBN :
- 978-3-030-21004-5
- ISBNs :
- 9783030210045
- Database :
- OpenAIRE
- Journal :
- Smart Innovation, Systems and Technologies ISBN: 9783030210045
- Accession number :
- edsair.doi...........8d29e302d93d9050c0278e7b3494913c