Assessing Scientific Text Similarity: A Novel Approach Utilizing Non-Negative Matrix Factorization and Bidirectional Encoder Representations from Transformer.
- Source :
- Mathematics (2227-7390). Nov 2024, Vol. 12 Issue 21, p3328. 18p.
- Publication Year :
- 2024
- Abstract :
- Patents are a vital component of scientific text, and escalating competition has generated substantial demand for patent analysis in areas such as company strategy and legal services, which in turn requires fast, accurate, and easily applicable similarity estimators. At present, applying natural language processing (NLP) to patent content, including titles and abstracts, is an effective way to estimate similarity. However, the traditional NLP approach has disadvantages, such as the need for a huge amount of labeled data and the poor explainability of deep-learning model internals, both exacerbated by the high compression of patent content. Moreover, most knowledge-based deep learning models require a vast amount of additional analysis results as training variables for similarity estimation, and these are limited because the analysis depends on human participation. To address these challenges, we introduce a novel estimator that enhances the transparency of similarity estimation. The approach integrates a patent's content with its international patent classification (IPC), leveraging bidirectional encoder representations from transformers (BERT) and non-negative matrix factorization (NMF). By integrating these techniques, we aim to improve the transparency of knowledge discovery in NLP across various IPC dimensions and to incorporate more background knowledge into context similarity estimation. The experimental results demonstrate that our model is reliable, explainable, highly accurate, and practically usable. [ABSTRACT FROM AUTHOR]
- Subjects :
- *LANGUAGE models
*MATRIX decomposition
*NONNEGATIVE matrices
*LEGAL services
Details
- Language :
- English
- ISSN :
- 22277390
- Volume :
- 12
- Issue :
- 21
- Database :
- Academic Search Index
- Journal :
- Mathematics (2227-7390)
- Publication Type :
- Academic Journal
- Accession number :
- 180781145
- Full Text :
- https://doi.org/10.3390/math12213328
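
The abstract names its ingredients (BERT representations of patent content, IPC codes, and NMF) but does not say how they are combined. The sketch below is one possible reading, not the authors' model: it factorizes a hypothetical patent-by-IPC indicator matrix with the standard Lee-Seung multiplicative-update NMF, then blends cosine similarity over the resulting IPC factors with cosine similarity over content vectors. The matrix `ipc_matrix`, the hand-made `text_vecs` (placeholders where real BERT embeddings would go), the helper `combined_similarity`, and the weight `alpha` are all assumptions made for illustration.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def nmf(v, rank, iters=500, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m)
    using Lee-Seung multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def combined_similarity(text_vecs, ipc_factors, i, j, alpha=0.5):
    """Hypothetical blend: alpha weights content similarity, (1 - alpha)
    weights similarity in the NMF factor space derived from IPC codes."""
    return (alpha * cosine(text_vecs[i], text_vecs[j])
            + (1 - alpha) * cosine(ipc_factors[i], ipc_factors[j]))


# Hypothetical data: rows are patents, columns are IPC codes (1 = assigned).
ipc_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Placeholder content vectors; in practice these would be BERT embeddings.
text_vecs = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.3],
    [0.2, 0.8, 0.4],
]

w, h = nmf(ipc_matrix, rank=2)
for i, j in [(0, 1), (0, 2)]:
    print(f"patents {i},{j}: {combined_similarity(text_vecs, w, i, j):.3f}")
```

Because each row of `w` is a non-negative mixture over a small number of latent IPC factors, the IPC part of the score can be inspected factor by factor, which is the kind of transparency the abstract refers to. A simple weighted sum is used here only because it is easy to read; the published model may combine the two signals differently.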