1. DPWord2Vec: Better Representation of Design Patterns in Semantics
- Author
- Dong Liu, Lei Qiao, Zhilei Ren, Zuohua Ding, Xiaochen Li, and He Jiang
- Subjects
- Word embedding; Plain text; business.industry; Computer science; Design pattern; 020207 software engineering; 02 engineering and technology; computer.file_format; Semantics; computer.software_genre; Software design pattern; 0202 electrical engineering, electronic engineering, information engineering; Word2vec; Artificial intelligence; business; computer; Software; Natural language processing; Word (computer architecture); Natural language
- Abstract
Plain-text descriptions of design patterns help developers learn and understand the definitions and usage scenarios of those patterns. To support automatic use of these descriptions, e.g., recommending design patterns from free-text queries, design patterns and natural language must be adequately associated. Existing studies usually use texts from design pattern books as the representations of design patterns and calculate their similarities with the queries. However, this approach is problematic: much information about design patterns may be absent from the books, and many words are out of vocabulary because of the books' limited content. To overcome these issues, a more comprehensive method is needed to estimate the relatedness between design patterns and natural language words. Motivated by Word2Vec, in this study we propose DPWord2Vec, which embeds design patterns and natural language words into vectors simultaneously. We first build a corpus containing more than 400 thousand documents extracted from design pattern books, Wikipedia, and Stack Overflow. Next, we redefine the concept of the context window to associate design patterns with words. Then, the design pattern and word vector representations are learnt by leveraging an advanced word embedding method. The learnt design pattern and word vectors can be used universally in design pattern tasks based on textual descriptions. An evaluation shows that DPWord2Vec outperforms the baseline algorithms by 17.1%-96.5% in measuring the similarities between design patterns and words in terms of Spearman's rank correlation coefficient. Moreover, we apply DPWord2Vec to two typical design pattern tasks. In the design pattern tag recommendation task, the DPWord2Vec-based method outperforms two state-of-the-art algorithms by 6.6% and 32.7%, respectively, in terms of Recall@10. In the design pattern selection task, DPWord2Vec improves on the existing methods by 6.5%-70.7% in terms of MRR.
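To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how design patterns might be ranked against a free-text query once patterns and words share one vector space, and how Recall@k and MRR could then be computed. It is not the authors' implementation: the toy two-dimensional vectors, the function names, and the choice to average the query's word vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_patterns(query_words, word_vecs, pattern_vecs):
    """Rank pattern names by similarity to the query.

    Assumption: the query is represented by the mean of its in-vocabulary
    word vectors. The abstract does not state how queries are composed.
    """
    known = [word_vecs[w] for w in query_words if w in word_vecs]
    if not known:
        return []
    dim = len(known[0])
    query = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    return sorted(pattern_vecs,
                  key=lambda name: cosine(query, pattern_vecs[name]),
                  reverse=True)


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def mean_reciprocal_rank(rankings, relevants):
    """Mean over queries of 1 / (rank of the first relevant item), 0 if none."""
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for position, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(rankings) if rankings else 0.0


# Hypothetical toy embeddings; real vectors would come from a trained model.
word_vecs = {"notify": [1.0, 0.0], "change": [0.9, 0.1], "create": [0.0, 1.0]}
pattern_vecs = {"Observer": [1.0, 0.1], "Factory Method": [0.1, 1.0]}

ranked = rank_patterns(["notify", "change", "subscribers"], word_vecs, pattern_vecs)
print(ranked)
print(recall_at_k(ranked, {"Observer"}, k=1))
print(mean_reciprocal_rank([ranked, ranked], [{"Observer"}, {"Factory Method"}]))
```

In this sketch the out-of-vocabulary word "subscribers" is simply skipped, which illustrates why a larger training corpus matters: the fewer words that fall out of vocabulary, the more of the query contributes to the ranking.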
- Published
- 2022