Bug or Not? Bug Report Classification Using N-Gram IDF
- Source :
- ICSME
- Publication Year :
- 2017
- Publisher :
- IEEE, 2017.
- Abstract :
- Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model based on N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) that handles words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, and these key terms can be used as features for classifying bug reports. We build classification models with logistic regression and random forest, using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks. (5 pages, ICSME 2017)
- Subjects :
- Topic model
FOS: Computer and information sciences
Computer science
Testing
Feature extraction
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
Logistics
02 engineering and technology
computer.software_genre
regression analysis
Task (project management)
Computer Science - Software Engineering
pattern classification
Software
inverse document frequency
bug reports
phrase handling
0202 electrical engineering, electronic engineering, information engineering
Training
tf–idf
bug report classification
software engineering tasks
business.industry
logistic regression
020207 software engineering
text analysis
data mining
program debugging
Random forest
word handling
Software Engineering (cs.SE)
N-gram IDF
n-gram
ComputingMethodologies_PATTERNRECOGNITION
Software bug
Computer bugs
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
020201 artificial intelligence & image processing
Artificial intelligence
business
computer
random forest
Natural language processing
software engineering
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ICSME
- Accession number :
- edsair.doi.dedup.....03d1d01c332b56920a5743536ae32e2e
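
The feature-extraction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it applies the standard IDF weight, log(N / df), to word n-grams of several lengths as a simplified stand-in for N-gram IDF, whose actual weighting differs. The toy reports and the function names `ngrams`, `ngram_idf`, and `featurize` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, max_n):
    """Yield every contiguous word n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def ngram_idf(docs, max_n=3):
    """Standard IDF, log(N / df), per n-gram (a stand-in, NOT the paper's N-gram IDF)."""
    df = Counter()
    for doc in docs:
        # Count each n-gram at most once per document (document frequency).
        df.update(set(ngrams(doc.lower().split(), max_n)))
    return {gram: math.log(len(docs) / count) for gram, count in df.items()}

def featurize(doc, terms):
    """Binary feature vector: 1 if the report contains the term, else 0."""
    longest = max(len(term) for term in terms)
    present = set(ngrams(doc.lower().split(), longest))
    return [1 if term in present else 0 for term in terms]

# Hypothetical toy reports, for illustration only.
reports = [
    "app crashes with null pointer exception on startup",
    "null pointer exception when saving file",
    "please add dark mode to settings",
    "add option to export settings",
]

idf = ngram_idf(reports)
# Hand-picked key terms of different lengths (assumed, not selected by the paper's method).
terms = [("null", "pointer", "exception"), ("add",)]
vector = featurize("null pointer exception in parser", terms)
```

Vectors like `vector` would then be passed to a classifier such as logistic regression or random forest, the two learners named in the abstract.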