8 results on '"Derong Shen"'
Search Results
2. A new symbolic representation method for time series
- Author
Yucheng Li and Derong Shen
- Subjects
Information Systems and Management; Artificial Intelligence; Control and Systems Engineering; Software; Computer Science Applications; Theoretical Computer Science
- Published
- 2022
3. Towards deep entity resolution via soft schema matching
- Author
Derong Shen and Chenchen Sun
- Subjects
Complex data type; business.industry; Computer science; Cognitive Neuroscience; Deep learning; Semantics; computer.software_genre; Schema matching; Computer Science Applications; Weighting; Schema (genetic algorithms); Artificial Intelligence; Key (cryptography); Data mining; Data pre-processing; Artificial intelligence; business; computer
- Abstract
Entity resolution (ER) plays a key role in data preprocessing. ER identifies records corresponding to the same real-world entity. Recent years have witnessed a growing trend of deep-learning-based ER (deep ER). However, previous deep ER works do not fully utilize schema semantics, since they either use hard schema matching or disregard schema matching altogether. In this work, we flexibly exploit schema matching to enhance deep ER. We define and implement soft schema matching, where attributes are associated with each other by probabilities. Attribute associations are generated by aggregating token connections in coarse deep ER. We then incorporate soft schema matching into hierarchical attention networks for ER, which tremendously improves resolution quality, especially for complex data and corrupted data. Different attention mechanisms are used for particular sub-tasks in ER networks: self-attention for contextualization, inter-attention for alignment, and intra-attention for weighting. Finally, comprehensive experiments are run over common data, complex data, and corrupted data. Evaluation results show that our approach surpasses previous works.
- Published
- 2022
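The soft schema matching idea in the abstract above (attribute associations expressed as probabilities, obtained by aggregating token connections) might be sketched roughly as follows. This is only an illustration, not the authors' model: the function name `soft_schema_matching`, the mean aggregation, and the softmax normalization are all assumptions made for the example.

```python
import math

def soft_schema_matching(token_links, attrs_a, attrs_b):
    """Return, for each attribute in attrs_a, a probability distribution over attrs_b.

    token_links maps (attr_a, attr_b) to a list of token-level connection scores.
    Mean aggregation and softmax are assumptions of this sketch.
    """
    matrix = {}
    for a in attrs_a:
        # Aggregate token connections per attribute pair (mean; 0.0 if there are none).
        raw = []
        for b in attrs_b:
            scores = token_links.get((a, b), [])
            raw.append(sum(scores) / len(scores) if scores else 0.0)
        # Softmax turns the aggregated scores into probabilities over attrs_b.
        top = max(raw)
        exps = [math.exp(s - top) for s in raw]
        total = sum(exps)
        matrix[a] = {b: e / total for b, e in zip(attrs_b, exps)}
    return matrix

# Hypothetical token-level scores between two product schemas.
links = {("title", "name"): [0.9, 0.8], ("title", "brand"): [0.1], ("maker", "brand"): [0.7]}
probs = soft_schema_matching(links, ["title", "maker"], ["name", "brand"])
```

Unlike hard schema matching, every attribute pair keeps a non-zero weight here, so a downstream model can still draw on a weakly associated attribute.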
4. Clustering Time Series with Diverse Shape Variability Measure
- Author
Yucheng Li and Derong Shen
- Published
- 2023
5. A cluster-based oversampling algorithm combining SMOTE and k-means for imbalanced medical data
- Author
Nan Yin, Zhaozhao Xu, Xi Han, Yue Kou, Tiezheng Nie, and Derong Shen
- Subjects
Information Systems and Management; Computer science; 05 social sciences; Decision tree; k-means clustering; 050301 education; Sample (statistics); 02 engineering and technology; Computer Science Applications; Theoretical Computer Science; Random forest; Artificial Intelligence; Control and Systems Engineering; 0202 electrical engineering, electronic engineering, information engineering; Oversampling; 020201 artificial intelligence & image processing; Sensitivity (control systems); 0503 education; Algorithm; Software; Cluster based; Interpolation
- Abstract
The C4.5 decision tree algorithm offers high classification accuracy, fast computation, and comprehensible classification rules, so it is widely used for medical data analysis. However, for imbalanced medical data, the classification accuracy of decision tree-based models is not ideal. Therefore, this paper proposes a cluster-based oversampling algorithm (KNSMOTE) combining the Synthetic minority oversampling technique (SMOTE) and the k-means algorithm. The sample classes assigned by k-means clustering are compared with the original sample classes to select the “safe samples”, whose classes have not changed. The “safe samples” are linearly interpolated to synthesize new samples. The improved SMOTE sets the oversampling ratio according to the imbalance ratio of the original samples, and synthesizes as many samples as there are original samples. Compared with other oversampling algorithms on 8 UCI datasets, our algorithm achieves significant advantages. When our algorithm was applied to the medical datasets, the average Sensitivity and Specificity of the Random forest (RF) algorithm were 99.84% and 99.56%, respectively.
- Published
- 2021
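The two steps named in the abstract above, selecting "safe samples" with k-means and linearly interpolating them, could be sketched as below. This is a simplified illustration under stated assumptions, not the published KNSMOTE. A cluster's class is assumed to be the majority label of its members, the interpolation partner is assumed to be the nearest safe neighbour, and all function names and the toy data are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster id for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign

def safe_indices(points, labels, k=2, seed=0):
    """Indices whose original label equals the majority label of their cluster (assumption)."""
    assign = kmeans(points, k, seed=seed)
    safe = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            member_labels = [labels[i] for i in idx]
            cluster_label = max(set(member_labels), key=member_labels.count)
            safe.extend(i for i in idx if labels[i] == cluster_label)
    return sorted(safe)

def interpolate(samples, n_new, seed=0):
    """SMOTE-style synthesis: move a random sample part-way to its nearest neighbour.

    Needs at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(samples)
        b = min((s for s in samples if s is not a), key=lambda s: sq_dist(a, s))
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),   # majority, label 0
          (10, 10), (10, 11), (11, 10),                 # minority, label 1
          (0.2, 0.2)]                                   # minority label inside the majority region
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
safe = safe_indices(points, labels)
safe_minority = [points[i] for i in safe if labels[i] == 1]
new_samples = interpolate(safe_minority, n_new=2)
```

In this toy data, the minority-labelled point at (0.2, 0.2) lands in the majority cluster, so it is excluded and no synthetic sample is generated from it.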
6. Accelerating Progressive Set Similarity Join with the CPU-GPU Architecture
- Author
Yue Kou, Tiezheng Nie, Derong Shen, and Lining Yu
- Subjects
Information Systems and Management; Matching (graph theory); Computer science; Search engine indexing; Parallel computing; Bloom filter; computer.software_genre; Computer Science Applications; Management Information Systems; Set (abstract data type); Similarity (network science); Central processing unit; Graphics; computer; Information Systems; Data integration
- Abstract
Set similarity join (SSJoin) is an important operation for finding similar set pairs in a given database and plays a core role in data integration, data cleaning, and data mining. Unlike traditional SSJoin methods, progressive SSJoin targets large datasets and aims to improve the efficiency of finding similar pairs within a limited running time. Progressive SSJoin can provide possible partial matching pairs of the dataset as early as possible during processing. Moreover, many recent studies have shown that GPUs (Graphics Processing Units) can accelerate similarity join operations. This paper focuses on exploring progressive SSJoin algorithms and accelerating them with the CPU-GPU architecture. We propose two progressive SSJoin methods, PSSJM and PBM. PSSJM uses inverted indexing, while PBM relies on the counting Bloom filter and prefix filtering techniques. In addition, we propose a GPU-based algorithm built on our progressive SSJoin method to accelerate processing. Comprehensive experiments with real-world datasets show that our methods generate better-quality results than the traditional method under limited time, and that the method implemented on the CPU-GPU architecture achieves high speedups over the CPU-based method.
- Published
- 2021
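For readers unfamiliar with the building blocks mentioned in the abstract above, the following is a minimal sketch of a textbook set similarity join that combines prefix filtering with an inverted index under a Jaccard threshold. It is neither PSSJM nor PBM, and it is neither progressive nor GPU-accelerated. The function names and the choice of Jaccard similarity are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)

def prefix_filter_join(sets, threshold):
    """Return (j, i, similarity) for all pairs j < i with Jaccard >= threshold."""
    # Global token order: rarest first (ties broken by token), so prefixes hold rare tokens.
    freq = Counter(tok for s in sets for tok in s)
    ordered = [sorted(s, key=lambda t: (freq[t], t)) for s in sets]
    index = defaultdict(list)  # token -> ids of earlier sets with that token in their prefix
    results = []
    for i, toks in enumerate(ordered):
        # Two sets reaching the threshold must share at least one token in these prefixes.
        prefix_len = len(toks) - math.ceil(threshold * len(toks)) + 1
        candidates = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # Verify only the candidates instead of every earlier set.
        for j in sorted(candidates):
            sim = jaccard(sets[i], sets[j])
            if sim >= threshold:
                results.append((j, i, sim))
    return results

data = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"x", "y", "z"}, {"a", "b", "c", "d"}]
pairs = prefix_filter_join(data, threshold=0.5)
```

The inverted index restricts verification to sets that share a rare prefix token, which is where the savings over an all-pairs comparison come from.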
7. Attentional Memory Network with Correlation-based Embedding for time-aware POI recommendation
- Author
Yue Kou, Derong Shen, Tiezheng Nie, Ge Yu, and Meihui Shi
- Subjects
Information Systems and Management; Mechanism (biology); Computer science; 02 engineering and technology; computer.software_genre; Management Information Systems; Correlation; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Data mining; computer; Software
- Abstract
As considerable amounts of point-of-interest (POI) check-in data have been accumulated, POI recommendation has received much attention recently. It is well recognized that spatial–temporal information plays an important role in the user’s decision-making for visiting a POI. However, in time-aware POI recommendation, exploring temporal patterns in user preferences and incorporating multi-view factors for choosing preferred POIs remain challenging issues. To this end, we propose a novel Attentional Memory Network with Correlation-based Embedding (AMN-CE) for time-aware POI recommendation. Specifically, we first propose a correlation-based POI embedding method to capture geographical influence and interactive correlation between POIs. Subsequently, we design an attentional memory network, which is able to capture the micro-level relationship between time slot pairs. Furthermore, we propose a temporal-level attention mechanism to distinguish and dynamically adjust the influence strength of different time slots on user preferences at the target time slot. The experimental results on four real-life datasets demonstrate significant improvements of our proposed method compared with state-of-the-art models.
- Published
- 2021
8. A hybrid sampling algorithm combining M-SMOTE and ENN based on Random forest for medical imbalanced data
- Author
Derong Shen, Yue Kou, Tiezheng Nie, and Zhaozhao Xu
- Subjects
0303 health sciences; Computer science; Sampling (statistics); Health Informatics; Matthews correlation coefficient; Class (biology); Computer Science Applications; Random forest; 03 medical and health sciences; Identification (information); Statistical classification; 0302 clinical medicine; Research Design; Humans; 030212 general & internal medicine; Noise (video); Medical diagnosis; Algorithm; Algorithms; 030304 developmental biology
- Abstract
The problem of imbalanced data classification often arises in medical diagnosis. Traditional classification algorithms usually assume that the number of samples in each class is similar and that their misclassification costs during training are equal. However, the misclassification cost of patient samples is higher than that of healthy person samples. Therefore, how to improve the identification of patients without affecting the classification of healthy individuals is an urgent problem. To solve the problem of imbalanced data classification in medical diagnosis, we propose a hybrid sampling algorithm called RFMSE, which combines the Misclassification-oriented Synthetic minority over-sampling technique (M-SMOTE) and Edited nearest neighbor (ENN) based on Random forest (RF). The algorithm is mainly composed of three parts. First, M-SMOTE is used to increase the number of samples in the minority class, with the over-sampling rate of M-SMOTE set to the misclassification rate of RF. Then, ENN is used to remove noisy samples from the majority class. Finally, RF is used to perform classification prediction on the samples after hybrid sampling, and the stopping criterion for iterations is determined according to changes in the classification index, the Matthews Correlation Coefficient (MCC). When the value of MCC drops continuously, the iterations stop. Extensive experiments conducted on ten UCI datasets demonstrate that RFMSE can effectively solve the problem of imbalanced data classification. Compared with traditional algorithms, our method improves F-value and MCC more effectively.
- Published
- 2020
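Three ingredients named in the abstract above can be illustrated with a short sketch: the standard MCC formula, a basic ENN filter applied to the majority class, and a stop-when-MCC-keeps-dropping check. None of this is the RFMSE implementation. The neighbour count `k=3`, the `patience` parameter that defines "continuously drops", and all function names are assumptions made for the example.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts (0.0 if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def enn_filter(points, labels, majority_label, k=3):
    """Indices to keep: drop majority samples whose k nearest neighbours mostly disagree."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != majority_label:
            keep.append(i)  # minority samples are never removed in this sketch
            continue
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: sum((a - b) ** 2
                                                   for a, b in zip(p, points[j])))[:k]
        agree = sum(1 for j in nearest if labels[j] == y)
        if agree * 2 > len(nearest):
            keep.append(i)
    return keep

def should_stop(mcc_history, patience=2):
    """True once MCC has dropped for `patience` consecutive iterations (assumed rule)."""
    if len(mcc_history) <= patience:
        return False
    return all(mcc_history[-i] < mcc_history[-i - 1] for i in range(1, patience + 1))

points = [(0, 0), (0, 1), (1, 0), (1, 1),   # majority, label 0
          (5, 5), (5, 6), (6, 5),           # minority, label 1
          (5.2, 5.2)]                       # noisy majority sample among minority points
labels = [0, 0, 0, 0, 1, 1, 1, 0]
kept = enn_filter(points, labels, majority_label=0)
```

MCC uses all four confusion-matrix cells, which is why it is often preferred over accuracy when classes are imbalanced.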
Discovery Service for Jio Institute Digital Library