Descriptor: "Data set" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Data set"' showing total 1,395 results

1. Recognition of Soybean Crops and Weeds with YOLO v4 and UAV

Author: Symagulov, Adilkhan, Kuchin, Yan, Yakunin, Kirill, Murzakhmetov, Sanzhar, Yelis, Marina, Oxenenko, Alexey, Assanov, Ilyas, Bastaubayeva, Sholpan, Tabynbaeva, Laila, Rabčan, Jan, Mukhamediev, Ravil, Brilly, Mitja, Advisory Editor, Davis, Richard A., Advisory Editor, Hoalst-Pullen, Nancy, Advisory Editor, Leitner, Michael, Advisory Editor, Patterson, Mark W., Advisory Editor, Veress, Márton, Advisory Editor, Bolgov, Radomir, editor, Mukhamediev, Ravil, editor, Pereira, Roberto, editor, and Mityagin, Sergey, editor
Published: 2024

2. An Unmanned System for Automatic Classification of Hazardous Wastes in Norway

Author: Gröling, Marc, Huang, Laurent, Hameed, Ibrahim A., Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
Published: 2024

3. Review of Deep Learning-Based Entity Alignment Methods

Author: Lu, Dan, Han, Guoyu, Zhao, Yingnan, Han, Qilong, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Jin, Hai, editor, Yu, Zhiwen, editor, Yu, Chen, editor, Zhou, Xiaokang, editor, Lu, Zeguang, editor, and Song, Xianhua, editor
Published: 2024

4. Application of K-Means Clustering Algorithm in Automatic Machine Learning

Author: Ji, Dongri, Zhang, Ming, Luo, Xin, Angrisani, Leopoldo, Series Editor, Arteaga, Marco, Series Editor, Chakraborty, Samarjit, Series Editor, Chen, Jiming, Series Editor, Chen, Shanben, Series Editor, Chen, Tan Kay, Series Editor, Dillmann, Rüdiger, Series Editor, Duan, Haibin, Series Editor, Ferrari, Gianluigi, Series Editor, Ferre, Manuel, Series Editor, Jabbari, Faryar, Series Editor, Jia, Limin, Series Editor, Kacprzyk, Janusz, Series Editor, Khamis, Alaa, Series Editor, Kroeger, Torsten, Series Editor, Li, Yong, Series Editor, Liang, Qilian, Series Editor, Martín, Ferran, Series Editor, Ming, Tan Cher, Series Editor, Minker, Wolfgang, Series Editor, Misra, Pradeep, Series Editor, Mukhopadhyay, Subhas, Series Editor, Ning, Cun-Zheng, Series Editor, Nishida, Toyoaki, Series Editor, Oneto, Luca, Series Editor, Panigrahi, Bijaya Ketan, Series Editor, Pascucci, Federica, Series Editor, Qin, Yong, Series Editor, Seng, Gan Woon, Series Editor, Speidel, Joachim, Series Editor, Veiga, Germano, Series Editor, Wu, Haitao, Series Editor, Zamboni, Walter, Series Editor, Zhang, Junjie James, Series Editor, Tan, Kay Chen, Series Editor, Hung, Jason C., editor, Yen, Neil, editor, and Chang, Jia-Wei, editor
Published: 2024

5. Hateful Messages: A Conversational Data Set of Hate Speech Produced by Adolescents on Discord

Author: Fillies, Jan, Peikert, Silvio, Paschke, Adrian, Haber, Peter, editor, Lampoltshammer, Thomas J., editor, and Mayr, Manfred, editor
Published: 2024

6. Irregular Step of Changing for Neural Network Data Sets Improves the Accuracy of Resistive Sensors Calculation

Author: Penin, Alexandr, Sidorenko, Anatolie, Magjarević, Ratko, Series Editor, Ładyżyński, Piotr, Associate Editor, Ibrahim, Fatimah, Associate Editor, Lackovic, Igor, Associate Editor, Rock, Emilio Sacristan, Associate Editor, Sontea, Victor, editor, Tiginyanu, Ion, editor, and Railean, Serghei, editor
Published: 2024

7. A dedicated structured data set for reporting of invasive carcinoma of the breast in the setting of neoadjuvant therapy: recommendations from the International Collaboration on Cancer Reporting (ICCR).

Author: Bossuyt, Veerle, Provenzano, Elena, Symmans, W Fraser, Webster, Fleur, Allison, Kimberly H, Dang, Chau, Gobbi, Helenice, Kulka, Janina, Lakhani, Sunil R, Moriya, Takuya, Quinn, Cecily M, Sapino, Anna, Schnitt, Stuart, Sibbering, D Mark, Slodkowska, Elzbieta, Yang, Wentao, Tan, Puay Hoon, and Ellis, Ian
Subjects: *BREAST, *NEOADJUVANT chemotherapy, *LOBULAR carcinoma, *CARCINOMA in situ, *CARCINOMA, *DUCTAL carcinoma
Abstract: Aims: The International Collaboration on Cancer Reporting (ICCR), a global alliance of major (inter‐)national pathology and cancer organisations, is an initiative aimed at providing a unified international approach to reporting cancer. ICCR recently published new data sets for the reporting of invasive breast carcinoma, surgically removed lymph nodes for breast tumours and ductal carcinoma in situ, variants of lobular carcinoma in situ and low‐grade lesions. The data set in this paper addresses the neoadjuvant setting. The aim is to promote high‐quality, standardised reporting of tumour response and residual disease after neoadjuvant treatment that can be used for subsequent management decisions for each patient. Methods: The ICCR convened expert panels of breast pathologists with a representative surgeon and oncologist to critically review and discuss current evidence. Feedback from the international public consultation was critical in the development of this data set. Results: The expert panel concluded that a dedicated data set was required for reporting of breast specimens post‐neoadjuvant therapy with inclusion of data elements specific to the neoadjuvant setting as core or non‐core elements. This data set proposes a practical approach for handling and reporting breast resection specimens following neoadjuvant therapy. The comments for each data element clarify terminology, discuss available evidence and highlight areas with limited evidence that need further study. This data set overlaps with, and should be used in conjunction with, the data sets for the reporting of invasive breast carcinoma and surgically removed lymph nodes from patients with breast tumours, as appropriate. Key issues specific to the neoadjuvant setting are included in this paper. The entire data set is freely available on the ICCR website. 
Conclusions: High‐quality, standardised reporting of tumour response and residual disease after neoadjuvant treatment are critical for subsequent management decisions for each patient. [ABSTRACT FROM AUTHOR]
Published: 2024

8. Kernel rootkit detection multi class on deep learning techniques.

Author: Srinivasan, Suresh Kumar and Thalavaipillai, Sudalai Muthu
Subjects: DEEP learning, MACHINE learning, ARTIFICIAL intelligence, CLOUD computing, COMPUTER systems
Abstract: The harmful code application known as a rootkit is designed to be loaded and run directly from the operating system's (OSs') Kernel. Rootkits deployed in the Kernel, called Kernel-mode rootkits, can alter the OS. The intention behind these Kernel changes is to conceal the hack. Detecting a Kernel rootkit in a target machine is found to be quite challenging. Numerous techniques can be employed to modify the Kernel of a system. Kernel rootkits also create hidden access for attacks, enabling unauthorized entry to be gained by attackers on the machine. The ultimate consequence is that essential computer data can be modified, personal information can be gathered, and hackers can observe behavior. Synthetic neural networks support artificial intelligence, a branch of deep learning that models the human brain and operates on large datasets. This study proposed the Kernel rootkit detection multi-class deep learning techniques (KRDMCDLT). Deep learning algorithms are utilized to recognize the Kernel rootkit from a batch of data by selecting essential properties for learning tracking models. Thus, by identifying the OS malware, trojan assaults can be stopped before they can access infected data. This Kernel rootkit detection was tested in a Google Cloud Platform (GCP) computing system. [ABSTRACT FROM AUTHOR]
Published: 2024

9. Potsdam data set of eye movement on natural scenes (DAEMONS).

Author: Schwetlick, Lisa, Kümmerer, Matthias, Bethge, Matthias, and Engbert, Ralf
Subjects: EYE movements, GAZE, ARTIFICIAL neural networks, DATA modeling, COGNITIVE science, COMPUTER vision, HUMAN mechanics
Abstract: The article "Potsdam data set of eye movement on natural scenes (DAEMONS)" emphasizes the significance of high-quality, openly accessible data sets for studying eye movement behavior. The authors have compiled and released a substantial data set of eye tracking data on 2,400 color photographs of natural scenes. This data set, called DAEMONS, includes annotations and serves as a benchmark for scan path modeling and spatial saliency prediction. The article provides comprehensive details about the stimulus material, image annotations, participants, and experimental setup. The study was conducted ethically, and the authors express gratitude to the individuals and organizations involved in the research. The dataset, along with the stimulus images and eye tracking experiment data, is available in online repositories. [Extracted from the article]
Published: 2024

10. A Comprehensive Northern Hemisphere Particle Microphysics Data Set From the Precipitation Imaging Package.

Author: King, Fraser, Pettersen, Claire, Bliven, Larry F., Cerrai, Diego, Chibisov, Alexey, Cooper, Steven J., L'Ecuyer, Tristan, Kulie, Mark S., Leskinen, Matti, Mateling, Marian, McMurdie, Lynn, Moisseev, Dimitri, Nesbitt, Stephen W., Petersen, Walter A., Rodriguez, Peter, Schirtzinger, Carl, Stuefer, Martin, von Lerber, Annakaisa, Wingo, Matthew T., and Wolff, David B.
Subjects: *MICROPHYSICS, *NUMERICAL weather forecasting, *WEATHER forecasting, *QUALITY control, *PRECIPITATION (Chemistry), *RAIN gauges, *PARTICLE size determination
Abstract: Microphysical observations of precipitating particles are critical data sources for numerical weather prediction models and remote sensing retrieval algorithms. However, obtaining coherent data sets of particle microphysics is challenging as they are often unindexed, distributed across disparate institutions, and have not undergone a uniform quality control process. This work introduces a unified, comprehensive Northern Hemisphere particle microphysical data set from the National Aeronautics and Space Administration precipitation imaging package (PIP), accessible in a standardized data format and stored in a centralized, public repository. Data is collected from 10 measurement sites spanning 34° latitude (37°N–71°N) over 10 years (2014–2023), which comprise a set of 1,070,000 precipitating minutes. The provided data set includes measurements of a suite of microphysical attributes for both rain and snow, including distributions of particle size, vertical velocity, and effective density, along with higher‐order products including an approximation of volume‐weighted equivalent particle densities, liquid equivalent snowfall, and rainfall rate estimates. The data underwent a rigorous standardization and quality assurance process to filter out erroneous observations to produce a self‐describing, scalable, and achievable data set. Case study analyses demonstrate the capabilities of the data set in identifying physical processes like precipitation phase‐changes at high temporal resolution. Bulk precipitation characteristics from a multi‐site intercomparison also highlight distinct microphysical properties unique to each location. This curated PIP data set is a robust database of high‐quality particle microphysical observations for constraining future precipitation retrieval algorithms, and offers new insights toward better understanding regional and seasonal differences in bulk precipitation characteristics. 
Plain Language Summary: This work introduces a new particle microphysics data set that is useful for improving weather prediction models and in enhancing precipitation estimation techniques. The data set, produced from National Aeronautics and Space Administration's precipitation imaging package, is comprehensive, well documented, and easy to access. It includes observations from 10 locations across the Northern Hemisphere over 10 years, providing information on both rain and snow. This information includes details like particle size, speed, and density, as well as estimates of rainfall and snowfall rates. The data has been standardized and checked for quality, making it reliable and easy to use. This product is a valuable resource for refining methods to measure precipitation, and offers new insights into regional and seasonal precipitation patterns. Key Points: This data set contains high temporal resolution, disdrometer‐derived precipitation microphysics observations from 10 sites over 10 years. Rigorous quality control practices yield a scalable, self‐describing data set packaged into a common, standardized NetCDF format. The data's diverse geographic and environmental coverage offers new insights into regional and seasonal precipitation processes and patterns. [ABSTRACT FROM AUTHOR]
Published: 2024

11. The effect of the re-segmentation method on improving the performance of rectal cancer image segmentation models.

Author: Lei, Jie, Huang, YiJun, Chen, YangLin, Xia, Linglin, and Yi, Bo
Subjects: *IMAGE segmentation, *RECTAL cancer, *DEEP learning, *CANCER hospitals, *COMPUTED tomography, RECTUM tumors
Abstract: BACKGROUND: Rapid and accurate segmentation of tumor regions from rectal cancer images helps clinicians better understand the patient's lesions and surrounding tissues, providing more effective auxiliary diagnostic information. However, segmenting rectal tumors with deep learning still cannot match manual segmentation, and a major obstacle is the lack of high-quality data sets. OBJECTIVE: We propose a Re-segmentation Method in which the area segmented by the model is manually corrected and then fed back into training. The data set has been made publicly available. METHODS: A total of 354 rectal cancer CT images and 308 rectal region images labeled by experts from Jiangxi Cancer Hospital were included in the data set. Six network architectures are trained on the data set; the regions predicted by each model are manually revised and then fed back into training to improve segmentation ability, after which performance is measured. RESULTS: In this study, we use the Re-segmentation Method for various popular network architectures. CONCLUSION: By comparing the evaluation indicators before and after using the Re-segmentation Method, we prove that our proposed Re-segmentation Method can further improve the performance of the rectal cancer image segmentation model. [ABSTRACT FROM AUTHOR]
Published: 2024

12. A Python-Based Application for Digitizing FEMA P-154 Forms [FEMA P-154 Formlarının Dijitalleştirilmesi İçin Bir Python Tabanlı Uygulama]

Author: Asena Soyluk and Nurdan Talaslıoğlu
Subjects: python, rapid visual screening, earthquake safety assessment, data set, Architecture, NA1-9428, Architectural drawing and design, NA2695-2793
Abstract: Türkiye, which lies in an earthquake zone and faces the threat of major earthquakes, suffered serious losses in the 6 February earthquakes. Many settlements contain existing and new buildings that do not comply with earthquake codes. To prevent loss of life and property in a possible earthquake, the earthquake resistance of the buildings in the building stock therefore needs to be investigated safely and quickly. Detailed structural investigations are required to realistically determine the level of risk that the building stock would pose under a possible earthquake. The aim is to apply the "FEMA P-154" rapid assessment method, one of the frequently used evaluation methods because it is fast and reliable, through a Python-based application and to turn the results into a data set. The application offers advantages in time, cost, labor, and storage when applying the FEMA P-154 form. Data sets that can be prepared quickly with this method can later be stored for use in artificial intelligence projects and can serve as a reference for transforming buildings with inadequate earthquake resistance in order of priority.
Published: 2024

13. Multi-Environment Adaptive Fast Constant False Alarm Detection Algorithm Optimization Strategy

Author: Wei Li, Qian Wang, Yuan-shuai Lan, and Chang-song Ma
Subjects: data set, ESVI-CFAR, false alarm probability, noise, VI-CFAR, Engineering (General). Civil engineering (General), TA1-2040
Abstract: It takes a long time to detect target information in noisy radar information and reduce the probability of false alarm. Therefore, it has become a research direction to reduce the probability of false alarm and the time of effective target detection. This paper introduces a new method to reduce the occurrence of false alarm in non-uniform environment and improve the efficiency of target detection. The proposed method involves a faster and more stable method that involves preprocessing the data set, splitting it into smaller parts, and utilizing a k-th minimum value M determined by an ordered statistics class constant false alarm detection algorithm. Each data point in the small segment is then compared to M, anything above M is classified as a target, and anything below M is ignored as clutter. Then ESVI-CFAR detection was performed on the selected target to obtain the final detection result. Experimental results show that the proposed method is superior to the traditional VI-CFAR and has better target detection performance.
Published: 2024
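
The pre-screening step described in the abstract of result 13 (split the data into segments, take the k-th smallest value M in each segment, keep only points above M) might be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function name `prescreen_candidates`, the parameters `segment_len` and `k`, and the sample values are all assumptions, and the subsequent ESVI-CFAR stage is not shown.

```python
def prescreen_candidates(samples, segment_len, k):
    """Return indices of samples that exceed their segment's k-th smallest value.

    Hypothetical helper: the abstract does not specify segment length or k.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 1 <= k <= segment_len:
        raise ValueError("k must be between 1 and segment_len")

    candidates = []
    for start in range(0, len(samples), segment_len):
        segment = samples[start:start + segment_len]
        # A trailing segment may be shorter than segment_len, so clamp k.
        k_eff = min(k, len(segment))
        # Order statistic: M is the k-th smallest value in this segment.
        threshold_m = sorted(segment)[k_eff - 1]
        for offset, value in enumerate(segment):
            # Values above M are kept as candidate targets; the rest is clutter.
            if value > threshold_m:
                candidates.append(start + offset)
    return candidates


# Made-up amplitudes: two segments of four samples each.
amplitudes = [1.0, 1.2, 0.9, 9.0, 1.1, 1.0, 0.8, 1.3]
print(prescreen_candidates(amplitudes, segment_len=4, k=3))
```

With `k` smaller than the segment length, every segment with distinct values yields at least one candidate even when it contains only noise, which is consistent with the abstract running a second CFAR stage on the selected points before declaring a detection.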

14. Mesothelioma in the pleura, pericardium and peritoneum: Recommendations from the International Collaboration on Cancer Reporting (ICCR).

Author: Klebe, Sonja, Judge, Meagan, Brcic, Luka, Dacic, Sanja, Galateau‐Salle, Francoise, Nicholson, Andrew G, Roggli, Victor, Nowak, Anna K, and Cooper, Wendy A
Subjects: *MESOTHELIOMA, *PLEURA, *PERITONEUM, *PERICARDIUM, *PROGNOSIS, *CLINICAL trials
Abstract: Aims: Mesothelioma is a rare malignancy of the serosal membranes that is commonly related to exposure to asbestos. Despite extensive research and clinical trials, prognosis to date remains poor. Consistent, comprehensive and reproducible pathology reporting form the basis of all future interventions for an individual patient, but also ensures that meaningful data are collected to identify predictive and prognostic markers. Methods and results: This article details the International Collaboration on Cancer Reporting (ICCR) process and the development of the international consensus mesothelioma reporting data set. It describes the 'core' and 'non‐core' elements to be included in pathology reports for mesothelioma of all sites, inclusive of clinical, macroscopic, microscopic and ancillary testing considerations. An international expert panel consisting of pathologists and a medical oncologist produced a set of data items for biopsy and resection specimens based on a critical review and discussion of current evidence, and in light of the changes in the 2021 WHO Classification of Tumours. The commentary focuses particularly upon new entities such as mesothelioma in situ and provides background on relevant and essential ancillary testing as well as implementation of the new requirement for tumour grading. Conclusion: We recommend widespread and consistent implementation of this data set, which will facilitate accurate reporting and enhance the consistency of data collection, improve the comparison of epidemiological data, support retrospective research and ultimately help to improve clinical outcomes. To this end, all data sets are freely available worldwide on the ICCR website (www.iccr‐cancer.org/data‐sets). [ABSTRACT FROM AUTHOR]
Published: 2024

15. Analysis of data with interval uncertainty: application of a combined sample consistency measure.

Author: Bazhenov, A. N., Zhilin, S. I., and Telnova, A. Yu.
Subjects: *INTERVAL analysis, *DATA analysis, *FUZZY sets, *PARAMETER estimation, *ELECTRONIC data processing
Abstract: The article considers one of the main problems of data analysis, i.e., estimation of parameters characterizing the data sample of a constant quantity. Data analysis is required in all areas of experimental physics to obtain reliable measurement results. To describe sample units under bilateral constraints on their error, the apparatus of interval analysis and statistics is used. In particular, data homogeneity in the sample is described using various consistency measures. A set of three consistency measures is presented that describe different relationships between sample units. On the basis of the considered set, a combined sample consistency measure is proposed that can simultaneously provide outer and inner estimates of the quantity under study. The specified estimates are important in solving a massive data processing problem (i.e., processing of a set of samples obtained under different measurement conditions). The article provides necessary information on interval analysis and various interval arithmetics. The relationships between the proposed combined measure and the results of computations with interval twins and fuzzy sets are considered. This combined measure can be used in solving the massive data processing problem typically addressed in theoretical and applied semiconductor physics. A practical example is presented of using the combined sample consistency measure in the testing of solar transducers against a reference transducer as part of the study of their spectral properties and quantum yield. [ABSTRACT FROM AUTHOR]
Published: 2024

16. Emotional arousal pattern (EMAP): A new database for modeling momentary subjective and psychophysiological responding to affective stimuli.

Author: Eisenbarth, Hedwig, Oxner, Matt, Shehu, Harisu Abdullahi, Gastrell, Tim, Walsh, Amy, Browne, Will N., and Xue, Bing
Subjects: *DATABASES, *AFFECT (Psychology), *FILM excerpts, *AFFECTIVE computing, *MACHINE learning, *STIMULUS & response (Psychology), *EMOTIONAL experience
Abstract: This article describes a new database (named "EMAP") of 145 individuals' reactions to emotion‐provoking film clips. It includes electroencephalographic and peripheral physiological data as well as moment‐by‐moment ratings for emotional arousal in addition to overall and categorical ratings. The resulting variation in continuous ratings reflects inter‐individual variability in emotional responding. To make use of the moment‐by‐moment data for ratings as well as neurophysiological activity, we used a machine learning approach. The results show that algorithms that are based on temporal information improve predictions compared to algorithms without a temporal component, both within and across participant modeling. Although predicting moment‐by‐moment changes in emotional experiences by analyzing neurophysiological activity was more difficult than using aggregated experience ratings, selecting a subset of predictors improved the prediction. This also showed that not only single features, for example, skin conductance, but a range of neurophysiological parameters explain variation in subjective fluctuations of subjective experience. To increase our understanding of emotion, realistic, constantly changing reports of experience are needed to investigate relationships between subjective ratings and physiological reactions. A new data set to explore moment‐by‐moment personal experiences of emotion in conjunction with neurophysiological responses is presented. Initial modeling results show that although it is difficult to predict moment‐by‐moment ratings, predictions are improved by algorithms that take temporal information into account. [ABSTRACT FROM AUTHOR]
Published: 2024

17. Optimization of classification algorithm based on gene expression programming.

Author: Yang, Lei, Li, Kangshun, Zhang, Wensheng, Zheng, Liefeng, Ke, Zhenxu, and Qi, Yu
Abstract: Data classification is an important task in the field of data mining, which can be used to mine models of important data and forecast future trends in those data. Although some theoretical and technical breakthroughs have been made in data classification, problems remain, such as the limited accuracy of classification modeling algorithms and the poor comprehensibility of classification rules. Improving classification accuracy has become a hot research topic. Gene expression programming (GEP) has been considered a powerful evolutionary method for data classification. To address the shortcomings of the basic GEP classification algorithm, this paper proposes a novel GEP-based classification algorithm named O_GEPCA. The method improves the initialization and mutation operator adjustment method, calibration set, evolution function, and correction strategy, thereby optimizing the basic GEP classification algorithm. In a comparative study, the proposed O_GEPCA shows significant improvement over the primitive GEP. The efficiency and capability of the proposed O_GEPCA for data classification are tested on four well-studied benchmark cases: the card, cancer, heart, and glass classification problems. [ABSTRACT FROM AUTHOR]
Published: 2024

18. 2021 Amazon Last Mile Routing Research Challenge: Data Set.

Author: Merchán, Daniel, Arora, Jatin, Pachon, Julian, Konduri, Karthik, Winkenbach, Matthias, Parks, Steven, and Noszek, Joseph
Subjects: *LOCATION data, *PERSONALLY identifiable information, *MACHINE learning, *METROPOLITAN areas, *DELIVERY of goods, *MULTICASTING (Computer networks)
Abstract: The 2021 Amazon Last Mile Routing Research Challenge, hosted by Amazon.com's Last Mile Research team, and scientifically supported by the Massachusetts Institute of Technology's Center for Transportation and Logistics, prompted participants to leverage real operational data to find new and better ways to solve a real-world routing problem. In this article, we describe the data set released for the research challenge, which includes route-, stop-, and package-level features for 9,184 historical routes performed by Amazon drivers in 2018 in five metropolitan areas in the United States. This real-world data set excludes any personally identifiable information: all route and package identifiers have been randomly regenerated and related location data have been obfuscated to ensure anonymity. Although multiple synthetic benchmark data sets are available in the literature, the data set of the 2021 Amazon Last Mile Routing Research Challenge is the first large and publicly available data set to include instances based on real-world operational routing data. History: This paper has been accepted for the Transportation Science Special Issue on Machine Learning Methods and Applications in Large-Scale Route Planning Problems. [ABSTRACT FROM AUTHOR]
Published: 2024

19. FAKE NEWS DETECTION SYSTEM USING LOGISTIC REGRESSION, DECISION TREE AND RANDOM FOREST.

Author: Ayankemi, Oni Oluwabunmi, Ruth, Idowu Oluwaferanmi, and Abiye, Bassir Abdullai
Subjects: COMPUTER science, INFORMATION technology, COMPUTER programming, COMPUTER network management, COMPUTER network monitoring
Abstract: This article discusses a study on designing a fake news detection system using machine learning models. The study found that the Decision Tree model was the most efficient in accurately detecting fake news, with an accuracy of 99.64%. The article emphasizes the importance of detecting fake news and provides a detailed methodology for the study. The research aims to raise awareness about fake news and contribute to its eradication in society. [Extracted from the article]
Published: 2024

20. A Comprehensive Biogeochemical Assessment of Climate‐Threatened Glacial River Headwaters on the Eastern Slopes of the Canadian Rocky Mountains.

Author: Serbu, J. A., St. Louis, V. L., Emmerton, C. A., Tank, S. E., Criscitiello, A. S., Silins, U., Bhatia, M. P., Cavaco, M. A., Christenson, C., Cooke, C. A., Drapeau, H. F., Enns, S. J. A., Flett, J. E., Holland, K. M., Lavallee‐Whiffen, J., Ma, M., Muir, C. E., Poesch, M., and Shin, J.
Subjects: MELTWATER, WATER quality, LIFE zones, ALPINE glaciers, CHEMICAL yield, PRINCIPAL components analysis, GLACIERS, RAINFALL
Abstract: Climate change is driving the loss of alpine glaciers globally, yet investigations about the water quality of rivers stemming from them are few. Here we provide an overview assessment of a biogeochemical data set containing 200+ parameters that we collected between 2019 and 2021 from the headwaters of three such rivers (Sunwapta‐Athabasca, North Saskatchewan, and Bow) which originate from the glacierized eastern slopes of the Canadian Rocky Mountains. We used regional hydrometric data sets to accurately model discharge at our 14 sampling sites. We created a Local Meteoric Water Line (LMWL) using riverine water isotope signatures and compared it to collected regional rain, snow, and glacial ice signatures. Principal component analyses of river physicochemical measures revealed distance from glacier explained more data variability than other spatiotemporal factors (i.e., season, year, or river). Discharge, chemical concentrations, and watershed areas were then used to model site‐specific open water season yields for 25 parameters. Chemical yields followed what would generally be expected along river continuums from glacierized to montane altitudinal life zones, with landscape characteristics driving chemical sources and sinks. For instance, particulate chemical yields were generally highest near source glaciers with proglacial lakes acting as settling ponds, whereas most dissolved yields varied by parameter and site. As these headwaters continue to evolve with glacier mass loss, the data set and analyses presented here can be used as a contemporary baseline to mark future change against. Further, following this initial assessment of our data set, we encourage others to mine it for additional biogeochemical studies. Plain Language Summary: Alpine glaciers are vulnerable to climate change, with their numbers and sizes expected to decline dramatically within the current century. 
It is known that a decrease in glacier mass will have direct consequences on glacial meltwater quantity, but how this may impact the quality of receiving freshwaters is less understood. Our goal was to record a water quality data set of 200+ physical and chemical parameters for the headwaters of three major rivers draining glaciers on the eastern slopes of the Canadian Rocky Mountains. We first used statistics to model how fast the rivers were flowing at our 14 sampling locations over our 2‐year sampling period. We then looked at how and why water isotopes collected from river, rain, snow, and glacier ice samples differed. We discovered that distance downstream from the glacier explained more statistical variation in our physical and chemical measures than season, year, or river. Finally, at each of our sampling sites, we calculated chemical yields, or the amount of chemical constituent being exported downriver per unit watershed area, and found landscape features like lakes, outwash plains, and forests influence them. We ultimately hope that others use our comprehensive data set to investigate this changing region further. Key Points: We assessed a 200+ parameter data set collected from glacial river headwaters on the eastern slopes of the Canadian Rocky Mountains. Our findings can act as a contemporary reference for future investigation of glacial headwaters in rapidly evolving alpine regions. The possibilities for data exploration of our 260,000+ measure data set are numerous and we encourage others to mine it for new studies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. A Comprehensive Northern Hemisphere Particle Microphysics Data Set From the Precipitation Imaging Package

Author: Fraser King, Claire Pettersen, Larry F. Bliven, Diego Cerrai, Alexey Chibisov, Steven J. Cooper, Tristan L’Ecuyer, Mark S. Kulie, Matti Leskinen, Marian Mateling, Lynn McMurdie, Dimitri Moisseev, Stephen W. Nesbitt, Walter A. Petersen, Peter Rodriguez, Carl Schirtzinger, Martin Stuefer, Annakaisa vonLerber, Matthew T. Wingo, David B. Wolff, Telyana Wong, and Norman Wood
Subjects: precipitation, microphysics, disdrometer, particle size distribution, data set, precipitation imaging package, Astronomy, QB1-991, Geology, QE1-996.5
Abstract: Microphysical observations of precipitating particles are critical data sources for numerical weather prediction models and remote sensing retrieval algorithms. However, obtaining coherent data sets of particle microphysics is challenging as they are often unindexed, distributed across disparate institutions, and have not undergone a uniform quality control process. This work introduces a unified, comprehensive Northern Hemisphere particle microphysical data set from the National Aeronautics and Space Administration precipitation imaging package (PIP), accessible in a standardized data format and stored in a centralized, public repository. Data is collected from 10 measurement sites spanning 34° latitude (37°N–71°N) over 10 years (2014–2023), which comprise a set of 1,070,000 precipitating minutes. The provided data set includes measurements of a suite of microphysical attributes for both rain and snow, including distributions of particle size, vertical velocity, and effective density, along with higher‐order products including an approximation of volume‐weighted equivalent particle densities, liquid equivalent snowfall, and rainfall rate estimates. The data underwent a rigorous standardization and quality assurance process to filter out erroneous observations to produce a self‐describing, scalable, and achievable data set. Case study analyses demonstrate the capabilities of the data set in identifying physical processes like precipitation phase‐changes at high temporal resolution. Bulk precipitation characteristics from a multi‐site intercomparison also highlight distinct microphysical properties unique to each location. This curated PIP data set is a robust database of high‐quality particle microphysical observations for constraining future precipitation retrieval algorithms, and offers new insights toward better understanding regional and seasonal differences in bulk precipitation characteristics.
Published: 2024
Full Text: View/download PDF

22. Fixed-text keystroke dynamics authentication data set—collection and analysis

Author: Risto, Halvor Nybø and Graven, Olaf Hallan
Published: 2024
Full Text: View/download PDF

23. Improving the Quality of the Identification of the Information Security State Based on Sample Segmentation.

Author: Sukhoparov, M. E. and Lebedev, I. S.
Abstract: Increasing the quality indicators for identifying the information security (IS) state of individual segments of cyber-physical systems is related to processing large information arrays. A method for improving quality indicators when solving problems of identifying the IS state is proposed. Its implementation is based on the formation of individual sample segments. Analysis of the properties of these segments makes it possible to select and assign algorithms that have the best quality indicators in the current segment. Segmentation of a data sample is considered. Using real dataset data as an example, experimental values of the quality indicator for the proposed method are given for various classifiers on individual segments and the entire sample. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

24. TranSpec3D: A Novel Measurement Principle to Generate A Non-Synthetic Data Set of Transparent and Specular Surfaces without Object Preparation.

Author: Junger, Christina, Speck, Henri, Landmann, Martin, Srokos, Kevin, and Notni, Gunther
Subjects: *POWDER coating, *MONOCULARS, *ACQUISITION of data, *BINOCULAR vision, *MEASUREMENT
Abstract: Estimating depth from images is a common technique in 3D perception. However, dealing with non-Lambertian materials, e.g., transparent or specular ones, is still an open challenge. To overcome this challenge with deep stereo matching networks or monocular depth estimation, data sets with non-Lambertian objects are mandatory. Currently, only a few real-world data sets are available. This is due to the high effort and time-consuming process of generating these data sets with ground truth. Currently, transparent objects must be prepared, e.g., painted or powdered, or an opaque twin of the non-Lambertian object is needed. This makes data acquisition very time consuming and elaborate. We present a new measurement principle for how to generate a real data set of transparent and specular surfaces without object preparation techniques, which greatly reduces the effort and time required for data collection. For this purpose, we use a thermal 3D sensor as a reference system, which allows the 3D detection of transparent and reflective surfaces without object preparation. In addition, we publish the first-ever real stereo data set, called TranSpec3D, where ground truth disparities without object preparation were generated using this measurement principle. The data set contains 110 objects and consists of 148 scenes, each taken in different lighting environments, which increases the size of the data set and creates different reflections on the surface. We also show the advantages and disadvantages of our measurement principle and data set compared to the Booster data set (generated with object preparation), as well as the current limitations of our novel method. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

25. Chinese License Plate Detection and Recognition in Unconstrained Scenes Based on YOLO.

Author: 陈子昂, 刘娜, 袁野, 李清都, and 万里红
Subjects: *AUTOMOBILE license plates, *PATTERN recognition systems, *RECURRENT neural networks, *OBJECT recognition (Computer vision), *POINT set theory, *TEXT recognition, *OPTICAL character recognition
Abstract: In view of the problems of traditional Chinese license plate recognition methods, such as the requirement of scenes, poor real-time performance, and inability to deploy on edge devices, this study proposes a Chinese license plate detection and recognition method based on YOLO (You Only Look Once) in unconstrained scenes. This method is divided into two modules: license plate detection and license plate character recognition. In the license plate detection part, the improved YOLOv5 model is used to predict four groups of key points for license plate correction based on the prediction of target candidate regions, and the pre-training model trained on the COCO data set is used for training, which reduces the error detection problem caused by the complex environment and has high real-time performance. In the license plate character recognition part, the CRNN (Convolutional Recurrent Neural Network) model is improved, which greatly reduces the parameters and computation of the algorithm, so that it can be successfully deployed in various edge devices. Experimental results show that the proposed method can efficiently detect and recognize license plates in complex environments. The mAP value of the proposed license plate detection model is 3.0% higher than that of Retina-face in the license plate detection data set. Compared with LPR-Net, the accuracy of the license plate character recognition model in the license plate recognition data set is improved by 4.2%. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. The innovation path of virtual practice teaching in college Civics class based on the Ridge regression model

Author: Han Fengzhi and Cheng Chen
Subjects: ridge regression model, least squares estimation, random variables, data set, variance expansion factor, 62j05, Mathematics, QA1-939
Abstract: The application of virtual practice teaching to the teaching of college Civics and Political Science class helps to develop a new way of practical teaching in ideological and political theory classes. This paper constructs a virtual practice teaching platform through the Ridge regression model, first calculates the least squares estimation of the virtual practice teaching model and sets up the matrix, standardizes the matrix for the original data set, and gets the estimation cluster of Ridge regression. Then the random variables under virtual practice teaching are given by variance expansion factors, the functions of the practice teaching matrix are defined, and the teaching time modeling analysis is performed with Ridge regression to derive the operating parameters of the matrix. Finally, the innovation path of virtual practice teaching is derived based on the constructed platform parameters. The simulation results show that the head-up rate of students under the virtual practice teaching class for freshmen students reaches 97.54% and 95.14% for sophomores, which is 25.13% and 13.84% higher than that of traditional classrooms. Thus, it can be seen that the platform constructed in this paper is conducive to applying the virtual practice to the teaching mode of college Civics class, promoting the innovative path of college Civics class, and improving the communication and communication ability of students.
Published: 2024
Full Text: View/download PDF

27. The construction of innovation system of college physical education reform based on Bp neural network

Author: Zhu Liang
Subjects: ga-bp neural network, mean error, samples features, educational reform, data set, 97q70, Mathematics, QA1-939
Abstract: This paper examines the current problems of physical education in institutions of higher learning and constructs a reform and innovation mechanism for sports learning style in those institutions. It studies the teaching system of college sports learning style based on a BP neural network, proposes a genetic-algorithm-based optimization of the network to address the BP neural network's tendency to fall into local optima, takes 10 factors influencing college sports learning style as the sample features of the data set, trains the optimized GA-BP neural network, and compares the test results with the actual values and with the results of other algorithms. The average error of the GA-BP neural network was 1.23, and the average error rate was 1.85%, which was better than that of the other neural networks. In terms of the weights of the influencing factors, the diversity of curriculum, the advancement of teaching philosophy, and the science of sports skill instruction have significant effects on sports learning style in institutions of higher learning, with weights of 0.1212, 0.1387, and 0.1453, respectively. The reform and innovation of sports learning style in institutions of higher learning should develop diversified physical education courses according to students’ physical fitness and sports levels and impart scientific methods of fitness and the value of enjoying sports.
Published: 2024
Full Text: View/download PDF

28. Exploring the creation of films based on digital media technology

Author: Peng Xiao
Subjects: digital media, data mining algorithm, data set, rough set theory, information entropy, 28522, Mathematics, QA1-939
Abstract: Digital media is an inevitable product of the development of information technology, and the emergence of this art has brought new opportunities and space for the innovative development of the film industry. This paper constructs a film creation model based on digital media technology and analyzes its data set using data mining algorithms. Firstly, the digital media is set as a matrix to calculate the similarity, and a subset of attributes is calculated using cosine similarity and Pearson coefficient to extract the indistinguishable relationship of rough set theory. Then the information entropy calculation scheme is reset, and the specific process of the data mining method is given. Finally, the mining is carried out for six data subsets, from which the importance of the integration of digital media and film creation is known. The results show that the sound and picture fluency tests of the frames in the film creation using digital media technology reached 8 frequencies for all three kinds of audio during the sound test. The picture smoothness of each frame of the film creation work met the design standard of >16.67ms/frame. Therefore, the combination of digital media art and film creation in modern film creation can not only promote the development of the film industry but also enhance the aesthetic ability of the audience to a certain extent.
Published: 2024
Full Text: View/download PDF

29. A Deep Learning Approach to Segment High-Content Images of the E. coli Bacteria

Author: Duong, Dat Q., Tran, Tuan-Anh, Kieu, Phuong Nhi Nguyen, Nguyen, Tien K., Le, Bao, Baker, Stephen, Nguyen, Binh T., Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Blanc-Talon, Jaques, editor, Delmas, Patrice, editor, Philips, Wilfried, editor, and Scheunders, Paul, editor
Published: 2023
Full Text: View/download PDF

30. SML: Semantic Machine Learning Model Ontology

Author: Kallab, Lara, Mansour, Elio, Chbeir, Richard, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Zhang, Feng, editor, Wang, Hua, editor, Barhamgi, Mahmoud, editor, Chen, Lu, editor, and Zhou, Rui, editor
Published: 2023
Full Text: View/download PDF

31. Derivation of the Data Attributes for Identification of Incorrect Events in Supply Chain Event Management

Author: Janßen, Jokim, Schröer, Tobias, Schuh, Günther, Boos, Wolfgang, Rannenberg, Kai, Editor-in-Chief, Soares Barbosa, Luís, Editorial Board Member, Goedicke, Michael, Editorial Board Member, Tatnall, Arthur, Editorial Board Member, Neuhold, Erich J., Editorial Board Member, Stiller, Burkhard, Editorial Board Member, Stettner, Lukasz, Editorial Board Member, Pries-Heje, Jan, Editorial Board Member, Kreps, David, Editorial Board Member, Rettberg, Achim, Editorial Board Member, Furnell, Steven, Editorial Board Member, Mercier-Laurent, Eunika, Editorial Board Member, Winckler, Marco, Editorial Board Member, Malaka, Rainer, Editorial Board Member, Alfnes, Erlend, editor, Romsdal, Anita, editor, Strandhagen, Jan Ola, editor, von Cieminski, Gregor, editor, and Romero, David, editor
Published: 2023
Full Text: View/download PDF

32. Using Convolutional Neural Network to Enhance Coronary Heart Disease Predictions in South African Men Living in the Western Cape Region

Author: Tabane, Elias, Xhafa, Fatos, Series Editor, Woungang, Isaac, editor, and Dhurandher, Sanjay Kumar, editor
Published: 2023
Full Text: View/download PDF

33. Data Consumption Behaviour and Packet Delivery Delay Analysis in OTT Services Using Machine Learning Techniques

Author: Thakur, Rohit Kumar, Kumari, Raj, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Das, Swagatam, editor, Saha, Snehanshu, editor, Coello Coello, Carlos A., editor, and Bansal, Jagdish Chand, editor
Published: 2023
Full Text: View/download PDF

34. Research on Cloud Health Privacy Information Protection Algorithm Based on Data Mining

Author: Wang, Wennan, Song, Shiyang, Zhu, Linkai, Su, Junyu, Guo, Te, Tang, Jinhai, Akan, Ozgur, Editorial Board Member, Bellavista, Paolo, Editorial Board Member, Cao, Jiannong, Editorial Board Member, Coulson, Geoffrey, Editorial Board Member, Dressler, Falko, Editorial Board Member, Ferrari, Domenico, Editorial Board Member, Gerla, Mario, Editorial Board Member, Kobayashi, Hisashi, Editorial Board Member, Palazzo, Sergio, Editorial Board Member, Sahni, Sartaj, Editorial Board Member, Shen, Xuemin, Editorial Board Member, Stan, Mircea, Editorial Board Member, Jia, Xiaohua, Editorial Board Member, Zomaya, Albert Y., Editorial Board Member, and Wang, Shuihua, editor
Published: 2023
Full Text: View/download PDF

35. A Review of Datasets for NLIDBs

Author: Das, Alaka, Balabantaray, Rakesh, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Kaiser, M. Shamim, editor, Xie, Juanying, editor, and Rathore, Vijay Singh, editor
Published: 2023
Full Text: View/download PDF

36. Novel Image and Its Compressed Image Based on VVC Standard, Pair Data Set for Deep Learning Image and Video Compression Applications

Author: lal, Rohan, Sharma, Prashant, Patel, Devendra Kumar, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Gupta, Deep, editor, Bhurchandi, Kishor, editor, Murala, Subrahmanyam, editor, Raman, Balasubramanian, editor, and Kumar, Sanjeev, editor
Published: 2023
Full Text: View/download PDF

37. SAP Signavio Academic Models: A Large Process Model Dataset

Author: Sola, Diana, Warmuth, Christian, Schäfer, Bernhard, Badakhshan, Peyman, Rehse, Jana-Rebecca, Kampik, Timotheus, van der Aalst, Wil, Series Editor, Ram, Sudha, Series Editor, Rosemann, Michael, Series Editor, Szyperski, Clemens, Series Editor, Guizzardi, Giancarlo, Series Editor, Montali, Marco, editor, Senderovich, Arik, editor, and Weidlich, Matthias, editor
Published: 2023
Full Text: View/download PDF

38. The Optimization Study of Apparent Damage Recognition Algorithm of Bridge Underwater Structure

Author: Wang, Yeteng, Ding, Haoyang, Song, Changlin, Xiao, Yao, Yuan, Ruiyang, Liang, Zhishui, di Prisco, Marco, Series Editor, Chen, Sheng-Hong, Series Editor, Vayas, Ioannis, Series Editor, Kumar Shukla, Sanjay, Series Editor, Sharma, Anuj, Series Editor, Kumar, Nagesh, Series Editor, Wang, Chien Ming, Series Editor, Wu, Zhishen, editor, Nagayama, Tomonori, editor, Dang, Ji, editor, and Astroza, Rodrigo, editor
Published: 2023
Full Text: View/download PDF

39. The Key Issues and Evaluation Methods for Constructing Agricultural Pest and Disease Image Datasets: A Review

Author: GUAN Bolun, ZHANG Liping, ZHU Jingbo, LI Runmei, KONG Juanjuan, WANG Yan, and DONG Wei
Subjects: agricultural pests, data set, deep learning, monitoring and warning, data acquisition, data annotations, data set evaluation, Agriculture (General), S1-972, Technology (General), T1-995
Abstract: [Significance] The scientific dataset of agricultural pests and diseases is the foundation for monitoring and warning of agricultural pests and diseases. It is of great significance for the development of agricultural pest control, and is an important component of developing smart agriculture. The quality of the dataset affects the effectiveness of image recognition algorithms. With the discovery of the importance of deep learning technology in intelligent monitoring of agricultural pests and diseases, the construction of high-quality agricultural pest and disease datasets is gradually attracting attention from scholars in this field. In the task of image recognition, on one hand, the recognition effect depends on the improvement strategy of the algorithm, and on the other hand, it depends on the quality of the dataset. The same recognition algorithm learns different features in different quality datasets, so its recognition performance also varies. In order to propose a dataset evaluation index to measure the quality of agricultural pest and disease datasets, this article analyzes the existing datasets and takes the challenges faced in constructing agricultural pest and disease image datasets as the starting point to review the construction of agricultural pest and disease datasets. [Progress] Firstly, disease and pest datasets are divided into two categories: private datasets and public datasets. Private datasets have the characteristics of high annotation quality, high image quality, and a large number of inter class samples that are not publicly available. Public datasets have the characteristics of multiple types, low image quality, and poor annotation quality. Secondly, the problems faced in the construction process of datasets are summarized, including imbalanced categories at the dataset level, difficulty in feature extraction at the dataset sample level, and difficulty in measuring the dataset size at the usage level. 
These include imbalanced inter class and intra class samples, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and dataset availability. The main reasons for the problem are analyzed by two key aspects of image acquisition and annotation methods in dataset construction, and the improvement strategies and suggestions for the algorithm to address the above issues are summarized. The collection devices of the dataset can be divided into handheld devices, drone platforms, and fixed collection devices. The collection method of handheld devices is flexible and convenient, but it is inefficient and requires high photography skills. The drone platform acquisition method is suitable for data collection in contiguous areas, but the detailed features captured are not clear enough. The fixed device acquisition method has higher efficiency, but the shooting scene is often relatively fixed. The annotation of image data is divided into rectangular annotation and polygonal annotation. In image recognition and detection, rectangular annotation is generally used more frequently. It is difficult to label images that are difficult to separate the target and background. Improper annotation can lead to the introduction of more noise or incomplete algorithm feature extraction. In response to the problems in the above three aspects, the evaluation methods are summarized for data distribution consistency, dataset size, and image annotation quality at the end of the article. [Conclusions and Prospects] The future research and development suggestions for constructing high-quality agricultural pest and disease image datasets based are proposed on the actual needs of agricultural pest and disease image recognition: (1) Construct agricultural pest and disease datasets combined with practical usage scenarios. 
In order to enable the algorithm to extract richer target features, image data can be collected from multiple perspectives and environments to construct a dataset. According to actual needs, data categories can be scientifically and reasonably divided from the perspective of algorithm feature extraction, avoiding unreasonable inter class and intra class distances, and thus constructing a dataset that meets task requirements for classification and balanced feature distribution. (2) Balancing the relationship between datasets and algorithms. When improving algorithms, consider the more sufficient distribution of categories and features in the dataset, as well as the size of the dataset that matches the model, to improve algorithm accuracy, robustness, and practicality. It ensures that comparative experiments are conducted on algorithm improvement under the same evaluation standard dataset, and improved the pest and disease image recognition algorithm. Research the correlation between the scale of agricultural pest and disease image data and algorithm performance, study the relationship between data annotation methods and algorithms that are difficult to annotate pest and disease images, integrate recognition algorithms for fuzzy, dense, occluded targets, and propose evaluation indicators for agricultural pest and disease datasets. (3) Enhancing the use value of datasets. Datasets can not only be used for research on image recognition, but also for research on other business needs. The identification, collection, and annotation of target images is a challenging task in the construction process of pest and disease datasets. In the process of collecting image data, in addition to collecting images, attention can be paid to the collection of surrounding environmental information and host information. This method is used to construct a multimodal agricultural pest and disease dataset, fully leveraging the value of the dataset. 
In order to focus researchers on business innovation research, it is necessary to innovate the organizational form of data collection, develop a big data platform for agricultural diseases and pests, explore the correlation between multimodal data, improve the accessibility and convenience of data, and provide efficient services for application implementation and business innovation.
Published: 2023
Full Text: View/download PDF

40. Fühler‐im‐Netz: A smart grid and power line communication data set

Author: Christoph Balada, Sheraz Ahmed, Andreas Dengel, Max Bondorf, Nikolai Hopfer, and Markus Zdrallek
Subjects: data set, deep learning, IoT, machine learning, power line communication, smart grid, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The increasing complexity of low‐voltage networks poses a growing challenge for the reliable and fail‐safe operation of electricity grids. The reasons for this include an increasingly decentralised energy generation (photovoltaic systems, wind power etc.) and the emergence of new types of consumers (e‐mobility, domestic electricity storage etc.). At the same time, the low‐voltage grid is largely unmonitored and local power failures are sometimes hard to detect. To overcome this, power line communication (PLC) has emerged as a potential solution for reliable monitoring of the low‐voltage grid. In addition to establishing a communication infrastructure, PLC also offers the possibility of evaluating the cables themselves, as well as the connection quality between individual cable distributors based on their signal‐to‐noise ratio (SNR). The roll‐out of a large‐scale PLC infrastructure therefore not only ensures communication, but also introduces a tool for monitoring the entire network. To evaluate the potential of this data, we installed 38 PLC modems in three different areas of a German city with a population of about 150,000 as part of the Fühler‐im‐Netz (FiN) project. Over a period of 22 months, an SNR spectrum of each connection between adjacent PLC modems was generated every quarter of an hour. The availability of this real‐world PLC data opens up new possibilities to react to the increasingly complex challenges in future smart grids. This paper provides a detailed analysis of the data generation and describes how the data was collected during normal operation of the electricity grid. In addition, we present common anomalies, effects, and trends that could be observed in the PLC data at daily, weekly, or seasonal levels. Finally, we discuss potential use cases and the remote inspection of a cable section is highlighted as an example.
Published: 2023
Full Text: View/download PDF

41. Available Wireless Sensor Network and Internet of Things testbed facilities: dataset [version 2; peer review: 2 approved, 1 approved with reservations]

Author: Janis Judvaitis, Amr Elkenawy, Valters Abolins, and Kaspars Ozols
Subjects: Testbed facility, Data set, Wireless Sensor Networks, WSN, Internet of Things, IoT, eng, Science, Social Sciences
Abstract: The availability of data is an important aspect of any research as it determines the likelihood of the study’s commencement, completion, and success. The Internet of Things and Wireless Sensor Networks technologies have been attracting a huge amount of researchers for more than two decades, without having a consolidated or unified source that identifies and describes available Internet of Things and Wireless Sensor Network testbed facilities. In this paper, a dataset including 41 distinct testbed facilities is described. These testbed facilities are classified according to their key features such as Device Under Test (DUT) type, mobility, access level, facility count, connection/interaction interfaces, and other criteria. The systematic review process resulting in the gathered data set consisted of three filtering phases applied to relevant articles published between the years 2011 and 2021 as obtained from the Web of Science and SCOPUS databases.
Published: 2023
Full Text: View/download PDF

42. The Key Issues and Evaluation Methods for Constructing Agricultural Pest and Disease Image Datasets: A Review.

Author: 管博伦, 张立平, 朱静波, 李闰枚, 孔娟娟, 汪焱, and 董伟
Abstract: [Significance] The scientific dataset of agricultural pests and diseases is the foundation for monitoring and warning of agricultural pests and diseases. It is of great significance for the development of agricultural pest control, and is an important component of developing smart agriculture. The quality of the dataset affects the effectiveness of image recognition algorithms. With the discovery of the importance of deep learning technology in intelligent monitoring of agricultural pests and diseases, the construction of high-quality agricultural pest and disease datasets is gradually attracting attention from scholars in this field. In the task of image recognition, on one hand, the recognition effect depends on the improvement strategy of the algorithm, and on the other hand, it depends on the quality of the dataset. The same recognition algorithm learns different features in different quality datasets, so its recognition performance also varies. In order to propose a dataset evaluation index to measure the quality of agricultural pest and disease datasets, this article analyzes the existing datasets and takes the challenges faced in constructing agricultural pest and disease image datasets as the starting point to review the construction of agricultural pest and disease datasets. [Progress] Firstly, disease and pest datasets are divided into two categories: private datasets and public datasets. Private datasets have the characteristics of high annotation quality, high image quality, and a large number of inter class samples that are not publicly available. Public datasets have the characteristics of multiple types, low image quality, and poor annotation quality. Secondly, the problems faced in the construction process of datasets are summarized, including imbalanced categories at the dataset level, difficulty in feature extraction at the dataset sample level, and difficulty in measuring the dataset size at the usage level. 
These include imbalanced inter class and intra class samples, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and dataset availability. The main reasons for the problem are analyzed by two key aspects of image acquisition and annotation methods in dataset construction, and the improvement strategies and suggestions for the algorithm to address the above issues are summarized. The collection devices of the dataset can be divided into handheld devices, drone platforms, and fixed collection devices. The collection method of handheld devices is flexible and convenient, but it is inefficient and requires high photography skills. The drone platform acquisition method is suitable for data collection in contiguous areas, but the detailed features captured are not clear enough. The fixed device acquisition method has higher efficiency, but the shooting scene is often relatively fixed. The annotation of image data is divided into rectangular annotation and polygonal annotation. In image recognition and detection, rectangular annotation is generally used more frequently. It is difficult to label images that are difficult to separate the target and background. Improper annotation can lead to the introduction of more noise or incomplete algorithm feature extraction. In response to the problems in the above three aspects, the evaluation methods are summarized for data distribution consistency, dataset size, and image annotation quality at the end of the article. [Conclusions and Prospects] The future research and development suggestions for constructing high-quality agricultural pest and disease image datasets based are proposed on the actual needs of agricultural pest and disease image recognition: (1) Construct agricultural pest and disease datasets combined with practical usage scenarios. 
In order to enable the algorithm to extract richer target features, image data can be collected from multiple perspectives and environments to construct a dataset. According to actual needs, data categories can be scientifically and reasonably divided from the perspective of algorithm feature extraction, avoiding unreasonable inter-class and intra-class distances, and thus constructing a dataset that meets task requirements for classification and balanced feature distribution. (2) Balance the relationship between datasets and algorithms. When improving algorithms, consider the more sufficient distribution of categories and features in the dataset, as well as the size of the dataset that matches the model, to improve algorithm accuracy, robustness, and practicality. Ensure that comparative experiments on algorithm improvements are conducted on the same standard evaluation dataset, and improve the pest and disease image recognition algorithm. Research the correlation between the scale of agricultural pest and disease image data and algorithm performance, study the relationship between data annotation methods and algorithms for pest and disease images that are difficult to annotate, integrate recognition algorithms for fuzzy, dense, occluded targets, and propose evaluation indicators for agricultural pest and disease datasets. (3) Enhance the use value of datasets. Datasets can be used not only for research on image recognition, but also for research on other business needs. The identification, collection, and annotation of target images are challenging tasks in the construction process of pest and disease datasets. In the process of collecting image data, in addition to collecting images, attention can be paid to the collection of surrounding environmental information and host information. This approach can be used to construct a multimodal agricultural pest and disease dataset, fully leveraging the value of the dataset. 
In order to focus researchers on business innovation research, it is necessary to innovate the organizational form of data collection, develop a big data platform for agricultural diseases and pests, explore the correlation between multimodal data, improve the accessibility and convenience of data, and provide efficient services for application implementation and business innovation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

43. Profile of people attending emergency departments with thoughts of self‐harm and suicide: A descriptive study of a nurse‐led programme in Ireland.

Author: Kavalidou, Katerina, Zortea, Tiago C., Griffin, Eve, and Troya, M. Isabela
Subjects: *OCCUPATIONAL roles, *PSYCHIATRIC nursing, *HOSPITAL emergency services, *RESEARCH methodology, *AGE distribution, *SELF-injurious behavior, *MENTAL health, *NURSING services administration, *SUICIDAL ideation, *SEX distribution, *HOSPITAL nursing staff, *NURSES, *NURSING research, *DESCRIPTIVE statistics, *SOCIODEMOGRAPHIC factors, *DATA analysis software, *NURSING interventions, *LONGITUDINAL method
Abstract: Increasing research has been conducted on individuals presenting with self‐harm at emergency departments (EDs). However, less is known about individuals presenting to EDs with only self‐harm ideation. We aimed to describe the characteristics of those attending Irish hospitals with self‐harm ideation and investigate any differences in comparison to those presenting with suicide ideation. A prospective cohort study was conducted on Irish ED presentations due to suicidal and self‐harm ideation. Data were obtained from the service improvement data set of a dedicated nurse‐led National Clinical programme for the assessment of those presenting to Irish emergency departments due to Self‐harm and Suicide‐related Ideation (NCPSHI). A total of 10 602 anonymized presentation data were analysed from 1 January 2018 to 31 December 2019. Descriptive analysis was conducted to compare those with suicidal and self‐harm ideation on sociodemographic and care interventions. Being female and aged <29 were more prevalent among the self‐harm ideation presentations. Compared to the self‐harm ideation group, a higher proportion of those with suicidal thoughts received an emergency care plan (63% vs 58%, p = 0.002) and had a General Practitioner letter sent within 24 h of presentation (75% vs 69%, p = 0.045). Little variation was found between hospitals for self‐harm ideation in both years. Our study suggests that females and younger populations are more prevalent in hospital presentations due to self‐harm ideation, while presentations related to suicidal ideation are more often made by males and involve substance use. Attention should be given to the relationship between clinicians' attitudes towards care provision and the content of suicide‐related ideation ED disclosure. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

44. Select liquefaction case histories from the 2001 Nisqually, Washington, earthquake: A digital data set and assessment of model performance.

Author: Rasanen, Ryan A, Geyin, Mertcan, and Maurer, Brett W
Abstract: While soil liquefaction is common in earthquakes, the case-history data required to train and test state-of-practice prediction models remains comparatively scarce, owing to the breadth and expense of data that comprise a single case history. The 2001 Nisqually, Washington, earthquake, for example, occurred in a metropolitan region and induced damaging liquefaction in the urban cores of Seattle and Olympia, yet case-history data have not previously been published. Accordingly, this article compiles 24 cone-penetration-test (CPT) case histories from free-field locations. The many methods used to obtain and process the data are detailed herein, as is the structure of the digital data set. The case histories are then analyzed by 18 existing liquefaction response models to determine whether any is better, and to compare model performance in Nisqually against global observations. While differences are measured, both between models and against prior global case histories, these differences are often statistically insignificant considering finite-sample uncertainty. This alludes to the general inappropriateness of championing models based on individual earthquakes or otherwise small data sets, and to the ongoing needs for additional case-history data and more rigorous adherence to best practices in model training and testing. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

45. Integrating a Categorial Structure for Clinical Practice into EHRs.

Author: Hovenga, Evelyn J. S.
Abstract: A continuing global desire to be using clinical systems within a digital health ecosystem, able to facilitate data flows and information exchange as required to support person-centred, predictive, preventative, participatory and precision (5p) health and medical care can best be supported through the use of the standard categorial structure able to represent not only the clinical nursing practice domain but also other clinical disciplines by the generic labelling of some high-level categories. It is hypothesised that adoption of this generic clinical categorial structure within any electronic health/medical record within a well connected digital health ecosystem, supported by a cloud based openEHR platform, will enable the 5p support to be realized. This presentation provides the results of the latest update of this technical standard based on the 20+ year nursing practice categorial structure development process adopted to achieve this aim and a summary about linking this categorial structure to standard terminologies and to standard EHR/EMR system architectures. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

46. A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment

Author: Rosario Catelli, Luca Bevilacqua, Nicola Mariniello, Vladimiro Scotto Di Carlo, Massimo Magaldi, Hamido Fujita, Giuseppe De Pietro, and Massimo Esposito
Subjects: Italian cultural heritage, data set, fake reviews, sentiment analysis, deceptive, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Abstract: The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by the emergence and spread of the phenomenon of fake reviews, against which the scientific world has developed language models capable of distinguishing them from the truthful. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the availability of local language data sets for specific domains, useful for both training and verification. The purpose of this article is twofold. Firstly, a new data set was created in the Italian language, generally considered low-resource, relating to the domain of cultural heritage in Italy, by collecting reviews available online, reorganising them in the form of a data set usable by the language models. Secondly, a baseline of results for the detection of misleading reviews was constructed by exploiting two widely used language models, namely BERT and ELECTRA. The performance achieved is interesting, around 95% accuracy and F1 score, using data set splits between training and testing of 80/20 and 90/10. In addition, SHAP was used as a tool to support the explicability of AI models: in this way, it was possible to show the usefulness of sentiment analysis as a support for the recognition of deceptiveness.
Published: 2023
Full Text: View/download PDF

47. In-Depth Analysis of Mobile Apps Statistics: A Study and Development of a Mobile App.

Author: Abbasi, Maryam, Lopes, André, Rodrigues, Diogo, Saraiva, João, Martins, Pedro, Sá, Filipe, and Cardoso, Filipe
Subjects: MOBILE apps, SMARTPHONES, DATA analysis, BIG data, GRAPHIC arts
Abstract: With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of the number of users and downloads. To increase user satisfaction, app developers invest a lot of work into gathering and utilizing user input. This paper presents an analysis of the mobile app market through the development of a mobile app. The app provides users with an overview of the most important statistics, including the best apps in each category, the categories with the most apps, and the overall statistics. The data was collected through the analysis of publicly available annual reports, and presented in a user-friendly format through the use of graphics. The focus of the study was to provide a useful tool for developers and individuals seeking specific statistics. Although the approach has proven to be effective, the authors suggest potential improvements, such as incorporating live data from the Google Play Store and analysing the App Store. Additionally, comparing the data from multiple years can provide useful insights into the evolution of the market. [ABSTRACT FROM AUTHOR]
Published: 2023

48. Interplanetary shock data base

Author: Denny M. Oliveira
Subjects: interplanetary shocks, data set, shock impact angle, solar cycle, shock parameters, Astronomy, QB1-991, Geophysics. Cosmic physics, QC801-809
Published: 2023
Full Text: View/download PDF

49. A Datasheet for the INSIGHT University Hospitals Birmingham Retinal Vein Occlusion Data Set

Author: Edward J. Bilton, MBChB (MBBS), Emily J. Guggenheim, PhD, Balazs Baranyi, Charlotte Radovanovic, Rowena L. Williams, William Bradlow, FRCP, Alastair K. Denniston, PhD, and Susan P. Mollan, FRCOphth
Subjects: Biomedical data, Data set, Major Cardiovascular events, Myocardial infarction, Retinal vein occlusion, Ophthalmology, RE1-994
Abstract: Purpose: Retinal vein occlusion (RVO) is the second leading cause of visual loss due to retinal disease. Retinal vein occlusion increases the risk of cardiovascular mortality and the risk of stroke. This article describes the data contained within the INSIGHT eye health data set for RVO and cardiovascular disease. Design: Data set descriptor for routinely collected eye and systemic disease data. Participants: All people who had suffered an RVO aged ≥ 18 years old, attending the Ophthalmology Clinic at Queen Elizabeth Hospital, University Hospitals Birmingham (UHB) National Health Service (NHS) Trust were included. Methods: The INSIGHT Health Data Research Hub for Eye Health is an NHS-led ophthalmic bioresource. It provides researchers with safe access to anonymized routinely collected data from contributing NHS hospitals to advance research for patient benefit. This report describes the INSIGHT UHB RVO and major adverse cardiovascular events data set, a data set of ophthalmology and systemic data derived from the United Kingdom’s largest acute care trust. Main Outcome Measures: This data set consists of routinely collected data from the hospital’s electronic patient records. The data set primarily includes structured data (relating to their hospital eye care and any cardiovascular data held for the individual) and OCT ocular images. Further details regarding the available data points are available in the supplementary information. Results: At the time point of this analysis (September 30, 2022) the data set was composed of clinical data from 1521 patients, from Medisoft records inception. The data set includes 2196 occurrences of RVO affecting 2026 eyes, longitudinal eye follow-up clinical parameters, over 6217 eye-related procedures, and 982 encountered complications. The data set contains information on 2534 major adverse cardiovascular event occurrences, their subtype, number experienced per patient, and chronological relation to RVO event. 
Longitudinal follow-up data including laboratory results, regular medications, and all-cause mortality are also available within the data set. Conclusions: This data set descriptor article summarizes the data set contents, the process of its curation, and potential uses. The data set is available through the structured application process that ensures research studies are for patient benefit. Further information regarding the data repository and contact details can be found at https://www.insight.hdrhub.org/. Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Published: 2023
Full Text: View/download PDF

50. FinnSentiment: a Finnish social media corpus for sentiment polarity annotation.

Author: Lindén, Krister, Jauhiainen, Tommi, and Hardwick, Sam
Subjects: *SOCIAL media, *SENTIMENT analysis, *PUBLIC opinion, *ANNOTATIONS, *FAKE news
Abstract: Sentiment analysis and opinion mining are essential tasks with many prominent application areas, e.g., when researching popular opinions on products or brands. Sentiments expressed in social media can be used in brand name monitoring and indicating fake news. In our survey of previous work, we note that there is no large-scale social media data set with sentiment polarity annotations for Finnish. This publication aims to remedy this shortcoming by introducing a 27,000-sentence data set annotated independently with sentiment polarity by three native annotators. We had three annotators annotate the whole data set, which provides a unique opportunity for further studies of annotator behavior over the sample annotation order. We analyze their inter-annotator agreement and provide two baselines to validate the usefulness of the data set. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
