Voting Margin: A Scheme for Error-Tolerant k Nearest Neighbors Classifiers for Machine Learning
- Author
-
Pedro Reviriego, Fabrizio Lombardi, José Alberto Hernández, and Shanshan Liu
- Subjects
Machine learning, k-nearest neighbors algorithm, Classifier, Voting, Majority rule, Margin (machine learning), Redundancy (engineering), Error tolerance, Modular design, Computer science
- Abstract
Machine learning (ML) techniques such as classifiers are used in many applications, some of which are related to safety or critical systems. In those cases, correct processing is a strict requirement, and thus ML algorithms (such as those for classification) must be error tolerant. A naive approach to implementing error-tolerant classifiers is to resort to general protection techniques such as modular redundancy. However, modular redundancy incurs large overheads in metrics such as hardware utilization and power consumption, which may not be acceptable in applications that run on embedded or battery-powered systems. Another option is to exploit the algorithmic properties of the classifier to provide protection and error tolerance at a lower cost. This paper explores this approach for a widely used classifier, the k nearest neighbors (kNN) classifier, and proposes an efficient scheme to protect it against errors. The proposed technique is based on a time-based modular redundancy (TBMR) scheme and exploits the intrinsic redundancy of kNN to drastically reduce the number of re-computations needed to detect errors. This is achieved by noting that when the vote among the k nearest neighbors has a large majority, an error in one of the voters cannot change the result; hence the name voting margin (VM). This observation has been refined and extended in the proposed VM scheme to also avoid re-computations in some cases in which the majority vote is tight. The VM scheme has been implemented and evaluated on publicly available data sets that cover a wide range of applications and settings. The results show that, by exploiting the intrinsic redundancy of the classifier, the proposed scheme reduces the cost of modular redundancy by more than 60 percent in all configurations evaluated.
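The core observation behind the voting-margin idea can be sketched in a few lines. The function name, parameter, and margin threshold below are illustrative assumptions, not the paper's actual implementation: if the winning class leads the runner-up by more than twice the number of voters an error could corrupt, flipping that many votes cannot change (or even tie) the outcome, so the re-computation of a time-based redundancy scheme can be skipped.

```python
from collections import Counter

def needs_recomputation(neighbor_labels, max_faulty_voters=1):
    """Hypothetical sketch of the voting-margin (VM) check.

    Returns True when the kNN vote is tight enough that an error in
    up to max_faulty_voters of the k voters could change the result,
    so the baseline re-computation is still required; False when the
    majority is large enough to make the result error-tolerant.
    """
    counts = Counter(neighbor_labels).most_common()
    winner_votes = counts[0][1]
    runner_up_votes = counts[1][1] if len(counts) > 1 else 0
    margin = winner_votes - runner_up_votes
    # Flipping one voter moves a vote from the winner to another
    # class, shrinking the margin by 2; the result is therefore
    # guaranteed only when margin > 2 * max_faulty_voters.
    return margin <= 2 * max_faulty_voters

# A clear 4-to-1 majority (k = 5): a single faulty voter cannot
# flip the outcome, so no re-computation is needed.
print(needs_recomputation(["cat", "cat", "cat", "cat", "dog"]))  # False

# A tight 3-to-2 vote: one error could change the result, so the
# re-computation must still be performed.
print(needs_recomputation(["cat", "cat", "cat", "dog", "dog"]))  # True
```

Note that the paper's VM scheme goes further and also avoids re-computations in some tight-vote cases; this sketch only captures the basic large-majority observation.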
Pedro Reviriego and José Alberto Hernández would like to acknowledge the support of the TEXEO project TEC2016-80339-R funded by the Spanish Ministry of Economy and Competitiveness and of the Madrid Community research project TAPIR-CM Grant no. P2018/TCS-4496.
- Published
- 2021