132 results on '"Human assessment"'
Search Results
2. Automatic assessment of spoken-language interpreting based on machine-translation evaluation metrics: A multi-scenario exploratory study.
- Author
-
Lu, Xiaolei and Han, Chao
- Subjects
*MACHINE translating, *METEORS
- Abstract
Automated metrics for machine translation (MT) such as BLEU are customarily used because they are quick to compute and sufficiently valid to be useful in MT assessment. Whereas the instantaneity and reliability of such metrics are made possible by automatic computation based on predetermined algorithms, their validity is primarily dependent on a strong correlation with human assessments. Despite the popularity of such metrics in MT, little research has been conducted to explore their usefulness in the automatic assessment of human translation or interpreting. In the present study, we therefore seek to provide an initial insight into the way MT metrics would function in assessing spoken-language interpreting by human interpreters. Specifically, we selected five representative metrics – BLEU, NIST, METEOR, TER and BERT – to evaluate 56 bidirectional consecutive English–Chinese interpretations produced by 28 student interpreters of varying abilities. We correlated the automated metric scores with the scores assigned by different types of raters using different scoring methods (i.e., multiple assessment scenarios). The major finding is that BLEU, NIST, and METEOR had moderate-to-strong correlations with the human-assigned scores across the assessment scenarios, especially for the English-to-Chinese direction. Finally, we discuss the possibility and caveats of using MT metrics in assessing human interpreting. [ABSTRACT FROM AUTHOR]
- Published
- 2023
3. Four Keys to Topic Interpretability in Topic Modeling
- Author
-
Mavrin, Andrey, Filchenkov, Andrey, Koltcov, Sergei, Barbosa, Simone Diniz Junqueira, Series Editor, Filipe, Joaquim, Series Editor, Kotenko, Igor, Series Editor, Sivalingam, Krishna M., Series Editor, Washio, Takashi, Series Editor, Yuan, Junsong, Series Editor, Zhou, Lizhu, Series Editor, Ustalov, Dmitry, editor, Filchenkov, Andrey, editor, Pivovarova, Lidia, editor, and Žižka, Jan, editor
- Published
- 2018
4. ‘Tailception’: using neural networks for assessing tail lesions on pictures of pig carcasses
- Author
-
J. Brünger, S. Dippel, R. Koch, and C. Veit
- Subjects
slaughter pigs, tail lesions, abattoir, human assessment, neural network, Animal culture, SF1-1100
- Abstract
Tail lesions caused by tail biting are a widespread welfare issue in pig husbandry. Determining their prevalence currently involves labour intensive, subjective scoring methods. Increased societal interest in tail lesions requires fast, reliable and cheap systems for assessing tail status. In the present study, we aimed to test the reliability of neural networks for assessing tail pictures from carcasses against trained human observers. Three trained observers scored tail lesions from automatically recorded pictures of 13 124 pigs. Nearly all pigs had been tail docked. Tail lesions were classified using a 4-point score (0=no lesion, to 3=severe lesion). In addition, total tail loss was recorded. Agreement between observers was tested prior and during the assessment in a total of seven inter-observer tests with 80 pictures each. We calculated agreement between observer pairs as exact agreement (%) and prevalence-adjusted bias-adjusted κ (PABAK; value 1=optimal agreement). Out of the 13 124 scored pictures, we used 80% for training and 20% for validating our neural networks. As the position of the tail in the pictures varied (high, low, left, right), we first trained a part detection network to find the tail in the picture and select a rectangular part of the picture which includes the tail. We then trained a classification network to categorise tail lesion severity using pictures scored by human observers whereby the classification network only analysed the selected picture parts. Median exact agreement between the three observers was 80% for tail lesions and 94% for tail loss. Median PABAK for tail lesions and loss were 0.75 and 0.87, respectively. The agreement between classification by the neural network and human observers was 74% for tail lesions and 95% for tail loss. In other words, the agreement between the networks and human observers were very similar to the agreement between human observers. 
The main reason for disagreement between observers and thereby higher variation in network training material were picture quality issues. Therefore, we expect even better results for neural network application to tail lesions if training is based on high quality pictures. Very reliable and repeatable tail lesion assessment from pictures would allow automated tail classification of all pigs slaughtered, which is something that some animal welfare labels would like to do.
- Published
- 2019
5. Methodological Aspects of Infrared Thermography in Human Assessment
- Author
-
Priego Quesada, Jose Ignacio, Kunzler, Marcos Roberto, Carpes, Felipe P., Aizawa, Masuo, Series editor, Greenbaum, Elias, Editor-in-chief, Andersen, Olaf S., Series editor, Austin, Robert H., Series editor, Barber, James, Series editor, Berg, Howard C., Series editor, Bloomfield, Victor, Series editor, Callender, Robert, Series editor, Chu, Steven, Series editor, DeFelice, Louis J., Series editor, Deisenhofer, Johann, Series editor, Feher, George, Series editor, Frauenfelder, Hans, Series editor, Giaever, Ivar, Series editor, Gruner, Sol M., Series editor, Herzfeld, Judith, Series editor, Humayun, Mark S., Series editor, Joliot, Pierre, Series editor, Keszthelyi, Lajos, Series editor, King, Paul W., Series editor, Knox, Robert S., Series editor, Lazzi, Gianluca, Series editor, Lewis, Aaron, Series editor, Lindsay, Stuart M., Series editor, Mauzerall, David, Series editor, Mielczarek, Eugenie V., Series editor, Niemz, Markolf, Series editor, Parsegian, V. Adrian, Series editor, Powers, Linda S., Series editor, Prohofsky, Earl W., Series editor, Rostovtseva, Tatiana K, Series editor, Rubin, Andrew, Series editor, Seibert, Michael, Series editor, Thomas, David, Series editor, and Priego Quesada, Jose Ignacio, editor
- Published
- 2017
6. Confidence-Based State Estimation: A Novel Tool for Test and Evaluation of Human-Systems
- Author
-
Marathe, Amar R., McDaniel, Jonathan R., Gordon, Stephen M., McDowell, Kaleb, Kacprzyk, Janusz, Series editor, Savage-Knepshield, Pamela, editor, and Chen, Jessie, editor
- Published
- 2017
7. A Hierarchical Reinforcement Learning Based Artificial Intelligence for Non-Player Characters in Video Games
- Author
-
Ponce, Hiram, Padilla, Ricardo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Siekmann, Jörg, Series editor, Gelbukh, Alexander, editor, Espinoza, Félix Castro, editor, and Galicia-Haro, Sofía N., editor
- Published
- 2014
8. Post-editing neural machine translation versus phrase-based machine translation for English–Chinese.
- Author
-
Jia, Yanfang, Carl, Michael, and Wang, Xiangling
- Subjects
TRANSLATIONS, MACHINE translating
- Abstract
This paper aims to shed light on the post-editing process of the recently-introduced neural machine translation (NMT) paradigm. Using simple and more complex texts, we first evaluate the output quality from English to Chinese phrase-based statistical (PBSMT) and NMT systems. Nine raters assess the MT quality in terms of fluency and accuracy and find that NMT produces higher-rated translations than PBSMT for both texts. Then we analyze the effort expended by 68 student translators during HT and when post-editing NMT and PBSMT output. Our measures of post-editing effort are all positively correlated for both NMT and PBSMT post-editing. Our findings suggest that although post-editing output from NMT is not always significantly faster than post-editing PBSMT, it significantly reduces the technical and cognitive effort. We also find that, in contrast to HT, post-editing effort is not necessarily correlated with source text complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2019
9. 'Tailception': using neural networks for assessing tail lesions on pictures of pig carcasses.
- Author
-
Brünger, J., Dippel, S., Koch, R., and Veit, C.
- Abstract
Tail lesions caused by tail biting are a widespread welfare issue in pig husbandry. Determining their prevalence currently involves labour intensive, subjective scoring methods. Increased societal interest in tail lesions requires fast, reliable and cheap systems for assessing tail status. In the present study, we aimed to test the reliability of neural networks for assessing tail pictures from carcasses against trained human observers. Three trained observers scored tail lesions from automatically recorded pictures of 13 124 pigs. Nearly all pigs had been tail docked. Tail lesions were classified using a 4-point score (0=no lesion, to 3=severe lesion). In addition, total tail loss was recorded. Agreement between observers was tested prior and during the assessment in a total of seven inter-observer tests with 80 pictures each. We calculated agreement between observer pairs as exact agreement (%) and prevalence-adjusted bias-adjusted κ (PABAK; value 1=optimal agreement). Out of the 13 124 scored pictures, we used 80% for training and 20% for validating our neural networks. As the position of the tail in the pictures varied (high, low, left, right), we first trained a part detection network to find the tail in the picture and select a rectangular part of the picture which includes the tail. We then trained a classification network to categorise tail lesion severity using pictures scored by human observers whereby the classification network only analysed the selected picture parts. Median exact agreement between the three observers was 80% for tail lesions and 94% for tail loss. Median PABAK for tail lesions and loss were 0.75 and 0.87, respectively. The agreement between classification by the neural network and human observers was 74% for tail lesions and 95% for tail loss. In other words, the agreement between the networks and human observers were very similar to the agreement between human observers. 
The main reason for disagreement between observers and thereby higher variation in network training material were picture quality issues. Therefore, we expect even better results for neural network application to tail lesions if training is based on high quality pictures. Very reliable and repeatable tail lesion assessment from pictures would allow automated tail classification of all pigs slaughtered, which is something that some animal welfare labels would like to do. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. Deconstructing the failure: Analyzing the unanswered questions within educational Q&A.
- Author
-
Rath, Manasa and Shah, Chirag
- Subjects
*SOCIAL learning, *KNOWLEDGE management, *STUDENT attitudes, *QUALITY, STUDENTS & society
- Abstract
Community Question and Answering (CQA) is a well-known platform for knowledge sharing and social learning. CQA services have expanded into the education sector, where school students are the main users. Here, CQA functions as a non-traditional learning environment in which students use their own knowledge to construct the knowledge base. However, there have been cases where some questions are not answered within such a Q&A system. This failure may occur if a question is unclear, complex, inappropriate, or unrelated to the subject in which it is contextualized. While experts do not answer posted questions, co-users moderate responses and help maintain answer quality. However, due to the presence of users from diverse cultural and linguistic backgrounds, many questions remain unanswered. The current study analyzes and explores the failed questions collected from Brainly, a social learning Q&A platform for school students. The quality of 1,000 questions extracted from this service is analyzed based on human-based ratings and extracted textual features. The findings show that a relationship can be drawn between the non-textual assessment results and the objective textually extracted features. This further encourages the study of why a question might be of poor quality. The findings also show which subjects contain the highest number of unanswered questions. These results will further help us to understand how questions should be restructured to obtain answers from their askers' peers. [ABSTRACT FROM AUTHOR]
- Published
- 2016
11. Question Answering Pilot Task at CLEF 2004
- Author
-
Herrera, Jesús, Peñas, Anselmo, Verdejo, Felisa, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Peters, Carol, editor, Clough, Paul, editor, Gonzalo, Julio, editor, Jones, Gareth J. F., editor, Kluck, Michael, editor, and Magnini, Bernardo, editor
- Published
- 2005
12. Comparisons with a 'color' component as a means of human assessment (based on the Tatar and Russian languages)
- Author
-
Ramziya Bolgarova and Nurmukhametova Raushaniya
- Subjects
Tatar, business.industry, Computer science, Component (UML), language, Artificial intelligence, business, computer.software_genre, computer, Natural language processing, language.human_language, Human assessment
- Published
- 2021
13. Experimental Analysis of Sensory Measurement Imperfection Impact for a Cheese Ripening Fuzzy Model
- Author
-
Ioannou, Irina, Perrot, Nathalie, Mauris, Gilles, Trystram, Gilles, Carbonell, Jaime G., editor, Siekmann, Jörg, editor, Bilgiç, Taner, editor, De Baets, Bernard, editor, and Kaynak, Okyay, editor
- Published
- 2003
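14. СОПОСТАВИТЕЛЬНЫЙ АНАЛИЗ ФЕ, ВЫРАЖАЮЩИХ ХАРАКТЕР ЧЕЛОВЕКА В ЧЕЧЕНСКОМ И РУССКОМ ЯЗЫКАХ [Comparative analysis of phraseological units expressing human character in the Chechen and Russian languages]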
- Subjects
Chechen language, Russian language, comparative analysis, thematic group, human assessment, phraseological units, qualitative characteristic
- Abstract
The article provides a comparative analysis of phraseological units that characterize a person in the Chechen and Russian languages. The units studied are widely used in both languages. Their main functions are considered to be expressiveness and emotionality, along with the characterization of individual achievements, social status, and much else related to a person's character. In both languages, the phraseological units that characterize human behavior and actions form a great variety of unities, groups, and subgroups. Many Chechen set expressions convey information about a person that does not coincide with Russian phraseological units. Likewise, there are Russian phraseological units describing a person's qualities and behavior that do not coincide with those of the Chechen language. Subject classification of these set expressions in the two languages yields three groups: units that evaluate a person negatively, units that evaluate a person positively, and units that give a neutral assessment of a person. The set of units denoting unfavorable qualities and properties of a person comprises 6 semantic fields. The units denoting well-behaved and laudable properties of a person comprise 3 semantic fields. The field with the semantics "mental abilities" turned out to be the most populated. The set of units depicting a person's loyalty comprises 5 semantic fields. The semantic fields are populated to roughly the same extent in both languages.
- Published
- 2022
15. Model based tracking for navigation and segmentation
- Author
-
Southall, B., Marchant, J. A., Hague, T., Buxton, B. F., Goos, Gerhard, editor, Hartmanis, Juris, editor, van Leeuwen, Jan, editor, Burkhardt, Hans, editor, and Neumann, Bernd, editor
- Published
- 1998
16. Peanut maturity classification using hyperspectral imagery
- Author
-
Seung-Chul Yoon, Alina Zare, Yu-Chien Tseng, Barry L. Tillman, Diane L. Rowland, and Sheng Zou
- Subjects
genetic structures; 010401 analytical chemistry; Economic return; food and beverages; Soil Science; Hyperspectral imaging; 04 agricultural and veterinary sciences; Orange (colour); 01 natural sciences; 0104 chemical sciences; Human assessment; Horticulture; Quality research; Control and Systems Engineering; 040103 agronomy & agriculture; 0401 agriculture, forestry, and fisheries; Maturity assessment; Cultivar; Quality characteristics; Agronomy and Crop Science; Food Science; Mathematics
- Abstract
Seed maturity in peanut (Arachis hypogaea L.) determines economic return to a producer because of its impact on seed weight (yield), and critically influences seed vigour and other quality characteristics. During seed development, the inner mesocarp layer of the pericarp (hull) transitions in colour from white to black as the seed matures. The maturity assessment process involves the removal of the exocarp of the hull and visually categorizing the mesocarp colours into varying colour classes from immature (white, yellow, orange) to mature (brown, and black). This visual colour classification is time consuming because the exocarp must be manually removed. In addition, the visual classification process involves human assessment of colours, which leads to large variability of colour classification from observer to observer. A more objective, digital imaging approach to peanut maturity is needed, optimally without the requirement of removal of the hull's exocarp. This study examined the use of a hyperspectral imaging (HSI) process to determine pod maturity with intact pericarps. The HSI method leveraged spectral differences between mature and immature pods within a classification algorithm to identify the mature and immature pods. Therefore, there is no need to remove the exocarp nor is there a need for subjective colour assessment in the proposed process. The results showed a consistent high classification accuracy using samples from different years and cultivars. In addition, the proposed method was capable of estimating a continuous-valued, pixel-level maturity value for individual peanut pods, allowing for a valuable tool that can be utilized in seed quality research. This new method solves issues of labour intensity and subjective error that all current methods of peanut maturity determination have.
- Published
- 2019
17. Introduction to Computational Design
- Author
-
Yuki Koyama
- Subjects
business.industry, Computer science, media_common.quotation_subject, Crowdsourcing, Human assessment, Human–computer interaction, Related research, Human-in-the-loop, Computational design, Quality (business), Graphics, User interface, business, media_common
- Abstract
Computational design is one of the hot topics in HCI and related research fields, where various design problems are formulated using mathematical languages and solved by computational techniques. By this paradigm, researchers aim at establishing highly sophisticated or efficient design processes that otherwise cannot be achieved. Target domains include graphics, personal fabrication, user interface, etc. This course introduces fundamental concepts in computational design and provides an overview of the recent trend. It then goes into a more specific case where human assessment is necessary to evaluate the quality of design outcomes, which is often true in HCI scenarios. This course is recommended to HCI students and researchers who are new to this topic.
- Published
- 2021
18. Artificial Intelligence in hair research: A proof-of-concept study on evaluating hair assembly features
- Author
-
Sergio Benini, Gabriela Daniels, Slobodanka Tamburic, Jane Randall, Mattia Savardi, and Tracey Sanderson
- Subjects
Aging, 2019-20 coronavirus outbreak, Coronavirus disease 2019 (COVID-19), Computer science, Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virgin hair, Paired comparison, Pharmaceutical Science, Dermatology, hair detection, 030226 pharmacology & pharmacy, Proof of Concept Study, 030207 dermatology & venereal diseases, 03 medical and health sciences, 0302 clinical medicine, Colloid and Surface Chemistry, Artificial Intelligence, Drug Discovery, otorhinolaryngologic diseases, Humans, Bleached hair, integumentary system, hair segmentation, business.industry, sensory assessment, bleached hair, Image detection, artificial intelligence, machine learning, Shampoo, Human assessment, Chemistry (miscellaneous), sense organs, Artificial intelligence, business, Algorithms, Hair
- Abstract
The first objective of this study was to apply computer vision and machine learning techniques to quantify the effects of haircare treatments on hair assembly and to identify correctly whether unknown tresses were treated or not. The second objective was to explore and compare the performance of human assessment with that obtained from artificial intelligence (AI) algorithms. Machine learning was applied to a data set of hair tress images (virgin and bleached), both untreated and treated with a shampoo and conditioner set, aimed at increasing hair volume whilst improving alignment and reducing the flyway of the hair. The automatic quantification of the following hair image features was conducted: local and global hair volumes and hair alignment. These features were assessed at three time points: t0 (no treatment), t1 (two treatments), t2 (three treatments). Classification tests were applied to test the accuracy of the machine learning. A sensory test (paired comparison of t0 vs t2) and an online survey based on the front image (paired comparison of t0 vs t1, t1 vs t2, t0 vs t2) were conducted to compare human assessment with that of the algorithms. The automatic image analysis identified changes to hair volume and alignment which enabled the successful application of the classification tests, especially when the hair images were grouped into untreated and treated groups. The human assessment of hair presented in pairs confirmed the automatic image analysis. The image assessment for both virgin hair and bleached only partially agreed with the analysis of the subset of images used in the online survey. One hypothesis is that treatments changed somewhat the shape of the hair tress, with the effect being more pronounced in bleached hair. This made human assessment of flat images more challenging than when viewed directly in 3D. Overall, the bleached hair exhibited effects of higher magnitude than the virgin hair. This study illustrated the capacity of artificial intelligence for hair image detection and classification, and for image analysis of hair assembly features following treatments. The human assessment partially confirmed the image analysis and highlighted the challenges imposed by the presentation mode.
- Published
- 2021
19. Micro-tasking as a method for human assessment and quality control in a geospatial data import
- Author
-
Anne Sofie Strøm Erichsen, Atle Frenvik Sveen, and Terje Midtbø
- Subjects
Geospatial analysis, Computer science, media_common.quotation_subject, Data assessment, 05 social sciences, Geography, Planning and Development, Control (management), 0211 other engineering and technologies, 0507 social and economic geography, 02 engineering and technology, computer.software_genre, Data science, Human assessment, Management of Technology and Innovation, Quality (business), 050703 geography, computer, 021101 geological & geomatics engineering, Civil and Structural Engineering, media_common
- Abstract
Crowd-sourced geospatial data can often be enriched by importing open governmental datasets as long as they are up-to date and of good quality. Unfortunately, merging datasets is not straight forwa...
- Published
- 2019
20. Post-editing neural machine translation versus phrase-based machine translation for English–Chinese
- Author
-
Yanfang Jia, Michael Carl, and Xiangling Wang
- Subjects
Linguistics and Language, Phrase, Machine translation, Computer science, business.industry, Contrast (statistics), Cognitive effort, 02 engineering and technology, computer.software_genre, Language and Linguistics, Human assessment, Fluency, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Source text, Computational linguistics, business, computer, Software, Natural language processing
- Abstract
This paper aims to shed light on the post-editing process of the recently-introduced neural machine translation (NMT) paradigm. Using simple and more complex texts, we first evaluate the output quality from English to Chinese phrase-based statistical (PBSMT) and NMT systems. Nine raters assess the MT quality in terms of fluency and accuracy and find that NMT produces higher-rated translations than PBSMT for both texts. Then we analyze the effort expended by 68 student translators during HT and when post-editing NMT and PBSMT output. Our measures of post-editing effort are all positively correlated for both NMT and PBSMT post-editing. Our findings suggest that although post-editing output from NMT is not always significantly faster than post-editing PBSMT, it significantly reduces the technical and cognitive effort. We also find that, in contrast to HT, post-editing effort is not necessarily correlated with source text complexity.
- Published
- 2019
21. The wisdom of the rankers
- Author
-
Álvaro Barreiro, David Otero, and Javier Parapar
- Subjects
Information retrieval, Training set, Computer science, Pooling, 020207 software engineering, Information needs, 02 engineering and technology, Field (computer science), Test (assessment), Human assessment, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Effective method, Relevance (information retrieval)
- Abstract
Information Retrieval is an area where evaluation is crucial to validate newly proposed models. As the first step in the evaluation of models, researchers carry out offline experiments on specific datasets. While the field started around ad-hoc search, the number of new tasks is continuously growing. These tasks demand the development of new test collections (documents, information needs, and judgments). The construction of those datasets relies on expensive campaigns like TREC. Due to the size of modern collections, obtaining the relevance for each document-topic pair is infeasible. To reduce this cost, organizers usually apply a technique called pooling. When building pooled test collections, assessors only judge a portion of the documents selected among the participants' results. Although the judgments will not be exhaustive, they will be sufficiently complete and unbiased if pooling is done correctly. Therefore, researchers may safely use pooled collections to evaluate new models. However, the application of pooling depends on the existence of participant systems. This need is a handicap for tasks for which it is necessary to release training data before the celebration of the competition or for those with few participants. In this paper, we present a simple method for building pooled collections when such restrictions exist. Our proposal relies on two principles: the wisdom of the rankers and the application of pooling. By creating enough artificial participant systems, we can apply pooling on their results to select the documents that merit human assessment. Using an innovative approach to evaluate our method, we show that researchers may use it to produce high-quality collections on the absence of participant systems.
- Published
- 2021
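The pooling step described in the abstract above can be sketched as follows. This is a minimal illustration of plain depth-k pooling for a single topic, not the authors' method for generating artificial participant systems; the function name `depth_k_pool`, the run names, and the document identifiers are all hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the union of the top-k documents of every ranked run.

    `runs` maps a (hypothetical) system name to its ranking for one topic,
    best document first. Only the pooled documents would be sent to
    human assessors.
    """
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


# Toy rankings from three made-up participant systems.
runs = {
    "system_a": ["d1", "d2", "d3", "d4"],
    "system_b": ["d3", "d5", "d1", "d6"],
    "system_c": ["d7", "d1", "d2", "d8"],
}

pool = depth_k_pool(runs, k=2)
print(sorted(pool))  # documents that merit human assessment
```

With k=2, five of the eight distinct documents enter the pool, which is the cost saving that pooling is meant to provide.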
22. Guided Generation of Cause and Effect
- Author
-
Benjamin Van Durme, Zhongyang Li, J. Edward Hu, Ting Liu, and Xiao Ding
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Computer science ,business.industry ,computer.software_genre ,Human assessment ,Model architecture ,Causal knowledge ,Causal reasoning ,Artificial intelligence ,Cause–effect graph ,Set (psychology) ,business ,computer ,Encoder ,Computation and Language (cs.CL) ,Natural language processing ,Decoding methods - Abstract
We present a conditional text generation framework that posits sentential expressions of possible causes and effects. This framework depends on two novel resources we develop in the course of this work: a very large-scale collection of English sentences expressing causal patterns, CausalBank; and a refinement over previous work on constructing large lexical causal knowledge graphs, Cause Effect Graph. Further, we extend prior work in lexically-constrained decoding to support disjunctive positive constraints. Human assessment confirms that our approach gives high-quality and diverse outputs. Finally, we use CausalBank to perform continued training of an encoder supporting a recent state-of-the-art model for causal reasoning, leading to a 3-point improvement on the COPA challenge set, with no change in model architecture. Comment: accepted in IJCAI 2020 main track
- Published
- 2021
- Full Text
- View/download PDF
23. An assessment of the human performance of iris identification.
- Author
-
Guest, Richard M, He, Hongmei, Stevenage, Sarah V, and Neil, Greg J
- Abstract
Biometric iris recognition systems are widely used for a range of identity recognition applications and have been shown to perform with high accuracy. For large-scale deployments, however, system enhancements leading to a reduction in error rates are continually sought. In this paper we investigate the performance of human verification of iris images and compare against a standard computer-based method. Our results suggest that performance using the computer-based system is no better than performance of the human participants. Additionally and importantly, however, performance can be improved through incorporation of the human as a ‘second decision maker’. This fusion system yields a false acceptance rate of just 9% when disagreements are resolved in line with strengths of each ‘decision-maker’. The results are presented as an illustration of the benefits that can be gained when combining human and automated systems in biometric processing. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
24. PerKey: A Persian News Corpus for Keyphrase Extraction and Generation
- Author
-
Hossein Sameti, Ehsan Doostmohammadi, and Mohammad Hadi Bokaei
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Keyword extraction ,02 engineering and technology ,computer.software_genre ,Automatic summarization ,language.human_language ,Field (computer science) ,Human assessment ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,tf–idf ,computer ,Computation and Language (cs.CL) ,Natural language processing ,Persian - Abstract
Keyphrases provide an extremely dense summary of a text. Such information can be used in many Natural Language Processing tasks, such as information retrieval and text summarization. Since previous studies on Persian keyword or keyphrase extraction have not published their data, the field suffers from the lack of a human-extracted keyphrase dataset. In this paper, we introduce PerKey, a corpus of 553k news articles from six Persian news websites and agencies with relatively high-quality author-extracted keyphrases, which is then filtered and cleaned to achieve higher-quality keyphrases. The resulting data was subjected to human assessment to ensure the quality of the keyphrases. We also measured the performance of different supervised and unsupervised techniques, e.g. TFIDF, MultipartiteRank, KEA, etc., on the dataset using precision, recall, and F1-score.
- Published
- 2020
25. Building English–Punjabi Parallel Corpus for Machine Translation
- Author
-
Rashmi Agrawal and Simran Kaur Jolly
- Subjects
Machine translation ,Low resource ,Computer science ,business.industry ,Document classification ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Translation (geometry) ,computer.software_genre ,Human assessment ,Permutation ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Artificial intelligence ,business ,computer ,Sentence ,Natural language processing - Abstract
A parallel corpus is needed for many natural language processing tasks, like machine translation and multilingual document classification. Parallel corpora for the English–Punjabi language pair are sparse because of the semantic differences between the two languages and because Punjabi is a low-resource language. In this paper, a parallel corpus for machine translation is created and evaluated using sentence alignment permutation metrics. Multiple translation corpora and human assessment together validate automatic evaluation metrics, which are important for the development of machine translation systems. The corpora considered are movie dialogues taken from the Wikipedia dumps. Further, metrics are identified that characterise the corpora more accurately. The quality of the corpus is verified using performance metrics based on distance metrics.
- Published
- 2020
26. Computer Vision Challenges for Chronic Wounds Assessment
- Author
-
Paula A. Teixeira, Miguel Coimbra, and Paulino Sousa
- Subjects
Chronic wound ,Wound Healing ,integumentary system ,Computers ,business.industry ,Administration, Topical ,Wound size ,020206 networking & telecommunications ,030209 endocrinology & metabolism ,02 engineering and technology ,Clinical routine ,Human assessment ,Europe ,03 medical and health sciences ,0302 clinical medicine ,0202 electrical engineering, electronic engineering, information engineering ,Humans ,Medicine ,Tissue type ,Computer vision ,Artificial intelligence ,medicine.symptom ,business ,Physical Examination ,Wound treatment - Abstract
Chronic wound assessment and wound healing are important for diagnosis, follow-up and wound treatment. However, this growing disease affecting nearly 2 thousand million and 5.7 million people in the USA and Europe, costing around $20 billion and $8 billion USD per year, still relies on subjective human assessment of wounds. A scoping review allowed us to identify 109 articles that map the literature on the topic of computer vision for chronic wound assessment and healing. These results were carefully analyzed and mapped into relevant clinical challenges associated with this field, identifying the maturity of each different computer vision challenge that needs addressing. Results show that wound size and tissue type classification already have interesting work, but various other clinical areas are in need of larger datasets and computer vision research efforts to achieve a relevant impact in today's clinical routine.
- Published
- 2020
27. Flash behavior in mammals?
- Author
-
Theodore Stankowich, Tim Caro, and Hana Raees
- Subjects
0106 biological sciences ,Appendage ,White (horse) ,05 social sciences ,Zoology ,Biology ,010603 evolutionary biology ,01 natural sciences ,Intraspecific competition ,Human assessment ,Predation ,Flash (photography) ,Animal ecology ,0501 psychology and cognitive sciences ,Animal Science and Zoology ,050102 behavioral science & comparative psychology ,Predator ,Ecology, Evolution, Behavior and Systematics - Abstract
Conspicuous coloration in animals has many possible functions including signaling to conspecifics, or predator deterrence through confusion, intimidation, and duping; the last includes flash behavior where predators are deceived into looking for conspicuous cues exhibited in flight but that are hidden when the animal comes to rest. In an effort to see if flash behavior occurs in mammals, we made predictions about situations where conspicuous coloration (as based on human assessment) might occur in artiodactyls and lagomorphs, and other predictions as to where such coloration might be found under an intraspecific signaling hypothesis. Using phylogenetically controlled analyses, we found that across species of artiodactyls, conspicuous rumps are more likely to have evolved in larger-sized group-living species, supporting an intraspecific signaling function; this was not replicated in lagomorphs. Examining those artiodactyls that can facultatively expose color patches (putative flash behavior), we discovered that this trait occurred in artiodactyls that are solitary or living in very small groups irrespective of their body size. It is therefore possible that species such as white- and black-tailed deer, which display white rumps and tails during pursuit but hide them when stationary, are using flash behavior to confuse the predator into looking for the wrong object and thereby avoid detection. This suggests that this form of antipredator defense in mammals needs greater attention. We found no effects of group size or body mass on conspicuous tail or ear markings in these taxa. Many mammals have conspicuous markings on their appendages and hindquarters, the function of which is mostly unknown. We matched these markings in rabbits, hares, and pikas and in bovids and cervids to both body size and group size across species.
We found that conspicuous rumps are found in group living ungulates but when we separated these into conspicuous hindquarters always on display or that could be hidden, we found that hidden markings were principally found in species living alone or in very small groups irrespective of their body size. These species may expose conspicuous patches during flight but hide them at rest fooling the predator into searching for the wrong object, a relatively newly researched defense mechanism called flash behavior.
- Published
- 2020
28. Defining Strawberry Uniformity using 3D Imaging and Genetic Mapping
- Author
-
Amanda Karlström, Abigail W. Johnson, Richard J. Harrison, Greg Deakin, Eleftheria Stavridou, Bo Li, and Helen M. Cockerton
- Subjects
2. Zero hunger ,0106 biological sciences ,Germplasm ,0303 health sciences ,education.field_of_study ,Population ,Objective data ,Quantitative trait locus ,01 natural sciences ,Human assessment ,03 medical and health sciences ,Visual assessment ,Statistics ,Trait ,education ,Selection (genetic algorithm) ,030304 developmental biology ,010606 plant biology & botany ,Mathematics - Abstract
Strawberry uniformity is a complex trait, influenced by multiple genetic and environmental components. To complicate matters further, the phenotypic assessment of strawberry uniformity is confounded by the difficulty of quantifying geometric parameters ‘by eye’ and variation between assessors. An in-depth genetic analysis of strawberry uniformity has not been undertaken to date, due to the lack of accurate and objective data. Nonetheless, uniformity remains one of the most important fruit quality selection criteria for the development of a new variety. In this study, a 3D-imaging approach was developed to characterise berry uniformity. We show that circularity of the maximum circumference had the closest predictive relationship with the manual uniformity score. Combining five or six automated metrics provided the best predictive model, indicating that human assessment of uniformity is highly complex. Furthermore, visual assessment of strawberry fruit quality in a multi-parental QTL mapping population has allowed the identification of genetic components controlling uniformity. A “regular shape” QTL was identified and found to be associated with three uniformity metrics. The QTL was present across a wide array of germplasm, indicating a strong candidate for marker-assisted breeding. A greater understanding of berry uniformity has been achieved through the study of the relative impact of automated metrics on human perceived uniformity. Furthermore, the comprehensive definition of strawberry uniformity using 3D imaging tools has allowed precision phenotyping, which has improved the accuracy of trait quantification. This tool has allowed us to illustrate the use of advanced image analysis towards the breeding of greater uniformity in strawberry.
- Published
- 2020
29. A new procedure, free from human assessment, that automatically grades some facial skin signs in men from selfie pictures. Application to changes induced by a severe aerial chronic urban pollution
- Author
-
Frederic Flament, David Amar, Lauren Sarda‐Dutilh, Eric Elmozino, Irina Kezele, Yuze Zhang, Ruowei Jiang, Jingyi Zhang, Jerome Coquide, Vincent Arcin, Chengda Ye, Parham Aarabi, and Seema Dwivedi
- Subjects
Adult ,Male ,Aging ,medicine.medical_specialty ,Urban Population ,Chinese men ,Pharmaceutical Science ,Dermatology ,Audiology ,030226 pharmacology & pharmacy ,Cohort Studies ,030207 dermatology & venereal diseases ,03 medical and health sciences ,Automation ,Young Adult ,0302 clinical medicine ,Colloid and Surface Chemistry ,Drug Discovery ,Epidemiology ,medicine ,Photography ,Humans ,Grading (education) ,Aged ,Skin ,Aged, 80 and over ,Air Pollutants ,business.industry ,Racial Groups ,Middle Aged ,Human assessment ,Facial skin ,Chemistry (miscellaneous) ,Face ,Smartphone ,Selfie ,business - Abstract
OBJECTIVE The objectives were twofold: first, to develop an automatic grading system specifically dedicated to some facial signs of men, similar to the one previously validated on women of different ethnic ancestry; and second, to assess its potential in detecting and grading the possible impacts of severe aerial urban pollution on some facial signs of Chinese men. METHODS In both studies, selfie images were obtained from men of different ages. Nine facial signs were automatically graded through a specific AI-based algorithm and clinically assessed by a panel of experts and dermatologists. Selfie pictures were taken with individual smartphones of variable optical properties. The first study, designed for developing an automatic grading system, involved three comparable cohorts of men from three different regional ancestries (African, Asian, Caucasian; 110 each), whose selfie images were acquired under four different lighting conditions. As a second, use-case study, the facial signs of two cohorts of Chinese men (101 and 100), of different ages and regularly exposed to very different aerial urban pollution (UP) conditions, were analysed by the same algorithm, with selfies taken under only one lighting condition. RESULTS The new automatic grading system of facial signs is well suited to men, showing results comparable to those of the one dedicated to women, and provides data in close agreement with experts' assessments. In both cases (expert or automatic methodology), the accuracy of the scores appeared ethnic-dependent. The applied case confirmed previous results obtained clinically, that is, that many facial signs were found to be of increased severity among men exposed to severe urban pollution, as compared to those living in a less polluted city. In both studies, statistical agreement between the automatic grading system and experts' assessments was reached. For some facial signs, the automatic grading system seems to offer slightly better accuracy than the assessments made by the experts. CONCLUSION Apart from some minor limitations, this AI-based automatic grading system, free from human intervention, performed as well as the one previously developed for women, in close agreement with experts' assessments. In epidemiological studies, this system offers an easy, fast, affordable and confidential approach to the detection and quantification of male facial signs.
- Published
- 2020
30. Prediction of Human Responses to Dairy Odor Using an Electronic Nose and Neural Networks
- Author
-
Fangle Chang and Paul Heinz Heinemann
- Subjects
Computer science ,Biomedical Engineering ,Stability (learning theory) ,Soil Science ,01 natural sciences ,Human nose ,medicine ,Artificial neural network ,Electronic nose ,business.industry ,010401 analytical chemistry ,Forestry ,Pattern recognition ,Hedonic tone ,04 agricultural and veterinary sciences ,0104 chemical sciences ,Human assessment ,Identification (information) ,medicine.anatomical_structure ,Odor ,040103 agronomy & agriculture ,0401 agriculture, forestry, and fisheries ,Artificial intelligence ,business ,Agronomy and Crop Science ,Food Science - Abstract
Odor emitted from dairy operations may cause negative reactions by farm neighbors. Identification and evaluation of such malodors is vital for better understanding of human response and methods for mitigating effects of odors. The human nose is a valuable tool for odor assessment, but using human panels can be costly and time-consuming, and human evaluation of odor is subjective. Sensing devices, such as an electronic nose, have been widely used to measure volatile emissions from different materials. The challenge, though, is connecting human assessment of odors with the quantitative measurements from instruments. In this work, a prediction system was designed and developed to use instruments to predict human assessment of odors from common dairy operations. The model targets are the human responses to odor samples evaluated using a general pleasantness scale ranging from -11 (extremely unpleasant) to +11 (extremely pleasant). The model inputs were the electronic nose measurements. Three different neural networks, a Levenberg-Marquardt back-propagation neural network (LMBNN), a scaled conjugate gradient back-propagation neural network (CGBNN), and a resilient back-propagation neural network (RPBNN), were applied to connect these two sources of information (human assessments and instrument measurements). The results showed that the LMBNN model can predict human assessments with accuracy as high as 78% within a 10% range and as high as 63% within a 5% range of the targets in independent validation. In addition, the LMBNN model performed with the best stability in both training and independent validation. Keywords: Animal production, Hedonic tone, Olfactometric models.
- Published
- 2018
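The "accuracy within a 10% range" figure reported in the abstract above can be illustrated with a small sketch. The abstract does not define the range precisely, so this example assumes the tolerance is a fraction of the full span of the pleasantness scale (-11 to +11); the function name `accuracy_within` and the sample values are hypothetical and are not taken from the study.

```python
from typing import Sequence

# Endpoints of the general pleasantness scale quoted in the abstract.
SCALE_MIN, SCALE_MAX = -11.0, 11.0


def accuracy_within(targets: Sequence[float],
                    predictions: Sequence[float],
                    tolerance_fraction: float) -> float:
    """Fraction of predictions that fall within a tolerance of the target.

    Assumption: the tolerance is `tolerance_fraction` of the scale span
    (22 points), e.g. 0.10 -> 2.2 points either side of the target.
    """
    if len(targets) != len(predictions) or not targets:
        raise ValueError("targets and predictions must be non-empty and equal in length")
    tolerance = tolerance_fraction * (SCALE_MAX - SCALE_MIN)
    hits = sum(abs(t - p) <= tolerance for t, p in zip(targets, predictions))
    return hits / len(targets)


# Made-up panel scores and model outputs, for illustration only.
targets = [-6.0, -3.0, 0.0, 4.0]
predictions = [-5.0, -0.5, 1.5, 4.5]

print(accuracy_within(targets, predictions, 0.10))
print(accuracy_within(targets, predictions, 0.05))
```

Tightening the tolerance from 10% to 5% can only keep or lower the score, which matches the direction of the two figures quoted in the abstract.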
31. Structural and functional imaging of aqueous humour outflow: a review
- Author
-
Brian A. Francis, Alex S. Huang, and Robert N. Weinreb
- Subjects
0301 basic medicine ,Functional evaluation ,Intraocular pressure ,genetic structures ,business.industry ,Aqueous humour ,Glaucoma ,medicine.disease ,eye diseases ,Human assessment ,Functional imaging ,03 medical and health sciences ,Ophthalmology ,030104 developmental biology ,0302 clinical medicine ,030221 ophthalmology & optometry ,Medicine ,Optometry ,sense organs ,business ,Neuroscience - Abstract
Maintaining healthy aqueous humour outflow (AHO) is important for intraocular cellular health and stable vision. Impairment of AHO can lead to increased intraocular pressure, optic nerve damage and concomitant glaucoma. An improved understanding of AHO will lead to improved glaucoma surgeries that enhance native AHO as well as facilitate the development of AHO-targeted pharmaceuticals. Recent AHO imaging has evolved to live human assessment and has focused on the structural evaluation of AHO pathways and the functional documentation of fluid flow. Structural AHO evaluation is predominantly driven by optical coherence tomography, and functional evaluation of flow is performed using various methods, including aqueous angiography. Advances in structural and functional evaluation of AHO are reviewed with discussion of strengths, weaknesses and potential future directions.
- Published
- 2017
32. Do cuckoos choose nests of great reed warblers on the basis of host egg appearance?
- Author
-
CHERRY, M. I., BENNETT, A. T. D., and MOSKÁT, C.
- Subjects
- *
CUCKOO behavior , *NESTS , *EGGS , *REED warblers , *SPECTROPHOTOMETRY - Abstract
Prevailing theory assumes cuckoos lay at random among host nests within a population, although it has been suggested that cuckoos could choose large nests and relatively active pairs within host populations. We tested the hypothesis that egg matching could be improved by cuckoos choosing nests in which host eggs more closely match their own, by assessing matching and monitoring nest fate in great reed warblers naturally or experimentally parasitized by eggs of European cuckoos. A positive correlation between cuckoo and host egg visual features suggests that cuckoos do not lay at random within a population, but choose nests and this improves egg matching: naturally parasitized cuckoo eggs were more similar to host eggs as perceived by humans and as measured by spectrophotometry. Our results suggest a hitherto overlooked step in cuckoo–host evolutionary arms races, and have nontrivial implications for the common experimental practice of artificially parasitizing clutches. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
33. Towards a new model of light quality assessment based on occupant satisfaction and lighting glare indices
- Author
-
Zemmouri Noureddine, Barbara Ester Adele Piga, Saadi Mohamed Yacine, Eugenio Morello, and Daich Safa
- Subjects
Engineering ,Statistical regression ,business.industry ,media_common.quotation_subject ,0211 other engineering and technologies ,Glare metrics ,Glare (vision) ,HDR photography ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Human assessment ,Light quality ,Daylighting assessment ,Energy (all) ,Machine learning ,021105 building & construction ,Daylight ,Quality (business) ,business ,Daylighting ,Simulation ,0105 earth and related environmental sciences ,media_common - Abstract
This study looks at the effect of daylighting on human performance. It focuses on glare indices combined with the actual sensation reported by classroom users as a way to assess indoor lighting quality. The main objective of this research is to understand the impact of daylighting from windows on glare sensation and to determine which glare index is closest to human visual sensation under local daylighting conditions in Biskra, Algeria, which has a highly luminous climate. The study used High Dynamic Range (HDR) photography, Evaglare and Aftab Alpha software to calculate two glare metrics, the Daylight Glare Index (DGI) and the Daylight Glare Probability (DGP). A survey was also conducted with 90 occupants under different lighting conditions (different configurations) in a design classroom. In order to link the mathematical model and the human assessment of glare, statistical regression analysis was used. We established a statistically compelling connection between daylighting and student performance.
- Published
- 2017
34. Multimodal Assessment on Teaching Skills in a Virtual Rehearsal Environment
- Author
-
Hung-Hsuan Huang, Toyoaki Nishida, Masato Fukuda, and Kazuhiro Kuwabara
- Subjects
Medical education ,School teachers ,Teaching skills ,Work (electrical) ,Tacit knowledge ,ComputingMilieux_COMPUTERSANDEDUCATION ,Student teacher ,Psychology ,User assessment ,Multimodal interaction ,Human assessment - Abstract
In training programs for student teachers, the opportunity to practice teaching skills is often limited due to the lack of resources for preparing a rehearsal environment. We are developing a virtual rehearsal environment for teaching practice with multiple virtual students. In order to provide feedback to the student teachers and allow them to improve their skills, automatic assessment of their performance is required. However, it is hard to assess teaching because the assessment is subjective and often depends on the tacit knowledge of experienced teacher trainers. In this work, we propose an automatic assessment model based on human assessment done by experienced high school teachers.
- Published
- 2019
35. Observer-independent assessment of psoriasis-affected area using machine learning
- Author
-
N. Meienberger, R. Christen, Thomas Koller, L. Amruthalingam, Florian Anzengruber, Julia-Tatjana Maul, Vahid Djamei, Marc Pouly, and Alexander A. Navarini
- Subjects
Adult ,Observer (quantum physics) ,Adolescent ,Diagnostic accuracy ,Dermatology ,Machine learning ,computer.software_genre ,Severity of Illness Index ,030218 nuclear medicine & medical imaging ,Objective assessment ,Machine Learning ,030207 dermatology & venereal diseases ,03 medical and health sciences ,Young Adult ,0302 clinical medicine ,Psoriasis Area and Severity Index ,Psoriasis ,Photography ,Medicine ,Humans ,Aged ,Retrospective Studies ,Alternative methods ,Aged, 80 and over ,Observer Variation ,business.industry ,Reproducibility of Results ,Middle Aged ,medicine.disease ,Human assessment ,Infectious Diseases ,Computer-aided diagnosis ,Artificial intelligence ,Neural Networks, Computer ,business ,computer - Abstract
BACKGROUND Assessment of psoriasis severity is strongly observer-dependent, and objective assessment tools are largely missing. The increasing number of patients receiving highly expensive therapies that are reimbursed only for moderate-to-severe psoriasis motivates the development of higher quality assessment tools. OBJECTIVE To establish an accurate and objective psoriasis assessment method based on segmenting images by machine learning technology. METHODS In this retrospective, non-interventional, single-centred, interdisciplinary study of diagnostic accuracy, 259 standardized photographs of Caucasian patients were assessed and typical psoriatic lesions were labelled. Two hundred and three of those were used to train and validate an assessment algorithm which was then tested on the remaining 56 photographs. The results of the algorithm assessment were compared with manually marked area, as well as with the affected area determined by trained dermatologists. RESULTS Algorithm assessment achieved accuracy of more than 90% in 77% of the images and differed on average 5.9% from manually marked areas. The difference between algorithm-predicted and photograph-based estimated areas by physicians was 8.1% on average. CONCLUSION The study shows the potential of the evaluated technology. In contrast to the Psoriasis Area and Severity Index (PASI), it allows for objective evaluation and should therefore be developed further as an alternative method to human assessment.
- Published
- 2019
36. Pathologist-level classification of histopathological melanoma images with deep neural networks
- Author
-
Philipp Jansen, Dirk Schadendorf, Jochen Utikal, Christof von Kalle, Carola Berking, Alexander Enk, Cindy Franklin, Stefan Fröhling, Achim Hekler, Titus J. Brinker, Tim Holland-Letz, Joachim Klode, and Dieter Krahl
- Subjects
0301 basic medicine ,Cancer Research ,medicine.medical_specialty ,Pathology ,Skin Neoplasms ,Biopsy ,Medizin ,03 medical and health sciences ,0302 clinical medicine ,Breast cancer ,Deep Learning ,Predictive Value of Tests ,Image Interpretation, Computer-Assisted ,medicine ,Humans ,Diagnosis, Computer-Assisted ,Medical diagnosis ,Melanoma diagnosis ,Melanoma ,Nevus ,Observer Variation ,Microscopy ,business.industry ,Reproducibility of Results ,medicine.disease ,Human assessment ,Pathologists ,030104 developmental biology ,Oncology ,030220 oncology & carcinogenesis ,Deep neural networks ,Histopathology ,Benign nevus ,business - Abstract
Background The diagnosis of most cancers is made by a board-certified pathologist based on a tissue biopsy under the microscope. Recent research reveals a high discordance between individual pathologists. For melanoma, the literature reports 25–26% discordance for classifying a benign nevus versus malignant melanoma. Deep learning was successfully implemented to enhance the precision of lung and breast cancer diagnoses. The aim of this study is to illustrate the potential of deep learning to assist human assessment for a histopathologic melanoma diagnosis. Methods Six hundred ninety-five lesions were classified by an expert histopathologist in accordance with current guidelines (350 nevi and 345 melanomas). Only the haematoxylin and eosin stained (H&E) slides of these lesions were digitalised using a slide scanner and then randomly cropped. Five hundred ninety-five of the resulting images were used for the training of a convolutional neural network (CNN). The additional 100 H&E image sections were used to test the results of the CNN in comparison with the original class labels. Findings The total discordance with the histopathologist was 18% for melanoma (95% confidence interval [CI]: 7.4–28.6%), 20% for nevi (95% CI: 8.9–31.1%) and 19% for the full set of images (95% CI: 11.3–26.7%). Interpretation Even in the worst case, the discordance of the CNN was about the same as the discordance between human pathologists reported in the literature. Despite the vastly reduced amount of data, time necessary for diagnosis and cost compared with the pathologist, our CNN achieved on-par performance. In conclusion, CNNs appear to be a valuable tool to assist human melanoma diagnoses.
- Published
- 2019
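The confidence intervals quoted in the melanoma histopathology abstract above are consistent with a normal-approximation (Wald) interval for a proportion. The following is a minimal sketch only: the helper name `wald_ci` is hypothetical, and the even split of the 100 test images into 50 melanomas and 50 nevi is an assumption, since the abstract does not state the split.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p: observed proportion (e.g. discordance rate), n: sample size,
    z: standard normal quantile (1.96 for a 95% interval).
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Assumed test-set split (not stated in the abstract): 50 melanomas, 50 nevi.
for label, p, n in [("melanoma", 0.18, 50), ("nevi", 0.20, 50), ("all images", 0.19, 100)]:
    low, high = wald_ci(p, n)
    print(f"{label}: {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the three intervals come out at the bounds reported in the abstract, which suggests (but does not prove) that the authors used this kind of interval.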
37. Assessment of Knitted Fabric Smoothness and Softness Based on Paired Comparison
- Author
-
Patsy Perry, Ivana Salopek Čubrić, and Goran Čubrić
- Subjects
Softness ,Secondary education ,Materials science ,Chemistry(all) ,Polymers and Plastics ,General Chemical Engineering ,Paired comparison ,02 engineering and technology ,010402 general chemistry ,01 natural sciences ,Yarn ,Statistics ,Smoothness (probability theory) ,Single factor ,Knitted fabric ,General Chemistry ,Yarn, Knitted fabric, Smoothness, Softness, Evaluators ,021001 nanoscience & nanotechnology ,Smoothness ,0104 chemical sciences ,Human assessment ,Evaluators ,Gender and Education ,Chemical Engineering(all) ,0210 nano-technology - Abstract
The purpose of this study is to conduct an in-depth analysis of customer preferences for single jersey knit fabrics regarding the attributes of smoothness and softness, in order to build up a holistic picture of purchase preferences. A paired comparison test was conducted for the subjective human assessment of single jersey knitted fabrics which were designed to differ in a single factor only. A sample of 140 evaluators was recruited and assigned to seven demographic groups according to age, gender and education level, with 20 evaluators in each group. It was shown that fabrics produced from viscose and polyester, or with the addition of elastane yarn, are perceived as smoother than 100% cotton fabrics without elastane. Regarding the gender groups, a discrepancy is seen between the results obtained for smoothness and softness, especially among the male evaluators. Also, it was shown that evaluators with completed secondary education do not differentiate more strongly in their perception of either attribute, smoothness or softness.
- Published
- 2019
38. A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task
- Author
-
Titus J. Brinker, Achim Hekler, Alexander H. Enk, Joachim Klode, Axel Hauschild, Carola Berking, Bastian Schilling, Sebastian Haferkamp, Dirk Schadendorf, Stefan Fröhling, Jochen S. Utikal, Christof von Kalle, Wiebke Ludwig-Peitsch, Judith Sirokay, Lucie Heinzerling, Magarete Albrecht, Katharina Baratella, Lena Bischof, Eleftheria Chorti, Anna Dith, Christina Drusio, Nina Giese, Emmanouil Gratsias, Klaus Griewank, Sandra Hallasch, Zdenka Hanhart, Saskia Herz, Katja Hohaus, Philipp Jansen, Finja Jockenhöfer, Theodora Kanaki, Sarah Knispel, Katja Leonhard, Anna Martaki, Liliana Matei, Johanna Matull, Alexandra Olischewski, Maximilian Petri, Jan-Malte Placke, Simon Raub, Katrin Salva, Swantje Schlott, Elsa Sody, Nadine Steingrube, Ingo Stoffels, Selma Ugurel, Wiebke Sondermann, Anne Zaremba, Christoffer Gebhardt, Nina Booken, Maria Christolouka, Kristina Buder-Bakhaya, Therezia Bokor-Billmann, Alexander Enk, Patrick Gholam, Holger Hänßle, Martin Salzmann, Sarah Schäfer, Knut Schäkel, Timo Schank, Ann-Sophie Bohne, Sophia Deffaa, Katharina Drerup, Friederike Egberts, Anna-Sophie Erkens, Benjamin Ewald, Sandra Falkvoll, Sascha Gerdes, Viola Harde, Marion Jost, Katja Kosova, Laetitia Messinger, Malte Metzner, Kirsten Morrison, Rogina Motamedi, Anja Pinczker, Anne Rosenthal, Natalie Scheller, Thomas Schwarz, Dora Stölzl, Federieke Thielking, Elena Tomaschewski, Ulrike Wehkamp, Michael Weichenthal, Oliver Wiedow, Claudia Maria Bär, Sophia Bender-Säbelkampf, Marc Horbrügger, Ante Karoglan, Luise Kraas, Jörg Faulhaber, Cyrill Geraud, Ze Guo, Philipp Koch, Miriam Linke, Nolwenn Maurier, Verena Müller, Benjamin Thomas, Jochen Sven Utikal, Ali Saeed M. Alamri, Andrea Baczako, Matthias Betke, Carolin Haas, Daniela Hartmann, Markus V. 
Heppt, Katharina Kilian, Sebastian Krammer, Natalie Lidia Lapczynski, Sebastian Mastnik, Suzan Nasifoglu, Cristel Ruini, Elke Sattler, Max Schlaak, Hans Wolff, Birgit Achatz, Astrid Bergbreiter, Konstantin Drexler, Monika Ettinger, Anna Halupczok, Marie Hegemann, Verena Dinauer, Maria Maagk, Marion Mickler, Biance Philipp, Anna Wilm, Constanze Wittmann, Anja Gesierich, Valerie Glutsch, Katrin Kahlert, Andreas Kerstan, and Philipp Schrüfer
- Subjects
0301 basic medicine ,Cancer Research ,Skin Neoplasms ,Computer science ,Medizin ,Dermoscopy ,Dermatology ,Convolutional neural network ,Sensitivity and Specificity ,03 medical and health sciences ,0302 clinical medicine ,Image Interpretation, Computer-Assisted ,Humans ,Melanoma ,Receiver operating characteristic ,Contextual image classification ,Artificial neural network ,business.industry ,Deep learning ,Pattern recognition ,University hospital ,Human assessment ,030104 developmental biology ,Oncology ,030220 oncology & carcinogenesis ,Artificial intelligence ,Neural Networks, Computer ,business ,Dermatologists - Abstract
Background: Recent studies have demonstrated the use of convolutional neural networks (CNNs) to classify images of melanoma with accuracies comparable to those achieved by board-certified dermatologists. However, the performance of a CNN exclusively trained with dermoscopic images in a clinical image classification task in direct competition with a large number of dermatologists has not been measured to date. This study compares the performance of a convolutional neural network trained exclusively with dermoscopic images to identify melanoma in clinical photographs with the manual grading of the same images by dermatologists. Methods: We compared automatic digital melanoma classification with the performance of 145 dermatologists of 12 German university hospitals. We used methods from enhanced deep learning to train a CNN with 12,378 open-source dermoscopic images. We used 100 clinical images to compare the performance of the CNN to that of the dermatologists. Dermatologists were compared with the deep neural network in terms of sensitivity, specificity and receiver operating characteristics. Findings: The mean sensitivity and specificity achieved by the dermatologists with clinical images were 89.4% (range: 55.0%-100%) and 64.4% (range: 22.5%-92.5%), respectively. At the same sensitivity, the CNN exhibited a mean specificity of 68.2% (range 47.5%-86.25%). Among the dermatologists, the attendings showed the highest mean sensitivity of 92.8% at a mean specificity of 57.7%. With the same high sensitivity of 92.8%, the CNN had a mean specificity of 61.1%. Interpretation: For the first time, dermatologist-level image classification was achieved on a clinical image classification task without training on clinical images. The CNN had a smaller variance of results, indicating a higher robustness of computer vision compared with human assessment for dermatologic image classification tasks. (C) 2019 The Authors. Published by Elsevier Ltd. 
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
- Published
- 2018
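The dermatologist comparison above reports specificity "at the same sensitivity", which means reading the classifier at a matched operating point. The sketch below illustrates that idea under stated assumptions: the function names, the 0/1 label encoding (1 = melanoma) and the threshold sweep are illustrative choices, not the authors' evaluation code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = melanoma, 0 = nevus."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Specificity at the strictest score threshold that reaches the target sensitivity."""
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
        if sensitivity >= target_sensitivity:
            return specificity
    raise ValueError("target sensitivity cannot be reached")


# Toy data (invented for illustration only).
labels = [1, 1, 0, 0, 0]
cnn_scores = [0.9, 0.4, 0.6, 0.2, 0.1]
print(specificity_at_sensitivity(labels, cnn_scores, target_sensitivity=1.0))
```

Matching the operating point in this way lets a continuous classifier score be compared with a group of human raters whose sensitivity is fixed by their own decisions.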
39. 'Tailception': using neural networks for assessing tail lesions on pictures of pig carcasses
- Author
-
C. Veit, Sabine Dippel, Johannes Brünger, and Reinhard Koch
- Subjects
Tail-biting ,Tail ,medicine.medical_specialty ,neural network ,040301 veterinary sciences ,Swine ,abattoir ,Audiology ,Animal Welfare ,SF1-1100 ,0403 veterinary science ,Lesion ,medicine ,Prevalence ,Animals ,Humans ,Bites and Stings ,Mathematics ,Observer Variation ,tail lesions ,Artificial neural network ,0402 animal and dairy science ,Scoring methods ,Reproducibility of Results ,04 agricultural and veterinary sciences ,human assessment ,040201 dairy & animal science ,Animal culture ,Human assessment ,Training material ,Animal Science and Zoology ,medicine.symptom ,Nerve Net ,slaughter pigs ,Abattoirs - Abstract
Tail lesions caused by tail biting are a widespread welfare issue in pig husbandry. Determining their prevalence currently involves labour-intensive, subjective scoring methods. Increased societal interest in tail lesions requires fast, reliable and cheap systems for assessing tail status. In the present study, we aimed to test the reliability of neural networks for assessing tail pictures from carcasses against trained human observers. Three trained observers scored tail lesions from automatically recorded pictures of 13 124 pigs. Nearly all pigs had been tail docked. Tail lesions were classified using a 4-point score (0=no lesion, to 3=severe lesion). In addition, total tail loss was recorded. Agreement between observers was tested prior to and during the assessment in a total of seven inter-observer tests with 80 pictures each. We calculated agreement between observer pairs as exact agreement (%) and prevalence-adjusted bias-adjusted κ (PABAK; value 1=optimal agreement). Out of the 13 124 scored pictures, we used 80% for training and 20% for validating our neural networks. As the position of the tail in the pictures varied (high, low, left, right), we first trained a part detection network to find the tail in the picture and select a rectangular part of the picture which includes the tail. We then trained a classification network to categorise tail lesion severity using pictures scored by human observers whereby the classification network only analysed the selected picture parts. Median exact agreement between the three observers was 80% for tail lesions and 94% for tail loss. Median PABAK for tail lesions and loss were 0.75 and 0.87, respectively. The agreement between classification by the neural network and human observers was 74% for tail lesions and 95% for tail loss. In other words, the agreement between the networks and human observers was very similar to the agreement between human observers. The main reason for disagreement between observers, and thereby higher variation in network training material, was picture quality issues. Therefore, we expect even better results for neural network application to tail lesions if training is based on high quality pictures. Very reliable and repeatable tail lesion assessment from pictures would allow automated tail classification of all pigs slaughtered, which is something that some animal welfare labels would like to do.
- Published
- 2018
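The two agreement measures named in the tail-lesion abstract above can be sketched as follows. This assumes the standard definition PABAK = (k * p_o - 1) / (k - 1) for k score categories and observed exact agreement p_o; the function names are hypothetical, and the abstract does not say exactly how the authors computed or aggregated their values.

```python
def exact_agreement(scores_a, scores_b):
    """Proportion of pictures on which two observers gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


def pabak(p_observed, n_categories):
    """Prevalence-adjusted bias-adjusted kappa for n_categories score levels."""
    return (n_categories * p_observed - 1) / (n_categories - 1)


# Toy scores on the 4-point lesion scale (invented for illustration only).
observer_1 = [0, 1, 2, 3, 0, 1, 0, 2, 1, 0]
observer_2 = [0, 1, 2, 2, 0, 1, 0, 2, 0, 0]
p_o = exact_agreement(observer_1, observer_2)
print(p_o, pabak(p_o, n_categories=4))
```

Applied to the reported median exact agreements (80% on a 4-point score, 94% on the binary tail-loss score), this formula gives values roughly in line with the reported median PABAK figures; an exact match is not expected, because the medians were taken over several observer pairs and tests.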
40. A new procedure, free from human assessment that automatically grades some facial skin structural signs. Comparison with assessments by experts, using referential atlases of skin ageing
- Author
-
Frederic Flament, Esohe Omoyuri, Alex Levinshtein, Elmoznino Eric, Jerome Coquide, Parham Aarabi, Vincent Arcin, He Ma, Irina Kezele, Ruowei Jiang, Junwei Ma, and Jingyi Zhang
- Subjects
Adult ,Aging ,medicine.medical_specialty ,Skin ageing ,Scoring system ,Consensus ,Adolescent ,Pharmaceutical Science ,Black People ,Dermatology ,Audiology ,030226 pharmacology & pharmacy ,Convolutional neural network ,White People ,030207 dermatology & venereal diseases ,03 medical and health sciences ,Young Adult ,0302 clinical medicine ,Colloid and Surface Chemistry ,Atlases as Topic ,Asian People ,Drug Discovery ,medicine ,Photography ,Humans ,Grading (education) ,Aged ,Skin ,Aged, 80 and over ,Facial expression ,Training set ,Middle Aged ,Human assessment ,Skin Aging ,Facial skin ,Chemistry (miscellaneous) ,Face ,Female ,Smartphone ,Psychology - Abstract
To develop an automatic system that grades the severity of facial signs from 'selfie' pictures taken by women of different ages and ethnicities. 1140 women from three ethnic groups (African-American, Asian, Caucasian), of different ages (18-80 years old), took 'selfies' with high-resolution smartphone cameras under different conditions of lighting or facial expressions. Dedicated software was developed, based on a Convolutional Neural Network (CNN) that integrates training data from referential Skin Aging Atlases. The system allows an immediate quantification of the severity of nine facial signs according to the ethnicity declared by the subject. These automatic gradings were compared with those assessed by 12 trained experts and dermatologists, either on 'selfie' pictures or in live conditions on a smaller cohort of women. The system appears weakly influenced by lighting conditions or facial expressions (coefficients of variation ranging 10-13% for most signs) and leads to global agreement with experts' assessments, even showing better reproducibility on some facial signs. This automatic scoring system, still in development, seems to offer a new approach to the quantitative description of facial signs, independent of human vision, in many applications, whether individual, cosmetic or dermatological, with regard to the follow-up of medical anti-ageing corrective strategies.
- Published
- 2018
41. Static analysis of programming exercises: Fairness, usefulness and a method for application
- Author
-
Colin Higgins and Stephen Nutbrown
- Subjects
General Computer Science ,Java ,Computer science ,business.industry ,05 social sciences ,050301 education ,02 engineering and technology ,Static analysis ,Machine learning ,computer.software_genre ,Education ,Human assessment ,Formative assessment ,Summative assessment ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,Abstract syntax tree ,Software engineering ,business ,Grading (education) ,0503 education ,computer ,computer.programming_language ,Coding (social sciences) - Abstract
This article explores the suitability of static analysis techniques based on the abstract syntax tree (AST) for the automated assessment of early/mid degree level programming. Focus is on fairness, timeliness and consistency of grades and feedback. Following investigation into manual marking practices, including a survey of markers, the assessment of 97 student Java programming submissions is automated using static analysis rules. Initially, no correlation between human provided marks and rule violations is found. This paper investigates why, and considers several improvements to the approaches used for applying static analysis rules. New methods for application are explored and the resulting technique is applied to a second exercise with 95 submissions. The results show a stronger positive correlation with manual assessment, whilst retaining advantages in terms of time cost, pedagogic advantages and instant feedback. This study provides insight into the differences between human assessment and st...
- Published
- 2016
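The article above applies AST-based static analysis rules to Java submissions. As a rough illustration of the general idea only, the sketch below uses Python's standard `ast` module on Python source instead of Java. The two rules (too many parameters, bare `except`) and the name `count_violations` are hypothetical and are not taken from the article.

```python
import ast


def count_violations(source, max_params=3):
    """Count violations of two illustrative style rules in Python source code."""
    tree = ast.parse(source)
    violations = 0
    for node in ast.walk(tree):
        # Rule 1 (hypothetical): a function takes more than max_params positional parameters.
        if isinstance(node, ast.FunctionDef) and len(node.args.args) > max_params:
            violations += 1
        # Rule 2 (hypothetical): an exception handler that catches everything.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations += 1
    return violations


submission = """
def area(a, b, c, d):
    try:
        return a * b
    except:
        return 0
"""
print(count_violations(submission))
```

A per-submission violation count like this is the kind of quantity that could then be correlated with marks given by human assessors.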
42. Development of a quantitative morphological assessment of toxicant-treated zebrafish larvae using brightfield imaging and high-content analysis
- Author
-
Nick Radio, Keith A. Houck, Stephanie Padilla, John F. Wambaugh, Samantha Deal, Richard S. Judson, and Shad Mosher
- Subjects
0301 basic medicine ,Training set ,genetic structures ,business.industry ,fungi ,Pattern recognition ,Anatomy ,Biology ,Toxicology ,biology.organism_classification ,Human assessment ,Visual inspection ,03 medical and health sciences ,chemistry.chemical_compound ,030104 developmental biology ,chemistry ,High-content screening ,Visual assessment ,Zebrafish larvae ,Artificial intelligence ,business ,Zebrafish ,Toxicant - Abstract
One of the rate-limiting procedures in a developmental zebrafish screen is the morphological assessment of each larva. Most researchers opt for a time-consuming, structured visual assessment by trained human observer(s). The present studies were designed to develop a more objective, accurate and rapid method for screening zebrafish for dysmorphology. Instead of the very detailed human assessment, we have developed the computational malformation index, which combines the use of high-content imaging with a very brief human visual assessment. Each larva was quickly assessed by a human observer (basic visual assessment), killed, fixed and assessed for dysmorphology with the Zebratox V4 BioApplication using the Cellomics® ArrayScan® V(TI) high-content image analysis platform. The basic visual assessment adds in-life parameters, and the high-content analysis assesses each individual larva for various features (total area, width, spine length, head-tail length, length-width ratio, perimeter-area ratio). In developing the computational malformation index, a training set of hundreds of embryos treated with hundreds of chemicals was visually assessed using the basic or detailed method. In the second phase, we assessed both the stability of these high-content measurements and their performance using a test set of zebrafish treated with a dose range of two reference chemicals (trans-retinoic acid or cadmium). We found the measures were stable for at least 1 week, and comparison of these automated measures to detailed visual inspection of the larvae showed excellent congruence. Our computational malformation index provides an objective manner for rapid phenotypic brightfield assessment of individual larvae in a developmental zebrafish assay. Copyright © 2016 John Wiley & Sons, Ltd.
- Published
- 2016
43. The Experiential Evolution of Physiognomy: Focusing on Choi Hanki’s Human Assessment(測人)
- Author
-
Seo Young Yi
- Subjects
Physiognomy ,Psychology ,Experiential learning ,Epistemology ,Human assessment - Published
- 2015
44. Fuzzy approach for predicting combined effect of variables affecting consumer food preferences
- Author
-
R. Baskar
- Subjects
Computer science ,Fuzzy inference system ,Econometrics ,Semantics ,Affect (psychology) ,Fuzzy logic ,Human assessment - Abstract
Predicting the behaviour of consumers is always a challenge for marketers. Consumers are moved by various factors to make choices of food. This paper provides a framework for predicting consumer food preferences using a fuzzy approach which considers uncertainties in predicting the combined effect of factors which affect consumer food preferences. An advantage of using a fuzzy approach is its power to interpret the semantics of human assessment. In predicting the effect of variables affecting consumer food preferences, three variables which have significant impact were identified, confirming our literature review. A fuzzy inference system (FIS) model is tested on the factors which affect the consumer's food preferences and the results are discussed. Among the factors, the consumer's attitude is found to have a major impact in determining food preferences, and using the fuzzy approach gives an understanding of the intensity of the role of these variables in predicting consumer food preferences. The results are useful for businesses which design products based on consumer food preferences.
- Published
- 2020
45. Reasoning with an uncertainty of information measure: decision making for military and non-military applications
- Author
-
Andre Harrison and Adrienne Raglin
- Subjects
Intelligent agent ,Software ,Risk analysis (engineering) ,business.industry ,Computer science ,Interpretation (philosophy) ,Information measure ,business ,computer.software_genre ,computer ,Abductive reasoning ,Pace ,Human assessment - Abstract
Intelligent agents are devices, software, and simulations that perceive the environment and take actions to achieve a goal through the use of artificial intelligence. These AI agents are increasingly incorporated into every aspect of our lives. This is particularly true for soldiers and analysts as they must increasingly perform tasks in varied, dynamic, and fast-paced operational environments. There is a common idea that, in the future, the pace of operations will increasingly far exceed soldiers’ or analysts’ ability to react to extreme, complex activities. Accelerated decision making in Army operations will rely on AI agents and enabling technologies such as autonomous systems and simulations. However, what happens when the decisions from these AI agents are wrong, produce results contrary to expectations, or simply disagree with a person? Explanations can help resolve these issues. Any errors or uncertainty from the AI agent in an accelerated environment will present unique and unforeseen challenges that may potentially inhibit analysts’ or soldiers’ ability to make decisions effectively and efficiently. Providing explanations for AI outputs, predictions, or behaviors is challenging. Algorithms or techniques frequently obfuscate features and how actions are decided. In addition, results from these systems do not always include uncertainty information related to the factors that influenced the actions or decisions. Therefore, including uncertainty information explicitly in the explanation is necessary. We explore the use of abductive reasoning to provide explanations for situations where an agent's answers are not in line with human assessment or do not provide the uncertainty information needed for human interpretation of the answers. The primary goal of this work is to strengthen the communication of information and increase the effectiveness of interactions between humans and non-human agents.
- Published
- 2018
46. Analysis of two automatic assessment methods using latent semantic analysis (LSA): A new LSA method (Inbuilt Rubric) and a traditional LSA method (Golden Summary) on summaries of expository texts
- Author
-
José Á. Martínez-Huertas, Jessica Moraleda, Adrián Mencu, José León, Ricardo Olmos, Olga Jastrzebska, UAM. Departamento de Psicología Básica, and UAM. Departamento de Psicología Social y Metodología
- Subjects
Social Psychology ,Computer science ,Inbuilt rubric ,lcsh:BF1-990 ,Automatic essay scoring (AES) ,computer.software_genre ,050105 experimental psychology ,Lexical descriptors ,LSA ,Similarity (psychology) ,Developmental and Educational Psychology ,0501 psychology and cognitive sciences ,Summaries ,Latent semantic analysis ,business.industry ,05 social sciences ,050301 education ,Rubric ,Psicología ,Human assessment ,lcsh:Psychology ,Assessment methods ,lcsh:B ,Artificial intelligence ,business ,lcsh:Philosophy. Psychology. Religion ,0503 education ,computer ,Natural language processing - Abstract
The purpose of this study was to compare two automatic assessment methods using Latent Semantic Analysis (LSA): a novel LSA assessment method (Inbuilt Rubric) and a traditional LSA method (Golden Summary). Two conditions were analyzed using the Inbuilt Rubric method: the number of lexical descriptors needed to better accommodate an expert rubric (few vs. many) and a weighting function to penalize off-topic content included in the student summaries (weighted vs. non-weighted). One hundred and sixty-six students divided into two different samples (81 undergraduates and 85 high school students) took part in this study. Students summarized two expository texts that differed in complexity (complex/easy) and length (1,300/500 words). Results showed that the Inbuilt Rubric method simulates human assessment better than Golden Summary in all cases. The similarity with human assessment was higher for Inbuilt Rubric (r = .78 and r = .79) than for Golden Summary (r = .67 and r = .47) in both texts. Moreover, the Inbuilt Rubric method accommodated an expert rubric better when using few descriptors and the weighting function. This study was supported by Grant PSI2013-47219-P from the Ministry of Economy and Competitiveness (MINECO) of Spain, and the European Union.
- Published
- 2018
47. Handfeel of Single Jersey Fabrics as Assessed by a New Physical Method
- Author
-
M. Abu-Rous, J. Innerlohinger, Benny Malengier, and E. Liftinger
- Subjects
Polyester ,Cellulose fiber ,Materials science ,parasitic diseases ,technology, industry, and agriculture ,Lyocell ,Viscose ,Composite material ,Human assessment - Abstract
Handfeel of fabrics made of cotton, polyester and the wood-based cellulose fibers lyocell, modal and viscose was assessed by Fabric Touch Tester (FTT), Tissue Softness Analyzer (TSA), ring pullthrough and PhabrOmeter® and compared with human handfeel ranking. Additionally, the effect of repeated washing and drying on fabric handfeel was investigated by TSA. TSA ranking of softness and smoothness corresponded to the rankings by other direct physical methods as well as to human handfeel. Fabrics made from wood-based cellulosic fibers, especially modal types, showed better handfeel results than cotton even after repeated washing cycles. A divergence between physical and human assessment was observed for polyester.
- Published
- 2018
48. Four Keys to Topic Interpretability in Topic Modeling
- Author
-
Andrey Filchenkov, Sergei Koltcov, and Andrey Mavrin
- Subjects
Topic model ,Measure (data warehouse) ,business.industry ,Probability estimation ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,Space (commercial competition) ,Machine learning ,computer.software_genre ,Human assessment ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Function (engineering) ,computer ,media_common ,Parametric statistics ,Interpretability - Abstract
Interpretability of topics built by topic modeling is an important issue for researchers applying this technique. We suggest a new interpretability score, which we select from an interpretability score parametric space defined by four components: a splitting method, a probability estimation method, a confirmation measure and an aggregation function. We designed a regularizer for topic modeling representing this score. The resulting topic modeling method shows significant superiority to all analogs in reflecting human assessments of topic interpretability.
- Published
- 2018
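The four components listed in the topic-interpretability abstract above (a splitting method, a probability estimation method, a confirmation measure and an aggregation function) can be made concrete with one possible instantiation. The sketch below is an assumption for illustration, not the score selected by the authors: it splits the top words into pairs, estimates probabilities from document co-occurrence, uses normalised pointwise mutual information (NPMI) as the confirmation measure, and aggregates with the arithmetic mean. The function name `topic_coherence` is hypothetical.

```python
import math
from itertools import combinations


def topic_coherence(top_words, documents):
    """Mean pairwise NPMI of a topic's top words over a tokenised corpus."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def probability(*words):
        # Probability estimation: fraction of documents containing all given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words)) / n_docs

    scores = []
    # Splitting: every unordered pair of top words.
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = probability(w1), probability(w2), probability(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI by convention
        elif p12 == 1:
            scores.append(0.0)   # both words in every document: NPMI undefined, use 0
        else:
            # Confirmation measure: NPMI = PMI / -log P(w1, w2).
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    # Aggregation: arithmetic mean over all pairs.
    return sum(scores) / len(scores)


corpus = [["cat", "dog"], ["cat", "dog"], ["car", "road"], ["car", "road"]]
print(topic_coherence(["cat", "dog"], corpus), topic_coherence(["cat", "car"], corpus))
```

Swapping any one of the four choices (for example, a sliding-window probability estimate or a median aggregation) yields a different point in the parametric space the abstract describes.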
49. Visual and Semiquantitative Accuracy in Clinical Baseline 123I-Ioflupane SPECT/CT Imaging
- Author
-
Constantin Lapa, Jeffrey P. Leal, Rudolf A. Werner, Takahiro Higuchi, Lilja Solnes, Yong Du, Andreas K. Buck, Mehrbod S. Javadi, Sara Sheikhbahaei, Charles Marcus, and Steven P. Rowe
- Subjects
Adult ,Male ,Single Photon Emission Computed Tomography Computed Tomography ,Adolescent ,Nortropanes ,Concordance ,030218 nuclear medicine & medical imaging ,03 medical and health sciences ,0302 clinical medicine ,123I-Ioflupane ,Parkinsonian Disorders ,Region of interest ,Image Interpretation, Computer-Assisted ,Humans ,Medicine ,Radiology, Nuclear Medicine and imaging ,ddc:610 ,Aged ,Aged, 80 and over ,business.industry ,Significant difference ,General Medicine ,Middle Aged ,Human assessment ,Semiquantitative Method ,030220 oncology & carcinogenesis ,Clinical diagnosis ,SPECT ,Female ,Radiopharmaceuticals ,Ct imaging ,Nuclear medicine ,business - Abstract
PURPOSE: We aimed to (a) elucidate the concordance of visual assessment of an initial 123I-ioflupane scan by a human interpreter with results from a fully automatic semiquantitative method and (b) assess the accuracy of both compared to the follow-up (f/u) diagnosis established by movement disorder specialists. METHODS: An initial 123I-ioflupane scan was performed in 382 patients with clinically uncertain Parkinsonian syndrome. An experienced reader performed a visual evaluation of all scans independently. The findings of the visual read were compared with semiquantitative evaluation. In addition, available f/u clinical diagnosis (serving as a reference standard) was compared with results of the human read and the software. RESULTS: When comparing the semiquantitative method with the visual assessment, discordance was found in 25 (6.5%) of 382 cases for the experienced reader (κ = 0.868). The human observer indicated region of interest misalignment as the main reason for discordance. With neurology f/u serving as reference, the results of the reader revealed a slightly higher accuracy rate (87.7%, κ = 0.75) compared to semiquantification (86.2%, κ = 0.719, P < 0.001, respectively). No significant difference in the diagnostic performance of the visual read versus software-based assessment was found. CONCLUSIONS: In comparison with a fully automatic semiquantitative method in 123I-ioflupane interpretation, human assessment obtained an almost perfect agreement rate. However, compared to the clinically established diagnosis serving as a reference, visual read seemed to be slightly more accurate than a solely software-based quantitative assessment.
- Published
- 2018
50. Development of an Automatic Testing Platform for Aviator’s Night Vision Goggle Honeycomb Defect Inspection
- Author
-
Chao-Chung Peng and Bo-Lin Jian
- Subjects
Engineering ,defect detection ,night vision goggles ,military avionics systems ,auto focus ,passive focusing ,02 engineering and technology ,lcsh:Chemical technology ,Biochemistry ,Article ,Analytical Chemistry ,law.invention ,law ,Night vision ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,lcsh:TP1-1185 ,Electrical and Electronic Engineering ,Instrumentation ,Automatic testing ,Autofocus ,business.industry ,Process (computing) ,Honeycomb (geometry) ,020206 networking & telecommunications ,Atomic and Molecular Physics, and Optics ,Human assessment ,020201 artificial intelligence & image processing ,Aerial reconnaissance ,Artificial intelligence ,business ,Night vision device - Abstract
Due to the direct influence of night vision equipment availability on the safety of night-time aerial reconnaissance, maintenance needs to be carried out regularly. Unfortunately, some defects are not easy to observe or are not even detectable by human eyes. As a consequence, this study proposed a novel automatic defect detection system for aviator's night vision imaging systems AN/AVS-6(V)1 and AN/AVS-6(V)2. An auto-focusing process consisting of a sharpness calculation and a gradient-based variable step search method is applied to achieve an automatic detection system for honeycomb defects. This work also developed a test platform for sharpness measurement. It demonstrates that the honeycomb defects can be precisely recognized and the number of the defects can also be determined automatically during the inspection. Most importantly, the proposed approach significantly reduces the time consumption, as well as human assessment error during the night vision goggle inspection procedures.
- Published
- 2017
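The auto-focusing process in the night vision goggle abstract above combines a sharpness calculation with a gradient-based variable step search. The sketch below shows one plausible reading of that combination and is not the authors' algorithm: sharpness is taken as the mean squared pixel gradient, and the search reverses direction and halves its step whenever sharpness stops improving. All names, the step rule and the toy sharpness curve are assumptions.

```python
def gradient_sharpness(image):
    """Mean squared horizontal and vertical pixel difference of a 2D grey-level image."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = image[r][c + 1] - image[r][c]
            dy = image[r + 1][c] - image[r][c]
            total += dx * dx + dy * dy
    return total / ((rows - 1) * (cols - 1))


def variable_step_search(sharpness_at, start, step, min_step=1.0, max_evaluations=200):
    """Climb towards the focus position with the highest sharpness.

    sharpness_at: callable mapping a focus position to a sharpness value.
    The step is reversed and halved each time a move fails to improve sharpness.
    """
    position = start
    best = sharpness_at(position)
    for _ in range(max_evaluations):
        if abs(step) < min_step:
            break
        candidate = position + step
        score = sharpness_at(candidate)
        if score > best:
            position, best = candidate, score
        else:
            step = -step / 2
    return position


# Toy sharpness curve with a single peak at focus position 37 (invented for illustration).
print(variable_step_search(lambda x: -(x - 37) ** 2, start=0, step=16))
```

In a real inspection platform, `sharpness_at` would move the lens to the given position, capture a frame and return `gradient_sharpness` of that frame; the large initial step keeps the number of captures low, and the shrinking step refines the final position.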