Search Results
2,281 results
3. Extracting accurate materials data from research papers with conversational language models and prompt engineering.
- Author
- Polak, Maciej P. and Morgan, Dane
- Subjects
- LANGUAGE models, COLLOQUIAL language, ENGINEERING models, DATA extraction, METALLIC glasses, GENERATIVE pre-trained transformers, NATURAL language processing
- Abstract
There has been a growing effort to replace manual extraction of data from research papers with automated data extraction based on natural language processing, language models, and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this work, we propose the ChatExtract method that can fully automate very accurate data extraction with minimal initial effort and background, using an advanced conversational LLM. ChatExtract consists of a set of engineered prompts applied to a conversational LLM that both identify sentences with data, extract that data, and assure the data's correctness through a series of follow-up questions. These follow-up questions largely overcome known issues with LLMs providing factually inaccurate responses. ChatExtract can be applied with any conversational LLMs and yields very high quality data extraction. In tests on materials data, we find precision and recall both close to 90% from the best conversational LLMs, like GPT-4. We demonstrate that the exceptional performance is enabled by the information retention in a conversational model combined with purposeful redundancy and introducing uncertainty through follow-up prompts. These results suggest that approaches similar to ChatExtract, due to their simplicity, transferability, and accuracy are likely to become powerful tools for data extraction in the near future. Finally, databases for critical cooling rates of metallic glasses and yield strengths of high entropy alloys are developed using ChatExtract. Efficient data extraction from research papers accelerates science and engineering. Here, the authors develop an automated approach which uses conversational large language models to achieve high precision and recall in extracting materials data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
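The ChatExtract abstract above describes a chain of engineered prompts: classify a sentence as data-bearing, extract a (material, value, unit) triplet, then ask purposefully redundant follow-up questions that let the model express uncertainty, discarding any answer that hedges or contradicts itself. A minimal sketch of that flow is below; the prompt wording and helper names are illustrative assumptions, not the paper's exact prompts, and `ask` stands in for any conversational LLM call.

```python
# Illustrative ChatExtract-style flow (prompts are hypothetical, not the
# paper's exact wording). `ask` sends one prompt to a conversational LLM
# within a running dialogue and returns its reply.
from typing import Callable, Optional

def chatextract(sentence: str, ask: Callable[[str], str]) -> Optional[dict]:
    # Step 1: classify -- does the sentence report a value of the target property?
    if ask(f'Does this sentence report a yield strength value? '
           f'Answer Yes or No. Sentence: "{sentence}"').strip().lower() != "yes":
        return None
    # Step 2: extract the triplet one question at a time, explicitly
    # allowing the model to express uncertainty.
    material = ask("What material does the value refer to? If unsure, say 'unclear'.")
    value = ask("What is the numerical value? If unsure, say 'unclear'.")
    unit = ask("What is the unit of the value? If unsure, say 'unclear'.")
    # Step 3: purposefully redundant follow-up; discard the datapoint if the
    # model hedges or contradicts its earlier answers.
    confirm = ask(f"Are you certain that {value} {unit} is the yield strength "
                  f"of {material}? Answer Yes or No.")
    if "unclear" in (material.lower(), value.lower(), unit.lower()) \
            or confirm.strip().lower() != "yes":
        return None
    return {"material": material, "value": value, "unit": unit}
```

In practice each `ask` call would append to the same conversation, since the abstract credits the method's accuracy partly to the information retained across turns.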
4. Nature Communications from the point of view of our very first authors.
- Subjects
- MATERIALS science, DNA damage, PERIODICAL publishing, AUTHORS
- Abstract
On the 12th of April 2010, Nature Communications published its first editorial and primary research articles. The topics of these first 11 papers represented the multidisciplinary nature of the journal: from DNA damage to optics alongside material science to energy and including polymer chemistry. We have spoken with the corresponding authors of some of these very first papers and asked them about their experience of publishing in this then new journal and how they see Nature Communications now. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Rapid and visual identification of β-lactamase subtypes for precision antibiotic therapy.
- Author
- Li, Wenshuai, Li, Jingqi, Xu, Hua, Gao, Hongmei, and Liu, Dingbin
- Subjects
- ANTIBIOTICS, BACTERIAL diseases, DRUG resistance in bacteria, POINT-of-care testing, SENSITIVITY & specificity (Statistics)
- Abstract
The abuse of antibiotics urgently requires rapid identification of drug-resistant bacteria at the point of care (POC). Here we report a visual paper sensor that allows rapid (0.25-3 h) discrimination of the subtypes of β-lactamase (the major cause of bacterial resistance) for precision antibiotic therapy. The sensor exhibits high performance in identifying antibiotic-resistant bacteria with 100 real samples from patients with diverse bacterial infections, demonstrating 100% clinical sensitivity and specificity. Further, this sensor can enhance the accuracy of antibiotic use from 48% empirically to 83%, and further from 50.6% to 97.6% after eliminating fungal infection cases. Our work provides a POC testing platform for guiding effective management of bacterial infections in both hospital and community settings. The rapid identification of drug-resistant bacteria is vital for effective treatment and to avoid antibiotic misuse. Here authors report a paper-based sensor which utilises chromogenic carbapenem and cephalosporin substrates for the identification and discrimination of β-lactamase subtypes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Data encoding for healthcare data democratization and information leakage prevention.
- Author
- Thakur, Anshul, Zhu, Tingting, Abrol, Vinayak, Armstrong, Jacob, Wang, Yujiang, and Clifton, David A.
- Subjects
- DEEP learning, DEMOCRATIZATION, ENCODING, LEAKAGE, MEDICAL care
- Abstract
The lack of data democratization and information leakage from trained models hinder the development and acceptance of robust deep learning-based healthcare solutions. This paper argues that irreversible data encoding can provide an effective solution to achieve data democratization without violating the privacy constraints imposed on healthcare data and clinical models. An ideal encoding framework transforms the data into a new space where it is imperceptible to a manual or computational inspection. However, encoded data should preserve the semantics of the original data such that deep learning models can be trained effectively. This paper hypothesizes the characteristics of the desired encoding framework and then exploits random projections and random quantum encoding to realize this framework for dense and longitudinal or time-series data. Experimental evaluation highlights that models trained on encoded time-series data effectively uphold the information bottleneck principle and hence, exhibit lesser information leakage from trained models. Healthcare data democratization is often hampered by privacy constraints governing the sensitive healthcare data. Here, the authors show that encoding healthcare data could be a potential solution for achieving healthcare democratization within the context of deep learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
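The encoding abstract above names random projections as one of its two mechanisms (the random quantum encoding is not sketched here). The idea can be illustrated with a Gaussian random projection: if the projection matrix is kept secret and discarded, the mapping is irreversible to a reader of the encoded data, while pairwise geometry, and hence trainability, is approximately preserved (the Johnson-Lindenstrauss property). All dimensions and the seed below are illustrative assumptions.

```python
# Minimal sketch of irreversible encoding by Gaussian random projection.
# This is generic background for the technique named in the abstract,
# not the authors' implementation.
import math
import random

def random_projection_matrix(d_in: int, d_out: int, seed: int) -> list:
    """Secret d_out x d_in Gaussian matrix; scaling by 1/sqrt(d_out)
    roughly preserves vector norms in expectation."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / math.sqrt(d_out) for _ in range(d_in)]
            for _ in range(d_out)]

def encode(x: list, P: list) -> list:
    """Linear projection of a record x into the encoded space."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]
```

A model would then be trained directly on `encode(x, P)` vectors, with `P` never shared alongside the data.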
7. Multi-site integrated optical addressing of trapped ions.
- Author
- Kwon, Joonhyuk, Setzer, William J., Gehl, Michael, Karl, Nicholas, Van Der Wall, Jay, Law, Ryan, Blain, Matthew G., Stick, Daniel, and McGuinness, Hayden J.
- Abstract
One of the most effective ways to advance the performance of quantum computers and quantum sensors is to increase the number of qubits or quantum resources in the system. A major technical challenge that must be solved to realize this goal for trapped-ion systems is scaling the delivery of optical signals to many individual ions. In this paper we demonstrate an approach employing waveguides and multi-mode interferometer splitters to optically address multiple ¹⁷¹Yb⁺ ions in a surface trap by delivering all wavelengths required for full qubit control. Measurements of hyperfine spectra and Rabi flopping were performed on the E2 clock transition, using integrated waveguides for delivering the light needed for Doppler cooling, state preparation, coherent operations, and detection. We describe the use of splitters to address multiple ions using a single optical input per wavelength and use them to demonstrate simultaneous Rabi flopping on two different transitions occurring at distinct trap sites. This work represents an important step towards the realization of scalable integrated photonics for atomic clocks and trapped-ion quantum information systems. A promising strategy for scaling trapped-ion-based quantum technologies is to use fully integrated optical waveguides to deliver light to numerous ions at multiple sites. Here, the authors optically address three ions using on-chip waveguides to deliver three distinct wavelengths per ion, and perform Rabi flopping on each ion simultaneously. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
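For reference, the Rabi flopping measured in the entry above follows the textbook two-level result (standard physics background, not a result from the paper): under resonant driving with Rabi frequency $\Omega$, and more generally at detuning $\Delta$, the excited-state probability oscillates as

```latex
P_e(t) = \sin^2\!\left(\frac{\Omega t}{2}\right),
\qquad
P_e(t) = \frac{\Omega^2}{\Omega^2 + \Delta^2}\,
\sin^2\!\left(\frac{\sqrt{\Omega^2 + \Delta^2}\; t}{2}\right).
```

Fitting these oscillations is how the delivered optical power and coherent control fidelity at each trap site are characterized.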
8. A sparse quantized hopfield network for online-continual memory.
- Author
- Alonso, Nicholas and Krichmar, Jeffrey L.
- Abstract
An important difference between brains and deep neural networks is the way they learn. Nervous systems learn online where a stream of noisy data points are presented in a non-independent, identically distributed way. Further, synaptic plasticity in the brain depends only on information local to synapses. Deep networks, on the other hand, typically use non-local learning algorithms and are trained in an offline, non-noisy, independent, identically distributed setting. Understanding how neural networks learn under the same constraints as the brain is an open problem for neuroscience and neuromorphic computing. A standard approach to this problem has yet to be established. In this paper, we propose that discrete graphical models that learn via an online maximum a posteriori learning algorithm could provide such an approach. We implement this kind of model in a neural network called the Sparse Quantized Hopfield Network. We show our model outperforms state-of-the-art neural networks on associative memory tasks, outperforms these networks in online, continual settings, learns efficiently with noisy inputs, and is better than baselines on an episodic memory task. Brains and neuromorphic systems learn with local learning rules in online-continual learning scenarios. Designing neural networks that learn effectively under these conditions is challenging. The authors introduce a neural network that implements an effective, principled approach to local, online-continual learning on associative memory tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Higher emissions scenarios lead to more extreme flooding in the United States.
- Author
- Kim, Hanbeen and Villarini, Gabriele
- Abstract
Understanding projected changes in flooding across the contiguous United States (CONUS) helps increase our capability to adapt to and mitigate against this hazard. Here, we assess future changes in flooding across CONUS using outputs from 28 global climate models and four scenarios of the Coupled Model Intercomparison Project Phase 6. We find that CONUS is projected to experience an overall increase in flooding, especially under higher emission scenarios; there are subregional differences, with the Northeast and Southeast (Great Plains of the North and Southwest) showing higher tendency towards increasing (decreasing) flooding due to changes in flood processes at the seasonal scale. Moreover, even though trends may not be detected in the historical period, these projected future trends highlight the current needs for incorporating climate change in the future infrastructure designs and management of the water resources. This paper assesses future changes in flood magnitude across the conterminous United States based on multiple climate change scenarios. The results suggest that annual maximum peak discharge is projected to become more extreme under higher emission scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Cross-modality mapping using image varifolds to align tissue-scale atlases to molecular-scale measures with application to 2D brain sections.
- Author
- Stouffer, Kaitlin M., Trouvé, Alain, Younes, Laurent, Kunst, Michael, Ng, Lydia, Zeng, Hongkui, Anant, Manjari, Fan, Jean, Kim, Yongsoo, Chen, Xiaoyin, Rue, Mara, and Miller, Michael I.
- Subjects
- TECHNOLOGICAL innovations, TRANSCRIPTOMES, THEORY of distributions (Functional analysis), RANDOM fields, IMAGE representation
- Abstract
This paper explicates a solution to building correspondences between molecular-scale transcriptomics and tissue-scale atlases. This problem arises in atlas construction and cross-specimen/technology alignment where specimens per emerging technology remain sparse and conventional image representations cannot efficiently model the high dimensions from subcellular detection of thousands of genes. We address these challenges by representing spatial transcriptomics data as generalized functions encoding position and high-dimensional feature (gene, cell type) identity. We map onto low-dimensional atlas ontologies by modeling regions as homogeneous random fields with unknown transcriptomic feature distribution. We solve simultaneously for the minimizing geodesic diffeomorphism of coordinates through LDDMM and for these latent feature densities. We map tissue-scale mouse brain atlases to gene-based and cell-based transcriptomics data from MERFISH and BARseq technologies and to histopathology and cross-species atlases to illustrate integration of diverse molecular and cellular datasets into a single coordinate system as a means of comparison and further atlas construction. Omics data's diversity and high-dimensionality challenge integration across technologies and with imaging. Here, authors introduce mapping method xIV-LDDMM that estimates geometric and feature transformations to integrate tissue-scale atlases with molecular and cellular-scale data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
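The LDDMM step named in the abstract above has, in its standard form, a well-known variational statement (shown here as general background; the paper's variant additionally estimates latent transcriptomic feature densities alongside the deformation): find a time-varying velocity field $v$ whose flow $\varphi^v$ deforms the moving measure $\mu$ onto the target $\nu$, trading smoothness against a matching discrepancy $D$,

```latex
E(v) = \int_0^1 \lVert v_t \rVert_V^2 \, dt
\;+\; \lambda\, D\!\big(\varphi_1^v \cdot \mu,\ \nu\big),
\qquad
\dot{\varphi}_t^v = v_t \circ \varphi_t^v,\quad \varphi_0^v = \mathrm{id}.
```

The minimizing flow is a geodesic diffeomorphism, which is why the abstract speaks of solving "for the minimizing geodesic diffeomorphism of coordinates."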
Discovery Service for Jio Institute Digital Library