Author: "Smith, Matthew Beauregard" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Smith, Matthew Beauregard"' showing total 5 results

1. Estimating error rates for single molecule protein sequencing experiments.

Author: Smith, Matthew Beauregard, VanderVelden, Kent, Blom, Thomas, Stout, Heather D., Mapes, James H., Folsom, Tucker M., Martin, Christopher, Bardo, Angela M., and Marcotte, Edward M.
Subjects: *AMINO acid sequence, *ERROR rates, *SINGLE molecules, *EXPECTATION-maximization algorithms, *STANDARD deviations, *FLUOROPOLYMERS, *HIDDEN Markov models, *CHEMICAL sample preparation
Abstract: The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM) based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
Author summary: Diverse new technologies are being developed for single-molecule protein sequencing, capable of identifying and quantifying mixtures of proteins at the level of individual molecules. There are many biochemical challenges intrinsic to high-throughput studies of proteins at such high sensitivity arising from their heterogeneous chemistries, sizes, and abundances. Beyond these challenges, the technologies themselves involve complex multi-step analytical processes. Thus, in developing and optimizing these technologies, it is important to consider the accuracy of each step and to have reliable approaches for estimating these accuracies. We focus on one particular single-molecule sequencing technology known as fluorosequencing. We report and validate two methods for simultaneously determining the error rates of each of the various steps of the fluorosequencing process. These new error estimation techniques will help researchers to better interpret the effects of changes to the chemistry and sample preparation used in fluorosequencing so that these steps can be improved. Further, more accurate determination of error rates will aid in the creation of better tools for the interpretation of this data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
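
The abstract above describes a Baum-Welch variant that estimates a few shared parameters instead of every transition probability. A minimal sketch of that idea follows, under strong simplifying assumptions that are not taken from the paper: the hidden state is only the number of dyes still attached, the single unknown is a per-cycle dye-loss probability, the emission is Gaussian with known mean and spread, and every read starts with all dyes present. All function names and numeric values are hypothetical, and this is not the whatprot implementation.

```python
import math
import random

def binom_pmf(k, n, p):
    """Probability that k of n dyes survive a cycle when each is lost with probability p."""
    return math.comb(n, k) * (1 - p) ** k * p ** (n - k)

def gauss_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def simulate_read(n_dyes, n_cycles, p_loss, mu, sigma, rng):
    """Toy intensity trace: intensity = (dyes remaining) * mu + Gaussian noise."""
    k, read = n_dyes, []
    for _ in range(n_cycles):
        read.append(rng.gauss(k * mu, sigma))
        k = sum(rng.random() >= p_loss for _ in range(k))  # each dye survives independently
    return read

def em_step(reads, n_dyes, p, mu, sigma):
    """One EM iteration: forward-backward E-step, then one tied-parameter M-step."""
    states = range(n_dyes + 1)
    # Every transition i -> j is a function of the single parameter p.
    trans = [[binom_pmf(j, i, p) if j <= i else 0.0 for j in states] for i in states]
    lost = at_risk = 0.0
    for read in reads:
        T = len(read)
        emit = [[gauss_pdf(x, k * mu, sigma) for k in states] for x in read]
        # Forward pass; assumption: all dyes are present at the first cycle.
        alpha = [[emit[0][k] if k == n_dyes else 0.0 for k in states]]
        for t in range(1, T):
            alpha.append([emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in states)
                          for j in states])
        # Backward pass.
        beta = [[1.0] * (n_dyes + 1) for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in states)
                       for i in states]
        likelihood = sum(alpha[T - 1])
        # Pool expected dye losses over all states and cycles into two counts.
        for t in range(T - 1):
            for i in states:
                for j in range(i + 1):
                    xi = alpha[t][i] * trans[i][j] * emit[t + 1][j] * beta[t + 1][j] / likelihood
                    lost += xi * (i - j)
                    at_risk += xi * i
    return lost / at_risk  # maximum-likelihood update for the shared loss rate

def fit_p_loss(reads, n_dyes, mu, sigma, p_init=0.5, n_iter=15):
    p = p_init
    for _ in range(n_iter):
        p = em_step(reads, n_dyes, p, mu, sigma)
    return p

rng = random.Random(0)
reads = [simulate_read(2, 8, 0.1, 1.0, 0.2, rng) for _ in range(300)]
p_hat = fit_p_loss(reads, n_dyes=2, mu=1.0, sigma=0.2)
print(f"estimated per-cycle dye loss: {p_hat:.3f}")
```

Because the M-step pools expected counts from every transition into a single ratio, the number of fitted values stays at one no matter how many hidden states the model has, which is the property the abstract highlights.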

2. Estimating error rates for single molecule protein sequencing experiments

Author: Smith, Matthew Beauregard, primary, VanderVelden, Kent, additional, Blom, Thomas, additional, Stout, Heather D, additional, Mapes, James H, additional, Folsom, Tucker M, additional, Martin, Christopher, additional, Bardo, Angela M, additional, and Marcotte, Edward M., additional
Published: 2023
Full Text: View/download PDF

3. Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier

Author: Smith, Matthew Beauregard, primary, Simpson, Zack Booth, additional, and Marcotte, Edward M., additional
Published: 2022
Full Text: View/download PDF

4. Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier.

Author: Smith, Matthew Beauregard, Simpson, Zack Booth, and Marcotte, Edward M.
Subjects: *SINGLE molecules, *AMINO acid sequence, *TANDEM mass spectrometry, *CHEMICAL processes, *PEPTIDES, *INTERNET servers
Abstract: We present a machine learning-based interpretive framework (whatprot) for analyzing single molecule protein sequencing data produced by fluorosequencing, a recently developed proteomics technology that determines sparse amino acid sequences for many individual peptide molecules in a highly parallelized fashion. Whatprot uses Hidden Markov Models (HMMs) to represent the states of each peptide undergoing the various chemical processes during fluorosequencing, and applies these in a Bayesian classifier, in combination with pre-filtering by a k-Nearest Neighbors (kNN) classifier trained on large volumes of simulated fluorosequencing data. We have found that by combining the HMM based Bayesian classifier with the kNN pre-filter, we are able to retain the benefits of both, achieving both tractable runtimes and acceptable precision and recall for identifying peptides and their parent proteins from complex mixtures, outperforming the capabilities of either classifier on its own. Whatprot's hybrid kNN-HMM approach enables the efficient interpretation of fluorosequencing data using a full proteome reference database and should now also enable improved sequencing error rate estimates.
Author summary: Scientists often wish to know which proteins, and at what quantities, are present in a sample. The field of proteomics offers a number of technologies that aid in this, such as tandem mass spectrometry and immunoassays, that provide different tradeoffs between sensitivity, throughput, and generality. One new technology, known as fluorosequencing, detects and provides partial sequences for individual peptide or protein molecules from a sample in a highly parallelized fashion. However, as only partial sequences are measured, the resulting sequencing reads must be matched to a reference database of possible proteins, such as might be obtained from the human genome. We describe a suitable computer algorithm for performing this matching of fluorosequencing reads to a reference database while accounting for the most prevalent types of sequencing errors. We detail its performance and implementation, and describe a number of uncommon algorithmic improvements and approximations which allow this approach to scale to classification against the whole human proteome. The resulting software, known as whatprot, allows researchers to interpret fluorosequencing reads and better apply this emergent single molecule protein sequencing technology. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
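
The two-stage design described in the abstract above (a kNN pre-filter trained on simulated reads, followed by Bayesian scoring of the shortlisted candidates) can be illustrated with a small sketch. It is not whatprot's code: the HMM likelihood is replaced here by a much simpler independent-Gaussian score, the prior over peptides is assumed uniform, and the peptide names, dye positions, and noise level are invented for the example.

```python
import math
import random

def ideal_trace(dye_positions, n_cycles):
    """Noise-free dye count per cycle: a dye at 1-indexed position pos is gone after pos cycles."""
    return [sum(1 for pos in dye_positions if pos > t) for t in range(n_cycles)]

def noisy_read(trace, sigma, rng):
    return [rng.gauss(x, sigma) for x in trace]

def log_likelihood(read, trace, sigma):
    """Stand-in for an HMM forward score: independent Gaussian noise around the ideal trace.
    The normalising constant is dropped because it is the same for every candidate."""
    return sum(-0.5 * ((x - m) / sigma) ** 2 for x, m in zip(read, trace))

def classify(read, peptides, training, k, sigma):
    # Stage 1: kNN pre-filter. Keep only peptides that label one of the k nearest simulated reads.
    nearest = sorted(training, key=lambda item: math.dist(read, item[0]))[:k]
    candidates = {label for _, label in nearest}
    # Stage 2: Bayesian scoring restricted to the shortlist (uniform prior assumed).
    scores = {c: log_likelihood(read, peptides[c], sigma) for c in candidates}
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}  # subtract max for stability
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

rng = random.Random(1)
n_cycles, sigma = 6, 0.15
peptides = {
    "pepA": ideal_trace((1, 4), n_cycles),
    "pepB": ideal_trace((2, 5), n_cycles),
    "pepC": ideal_trace((3,), n_cycles),
}
# Simulated training reads play the role of the kNN reference set.
training = [(noisy_read(trace, sigma, rng), name)
            for name, trace in peptides.items() for _ in range(50)]
read = noisy_read(peptides["pepB"], sigma, rng)
label, posterior = classify(read, peptides, training, k=5, sigma=sigma)
print(label, round(posterior, 3))
```

The point of the split is cost: the cheap distance search discards most of the reference set, so the more expensive probabilistic score only runs on a handful of candidates per read.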

5. Estimating error rates for single molecule protein sequencing experiments.

Author: Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, and Marcotte EM
Abstract: The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. A Hidden Markov Model (HMM) based approach extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique to estimate transition probabilities for an HMM using expectation maximization, but modified here to estimate a small number of parameter values directly rather than estimating every transition probability independently, which should help prevent overfitting. We demonstrate a high degree of accuracy on simulated data, but on experimental datasets, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. A second independent implementation using a hybrid of DIRECT and Powell's method to reduce the root mean squared error (RMSE) between simulations and the real dataset was also developed. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell's method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
Published: 2023
Full Text: View/download PDF
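
The second method in the abstract above fits parameters by minimising the RMSE between simulated and real data with derivative-free optimisers. The sketch below shows only the general shape of such a fit and swaps in simpler pieces throughout: a one-parameter golden-section search stands in for the DIRECT and Powell hybrid, a closed-form expected decay curve stands in for a stochastic simulation, and the "observed" curve is synthetic. Names and values are hypothetical.

```python
import math

def expected_curve(p_loss, n_dyes, n_cycles):
    """Expected dyes remaining per cycle if each dye is lost independently with p_loss."""
    return [n_dyes * (1 - p_loss) ** t for t in range(n_cycles)]

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function on [lo, hi] using function values only."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
        c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    return (lo + hi) / 2

# Synthetic stand-in for a measured mean-intensity curve (true loss rate 0.08).
observed = expected_curve(0.08, n_dyes=2, n_cycles=10)

def objective(p):
    return rmse(expected_curve(p, n_dyes=2, n_cycles=10), observed)

p_hat = golden_section_min(objective, 0.0, 0.5)
print(f"fitted dye-loss rate: {p_hat:.4f}")
```

With several interacting error rates the objective is no longer a one-dimensional unimodal curve, which is presumably why a global search paired with a local refinement is used in the paper rather than a line search like this one.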
