Author: "Macocco, Iuri" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Macocco, Iuri"' showing total 12 results

1. Emergence of a High-Dimensional Abstraction Phase in Language Transformers

Author: Cheng, Emily, Doimo, Diego, Kervadec, Corentin, Macocco, Iuri, Yu, Jade, Laio, Alessandro, and Baroni, Marco
Subjects: Computer Science - Computation and Language
Abstract: A language model (LM) is a mapping from a linguistic context to an output token. However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. During this phase, representations (1) correspond to the first full linguistic abstraction of the input; (2) are the first to viably transfer to downstream tasks; (3) predict each other across different LMs. Moreover, we find that an earlier onset of the phase strongly predicts better language modelling performance. In short, our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
Published: 2024

2. Beyond the noise: intrinsic dimension estimation with optimal neighbourhood identification

Author: Di Noia, Antonio, Macocco, Iuri, Glielmo, Aldo, Laio, Alessandro, and Mira, Antonietta
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Statistics Theory, Statistics - Computation, Statistics - Methodology
Abstract: The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID depends on the scale at which the data are analysed. Quite typically, at a small scale the ID is very large, as the data are affected by measurement errors. At a large scale, the ID can also be erroneously large, due to the curvature and the topology of the manifold containing the data. In this work, we introduce an automatic protocol to select the sweet spot, namely the correct range of scales in which the ID is meaningful and useful. This protocol is based on imposing that for distances smaller than the correct scale the density of the data is constant. In the presented framework, estimating the density requires knowing the ID; therefore, this condition is imposed self-consistently. We derive theoretical guarantees and illustrate the usefulness and robustness of this procedure by benchmarks on artificial and real-world datasets.
Published: 2024

3. Intrinsic dimension as a multi-scale summary statistics in network modeling

Author: Macocco, Iuri, Mira, Antonietta, and Laio, Alessandro
Published: 2024
Full Text: View/download PDF

4. Intrinsic dimension estimation for discrete metrics

Author: Macocco, Iuri, Glielmo, Aldo, Grilli, Jacopo, and Laio, Alessandro
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Physics - Computational Physics
Abstract: Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensional reduction methods are designed for continuous spaces, and their use for discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
Comment: RevTeX4.2, 13 pages, 10 figures
Published: 2022
Full Text: View/download PDF

5. DADApy: Distance-based Analysis of DAta-manifolds in Python

Author: Glielmo, Aldo, Macocco, Iuri, Doimo, Diego, Carli, Matteo, Zeni, Claudio, Wild, Romina, d'Errico, Maria, Rodriguez, Alex, and Laio, Alessandro
Subjects: Computer Science - Machine Learning, Physics - Computational Physics, Statistics - Machine Learning
Abstract: DADApy is a Python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in toy cases and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license.
Comment: 9 pages, 6 figures. Patterns (2022)
Published: 2022
Full Text: View/download PDF

6. Intrinsic Dimension Estimation for Discrete Metrics

Author: Macocco, Iuri, primary, Glielmo, Aldo, additional, Grilli, Jacopo, additional, and Laio, Alessandro, additional
Published: 2023
Full Text: View/download PDF

7. DADApy: Distance-based analysis of data-manifolds in Python

Author: Glielmo, Aldo, primary, Macocco, Iuri, additional, Doimo, Diego, additional, Carli, Matteo, additional, Zeni, Claudio, additional, Wild, Romina, additional, d’Errico, Maria, additional, Rodriguez, Alex, additional, and Laio, Alessandro, additional
Published: 2022
Full Text: View/download PDF

8. DADApy: Distance-based analysis of data-manifolds in Python

Author: Glielmo, Aldo; https://orcid.org/0000-0002-4737-2878, Macocco, Iuri, Doimo, Diego, Carli, Matteo, Zeni, Claudio, Wild, Romina, d’Errico, Maria, Rodriguez, Alex, and Laio, Alessandro
Abstract: Data are often represented via many thousands of features. Fortunately, in most applications, such high-dimensional spaces are very sparsely populated, and data points effectively live on low-dimensional “data manifolds.” This is the key reason behind the success of dimensionality reduction schemes, which, however, cannot be easily deployed on data manifolds with nontrivial geometries and topologies, where a set of coordinates capable of describing the manifold globally cannot exist. In these scenarios, one can analyze the data manifold directly, without an explicit dimensional reduction step, and compute fundamental properties, such as the intrinsic dimension of the manifold and the density of the points lying on it. DADApy implements a set of methods recently developed to this aim. DADApy is easy-to-use as it is written entirely in Python, but also computationally efficient as time-consuming routines are C-compiled through Cython.
Published: 2022

9. Slow Escape from a Helical Misfolded State of the Pore-Forming Toxin Cytolysin A

Author: Dingfelder, Fabian, Macocco, Iuri, Benke, Stephan, Nettels, Daniel, Faccioli, Pietro, Schuler, Benjamin, and University of Zurich
Subjects: Chemistry, molecular dynamics simulation, single-molecule spectroscopy, protein folding, 10019 Department of Biochemistry, microfluidic mixing, 570 Life sciences, biology, 610 Medicine & health, QD1-999, Article
Abstract: The pore-forming toxin cytolysin A (ClyA) is expressed as a large α-helical monomer that, upon interaction with membranes, undergoes a major conformational rearrangement into the protomer conformation, which then assembles into a cytolytic pore. Here, we investigate the folding kinetics of the ClyA monomer with single-molecule Förster resonance energy transfer spectroscopy in combination with microfluidic mixing, stopped-flow circular dichroism experiments, and molecular simulations. The complex folding process occurs over a broad range of time scales, from hundreds of nanoseconds to minutes. The very slow formation of the native state occurs from a rapidly formed and highly collapsed intermediate with large helical content and nonnative topology. Molecular dynamics simulations suggest pronounced non-native interactions as the origin of the slow escape from this deep trap in the free-energy surface, and a variational enhanced path-sampling approach enables a glimpse of the folding process that is supported by the experimental data.
Published: 2021

10. Slow Escape from a Helical Misfolded State of the Pore-Forming Toxin Cytolysin A

Author: Dingfelder, Fabian, primary, Macocco, Iuri, additional, Benke, Stephan, additional, Nettels, Daniel, additional, Faccioli, Pietro, additional, and Schuler, Benjamin, additional
Published: 2021
Full Text: View/download PDF

11. Eco-evolutionary dynamics lead to functionally robust and redundant communities

Author: Fant, Lorenzo, primary, Macocco, Iuri, additional, and Grilli, Jacopo, additional
Published: 2021
Full Text: View/download PDF

12. DADApy: Distance-based analysis of data-manifolds in Python

Author: Glielmo, Aldo, Macocco, Iuri, Doimo, Diego, Carli, Matteo, Zeni, Claudio, Wild, Romina, d’Errico, Maria, Rodriguez, Alex, Laio, Alessandro, and University of Zurich
Subjects: FOS: Computer and information sciences, Computer Science - Machine Learning, 10051 Rheumatology Clinic and Institute of Physical Medicine, FOS: Physical sciences, General Decision Sciences, metric learning, Machine Learning (stat.ML), 610 Medicine & health, 10071 Functional Genomics Center Zurich, Computational Physics (physics.comp-ph), Machine Learning (cs.LG), manifold analysis, Settore FIS/03 - Fisica della Materia, intrinsic dimension, density estimation, density-based clustering, feature selection, Statistics - Machine Learning, 570 Life sciences, biology, 1800 General Decision Sciences, Physics - Computational Physics
Abstract: DADApy is a Python software package for analyzing and characterizing high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering, and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in a synthetic dataset and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license. Patterns, 3 (10)
Published: 2022
