2,439 results on '"Baù, A"'
Search Results
2. Quantum theory of the effect of increasing weak electromagnetic wave by a strong laser radiation in 2D Graphene
- Author
Tran, Anh-Tuan, Nam, Nguyen Dinh, Nhan, Nguyen Thi Thanh, and Bau, Nguyen Quang
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Analytic expressions for the absorption coefficient (AC) of a weak electromagnetic wave (EMW) in 2D Graphene under the influence of strong laser radiation are calculated using the quantum kinetic equation (QKE) in the case of electron-optical phonon scattering, in both the absence and presence of a magnetic field perpendicular to the graphene sheet. The dependence of the AC on the intensity $E_{02}$ and the frequency $\Omega_2$ of the weak EMW, on the intensity $E_{01}$ and the frequency $\Omega_1$ of the strong laser radiation, and on the temperature $T$ of the system is obtained. These results are investigated from low to high temperature. They are obtained with the QKE method, which overcomes the limitation of the Boltzmann kinetic equation (investigated only in the high-temperature domain). Besides, the numerical results show that the AC of a weak EMW in 2D Graphene can take negative values. This demonstrates the possibility of increasing a weak EMW by strong laser radiation in 2D Graphene, in contrast to the similar problem in bulk semiconductors and to the case without strong laser radiation. In the presence of an external magnetic field, the numerical results also show the appearance of spectral peaks that obey the magneto-phonon resonance conditions. The appearance of these resonance peaks provides a model illustrating the dependence of the Half-Width at Half Maximum (HWHM) on the external magnetic field. This is an important criterion for the fabrication of graphene-related electronic components and a guide for future experiments.
- Published
- 2024
3. Influence of Magnetic Field and Temperature on Half Width at Half Maximum of Multi-photon Absorption Spectrum in Two-dimensional Graphene
- Author
Ba, Cao Thi Vi, Bau, Nguyen Quang, Nam, Nguyen Dinh, Tran, Anh-Tuan, and Huong, Nguyen Thu
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
We use the Profile numerical method to calculate the spectral line width, or half width at half maximum (HWHM) of the absorption peaks of multi-photon absorption processes in a two-dimensional graphene system (2DGS) according to important external parameters such as magnetic field and temperature in the presence of strong electromagnetic waves (SEMW). The appearance of these absorption peaks is theoretically obtained from magneto-phonon resonance conditions within the framework of the quantum kinetic equation. The results take into account both scattering mechanisms: electron-optical phonon and electron-acoustic phonon. Under the influence of the magnetic field, according to the increasing photon energy of the SEMW, the graph showing the dependence of the multi-photon nonlinear absorption coefficient on photon energy has the form of absorption spectrum lines following magneto-phonon resonance conditions. When increasing the value of the external magnetic field and the intensity of the SEMW, the intensity of the resonance peaks increases. In addition, the HWHM W of the resonance peaks of multi-photon absorption processes increases with increasing magnetic field $\mathrm{B}$ according to the square root law $\mathrm{W} = \kappa\sqrt{\mathrm{B}}$ but is independent of temperature. The value of the HWHM of the one-photon absorption process is larger than the value of the HWHM of the multi-photon absorption processes. The calculations of the HWHM of the one-photon absorption process in this paper are consistent with previous experimental observations and theoretical calculations. Thus, our calculations of the HWHM of multi-photon absorption processes can serve as reliable predictions for future experiments.
- Published
- 2024
4. Two-Dimensional Graphene: Theoretical Study of Multi-photon Non-linear Absorption Coefficient of a Strong Electromagnetic Wave by Using Quantum Kinetic Equation
- Author
Tran, Anh-Tuan, Bau, Nguyen Quang, Nam, Nguyen Dinh, Ba, Cao Thi Vi, and Nhan, Nguyen Thi Thanh
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
- Abstract
Based on the quantum kinetic equation for electrons, we theoretically study the quantum multi-photon non-linear absorption of a strong electromagnetic wave (EMW) in two-dimensional graphene. Two cases of the electron scattering mechanism are considered: electron-optical phonon scattering and electron-acoustic phonon scattering. The general multi-photon absorption coefficient is presented as a function of the temperature, the external magnetic field, the photon energy and the amplitude of the external EMW. These analytical expressions for the multi-photon non-linear absorption coefficient (MNAC) are numerically calculated and the results are discussed in both the absence and presence of a magnetic field perpendicular to the graphene sheet. The results show that there is no absorption peak in the absence of the magnetic field, which contrasts with previous results in 2D systems such as quantum wells or superlattices. However, when there is a strong magnetic field along the direction perpendicular to the 2D graphene, absorption spectral lines appear consistent with the magneto-phonon resonance conditions. Our calculations show that the multi-photon absorption (MPA) effect is stronger than mono-photon absorption. Besides, the quantum multi-photon non-linear absorption phenomenon has been studied from low to high temperatures. This transcends the limits of the classical Boltzmann kinetic equation (BKE), which has been studied only in the high-temperature domain. The computational results show that the dependence of the MNAC on the above quantities is consistent with previous theoretical investigations. Another novel feature of this work is that the general analytic expression for the MNAC shows a dependence of the Half Width at Half Maximum on the magnetic field that is in good agreement with previous experimental observations. Thus, our estimation might give a critical prediction for future experimental observations in 2D graphene.
- Published
- 2024
5. Theoretical study of Magnetoresistance Oscillations in Semi-parabolic Plus Semi-inverse Squared Quantum Wells in the Presence of Intense Electromagnetic Waves
- Author
Huong, Nguyen Thu, Bau, Nguyen Quang, Ba, Cao Thi Vi, Dung, Bui Thi, Toan, Nguyen Cong, and Tran, Anh-Tuan
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Magnetoresistance oscillations in semiconductor quantum wells, with the semi-parabolic plus semi-inverse squared potential, under the influence of intense electromagnetic waves (IEMW), are studied theoretically. An analytical expression for the longitudinal magnetoresistance (LMR) is derived from the quantum kinetic equation for electrons, using the Fröhlich Hamiltonian of the electron-acoustic phonon system. Numerical calculation results show the complex dependence of the LMR on the parameters of the external field (electric field, magnetic field and temperature) as well as the structure parameters of the confinement potential. In the absence of IEMW, Shubnikov-de Haas (SdH) oscillations appear with amplitudes that decrease with temperature in agreement with previous theoretical and experimental results. In the presence of IEMW, the SdH oscillations appear in beats with amplitudes that increase with the intensity of the IEMW. SdH oscillations under the influence of electromagnetic waves are called microwave-induced magnetoresistance oscillations. The maximum and minimum peaks appear at the positions where the IEMW frequencies are integer and half-integer values of the cyclotron frequency, respectively. In addition, the structural parameters of the quantum well such as the confinement frequency and the geometrical parameters have a significant influence on the LMR as well as the SdH oscillations. When the confinement frequency is small, the two-dimensional electronic system in the quantum well behaves as a bulk semiconductor, resulting in the absence of SdH oscillations. In addition, the LMR increases with the geometrical parameter $\beta_z$ of the quantum well. The obtained results provide a solid theoretical foundation for the possibility of controlling SdH oscillations by IEMW as well as the structural properties of materials in future experimental observations., Comment: Physica Scripta
- Published
- 2024
6. Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
- Author
Cooper, A. Feder, Choquette-Choo, Christopher A., Bogen, Miranda, Jagielski, Matthew, Filippova, Katja, Liu, Ken Ziyu, Chouldechova, Alexandra, Hayes, Jamie, Huang, Yangsibo, Mireshghallah, Niloofar, Shumailov, Ilia, Triantafillou, Eleni, Kairouz, Peter, Mitchell, Nicole, Liang, Percy, Ho, Daniel E., Choi, Yejin, Koyejo, Sanmi, Delgado, Fernando, Grimmelmann, James, Shmatikov, Vitaly, De Sa, Christopher, Barocas, Solon, Cyphert, Amy, Lemley, Mark, boyd, danah, Vaughan, Jennifer Wortman, Brundage, Miles, Bau, David, Neel, Seth, Jacobs, Abigail Z., Terzis, Andreas, Wallach, Hanna, Papernot, Nicolas, and Lee, Katherine
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
We articulate fundamental mismatches between technical methods for machine unlearning in Generative AI, and documented aspirations for broader impact that these methods could have for law and policy. These aspirations are both numerous and varied, motivated by issues that pertain to privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of targeted information from a generative-AI model's parameters, e.g., a particular individual's personal data or in-copyright expression of Spiderman that was included in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a framework for thinking rigorously about these challenges, which enables us to be clear about why unlearning is not a general-purpose solution for circumscribing generative-AI model behavior in service of broader positive impact. We aim for conceptual clarity and to encourage more thoughtful communication among machine learning (ML), law, and policy experts who seek to develop and apply technical methods for compliance with policy objectives., Comment: Presented at the 2nd Workshop on Generative AI and Law at ICML (July 2024)
- Published
- 2024
7. Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
- Author
Ren, Hui, Materzynska, Joanna, Gandikota, Rohit, Bau, David, and Torralba, Antonio
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We explore the question: "How much prior art knowledge is needed to create art?" To investigate this, we propose a text-to-image generation model trained without access to art-related content. We then introduce a simple yet effective method to learn an art adapter using only a few examples of selected artistic styles. Our experiments show that art generated using our method is perceived by users as comparable to art produced by models trained on large, art-rich datasets. Finally, through data attribution techniques, we illustrate how examples from both artistic and non-artistic datasets contributed to the creation of new artistic styles.
- Published
- 2024
8. Interface second harmonic generation enhancement in hetero-bilayer van der Waals nanoantennas
- Author
Tognazzi, Andrea, Franceschini, Paolo, Biechteler, Jonas, Baù, Enrico, Cino, Alfonso Carmelo, Tittl, Andreas, De Angelis, Costantino, and Sortino, Luca
- Subjects
Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Layered van der Waals (vdW) materials have emerged as a promising platform for nanophotonics due to large refractive indexes and giant optical anisotropy. Unlike conventional dielectrics and semiconductors, the absence of covalent bonds between layers allows for novel degrees of freedom in designing optically resonant nanophotonic structures down to the atomic scale, from the precise stacking of vertical heterostructures to controlling the twist angle between crystallographic axes. Specifically, while transition metal dichalcogenide monolayers exhibit giant second order nonlinear responses, their bulk counterparts with 2H stacking have zero second order response. In this work, we show second harmonic generation (SHG) arising from the interface of WS$_2$/MoS$_2$ hetero-bilayer thin films with an additional SHG enhancement in nanostructured optical antennas mediated by both the excitonic resonances and the anapole condition. When both conditions are met, we observe up to $10^2$ SHG signal enhancement. Our results highlight vdW materials as a platform for designing unique multilayer optical nanostructures and metamaterials, paving the way for advanced applications in nanophotonics and nonlinear optics., Comment: Manuscript + Supplementary (21 pages, 3 Main figures 8 supplementary figures)
- Published
- 2024
9. A novel conjunction filter based on the minimum distance between perturbed trajectories
- Author
Rivero, Ana S., Baù, Giulio, Vazquez, Rafael, and Bombardelli, Claudio
- Subjects
Astrophysics - Earth and Planetary Astrophysics, Astrophysics - Instrumentation and Methods for Astrophysics
- Abstract
The increasing congestion in the near-Earth space environment has amplified the need for robust and efficient conjunction analysis techniques including the computation of the minimum distance between orbital paths in the presence of perturbations. After showing that classical Minimum Orbit Intersection Distance (MOID) computation schemes are unsuitable to treat Earth orbiting objects, the article presents an analytical approach to provide a more accurate estimate of the true distance between perturbed trajectories by incorporating the effect of zonal harmonics of arbitrary order. Cook's linear secular theory for the motion of the eccentricity vector is extended to include higher order eccentricity effects and applied to the computation of the minimum and maximum radii attained by two orbits at their mutual nodes, which can be employed to estimate the true distance between the two orbital paths and to establish an efficient algorithm for determining or excluding potential conjunctions. Extensive testing and validation are conducted using a high-fidelity propagator and a comprehensive dataset of resident space objects. The results demonstrate an accuracy below the km level for the orbit distance computation in 99% of cases, which enables high-efficiency conjunction filtering.
- Published
- 2024
10. Erasing Conceptual Knowledge from Language Models
- Author
Gandikota, Rohit, Feucht, Sheridan, Marks, Samuel, and Bau, David
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning
- Abstract
Concept erasure in language models has traditionally lacked a comprehensive evaluation framework, leading to incomplete assessments of effectiveness of erasure methods. We propose an evaluation paradigm centered on three critical criteria: innocence (complete knowledge removal), seamlessness (maintaining conditional fluent generation), and specificity (preserving unrelated task performance). Our evaluation metrics naturally motivate the development of Erasure of Language Memory (ELM), a new method designed to address all three dimensions. ELM employs targeted low-rank updates to alter output distributions for erased concepts while preserving overall model capabilities including fluency when prompted for an erased concept. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks. Comparative analysis shows that ELM achieves superior performance across our proposed metrics, including near-random scores on erased topic assessments, generation fluency, maintained accuracy on unrelated benchmarks, and robustness under adversarial attacks. Our code, data, and trained models are available at https://elm.baulab.info, Comment: Project Page: https://elm.baulab.info
- Published
- 2024
11. The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
- Author
Mueller, Aaron, Brinkmann, Jannik, Li, Millicent, Marks, Samuel, Pal, Koyena, Prakash, Nikhil, Rager, Can, Sankaranarayanan, Aruna, Sharma, Arnab Sen, Sun, Jiuding, Todd, Eric, Bau, David, and Belinkov, Yonatan
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.
- Published
- 2024
12. Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
- Author
Karvonen, Adam, Wright, Benjamin, Rager, Can, Angell, Rico, Brinkmann, Jannik, Smith, Logan, Verdun, Claudio Mayrink, Bau, David, and Marks, Samuel
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
What latent features are encoded in language model (LM) representations? Recent work on training sparse autoencoders (SAEs) to disentangle interpretable features in LM representations has shown significant promise. However, evaluating the quality of these SAEs is difficult because we lack a ground-truth collection of interpretable features that we expect good SAEs to recover. We thus propose to measure progress in interpretable dictionary learning by working in the setting of LMs trained on chess and Othello transcripts. These settings carry natural collections of interpretable features -- for example, "there is a knight on F3" -- which we leverage into $\textit{supervised}$ metrics for SAE quality. To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, $\textit{p-annealing}$, which improves performance on prior unsupervised metrics as well as our new metrics., Comment: Accepted as an oral paper (top 5%) at the ICML 2024 Mechanistic Interpretability Workshop and to the NeurIPS 2024 Main Conference
- Published
- 2024
13. NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
- Author
Fiotto-Kaufman, Jaden, Loftus, Alexander R., Todd, Eric, Brinkmann, Jannik, Pal, Koyena, Troitskii, Dmitrii, Ripa, Michael, Belfki, Adam, Rager, Can, Juang, Caden, Mueller, Aaron, Marks, Samuel, Sharma, Arnab Sen, Lucchetti, Francesca, Prakash, Nikhil, Brodley, Carla, Guha, Arjun, Bell, Jonathan, Wallace, Byron C., and Bau, David
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. NDIF is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the intervention graph, an architecture developed to decouple experiment design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches. Code, documentation, and materials are available at https://nnsight.net/., Comment: Code at https://nnsight.net
- Published
- 2024
14. Theoretical Study of the Photo-stimulated Radio-electric Effect in Asymmetric Semi-parabolic Quantum Wells
- Author
Ba, Cao Thi Vi, Bau, Nguyen Quang, Huong, Nguyen Thu, Dung, Bui Thi, and Tran, Anh-Tuan
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
In this study, based on the quantum kinetic equation approach, we systematically present the radio-electric effect in asymmetric semi-parabolic quantum wells under the influence of a laser radiation field taking into account the electron-longitudinal optical phonon scattering mechanism. The numerical results show that the blue-shift of the maximum peaks in the photon energy range is less than 60 meV. The height of maximum peaks increases according to an exponential rule, depending nonlinearly on the structural parameters of the asymmetric semi-parabolic quantum wells. In the photon energy range greater than 100 meV, the saturated radio-electric field increases with temperature and geometric parameters of the quantum well. The results show the differences between symmetric and asymmetric semi-parabolic quantum wells, highlighting the influence of asymmetric structures on radio-electric effects in two-dimensional quantum well systems., Comment: Communication in Theoretical Physics 2024
- Published
- 2024
15. Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs
- Author
Feucht, Sheridan, Atkinson, David, Wallace, Byron, and Bau, David
- Subjects
Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.7
- Abstract
LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantically meaningful units like "north" or "east." Similarly, the overall meanings of named entities like "Neil Young" and multi-word expressions like "break a leg" cannot be directly inferred from their constituent tokens. Mechanistically, how do LLMs convert such arbitrary groups of tokens into useful higher-level representations? In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced "erasure" effect, where information about previous and current tokens is rapidly forgotten in early layers. Using this observation, we propose a method to "read out" the implicit vocabulary of an autoregressive LLM by examining differences in token representations across layers, and present results of this method for Llama-2-7b and Llama-3-8B. To our knowledge, this is the first attempt to probe the implicit vocabulary of an LLM., Comment: 13 pages, 14 figures. Code and data at https://footprints.baulab.info/
- Published
- 2024
16. Transient infrared nanoscopy resolves the millisecond photoswitching dynamics of single lipid vesicles in water
- Author
Gölz, Thorsten, Baù, Enrico, Zhang, Jinhua, Kaltenecker, Korbinian, Trauner, Dirk, Maier, Stefan A., Keilmann, Fritz, Lohmüller, Theobald, and Tittl, Andreas
- Subjects
Physics - Optics, Physics - Applied Physics
- Abstract
Understanding the biophysical and biochemical properties of molecular nanocarriers under physiological conditions and with minimal interference is crucial for advancing nanomedicine, photopharmacology, drug delivery, nanotheranostics and synthetic biology. Yet, analytical methods struggle to combine precise chemical imaging and measurements without perturbative labeling. This challenge is exemplified for azobenzene-based photoswitchable lipids, which are intriguing reagents for controlling nanocarrier properties on fast timescales, enabling, e.g., precise light-induced drug release processes. Here, we leverage the chemical recognition and high spatio-temporal resolution of scattering-type scanning near-field optical microscopy (s-SNOM) to demonstrate non-destructive, label-free mid-infrared (MIR) imaging and spectroscopy of photoswitchable liposomes below the diffraction limit and the tracking of their dynamics down to 50 ms resolution. The vesicles are adsorbed on an ultrathin 10-nm SiN membrane, which separates the sample space from the tip space for stable and hour-long observations. By implementing a transient nanoscopy approach, we accurately resolve, for the first time, photoinduced changes in both the shape and the MIR spectral signature of individual vesicles and reveal abrupt change dynamics of the underlying photoisomerization process. Our findings highlight the method's potential for future studies on the complex dynamics of unlabeled nanoscale soft matter, as well as, in a broader context, for host-guest systems, energy materials or drugs., Comment: 4 figures, 10 supplementary figures
- Published
- 2024
17. Customizing Text-to-Image Models with a Single Image Pair
- Author
Jones, Maxwell, Wang, Sheng-Yu, Kumari, Nupur, Bau, David, and Zhu, Jun-Yan
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics, Computer Science - Machine Learning
- Abstract
Art reinterpretation is the practice of creating a variation of a reference work, making a paired artwork that exhibits a distinct artistic style. We ask if such an image pair can be used to customize a generative model to capture the demonstrated stylistic difference. We propose Pair Customization, a new customization method that learns stylistic difference from a single image pair and then applies the acquired style to the generation process. Unlike existing methods that learn to mimic a single concept from a collection of images, our method captures the stylistic difference between paired images. This allows us to apply a stylistic change without overfitting to the specific image content in the examples. To address this new task, we employ a joint optimization method that explicitly separates the style and content into distinct LoRA weight spaces. We optimize these style and content weights to reproduce the style and content images while encouraging their orthogonality. During inference, we modify the diffusion process via a new style guidance based on our learned weights. Both qualitative and quantitative experiments show that our method can effectively learn style while avoiding overfitting to image content, highlighting the potential of modeling such stylistic differences from a single image pair., Comment: project page: https://paircustomization.github.io/
- Published
- 2024
18. Performance Enhancement of Pump Parameters Through Innovative Spring Design of Mechanical Seal for a Heat Exchanger
- Author
Ramu, I., Suresh Bau, G., Manikanta, N. V. V., Venu, M., Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Haddar, Mohamed, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Kwon, Young W., Editorial Board Member, Tolio, Tullio A. M., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Schmitt, Robert, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Deepak, B B V L, editor, Bahubalendruni, M.V.A. Raju, editor, Parhi, D.R.K., editor, and Biswal, B. B., editor
- Published
- 2025
19. Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
- Author
Gandikota, Rohit, Materzyńska, Joanna, Zhou, Tingrui, Torralba, Antonio, Bau, David, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
20. Revealing mode formation in quasi-bound states in the continuum metasurfaces via near-field optical microscopy
- Author
Gölz, Thorsten, Baù, Enrico, Aigner, Andreas, Mancini, Andrea, Barkey, Martin, Keilmann, Fritz, Maier, Stefan A., and Tittl, Andreas
- Subjects
Physics - Optics, Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
Photonic metasurfaces offer exceptional control over light at the nanoscale, facilitating applications ranging from biosensing and nonlinear optics to photocatalysis. Many metasurfaces, especially resonant ones, rely on periodicity for the collective mode to form, which makes them subject to the influences of finite size effects, defects, and edge effects, all of which have considerable negative impact at the application level. These aspects are especially important for quasi-bound state in the continuum (quasi-BIC) metasurfaces, for which the collective mode is highly sensitive to perturbations due to high quality factors and strong near-field enhancement. Here, we quantitatively investigate the mode formation in quasi-BIC metasurfaces on the individual resonator level using scattering scanning near-field optical microscopy (s-SNOM) in combination with a new image processing technique. We find that the quasi-BIC mode is formed at a minimum size of 10 x 10 unit cells, much smaller than expected from far-field measurements. Furthermore, we show that the coupling direction of the resonators, defects, and edge states have a pronounced influence on the quasi-BIC mode. This study serves as a link between the far-field and near-field responses of metasurfaces, offering crucial insights for optimizing spatial footprint and active area, holding promise for augmenting applications such as catalysis and biospectroscopy., Comment: 30 pages, 6 figures, 8 supplementary figures
- Published
- 2024
21. Theory of local $\mathbb{Z}_{2}$ topological markers for finite and periodic two-dimensional systems
- Author
-
Baù, Nicolas and Marrazzo, Antimo
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Disordered Systems and Neural Networks ,Condensed Matter - Materials Science - Abstract
The topological phases of two-dimensional time-reversal symmetric insulators are classified by a $\mathbb{Z}_{2}$ topological invariant. Usually, the invariant is introduced and calculated by exploiting the way time-reversal symmetry acts in reciprocal space, hence implicitly assuming periodicity and homogeneity. Here, we introduce two space-resolved $\mathbb{Z}_{2}$ topological markers that are able to probe the local topology of the ground-state electronic structure also in the case of inhomogeneous and finite systems. The first approach leads to a generalized local spin-Chern marker, that usually remains well-defined also when the perpendicular component of the spin, $S_{z}$, is not conserved. The second marker is solely based on time-reversal symmetry, hence being more general. We validate our markers on the Kane-Mele model both in periodic and open boundary conditions, also in presence of disorder and including topological/trivial heterojunctions., Comment: 13 pages, 7 figures
- Published
- 2024
22. Locating and Editing Factual Associations in Mamba
- Author
-
Sharma, Arnab Sen, Atkinson, David, and Bau, David
- Subjects
Computer Science - Computation and Language - Abstract
We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer language models suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To investigate this, we conduct four lines of experiments on Mamba. First, we apply causal tracing or interchange interventions to localize key components inside Mamba that are responsible for recalling facts, revealing that specific components within middle layers show strong causal effects at the last token of the subject, while the causal effect of intervening on later layers is most pronounced at the last token of the prompt, matching previous findings on autoregressive transformers. Second, we show that rank-one model editing methods can successfully insert facts at specific locations, again resembling findings on transformer LMs. Third, we examine the linearity of Mamba's representations of factual relations. Finally we adapt attention-knockout techniques to Mamba in order to dissect information flow during factual recall. We compare Mamba directly to a similar-sized autoregressive transformer LM and conclude that despite significant differences in architectural approach, when it comes to factual recall, the two architectures share many similarities., Comment: 18 pages, COLM-2024
- Published
- 2024
23. Nanoscale mechanical manipulation of ultrathin SiN membranes enabling infrared near-field microscopy of liquid-immersed samples
- Author
-
Baù, Enrico, Gölz, Thorsten, Benoit, Martin, Tittl, Andreas, and Keilmann, Fritz
- Subjects
Physics - Optics - Abstract
Scattering scanning near-field optical microscopy (s-SNOM) is a powerful technique for mid-infrared spectroscopy at nanometer length scales. By investigating objects in aqueous environments through ultrathin membranes, s-SNOM has recently been extended towards label-free nanoscopy of the dynamics of living cells and nanoparticles, assessing both the optical and the mechanical interactions between the tip, the membrane and the liquid suspension underneath. Here, we report that the tapping AFM tip induces a reversible nanometric deformation of the membrane manifested as either an indentation or protrusion. This mechanism depends on the driving force of the tapping cantilever, which we exploit to minimize topographical deformations of the membrane to improve optical measurements. Furthermore, we show that the tapping phase, or phase delay between driving signal and tip oscillation, is a highly sensitive observable for quantifying the mechanics of adhering objects, exhibiting highest contrast for low tapping amplitudes where the membrane remains nearly flat. We correlate mechanical responses with simultaneously recorded spectroscopy data to reveal the thickness of nanometric water pockets between membrane and adhering objects. Besides a general applicability of depth profiling, our technique holds great promise for studying mechano-active biopolymers and living cells, biomaterials that exhibit complex behaviors when under a mechanical load., Comment: 31 pages, 7 figures, 7 supplementary figures
- Published
- 2024
24. Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
- Author
-
Marks, Samuel, Rager, Can, Michaud, Eric J., Belinkov, Yonatan, Bau, David, and Mueller, Aaron
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors., Comment: Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz
- Published
- 2024
25. Model Lakes
- Author
-
Pal, Koyena, Bau, David, and Miller, Renée J.
- Subjects
Computer Science - Databases ,Computer Science - Artificial Intelligence - Abstract
Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models differ from one another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, this issue of finding, differentiating, and understanding models is becoming more crucial. Inspired by research on data lakes, we introduce and define the concept of model lakes. We discuss fundamental research challenges in the management of large models. And we discuss what principled data management techniques can be brought to bear on the study of large model management.
- Published
- 2024
26. Origin of optical nonlinearity in plasmonic semiconductor nanostructures
- Author
-
Rossetti, Andrea, Hu, Huatian, Venanzi, Tommaso, Bousseksou, Adel, De Luca, Federico, Deckert, Thomas, Giliberti, Valeria, Pea, Marialilia, Sagnes, Isabelle, Beaudoin, Gregoire, Biagioni, Paolo, Baù, Enrico, Maier, Stefan A., Tittl, Andreas, Brida, Daniele, Colombelli, Raffaele, Ortolani, Michele, and Ciracì, Cristian
- Subjects
Physics - Optics ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
The development of nanoscale nonlinear elements in photonic integrated circuits is hindered by the physical limits to the nonlinear optical response of dielectrics, which requires that the interacting waves propagate in transparent volumes for distances much longer than their wavelength. Here we present experimental evidence that optical nonlinearities in doped semiconductors are due to free electrons and that their efficiency could exceed by several orders of magnitude that of conventional dielectric nonlinearities. Our experimental findings are supported by comprehensive computational results based on the hydrodynamic modeling, which naturally includes nonlocal effects, of the free-electron dynamics in heavily doped semiconductors. By studying third-harmonic generation from plasmonic nanoantenna arrays made out of heavily n-doped InGaAs with increasing levels of free-carrier density, we discriminate between hydrodynamic and dielectric nonlinearities. As a result, the value of maximum nonlinear efficiency as well as its spectral location can now be controlled by tuning the doping level. Having employed the common material platform InGaAs/InP that supports integrated waveguides, our findings pave the way for future exploitation of plasmonic nonlinearities in all-semiconductor photonic integrated circuits.
- Published
- 2024
27. Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
- Author
-
Prakash, Nikhil, Shaham, Tamar Rott, Haklay, Tal, Belinkov, Yonatan, and Bau, David
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: Entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) Performance boost in the fine-tuned models is primarily attributed to their improved ability to handle the augmented positional information. To uncover these findings, we employ: Path Patching, DCM, which automatically detects model components responsible for specific semantics, and CMAP, a new approach for patching activations across models to reveal improved mechanisms. Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model., Comment: ICLR 2024. 26 pages, 13 figures. Code and data at https://finetuning.baulab.info/
- Published
- 2024
28. AGN properties of ~1 million member galaxies of galaxy groups and clusters at z < 1.4 based on the Subaru Hyper Suprime-Cam survey
- Author
-
Toba, Yoshiki, Hashiguchi, Aoi, Ota, Naomi, Oguri, Masamune, Okabe, Nobuhiro, Ueda, Yoshihiro, Imanishi, Masatoshi, Nishizawa, Atsushi J., Goto, Tomotsugu, Hsieh, Bau-Ching, Kondo, Marie, Koyama, Shuhei, Lee, Kianhong, Mitsuishi, Ikuyuki, Nagao, Tohru, Oogi, Taira, Sakuta, Koki, Schramm, Malte, Yanagawa, Anri, and Yoshimoto, Anje
- Subjects
Astrophysics - Astrophysics of Galaxies ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
Herein, we present the statistical properties of active galactic nuclei (AGNs) for approximately 1 million member galaxies of galaxy groups and clusters, with 0.1 $<$ cluster redshift ($z_{\rm cl}$) $<$ 1.4, selected using Subaru Hyper Suprime-Cam, the so-called CAMIRA clusters. In this research, we focused on the AGN power fraction ($f_{\rm AGN}$), which is defined as the proportion of the contribution of AGNs to the total infrared (IR) luminosity, $L_{\rm IR}$ (AGN)/$L_{\rm IR}$, and examined how $f_{\rm AGN}$ depends on (i) $z_{\rm cl}$ and (ii) the distance from the cluster center. We compiled multiwavelength data using the ultraviolet--mid-IR range. Moreover, we performed spectral energy distribution fits to determine $f_{\rm AGN}$ using the CIGALE code with the SKIRTOR AGN model. We found that (i) the value of $f_{\rm AGN}$ in the CAMIRA clusters is positively correlated with $z_{\rm cl}$, with the correlation slope being steeper than that for field galaxies, and (ii) $f_{\rm AGN}$ exhibits a high value at the cluster outskirts. These results indicate that the emergence of AGN population depends on the redshift and environment and that galaxy groups and clusters at high redshifts are important in AGN evolution. Additionally, we demonstrated that cluster--cluster mergers may enhance AGN activity at the outskirts of particularly massive galaxy clusters. Our findings are consistent with a related study on the CAMIRA clusters that was based on the AGN number fraction., Comment: 25 pages, 24 figures, and 3 tables, accepted for publication in ApJ. A value-added CAMIRA member galaxy catalog and the best-fit SED for each member galaxy will be available as FITS or machine-readable tables
- Published
- 2024
29. Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
- Author
-
Li, Kenneth, Liu, Tianle, Bashkansky, Naomi, Bau, David, Viégas, Fernanda, Pfister, Hanspeter, and Wattenberg, Martin
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines., Comment: COLM 2024; Code and data: https://github.com/likenneth/persona_drift
- Published
- 2024
30. Black-Box Access is Insufficient for Rigorous AI Audits
- Author
-
Casper, Stephen, Ezell, Carson, Siegmann, Charlotte, Kolt, Noam, Curtis, Taylor Lynn, Bucknall, Benjamin, Haupt, Andreas, Wei, Kevin, Scheurer, Jérémy, Hobbhahn, Marius, Sharkey, Lee, Krishna, Satyapriya, Von Hagen, Marvin, Alberti, Silas, Chan, Alan, Sun, Qinyi, Gerovitch, Michael, Bau, David, Tegmark, Max, Krueger, David, and Hadfield-Menell, Dylan
- Subjects
Computer Science - Computers and Society ,Computer Science - Artificial Intelligence ,Computer Science - Cryptography and Security - Abstract
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone., Comment: FAccT 2024
- Published
- 2024
31. A Systematic Search of Distant Superclusters with the Subaru Hyper Suprime-Cam Survey
- Author
-
Chen, Tsung-Chi, Lin, Yen-Ting, Schive, Hsi-Yu, Oguri, Masamune, Chen, Kai-Feng, Okabe, Nobuhiro, Ali, Sadman, Bottrell, Connor, Dalal, Roohi, Koyama, Yusei, Monteiro-Oliveira, Rogério, Shimakawa, Rhythm, Goto, Tomotsugu, Hsieh, Bau-Ching, Kodama, Tadayuki, and Nishizawa, Atsushi J.
- Subjects
Astrophysics - Cosmology and Nongalactic Astrophysics ,Astrophysics - Astrophysics of Galaxies - Abstract
Superclusters, encompassing environments across a wide range of overdensities, can be regarded as unique laboratories for studying galaxy evolution. Although numerous supercluster catalogs have been published, none of them goes beyond redshift $z=0.7$. In this work, we adopt a physically motivated supercluster definition, requiring that superclusters should eventually collapse even in the presence of dark energy. Applying a friends-of-friends (FoF) algorithm to the CAMIRA cluster sample constructed using the Subaru Hyper Suprime-Cam survey data, we have conducted the first systematic search for superclusters at $z=0.5-1.0$ and identified 673 supercluster candidates over an area of 1027 deg$^2$. The FoF algorithm is calibrated by evolving $N$-body simulations to the far future to ensure high purity. We found that these high-$z$ superclusters are mainly composed of $2-4$ clusters, suggesting the limit of gravitationally bound structures in the younger Universe. In addition, we studied the properties of the clusters and brightest cluster galaxies (BCGs) residing in different large-scale environments. We found that clusters associated with superclusters are typically richer, but no apparent dependence of the BCG properties on large-scale structures is found. We also compared the abundance of observed superclusters with mock superclusters extracted from halo light cones, finding that photometric redshift uncertainty is a limiting factor in the performance of supercluster detection., Comment: Accepted by ApJ. 36 pages, 26 figures, 7 tables
- Published
- 2024
32. The ALMaQUEST Survey XII: Dense Molecular Gas as traced by HCN and HCO$^{+}$ in Green Valley Galaxies
- Author
-
Lin, Lihwai, Pan, Hsi-An, Ellison, Sara L., Harada, Nanase, Jimenez-Donaire, Maria J., French, K. Decker, Baker, William M., Hsieh, Bau-Ching, Koyama, Yusei, Lopez-Coba, Carlos, Michiyama, Tomonari, Rowlands, Kate, Sanchez, Sebastian F., and Thorp, Mallory
- Subjects
Astrophysics - Astrophysics of Galaxies - Abstract
We present ALMA observations of two dense gas tracers, HCN(1-0) and HCO$^{+}$(1-0), for three galaxies in the green valley and two galaxies on the star-forming main sequence with comparable molecular gas fractions as traced by the CO(1-0) emissions, selected from the ALMaQUEST survey. We investigate whether the deficit of molecular gas star formation efficiency (SFE$_{\rm mol}$) that leads to the low specific star formation rate in these green valley galaxies is due to a lack of dense gas (characterized by the dense gas fraction $f_{\rm dense}$) or the low star formation efficiency of dense gas (SFE$_{\rm dense}$). We find that SFE$_{\rm mol}$ as traced by the CO emissions, when considering both star-forming and retired spaxels together, is tightly correlated with SFE$_{\rm dense}$ and depends only weakly on $f_{\rm dense}$. The specific star formation rate (sSFR) on kpc scales is primarily driven by SFE$_{\rm mol}$ and SFE$_{\rm dense}$, followed by the dependence on $f_{\rm mol}$, and is least correlated with $f_{\rm dense}$ or the dense-to-stellar mass ratio ($R_{\rm dense}$). When compared with other works in the literature, we find that our green valley sample shows lower global SFE$_{\rm mol}$ as well as lower SFE$_{\rm dense}$ while exhibiting similar dense gas fractions when compared to star-forming and starburst galaxies. We conclude that the star formation of the 3 green valley galaxies with a normal abundance of molecular gas is suppressed mainly due to the reduced SFE$_{\rm dense}$ rather than the lack of dense gas., Comment: 20 pages, 13 figures, ApJ accepted
- Published
- 2024
33. Numerical behavior of the Keplerian Integral methods for initial orbit determination
- Author
-
Rodríguez, Óscar, Gronchi, Giovanni F., Baù, Giulio, and Jedicke, Robert
- Subjects
Astrophysics - Earth and Planetary Astrophysics ,Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
We investigate the behaviour of two recent methods for the computation of preliminary orbits. These methods are based on the conservation laws of Kepler's problem, and enable the linkage of very short arcs of optical observations even when they are separated in time by a few years. Our analysis is performed using both synthetic and real data of 822 main belt asteroids. The differences between computed and true orbital elements have been analysed for the true linkages, as well as the occurrence of alternative solutions. Some metrics have been introduced to quantify the results, with the aim of discarding as many of the false linkages as possible and keeping the vast majority of true ones. These numerical experiments provide thresholds for the metrics which take advantage of the knowledge of the ground truth: the values of these thresholds can be used in normal operation mode, when we do not know the correct values of the orbital elements and whether the linkages are true or false.
- Published
- 2024
34. Linking tracklets over the years in large datasets
- Author
-
Rodríguez, Óscar, Gronchi, Giovanni F., Baù, Giulio, and Jedicke, Robert
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics - Abstract
We present a new procedure to identify observations of known objects in large data sets of unlinked detections. It begins with a Keplerian integrals method that allows us to link two tracklets, computing preliminary orbits, even when the tracklets are separated in time by a few years. In the second step, we represent the results in a 'graph' where the tracklets are the nodes and the preliminary orbits are the edges. Then, acceptable '3-cycles' are identified and a least squares orbit is computed for each of them. Finally, we construct sequences of $n \geq 4$ tracklets by searching through the orbits of nearby 3-cycles and attempting to attribute the remaining tracklets. We calculate the technique's efficiency at identifying unknown objects using real detections that attempt to mimic key parameters of the Minor Planet Center's Isolated Tracklet File (ITF) and then apply the procedure to the ITF to identify tens of thousands of new objects.
- Published
- 2024
35. Orbit determination from one position vector and a very short arc of optical observations
- Author
-
Scantamburlo, Erica, Gronchi, Giovanni F., and Baù, Giulio
- Subjects
Mathematical Physics - Abstract
In this paper we address the problem of computing a preliminary orbit of a celestial body from one topocentric position vector and a very short arc (VSA) of optical observations. Using the conservation laws of the two-body dynamics, we write the problem as a system of 8 polynomial equations in 6 unknowns. We prove that this system is generically consistent, namely it admits solutions at least in the complex field. From this system we derive a univariate polynomial $\mathfrak{v}$ of degree 8 in the unknown topocentric distance at the mean epoch of the VSA. Through Gröbner bases theory, we show that the degree of $\mathfrak{v}$ is minimum among the degrees of all the univariate polynomials solving this problem. The proposed method is relevant for different purposes, e.g. the computation of a preliminary orbit of an Earth satellite with radar and optical observations, the detection of maneuvers of an Earth satellite, and the recovery of asteroids which are lost due to a planetary close encounter. We also show some numerical tests in the case of asteroids undergoing a close encounter with the Earth.
- Published
- 2023
36. Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
- Author
-
Gandikota, Rohit, Materzynska, Joanna, Zhou, Tingrui, Torralba, Antonio, and Bau, David
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either textual or visual concepts. Concept Sliders are plug-and-play: they can be composed efficiently and continuously modulated, enabling precise control over image generation. In quantitative experiments comparing to previous editing techniques, our sliders exhibit stronger targeted edits with lower interference. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We show how sliders can transfer latents from StyleGAN for intuitive editing of visual concepts for which textual description is difficult. We also find that our method can help address persistent quality issues in Stable Diffusion XL including repair of object deformations and fixing distorted hands. Our code, data, and trained sliders are available at https://sliders.baulab.info/
- Published
- 2023
37. An Alternative to Regulation: The Case for Public AI
- Author
-
Vincent, Nicholas, Bau, David, Schwettmann, Sarah, and Tan, Joshua
- Subjects
Computer Science - Computers and Society - Abstract
Can governments build AI? In this paper, we describe an ongoing effort to develop "public AI" -- publicly accessible AI models funded, provisioned, and governed by governments or other public bodies. Public AI presents both an alternative and a complement to standard regulatory approaches to AI, but it also suggests new technical and policy challenges. We present a roadmap for how the ML research community can help shape this initiative and support its implementation, and how public AI can complement other responsible AI initiatives., Comment: To be presented at Regulatable ML @ NeurIPS2023 workshop
- Published
- 2023
38. Testing Language Model Agents Safely in the Wild
- Author
-
Naihin, Silen, Atkinson, David, Green, Marc, Hamadi, Merwane, Swift, Craig, Schonholtz, Douglas, Kalai, Adam Tauman, and Bau, David
- Subjects
Computer Science - Artificial Intelligence - Abstract
A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tests on the open internet: agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans. We design a basic safety monitor (AgentMonitor) that is flexible enough to monitor existing LLM agents, and, using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations. Then we apply the AgentMonitor on a battery of real-world tests of AutoGPT, and we identify several limitations and challenges that will face the creation of safe in-the-wild tests as autonomous agents grow more capable.
- Published
- 2023
39. Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
- Author
-
Pal, Koyena, Sun, Jiuding, Yuan, Andrew, Wallace, Byron C., and Bau, David
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
We conjecture that hidden state vectors corresponding to individual input tokens encode information sufficient to accurately predict several tokens ahead. More concretely, in this paper we ask: Given a hidden (internal) representation of a single token at position $t$ in an input, can we reliably anticipate the tokens that will appear at positions $\geq t + 2$? To test this, we measure linear approximation and causal intervention methods in GPT-J-6B to evaluate the degree to which individual hidden states in the network contain signal rich enough to predict future hidden states and, ultimately, token outputs. We find that, at some layers, we can approximate a model's output with more than 48% accuracy with respect to its prediction of subsequent tokens through a single hidden state. Finally we present a "Future Lens" visualization that uses these methods to create a new view of transformer states., Comment: Accepted at CoNLL 2023
- Published
- 2023
40. Rotaxane-catalyzed aerobic oxidation of primary alcohols
- Author
-
Baù, Ilario, Poderi, Cecilia, Sardu, Francesca, Giancola, Alessia, Turchetti, Anna, Franchi, Paola, Casimiro, Lorenzo, Andreoni, Leonardo, Silvi, Serena, Mezzina, Elisabetta, and Lucarini, Marco
- Published
- 2024
41. Local Chern Marker for Periodic Systems
- Author
-
Baù, Nicolas and Marrazzo, Antimo
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Topological invariants are global properties of the ground-state wave function, typically defined as winding numbers in reciprocal space. Over the years, a number of topological markers in real space have been introduced, allowing to map topological order in heterogeneous crystalline and disordered systems. Notably, even if these formulations can be expressed in terms of lattice-periodic quantities, they can actually be deployed in open boundary conditions only, as in practice they require computing the position operator $\mathbf{r}$ in a form that is ill-defined in periodic boundary conditions. Here we derive a local Chern marker for infinite two-dimensional systems with periodic boundary conditions in the large supercell limit, where the electronic structure is sampled with one single point in reciprocal space. We validate our approach with tight-binding numerical simulations on the Haldane model, including trivial/topological superlattices made of pristine and disordered Chern insulators. The strategy introduced here is very general and could be applied to other topological invariants and quantum-geometrical quantities in any dimension., Comment: 7 pages, 3 figures + supplementary material (3 pages)
- Published
- 2023
42. Function Vectors in Large Language Models
- Author
-
Todd, Eric, Li, Millicent L., Sharma, Arnab Sen, Mueller, Aaron, Wallace, Byron C., and Bau, David
- Subjects
Computer Science - Computation and Language ,Computer Science - Machine Learning - Abstract
We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find that while they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs. Our code and data are available at https://functions.baulab.info., Comment: ICLR 2024. 52 pages, 30 figures, 23 tables. Code and data at https://functions.baulab.info
- Published
- 2023
43. FIND: A Function Description Benchmark for Evaluating Interpretability Methods
- Author
-
Schwettmann, Sarah, Shaham, Tamar Rott, Materzynska, Joanna, Chowdhury, Neil, Li, Shuang, Andreas, Jacob, Bau, David, and Torralba, Antonio
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning - Abstract
Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions span textual and numeric domains, and involve a range of real-world complexities. We evaluate methods that use pretrained language models (LMs) to produce descriptions of function behavior in natural language and code. Additionally, we introduce a new interactive method in which an Automated Interpretability Agent (AIA) generates function descriptions. We find that an AIA, built from an LM with black-box access to functions, can infer function structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, AIA descriptions tend to capture global function behavior and miss local details. These results suggest that FIND will be useful for evaluating more sophisticated interpretability methods before they are applied to real-world models. Comment: 28 pages, 10 figures
- Published
- 2023
44. Unified Concept Editing in Diffusion Models
- Author
-
Gandikota, Rohit, Orgad, Hadas, Belinkov, Yonatan, Materzyńska, Joanna, and Bau, David
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning - Abstract
Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a method that tackles all issues with a single approach. Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution, and scales seamlessly to concurrent edits on text-conditional diffusion models. We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections, and we present extensive experiments demonstrating improved efficacy and scalability over prior work. Our code is available at https://unified.baulab.info. Comment: In proceedings of WACV 2024. Project Page: https://unified.baulab.info
- Published
- 2023
45. Linearity of Relation Decoding in Transformer Language Models
- Author
-
Hernandez, Evan, Sharma, Arnab Sen, Haklay, Tal, Meng, Kevin, Wattenberg, Martin, Andreas, Jacob, Belinkov, Yonatan, and Bau, David
- Subjects
Computer Science - Computation and Language - Abstract
Much of the knowledge encoded in transformer language models (LMs) may be expressed in terms of relations: relations between words and their synonyms, entities and their attributes, etc. We show that, for a subset of relations, this computation is well-approximated by a single linear transformation on the subject representation. Linear relation representations may be obtained by constructing a first-order approximation to the LM from a single prompt, and they exist for a variety of factual, commonsense, and linguistic relations. However, we also identify many cases in which LM predictions capture relational knowledge accurately, but this knowledge is not linearly encoded in their representations. Our results thus reveal a simple, interpretable, but heterogeneously deployed knowledge representation strategy in transformer LMs.
- Published
- 2023
46. Multimodal Neurons in Pretrained Text-Only Transformers
- Author
-
Schwettmann, Sarah, Chowdhury, Neil, Klein, Samuel, Bau, David, and Torralba, Antonio
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language - Abstract
Language models demonstrate remarkable capacity to generalize representations learned in one modality to downstream tasks in other modalities. Can we trace this ability to individual neurons? We study the case where a frozen text transformer is augmented with vision using a self-supervised visual encoder and a single linear projection learned on an image-to-text task. Outputs of the projection layer are not immediately decodable into language describing image content; instead, we find that translation between modalities occurs deeper within the transformer. We introduce a procedure for identifying "multimodal neurons" that convert visual representations into corresponding text, and decoding the concepts they inject into the model's residual stream. In a series of experiments, we show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning. Comment: Oral presentation at ICCV CLVL 2023
- Published
- 2023
47. Discovering Variable Binding Circuitry with Desiderata
- Author
-
Davies, Xander, Nadeau, Max, Prakash, Nikhil, Shaham, Tamar Rott, and Bau, David
- Subjects
Computer Science - Artificial Intelligence - Abstract
Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach which extends causal mediation experiments to automatically identify model components responsible for performing a specific subtask by solely specifying a set of desiderata, or causal attributes of the model components executing that subtask. As a proof of concept, we apply our method to automatically discover shared variable binding circuitry in LLaMA-13B, which retrieves variable values for multiple arithmetic tasks. Our method successfully localizes variable binding to only 9 attention heads (of the 1.6k) and one MLP in the final token's residual stream.
- Published
- 2023
48. Revisiting the computation of the critical points of the Keplerian distance
- Author
-
Gronchi, Giovanni F., Baù, Giulio, and Grassi, Clara
- Subjects
Mathematical Physics - Abstract
We consider the Keplerian distance $d$ in the case of two elliptic orbits, i.e. the distance between one point on the first ellipse and one point on the second one, assuming they have a common focus. The absolute minimum $d_{\rm min}$ of this function, called MOID or orbit distance in the literature, is relevant to detect possible impacts between two objects following approximately these elliptic trajectories. We revisit and compare two different approaches to compute the critical points of $d^2$, where we squared the distance $d$ to include crossing points among the critical ones. One approach uses trigonometric polynomials, the other uses ordinary polynomials. A new way to test the reliability of the computation of $d_{\rm min}$ is introduced, based on optimal estimates that can be found in the literature. The planar case is also discussed: in this case we present an estimate for the maximal number of critical points of $d^2$, together with a conjecture supported by numerical tests.
- Published
- 2023
49. Assessment of heat transfer and Mach number effects on high-speed turbulent boundary layers
- Author
-
Cogo, Michele, Baù, Umberto, Chinappi, Mauro, Bernardini, Matteo, and Picano, Francesco
- Subjects
Physics - Fluid Dynamics - Abstract
High-speed vehicles experience a highly challenging environment in which the free-stream Mach number and surface temperature greatly influence aerodynamic drag and heat transfer. The interplay of these two parameters strongly affects the near-wall dynamics of high-speed turbulent boundary layers in a non-trivial way, breaking similarity arguments on velocity and temperature fields, typically derived for adiabatic cases. In this work, we present direct numerical simulations of flat-plate zero-pressure-gradient turbulent boundary layers spanning three free-stream Mach numbers [2,4,6] and four wall temperature conditions (from adiabatic to very cold walls), emphasising the choice of the diabatic parameter $\mathit{\Theta}$ (Zhang, Bi, Hussain & She, J. Fluid Mech., vol. 739, pp. 392-420) to recover a similar flow organisation at different Mach numbers. We link qualitative observations on flow patterns to first- and second-order statistics to explain the strong decoupling of temperature-velocity fluctuations that occurs at reduced wall temperatures and high Mach numbers. For these cases, we find that the mean temperature gradient in the near-wall region can reach such a strong intensity that it promotes the formation of a secondary peak of thermal production in the viscous sublayer, which is in direct contrast with the monotonic behaviour of adiabatic profiles. We propose different physical mechanisms induced by wall-cooling and compressibility that result in apparently similar flow features, such as a higher peak in the streamwise velocity turbulence intensity, and distinct ones, such as the separation of turbulent scales.
- Published
- 2023
50. An Efficient Surrogate-based Multi-objective Optimisation Framework with Novel Sampling Strategy for Sustainable Island Groundwater Management
- Author
-
W. Yu, D. Baù, A. S. Mayer, and M. Geranmehr
- Subjects
Science, Geology, QE1-996.5, Dynamic and structural geology, QE500-639.5 - Abstract
In groundwater pumping optimization (GPO), offline-trained data-driven surrogates can be used to replace numerically intensive simulators in order to save computing time. The traditional offline training approach involves building surrogates prior to optimization, fitting training datasets that cover the input space uniformly or randomly, which can prove inefficient due to the potential oversampling of low-gradient areas and under-sampling of high-gradient areas. This study proposes an offline machine-learning (ML) algorithm that ranks candidate training points by scoring them based on their distance to the closest training point and on the local gradient of the surrogate estimate and then choosing the highest-rank point. This method is applied to develop surrogates for solving a two-objective GPO problem formulated on a three-dimensional (3D) island aquifer, using hydrogeological conditions representative of San Salvador Island, Bahamas. The objectives are to minimise the supply cost (fOC) resulting from groundwater pumping and desalination and maximise fresh groundwater supply (Qp), subject to constraints on seawater intrusion (SWI) control expressed in terms of aquifer drawdown Δs at pumping locations and aquifer salt mass increase ΔSM. Gaussian Process (GP) is the technique applied to construct surrogates of objectives and constraints, alongside the estimation of uncertainties. Using GP models, it is possible to estimate the probability of "Pareto optimality" for each pumping scheme by Monte Carlo simulation. Pareto optimal pumping schemes (POPS) are then characterized by a probability of occurrence, which can be verified by numerical simulation. The GP training strategy's effectiveness in generating POPS is compared to traditional training approaches, showing that such a strategy can efficiently identify reliable POPS.
- Published
- 2024