2,411 results on '"Ho, Daniel"'
Search Results
2. Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models
- Author
Zhang, Andy K., Perry, Neil, Dulepet, Riya, Jones, Eliot, Lin, Justin W., Ji, Joey, Menders, Celeste, Hussein, Gashon, Liu, Samantha, Jasper, Donovan, Peetathawatchai, Pura, Glenn, Ari, Sivashankar, Vikram, Zamoshchin, Daniel, Glikbarg, Leo, Askaryar, Derek, Yang, Mike, Zhang, Teddy, Alluri, Rishi, Tran, Nathan, Sangpisit, Rinnara, Yiorkadjis, Polycarpos, Osele, Kenny, Raghupathi, Gautham, Boneh, Dan, Ho, Daniel E., and Liang, Percy
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computers and Society, Computer Science - Machine Learning
- Abstract
Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description and starter files, and is initialized in an environment where an agent can execute bash commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks, which break down a task into intermediary steps for more gradated evaluation; we add subtasks for 17 of the 40 tasks. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 7 models: GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. Without guidance, we find that agents are able to solve only the easiest complete tasks that took human teams up to 11 minutes to solve, with Claude 3.5 Sonnet and GPT-4o having the highest success rates. Finally, subtasks provide more signal for measuring performance compared to unguided runs, with models achieving a 3.2% higher success rate on complete tasks with subtask-guidance than without subtask-guidance. All code and data are publicly available at https://cybench.github.io
- Comment
86 pages, 7 figures
- Published
- 2024
3. Regulating AI Adaptation: An Analysis of AI Medical Device Updates
- Author
Wu, Kevin, Wu, Eric, Rodolfa, Kit, Ho, Daniel E., and Zou, James
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
While the pace of development of AI has rapidly progressed in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner in regulating and approving hundreds of AI medical devices. To better understand how AI is updated and its regulatory considerations, we systematically analyze the frequency and nature of updates in FDA-approved AI medical devices. We find that less than 2% of all devices report having been updated by being re-trained on new data. Meanwhile, nearly a quarter of devices report updates in the form of new functionality and marketing claims. As an illustrative case study, we analyze pneumothorax detection models and find that while model performance can degrade by as much as 0.18 AUC when evaluated on new sites, re-training on site-specific data can mitigate this performance drop, recovering up to 0.23 AUC. However, we also observed significant degradation on the original site after re-training using data from new sites, providing insight from one example that challenges the current one-model-fits-all approach to regulatory approvals. Our analysis provides an in-depth look at the current state of FDA-approved AI device updates and insights for future regulatory policies toward model updating and adaptive AI.
- Published
- 2024
4. Locating and measuring marine aquaculture production from space: a computer vision approach in the French Mediterranean
- Author
Quaade, Sebastian, Vallebueno, Andrea, Alcabes, Olivia D. N., Rodolfa, Kit T., and Ho, Daniel E.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Aquaculture production -- the cultivation of aquatic plants and animals -- has grown rapidly since the 1990s, but sparse, self-reported and aggregate production data limits the effective understanding and monitoring of the industry's trends and potential risks. Building on a manual survey of aquaculture production from remote sensing imagery, we train a computer vision model to identify marine aquaculture cages from aerial and satellite imagery, and generate a spatially explicit dataset of finfish production locations in the French Mediterranean from 2000-2021 that includes 4,010 cages (69 m² average cage area). We demonstrate the value of our method as an easily adaptable, cost-effective approach that can improve the speed and reliability of aquaculture surveys, and enables downstream analyses relevant to researchers and regulators. We illustrate its use to compute independent estimates of production, and develop a flexible framework to quantify uncertainty in these estimates. Overall, our study presents an efficient, scalable and highly adaptable method for monitoring aquaculture production from remote sensing imagery.
- Published
- 2024
5. Statistical Uncertainty in Word Embeddings: GloVe-V
- Author
Vallebueno, Andrea, Handan-Nader, Cassandra, Manning, Christopher D., and Ho, Daniel E.
- Subjects
Computer Science - Computation and Language
- Abstract
Static word embeddings are ubiquitous in computational social science applications and contribute to practical decision-making in a variety of fields including law and healthcare. However, assessing the statistical uncertainty in downstream conclusions drawn from word embedding statistics has remained challenging. When using only point estimates for embeddings, researchers have no streamlined way of assessing the degree to which their model selection criteria or scientific conclusions are subject to noise due to sparsity in the underlying data used to generate the embeddings. We introduce a method to obtain approximate, easy-to-use, and scalable reconstruction error variance estimates for GloVe (Pennington et al., 2014), one of the most widely used word embedding models, using an analytical approximation to a multivariate normal model. To demonstrate the value of embeddings with variance (GloVe-V), we illustrate how our approach enables principled hypothesis testing in core word embedding tasks, such as comparing the similarity between different word pairs in vector space, assessing the performance of different models, and analyzing the relative degree of ethnic or gender bias in a corpus using different word lists.
- Published
- 2024
6. Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- Author
Magesh, Varun, Surani, Faiz, Dahl, Matthew, Suzgun, Mirac, Manning, Christopher D., and Ho, Daniel E.
- Subjects
Computer Science - Computation and Language, Computer Science - Computers and Society
- Abstract
Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, certain legal research providers have touted methods such as retrieval-augmented generation (RAG) as "eliminating" (Casetext, 2023) or "avoid[ing]" hallucinations (Thomson Reuters, 2023), or guaranteeing "hallucination-free" legal citations (LexisNexis, 2023). Because of the closed nature of these systems, systematically assessing these claims is challenging. In this article, we design and report on the first preregistered empirical evaluation of AI-driven legal research tools. We demonstrate that the providers' claims are overstated. While hallucinations are reduced relative to general-purpose chatbots (GPT-4), we find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time. We also document substantial differences between systems in responsiveness and accuracy. Our article makes four key contributions. It is the first to assess and report the performance of RAG-based proprietary legal AI tools. Second, it introduces a comprehensive, preregistered dataset for identifying and understanding vulnerabilities in these systems. Third, it proposes a clear typology for differentiating between hallucinations and accurate legal responses. Last, it provides evidence to inform the responsibilities of legal professionals in supervising and verifying AI outputs, which remains a central open question for the responsible integration of AI into law.
- Comment
Our dataset, tool outputs, and labels will be made available upon publication. This version of the manuscript (May 30, 2024) is updated to reflect an evaluation of Westlaw's AI-Assisted Research.
- Published
- 2024
7. FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning
- Author
Niklaus, Joel, Zheng, Lucia, McCarthy, Arya D., Hahn, Christopher, Rosen, Brian M., Henderson, Peter, Ho, Daniel E., Honke, Garrett, Liang, Percy, and Manning, Christopher
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, 68T50, I.2
- Abstract
Instruction tuning is an important step in making language models useful for direct user interaction. However, many legal tasks remain out of reach for most open LLMs and there do not yet exist any large scale instruction datasets for the domain. This critically limits research in this application area. In this work, we curate LawInstruct, a large legal instruction dataset, covering 17 jurisdictions, 24 languages and a total of 12M examples. We present evidence that domain-specific pretraining and instruction tuning improve performance on LegalBench, including improving Flan-T5 XL by 8 points or 16% over the baseline. However, the effect does not generalize across all tasks, training regimes, model sizes, and other factors. LawInstruct is a resource for accelerating the development of models with stronger information processing and decision making capabilities in the legal domain.
- Published
- 2024
8. Quantifying the Uncertainty of Imputed Demographic Disparity Estimates: The Dual-Bootstrap
- Author
Lu, Benjamin, Wan, Jia, Ouyang, Derek, Goldin, Jacob, and Ho, Daniel E.
- Subjects
Statistics - Methodology
- Abstract
Measuring average differences in an outcome across racial or ethnic groups is a crucial first step for equity assessments, but researchers often lack access to data on individuals' races and ethnicities to calculate them. A common solution is to impute the missing race or ethnicity labels using proxies, then use those imputations to estimate the disparity. Conventional standard errors mischaracterize the resulting estimate's uncertainty because they treat the imputation model as given and fixed, instead of as an unknown object that must be estimated with uncertainty. We propose a dual-bootstrap approach that explicitly accounts for measurement uncertainty and thus enables more accurate statistical inference, which we demonstrate via simulation. In addition, we adapt our approach to the commonly used Bayesian Improved Surname Geocoding (BISG) imputation algorithm, where direct bootstrapping is infeasible because the underlying Census Bureau data are unavailable. In simulations, we find that measurement uncertainty is generally insignificant for BISG except in particular circumstances; bias, not variance, is likely the predominant source of error. We apply our method to quantify the uncertainty of prevalence estimates of common health conditions by race using data from the American Family Cohort.
- Comment
31 pages; 7 figures; CRIW Race, Ethnicity, and Economic Statistics for the 21st Century, Spring 2024
- Published
- 2024
9. On the Societal Impact of Open Foundation Models
- Author
Kapoor, Sayash, Bommasani, Rishi, Klyman, Kevin, Longpre, Shayne, Ramaswami, Ashwin, Cihon, Peter, Hopkins, Aspen, Bankston, Kevin, Biderman, Stella, Bogen, Miranda, Chowdhury, Rumman, Engler, Alex, Henderson, Peter, Jernite, Yacine, Lazar, Seth, Maffulli, Stefano, Nelson, Alondra, Pineau, Joelle, Skowron, Aviya, Song, Dawn, Storchan, Victor, Zhang, Daniel, Ho, Daniel E., Liang, Percy, and Narayanan, Arvind
- Subjects
Computer Science - Computers and Society, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Foundation models are powerful technologies: how they are released publicly directly shapes their societal impact. In this position paper, we focus on open foundation models, defined here as those with broadly available model weights (e.g. Llama 2, Stable Diffusion XL). We identify five distinctive properties (e.g. greater customizability, poor monitoring) of open foundation models that lead to both their benefits and risks. Open foundation models present significant benefits, with some caveats, that span innovation, competition, the distribution of decision-making power, and transparency. To understand their risks of misuse, we design a risk assessment framework for analyzing their marginal risk. Across several misuse vectors (e.g. cyberattacks, bioweapons), we find that current research is insufficient to effectively characterize the marginal risk of open foundation models relative to pre-existing technologies. The framework helps explain why the marginal risk is low in some cases, clarifies disagreements about misuse risks by revealing that past work has focused on different subsets of the framework with different assumptions, and articulates a way forward for more constructive debate. Overall, our work helps support a more grounded assessment of the societal impact of open foundation models by outlining what research is needed to empirically validate their theoretical benefits and risks.
- Published
- 2024
10. How well do LLMs cite relevant medical references? An evaluation framework and analyses
- Author
Wu, Kevin, Wu, Eric, Cassasola, Ally, Zhang, Angela, Wei, Kevin, Nguyen, Teresa, Riantawan, Sith, Riantawan, Patricia Shi, Ho, Daniel E., and Zou, James
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Large language models (LLMs) are currently being used to answer medical questions across a variety of clinical domains. Recent top-performing commercial LLMs, in particular, are also capable of citing sources to support their responses. In this paper, we ask: do the sources that LLMs generate actually support the claims that they make? To answer this, we propose three contributions. First, as expert medical annotations are an expensive and time-consuming bottleneck for scalable evaluation, we demonstrate that GPT-4 is highly accurate in validating source relevance, agreeing 88% of the time with a panel of medical doctors. Second, we develop an end-to-end, automated pipeline called SourceCheckup and use it to evaluate five top-performing LLMs on a dataset of 1200 generated questions, totaling over 40K pairs of statements and sources. Interestingly, we find that between roughly 50% and 90% of LLM responses are not fully supported by the sources they provide. We also evaluate GPT-4 with retrieval augmented generation (RAG) and find that, even still, around 30% of individual statements are unsupported, while nearly half of its responses are not fully supported. Third, we open-source our curated dataset of medical questions and expert annotations for future evaluations. Given the rapid pace of LLM development and the potential harms of incorrect or outdated medical information, it is crucial to also understand and quantify their capability to produce relevant, trustworthy medical references.
- Published
- 2024
11. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
- Author
Dahl, Matthew, Magesh, Varun, Suzgun, Mirac, and Ho, Daniel E.
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
Do large language models (LLMs) know the law? These models are increasingly being used to augment legal practice, education, and research, yet their revolutionary potential is threatened by the presence of hallucinations -- textual output that is not consistent with legal facts. We present the first systematic evidence of these hallucinations, documenting LLMs' varying performance across jurisdictions, courts, time periods, and cases. Our work makes four key contributions. First, we develop a typology of legal hallucinations, providing a conceptual framework for future research in this area. Second, we find that legal hallucinations are alarmingly prevalent, occurring between 58% of the time with ChatGPT 4 and 88% with Llama 2, when these models are asked specific, verifiable questions about random federal court cases. Third, we illustrate that LLMs often fail to correct a user's incorrect legal assumptions in a contra-factual question setup. Fourth, we provide evidence that LLMs cannot always predict, or do not always know, when they are producing legal hallucinations. Taken together, our findings caution against the rapid and unsupervised integration of popular LLMs into legal tasks. Even experienced lawyers must remain wary of legal hallucinations, and the risks are highest for those who stand to benefit from LLMs the most -- pro se litigants or those without access to traditional legal resources.
- Published
- 2024
12. The effects of plastisphere on the physicochemical properties of microplastics
- Author
Tang, Kuok Ho Daniel and Li, Ronghua
- Published
- 2024
13. The Association Between Electronic Device Use During Family Time and Family Well-Being: Population-Based Cross-Sectional Study
- Author
Zhao, Sheng Zhi, Guo, Ningyuan, Wang, Man Ping, Fong, Daniel Yee Tak, Lai, Agnes Yuen Kwan, Chan, Sophia Siu-Chee, Lam, Tai Hing, and Ho, Daniel Sai Yin
- Subjects
Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
- Abstract
Background: Electronic devices (eDevices) may have positive or negative influences on family communication and well-being depending on how they are used. Objective: We examined eDevice use during family time and its association with the quality of family communication and well-being in Hong Kong Chinese adults. Methods: In 2017, a probability-based 2-stage random sampling landline telephone survey collected data on eDevice use in daily life and during family time (eg, family dinner) and the presence of rules banning eDevice use during family dinner. Family communication quality was rated from 0 to 10 with higher scores being favorable. Family well-being was calculated as a composite mean score of 3 items each using the same scale from 0 to 10. The associations of family communication quality and well-being with eDevice use in daily life and during family time were estimated using beta-coefficient (β) adjusting for sociodemographics. The mediating role of family communication quality in the association between eDevice use and family well-being was analyzed. Results: Of the 2064 respondents (mean age 56.4 [SD 19.2] years, 1269/2064 [61.48%] female), 1579/2059 (76.69%) used an eDevice daily for a mean of 3.6 hours (SD 0.1) and 257/686 (37.5%) used it for 30+ minutes before sleep. As many as 794/2046 (38.81%) often or sometimes used an eDevice during family time including dinner (311/2017, 15.42%); 713/2012 (35.44%) reported use of an eDevice by family members during dinner. Lower family communication quality was associated with hours of eDevice use before sleep (adjusted β=–.25; 95% CI –0.44 to –0.05), and often use (vs never use) of eDevice during family dinner by oneself (adjusted β=–.51; 95% CI –0.91 to –0.10) and family members (adjusted β=–.54; 95% CI –0.79 to –0.29). Similarly, lower family well-being was associated with eDevice use before sleep (adjusted β=–.26; 95% CI –0.42 to –0.09), and often use during family dinner by oneself (adjusted β=–.48; 95% CI –0.83 to –0.12) and family members (adjusted β=–.50; 95% CI –0.72 to –0.28). Total ban of eDevice use during family dinner was negatively associated with often use by oneself (adjusted odds ratio 0.49; 95% CI 0.29 to 0.85) and family members (adjusted odds ratio 0.41; 95% CI 0.28 to 0.60) but not with family communication and well-being. Lower family communication quality substantially mediated the total effect of the association of eDevice use time before sleep (61.2%) and often use at family dinner by oneself (87.0%) and by family members (67.8%) with family well-being. Conclusions: eDevice use before sleep and during family dinner was associated with lower family well-being, and the association was substantially mediated by family communication quality. Our results suggest that interventions on smart use of eDevice may improve family communication and well-being.
- Published
- 2020
14. Real-time Estimation of DoS Duration and Frequency for Security Control
- Author
Sun, Yifan, Lu, Jianquan, Ho, Daniel W. C., and Li, Lulu
- Subjects
Electrical Engineering and Systems Science - Systems and Control, Mathematics - Optimization and Control
- Abstract
In this paper, we develop a new denial-of-service (DoS) estimator, enabling defenders to identify duration and frequency parameters of any DoS attacker, except for three edge cases, exclusively using real-time data. The key advantage of the estimator lies in its capability to facilitate security control in a wide range of practical scenarios, even when the attacker's information is previously unknown. We demonstrate the advantage and application of our new estimator in the context of two classical control scenarios, namely consensus of multi-agent systems and impulsive stabilization of nonlinear systems, for illustration.
- Published
- 2023
15. Estimating and Implementing Conventional Fairness Metrics With Probabilistic Protected Features
- Author
Elzayn, Hadi, Black, Emily, Vossler, Patrick, Jo, Nathanael, Goldin, Jacob, and Ho, Daniel E.
- Subjects
Computer Science - Machine Learning, Computer Science - Computers and Society, Statistics - Machine Learning
- Abstract
The vast majority of techniques to train fair models require access to the protected attribute (e.g., race, gender), either at train time or in production. However, in many important applications this protected attribute is largely unavailable. In this paper, we develop methods for measuring and reducing fairness violations in a setting with limited access to protected attribute labels. Specifically, we assume access to protected attribute labels on a small subset of the dataset of interest, but only probabilistic estimates of protected attribute labels (e.g., via Bayesian Improved Surname Geocoding) for the rest of the dataset. With this setting in mind, we propose a method to estimate bounds on common fairness metrics for an existing model, as well as a method for training a model to limit fairness violations by solving a constrained non-convex optimization problem. Unlike similar existing approaches, our methods take advantage of contextual information -- specifically, the relationships between a model's predictions and the probabilistic prediction of protected attributes, given the true protected attribute, and vice versa -- to provide tighter bounds on the true disparity. We provide an empirical illustration of our methods using voting data. First, we show our measurement method can bound the true disparity up to 5.5x tighter than previous methods in these applications. Then, we demonstrate that our training technique effectively reduces disparity while incurring lesser fairness-accuracy trade-offs than other fair optimization methods with limited access to protected attributes.
- Published
- 2023
16. A polyfunctionalized carbon framework composite for efficient decontamination of Cr(VI) and polycyclic aromatic nitrides from acidic wastewater
- Author
Wu, Weilong, Zhang, Han, Qian, Rong, Yu, Kunru, Li, Ronghua, Tang, Kuok Ho Daniel, Wu, Xuan, Guo, Zhiqiang, Shao, Cong, Yue, Feixue, and Zhang, Zengqiang
- Published
- 2024
17. Health risk of human exposure to microplastics: a review
- Author
Tang, Kuok Ho Daniel, Li, Ronghua, Li, Zhi, and Wang, Dun
- Published
- 2024
18. Insight into the humification and carbon balance of biogas residual biochar amended co-composting of hog slurry and wheat straw
- Author
Liu, Yunpeng, Pan, Junting, Wang, Jingwen, Yang, Xu, Zhang, Wanqiang, Tang, Kuok Ho Daniel, Wang, Hailong, Zhang, Xiu, Gao, Runyu, Yang, Guoping, Zhang, Zengqiang, and Li, Ronghua
- Published
- 2024
19. Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools
- Author
Black, Emily, Naidu, Rakshit, Ghani, Rayid, Rodolfa, Kit T., Ho, Daniel E., and Heidari, Hoda
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
While algorithmic fairness is a thriving area of research, in practice, mitigating issues of bias often gets reduced to enforcing an arbitrarily chosen fairness metric, either by enforcing fairness constraints during the optimization step, post-processing model outputs, or by manipulating the training data. Recent work has called on the ML community to take a more holistic approach to tackle fairness issues by systematically investigating the many design choices made through the ML pipeline, and identifying interventions that target the issue's root cause, as opposed to its symptoms. While we share the conviction that this pipeline-based approach is the most appropriate for combating algorithmic unfairness on the ground, we believe there are currently very few methods of operationalizing this approach in practice. Drawing on our experience as educators and practitioners, we first demonstrate that without clear guidelines and toolkits, even individuals with specialized ML knowledge find it challenging to hypothesize how various design choices influence model behavior. We then consult the fair-ML literature to understand the progress to date toward operationalizing the pipeline-aware approach: we systematically collect and organize the prior work that attempts to detect, measure, and mitigate various sources of unfairness through the ML pipeline. We utilize this extensive categorization of previous contributions to sketch a research agenda for the community. We hope this work serves as the stepping stone toward a more comprehensive set of resources for ML researchers, practitioners, and students interested in exploring, designing, and testing pipeline-oriented approaches to algorithmic fairness.
- Comment
EAAMO'23 (Archival)
- Published
- 2023
20. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Author
Guha, Neel, Nyarko, Julian, Ho, Daniel E., Ré, Christopher, Chilton, Adam, Narayana, Aditya, Chohlas-Wood, Alex, Peters, Austin, Waldon, Brandon, Rockmore, Daniel N., Zambrano, Diego, Talisman, Dmitry, Hoque, Enam, Surani, Faiz, Fagan, Frank, Sarfaty, Galit, Dickinson, Gregory M., Porat, Haggai, Hegland, Jason, Wu, Jessica, Nudell, Joe, Niklaus, Joel, Nay, John, Choi, Jonathan H., Tobia, Kevin, Hagan, Margaret, Ma, Megan, Livermore, Michael, Rasumov-Rahe, Nikon, Holzenberger, Nils, Kolt, Noam, Henderson, Peter, Rehaag, Sean, Goel, Sharad, Gao, Shang, Williams, Spencer, Gandhi, Sunny, Zur, Tom, Iyer, Varun, and Li, Zehua
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Computers and Society
- Abstract
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
- Comment
143 pages, 79 tables, 4 figures
- Published
- 2023
21. Finite-Iteration Learning Tracking with FlexRay Communication Protocol
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
22. Tracking Under Measurable and Unmeasurable State Information
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
23. Tracking Under Saturated Finite Interval and HNN-Structural Output
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
24. Tracking Based on Discontinuous Learning Control Strategy
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
25. Finite-Iteration Learning Tracking with Packet Losses
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
26. Consensus Under Switching Topology and Observer Information
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
27. Consensus Under Event-Triggered Transmission and Quantization
- Author
Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.; Shen, Dong (Series Editor)
- Published
- 2024
28. Consensus Under Limited Information Communication
- Author
-
Xiong, Wenjun, Luo, Zijian, Ho, Daniel W. C., Shen, Dong, Series Editor, Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.
- Published
- 2024
- Full Text
- View/download PDF
29. Multi-layered Sampled-Data Tracking Under Cooperative–Antagonistic Interactions
- Author
-
Xiong, Wenjun, Luo, Zijian, Ho, Daniel W. C., Shen, Dong, Series Editor, Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.
- Published
- 2024
- Full Text
- View/download PDF
30. Stability of Multi-layer Supply Chain Networks with Constraints
- Author
-
Xiong, Wenjun, Luo, Zijian, Ho, Daniel W. C., Shen, Dong, Series Editor, Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.
- Published
- 2024
- Full Text
- View/download PDF
31. Introduction
- Author
-
Xiong, Wenjun, Luo, Zijian, Ho, Daniel W. C., Shen, Dong, Series Editor, Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.
- Published
- 2024
- Full Text
- View/download PDF
32. Security of Network Systems Under Cyber-Attack
- Author
-
Xiong, Wenjun, Luo, Zijian, Ho, Daniel W. C., Shen, Dong, Series Editor, Xiong, Wenjun, Luo, Zijian, and Ho, Daniel W. C.
- Published
- 2024
- Full Text
- View/download PDF
33. Characterization of NFE2L1-616, an isoform of nuclear factor-erythroid-2 related transcription factor-1 that activates antioxidant response element-regulated genes.
- Author
-
Ho, Daniel, Suryajaya, Kaylen, Manh, Kaitlyn, Duong, Amanda, and Chan, Jefferson
- Subjects
Antioxidant Response Elements ,Protein Isoforms ,Gene Expression Regulation ,Cell Line ,NF-E2-Related Factor 1 - Abstract
The NFE2L1 transcription factor (aka Nrf1) is a basic leucine zipper protein that performs a critical role in the cellular stress response pathway. Here, we characterized a novel variant of NFE2L1 referred to as NFE2L1-616. The transcript encoding NFE2L1-616 is derived from an intronic promoter, and its first exon is distinct from that of other reported full-length NFE2L1 isoforms. The NFE2L1-616 protein constitutively localizes in the nucleus as it lacks the N-terminal amino acid residues that target other full-length NFE2L1 isoforms to the endoplasmic reticulum. The expression level of NFE2L1-616 is lower than that of other NFE2L1 isoforms. It is widely expressed across the different cell lines and tissues that were examined. NFE2L1-616 showed strong transcriptional activity driving luciferase reporter expression from a promoter containing an antioxidant response element. Together, the results suggest that the NFE2L1-616 variant can function as a positive regulator in the transcriptional regulation of NFE2L1 responsive genes.
- Published
- 2023
34. One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support
- Author
-
Stern, Ronja, Rasiah, Vishvaksenan, Matoshi, Veton, Bose, Srinanda Brügger, Stürmer, Matthias, Chalkidis, Ilias, Ho, Daniel E., and Niklaus, Joel
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,68T50 ,I.2 - Abstract
Recent strides in Large Language Models (LLMs) have saturated many Natural Language Processing (NLP) benchmarks, emphasizing the need for more challenging ones to properly assess LLM capabilities. However, domain-specific and multilingual benchmarks are rare because they require in-depth expertise to develop. Still, most public models are trained predominantly on English corpora, while other languages remain understudied, particularly for practical domain-specific NLP tasks. In this work, we introduce a novel NLP benchmark for the legal domain that challenges LLMs in five key dimensions: processing long documents (up to 50K tokens), using domain-specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), multitasking (comprising legal document-to-document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks) and reasoning (comprising especially Court View Generation, but also the Text Classification tasks). Our benchmark contains diverse datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual legal system. Despite the large size of our datasets (some with hundreds of thousands of examples), existing publicly available multilingual models struggle with most tasks, even after extensive in-domain pre-training and fine-tuning. We publish all resources (benchmark suite, pre-trained models, code) under permissive open CC BY-SA licenses.
- Published
- 2023
35. Synchronization of multiple rigid body systems: a survey
- Author
-
Jin, X., Ho, Daniel W. C., and Tang, Y.
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
The multi-agent system has been a hot topic in the past few decades owing to its lower cost, higher robustness, and higher flexibility. As a particular multi-agent system, the multiple rigid body system has received growing interest for its wide applications in transportation, aerospace, and ocean exploration. Due to the non-Euclidean configuration space of attitudes and the inherent nonlinearity of the dynamics of rigid body systems, synchronization of multiple rigid body systems is quite challenging. This paper aims to present an overview of the recent progress in synchronization of multiple rigid body systems from the view of two fundamental problems. The first problem focuses on attitude synchronization, while the second one focuses on cooperative motion control in which rotation and translation dynamics are coupled. Finally, a summary and future directions are given in the conclusion.
- Published
- 2023
36. MultiLegalPile: A 689GB Multilingual Legal Corpus
- Author
-
Niklaus, Joel, Matoshi, Veton, Stürmer, Matthias, Chalkidis, Ilias, and Ho, Daniel E.
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning ,68T50 ,I.2 - Abstract
Large, high-quality datasets are crucial for training Large Language Models (LLMs). However, so far, there are few datasets available for specialized critical domains such as law and the available ones are often only for the English language. We curate and release MultiLegalPile, a 689GB corpus in 24 languages from 17 jurisdictions. The MultiLegalPile corpus, which includes diverse legal data sources with varying licenses, allows for pretraining NLP models under fair use, with more permissive licenses for the Eurlex Resources and Legal mC4 subsets. We pretrain two RoBERTa models and one Longformer multilingually, and 24 monolingual models on each of the language-specific subsets and evaluate them on LEXTREME. Additionally, we evaluate the English and multilingual models on LexGLUE. Our multilingual models set a new SotA on LEXTREME and our English models on LexGLUE. We release the dataset, the trained models, and all of the code under the most open possible licenses., Comment: Accepted to ACL 2024
- Published
- 2023
37. Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
- Author
-
Herzog, Alexander, Rao, Kanishka, Hausman, Karol, Lu, Yao, Wohlhart, Paul, Yan, Mengyuan, Lin, Jessica, Arenas, Montserrat Gonzalez, Xiao, Ted, Kappler, Daniel, Ho, Daniel, Rettinghouse, Jarek, Chebotar, Yevgen, Lee, Kuang-Huei, Gopalakrishnan, Keerthana, Julian, Ryan, Li, Adrian, Fu, Chuyuan Kelly, Wei, Bob, Ramesh, Sangeetha, Holden, Khem, Kleiven, Kim, Rendleman, David, Kirmani, Sean, Bingham, Jeff, Weisz, Jon, Xu, Ying, Lu, Wenlong, Bennice, Matthew, Fong, Cody, Do, David, Lam, Jessica, Bai, Yunfei, Holson, Benjie, Quinlan, Michael, Brown, Noah, Kalakrishnan, Mrinal, Ibarz, Julian, Pastor, Peter, and Levine, Sergey
- Subjects
Computer Science - Robotics - Abstract
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects. The project's website and videos can be found at http://rl-at-scale.github.io., Comment: Published at Robotics: Science and Systems 2023
- Published
- 2023
38. Potential for allocative harm in an environmental justice data tool
- Author
-
Huynh, Benjamin Q., Chin, Elizabeth T., Koenecke, Allison, Ouyang, Derek, Ho, Daniel E., Kiang, Mathew V., and Rehkopf, David H.
- Subjects
Statistics - Applications ,Computer Science - Computers and Society - Abstract
Neighborhood-level screening algorithms are increasingly being deployed to inform policy decisions. We evaluate one such algorithm, CalEnviroScreen - designed to promote environmental justice and used to guide hundreds of millions of dollars in public funding annually - assessing its potential for allocative harm. We observe the model to be sensitive to subjective model decisions, with 16% of tracts potentially changing designation, as well as financially consequential, estimating the effect of its positive designations as a 104% (62-145%) increase in funding, equivalent to $2.08 billion ($1.56-2.41 billion) over four years. We also observe allocative tradeoffs and susceptibility to manipulation, raising ethical concerns. We recommend incorporating sensitivity analyses to mitigate allocative harm and accountability mechanisms to prevent misuse.
- Published
- 2023
- Full Text
- View/download PDF
39. Artificial Intelligence for Adjudication: The Social Security Administration and AI Governance
- Author
-
Glaze, Kurt, Ho, Daniel E., Ray, Gerald K., Tsang, Christine, Bullock, Justin B., book editor, Chen, Yu-Che, book editor, Himmelreich, Johannes, book editor, Hudson, Valerie M., book editor, Korinek, Anton, book editor, Young, Matthew M., book editor, and Zhang, Baobao, book editor
- Published
- 2024
- Full Text
- View/download PDF
40. Occurrence and Fate of Microplastics in Anaerobic Digestion of Dewatered Sludge
- Author
-
Tang, Kuok Ho Daniel, Bhat, Sartaj Ahmad, editor, Kumar, Vineet, editor, Li, Fusheng, editor, and Kumar, Sunil, editor
- Published
- 2024
- Full Text
- View/download PDF
41. Terminal deoxynucleotidyl transferase and CD84 identify human multi-potent lymphoid progenitors
- Author
-
Kim, YeEun, Calderon, Ariel A., Favaro, Patricia, Glass, David R., Tsai, Albert G., Ho, Daniel, Borges, Luciene, Greenleaf, William J., and Bendall, Sean C.
- Published
- 2024
- Full Text
- View/download PDF
42. Navigating drug use, cessation, and recovery: a retrospective case notes review among sexual minority men at a community-based service in Singapore
- Author
-
Wah, Tzy Hyi, Ong, Adeline Jia Xin, Naidu, Kuhanesan N. C., Hanafi, Syaza, Tan, Kelvin, Tan, Alaric, Ong, Tricia Jia Jing, Ong, Eleanor, Ho, Daniel Weng Siong, Subramaniam, Mythily, See, Maha Yewtuck, and Tan, Rayner Kay Jin
- Published
- 2024
- Full Text
- View/download PDF
43. Horizontal gene transfer after faecal microbiota transplantation in adolescents with obesity
- Author
-
Behling, Anna H., Wilson, Brooke C., Ho, Daniel, Cutfield, Wayne S., Vatanen, Tommi, and O’Sullivan, Justin M.
- Published
- 2024
- Full Text
- View/download PDF
44. Mitigating allocative tradeoffs and harms in an environmental justice data tool
- Author
-
Huynh, Benjamin Q., Chin, Elizabeth T., Koenecke, Allison, Ouyang, Derek, Ho, Daniel E., Kiang, Mathew V., and Rehkopf, David H.
- Published
- 2024
- Full Text
- View/download PDF
45. Potential and benefits of biochar production: crop straw management and carbon emission mitigation in Shaanxi Province, China
- Author
-
Zhu, Jianchun, Yang, Chuanwen, Qiao, Mengyuan, Zhao, Tianyu, Emmanuel, II, Kevin Scriber, Tang, Kuok Ho Daniel, Wang, Hailong, Zhang, Zengqiang, Pan, Junting, Ren, Xiuna, and Li, Ronghua
- Published
- 2024
- Full Text
- View/download PDF
46. Estimating Racial Disparities When Race is Not Observed
- Author
-
McCartan, Cory, Fisher, Robin, Goldin, Jacob, Ho, Daniel E., and Imai, Kosuke
- Subjects
Statistics - Applications ,Computer Science - Computers and Society - Abstract
The estimation of racial disparities in various fields is often hampered by the lack of individual-level racial information. In many cases, the law prohibits the collection of such information to prevent direct racial discrimination. As a result, analysts have frequently adopted Bayesian Improved Surname Geocoding (BISG) and its variants, which combine individual names and addresses with Census data to predict race. Unfortunately, the residuals of BISG are often correlated with the outcomes of interest, generally attenuating estimates of racial disparities. To correct this bias, we propose an alternative identification strategy under the assumption that surname is conditionally independent of the outcome given (unobserved) race, residence location, and other observed characteristics. We introduce a new class of models, Bayesian Instrumental Regression for Disparity Estimation (BIRDiE), that take BISG probabilities as inputs and produce racial disparity estimates by using surnames as an instrumental variable for race. Our estimation method is scalable, making it possible to analyze large-scale administrative data. We also show how to address potential violations of the key identification assumptions. A validation study based on the North Carolina voter file shows that BISG+BIRDiE reduces error by up to 84% when estimating racial differences in party registration. Finally, we apply the proposed methodology to estimate racial differences in who benefits from the home mortgage interest deduction using individual-level tax data from the U.S. Internal Revenue Service. Open-source software is available which implements the proposed methodology., Comment: 28 pages, 9 figures, plus references and appendices
- Published
- 2023
47. Asking for Help: Failure Prediction in Behavioral Cloning through Value Approximation
- Author
-
Gokmen, Cem, Ho, Daniel, and Khansari, Mohi
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,I.2.9 - Abstract
Recent progress in end-to-end Imitation Learning approaches has shown promising results and generalization capabilities on mobile manipulation tasks. Such models are seeing increasing deployment in real-world settings, where scaling up requires robots to be able to operate with high autonomy, i.e. requiring as little human supervision as possible. In order to avoid the need for one-on-one human supervision, robots need to be able to detect and prevent policy failures ahead of time, and ask for help, allowing a remote operator to supervise multiple robots and help when needed. However, the black-box nature of end-to-end Imitation Learning models such as Behavioral Cloning, as well as the lack of an explicit state-value representation, make it difficult to predict failures. To this end, we introduce Behavioral Cloning Value Approximation (BCVA), an approach to learning a state value function based on and trained jointly with a Behavioral Cloning policy that can be used to predict failures. We demonstrate the effectiveness of BCVA by applying it to the challenging mobile manipulation task of latched-door opening, showing that we can identify failure scenarios with 86% precision and 81% recall, evaluated on over 2000 real world runs, improving upon the baseline of simple failure classification by 10 percentage points., Comment: Accepted to the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)
- Published
- 2023
48. Polyunsaturated fatty acid-bound alpha-fetoprotein promotes immune suppression by altering human dendritic cell metabolism
- Author
-
Munson, Paul V, Adamik, Juraj, Hartmann, Felix J, Favaro, Patricia MB, Ho, Daniel, Bendall, Sean C, Combes, Alexis J, Krummel, Matthew F, Zhang, Karen, Kelley, Robin K, and Butterfield, Lisa H
- Subjects
Cancer ,Inflammatory and immune system ,Humans ,alpha-Fetoproteins ,Liver Neoplasms ,Fatty Acids ,Unsaturated ,Fatty Acids ,Biomarkers ,Dendritic Cells ,Oncology and Carcinogenesis ,Oncology & Carcinogenesis - Abstract
α-Fetoprotein (AFP) is expressed by stem-like and poor outcome hepatocellular cancer tumors and is a clinical tumor biomarker. AFP has been demonstrated to inhibit dendritic cell (DC) differentiation and maturation and to block oxidative phosphorylation. To identify the critical metabolic pathways leading to human DC functional suppression, here, we used two recently described single-cell profiling methods, scMEP (single-cell metabolic profiling) and SCENITH (single-cell energetic metabolism by profiling translation inhibition). Glycolytic capacity and glucose dependence of DCs were significantly increased by tumor-derived, but not normal cord blood-derived, AFP, leading to increased glucose uptake and lactate secretion. Key molecules in the electron transport chain in particular were regulated by tumor-derived AFP. These metabolic changes occurred at mRNA and protein levels, with negative impact on DC stimulatory capacity. Tumor-derived AFP bound significantly more polyunsaturated fatty acids (PUFA) than cord blood-derived AFP. PUFAs bound to AFP increased metabolic skewing and promoted DC functional suppression. PUFAs inhibited DC differentiation in vitro, and ω-6 PUFAs conferred potent immunoregulation when bound to tumor-derived AFP. Together, these findings provide mechanistic insights into how AFP antagonizes the innate immune response to limit antitumor immunity. Significance: α-Fetoprotein (AFP) is a secreted tumor protein and biomarker with impact on immunity. Fatty acid-bound AFP promotes immune suppression by skewing human dendritic cell metabolism toward glycolysis and reduced immune stimulation.
- Published
- 2023
49. Iterative Learning Control for Network Systems Under Constrained Information Communication
- Author
-
Xiong, Wenjun, primary, Luo, Zijian, additional, and Ho, Daniel W. C., additional
- Published
- 2024
- Full Text
- View/download PDF
50. Secure Fusion Estimation Against FDI Sensor Attacks in Cyber-Physical Systems
- Author
-
Chen, Bo, Weng, Pindi, Ho, Daniel W. C., and Yu, Li
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
This paper is concerned with the problem of secure multi-sensor fusion estimation for cyber-physical systems, where sensor measurements may be tampered with by false data injection (FDI) attacks. In this work, it is considered that the adversary may not be able to attack all sensors. That is, several sensors remain unattacked. In this case, new local reorganized subsystems including the FDI attack signals and un-attacked sensor measurements are constructed by the augmentation method. Then, a joint Kalman fusion estimator is designed in the linear minimum variance sense to estimate the system state and FDI attack signals simultaneously. Finally, illustrative examples are employed to show the effectiveness and advantages of the proposed methods., Comment: 10 pages, 5 figures; the first version of this manuscript was completed in 2020
- Published
- 2022