Demystifying the Curse of Horizon in Offline Reinforcement Learning in Order to Break It

Offline reinforcement learning (RL), where we evaluate and learn new policies using existing off-policy data, is crucial in applications where experimentation is challenging and simulation unreliable, such as medicine. It is also notoriously difficult because the similarity (density ratio) between observed trajectories and those generated by any new policy diminishes exponentially as the horizon grows, a phenomenon known as the curse of horizon, which severely limits the application of offline RL whenever horizons are moderately long or even infinite. In "Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning," Kallus and Uehara set out to understand these limits and when they can be broken. They precisely characterize the curse by deriving the semiparametric efficiency lower bounds for the policy-value estimation problem in different models. On the one hand, this shows why the curse necessarily plagues standard estimators: they work even in non-Markov models and therefore must be limited by the corresponding bound. On the other hand, greater efficiency is possible in certain Markovian models, and they give the first estimator achieving these much lower efficiency bounds in infinite-horizon Markov decision processes.

Abstract: Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds and efficient influence functions for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But, in time-invariant Markov decision processes, our bounds show that truly off-policy evaluation is feasible, even with just one dependent trajectory, and they provide the limits of how well we could hope to do. We develop a new estimator based on double reinforcement learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and q-functions; it remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.

Funding: This work was supported by the National Science Foundation Division of Information and Intelligent Systems [1846210] and by the Masason Foundation.

Supplemental Material: The online appendices are available at https://doi.org/10.1287/opre.2021.2249.
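
To make the estimator described in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a DRL-style doubly robust value estimate for a discounted, time-invariant Markov decision process. The names drl_estimate, w_hat, q_hat, and pi are illustrative, and the nuisances w_hat (stationary density ratio) and q_hat (q-function) are assumed to have been estimated elsewhere, possibly with cross-fitting.

```python
import numpy as np

def drl_estimate(transitions, w_hat, q_hat, pi, d0_samples, gamma):
    """Sketch of a doubly robust OPE estimate of the normalized discounted
    value (1 - gamma) * E[sum_t gamma^t r_t] under the target policy.

    transitions: iterable of (s, a, r, s_next) tuples from behavior data
    w_hat(s, a): estimated stationary density ratio d_pi(s, a) / d_behavior(s, a)
    q_hat(s, a): estimated q-function of the target policy
    pi(s):       returns [(action, probability), ...] under the target policy
    d0_samples:  samples of initial states
    """
    def v_hat(s):
        # v(s) = E_{a ~ pi(.|s)}[q(s, a)]
        return sum(p * q_hat(s, a) for a, p in pi(s))

    # Plug-in term: (1 - gamma) * E_{s0 ~ d0}[v(s0)]
    plug_in = (1.0 - gamma) * np.mean([v_hat(s0) for s0 in d0_samples])

    # Density-ratio-weighted temporal-difference correction term
    corrections = [
        w_hat(s, a) * (r + gamma * v_hat(s_next) - q_hat(s, a))
        for (s, a, r, s_next) in transitions
    ]
    return plug_in + np.mean(corrections)
```

In this form the estimate combines the two nuisances as the abstract describes: if q_hat is correct, the correction term has mean zero, and if w_hat is correct, it removes the bias of the plug-in term, which is the double robustness property referred to above.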