Author: "Yokoi, Sho" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Yokoi, Sho"' returned 42 results.

1. Zipfian Whitening

Author: Yokoi, Sho, Bao, Han, Kurita, Hiroto, and Shimodaira, Hidetoshi
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The word embedding space in neural models is skewed, and correcting this can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequency that follows Zipf's law significantly improves task performance, surpassing established baselines. From a theoretical perspective, both our approach and existing methods can be clearly categorized: word representations are distributed according to an exponential family with either uniform or Zipfian base measures. By adopting the latter approach, we can naturally emphasize informative low-frequency words in terms of their vector norm, which becomes evident from the information-geometric perspective, and in terms of the loss functions for imbalanced classification. Additionally, our theory corroborates that popular natural language processing methods, such as skip-gram negative sampling, WhiteningBERT, and headless language models, work well precisely because their word embeddings encode the empirical word frequency into the underlying probabilistic model., Comment: NeurIPS 2024
Published: 2024
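
A minimal pure-Python sketch of the frequency-weighted whitening idea above, under our own assumptions: toy 2-D vectors, hypothetical word counts, and Cholesky whitening swapped in for the PCA whitening the abstract names (any two whitening transforms differ only by a rotation, and both make the frequency-weighted covariance the identity). The function names are illustrative; this is not the authors' implementation.

```python
import math


def weighted_mean(vectors, weights):
    # Frequency-weighted mean: sum_i p(w_i) * v_i
    dim = len(vectors[0])
    return [sum(p * v[k] for v, p in zip(vectors, weights)) for k in range(dim)]


def weighted_covariance(vectors, weights, mean):
    # Frequency-weighted covariance: sum_i p(w_i) (v_i - mean)(v_i - mean)^T
    dim = len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    for v, p in zip(vectors, weights):
        c = [v[k] - mean[k] for k in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += p * c[i] * c[j]
    return cov


def cholesky(a):
    # Lower-triangular L with L L^T = a (a must be positive definite)
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    # Solve L z = b by forward substitution
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def zipfian_whiten(vectors, counts):
    # Word probabilities from (hypothetical) corpus counts replace uniform weights
    total = sum(counts)
    weights = [c / total for c in counts]
    mean = weighted_mean(vectors, weights)
    low = cholesky(weighted_covariance(vectors, weights, mean))
    # z = L^{-1} (v - mean) has zero weighted mean and identity weighted covariance
    return [forward_solve(low, [x - m for x, m in zip(v, mean)]) for v in vectors]


vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]  # toy embeddings
counts = [100, 30, 10, 5]                                  # hypothetical Zipf-like counts
whitened = zipfian_whiten(vecs, counts)
```

Passing equal counts reduces this to ordinary uniform whitening; the abstract's point is that replacing those uniform weights with Zipfian word frequencies is what improves downstream performance.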

2. Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words

Author: Kurita, Hiroto, Kobayashi, Goro, Yokoi, Sho, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: The performance of sentence encoders can be significantly improved through the simple practice of fine-tuning using contrastive loss. A natural question arises: what characteristics do models acquire during contrastive learning? This paper theoretically and experimentally shows that contrastive-based sentence encoders implicitly weight words based on information-theoretic quantities; that is, more informative words receive greater weight, while others receive less. The theory states that, in the lower bound of the optimal value of the contrastive learning objective, the norm of word embedding reflects the information gain associated with the distribution of surrounding words. We also conduct comprehensive experiments using various models, multiple datasets, two methods to measure the implicit weighting of models (Integrated Gradients and SHAP), and two information-theoretic quantities (information gain and self-information). The results provide empirical evidence that contrastive fine-tuning emphasizes informative words., Comment: 16 pages, 6 figures, accepted to EMNLP 2023 Findings (short paper)
Published: 2023

3. Unbalanced Optimal Transport for Unbalanced Word Alignment

Author: Arase, Yuki, Bao, Han, and Yokoi, Sho
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Monolingual word alignment is crucial for modeling semantic interactions between sentences. In particular, null alignment, a phenomenon in which words have no corresponding counterparts, is pervasive and critical in handling semantically divergent sentences. Identifying null alignment is useful in its own right for reasoning about the semantic similarity of sentences, because it indicates that there is an information imbalance between them. To achieve unbalanced word alignment that values both alignment and null alignment, this study shows that the family of optimal transport (OT) methods, i.e., balanced, partial, and unbalanced OT, is a natural and powerful approach even without tailor-made techniques. Our extensive experiments covering unsupervised and supervised settings indicate that our generic OT-based alignment methods are competitive with state-of-the-art methods specially designed for word alignment, particularly on challenging datasets with high null alignment frequencies., Comment: Accepted for the Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Published: 2023

4. Transformer Language Models Handle Word Frequency in Prediction Head

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: The prediction head is a crucial component of Transformer language models. Despite its direct impact on prediction, this component has often been overlooked in analyses of Transformers. In this study, we investigate the inner workings of the prediction head, focusing specifically on its bias parameters. Our experiments with BERT and GPT-2 models reveal that the biases in their word prediction heads play a significant role in the models' ability to reflect word frequency in a corpus, in line with the logit adjustment method commonly used in long-tailed learning. We also quantify the effect of controlling the biases in practical auto-regressive text generation scenarios; under a particular setting, more diverse text can be generated without compromising text quality., Comment: 11 pages, 12 figures, accepted to ACL 2023 Findings (short paper)
Published: 2023
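
The bias effect described above can be illustrated with a toy prediction head whose logits are the inner product of a hidden state with output embeddings plus a per-word bias. The `bias_scale` knob, the vocabulary, and all numbers are hypothetical, and the paper's exact bias-control setting may differ. Shrinking the bias flattens the next-token distribution, which is the kind of diversity gain the abstract reports.

```python
import math


def softmax(logits):
    # Numerically stable softmax
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_probs(hidden, output_embeddings, bias, bias_scale=1.0):
    # logit_w = <h, e_w> + bias_scale * b_w; bias_scale=0 drops the bias term entirely
    logits = [
        sum(h * e for h, e in zip(hidden, row)) + bias_scale * b
        for row, b in zip(output_embeddings, bias)
    ]
    return softmax(logits)


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


# Toy 3-token vocabulary; token 0 plays the role of a frequent word with a large bias
hidden = [1.0, 0.0]
output_embeddings = [[0.5, 0.0], [0.5, 0.0], [0.2, 0.0]]
bias = [2.0, 0.0, -1.0]

with_bias = next_token_probs(hidden, output_embeddings, bias, bias_scale=1.0)
without_bias = next_token_probs(hidden, output_embeddings, bias, bias_scale=0.0)
print(entropy(with_bias) < entropy(without_bias))  # True: flatter distribution without the bias
```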

5. Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: Transformers are ubiquitous across a wide range of tasks, and interpreting their internals is a pivotal goal. Nevertheless, one particular component, the feed-forward (FF) block, has typically been less analyzed despite its substantial parameter count. We analyze the input contextualization effects of FF blocks by rendering them in attention maps, a human-friendly visualization scheme. Our experiments with both masked and causal language models reveal that FF networks modify the input contextualization to emphasize specific types of linguistic compositions. In addition, FF blocks and their surrounding components tend to cancel out each other's effects, suggesting potential redundancy in the processing of the Transformer layer., Comment: ICLR 2024 Spotlight; 37 pages, 32 figures, 3 tables
Published: 2023

6. Norm of Word Embedding Encodes Information Gain

Author: Oyama, Momose, Yokoi, Sho, and Shimodaira, Hidetoshi
Subjects: Computer Science - Computation and Language
Abstract: Distributed representations of words encode lexical semantic information, but what type of information is encoded, and how? Focusing on the skip-gram with negative-sampling method, we found that the squared norm of a static word embedding encodes the information gain conveyed by the word; the information gain is defined by the Kullback-Leibler divergence of the co-occurrence distribution of the word to the unigram distribution. Our findings are explained by the theoretical framework of the exponential family of probability distributions and confirmed through precise experiments that remove spurious correlations arising from word frequency. This theory also extends to contextualized word embeddings in language models, and to any neural network with a softmax output layer. We also demonstrate that both the KL divergence and the squared norm of the embedding provide a useful metric of a word's informativeness in tasks such as keyword extraction, proper-noun discrimination, and hypernym discrimination., Comment: 23 pages, EMNLP 2023
Published: 2022
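
The information gain defined in the abstract, KL(p(. | w) || p(.)) between a word's co-occurrence distribution and the unigram distribution, can be computed directly from counts. This is a sketch assuming plain count dictionaries (hypothetical data, no smoothing); it does not reproduce the paper's correction for word-frequency effects.

```python
import math


def information_gain(cooccurrence, unigram):
    """KL(p(. | w) || p(.)) in nats.

    cooccurrence: context word -> count of co-occurrences with the target word w
    unigram: context word -> corpus count (must cover every context in cooccurrence)
    """
    n_cooc = sum(cooccurrence.values())
    n_all = sum(unigram.values())
    kl = 0.0
    for context, count in cooccurrence.items():
        if count == 0:
            continue  # zero-probability terms contribute nothing
        p = count / n_cooc             # p(context | w)
        q = unigram[context] / n_all   # p(context)
        kl += p * math.log(p / q)
    return kl


unigram = {"the": 50, "cat": 30, "sat": 20}  # hypothetical corpus counts
print(information_gain({"the": 5, "cat": 3, "sat": 2}, unigram))  # 0.0: same as the corpus
print(information_gain({"sat": 10}, unigram))                     # log(5), about 1.609
```

A word whose contexts look like the corpus as a whole carries no information gain, while a word concentrated on a few contexts carries a lot.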

7. Improving word mover's distance by leveraging self-attention matrix

Author: Yamagiwa, Hiroaki, Yokoi, Sho, and Shimodaira, Hidetoshi
Subjects: Computer Science - Computation and Language
Abstract: Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences. Experiments demonstrate that the proposed method enhances WMD and its variants in paraphrase identification with near-equivalent performance in semantic textual similarity. Our code is available at https://github.com/ymgw55/WSMD., Comment: 24 pages, accepted to EMNLP 2023 Findings
Published: 2022

8. Subspace Representations for Soft Set Operations and Sentence Similarities

Author: Ishibashi, Yoichi, Yokoi, Sho, Sudoh, Katsuhito, and Nakamura, Satoshi
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words. Yet, when it comes to the representations of sets of words, the conventional vector-based approaches often struggle with expressiveness and lack the essential set operations such as union, intersection, and complement. Inspired by quantum logic, we realize the representation of word sets and corresponding set operations within pre-trained word embedding spaces. By grounding our approach in the linear subspaces, we enable efficient computation of various set operations and facilitate the soft computation of membership functions within continuous spaces. Moreover, we allow for the computation of the F-score directly within word vectors, thereby establishing a direct link to the assessment of sentence similarity. In experiments with widely-used pre-trained embeddings and benchmarks, we show that our subspace-based set operations consistently outperform vector-based ones in both sentence similarity and set retrieval tasks., Comment: Accepted at NAACL 2024
Published: 2022
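
A sketch of the subspace idea, assuming that a word set is represented by the span of its word vectors, that OR is the sum of two subspaces and NOT the orthogonal complement (the standard quantum-logic operations), and that soft membership is the squared cosine between a word vector and its projection onto the subspace. These definitions and the toy 3-D vectors are our assumptions and may differ from the paper's exact formulation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    # Gram-Schmidt: orthonormal basis of span(vectors); dependent vectors are dropped
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        length = math.sqrt(dot(w, w))
        if length > tol:
            basis.append([wi / length for wi in w])
    return basis


def membership(word, basis):
    # Soft membership: squared cosine between the word and its projection onto the subspace
    return sum(dot(word, b) ** 2 for b in basis) / dot(word, word)


def union_basis(set_a, set_b):
    # Quantum-logic OR: span of both sets' vectors (sum of the two subspaces)
    return orthonormal_basis(set_a + set_b)


def complement_membership(word, basis):
    # Quantum-logic NOT: membership in the orthogonal complement
    return 1.0 - membership(word, basis)


animals = [[1.0, 0.0, 0.0]]  # hypothetical word vectors for one set
plants = [[0.0, 1.0, 0.0]]   # hypothetical word vectors for another set
word = [1.0, 1.0, 0.0]
print(membership(word, orthonormal_basis(animals)),
      membership(word, union_basis(animals, plants)),
      complement_membership(word, union_basis(animals, plants)))  # 0.5 1.0 0.0
```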

9. Instance-Based Neural Dependency Parsing

Author: Ouchi, Hiroki, Suzuki, Jun, Kobayashi, Sosuke, Yokoi, Sho, Kuribayashi, Tatsuki, Yoshikawa, Masashi, and Inui, Kentaro
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Interpretable rationales for model predictions are crucial in practical applications. We develop neural models that possess an interpretable inference process for dependency parsing. Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set. The training edges are explicitly used for the predictions; thus, it is easy to grasp the contribution of each edge to the predictions. Our experiments show that our instance-based models achieve accuracy competitive with standard neural models and that their instance-based explanations are reasonably plausible., Comment: 15 pages, accepted to TACL 2021
Published: 2021

10. Incorporating Residual and Normalization Layers into Analysis of Masked Language Models

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: The Transformer architecture has become ubiquitous in the natural language processing field. To interpret Transformer-based models, their attention patterns have been extensively analyzed. However, the Transformer architecture is not composed only of multi-head attention; other components can also contribute to Transformers' strong performance. In this study, we extended the scope of the analysis of Transformers from solely the attention patterns to the whole attention block, i.e., multi-head attention, residual connection, and layer normalization. Our analysis of Transformer-based masked language models shows that the token-to-token interaction performed via attention has less impact on the intermediate representations than previously assumed. These results provide new intuitive explanations of existing reports; for example, discarding the learned attention patterns tends not to adversely affect performance. The code for our experiments is publicly available., Comment: 22 pages, accepted to EMNLP 2021 main conference
Published: 2021

11. Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings

Author: Naito, Masahiro, Yokoi, Sho, Kim, Geewook, and Shimodaira, Hidetoshi
Subjects: Computer Science - Computation and Language, 68T50
Abstract: It is well known that typical word embedding methods such as Word2Vec and GloVe have the property that meaning can be composed by adding up embeddings (additive compositionality). Several theories have been proposed to explain additive compositionality, but the following issues remain unresolved: (Q1) the assumptions of those theories do not hold for practical word embeddings; (Q2) ordinary additive compositionality can be seen as an AND operation on word meanings, but it is not well understood how other operations, such as OR and NOT, can be computed with the embeddings. We address these issues with frequency-weighted centering as the core idea. This paper proposes a post-processing method that bridges the gap between practical word embeddings and the assumptions of theories about additive compositionality, as an answer to (Q1). It also gives a method for taking the OR or NOT of meanings by linear operations on word embeddings, as an answer to (Q2). Moreover, we confirm experimentally that the accuracy of the AND operation, i.e., ordinary additive compositionality, can be improved by our post-processing method (3.5x improvement in top-100 accuracy) and that OR and NOT operations can be performed correctly., Comment: 13 pages; v1: accepted at ACL-IJCNLP 2021 Student Research Workshop; v2: minor revision
Published: 2021

12. Computationally Efficient Wasserstein Loss for Structured Labels

Author: Toyokuni, Ayato, Yokoi, Sho, Kashima, Hisashi, and Yamada, Makoto
Subjects: Computer Science - Machine Learning
Abstract: The problem of estimating the probability distribution of labels has been widely studied as a label distribution learning (LDL) problem, whose applications include age estimation, emotion analysis, and semantic segmentation. We propose a tree-Wasserstein distance regularized LDL algorithm, focusing on hierarchical text classification tasks. We propose predicting the entire label hierarchy using neural networks, where the similarity between predicted and true labels is measured using the tree-Wasserstein distance. Through experiments using synthetic and real-world datasets, we demonstrate that the proposed method successfully considers the structure of labels during training, and it compares favorably with the Sinkhorn algorithm in terms of computation time and memory usage.
Published: 2021
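
On a tree, the Wasserstein distance has a closed form: a sum over edges of the edge length times the absolute difference between the two distributions' masses in the subtree below that edge, computable in one bottom-up pass. The sketch below shows that closed form on a hypothetical five-node label hierarchy; it is not the paper's training loss, which also involves a neural predictor and its gradients.

```python
def tree_wasserstein(parent, edge_weight, mu, nu):
    """Tree-Wasserstein distance between two distributions on the nodes of a tree.

    parent[v]: parent of node v; node 0 is the root and parent[v] < v for all v > 0.
    edge_weight[v]: length of the edge between v and parent[v] (unused for the root).
    mu, nu: probability masses on the nodes.
    """
    # diff[v] accumulates mu(subtree(v)) - nu(subtree(v)) bottom-up
    diff = [m - q for m, q in zip(mu, nu)]
    total = 0.0
    for v in range(len(parent) - 1, 0, -1):  # children are visited before parents
        total += edge_weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


# Toy label hierarchy: root 0 -> {1, 2}; node 1 -> {3, 4}; all edges have length 1
parent = [-1, 0, 0, 1, 1]
edge_weight = [0.0, 1.0, 1.0, 1.0, 1.0]
print(tree_wasserstein(parent, edge_weight, [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]))  # 2.0 (siblings)
print(tree_wasserstein(parent, edge_weight, [0, 0, 0, 1, 0], [0, 0, 1, 0, 0]))  # 3.0 (via the root)
```

Mislabeling a node as its sibling costs less than mislabeling it as a distant label, which is how such a loss takes the label structure into account; the single linear pass is what makes it cheap compared with iterative solvers such as Sinkhorn.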

13. Efficient Estimation of Influence of a Training Instance

Author: Kobayashi, Sosuke, Yokoi, Sho, Suzuki, Jun, and Inui, Kentaro
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding the influence of a training instance on a neural network model helps improve interpretability. However, evaluating the influence, i.e., how a model's prediction would change if a training instance were not used, is difficult and computationally expensive. In this paper, we propose an efficient method for estimating the influence. Our method is inspired by dropout, which zero-masks a sub-network and prevents the sub-network from learning each training instance. By switching between dropout masks, we can use sub-networks that learned or did not learn each training instance and estimate its influence. Through experiments with BERT and VGGNet on classification datasets, we demonstrate that the proposed method can capture training influences, enhance the interpretability of erroneous predictions, and cleanse the training dataset to improve generalization., Comment: This is an extended version of the paper presented at SustaiNLP 2020
Published: 2020

14. Modeling Event Salience in Narratives via Barthes' Cardinal Functions

Author: Otake, Takaki, Yokoi, Sho, Inoue, Naoya, Takahashi, Ryo, Kuribayashi, Tatsuki, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes' definition of event salience and propose several unsupervised methods that require only a pre-trained language model. Evaluating the proposed methods on folktales with event salience annotation, we show that the proposed methods outperform baseline methods and find that fine-tuning a language model on narrative texts is a key factor in improving the proposed methods., Comment: accepted to COLING 2020
Published: 2020

15. Evaluation of Similarity-based Explanations

Author: Hanawa, Kazuaki, Yokoi, Sho, Hara, Satoshi, and Inui, Kentaro
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Explaining the predictions made by complex machine learning models helps users understand and accept the predicted outputs with confidence. One promising way is to use similarity-based explanation, which provides similar instances as evidence to support model predictions. Several relevance metrics are used for this purpose. In this study, we investigated relevance metrics that can provide reasonable explanations to users. Specifically, we adopted three tests to evaluate whether the relevance metrics satisfy the minimal requirements for similarity-based explanation. Our experiments revealed that the cosine similarity of the gradients of the loss performs best, making it a recommended choice in practice. In addition, we showed that some metrics perform poorly in our tests and analyzed the reasons for their failure. We expect our insights to help practitioners select appropriate relevance metrics and to aid further research on designing better relevance metrics for explanations., Comment: ICLR 2021
Published: 2020
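
The best-performing relevance metric named above, the cosine similarity between loss gradients, can be sketched with a toy logistic-regression model: compute the loss gradient at the test input and at each training instance, then rank training instances by cosine similarity. The model, weights, and data are hypothetical (the paper evaluated much larger models), and using the model's predicted label for the test input is an assumption on our part.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient(weights, x, y):
    # Gradient of the logistic (binary cross-entropy) loss w.r.t. the weights
    err = sigmoid(dot(weights, x)) - y
    return [err * xi for xi in x]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def rank_training_instances(weights, test_x, train_set):
    # Explain the model's *predicted* label for the test input (our assumption)
    predicted = 1 if sigmoid(dot(weights, test_x)) >= 0.5 else 0
    g_test = loss_gradient(weights, test_x, predicted)
    scores = [cosine(g_test, loss_gradient(weights, x, y)) for x, y in train_set]
    # Most relevant training instances first
    return sorted(range(len(train_set)), key=lambda i: scores[i], reverse=True)


weights = [0.5, -0.5]  # hypothetical trained weights
train_set = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.1], 1)]
print(rank_training_instances(weights, [1.0, 0.0], train_set))  # [0, 2, 1]
```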

16. Word Rotator's Distance

Author: Yokoi, Sho, Takahashi, Ryo, Akama, Reina, Suzuki, Jun, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on, and demonstrate, the fact that the norm of a word vector is a good proxy for word importance and its angle is a good proxy for word similarity. Alignment-based approaches do not distinguish the two, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover's distance (i.e., optimal transport cost), which we refer to as word rotator's distance. In addition, we show how to grow the norm and direction of word vectors (vector converter), a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. The source code is available at https://github.com/eumesy/wrd, Comment: 17 pages, accepted at EMNLP 2020
Published: 2020
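
A sketch of the norm/direction decoupling described above: norms become the transported mass, and the cosine distance between directions becomes the transport cost. Assumptions: toy 2-D vectors, and entropy-regularized Sinkhorn iterations swapped in for the exact earth mover's distance the abstract names (the result approaches the exact cost as `eps` shrinks). This is not the code from the linked repository.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sinkhorn_plan(a, b, cost, eps=0.05, iters=500):
    # Entropy-regularized OT (Sinkhorn); approximates the exact EMD plan as eps -> 0
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]


def word_rotators_distance(sent_a, sent_b, eps=0.05):
    # Norm -> word importance (transported mass); direction -> word similarity (cost)
    na, nb = [norm(x) for x in sent_a], [norm(y) for y in sent_b]
    a = [n / sum(na) for n in na]
    b = [n / sum(nb) for n in nb]
    cost = [[1.0 - sum(p * q for p, q in zip(x, y)) / (nx * ny)
             for y, ny in zip(sent_b, nb)]
            for x, nx in zip(sent_a, na)]
    plan = sinkhorn_plan(a, b, cost, eps)
    return sum(plan[i][j] * cost[i][j] for i in range(len(a)) for j in range(len(b)))


A = [[1.0, 0.0], [0.0, 2.0]]    # hypothetical word vectors of sentence A
C = [[-1.0, 0.0], [0.0, -1.0]]  # words pointing in opposite directions
print(word_rotators_distance(A, A) < 1e-6, word_rotators_distance(A, C) > 0.99)  # True True
```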

17. Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition

Author: Ouchi, Hiroki, Suzuki, Jun, Kobayashi, Sosuke, Yokoi, Sho, Kuribayashi, Tatsuki, Konno, Ryuto, and Inui, Kentaro
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Interpretable rationales for model predictions play a critical role in practical applications. In this study, we develop models possessing an interpretable inference process for structured prediction. Specifically, we present a method of instance-based learning that learns similarities between spans. At inference time, each span is assigned a class label based on its similar spans in the training set, making it easy to understand how much each training instance contributes to the predictions. Through empirical analysis on named entity recognition, we demonstrate that our method enables us to build models that have high interpretability without sacrificing performance., Comment: Accepted by ACL 2020
Published: 2020

18. Filtering Noisy Dialogue Corpora by Connectivity and Content Relatedness

Author: Akama, Reina, Yokoi, Sho, Suzuki, Jun, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: Large-scale dialogue datasets have recently become available for training neural dialogue agents. However, these datasets have been reported to contain a non-negligible number of unacceptable utterance pairs. In this paper, we propose a method for scoring the quality of utterance pairs in terms of their connectivity and relatedness. The proposed scoring method is designed based on findings widely shared in the dialogue and linguistics research communities. We demonstrate that it has a relatively good correlation with the human judgment of dialogue quality. Furthermore, the method is applied to filter out potentially unacceptable utterance pairs from a large-scale noisy dialogue corpus to ensure its quality. We experimentally confirm that training data filtered by the proposed method improves the quality of neural dialogue agents in response generation., Comment: 18 pages, Accepted at The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Published: 2020

19. Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: Attention is a key component of Transformers, which have recently achieved considerable success in natural language processing. Hence, attention is being extensively studied to investigate various linguistic capabilities of Transformers, focusing on analyzing the parallels between attention weights and specific linguistic phenomena. This paper shows that attention weights are only one of two factors that determine the output of attention and proposes a norm-based analysis that incorporates the second factor, the norm of the transformed input vectors. The findings of our norm-based analyses of BERT and a Transformer-based neural machine translation system include the following: (i) contrary to previous studies, BERT pays poor attention to special tokens, and (ii) reasonable word alignment can be extracted from the attention mechanisms of the Transformer. These findings provide insights into the inner workings of Transformers., Comment: 19 pages, accepted by EMNLP 2020
Published: 2020
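
The contrast between weight-based and norm-based analysis can be illustrated for a single query: the attention output is a sum of terms alpha_j f(x_j), so a token with a large weight alpha_j but a small transformed vector f(x_j) contributes little. The tokens and numbers are hypothetical and chosen only to show the effect; they are not measurements from BERT.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def attention_contributions(alphas, transformed):
    """Weight-based vs norm-based view of one query's attention output.

    alphas[j]: attention weight on token j
    transformed[j]: f(x_j), token j's vector after the value/output transformation
    """
    weight_based = list(alphas)
    norm_based = [a * norm(fx) for a, fx in zip(alphas, transformed)]  # ||alpha_j f(x_j)||
    return weight_based, norm_based


tokens = ["[CLS]", "cat", "[SEP]"]
alphas = [0.1, 0.2, 0.7]
transformed = [[1.0, 1.0], [3.0, 4.0], [0.1, 0.0]]
w, n = attention_contributions(alphas, transformed)
print(tokens[w.index(max(w))], tokens[n.index(max(n))])  # [SEP] cat
```

Here `[SEP]` receives the largest weight, yet `cat` makes the largest contribution to the output once the norm of f(x_j) is taken into account.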

20. Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

Author: Yokoi, Sho, Kobayashi, Sosuke, Fukumizu, Kenji, Suzuki, Jun, and Inui, Kentaro
Subjects: Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: In this paper, we propose a new kernel-based co-occurrence measure that can be applied to sparse linguistic expressions (e.g., sentences) with a very short learning time, as an alternative to pointwise mutual information (PMI). Just as PMI is derived from mutual information, we derive this new measure from the Hilbert–Schmidt independence criterion (HSIC); hence, we call the new measure the pointwise HSIC (PHSIC). PHSIC can be interpreted as a smoothed variant of PMI that allows various similarity metrics (e.g., sentence embeddings) to be plugged in as kernels. Moreover, PHSIC can be estimated by simple and fast matrix calculations (linear in the size of the data) regardless of whether we use linear or nonlinear kernels. Empirically, in a dialogue response selection task, PHSIC is learned thousands of times faster than an RNN-based PMI while outperforming PMI in accuracy. In addition, we demonstrate that PHSIC is beneficial as a criterion for a data selection task in machine translation, owing to its ability to give high (low) scores to a pair that is consistent (inconsistent) with other pairs., Comment: Accepted by EMNLP 2018
Published: 2018

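A sketch of PHSIC with linear kernels on explicit feature vectors, following our reading of the definition: the score is the centered feature of x, times the empirical cross-covariance matrix of the paired data, times the centered feature of y. Fitting is a single pass over the n pairs, in line with the linear-time claim in the abstract. The feature vectors are hypothetical; in the paper, kernels such as sentence-embedding similarities are plugged in.

```python
def center(vectors):
    # Column means and mean-centered copies of the vectors
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    return mean, [[x - m for x, m in zip(v, mean)] for v in vectors]


def fit_phsic(xs, ys):
    """PHSIC with linear kernels on explicit feature vectors.

    Returns score(x, y) = (phi(x) - mean_x)^T C_xy (psi(y) - mean_y), where C_xy is the
    empirical cross-covariance of the paired data; fitting is one pass over the n pairs.
    """
    n = len(xs)
    mean_x, cx = center(xs)
    mean_y, cy = center(ys)
    c_xy = [[sum(a[i] * b[j] for a, b in zip(cx, cy)) / n for j in range(len(mean_y))]
            for i in range(len(mean_x))]

    def score(x, y):
        dx = [v - m for v, m in zip(x, mean_x)]
        dy = [v - m for v, m in zip(y, mean_y)]
        return sum(dx[i] * c_xy[i][j] * dy[j] for i in range(len(dx)) for j in range(len(dy)))

    return score


# Hypothetical 2-D features for paired (utterance, response) data
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
phsic = fit_phsic(xs, ys)
print(phsic([1.0, 0.0], [1.0, 0.0]), phsic([1.0, 0.0], [0.0, 1.0]))  # 0.25 -0.25
```

A pair that follows the pattern of the training pairs scores high, and a pair that contradicts it scores low, which is the property the abstract exploits for data selection.
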
21. Unsupervised Learning of Style-sensitive Word Vectors

Author: Akama, Reina, Watanabe, Kento, Yokoi, Sho, Kobayashi, Sosuke, and Inui, Kentaro
Subjects: Computer Science - Computation and Language
Abstract: This paper presents the first study aimed at capturing stylistic similarity between words in an unsupervised manner. We propose extending the continuous bag of words (CBOW) model (Mikolov et al., 2013) to learn style-sensitive word vectors using a wider context window under the assumption that the style of all the words in an utterance is consistent. In addition, we introduce a novel task to predict lexical stylistic similarity and to create a benchmark dataset for this task. Our experiment with this dataset supports our assumption and demonstrates that the proposed extensions contribute to the acquisition of style-sensitive word embeddings., Comment: 7 pages, Accepted at The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018)
Published: 2018

22. Contrastive Learning-based Sentence Encoders Implicitly Weight Informative Words

Author: Kurita, Hiroto, Kobayashi, Goro, Yokoi, Sho, and Inui, Kentaro
Published: 2023

23. Norm of Word Embedding Encodes Information Gain

Author: Oyama, Momose, Yokoi, Sho, and Shimodaira, Hidetoshi
Published: 2023

24. Unbalanced Optimal Transport for Unbalanced Word Alignment

Author: Arase, Yuki, Bao, Han, and Yokoi, Sho
Published: 2023

25. Improving word mover’s distance by leveraging self-attention matrix

Author: Yamagiwa, Hiroaki, Yokoi, Sho, and Shimodaira, Hidetoshi
Published: 2023

26. Transformer Language Models Handle Word Frequency in Prediction Head

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Published: 2023

27. Subspace-based Set Operations on a Pre-trained Word Embedding Space

Author: Ishibashi, Yoichi, Yokoi, Sho, Sudoh, Katsuhito, and Nakamura, Satoshi
Abstract: Word embedding is a fundamental technology in natural language processing. It is often exploited for tasks using sets of words, although standard methods for representing word sets and set operations remain limited. If we can leverage the advantage of word embedding for such set operations, we can calculate sentence similarity and find words that effectively share a concept with a given word set in a straightforward way. In this study, we formulate representations of sets and set operations in a pre-trained word embedding space. Inspired by quantum logic, we propose a novel formulation of set operations using subspaces in a pre-trained word embedding space. Based on our definitions, we propose two metrics, based on the degree to which a word belongs to a set and on the similarity between two sets in the embedding space. Our experiments with Text Concept Set Retrieval and Semantic Textual Similarity tasks demonstrated the effectiveness of our proposed method.
Published: 2022

28. NLP 100 Exercise (言語処理100本ノック)

Author: Okazaki, Naoaki, Kiyono, Shun, Takahashi, Ryo, and Yokoi, Sho
Published: 2020

29. Impedance-Tuned Ultrasound Vibration Wireless Power Transfer System for Biomedical Devices

Author: Yokoi, Sho and Mishima, Tomokazu
Published: 2021

30. Computationally Efficient Wasserstein Loss for Structured Labels

Author: Toyokuni, Ayato, Yokoi, Sho, Kashima, Hisashi, and Yamada, Makoto
Published: 2021

31. Instance-Based Neural Dependency Parsing

Author: Ouchi, Hiroki, Suzuki, Jun, Kobayashi, Sosuke, Yokoi, Sho, Kuribayashi, Tatsuki, Yoshikawa, Masashi, and Inui, Kentaro
Published: 2021

32. NLP 100 Exercise

Author: Okazaki, Naoaki, Kiyono, Shun, Takahashi, Ryo, and Yokoi, Sho
Published: 2020

33. Efficient Estimation of Influence of a Training Instance

Author: Kobayashi, Sosuke, Yokoi, Sho, Suzuki, Jun, and Inui, Kentaro
Published: 2020

34. Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition

Author: Ouchi, Hiroki, Suzuki, Jun, Kobayashi, Sosuke, Yokoi, Sho, Kuribayashi, Tatsuki, Konno, Ryuto, and Inui, Kentaro
Published: 2020

35. Attention is Not Only a Weight: Analyzing Transformers with Vector Norms

Author: Kobayashi, Goro, Kuribayashi, Tatsuki, Yokoi, Sho, and Inui, Kentaro
Published: 2020

36. Word Rotator’s Distance

Author: Yokoi, Sho, Takahashi, Ryo, Akama, Reina, Suzuki, Jun, and Inui, Kentaro
Published: 2020

37. Modeling Event Salience in Narratives via Barthes’ Cardinal Functions

Author: Otake, Takaki, Yokoi, Sho, Inoue, Naoya, Takahashi, Ryo, Kuribayashi, Tatsuki, and Inui, Kentaro
Published: 2020

38. Filtering Noisy Dialogue Corpora by Connectivity and Content Relatedness

Author: Akama, Reina, Yokoi, Sho, Suzuki, Jun, and Inui, Kentaro
Published: 2020

39. Pointwise HSIC: A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

Author: Yokoi, Sho, Kobayashi, Sosuke, Fukumizu, Kenji, Suzuki, Jun, and Inui, Kentaro
Published: 2018

40. Unsupervised Learning of Style-sensitive Word Vectors

Author: Akama, Reina, Watanabe, Kento, Yokoi, Sho, Kobayashi, Sosuke, and Inui, Kentaro
Published: 2018

41. Learning Co-Substructures by Kernel Dependence Maximization

Author: Yokoi, Sho, Mochihashi, Daichi, Takahashi, Ryo, Okazaki, Naoaki, and Inui, Kentaro
Published: 2017

42. Link Prediction in Sparse Networks by Incidence Matrix Factorization

Author: Yokoi, Sho, Kajino, Hiroshi, and Kashima, Hisashi
Published: 2017
