Author: "Baykara, Batuhan" / Topic: turkish language - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Baykara, Batuhan"' showing total 2 results

1. Turkish abstractive text summarization using pretrained sequence-to-sequence models.

Author: Baykara, Batuhan and Güngör, Tunga
Subjects: TEXT summarization, LANGUAGE models, TURKISH language, ENGLISH language
Abstract: The tremendous increase in the number of documents available on the Web has turned finding the relevant piece of information into a challenging, tedious, and time-consuming activity. Accordingly, automatic text summarization has become an important field of study, gaining significant attention from researchers. Lately, with the advances in deep learning, neural abstractive text summarization with sequence-to-sequence (Seq2Seq) models has gained popularity. There have been many improvements in these models such as the use of pretrained language models (e.g., GPT, BERT, and XLM) and pretrained Seq2Seq models (e.g., BART and T5). These improvements have addressed certain shortcomings in neural summarization and have improved upon challenges such as saliency, fluency, and semantics which enable generating higher quality summaries. Unfortunately, these research attempts were mostly limited to the English language. Monolingual BERT models and multilingual pretrained Seq2Seq models have been released recently providing the opportunity to utilize such state-of-the-art models in low-resource languages such as Turkish. In this study, we make use of pretrained Seq2Seq models and obtain state-of-the-art results on the two large-scale Turkish datasets, TR-News and MLSum, for the text summarization task. Then, we utilize the title information in the datasets and establish hard baselines for the title generation task on both datasets. We show that the input to the models has a substantial amount of importance for the success of such tasks. Additionally, we provide extensive analysis of the models including cross-dataset evaluations, various text generation options, and the effect of preprocessing in ROUGE evaluations for Turkish. It is shown that the monolingual BERT models outperform the multilingual BERT models on all tasks across all the datasets. Lastly, qualitative evaluations of the generated summaries and titles of the models are provided. [ABSTRACT FROM AUTHOR]
Published: 2023

2. Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian.

Author: Baykara, Batuhan and Güngör, Tunga
Subjects: *TURKISH language, *INFORMATION needs, *TEXT summarization, *DEEP learning, *HUNGARIAN language
Abstract: Due to the exponential growth in the number of documents on the Web, accessing the salient information relevant to a user need is gaining importance, which increases the popularity of text summarization. Recent progress in deep learning shifted the research in text summarization from extractive methods towards more abstractive approaches. The research and the available resources remain mostly limited to the English language, which prevents progress in other languages. There is a need in low-resource languages for gathering large-scale resources suitable for such tasks. In this study, we release two large-scale datasets (TR-News and HU-News) that can serve as benchmarks in the abstractive summarization task for Turkish and Hungarian. The datasets are primarily compiled for text summarization, but are also suitable for other tasks such as topic classification, title generation, and key phrase extraction. Morphology is important for these agglutinative languages since meaning is carried mostly within the morphemes of the words. We utilize these morphological properties for tokenization to retain the semantic information and reduce the vocabulary sparsity introduced by the agglutinative nature of these languages. Using the datasets compiled, we propose linguistically-oriented tokenization methods (SeparateSuffix and CombinedSuffix) and evaluate them on the state-of-the-art abstractive summarization models. The SeparateSuffix method achieves the highest ROUGE-1 score on the TR-News dataset and provides promising results on the HU-News dataset. In another experiment, we show that the multilingual cased BERT model outperforms monolingual BERT models for both languages and reaches the highest ROUGE-1 score on the HU-News dataset. Lastly, we provide qualitative analysis of the generated summaries on the TR-News dataset. [ABSTRACT FROM AUTHOR]
Published: 2022
