7 results on '"Fuad Rahman"'
Search Results
2. A Large Multi-target Dataset of Common Bengali Handwritten Graphemes
- Author
- Ahmed Imtiaz Humayun, Tahsin Reasat, Asif Shahriyar Sushmit, Fuad Rahman, Sadi Mohammad Siddiquee, Mahady Hasan, and Samiul Alam
- Subjects
- business.industry, Computer science, Deep learning, Grapheme, Contrast (statistics), Context (language use), Optical character recognition, Word formation, computer.software_genre, language.human_language, Bengali, language, Segmentation, Artificial intelligence, business, computer, Natural language processing
- Abstract
- Latin has historically led the state-of-the-art in handwritten optical character recognition (OCR) research. Adapting existing systems from Latin to alpha-syllabary languages is particularly challenging due to a sharp contrast between their orthographies. Due to a cursive writing system and frequent use of diacritics, the segmentation and/or alignment of graphical constituents with corresponding characters becomes significantly convoluted. We propose a labeling scheme based on graphemes (linguistic segments of word formation) that makes segmentation inside alpha-syllabary words linear and present the first dataset of Bengali handwritten graphemes that are commonly used in everyday context. The dataset contains 411k curated samples of 1,295 unique commonly used Bengali graphemes. Additionally, the test set contains 900 uncommon Bengali graphemes for out of dictionary performance evaluation. The dataset is open-sourced as a part of a public Handwritten Grapheme Classification Challenge on Kaggle to benchmark vision algorithms for multi-target grapheme classification. The unique graphemes present in this dataset are selected based on commonality in the Google Bengali ASR corpus. From competition proceedings, we see that deep learning methods can generalize to a large span of out of dictionary graphemes which are absent during training (Kaggle Competition kaggle.com/c/bengaliai-cv19, Supplementary materials and Appendix https://github.com/AhmedImtiazPrio/ICDAR2021supplementary).
- Published
- 2021
3. Establishing a Formal Benchmarking Process for Sentiment Analysis for the Bangla Language
- Author
- Aminul Islam, Fuad Rahman, and Akm Shahariar Azad Rabby
- Subjects
- 0303 health sciences, Computer science, Process (engineering), business.industry, 020209 energy, Sentiment analysis, 02 engineering and technology, Benchmarking, Plan (drawing), computer.software_genre, language.human_language, Task (project management), 03 medical and health sciences, Annotation, Bengali, 0202 electrical engineering, electronic engineering, information engineering, Benchmark (computing), language, Artificial intelligence, business, computer, Natural language processing, 030304 developmental biology
- Abstract
- Tracking sentiments is a critical task in many natural language processing applications. A lot of work has been done on many leading languages in the world, such as English. However, in many languages such as Bangla, sentiment analysis is still in early development. Most of the research on this topic suffers from three key issues: (a) the lack of standardized publicly available datasets, (b) the subjectivity of the reported results, which generally manifests as a lack of agreement on core sentiment categorizations, and finally, (c) the lack of an established framework where these efforts can be compared to a formal benchmark. Thus, this seems to be an opportune moment to establish a benchmark for sentiment analysis in Bangla. With that goal in mind, this paper presents benchmark results of ten different sentiment analysis solutions on three publicly available Bangla sentiment analysis corpora. As part of the benchmarking process, we have optimized these algorithms for the task at hand. Finally, we establish and present sixteen different evaluation metrics for benchmarking these algorithms. We hope that this paper will jumpstart an open and transparent benchmarking process, one that we plan to update every two years, to help validate newer and novel algorithms that will be reported in this area in the future.
- Published
- 2020
4. Okkhor: A Synthetic Corpus of Bangla Printed Characters
- Author
- Jebun Nahar, Nazmul Hasan, Fuad Rahman, Mridul Banik, and Jamiur Rahman Rifat
- Subjects
- 0303 health sciences, Alphanumeric, Computer science, business.industry, First language, Optical character recognition, computer.software_genre, ASCII, Unicode, language.human_language, 03 medical and health sciences, 0302 clinical medicine, Bengali, Vowel, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, language, Artificial intelligence, business, computer, 030217 neurology & neurosurgery, Digitization, Natural language processing, 030304 developmental biology
- Abstract
- Bangla is the fifth most-spoken native language in the world. Despite having such a large number of speakers, the resources related to development of language processing solutions are very limited. To realize the full potential of Machine Learning (ML) and Artificial Intelligence (AI) solutions for computer vision and Natural Language Processing (NLP), a complete and standardized fully-annotated corpus is an essential prerequisite. Specifically, development of Optical Character Recognition systems (OCRs) for printed characters, an important resource for language automation and digitization, requires a large corpus with high coverage and variability of fonts, representing the nuances of the language usage, which does not exist for Bangla. In this paper, we present a novel synthetic corpus of over 5 million printed Bangla characters containing 60 alphanumeric characters, 10 vowel modifiers, and 159 compound characters, which corresponds to 229 different classes of both Unicode and ASCII encodings. This is entirely novel work, since there exists no such corpus currently for the Bangla language.
- Published
- 2020
5. Bangla Part of Speech Tagging Using Contextual Embeddings and Oversampling Techniques
- Author
- Akm Shahariar Azad Rabby, Jebun Nahar, K. M. Faizullah Fuhad, Nazmul Hasan, Nabeel Mohammed, Hasan, Fuad Rahman, and Koushik Roy
- Subjects
- Recurrent neural network, Bengali, Artificial neural network, Computer science, Part-of-speech tagging, Speech recognition, language, Oversampling, Leverage (statistics), Part of speech, language.human_language, Sequential modeling
- Abstract
- Part of Speech (PoS) Tagging has been a customary research area in the field of Natural Language Processing. The popularization of Neural Networks has opened substantially more scope of research for Bangla PoS Tagging, especially with the class of sequential models, particularly using Recurrent Neural Networks like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). Our contribution in this paper is that we transformed the overall sequential modeling problem to a non-sequential model using BERT embeddings to leverage the existing well-understood oversampling algorithms for improving PoS Tagging using a shallow feed-forward Neural Network. Our experimental results indicate that Synthetic Minority Over-sampling Technique (SMOTE) works well as an oversampling algorithm for BERT embeddings.
- Published
- 2020
6. Borno: Bangla Handwritten Character Recognition Using a Multiclass Convolutional Neural Network
- Author
- Jebun Nahar, Fuad Rahman, Md. Majedul Islam, Nazmul Hasan, and Akm Shahariar Azad Rabby
- Subjects
- Computer science, business.industry, 020209 energy, Speech recognition, Deep learning, Grapheme, 02 engineering and technology, Optical character recognition, computer.software_genre, Convolutional neural network, language.human_language, Numeral system, Bengali, Handwriting recognition, Pattern recognition (psychology), 0202 electrical engineering, electronic engineering, information engineering, language, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer
- Abstract
- Handwriting recognition is still not a solved problem. With the advancements in artificial intelligence and machine learning, the construction of Optical Character Recognition systems (OCRs) has become more effective. However, there are still no serious commercially available OCRs for many low-resource languages, such as Bangla. Bangla presents additional challenges, since oftentimes, the vowels and consonants in the middle of the words are abbreviated and replaced with notations called diacritics, and multiple letters can be combined to build shorthand representations, called compound characters. Furthermore, the compound characters can have diacritics as well, making the recognition task extremely complex. This means that a successful commercial OCR should not only model individual characters but also model these diacritics and combined characters, leading us to propose a grapheme-based holistic recognition approach. Borno is the first multiclass convolutional neural network-based deep learning model that can recognize Bangla handwritten characters with graphemes. The proposed model has been trained on a dataset of 1,069,132 images, with 50 basic characters, 10 numerals, 146 compound characters, 10 modifiers, and 6 consonant diacritics classes. The trained Borno model achieves a 92.61% average character recognition accuracy in the validation set.
- Published
- 2020
7. Federated Learning Approach to Support Biopharma and Healthcare Collaboration to Accelerate Crisis Response
- Author
- Abrar Rahman, Fuad Rahman, and Arijit Mitra
- Subjects
- 0303 health sciences, Coronavirus disease 2019 (COVID-19), Status quo, Computer science, business.industry, Process (engineering), media_common.quotation_subject, Principal (computer security), Crisis response, Unstructured data, 02 engineering and technology, Data science, Federated learning, 03 medical and health sciences, Health care, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, business, 030304 developmental biology, media_common
- Abstract
- During a pandemic, such as COVID-19, the scientific community must optimize collaboration, as part of the race against time to identify and repurpose existing treatments. Today, Artificial Intelligence (AI) offers us a significant opportunity to generate insights and provide predictive models that could substantially improve the opportunities for understanding the core metrics that characterize the epidemic. A principal barrier for effective AI models in a collaborative environment, especially in the medical and pharmaceutical industries, is dealing with datasets that are distributed across multiple organizations, as traditional AI models rely on the datasets being in one location. In the status quo, organizations must slog through a costly and time-consuming extract-transform-load process to build a dataset in a single location. This paper addresses how Federated Learning may be applied to facilitate flexible AI models that have been trained on biopharma and clinical unstructured data, with a special focus on extracting actionable intelligence from existing research and communications via Natural Language Processing (NLP).
- Published
- 2020