Author: "Abbas, Amro" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword '"Abbas, Amro"' returned 17 results.


1. DataComp-LM: In search of the next generation of training sets for language models

Author: Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, and Shankar, Vaishaal
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% and 66%, respectively), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation., Comment: Project page: https://www.datacomp.ai/dclm/
Published: 2024

2. A comparison between humans and AI at recognizing objects in unusual poses

Author: Ollikka, Netta, Abbas, Amro, Perin, Andrea, Kilpeläinen, Markku, and Deny, Stéphane
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Deep learning is closing the gap with human vision on several object recognition benchmarks. Here we investigate this gap for challenging images where objects are seen in unusual poses. We find that humans excel at recognizing objects in such poses. In contrast, state-of-the-art deep networks for vision (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) and state-of-the-art large vision-language models (Claude 3.5, Gemini 1.5, GPT-4) are systematically brittle on unusual poses, with the exception of Gemini, which shows excellent robustness in that condition. As we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) are necessary to identify objects in unusual poses. An analysis of error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. In conclusion, our comparison reveals that humans and deep networks rely on different mechanisms for recognizing objects in unusual poses. Understanding the nature of the mental processes taking place during extra viewing time may be key to reproducing the robustness of human vision in silico.
Published: 2024

3. Effective pruning of web-scale datasets based on complexity of concept clusters

Author: Abbas, Amro, Rusak, Evgenia, Tirumala, Kushal, Brendel, Wieland, Chaudhuri, Kamalika, and Morcos, Ari S.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Utilizing massive web-scale datasets has led to unprecedented performance gains in machine learning models, but also imposes outlandish compute requirements for their training. In order to improve training and data efficiency, we here push the limits of pruning large-scale multimodal datasets for training CLIP-style models. Today's most effective pruning method on ImageNet clusters data samples into separate concepts according to their embedding and prunes away the most prototypical samples. We scale this approach to LAION and improve it by noting that the pruning rate should be concept-specific and adapted to the complexity of the concept. Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training. By filtering from the LAION dataset, we find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs. More specifically, we are able to outperform the LAION-trained OpenCLIP-ViT-B32 model on ImageNet zero-shot accuracy by 1.1 p.p. while only using 27.7% of the data and training compute. Despite a strong reduction in training cost, we also see improvements on ImageNet dist. shifts, retrieval tasks and VTAB. On the DataComp Medium benchmark, we achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks., Comment: Accepted at ICLR 2024, code available at https://github.com/amro-kamal/effective_pruning
Published: 2024
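The abstract above describes pruning that groups samples into concepts by embedding, discards the most prototypical samples, and sets a concept-specific pruning rate according to concept complexity. The Python sketch below illustrates only that general idea, under stated assumptions: cluster labels and per-concept complexity scores are taken as precomputed inputs (the paper's actual complexity measure is not reproduced), "prototypical" is approximated as Euclidean closeness to the cluster centroid, and the function name `concept_aware_prune` is hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def concept_aware_prune(
    embeddings: List[Sequence[float]],
    labels: List[int],
    complexity: Dict[int, float],
    keep_fraction: float,
) -> List[int]:
    """Return sorted indices of samples to keep (hypothetical helper).

    Assumes `labels` (cluster ids) and `complexity` (one score per cluster)
    were computed elsewhere; this is not the paper's exact procedure.
    """
    # Group sample indices by cluster ("concept").
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    budget = keep_fraction * len(embeddings)
    total_complexity = sum(complexity[c] for c in clusters)

    kept: List[int] = []
    for c, members in clusters.items():
        # Concept-specific quota: more complex concepts keep more samples.
        quota = min(len(members), round(budget * complexity[c] / total_complexity))
        dim = len(embeddings[members[0]])
        centroid = [sum(embeddings[i][d] for i in members) / len(members) for d in range(dim)]
        # Drop the most prototypical samples by keeping those farthest from the centroid.
        ranked = sorted(members, key=lambda i: euclidean(embeddings[i], centroid), reverse=True)
        kept.extend(ranked[:quota])
    return sorted(kept)


if __name__ == "__main__":
    # Toy 2-D embeddings: concept 0 is treated as more complex than concept 1.
    emb = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [0.05, 0.0],
           [10.0, 10.0], [10.0, 12.0], [10.0, 10.5]]
    labs = [0, 0, 0, 0, 1, 1, 1]
    print(concept_aware_prune(emb, labs, {0: 3.0, 1: 1.0}, keep_fraction=0.5))  # [0, 2, 3, 5]
```

Because per-concept quotas are rounded, the number of kept samples only approximates `keep_fraction` (4 of 7 in the toy call rather than 3.5). The sketch also assumes at least one concept has a nonzero complexity score.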

4. Sieve: Multimodal Dataset Pruning Using Image Captioning Models

Author: Mahmoud, Anas, Elhoushi, Mostafa, Abbas, Amro, Yang, Yu, Ardalani, Newsha, Leather, Hugh, and Morcos, Ari
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of these datasets is strongly correlated with the performance of VLMs on downstream tasks. Using CLIPScore from a pretrained model to train only on highly aligned samples is one of the most successful methods for pruning. We argue that this approach suffers from multiple limitations, including false positives and negatives due to CLIP's pretraining on noisy labels. We propose a pruning signal, Sieve, that employs synthetic captions generated by image-captioning models pretrained on small, diverse, and well-aligned image-text pairs to evaluate the alignment of noisy image-text pairs. To bridge the gap between the limited diversity of generated captions and the high diversity of alternative text (alt-text), we estimate the semantic textual similarity in the embedding space of a language model pretrained on an unlabeled text corpus. Using DataComp, a multimodal dataset filtering benchmark, and evaluating on 38 downstream tasks, our pruning approach surpasses CLIPScore by 2.6% and 1.7% at medium and large scale, respectively. In addition, on retrieval tasks, Sieve leads to a significant improvement of 2.7% and 4.5% at medium and large scale, respectively., Comment: Accepted in CVPR 2024
Published: 2023

5. SemDeDup: Data-efficient learning at web-scale through semantic deduplication

Author: Abbas, Amro, Tirumala, Kushal, Simig, Dániel, Ganguli, Surya, and Morcos, Ari S.
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Progress in machine learning has been driven in large part by massive increases in data. However, large web-scale datasets such as LAION are largely uncurated beyond searches for exact duplicates, potentially leaving much redundancy. Here, we introduce SemDeDup, a method which leverages embeddings from pre-trained models to identify and remove semantic duplicates: data pairs which are semantically similar, but not exactly identical. Removing semantic duplicates preserves performance and speeds up learning. Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time. Moreover, performance increases out of distribution. Also, analyzing language models trained on C4, a partially curated dataset, we show that SemDeDup improves over prior approaches while providing efficiency gains. SemDeDup provides an example of how simple ways of leveraging quality embeddings can be used to make models learn faster with less data.
Published: 2023
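The SemDeDup abstract describes removing semantic duplicates: pairs that are close in a pre-trained model's embedding space but not exactly identical. The Python sketch below illustrates that idea under simplifying assumptions: embeddings are supplied as plain lists of floats, similarity is cosine similarity, and the function name `semantic_dedup` and the default `threshold=0.95` are hypothetical. It greedily keeps an item only if it is not too similar to any item already kept; it is a quadratic-time illustration, not the paper's scalable procedure.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedup(embeddings: List[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of items to keep (hypothetical helper).

    An item is treated as a semantic duplicate, and dropped, if its cosine
    similarity to any previously kept item is >= threshold.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D "embeddings": items 0 and 1 point in almost the same direction.
    toy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
    print(semantic_dedup(toy))  # [0, 2]
```

Raising `threshold` removes only near-identical pairs; lowering it removes more loosely related items and therefore more data.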

6. Progress and limitations of deep networks to recognize objects in unusual poses

Author: Abbas, Amro and Deny, Stéphane
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep networks should be robust to rare events if they are to be successfully deployed in high-stakes real-world applications (e.g., self-driving cars). Here we study the capability of deep networks to recognize objects in unusual poses. We create a synthetic dataset of images of objects in unusual orientations, and evaluate the robustness of a collection of 38 recent and competitive deep networks for image classification. We show that classifying these images is still a challenge for all networks tested, with an average accuracy drop of 29.5% compared to when the objects are presented upright. This brittleness is largely unaffected by various network design choices, such as training losses (e.g., supervised vs. self-supervised), architectures (e.g., convolutional networks vs. transformers), dataset modalities (e.g., images vs. image-text pairs), and data-augmentation schemes. However, networks trained on very large datasets substantially outperform others, with the best network tested (Noisy Student EfficientNet-L2 trained on JFT-300M) showing a relatively small accuracy drop of only 14.5% on unusual poses. Nevertheless, a visual inspection of the failures of Noisy Student reveals a remaining gap in robustness with the human visual system. Furthermore, combining multiple object transformations (3D rotations and scaling) further degrades the performance of all networks. Altogether, our results provide another measurement of the robustness of deep networks that is important to consider when using them in the real world. Code and datasets are available at https://github.com/amro-kamal/ObjectPose.
Published: 2022

7. Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time

Author: Ollikka, Netta, Abbas, Amro, Perin, Andrea, Kilpeläinen, Markku, and Deny, Stéphane
Abstract: Deep learning is closing the gap with humans on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen from unusual viewpoints. We find that humans excel at recognizing objects in unusual poses, in contrast with state-of-the-art pretrained networks (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext), which are systematically brittle in this condition. Remarkably, as we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) take place when humans identify objects in unusual poses. Finally, our analysis of error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. We conclude that more work is needed to bring computer vision systems to the level of robustness of the human visual system. Understanding the nature of the mental processes taking place during extra viewing time may be key to attaining such robustness.
Published: 2024

8. Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time

Author: Ollikka, Netta, Abbas, Amro, Perin, Andrea, Kilpeläinen, Markku, and Deny, Stéphane
Published: 2024

9. Progress and Limitations of Deep Networks to Recognize Objects in Unusual Poses

Author: Abbas, Amro and Deny, Stéphane
Published: 2023

10. Pathogenesis of disease associated with chronic hepatitis C virus infection

Author: Abbas, Amro M.
Subjects: 610, Medicine
Published: 2001

11. Unique and common TCR repertoire features of Ni2+‐, Co2+‐, and Pd2+‐specific human CD154+ CD4+ T cells

Author: Riedel, Franziska, Aparicio‐Soto, Marina, Curato, Caterina, Münch, Lucas, Abbas, Amro, Thierse, Hermann‐Josef, Peitsch, Wiebke K., Luch, Andreas, and Siewert, Katherina
Published: 2022

12. Frequencies and TCR Repertoires of Human 2,4,6-Trinitrobenzenesulfonic Acid-specific T Cells

Author: Curato, Caterina, Aparicio-Soto, Marina, Riedel, Franziska, Wehl, Ingrun, Basaran, Alev, Abbas, Amro, Thierse, Hermann-Josef, Luch, Andreas, and Siewert, Katherina
Published: 2022

13. Unique and common TCR repertoire features of Ni2+‐, Co2+‐, and Pd2+‐specific human CD154+ CD4+ T cells

Author: Riedel, Franziska, Aparicio‐Soto, Marina, Curato, Caterina, Münch, Lucas, Abbas, Amro, Thierse, Hermann‐Josef, Peitsch, Wiebke K., Luch, Andreas, and Siewert, Katherina
Subjects: T cells, Mononuclear leukocytes, T cell receptors, CD4 antigen, Immunologic memory
Abstract: Background: In addition to Ni2+, Co2+ and Pd2+ ions also commonly trigger T cell‐mediated allergic contact dermatitis. However, in vitro frequencies of metal‐specific T cells and the mechanisms of antigen recognition remain unclear. Methods: Here, we used a CD154 upregulation assay to quantify Ni2+‐, Co2+‐, and Pd2+‐specific CD4+ T cells in peripheral blood mononuclear cells (PBMC). The involved αβ T cell receptor (TCR) repertoires were analyzed by high‐throughput sequencing. Results: Incubation of PBMC with NiSO4, CoCl2, and PdCl2 increased frequencies of CD154+ CD4+ memory T cells, which peaked at ~400 μM. Activation was TCR‐mediated, as shown by the metal‐specific restimulation of T cell clones. Most abundant were Pd2+‐specific T cells (mean 3.5%, n = 19), followed by Co2+‐ and Ni2+‐specific cells (0.6%, n = 18 and 0.3%, n = 20, respectively) in both allergic and non‐allergic individuals. A strong overrepresentation of the gene segment TRAV9‐2 was unique to Ni2+‐specific TCR (28% of TCR), while Co2+‐ and Pd2+‐specific TCR preferentially expressed TRAV2 (8%) and the TRBV4 gene segment family (21%), respectively. As a second, independent mechanism of metal ion recognition, all analyzed metal‐specific TCR showed a common overrepresentation of a histidine in the complementarity determining region 3 (CDR3; 15% of α‐chains, 34% of β‐chains). The positions of the CDR3 histidine among metal‐specific TCR mirrored those in random repertoires and were conserved among cross‐reactive clonotypes. Conclusions: Induced CD154 expression allows fast and comprehensive detection of Ni2+‐, Co2+‐, and Pd2+‐specific CD4+ T cells. Distinct TCR repertoire features underlie the frequent activation and cross‐reactivity of human metal‐specific T cells.
Published: 2023

14. Continuous cultivation of human hamstring tenocytes on microcarriers in a spinner flask bioreactor system

Author: Stich, Stefan, Ibold, Yvonne, Abbas, Amro, Ullah, Mujib, Sittinger, Michael, Ringe, Jochen, Schulze-Tanzil, Gundula, Müller, Christiane, Kohl, Benjamin, and John, Thilo
Published: 2014

15. Ethnicity affects the diagnostic validity of alpha-fetoprotein in hepatocellular carcinoma

Author: Gad, Amal, Tanaka, Eiji, Matsumoto, Akihiro, El-Hamid Serwah, Abd, Attia, Fawzy, Hassan, Adel, Sanny, Ahmed, Ali, Khalil, Abbas, Amro, El-Raoof El-Deeb, Abd, Sun, Xiao Hong, Umemura, Takeji, Ichijo, Tetsuya, Ehara, Takashi, Yoshizawa, Kaname, and Kiyosawa, Kendo
Published: 2005

16. Continuous cultivation of human hamstring tenocytes on microcarriers in a spinner flask bioreactor system

Author: Stich, Stefan, Ibold, Yvonne, Abbas, Amro, Ullah, Mujib, Sittinger, Michael, Ringe, Jochen, Schulze-Tanzil, Gundula, Müller, Christiane, Kohl, Benjamin, and John, Thilo
Published: 2013

17. Unique and common TCR repertoire features of Ni2+-, Co2+-, and Pd2+-specific human CD154+ CD4+ T cells

Author: Riedel, Franziska, Aparicio-Soto, Marina, Curato, Caterina, Münch, Lucas, Abbas, Amro, Thierse, Hermann-Josef, Peitsch, Wiebke K., Luch, Andreas, and Siewert, Katherina
Subjects: Humans, Leukocytes, Mononuclear metabolism, Histidine metabolism, Complementarity Determining Regions genetics, Complementarity Determining Regions metabolism, CD4-Positive T-Lymphocytes, Receptors, Antigen, T-Cell, alpha-beta genetics, Receptors, Antigen, T-Cell, alpha-beta metabolism
Abstract: Background: In addition to Ni2+, Co2+ and Pd2+ ions also commonly trigger T cell-mediated allergic contact dermatitis. However, in vitro frequencies of metal-specific T cells and the mechanisms of antigen recognition remain unclear. Methods: Here, we used a CD154 upregulation assay to quantify Ni2+-, Co2+-, and Pd2+-specific CD4+ T cells in peripheral blood mononuclear cells (PBMC). The involved αβ T cell receptor (TCR) repertoires were analyzed by high-throughput sequencing. Results: Incubation of PBMC with NiSO4, CoCl2, and PdCl2 increased frequencies of CD154+ CD4+ memory T cells, which peaked at ~400 μM. Activation was TCR-mediated, as shown by the metal-specific restimulation of T cell clones. Most abundant were Pd2+-specific T cells (mean 3.5%, n = 19), followed by Co2+- and Ni2+-specific cells (0.6%, n = 18 and 0.3%, n = 20, respectively) in both allergic and non-allergic individuals. A strong overrepresentation of the gene segment TRAV9-2 was unique to Ni2+-specific TCR (28% of TCR), while Co2+- and Pd2+-specific TCR preferentially expressed TRAV2 (8%) and the TRBV4 gene segment family (21%), respectively. As a second, independent mechanism of metal ion recognition, all analyzed metal-specific TCR showed a common overrepresentation of a histidine in the complementarity determining region 3 (CDR3; 15% of α-chains, 34% of β-chains). The positions of the CDR3 histidine among metal-specific TCR mirrored those in random repertoires and were conserved among cross-reactive clonotypes. Conclusions: Induced CD154 expression allows fast and comprehensive detection of Ni2+-, Co2+-, and Pd2+-specific CD4+ T cells. Distinct TCR repertoire features underlie the frequent activation and cross-reactivity of human metal-specific T cells. (© 2022 The Authors. Allergy published by European Academy of Allergy and Clinical Immunology and John Wiley & Sons Ltd.)
Published: 2023