Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Authors :: Coleman, Benjamin
Kang, Wang-Cheng
Fahrbach, Matthew
Wang, Ruoxi
Hong, Lichan
Chi, Ed H.
Cheng, Derek Zhiyuan
Publication Year :: 2023
Abstract: Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products.
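
The idea of sharing a single representation space across categorical features can be illustrated with a small sketch. The abstract does not specify how feature values are mapped into the shared space, so everything below is an assumption made for illustration: the class name `MultiplexedEmbedding`, the feature-salted hash, the table size, and the concatenation of per-feature vectors are hypothetical choices and should not be read as the authors' implementation.

```python
import hashlib
import random


class MultiplexedEmbedding:
    """One embedding table shared by several categorical features (illustrative)."""

    def __init__(self, num_rows, dim, seed=0):
        rng = random.Random(seed)
        self.num_rows = num_rows
        self.dim = dim
        # Single shared table of shape num_rows x dim, small random initial values.
        self.table = [[rng.gauss(0.0, 0.01) for _ in range(dim)]
                      for _ in range(num_rows)]

    def row_index(self, feature_name, value):
        # Assumption: salting the hash with the feature name gives each
        # feature its own mapping into the shared rows.
        key = f"{feature_name}\x00{value}".encode("utf-8")
        digest = hashlib.blake2b(key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.num_rows

    def lookup(self, feature_name, value):
        # Every feature reads from the same table; no per-feature tables exist.
        return self.table[self.row_index(feature_name, value)]

    def embed_example(self, example):
        # Concatenate the per-feature vectors into one model input.
        vec = []
        for feature_name, value in example.items():
            vec.extend(self.lookup(feature_name, value))
        return vec


emb = MultiplexedEmbedding(num_rows=1000, dim=4)
x = emb.embed_example({"query": "running shoes", "country": "IN", "device": "mobile"})
print(len(x))  # 12: three features, four dimensions each
```

In this sketch the parameter count is fixed by `num_rows * dim` regardless of how many features are added, which is the kind of parameter-accuracy tradeoff the abstract refers to. Collisions between values of different features are possible and are left unhandled here.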

Subjects :: FOS: Computer and information sciences
Computer Science - Machine Learning
Information Retrieval (cs.IR)
Machine Learning (cs.LG)
Computer Science - Information Retrieval

Details

Language :: English
Database :: OpenAIRE
Accession number :: edsair.doi.dedup.....32b5156e3cc2b44096e602c9834d67d8
