53 results for "Joshua J. Yi"
Search Results
2. Does Academic Research Drive Industrial Innovation in Computer Architecture?—Analyzing Citations to Academic Papers in Patents
- Author
-
Joshua J. Yi
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2023
3. Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part V: References
- Author
-
Joshua J. Yi
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
4. Analysis of Historical Patenting Behavior and Patent Characteristics of Computer Architecture Companies—Part IV: Claims
- Author
-
Joshua J. Yi
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
5. Review of Patents Issued to Computer Architecture Companies in 2021—Part II
- Author
-
Joshua J. Yi
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
6. Review of Patents Issued to Computer Architecture Companies in 2021 [Micro Law]
- Author
-
Joshua J. Yi
- Subjects
Hardware and Architecture ,Electrical and Electronic Engineering ,Software - Published
- 2022
7. Microarchitecture Patents Over Time and Interesting Early Microarchitecture Patents
- Author
-
Joshua J. Yi
- Subjects
Computer architecture ,Hardware and Architecture ,Computer science ,Electrical and Electronic Engineering ,Software ,Microarchitecture - Published
- 2021
8. Adaptive simulation sampling using an Autoregressive framework.
- Author
-
Sharookh Daruwalla, Resit Sendag, and Joshua J. Yi
- Published
- 2009
- Full Text
- View/download PDF
9. Low power/area branch prediction using complementary branch predictors.
- Author
-
Resit Sendag, Joshua J. Yi, Peng-fei Chuang, and David J. Lilja
- Published
- 2008
- Full Text
- View/download PDF
10. Recent Patents for Leading Computer Architecture Companies
- Author
-
Joshua J. Yi
- Subjects
ComputingMilieux_GENERAL ,Subcategory ,Computer architecture ,Hardware and Architecture ,Computer science ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Electrical and Electronic Engineering ,International Patent Classification ,Quarter (United States coin) ,Software ,Digital data processing - Abstract
Reports on recent patents for leading computer architecture companies. One way to reveal which technologies or areas companies are investing their research and development resources in is to analyze their recently issued patents. Toward that end, this article examines patents issued to the companies that have the most issued patents in the “Electric digital data processing” subcategory in the first quarter of 2021. This subcategory appears to be the one in the International Patent Classification system that is closest to computer architecture.
- Published
- 2021
11. Evaluating Benchmark Subsetting Approaches.
- Author
-
Joshua J. Yi, Resit Sendag, Lieven Eeckhout, Ajay Joshi, David J. Lilja, and Lizy Kurian John
- Published
- 2006
- Full Text
- View/download PDF
12. The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools.
- Author
-
Joshua J. Yi, Hans Vandierendonck, Lieven Eeckhout, and David J. Lilja
- Published
- 2006
- Full Text
- View/download PDF
13. Evaluating the efficacy of statistical simulation for design space exploration.
- Author
-
Ajay Joshi, Joshua J. Yi, Robert H. Bell Jr., Lieven Eeckhout, Lizy Kurian John, and David J. Lilja
- Published
- 2006
- Full Text
- View/download PDF
14. Characterizing and Comparing Prevailing Simulation Techniques.
- Author
-
Joshua J. Yi, Sreekumar V. Kodakara, Resit Sendag, David J. Lilja, and Douglas M. Hawkins
- Published
- 2005
- Full Text
- View/download PDF
15. A Statistically Rigorous Approach for Improving Simulation Methodology.
- Author
-
Joshua J. Yi, David J. Lilja, and Douglas M. Hawkins
- Published
- 2003
- Full Text
- View/download PDF
16. Increasing Instruction-Level Parallelism with Instruction Precomputation (Research Note).
- Author
-
Joshua J. Yi, Resit Sendag, and David J. Lilja
- Published
- 2002
- Full Text
- View/download PDF
17. Informed Prefetching for Indirect Memory Accesses
- Author
-
Mustafa Cavus, Resit Sendag, and Joshua J. Yi
- Subjects
010302 applied physics ,Hardware_MEMORYSTRUCTURES ,Speedup ,Computer science ,business.industry ,02 engineering and technology ,Parallel computing ,computer.software_genre ,Data structure ,01 natural sciences ,020202 computer hardware & architecture ,Metadata ,Tree traversal ,Software ,Hardware and Architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Compiler ,Programmer ,business ,computer ,Information Systems - Abstract
Indirect memory accesses have irregular access patterns that limit the performance of conventional software and hardware-based prefetchers. To address this problem, we propose the Array Tracking Prefetcher (ATP), which tracks array-based indirect memory accesses using a novel combination of software and hardware. ATP is first configured by special metadata instructions, which are inserted by the programmer or compiler to pass data-structure traversal knowledge. It then calculates and issues prefetches based on this information. ATP also employs a novel mechanism for dynamically adjusting the prefetching distance to reduce early or late prefetches. ATP yields an average speedup of 2.17 compared to a single core without prefetching. By contrast, the speedups for conventional software- and hardware-based prefetching are 1.84 and 1.32, respectively. For four cores, the average speedup for ATP is 1.85, while the corresponding speedups for software- and hardware-based prefetching are 1.60 and 1.25, respectively.
- Published
- 2020
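A minimal sketch of the core idea in the ATP abstract above (entry 17): software tells a small tracker the base and element size of the data array and the index array being traversed, and the tracker issues prefetches for A[B[i + d]] some distance d ahead, nudging d when prefetches arrive late or far too early. All names, fields, and the timing heuristic are illustrative assumptions, not the paper's hardware design.

```python
# Illustrative sketch of array-tracking indirect prefetching (A[B[i]] pattern).
# Class names and the late/early heuristic are hypothetical; the real ATP is a
# hardware table configured by metadata instructions.

MEM_LATENCY = 200  # assumed miss latency in "cycles"

class IndirectPrefetcher:
    def __init__(self, index_array, data_base, elem_size, distance=8):
        self.B = index_array          # index array traversed sequentially
        self.data_base = data_base    # base address of the data array A
        self.elem_size = elem_size
        self.distance = distance      # how far ahead of the demand stream to run
        self.issued = {}              # prefetch address -> issue "time"

    def on_demand_access(self, i, now):
        """Called when the core touches B[i]; prefetch A[B[i + distance]]."""
        j = i + self.distance
        if j < len(self.B):
            addr = self.data_base + self.B[j] * self.elem_size
            self.issued[addr] = now
        # Check whether the current demand address was covered by an earlier prefetch.
        demand_addr = self.data_base + self.B[i] * self.elem_size
        t_issue = self.issued.pop(demand_addr, None)
        if t_issue is not None:
            if now - t_issue < MEM_LATENCY:        # arrived late: look further ahead
                self.distance += 1
            elif now - t_issue > 4 * MEM_LATENCY:  # arrived far too early: pull back
                self.distance = max(1, self.distance - 1)

B = [7, 3, 9, 1, 0, 5, 2, 8, 4, 6] * 100
pf = IndirectPrefetcher(B, data_base=0x10000, elem_size=8)
for t, i in enumerate(range(len(B))):
    pf.on_demand_access(i, now=t * 10)
print("final prefetch distance:", pf.distance)
```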
18. Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems.
- Author
-
Resit Sendag, Ayse Yilmazer, Joshua J. Yi, and Augustus K. Uht
- Published
- 2006
- Full Text
- View/download PDF
19. Three-Stage Optimization Model to Inform Risk-Averse Investment in Power System Resilience to Winter Storms
- Author
-
Brent G. Austgen, Manuel Garcia, Joshua J. Yip, Bryan Arguello, Brian J. Pierre, Erhan Kutanoglu, John J. Hasenbein, and Surya Santoso
- Subjects
Battery energy storage ,contingency ,generator winterization ,optimization ,power grid ,resilience ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
We propose a three-stage stochastic programming model to inform risk-averse investment in power system resilience to winter storms. The first stage pertains to long-term investment in generator winterization and mobile battery energy storage system (MBESS) resources, the second stage to MBESS deployment prior to an imminent storm, and the third stage to operational response. Serving as a forecast update, an imminent winter storm’s severity is assumed to be known at the time the deployment decisions are made. We incorporate conditional value-at-risk (CVaR) as the risk measure in the objective function to target loss, represented in our model by unserved energy, experienced during high-impact, low-frequency events. We apply the model to a Texas-focused case study based on the ACTIVS 2000-bus synthetic grid with winter storm scenarios generated using historical Winter Storm Uri data. Results demonstrate how the optimal investments are affected by parameters like cost and risk aversion, and also how effectively using CVaR as a risk measure mitigates the outcomes in the tail of the loss distribution over the winter storm impact uncertainty.
- Published
- 2024
- Full Text
- View/download PDF
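For readers unfamiliar with the risk measure named in entry 19 above, conditional value-at-risk is commonly written in the Rockafellar–Uryasev auxiliary-variable form; the symbols below (loss L(x, ξ), scenario ξ, tail level α, weight λ) are generic notation, not the paper's exact formulation.

```latex
% Generic CVaR-augmented objective over scenario losses L(x,\xi); illustrative only.
\min_{x \in X}\; (1-\lambda)\, \mathbb{E}_{\xi}\!\left[ L(x,\xi) \right]
  \;+\; \lambda\, \mathrm{CVaR}_{\alpha}\!\left[ L(x,\xi) \right],
\qquad
\mathrm{CVaR}_{\alpha}[L] \;=\; \min_{\eta \in \mathbb{R}}\;
  \eta + \tfrac{1}{1-\alpha}\, \mathbb{E}\!\left[ (L-\eta)^{+} \right].
```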
20. Optimal Application of Mobile Substation Resources for Transmission System Restoration Under Flood Events
- Author
-
Joshua J. Yip, Vinicius C. Cunha, Brent G. Austgen, Surya Santoso, Erhan Kutanoglu, and John J. Hasenbein
- Subjects
Floods ,power outages ,power system restoration ,power transmission ,resilience ,resource management ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This article studies the Transmission Restoration Problem with Mobile Substation Resources, a novel mixed-integer linear programming model that prescribes the most effective usage of mobile-substation resources to enhance the resilience of a power transmission system against a particular, widespread flood event. The model is a two-stage stochastic program in which each scenario captures a different potential progression of flood heights at substations over the event horizon. The first stage concerns the pre-event selection and positioning of mobile-substation resources. The second stage concerns the coordination of mobile-substation resource deployment and permanent-substation restoration to maintain and recover service within the horizon. Experiments in the IEEE 24-Bus System and a synthetic Houston grid confirm the efficacy of the model. Even when isolated from effects related to restoration of permanent substations, the effect of four mobile transformers and eight mobile breakers for a realistic set of flood scenarios in the synthetic Houston grid was found to be an average total-cost reduction of approximately $35 million (i.e., approximately 8% of a default optimal objective value). Additionally, a novel, parallel heuristic is designed that can efficiently solve the problem as well as, with minor modifications, similar stochastic problems on pre-selection of mobile resources or placement of static ones. For a 40-scenario model instance in the IEEE 24-Bus System, the extensive form was not able to find an integer-feasible solution in six hours, yet the heuristic achieved an optimality gap no worse than 4.5% in two hours.
- Published
- 2024
- Full Text
- View/download PDF
21. Improving computer architecture simulation methodology by adding statistical rigor
- Author
-
Joshua J. Yi, David J. Lilja, and Douglas M. Hawkins
- Subjects
Performance improvement ,Computer architecture -- Methods ,Simulation methods -- Usage - Published
- 2005
22. Improving Processor Performance by Simplifying and Bypassing Trivial Computations.
- Author
-
Joshua J. Yi and David J. Lilja
- Published
- 2002
- Full Text
- View/download PDF
23. Impact of Future Technologies on Architecture
- Author
-
Trevor Mudge, Igor L. Markov, Joshua J. Yi, Derek Chiou, Resit Sendag, and Frederic T. Chong
- Subjects
Theoretical computer science ,Computer science ,020207 software engineering ,02 engineering and technology ,020202 computer hardware & architecture ,Engineering management ,Hardware and Architecture ,Applications architecture ,0202 electrical engineering, electronic engineering, information engineering ,Session (computer science) ,Reference architecture ,Electrical and Electronic Engineering ,Architecture ,Space-based architecture ,Unconventional computing ,Software - Abstract
This article presents position statements and a question-and-answer session by panelists at the 4th Workshop on Computer Architecture Research Directions. The subject of the debate was new technologies and their impact on future architectures.
- Published
- 2016
24. Proprietary versus Open Instruction Sets
- Author
-
David A. Patterson, Mark D. Hill, Joshua J. Yi, Dave Christie, Derek Chiou, and Resit Sendag
- Subjects
Enterprise architecture framework ,Multimedia ,Computer science ,business.industry ,computer.software_genre ,Instruction set ,Software ,Hardware and Architecture ,Operating system ,Data architecture ,Session (computer science) ,Reference architecture ,Electrical and Electronic Engineering ,business ,computer ,Software architecture description - Abstract
This article presents position statements and a question-and-answer session by panelists at the 4th Workshop on Computer Architecture Research Directions. The subject of the debate was proprietary versus free and open instruction set architectures.
- Published
- 2016
25. Array Tracking Prefetcher for Indirect Accesses
- Author
-
Joshua J. Yi, Mustafa Cavus, and Resit Sendag
- Subjects
010302 applied physics ,Speedup ,Computer science ,business.industry ,Locality ,Contrast (statistics) ,02 engineering and technology ,Parallel computing ,Tracking (particle physics) ,01 natural sciences ,020202 computer hardware & architecture ,Software ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Cache ,business ,Baseline (configuration management) - Abstract
Indirect memory accesses have irregular access patterns and concomitantly poor spatial locality. To address this problem, we propose the Array Tracking Prefetcher (ATP), which tracks array-based indirect memory accesses using a novel combination of software and hardware. Our results show that ATP yields an average speedup of 1.60 over the baseline single core without prefetching. By contrast, the speedups for conventional software- and hardware-based prefetching are 1.49 and 1.16, respectively. For four cores, the average speedups for ATP, software, and hardware are 1.49, 1.38, and 1.11, respectively.
- Published
- 2018
26. The Future of Architectural Simulation
- Author
-
Joshua J. Yi, Doug Burger, Derek Chiou, Joel Emer, James C. Hoe, and Resit Sendag
- Subjects
Computer simulation ,Computer architecture simulator ,Computer architecture ,Hardware and Architecture ,Computer science ,Subject (documents) ,Electrical and Electronic Engineering ,computer.software_genre ,computer ,Software ,Simulation software ,Microarchitecture - Abstract
Simulation is an indispensable tool for evaluation and analysis throughout the development cycle of a computer system, and even after the computer system is built. How simulation should evolve as the complexity of computer systems continues to grow is an open question and the subject of this panel from the 2009 Workshop on Computer Architecture Research Directions.
- Published
- 2010
27. Programming Multicores: Do Applications Programmers Need to Write Explicitly Parallel Programs?
- Author
-
Joshua J. Yi, Derek Chiou, David I. August, Resit Sendag, Arvind, and Keshav Pingali
- Subjects
Multi-core processor ,Computer science ,Programming language ,computer.software_genre ,Inductive programming ,Hardware and Architecture ,Programming paradigm ,Reactive programming ,Explicit parallelism ,State (computer science) ,Electrical and Electronic Engineering ,Implicit parallelism ,computer ,Software - Abstract
In this panel discussion from the 2009 Workshop on Computer Architecture Research Directions, David August and Keshav Pingali debate whether explicitly parallel programming is a necessary evil for applications programmers, assess the current state of parallel programming models, and discuss possible routes toward finding the programming model for the multicore era.
- Published
- 2010
28. The impact of wrong-path memory references in cache-coherent multiprocessor systems
- Author
-
Augustus K. Uht, Resit Sendag, Ayse Yilmazer, and Joshua J. Yi
- Subjects
Instruction prefetch ,Memory coherence ,Hardware_MEMORYSTRUCTURES ,Computer Networks and Communications ,Computer science ,CPU cache ,Cache-only memory architecture ,Multiprocessing ,Parallel computing ,Branch predictor ,computer.software_genre ,Theoretical Computer Science ,Non-uniform memory access ,Shared memory ,Artificial Intelligence ,Hardware and Architecture ,Interleaved memory ,Operating system ,Uniprocessor system ,Distributed memory ,Cache ,computer ,Software ,Cache coherence - Abstract
The core of current-generation high-performance multiprocessor systems consists of out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work that focused on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore the effects of WP memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these WP memory references can increase the amount of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, WP memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. In order to reduce the performance impact of these WP memory references, we introduce two simple mechanisms—filtering WP blocks that are not likely to be used and WP-aware cache replacement—that yield speedups of up to 37%.
- Published
- 2007
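The last sentence of the abstract in entry 28 above mentions WP-aware cache replacement. A toy sketch of that idea, under assumed names and a plain LRU baseline rather than the paper's actual design: cache lines filled by wrong-path references are flagged, and on a miss the policy evicts a flagged line before touching correct-path lines.

```python
# Toy wrong-path-aware replacement for one set of a set-associative cache.
# Field and function names are illustrative, not the paper's mechanism.
from collections import OrderedDict

class WPSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> is_wrong_path, ordered oldest-first (LRU)

    def access(self, tag, wrong_path=False):
        if tag in self.lines:
            # Hit: move to MRU position; a correct-path touch clears the WP flag.
            wp = self.lines.pop(tag) and wrong_path
            self.lines[tag] = wp
            return "hit"
        # Miss: prefer evicting the oldest wrong-path line, else fall back to LRU.
        if len(self.lines) >= self.ways:
            victim = next((t for t, wp in self.lines.items() if wp), None)
            if victim is None:
                victim = next(iter(self.lines))
            del self.lines[victim]
        self.lines[tag] = wrong_path
        return "miss"

s = WPSet(ways=2)
print([s.access(t, wp) for t, wp in [(1, False), (2, True), (3, False), (1, False)]])
```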
29. Speed versus Accuracy Trade-Offs in Microarchitectural Simulations
- Author
-
David J. Lilja, Joshua J. Yi, Resit Sendag, and D.M. Hawkins
- Subjects
Set (abstract data type) ,Computational Theory and Mathematics ,Computer engineering ,Hardware and Architecture ,Computer science ,Trade offs ,Decision tree ,Sampling (statistics) ,Parallel computing ,Software ,Theoretical Computer Science - Abstract
Due to the long simulation time of the reference input set, computer architects often use reduced time simulation techniques to shorten the simulation time. However, what has not yet been thoroughly evaluated is the accuracy of these techniques relative to the reference input set and with respect to each other. To rectify this deficiency, this paper uses three methods to characterize reduced input set, truncated execution, and sampling-based simulation techniques while also examining their speed versus accuracy trade-off and configuration dependence. Our results show that the three sampling-based techniques, SimPoint, SMARTS, and random sampling, have the best accuracy, the best speed versus accuracy trade-off, and the least configuration dependence. On the other hand, the reduced input set and truncated execution simulation techniques had generally poor accuracy, were not significantly faster than the sampling-based techniques, and were severely configuration dependent. The final contribution of this paper is a decision tree, which can help architects choose the most appropriate technique for their simulations.
- Published
- 2007
30. Single-Threaded vs. Multithreaded: Where Should We Focus?
- Author
-
Joel Emer, Joshua J. Yi, Derek Chiou, Mark D. Hill, Yale N. Patt, and Resit Sendag
- Subjects
Multi-core processor ,Core (game theory) ,Focus (computing) ,Logic synthesis ,Hardware and Architecture ,Computer science ,Multithreading ,Thread (computing) ,Parallel computing ,Electrical and Electronic Engineering ,Latency (engineering) ,Software - Abstract
Today, with the increasing popularity of multicore processors, one approach to optimizing the processor's performance is to reduce the execution times of individual applications running on each core by designing and implementing more powerful cores. Another approach, which is the polar opposite of the first, optimizes the processor's performance by running a larger number of applications on a correspondingly larger number of cores, albeit simpler ones. The difference between these two approaches is that the former focuses on reducing the latency of individual applications or threads (it optimizes the processor's single-threaded performance), whereas the latter focuses on reducing the latency of the applications' threads taken as a group (it optimizes the processor's multithreaded performance). The panel, from the 2007 Workshop on Computer Architecture Research Directions, discusses the relevant issues.
- Published
- 2007
31. Where Does Security Stand? New Vulnerabilities vs. Trusted Computing
- Author
-
Jean-Pierre Seifert, G. Stronqin, Joshua J. Yi, Shay Gueron, Resit Sendag, and Derek Chiou
- Subjects
Security bug ,Cloud computing security ,Computer science ,Firmware ,Vulnerability ,Covert channel ,Trusted Computing ,Vulnerability management ,Computer security model ,computer.software_genre ,Computer security ,Trusted computing base ,Security service ,Hardware and Architecture ,Software security assurance ,Security through obscurity ,Electrical and Electronic Engineering ,computer ,Software ,Secure coding ,Vulnerability (computing) - Abstract
How can we ensure that platform hardware, firmware, and software work in concert to withstand rapidly evolving security threats? Architectural innovations bring performance gains but can also create new security vulnerabilities. In this panel discussion from the 2007 Workshop on Computer Architecture Research Directions, we assess the current state of security and discuss possible routes toward trusted computing.
- Published
- 2007
32. Branch Misprediction Prediction: Complementary Branch Predictors
- Author
-
Joshua J. Yi, Peng-fei Chuang, and Resit Sendag
- Subjects
Prediction algorithms ,Computational complexity theory ,Hardware and Architecture ,Computer science ,Branch target predictor ,Scalability ,Parallel computing ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Branch predictor ,Branch misprediction ,Critical path method ,Algorithm - Abstract
In this paper, we propose a new class of branch predictors, complementary branch predictors, which can be easily added to any branch predictor to improve the overall prediction accuracy. This mechanism differs from conventional branch predictors in that it focuses only on mispredicted branches. As a result, this mechanism has the advantages of scalability and flexibility (can be implemented with any branch predictor), but is not on the critical path. More specifically, this mechanism improves the branch prediction accuracy by predicting which future branch will be mispredicted next and when that will occur, and then it changes the predicted direction at the predicted time. Our results show that a branch predictor with the branch misprediction predictor achieves the same prediction accuracy as a conventional branch predictor that is 4 to 16 times larger, but without significantly increasing the overall complexity or lengthening the critical path.
- Published
- 2007
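A minimal sketch of the idea described in entry 32 above: a small side structure learns how many correct predictions typically separate consecutive mispredictions of a branch, and when that count is reached it inverts the base predictor's next prediction for that branch. The per-PC tables and the interval heuristic below are assumptions for illustration, not the paper's exact design or sizing.

```python
# Illustrative complementary predictor: flip the base prediction when a branch's
# typical misprediction interval elapses. Names and structure are hypothetical.

class ComplementaryPredictor:
    def __init__(self):
        self.interval = {}   # pc -> learned gap (correct predictions between mispredictions)
        self.since = {}      # pc -> correct predictions since the last misprediction

    def adjust(self, pc, base_prediction):
        """Possibly invert the base predictor's output for this branch."""
        gap = self.interval.get(pc)
        if gap is not None and self.since.get(pc, 0) >= gap:
            return not base_prediction          # predicted misprediction point
        return base_prediction

    def update(self, pc, base_was_correct):
        if base_was_correct:
            self.since[pc] = self.since.get(pc, 0) + 1
        else:
            # Learn/refresh the gap between mispredictions, then restart the counter.
            self.interval[pc] = self.since.get(pc, 0)
            self.since[pc] = 0
```

In a simulator one would call adjust() after the base predictor produces a direction, and update() with whether the unadjusted base prediction matched the branch outcome, so the side structure keeps tracking the base predictor's own mispredictions.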
33. The future of simulation: a field of dreams
- Author
-
Joshua J. Yi, James E. Smith, Lieven Eeckhout, Brad Calder, Lizy K. John, and David J. Lilja
- Subjects
Flexibility (engineering) ,General Computer Science ,Computer performance ,Computer architecture ,Computer science ,Systems engineering ,Software performance testing ,Benchmarking ,Software system ,Architecture ,Field (computer science) - Abstract
Due to the enormous complexity of computer systems, researchers use simulators to model system behavior and generate quantitative estimates of expected performance. Researchers also use simulators to model and assess the efficacy of future enhancements and novel systems. Arguably the most important tools available to computer architecture researchers, simulators offer a balance of cost, timeliness, and flexibility. Improving the infrastructure, benchmarking, and methodology of simulation - the dominant computer performance evaluation method - results in higher efficiency and lets architects gain more insight into processor behavior. For these reasons, architecture researchers have increasingly relied on simulators.
- Published
- 2006
34. Simulation of computer architectures: simulators, benchmarks, methodologies, and recommendations
- Author
-
David J. Lilja and Joshua J. Yi
- Subjects
Flexibility (engineering) ,Range (mathematics) ,Computational Theory and Mathematics ,Computer architecture ,Hardware and Architecture ,Design space exploration ,Computer science ,Limit (music) ,Design process ,Throughput (business) ,Software ,Theoretical Computer Science ,Reliability engineering - Abstract
Simulators have become an integral part of the computer architecture research and design process. Since they have the advantages of cost, time, and flexibility, architects use them to guide design space exploration and to quantify the efficacy of an enhancement. However, long simulation times and poor accuracy limit their effectiveness. To reduce the simulation time, architects have proposed several techniques that increase the simulation speed or throughput. To increase the accuracy, architects try to minimize the amount of error in their simulators and have proposed adding statistical rigor to their simulation methodology. Since a wide range of approaches exist and since many of them overlap, this paper describes, classifies, and compares them to aid the computer architect in selecting the most appropriate one.
- Published
- 2006
35. Improving Computer Architecture Simulation Methodology by Adding Statistical Rigor
- Author
-
David J. Lilja, Joshua J. Yi, and D.M. Hawkins
- Subjects
Instruction set ,Computational Theory and Mathematics ,Computer architecture ,Hardware and Architecture ,CPU cache ,Computer science ,Precomputation ,Re-order buffer ,Execution time ,Software ,Theoretical Computer Science - Abstract
Due to cost, time, and flexibility constraints, computer architects use simulators to explore the design space when developing new processors and to evaluate the performance of potential enhancements. However, despite this dependence on simulators, statistically rigorous simulation methodologies are typically not used in computer architecture research. A formal methodology can provide a sound basis for drawing conclusions gathered from simulation results by adding statistical rigor and, consequently, can increase the architect's confidence in the simulation results. This paper demonstrates the application of a rigorous statistical technique to the setup and analysis phases of the simulation process. Specifically, we apply a Plackett and Burman design to: 1) identify key processor parameters, 2) classify benchmarks based on how they affect the processor, and 3) analyze the effect of processor enhancements. Our results showed that, out of the 41 user-configurable parameters in SimpleScalar, only 10 had a significant effect on the execution time. Of those 10, the number of reorder buffer entries and the L2 cache latency were the two most significant ones, by far. Our results also showed that instruction precomputation - a value reuse-like microarchitectural technique - primarily improves the processor's performance by relieving integer ALU contention.
- Published
- 2005
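Entry 35 above screens processor parameters with a Plackett and Burman design. The sketch below shows how a standard 12-run, 11-factor P&B matrix is built from its generator row and how main effects are computed from the simulated responses; the parameter names and the simulate() stub are placeholders, not the SimpleScalar configuration or results from the paper.

```python
# Plackett-Burman (N=12, up to 11 two-level factors) main-effect screening.
# Parameter names and the simulate() stub are illustrative placeholders.
import numpy as np

def pb12_design():
    gen = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]   # standard N=12 generator row
    rows = [np.roll(gen, i) for i in range(11)]          # cyclic shifts
    rows.append([-1] * 11)                               # closing all-low row
    return np.array(rows)

def main_effects(design, responses):
    # Effect of factor j = mean response at the high level minus mean at the low level.
    return design.T @ responses / (len(responses) / 2)

params = ["rob_entries", "l1d_size", "l2_latency", "issue_width", "bpred_size",
          "l1i_size", "mem_latency", "alu_count", "lsq_entries", "fetch_width", "dtlb_size"]

def simulate(config_row):
    # Stand-in for a cycle-accurate simulation returning execution time.
    rng = np.random.default_rng(abs(hash(config_row.tobytes())) % 2**32)
    return 100 - 8 * config_row[0] - 5 * config_row[2] + rng.normal(0, 1)

X = pb12_design()
y = np.array([simulate(row) for row in X])
for name, eff in sorted(zip(params, main_effects(X, y)), key=lambda p: -abs(p[1])):
    print(f"{name:12s} {eff:+7.2f}")
```

Ranking the factors by the magnitude of their main effects is what lets the methodology separate the handful of parameters that matter from those that do not.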
36. Guest Editors' Introduction: Computer Architecture Simulation and Modeling
- Author
-
Timothy Sherwood and Joshua J. Yi
- Subjects
Presentation ,Computer architecture ,Hardware and Architecture ,Computer science ,media_common.quotation_subject ,Electrical and Electronic Engineering ,Software ,media_common - Abstract
Guest Editors Timothy Sherwood and Joshua J. Yi talk about what went into the presentation of IEEE Micro's Computer Architecture Simulation and Modeling special issue.
- Published
- 2006
37. Adaptive simulation sampling using an Autoregressive framework
- Author
-
Resit Sendag, Sharookh Daruwalla, and Joshua J. Yi
- Subjects
Cycles per instruction ,business.industry ,Computer science ,Coefficient of variation ,Sampling (statistics) ,Interval (mathematics) ,Confidence interval ,Software ,Autoregressive model ,Sample size determination ,Benchmark (computing) ,business ,Algorithm ,Simulation - Abstract
Software simulators remain several orders of magnitude slower than the modern microprocessor architectures they simulate. Although various reduced-time simulation tools are available to help accurately pick truncated benchmark samples, they either require an initial offline analysis of the benchmarks or many iterative runs of the benchmark. In this paper, we present a novel sampling simulation method that requires only a single run of the benchmark to achieve a desired confidence interval, needs no offline analysis, and gives results comparable in accuracy and sample size to current simulation methodologies. Our method is a novel, configuration-independent approach that incorporates an autoregressive (AR) model using the squared coefficient of variation (SCV) of cycles per instruction (CPI). Using the sampled SCVs of past intervals of a benchmark, the model computes the required number of samples for the next interval through a derived relationship between the number of samples and the SCV of the CPI distribution. Our implementation of the AR model achieves an actual average error of only 0.76% on CPI with a 99.7% confidence interval of ±0.3% for all SPEC2K benchmarks while simulating, in detail, an average of 40 million instructions per benchmark.
- Published
- 2009
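The abstract in entry 37 above ties the number of samples per interval to the squared coefficient of variation of CPI through the usual confidence-interval relation and forecasts the next interval's SCV with an autoregressive model. The sketch below shows that relation and a trivial AR(1) forecast; the constants and the fit are illustrative assumptions, not the paper's calibration.

```python
# Sketch: choose the sample count for the next interval from a forecast SCV of CPI.
# n >= z^2 * SCV / eps^2 gives a relative CI half-width of about eps at confidence z.
# The AR(1) fit and constants are illustrative, not the paper's model.
import numpy as np

Z = 3.0        # ~99.7% confidence
EPS = 0.003    # +/-0.3% relative error target

def samples_needed(scv, z=Z, eps=EPS):
    return int(np.ceil(z * z * scv / (eps * eps)))

def ar1_forecast(history):
    """One-step AR(1) prediction of the next interval's SCV."""
    x = np.asarray(history, dtype=float)
    if len(x) < 3:
        return x[-1]
    x0, x1 = x[:-1] - x.mean(), x[1:] - x.mean()
    phi = (x0 @ x1) / (x0 @ x0)          # least-squares AR(1) coefficient
    return x.mean() + phi * (x[-1] - x.mean())

scv_history = [0.04, 0.05, 0.045, 0.06, 0.055]   # measured SCV of CPI per past interval
scv_next = ar1_forecast(scv_history)
print(f"forecast SCV = {scv_next:.4f}, samples for next interval = {samples_needed(scv_next)}")
```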
38. Low power/area branch prediction using complementary branch predictors
- Author
-
David J. Lilja, Peng-fei Chuang, Joshua J. Yi, and Resit Sendag
- Subjects
Computer science ,Pipeline (computing) ,Byte ,Parallel computing ,Hardware_CONTROLSTRUCTURESANDMICROPROGRAMMING ,Branch misprediction ,Branch predictor ,Power (physics) - Abstract
Although high branch prediction accuracy is necessary for high performance, it typically comes at the cost of larger predictor tables and/or more complex prediction algorithms. Unfortunately, large predictor tables and complex algorithms require more chip area and have higher power consumption, which precludes their use in embedded processors. As an alternative to large, complex branch predictors, in this paper, we investigate adding complementary branch predictors (CBP) to embedded processors to reduce their power consumption and/or improve their branch prediction accuracy. A CBP differs from a conventional branch predictor in that it focuses only on frequently mispredicted branches while letting the conventional branch predictor predict the more predictable ones. Our results show that adding a small 16-entry (28 byte) CBP reduces the branch misprediction rate of static, bimodal, and gshare branch predictors by an average of 51.0%, 42.5%, and 39.8%, respectively, across 38 SPEC 2000 and MiBench benchmarks. Furthermore, a 256-entry CBP improves the energy-efficiency of the branch predictor and processor up to 97.8% and 23.6%, respectively. Finally, in addition to being very energy-efficient, a CBP can also improve the processor performance and, due to its simplicity, can be easily added to the pipeline of any processor.
- Published
- 2008
39. Reliability: fallacy or reality?
- Author
-
Derek Chiou, Joshua J. Yi, Resit Sendag, Scott Mahlke, Antonio González, Shubu Mukherjee, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, and Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors
- Subjects
Fallacy ,Process (engineering) ,Computer science ,business.industry ,Risk analysis (engineering) ,Circuits integrats -- Fiabilitat ,Hardware and Architecture ,Embedded system ,Electrical and Electronic Engineering ,business ,Integrated circuits -- Reliability ,Informàtica::Arquitectura de computadors [Àrees temàtiques de la UPC] ,Software ,Reliability (statistics) - Abstract
As chip architects and manufacturers plumb ever-smaller process technologies, new species of faults are compromising device reliability. Following an introduction by the authors, the panelists debate whether reliability is a legitimate concern for the microarchitect. Topics include the costs of adding reliability versus those of ignoring it, how to measure it, techniques for improving it, and whether consumers really want it.
- Published
- 2007
- Full Text
- View/download PDF
40. Accurate statistical approaches for generating representative workload compositions
- Author
-
David J. Lilja, Joshua J. Yi, Paul Schrater, Lieven Eeckhout, and Rashmi Sundareswara
- Subjects
Computer science ,Gaussian ,Spec# ,Workload ,computer.software_genre ,Independent component analysis ,symbols.namesake ,ComputingMethodologies_PATTERNRECOGNITION ,Redundancy (information theory) ,Principal component analysis ,symbols ,Benchmark (computing) ,Data analysis ,Data mining ,computer ,computer.programming_language - Abstract
Composing a representative workload is a crucial step during the design process of a microprocessor. The workload should be composed in such a way that it is representative for the target domain of application and yet, the amount of redundancy in the workload should be minimized as much as possible in order not to overly increase the total simulation time. As a result, there is an important trade-off that needs to be made between workload representativeness and simulation accuracy versus simulation speed. Previous work used statistical data analysis techniques to identify representative benchmarks and corresponding inputs, also called a subset, from a large set of potential benchmarks and inputs. These methodologies measure a number of program characteristics on which principal components analysis (PCA) is applied before identifying distinct program behaviors among the benchmarks using cluster analysis. In this paper we propose independent components analysis (ICA) as a better alternative to PCA as it does not assume that the original data set has a Gaussian distribution, which allows ICA to better find the important axes in the workload space. Our experimental results using SPEC CPU2000 benchmarks show that ICA significantly outperforms PCA in that ICA achieves smaller benchmark subsets that are more accurate than those found by PCA.
- Published
- 2006
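A small sketch of the workflow described in entry 40 above, using scikit-learn's PCA, FastICA, and k-means on a made-up program-characteristics matrix; the feature counts, cluster count, and data are placeholders, not the SPEC CPU2000 measurements from the paper.

```python
# Benchmark subsetting sketch: project program characteristics with PCA or ICA,
# cluster, and keep the benchmark closest to each cluster center.
# Data and parameter choices are illustrative only.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
benchmarks = [f"bench{i:02d}" for i in range(26)]
features = rng.normal(size=(26, 12))     # e.g., ILP, miss rates, branch stats, instruction mix

def subset(features, names, transform, k=6):
    X = StandardScaler().fit_transform(features)
    Y = transform.fit_transform(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y)
    chosen = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(Y[members] - km.cluster_centers_[c], axis=1)
        chosen.append(names[members[np.argmin(d)]])
    return sorted(chosen)

print("PCA subset:", subset(features, benchmarks, PCA(n_components=4)))
print("ICA subset:", subset(features, benchmarks, FastICA(n_components=4, random_state=0)))
```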
41. Evaluating Benchmark Subsetting Approaches
- Author
-
Resit Sendag, Lizy K. John, David J. Lilja, Ajay Joshi, Lieven Eeckhout, and Joshua J. Yi
- Subjects
Set (abstract data type) ,Range (mathematics) ,Core (game theory) ,Computer science ,Suite ,Principal component analysis ,Benchmark (computing) ,Set theory ,Data mining ,computer.software_genre ,computer ,Integer (computer science) - Abstract
To reduce the simulation time to a tractable amount or due to compilation (or other related) problems, computer architects often simulate only a subset of the benchmarks in a benchmark suite. However, if the architect chooses a subset of benchmarks that is not representative, the subsequent simulation results will, at best, be misleading or, at worst, yield incorrect conclusions. To address this problem, computer architects have recently proposed several statistically-based approaches to subset a benchmark suite. While some of these approaches are well-grounded statistically, what has not yet been thoroughly evaluated is the: 1) Absolute accuracy, 2) Relative accuracy across a range of processor and memory subsystem enhancements, and 3) Representativeness and coverage of each approach for a range of subset sizes. Specifically, this paper evaluates statistically-based subsetting approaches based on principal components analysis (PCA) and the Plackett and Burman (P&B) design, in addition to prevailing approaches such as integer vs. floating-point, core vs. memory-bound, by language, and at random. Our results show that the two statistically-based approaches, PCA and P&B, have the best absolute and relative accuracy for CPI and energy-delay product (EDP), produce subsets that are the most representative, and choose benchmark and input set pairs that are most well-distributed across the benchmark space. To achieve a 5% absolute CPI and EDP error, across a wide range of configurations, PCA and P&B typically need about 17 benchmark and input set pairs, while the other five approaches often choose more than 30 benchmark and input set pairs.
- Published
- 2006
42. The exigency of benchmark and compiler drift
- Author
-
Joshua J. Yi, Hans Vandierendonck, Lieven Eeckhout, and David J. Lilja
- Subjects
Computer science ,CPU cache ,Suite ,Re-order buffer ,Spec# ,SDET ,Parallel computing ,Compiler ,computer.software_genre ,computer ,Bottleneck ,Access time ,computer.programming_language - Abstract
Due to the amount of time required to design a new processor, one set of benchmark programs may be used during the design phase while another may be the standard when the design is finally delivered. Using one benchmark suite to design a processor while using a different, presumably more current, suite to evaluate its ultimate performance may lead to sub-optimal design decisions if there are large differences between the characteristics of the two suites and their respective compilers. We call these changes across time "drift". To evaluate the impact of using yesterday's benchmark and compiler technology to design tomorrow's processors, we compare common benchmarks from the SPEC 95 and SPEC 2000 benchmark suites. Our results yield three key conclusions. First, we show that the amount of drift, for common programs in successive SPEC benchmark suites, is significant. In SPEC 2000, the main memory access time is a far more significant performance bottleneck than in SPEC 95, while less significant SPEC 2000 performance bottlenecks include the L2 cache latency, the L1 I-cache size, and the number of reorder buffer entries. Second, using two different statistical techniques, we show that compiler drift is not as significant as benchmark drift. Third, we show that benchmark and compiler drift can have a significant impact on the final design decisions. Specifically, we use a one-parameter-at-a-time optimization algorithm to design two different year-2000 processors, one optimized for SPEC 95 and the other optimized for SPEC 2000, using the energy-delay product (EDP) as the optimization criterion. The results show that using SPEC 95 to design a year-2000 processor results in an 18.5% larger EDP and a 20.8% higher CPI than using the SPEC 2000 benchmarks to design the corresponding processor. Finally, we make a few recommendations to help computer architects minimize the effects of benchmark and compiler drift.
- Published
- 2006
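Entry 42 above optimizes processor configurations with a one-parameter-at-a-time algorithm using EDP as the criterion. A generic coordinate-search sketch of that style of algorithm, with a placeholder edp() function and parameter ranges that are purely illustrative rather than the paper's design space:

```python
# One-parameter-at-a-time (coordinate search) over a discrete design space,
# minimizing an energy-delay-product estimate. edp() and the ranges are placeholders.

design_space = {
    "rob_entries": [32, 64, 128, 256],
    "l1d_kb":      [8, 16, 32, 64],
    "l2_latency":  [6, 10, 14],
    "issue_width": [2, 4, 8],
}

def edp(cfg):
    # Stand-in for (energy x delay) obtained from detailed simulation.
    delay  = 100 / cfg["issue_width"] + cfg["l2_latency"] + 2000 / cfg["rob_entries"]
    energy = cfg["issue_width"] * 3 + cfg["rob_entries"] * 0.05 + cfg["l1d_kb"] * 0.2
    return delay * energy

def optimize(space, objective):
    cfg = {p: vals[0] for p, vals in space.items()}   # start at the smallest design
    improved = True
    while improved:
        improved = False
        for p, vals in space.items():                 # sweep one parameter at a time
            best_v = min(vals, key=lambda v: objective({**cfg, p: v}))
            if best_v != cfg[p]:
                cfg[p], improved = best_v, True
    return cfg

best = optimize(design_space, edp)
print(best, "EDP =", round(edp(best), 1))
```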
43. Evaluating the efficacy of statistical simulation for design space exploration
- Author
-
R.H. Bell, Lizy K. John, Lieven Eeckhout, Joshua J. Yi, David J. Lilja, and Ajay Joshi
- Subjects
Computer engineering ,Computer science ,Design space exploration ,media_common.quotation_subject ,Probabilistic logic ,Key (cryptography) ,Fidelity ,Workload ,Simulation ,Space exploration ,media_common ,Microarchitecture ,TRACE (psycholinguistics) - Abstract
Recent research has proposed statistical simulation as a technique for fast performance evaluation of superscalar microprocessors. The idea in statistical simulation is to measure a program's key performance characteristics, generate a synthetic trace with these characteristics, and simulate the synthetic trace. Due to the probabilistic nature of statistical simulation, the performance estimate quickly converges to a solution, making it an attractive technique to efficiently cull a large microprocessor design space. In this paper, we evaluate the efficacy of statistical simulation in exploring the design space. Specifically, we characterize the following aspects of statistical simulation: (i) fidelity of performance bottlenecks, with respect to cycle-accurate simulation of the program, (ii) ability to track design changes, and (iii) trade-off between accuracy and complexity in statistical simulation models. In our characterization experiments, we use the Plackett & Burman (P&B) design to systematically stress statistical simulation by creating different performance bottlenecks. The key results from this paper are: (1) Synthetic traces stress at least the same 10 most significant processor performance bottlenecks as the original workload, (2) Statistical simulation can effectively track design changes to identify feasible design points in a large design space of aggressive microarchitectures, (3) Our evaluation of 4 statistical simulation models shows that although a very detailed model is needed to achieve good absolute accuracy in performance estimation, a simple model is sufficient to achieve good relative accuracy, and (4) The P&B design technique can be used to quickly identify areas to focus on to improve the accuracy of the statistical simulation model.
- Published
- 2006
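The statistical-simulation idea summarized in entry 43 above, profiling a program's key statistics and then generating and simulating a synthetic trace drawn from them, can be sketched in a few lines; the statistics chosen and the generator below are deliberately simplistic placeholders, not the models evaluated in the paper.

```python
# Sketch of statistical simulation: profile a trace, then synthesize a new trace
# whose instruction mix and dependence distances follow the measured distributions.
# The trace format and categories are simplified placeholders.
import random

def profile(trace):
    mix, dep = {}, {}
    for op, dist in trace:
        mix[op] = mix.get(op, 0) + 1
        dep[dist] = dep.get(dist, 0) + 1
    n = len(trace)
    return ({op: c / n for op, c in mix.items()},
            {d: c / n for d, c in dep.items()})

def synthesize(mix, dep, length, seed=0):
    rng = random.Random(seed)
    ops, opw = zip(*mix.items())
    ds, dw = zip(*dep.items())
    return [(rng.choices(ops, opw)[0], rng.choices(ds, dw)[0]) for _ in range(length)]

real = [("alu", 1), ("load", 3), ("alu", 1), ("branch", 5), ("load", 2)] * 1000
mix, dep = profile(real)
synthetic = synthesize(mix, dep, length=200)   # a short synthetic trace converges quickly
print(mix, dep, synthetic[:3])
```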
44. Computer Architecture
- Author
-
Joshua J. Yi and David J. Lilja
- Published
- 2006
45. Characterizing and Comparing Prevailing Simulation Techniques
- Author
-
D.M. Hawkins, Resit Sendag, David J. Lilja, Sreekumar V. Kodakara, and Joshua J. Yi
- Subjects
Set (abstract data type) ,Speedup ,Characterization methods ,Computer science ,Decision tree ,Sampling (statistics) ,Parallel computing ,Algorithm - Abstract
Due to the simulation time of the reference input set, architects often use alternative simulation techniques. Although these alternatives reduce the simulation time, what has not been evaluated is their accuracy relative to the reference input set, and with respect to each other. To rectify this deficiency, this paper uses three methods to characterize the reduced input set, truncated execution, and sampling simulation techniques while also examining their speed versus accuracy trade-off and configuration dependence. Finally, to illustrate the effect that a technique could have on the apparent speedup results, we quantify the speedups obtained with two processor enhancements. The results show that: 1) the accuracy of the truncated execution techniques was poor for all three characterization methods and for both enhancements, 2) the characteristics of the reduced input sets are not reference-like, and 3) SimPoint and SMARTS, the two sampling techniques, are extremely accurate and have the best speed versus accuracy trade-offs. Finally, this paper presents a decision tree which can help architects choose the most appropriate technique for their simulations.
- Published
- 2005
46. Fluorescent image-guided surgery in breast cancer by intravenous application of a quenched fluorescence activity-based probe for cysteine cathepsins in a syngeneic mouse model
- Author
-
Frans V. Suurs, Si-Qi Qiu, Joshua J. Yim, Carolien P. Schröder, Hetty Timmer-Bosscha, Eric S. Bensen, John T. Santini, Elisabeth G. E. de Vries, Matthew Bogyo, and Gooitzen M. van Dam
- Subjects
Image-guided surgery (IGS) ,Quenched fluorescent activity-based probe (qABP) ,Cathepsin targeting ,Indocyanine green (ICG) ,Breast cancer ,Medical physics. Medical radiology. Nuclear medicine ,R895-920 - Abstract
Abstract Purpose The reoperation rate for breast-conserving surgery is as high as 15–30% due to residual tumor in the surgical cavity after surgery. In vivo tumor-targeted optical molecular imaging may serve as a red-flag technique to improve intraoperative surgical margin assessment and to reduce reoperation rates. Cysteine cathepsins are overexpressed in most solid tumor types, including breast cancer. We developed a cathepsin-targeted, quenched fluorescent activity-based probe, VGT-309, and evaluated whether it could be used for tumor detection and image-guided surgery in syngeneic tumor-bearing mice. Methods Binding specificity of the developed probe was evaluated in vitro. Next, fluorescent imaging in BALB/c mice bearing a murine breast tumor was performed at different time points after VGT-309 administration. Biodistribution of VGT-309 after 24 h in tumor-bearing mice was compared to control mice. Image-guided surgery of tumors was performed at multiple time points with different clinical fluorescent camera systems and followed by ex vivo analysis. Results The probe was specifically activated by cathepsins X, B/L, and S. Fluorescent imaging revealed an increased tumor-to-background contrast over time, up to 15.1 at 24 h post probe injection. In addition, VGT-309 delineated tumor tissue during image-guided surgery with different optical fluorescent imaging camera systems. Conclusion These results indicate that optical fluorescent molecular imaging using the cathepsin-targeted probe, VGT-309, may improve intraoperative tumor detection, which could translate to more complete tumor resection when coupled with commercially available surgical tools and techniques.
- Published
- 2020
- Full Text
- View/download PDF
47. Increasing Instruction-Level Parallelism with Instruction Precomputation
- Author
-
David J. Lilja, Resit Sendag, and Joshua J. Yi
- Subjects
Speedup ,Computer science ,CPU cache ,Opcode ,Precomputation ,Table (database) ,Parallel computing ,Instruction-level parallelism ,Operand - Abstract
Value reuse improves a processor’s performance by dynamically caching the results of previous instructions and reusing those results to bypass the execution of future instructions that have the same opcode and input operands. However, continually replacing the least recently used entries could eventually fill the value reuse table with instructions that are not frequently executed. Furthermore, the complex hardware that replaces entries and updates the table may necessitate an increase in the clock period. We propose instruction precomputation to address these issues by profiling programs to determine the opcodes and input operands that have the highest frequencies of execution. These instructions then are loaded into the precomputation table before the program executes. During program execution, the precomputation table is used in the same way as the value reuse table is, with the exception that the precomputation table does not dynamically replace any entries. For a 2K-entry precomputation table implemented on a 4-way issue machine, this approach produced an average speedup of 11.0%. By comparison, a 2K-entry value reuse table produced an average speedup of 6.7%. Instruction precomputation outperforms value reuse, especially for smaller tables, with the same number of table entries while using less area and having a lower access time.
- Published
- 2002
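A software analogue of the mechanism described in entry 47 above: profile (opcode, operand) tuples, load the results of the most frequent ones into a fixed table before execution, and consult that table to bypass execution. The trace format, operation set, and table size are illustrative, not the hardware design.

```python
# Instruction precomputation sketch: a static table of the most frequent
# (opcode, operand1, operand2) -> result tuples, filled by profiling and never
# replaced at "run time". Trace format and table size are illustrative.
from collections import Counter

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "xor": lambda a, b: a ^ b}

def profile(trace, table_size):
    freq = Counter((op, a, b) for op, a, b in trace)
    return {key: OPS[key[0]](key[1], key[2]) for key, _ in freq.most_common(table_size)}

def execute(trace, table):
    hits = 0
    for op, a, b in trace:
        if (op, a, b) in table:
            hits += 1                      # bypass execution: result read from the table
            result = table[(op, a, b)]
        else:
            result = OPS[op](a, b)         # normal execution
    return hits / len(trace)

trace = [("add", 1, 1), ("mul", 2, 3), ("add", 1, 1), ("xor", 5, 5)] * 500
table = profile(trace, table_size=2)       # loaded once, before "execution" begins
print(f"fraction of dynamic instructions bypassed: {execute(trace, table):.2f}")
```

Because the table contents are fixed before the run, there is no dynamic replacement logic, which is the property the abstract credits for the smaller area and lower access time relative to a value reuse table.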
48. Reliability: Is it fortune or fallacy?
- Author
-
Scott Mahlke, Resit Sendag, Shubu Mukherjee, Joshua J. Yi, Antonio González, and Derek Chiou
- Subjects
Fallacy ,Hardware and Architecture ,Computer science ,Electrical and Electronic Engineering ,Software ,Reliability (statistics) ,Reliability engineering - Published
- 2007
49. Low-power design and temperature management
- Author
-
Resit Sendag, Kevin Skadron, Joshua J. Yi, Derek Chiou, Kanad Ghose, and Pradip Bose
- Subjects
Power management ,Multi-core processor ,Computer science ,business.industry ,Thermal management of electronic devices and systems ,law.invention ,Power (physics) ,Microprocessor ,law ,Hardware and Architecture ,Low-power electronics ,Embedded system ,Systems engineering ,Electrical and Electronic Engineering ,business ,Host (network) ,Software - Abstract
One of the primary concerns for microprocessor designers has always been balancing power and thermal management while minimizing performance loss. Rather than generate solutions to this dilemma, the advent of multicore chips has raised a host of new challenges. This discussion with Pradip Bose and Kanad Ghose, excerpted from a 2007 CARD Workshop panel, explores the future of low-power design and temperature management.
- Published
- 2007
50. FPGAs versus GPUs in Data centers
- Author
-
Desh Singh, Derek Chiou, Babak Falsafi, Bill Dally, Joshua J. Yi, and Resit Sendag
- Subjects
010302 applied physics ,Computer science ,Field programmable gate arrays ,Servers ,02 engineering and technology ,Parallel computing ,01 natural sciences ,Datacenters ,Hardware and Architecture ,020204 information systems ,Server ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Session (computer science) ,Electrical and Electronic Engineering ,Field-programmable gate array ,Graphics processing units ,Software - Abstract
This article presents position statements and a question-and-answer session by panelists at the Fourth Workshop on Computer Architecture Research Directions. The subject of the debate was the use of field-programmable gate arrays versus GPUs in datacenters.