Author: "Cheng-Kok Koh" / Search Limiters: Full Text - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cheng-Kok Koh"' showing total 74 results

1. A Scalable, Memory-Efficient Algorithm for Minimum Cycle Mean Calculation in Directed Graphs

Author: Supriyo Maji and Cheng-Kok Koh
Subjects: Computer science, Efficient algorithm, Scalability, Parallel computing, Directed graph, Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
Published: 2022
Full Text: View/download PDF

2. A Scalable Buffer Queue Sizing Algorithm for Latency Insensitive Systems

Author: Supriyo Maji and Cheng-Kok Koh
Subjects: Linear programming, Computer science, Parameterized complexity, Computer Graphics and Computer-Aided Design, Relay, Scalability, System on a chip, Electrical and Electronic Engineering, Physical design, Latency (engineering), Queue, Algorithm, Software
Abstract: Timing violations in high performance communication channels in system-on-chips (SoC) may occur in the late stages of the physical design process. To address that, latency insensitive systems (LISs) employ pipelining in the communication channels through the insertion of relay stations. Although the functionality of an LIS is robust with respect to the communication latencies, imbalances in relay station insertion may degrade the throughput of the system. While having a large number of buffer queues can eliminate such performance loss, the system may not have adequate area to accommodate these buffers. The problem of buffer queue sizing for maximizing throughput while meeting buffer area constraints has been solved using a mixed-integer linear program (MILP) formulation; however, such an approach is not scalable. In this work, we formulate the buffer queue sizing problem as a parameterized graph optimization problem where for every communication channel there is a parameterized edge with buffer counts as the edge weight. We then use a minimum cycle mean algorithm to determine from which edges buffers can be removed safely. Experimental results on large LISs suggest that the proposed approach is scalable. Moreover, quality of the solutions, in terms of the throughput and the size of buffer queues, is observed to be as good as that of the MILP-based approach.
Published: 2021
Full Text: View/download PDF
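
The abstract of entry 2 above relies on a minimum cycle mean computation over a weighted directed graph. As a rough illustration of that quantity only, the sketch below uses Karp's classical textbook algorithm, which is not the scalable, memory-efficient algorithm of entries 1 and 2. The function name `min_cycle_mean`, the edge-list representation, and the toy graph are assumptions made for this example; edge weights are assumed to be integers so that exact fractions can be used.

```python
from fractions import Fraction
import math


def min_cycle_mean(n, edges):
    """Karp's algorithm for the minimum cycle mean (illustrative sketch).

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v, w) with integer weight w for edge u -> v
    Returns the minimum mean weight over all cycles as a Fraction,
    or None if the graph has no cycle.
    """
    inf = math.inf
    # d[k][v] = minimum weight of a walk with exactly k edges ending at v,
    # starting anywhere (equivalent to a zero-weight super-source).
    d = [[inf] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w

    best = None
    for v in range(n):
        if d[n][v] == inf:
            continue  # no n-edge walk ends here, so v gives no bound
        # Karp's formula: max over k of (d_n(v) - d_k(v)) / (n - k)
        worst = max(
            Fraction(d[n][v] - d[k][v], n - k)
            for k in range(n)
            if d[k][v] != inf
        )
        if best is None or worst < best:
            best = worst
    return best


# Hypothetical toy graph: cycle 0->1->2->0 has mean 2, cycle 0->1->0 has mean 1.
toy_edges = [(0, 1, 1), (1, 2, 2), (2, 0, 3), (1, 0, 1)]
print(min_cycle_mean(3, toy_edges))
```

Karp's method stores an (n+1) by n table, so its memory use grows quadratically with the number of vertices.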

3. Scalable Construction of Clock Trees With Useful Skew and High Timing Quality

Author: Rickard Ewetz and Cheng-Kok Koh
Subjects: Computer science, Skew, Topology (electrical circuits), Engineering and technology, Network topology, Computer Graphics and Computer-Aided Design, Computer hardware & architecture, Timing margin, Scalability, Electrical engineering, electronic engineering, information engineering, Graph (abstract data type), Electrical and Electronic Engineering, Routing (electronic design automation), Algorithm, Software
Abstract: Clock trees can be constructed based on static arrival time constraints or dynamic implied skew constraints. Dynamic implied skew constraints allow the full timing margins to be utilized. However, the dynamic skew constraints require a high run-time complexity to be evaluated. In contrast, static arrival time constraints are more restrictive but can be evaluated in constant time. Consequently, there is a tradeoff between timing margin utilization and run-time. In this paper, a scalable clock tree synthesis (CTS) framework is proposed for the construction of low-cost useful skew trees (USTs) with high timing quality. The scalability is based on combining the use of arrival time constraints with virtual minimum and maximum delay offsets, which allows a pair of smaller subtrees to be joined into a larger subtree in constant time. The ability to quickly join subtrees is leveraged to perform a high degree of solution space exploration, which translates into the construction of USTs with low cost. In particular, clock trees with various routing tree topologies, buffer tree topologies, buffer sizes, and stem wire lengths are explored. Moreover, the arrival time constraints are specified with the objective of being the least restrictive to reduce cost. Furthermore, the constraints are respecified throughout the tree construction process using a slack graph (SG) to expose additional timing margins. The high timing quality is obtained by seamlessly integrating arbitrary timing models using the SG. Finally, the proposed CTS framework is integrated with a clock tree optimization framework to demonstrate that the constructed USTs are capable of meeting timing constraints under the influence of on-chip variations.
Published: 2019
Full Text: View/download PDF

4. Saath: Speeding up CoFlows by Exploiting the Spatial Dimension

Author: Cheng-Kok Koh, Rohan Gandhi, Akshay Jajoo, and Y. Charlie Hu
Subjects: Networking and Internet Architecture (cs.NI), Distributed, Parallel, and Cluster Computing (cs.DC), Computer and information sciences, Computer science, Distributed computing, Testbed, Networking & telecommunications, Engineering and technology, Parallel computing, Natural sciences, Scheduling (computing), Computation theory & mathematics, Electrical engineering, electronic engineering, information engineering, Completion time, Queue
Abstract: Coflow scheduling improves data-intensive application performance by improving their networking performance. State-of-the-art Coflow schedulers in essence approximate the classic online Shortest-Job-First (SJF) scheduling, designed for a single CPU, in a distributed setting, with no coordination among how the flows of a Coflow at individual ports are scheduled, and as a result suffer two performance drawbacks: (1) The flows of a Coflow may suffer the out-of-sync problem -- they may be scheduled at different times and drift apart, negatively affecting the Coflow completion time (CCT); (2) FIFO scheduling of flows at each port bears no notion of SJF, leading to suboptimal CCT. We propose SAATH, an online Coflow scheduler that overcomes the above drawbacks by explicitly exploiting the spatial dimension of Coflows. In SAATH, the global scheduler schedules the flows of a Coflow using an all-or-none policy which mitigates the out-of-sync problem. To order the Coflows within each queue, SAATH resorts to a Least-Contention-First (LCoF) policy which we show extends the gist of SJF to the spatial dimension, complemented with starvation freedom. Our evaluation using an Azure testbed and simulations of two production cluster traces shows that compared to Aalo, SAATH reduces the CCT in median (P90) cases by 1.53x (4.5x) and 1.42x (37x), respectively.
Published: 2021
Full Text: View/download PDF

5. A parallel direct solver for the simulation of large-scale power/ground networks

Author: S. Cauley, V. Balakrishnan, and Cheng-Kok Koh
Subjects: Standard IC, Computer networks -- Design and construction, Information networks -- Design and construction, Integrated circuits -- Design and construction, Semiconductor chips -- Design and construction, Iterative methods (Mathematics) -- Usage
Published: 2010

6. Optimal double via insertion with on-track preference

Author: Kuang-Yao Lee, Ting-Chi Wang, Cheng-Kok Koh, and Kai-Yuan Chao
Subjects: Bridge/router, Internetworking device, ISDN router, Standard IC, Linear programming -- Usage, Bridge/routers -- Usage, Heuristic programming -- Usage, Integrated circuits -- Analysis, Semiconductor chips -- Analysis
Published: 2010

7. Fast and optimal redundant via insertion

Author: Kuang-Yao Lee, Cheng-Kok Koh, Ting-Chi Wang, and Kai-Yuan Chao
Subjects: Circuit designer, Integrated circuit design, Standard IC, Circuit design -- Research, Integer programming -- Usage, Integrated circuits -- Design and construction, Semiconductor chips -- Design and construction, Linear programming -- Usage
Published: 2008

8. Using The Tetris Game To Teach Computing

Author: Yung-Hsiang Lu, Guangwei Zhu, and Cheng-Kok Koh
Published: 2020
Full Text: View/download PDF

9. Two algorithms for fast and accurate passivity-preserving model order reduction

Author: Ngai Wong, Venkataramanan Balakrishnan, Cheng-Kok Koh, and Tung-Sang Ng
Subjects: Riccati equation -- Usage, Eigenfunctions -- Usage, Numerical analysis
Published: 2006

10. Performance analysis of latency-insensitive systems

Author: Ruibing Lu and Cheng-Kok Koh
Subjects: Standard IC, Circuit designer, Integrated circuit design, Integrated circuits -- Design and construction, Semiconductor chips -- Design and construction, Circuit design -- Analysis, Mathematical logic -- Usage, Symbolic and mathematical logic -- Usage
Published: 2006

11. Mixed block placement via fractional cut recursive bisection

Author: Ameya Ramesh Agnihotri, Satoshi Ono, Chen Li, Mehmet Can Yildiz, Patrick H. Madden, Ateen Khatkhate, and Cheng-Kok Koh
Subjects: Circuit designer, Integrated circuit design, Circuit design -- Research
Published: 2005

12. Fast clock scheduling and an application to clock tree synthesis

Author: Rickard Ewetz and Cheng-Kok Koh
Subjects: Synchronous circuit, Computer science, Electrical & electronic engineering, Real-time computing, Matrix clock, Clock gating, Engineering and technology, Digital clock manager, Clock skew, Clock synchronization, Timing failure, Computer hardware & architecture, Computer engineering, Hardware and Architecture, Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Software, CPU multiplier
Abstract: Clock networks are required to be constructed with adequate safety margins in the skew constraints to operate correctly even under the influence of variations. In this work, a scalable clock scheduler is developed to drive a synthesis framework that constructs useful skew clock trees with large safety margins that are tailored to the tree topology. Sequential elements are clustered early in the topology if it is impossible to provide adequate robustness to variations using only safety margins. Compared to earlier studies, the proposed framework performs the clock scheduling one to two orders of magnitude faster and improves yield and capacitive cost on several synthesized circuits.
Published: 2017
Full Text: View/download PDF

13. An Automatic Design of Factors in a Human-Pose Estimation System Using Neural Networks

Author: C. S. George Lee, Cheng-Kok Koh, and Kai-Chi Chan
Subjects: Computer science, Feature extraction, Engineering and technology, Machine learning, Probabilistic neural network, Electrical engineering, electronic engineering, information engineering, Electrical and Electronic Engineering, Pose, Artificial neural network, Electrical & electronic engineering, Feed forward, Bayesian network, Pattern recognition, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Feedforward neural network, Probability distribution, Artificial intelligence & image processing, Artificial intelligence, Random variable, Software
Abstract: Previous studies on human-pose estimation (HPE) rely on the design of factors to represent underlying probability distributions that model human poses. However, designing those factors manually is laborious. Moreover, manually designed factors might not represent underlying probability distributions properly. In this paper, we utilize feedforward neural networks (NNs) to design factors of our previous work on HPE and build an NN-based HPE system. We first propose a mapping that converts a Bayesian network to a feedforward NN. Then, the system is built based on the proposed mapping that consists of two steps: 1) structure identification and 2) parameter learning. In the structure identification, we develop a bottom-up approach to build a feedforward NN while preserving a Bayesian-network structure. In the parameter learning, we create a part-based approach to learn synaptic weights by decomposing a feedforward NN into parts. Using the proposed mapping, our previous work of an action-mixture model (AMM) for HPE is converted to a feedforward NN called NN-AMM. Based on the concept of distributed representation, NN-AMM is further modified to a scalable feedforward NN called NND-AMM. The NN-based HPE system is then built by using viewpoint-and-shape-feature-histogram features extracted from 3-D-point-cloud input and NND-AMM to estimate 3-D human poses. The results showed that the proposed mapping could design AMM factors automatically. NND-AMM could provide more accurate human-pose estimates with fewer hidden neurons than both AMM and NN-AMM could. Both NN-AMM and NND-AMM could adapt to different types of input, showing the adaptability of using feedforward NNs to design factors.
Published: 2016
Full Text: View/download PDF

14. Cost-Effective Robustness in Clock Networks Using Near-Tree Structures

Author: Cheng-Kok Koh and Rickard Ewetz
Subjects: Synchronous circuit, Clock signal, Computer science, Real-time computing, Matrix clock, Clock gating, Digital clock manager, Clock skew, Computer Graphics and Computer-Aided Design, Timing failure, Clock synchronization, Computer engineering, Robustness (computer science), Redundancy (engineering), Electrical and Electronic Engineering, Software, CPU multiplier
Abstract: Clock trees are commonly used to deliver clock signals to sequential elements in circuits. However, by construction, tree structures are inherently prone to failure caused by variations. The robustness of a clock tree can be improved by inserting redundancy in the form of cross links or multilevel fusion trees. Such near-tree structures can provide robustness at low cost. In this paper, we establish that the locations of the inserted redundancy are crucial in providing cost-effective robustness. We present two methods to systematically insert redundancy. The redundancy is realized by either inserting cross links or performing local merges. Moreover, we present a vertex reduction method that reduces the amount of redundancy that needs to be inserted in our near-tree structures. Empirical results show that our structures are more robust to variations and have lower power consumption compared to the state-of-the-art clock networks. Furthermore, our near-tree structures provide smooth trade-offs between cost and robustness, reducing clock skews by 11%–39% at an expense of 3%–68% higher power consumption.
Published: 2015
Full Text: View/download PDF

15. Clock Tree Construction based on Arrival Time Constraints

Author: Cheng-Kok Koh and Rickard Ewetz
Subjects: Mathematical optimization, Computer science, Other engineering and technologies, Skew, Static timing analysis, Topology (electrical circuits), Engineering and technology, Network topology, Tree (graph theory), Timing failure, Computer hardware & architecture, Timing margin, Electrical engineering, electronic engineering, information engineering, Time complexity, Design practice & management
Abstract: There are striking differences between constructing clock trees based on dynamic implied skew constraints and based on static arrival time constraints. Dynamic implied skew constraints allow the full timing margins to be utilized, but the constraints are required to be updated (with high time complexity). In contrast, static arrival time constraints are decoupled and are not required to be updated. Therefore, the constraints can be obtained in constant time, which facilitates the exploration of various tree topologies. On the other hand, arrival time constraints do not allow the full timing margins to be utilized. Consequently, there is a trade-off between topology exploration and timing margin utilization. In this paper, the advantages of static arrival time constraints are leveraged to construct clock trees with useful skew while exploring various tree topologies. Moreover, the constraints are specified and respecified throughout the synthesis process to reduce the cost of the constructed clock trees. It is experimentally demonstrated that the proposed approach results in clock trees with 16% lower average capacitive cost compared with clock trees constructed based on dynamic implied skew constraints.
Published: 2017
Full Text: View/download PDF

16. A 3-D-Point-Cloud System for Human-Pose Estimation

Author: Cheng-Kok Koh, C. S. George Lee, and Kai-Chi Chan
Subjects: Discrete space, Point cloud, Pattern recognition, Computer Science Applications, Human-Computer Interaction, Tree structure, Control and Systems Engineering, Feature (computer vision), Histogram, Probability distribution, Artificial intelligence, Electrical and Electronic Engineering, Pose, Software, Mathematics, Curse of dimensionality
Abstract: This paper focuses on human-pose estimation using a stationary depth sensor. The main challenge concerns reducing the feature ambiguity and modeling human poses in high-dimensional human-pose space because of the curse of dimensionality. We propose a 3-D-point-cloud system that captures the geometric properties (orientation and shape) of the 3-D point cloud of a human to reduce the feature ambiguity, and use the result from action classification to discover low-dimensional manifolds in human-pose space in estimating the underlying probability distribution of human poses. In the proposed system, a 3-D-point-cloud feature called viewpoint and shape feature histogram (VISH) is proposed to extract the 3-D points from a human and arrange them into a tree structure that preserves the global and local properties of the 3-D points. A nonparametric action-mixture model (AMM) is then proposed to model human poses using low-dimensional manifolds based on the concept of distributed representation. Since human poses estimated using the proposed AMM are in discrete space, a kinematic model is added in the last stage of the proposed system to model the spatial relationship of body parts in continuous space to reduce the quantization error in the AMM. The proposed system has been trained and evaluated on a benchmark dataset. Computer-simulation results showed that the overall error and standard deviation of the proposed 3-D-point-cloud system were reduced compared with some existing approaches without action classification.
Published: 2014
Full Text: View/download PDF

17. Construction of Latency-Bounded Clock Trees

Author: Cheng-Kok Koh, Rickard Ewetz, and Chuan Yean Tan
Subjects: Clock tree, Skew, Engineering and technology, Parallel computing, Computer hardware & architecture, Tree root, Bounding overwatch, Bounded function, Shortest path problem, Electrical engineering, electronic engineering, information engineering, Latency (engineering), Algorithm, Electronic circuit, Mathematics
Abstract: Clock trees must be constructed to function even under the influence of on-chip variations (OCV). Bounding the latency of a clock tree, i.e., the maximum delay from the tree root to any sequential element, is important because the latency correlates with the maximum magnitude of the skews caused by OCV. In this paper, a latency constraint graph (LCG) that captures the latencies of a set of subtrees and the skew constraints between the subtrees is introduced. The minimum latency of a clock tree that can be constructed from the corresponding subtrees is equal to the (negative of the) length of a shortest path in the LCG, which can be computed in $O(VE)$. Based on the LCG, we propose a framework that consists of a latency-aware clock tree synthesis (CTS) phase and a clock tree optimization (CTO) phase to construct latency-bounded clock trees. When applied to a set of synthesized circuits, the framework is capable of constructing latency-bounded clock trees that have higher yield compared to clock trees constructed in previous studies.
Published: 2016
Full Text: View/download PDF
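
The abstract of entry 17 above states that the minimum latency equals the negative of a shortest-path length in the latency constraint graph, computable in $O(VE)$. Bellman-Ford is one standard shortest-path algorithm with that bound on graphs that may contain negative edge weights; the abstract does not say which algorithm the authors use, so the sketch below is only an illustration of the complexity class. The function name `bellman_ford` and the toy graph are assumptions for this example, and the graph is not an actual latency constraint graph.

```python
import math


def bellman_ford(n, edges, source):
    """Single-source shortest paths in O(V * E) time (illustrative sketch).

    n      -- number of vertices, labelled 0..n-1
    edges  -- list of (u, v, w) for a directed edge u -> v of weight w
    source -- start vertex
    Returns a list of distances (math.inf for unreachable vertices).
    Raises ValueError if a negative cycle is reachable from the source.
    """
    dist = [math.inf] * n
    dist[source] = 0
    # At most n - 1 rounds of relaxing every edge.
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances have converged early
    # One extra pass: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


# Hypothetical toy graph with one negative edge.
toy_edges = [(0, 1, 4), (0, 2, 1), (2, 1, -2), (1, 3, 3)]
print(bellman_ford(4, toy_edges, 0))
```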

18. A Quadratic Eigenvalue Solver of Linear Complexity for 3-D Electromagnetics-Based Analysis of Large-Scale Integrated Circuits

Author: Dan Jiao, Cheng-Kok Koh, Venkataramanan Balakrishnan, Duo Chen, and Jongwon Lee
Subjects: Inverse iteration, Mathematical optimization, Electromagnetics, Quadratic eigenvalue problem, MathematicsofComputing_NUMERICALANALYSIS, Solver, Computer Graphics and Computer-Aided Design, Computational science, Arnoldi iteration, Hardware_ARITHMETICANDLOGICSTRUCTURES, Electrical and Electronic Engineering, Circuit complexity, Divide-and-conquer eigenvalue algorithm, Software, Eigenvalues and eigenvectors, Mathematics
Abstract: It is of critical importance to efficiently and accurately predict global resonances of a 3-D integrated circuit system that involves arbitrarily shaped lossy conductors and inhomogeneous materials. A quadratic eigenvalue solver of linear complexity and electromagnetic accuracy is developed in this paper to fulfill this task. Without sacrificing accuracy, the proposed eigenvalue solver has shown a clear advantage over state-of-the-art eigenvalue solvers in CPU time. It successfully solves a quadratic eigenvalue problem of over 2.5 million unknowns associated with a large-scale 3-D on-chip circuit embedded in inhomogeneous materials in 40 min on a single 3 GHz 8222SE AMD Opteron processor.
Published: 2012
Full Text: View/download PDF

19. Optimal Double Via Insertion With On-Track Preference

Author: Kai-Yuan Chao, Kuang-Yao Lee, Cheng-Kok Koh, and Ting-Chi Wang
Subjects: Mathematical optimization, Linear programming, Integrated circuit design, Computer Graphics and Computer-Aided Design, Capacitor, Hardware_INTEGRATEDCIRCUITS, Bipartite graph, Electrical and Electronic Engineering, Routing (electronic design automation), Special case, Integer programming, Software, Mathematics, Integer (computer science)
Abstract: As on-track double vias take less routing resources and have better electrical characteristics, we study in this paper the problem of double via insertion with a preference for on-track double vias (DVI/ON) in a postrouting stage. The primary goal is to insert as many double vias as possible, and maximizing the number of on-track double vias is a secondary objective. We present a zero-one integer linear program-based approach to optimally solve the DVI/ON problem. Moreover, we also discuss a special case of the DVI/ON problem and present a maximum-weighted bipartite matching-based optimal approach. Experimental results indicate that our approaches outperform existing algorithms in terms of solution quality.
Published: 2010
Full Text: View/download PDF

20. From $O(k^{2}N)$ to $O(N)$: A Fast and High-Capacity Eigenvalue Solver for Full-Wave Extraction of Very Large Scale On-Chip Interconnects

Author: Cheng-Kok Koh, Venkataramanan Balakrishnan, Dan Jiao, and Jongwon Lee
Subjects: Radiation, Computational complexity theory, Scale (ratio), Iterative method, Solver, Condensed Matter Physics, Arnoldi iteration, Computer Science::Hardware Architecture, Hardware_INTEGRATEDCIRCUITS, Applied mathematics, Electrical and Electronic Engineering, Divide-and-conquer eigenvalue algorithm, Algorithm, Eigendecomposition of a matrix, Eigenvalues and eigenvectors, Mathematics
Abstract: The wave-propagation problem in an on-chip interconnect network can be modeled as a generalized eigenvalue problem. For solving such a generalized eigenvalue problem, the computational complexity of Arnoldi iteration is at best $O(k^{2}N)$, where $k$ is the number of dominant eigenvalues and $N$ is the matrix size. In this paper, we reduce the computational complexity of the Arnoldi iteration for interconnect extraction from $O(k^{2}N)$ to $O(N)$, thus paving the way for full-wave extraction of very large scale on-chip interconnects, of which a typical value of $k$ is on the order of hundreds of thousands. Numerical and experimental results have demonstrated the accuracy and efficiency of the proposed fast eigenvalue solver.
Published: 2009
Full Text: View/download PDF

21. A Linear-Time Complex-Valued Eigenvalue Solver for Full-Wave Analysis of Large-Scale On-Chip Interconnect Structures

Author: Dan Jiao, Cheng-Kok Koh, Venkataramanan Balakrishnan, and Jongwon Lee
Subjects: Inverse iteration, Eigenvalues and eigenfunctions, Radiation, Computer simulation, finite element analysis, Mathematics::Spectral Theory, Solver, Condensed Matter Physics, Finite element method, Computational science, Reduction (complexity), Frequency domain, integrated circuit interconnections, Electrical and Electronic Engineering, Divide-and-conquer eigenvalue algorithm, Algorithm, Eigenvalues and eigenvectors, Mathematics
Abstract: This paper proposes a linear-time complex-valued eigenvalue solver for solving large-scale on-chip interconnect problems. The fast eigenvalue solution is achieved by eigenvalue clustering, fast system reduction with negligible computational cost, and fast linear-time solution of the reduced system. Numerical and experimental results are presented to demonstrate the accuracy and efficiency of the proposed method.
Published: 2009
Full Text: View/download PDF

22. Tolerating process variations in large, set-associative caches

Author: Cheng-Kok Koh, Hai Li, Weng-Fai Wong, and Yi Chen
Subjects: Hardware_MEMORYSTRUCTURES, Computer science, Process (computing), Parallel computing, Set (abstract data type), Microprocessor, Tag RAM, Hardware and Architecture, Bus sniffing, Static random-access memory, Cache, Software, Information Systems, Block (data storage)
Abstract: One important trend in today's microprocessor architectures is the increase in size of the processor caches. These caches also tend to be set associative. As technology scales, process variations are expected to increase the fault rates of the SRAM cells that compose such caches. As an important component of the processor, the parametric yield of SRAM cells is crucial to the overall performance and yield of the microchip. In this article, we propose a microarchitectural solution, called the buddy cache, that permits large, set-associative caches to tolerate faults in SRAM cells due to process variations. In essence, instead of disabling a faulty cache block in a set (as is the current practice), it is paired with another faulty cache block in the same set, the buddy. Although both cache blocks are faulty, if the faults of the two blocks do not overlap, then instead of losing two blocks, buddying will yield a functional block from the nonfaulty portions of the two blocks. We found that with buddying, caches can better mitigate the negative impacts of process variations on performance and yield, gracefully downgrading performance as opposed to failing catastrophically. We will describe the details of the buddy cache and give insights as to why it is more resilient to faults in both performance and yield.
Published: 2009
Full Text: View/download PDF
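
The pairing idea in the abstract of entry 22 above can be sketched with bitmasks: two faulty blocks in the same set can serve as one functional block when their fault maps do not overlap. The per-block fault bitmask, the segment granularity, the function names `can_buddy` and `usable_blocks`, and the greedy pairing order are all assumptions made for illustration; the abstract does not specify how the authors represent faults or choose buddies.

```python
def can_buddy(fault_a, fault_b):
    """Return True if two faulty blocks have non-overlapping fault maps.

    Each argument is a bitmask with bit i set when segment i of the block
    is faulty (a hypothetical representation chosen for this sketch).
    """
    return fault_a != 0 and fault_b != 0 and (fault_a & fault_b) == 0


def usable_blocks(fault_masks):
    """Count functional blocks in one cache set after greedy buddying.

    Fault-free blocks count individually; each pair of faulty blocks with
    disjoint fault maps counts as one functional block.
    """
    healthy = sum(1 for mask in fault_masks if mask == 0)
    faulty = [mask for mask in fault_masks if mask != 0]
    paired = [False] * len(faulty)
    buddies = 0
    for i in range(len(faulty)):
        if paired[i]:
            continue
        for j in range(i + 1, len(faulty)):
            if not paired[j] and can_buddy(faulty[i], faulty[j]):
                paired[i] = paired[j] = True
                buddies += 1
                break
    return healthy + buddies


# Hypothetical 4-way set, 4 segments per block: one fault-free block and
# three faulty ones, two of which have disjoint fault maps.
example_set = [0b0000, 0b0011, 0b1100, 0b0001]
print(usable_blocks(example_set))
```

Disabling every faulty block would leave this hypothetical set with a single usable block; buddying the second and third blocks recovers one more.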

23. Exact and numerically stable closed-form expressions for potential coefficients of rectangular conductors

Author: Jitesh Jain, Venkataramanan Balakrishnan, and Cheng-Kok Koh
Subjects: Inductance, Signal Processing, Mathematical analysis, Scalar (mathematics), Boundary value problem, Electrical and Electronic Engineering, Closed-form expression, Equating coefficients, Integral equation, Electrical conductor, Numerical stability, Mathematics
Abstract: Existing exact closed-form expressions for the scalar mutual and self-potential coefficients for rectangular conductors may be ill-conditioned for certain geometries. We propose new, exact, closed-form expressions for potential coefficients that are much better-conditioned. The basic idea is to express all potential coefficients as weighted sums of mutual and self-potential coefficients of suitably defined virtual plates. Experimental results are presented to demonstrate the improved numerical stability of the new formulas.
Published: 2006
Full Text: View/download PDF

24. On-chip interconnect modeling by wire duplication

Author: Cheng-Kok Koh, Guoan Zhong, and Kaushik Roy
Subjects: Interconnection, Computer science, Circuit design, Equivalent series inductance, MathematicsofComputing_NUMERICALANALYSIS, Phantom circuit, Discrete circuit, Topology, Computer Graphics and Computer-Aided Design, Circuit extraction, Inductance, Matrix (mathematics), Hardware_INTEGRATEDCIRCUITS, Electronic engineering, RLC circuit, Equivalent circuit, Electrical and Electronic Engineering, Physical design, Software, Linear circuit
Abstract: The authors present a novel wire duplication-based interconnect modeling technique. The proposed modeling technique exploits the sparsity of the $L^{-1}$ matrix, where $L$ is the inductance matrix, and constructs a sparse and stable equivalent circuit by windowing the original inductance matrix. The resulting circuit model is sparse and exhibits the same stability property as the K method. Numerical results show that the proposed wire duplication model has high accuracy and is more efficient than many existing techniques.
Published: 2003
Full Text: View/download PDF

25. Exact closed-form formula for partial mutual inductances of rectangular conductors

Author: Cheng-Kok Koh and Guoan Zhong
Subjects: Inductance, Mathematical analysis, Equivalent series inductance, Electrical engineering, Electrical and Electronic Engineering, Derivation of self inductance, Closed-form expression, Electrical conductor, Kinetic inductance, Mathematics
Abstract: In this brief, we propose a new exact closed-form mutual inductance equation for rectangular conductors. We express the mutual inductance between two parallel rectangular conductors as a weighted sum of self inductances. We do not place any restrictions on the alignment of the two parallel rectangular conductors. Moreover, they could be coplanar or reside on different layers. Most important, experimental results show that our formula is numerically more stable than that derived by Hoer and Love.
Published: 2003
Full Text: View/download PDF

26. Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning

Author: Shiyou Zhao, Kaushik Roy, and Cheng-Kok Koh
Subjects: Very-large-scale integration, Engineering, Noise reduction, Hardware_PERFORMANCEANDRELIABILITY, Chip, Computer Graphics and Computer-Aided Design, Integrated circuit layout, Floorplan, CMOS, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Electrical and Electronic Engineering, Software, Decoupling (electronics), Electronic circuit
Abstract: We investigate the problem of decoupling capacitance (decap) allocation for power supply noise suppression at floorplan level. First, we assume that a floorplan is given and consider the decap placement as a postfloorplan step. Second, we consider the decap placement as an integral part of a floorplanning methodology (noise-aware floorplanning). In both cases, the objective is to minimize the floorplan area while suppressing the power supply noise below the specified limit. Experimental results on MCNC benchmark circuits show that, for postfloorplan decap placement, the white space allocated for decap is about 6%-9% of the chip area for the 0.25-μm technology. The power-supply noise is kept below the specified limit. Compared to the postfloorplan approach, the peak power-supply noise can be reduced by as much as 40% and the decap budget can be reduced by as much as 21% by using the noise-aware floorplanning methodology. The total area is also reduced due to the reduced total decap budget gained from reduced power supply noise.
Published: 2002
Full Text: View/download PDF

27. Routability-driven repeater block planning for interconnect-centric floorplanning

Author: Cheng-Kok Koh and P. Sarkar
Subjects: Repeater, Very-large-scale integration, Engineering, business.industry, Distributed computing, Chip, Computer Graphics and Computer-Aided Design, Integrated circuit layout, Floorplan, Embedded system, Hardware_INTEGRATEDCIRCUITS, Electrical and Electronic Engineering, Routing (electronic design automation), business, Cluster analysis, Software, Block (data storage)
Abstract: In this paper, we present a repeater block planning algorithm for interconnect-centric floorplanning. We introduce the concept of independent feasible regions for repeaters and derive an analytical formula for their computation. We develop a routability-driven repeater clustering algorithm to perform repeater block planning based on iterative deletion. The goal is to obtain a high-quality solution for the repeater block locations so that performance-driven interconnect synthesis at the routing stage can be carried out with ease while minimizing the chip area. Experimental results show that our method increases the percentage of all global nets that meet their target delays from 67.5% to 85%. Moreover, our approach minimizes the expected routing congestion, making it easier for performance-driven routers to synthesize global nets that require the insertion of repeaters to meet timing constraints.
Published: 2001
Full Text: View/download PDF

28. Stochastic interconnect modeling, power trends, and performance characterization of 3-D circuits

Author: David B. Janes, Cheng-Kok Koh, Rongtian Zhang, and Kaushik Roy
Subjects: Engineering, Interconnection, business.industry, Circuit performance, Electrical engineering, Interconnect bottleneck, Electronic, Optical and Magnetic Materials, Power (physics), Characterization (materials science), Power consumption, Hardware_INTEGRATEDCIRCUITS, Key (cryptography), Electronic engineering, Electrical and Electronic Engineering, business, Electronic circuit
Abstract: Three-dimensional (3-D) technology promises higher integration density and lower interconnection complexity and delay. At present, however, not much work on circuit applications has been done due to lack of insight into 3-D circuit architecture and performance. One of the purposes of realizing 3-D integration is to reduce the interconnect complexity and delay of two dimensions (2-D), which are widely considered as the barriers to continued performance gains in future technology generations. Thus, understanding the interconnect and its related issues, such as the impact on circuit performance, is key to 3-D circuit applications. In this paper, we present a stochastic 3-D interconnect model and study the impact of 3-D integration on circuit performance and power consumption. To model 3-D interconnect, we divide 3-D wires into two parts (horizontal wires and vertical wires) and derive their stochastic distributions. Based on those distributions, we estimate the delay distribution. We show that 3-D structures effectively reduce the number of long delay nets, significantly reduce the number of repeaters, and dramatically improve circuit performance. With 3-D integration, circuits can be clocked at frequencies much higher (double or even triple) than 2-D.
Published: 2001
Full Text: View/download PDF

29. Interconnect sizing and spacing with consideration of coupling capacitance

Author: J. Cong, L. He, Cheng-Kok Koh, and Zhigang Pan
Subjects: Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Software
Published: 2001
Full Text: View/download PDF

30. Interconnect layout optimization under higher order RLC model for MCM designs

Author: Patrick H. Madden, Cheng-Kok Koh, and Jason Cong
Subjects: Router, Delay calculation, Mathematical optimization, Optimization problem, Computer science, Topology optimization, Network topology, Computer Graphics and Computer-Aided Design, Integrated circuit layout, Steiner tree problem, symbols.namesake, Shortest path problem, Hardware_INTEGRATEDCIRCUITS, symbols, Electrical and Electronic Engineering, Routing (electronic design automation), Software
Abstract: In this paper, we study the interconnect layout optimization problem under a higher order resistance-inductance-capacitance model to optimize not only delay, but also waveform for interconnects with nonmonotone signal response in the context of multichip-module global routing. We propose a unified approach that considers topology optimization and waveform optimization simultaneously. Using a new incremental moment-computation algorithm, we interleave topology construction with moment computation to facilitate accurate delay calculation and evaluation of waveform quality. Our algorithm considers a large class of routing topologies, ranging from shortest path Steiner trees to bounded-radius Steiner trees and Steiner routings. We construct a set of required arrival-time Steiner (RATS) trees, providing smooth tradeoffs among signal delay, waveform, and routing area. When combined with the MINOTAUR MCM global router (Cong and Madden, 1998), (Madden, 1998) that we have developed, the RATS-tree solutions prove to be effective in reducing overall routing congestion.
Published: 2001
Full Text: View/download PDF

31. A 3D-point-cloud feature for human-pose estimation

Author: C. S. George Lee, Cheng-Kok Koh, and Kai-Chi Chan
Subjects: Orientation (computer vision), Computer science, business.industry, Feature extraction, Point cloud, Kanade–Lucas–Tomasi feature tracker, Pattern recognition, Tree (data structure), Tree structure, Feature (computer vision), Computer vision, Artificial intelligence, business, Pose
Abstract: Estimating human poses is an important step towards developing robots that can understand human motions and improving their cognitive capabilities. This paper presents a geometric feature for estimating human poses from a 3D point cloud input. The proposed feature can be considered as an extension of the idea of visual features, such as color/edge, of color/grayscale images, and it contains the geometric structure of the point cloud. It is derived by arranging the 3D points into a tree structure, which preserves the global and local properties of the 3D points. Shown experimentally, the tree structure (spatial ordering) is particularly important for estimating human poses (i.e., articulated objects). The 3D orientation (pan, tilt and yaw angles) and shape features are then extracted from each node in the tree to describe the geometric distribution of the 3D points. The proposed feature has been evaluated on a benchmark dataset and compared with two existing geometric features. Experimental results show that the proposed feature has the lowest overall error in human-pose estimation.
Published: 2013
Full Text: View/download PDF

32. Performance optimization of VLSI interconnect layout

Author: Cheng-Kok Koh, Jason Cong, Lei He, and Patrick H. Madden
Subjects: Interconnection, Schedule, Engineering, business.industry, Circuit design, Emphasis (telecommunications), Topology optimization, Topology (electrical circuits), Sizing, Hardware and Architecture, Hardware_INTEGRATEDCIRCUITS, Electronic engineering, Electrical and Electronic Engineering, business, Critical path method, Software
Abstract: This paper presents a comprehensive survey of existing techniques for interconnect optimization during the VLSI physical design process, with emphasis on recent studies on interconnect design and optimization for high-performance VLSI circuit design under the deep submicron fabrication technologies. First, we present a number of interconnect delay models and driver/gate delay models of various degrees of accuracy and efficiency which are most useful to guide the circuit design and interconnect optimization process. Then, we classify the existing work on optimization of VLSI interconnect into the following three categories and discuss the results in each category in detail: (i) topology optimization for high-performance interconnects, including the algorithms for total wire length minimization, critical path length minimization, and delay minimization; (ii) device and interconnect sizing, including techniques for efficient driver, gate, and transistor sizing, optimal wire sizing, and simultaneous topology construction, buffer insertion, buffer and wire sizing; (iii) high-performance clock routing, including abstract clock net topology generation and embedding, planar clock routing, buffer and wire sizing for clock nets, non-tree clock routing, and clock schedule optimization. For each method, we discuss its effectiveness, its advantages and limitations, as well as its computational efficiency. We group the related techniques according to either their optimization techniques or optimization objectives so that the reader can easily compare the quality and efficiency of different solutions.
Published: 1996
Full Text: View/download PDF

33. A Two-Dimensional Domain Decomposition Technique for the Simulation of Quantum-Scale Devices

Author: Stephen F. Cauley, Cheng-Kok Koh, Gerhard Klimeck, and Venkataramanan Balakrishnan
Subjects: Large class, Numerical Analysis, Theoretical computer science, Physics and Astronomy (miscellaneous), Computer science, Applied Mathematics, Computation, Atomistic, Domain decomposition methods, NEGF, Parallel, Computer Science Applications, Computational science, Density of States, Nanoscience and Nanotechnology, Computational Mathematics, Atomic orbital, Modeling and Simulation, Density of states, Spatial simulation, Silicon nanowires, Quantum
Abstract: The simulation of realistically sized devices under the Non-Equilibrium Green's Function (NEGF) formalism typically requires prohibitive amounts of memory and computation time. In order to meet the rising computational challenges associated with quantum-scale device simulation we offer a 2-D domain decomposition technique. This technique is applicable to a large class of atomistic and spatial simulation problems. Considering a decomposition along both the cross section and length of the device, the framework presented in this work ensures efficient distribution of both memory and computation based upon the underlying device structure. As an illustration we stably generate the density of states and transmission, under the NEGF formalism, for the atomistic-based simulation of square 5 nm cross section silicon nanowires consisting of over one million atomic orbitals.
Published: 2012

34. Processor caches built using multi-level spin-transfer torque RAM cells

Author: Weng-Fai Wong, Yi Chen, Hai Li, and Cheng-Kok Koh
Subjects: Set (abstract data type), Hardware_MEMORYSTRUCTURES, Tag RAM, Computer science, Encoding (memory), Spin-transfer torque, Torque, Parallel computing, Static random-access memory, Cache, Chip
Abstract: It has been predicted that a processor's caches could occupy as much as 90% of chip area for technology nodes from the current. In this paper, we study the use of multi-level spin-transfer torque RAM (STT-RAM) cells in the design of processor caches. Compared to the traditional SRAM caches, a multi-level cell (MLC) STT-RAM cache design is denser, fast, and consumes less energy. However, a number of critical issues remain to be solved before MLC STT-RAM technology can be deployed in processor caches. In this paper, we shall offer solutions to the issue of bit encoding as well as tackle the write endurance problem. The latter has been neglected in previous works on STT-RAM caches. We propose a set remapping scheme that can potentially prolong the lifetime of an MLC STT-RAM cache by 80× on average. Furthermore, a method for recovering the performance that may be lost in some applications due to set remapping is introduced.
Published: 2011
Full Text: View/download PDF

35. Guest Editorial: Special Section on Contemporary and Emerging Issues in Physical Design

Author: Cliff Sze and Cheng-Kok Koh
Subjects: Engineering, business.industry, Special section, Mechanical engineering, Engineering ethics, Electrical and Electronic Engineering, Physical design, business, Computer Graphics and Computer-Aided Design, Software
Abstract: The eight papers in this special section highlight several studies on contemporary and emerging issues in physical design.
Published: 2014
Full Text: View/download PDF

36. PEDS: Passivity enforcement for descriptor systems via Hamiltonian-symplectic matrix pencil perturbation

Author: Yuanzhe Wang, Zheng Zhang, Cheng-Kok Koh, Grantham K. H. Pang, and Ngai Wong
Published: 2010
Full Text: View/download PDF

37. How to Improve Your Google Ranking: Myths and Reality

Author: Ao-Jan Su, Cheng-Kok Koh, Y. Charlie Hu, and Aleksandar Kuzmanovic
Subjects: Information retrieval, business.industry, Computer science, media_common.quotation_subject, Machine learning, computer.software_genre, Ranking (information retrieval), Search engine, Ranking, Search engine optimization, Ranking SVM, Web page, The Internet, Learning to rank, Artificial intelligence, business, Function (engineering), computer, media_common
Abstract: Search engines have greatly influenced the way people access information on the Internet as such engines provide the preferred entry point to billions of pages on the Web. Therefore, highly ranked web pages generally have higher visibility to people and pushing the ranking higher has become the top priority for webmasters. As a matter of fact, search engine optimization (SEO) has become a sizeable business that attempts to improve their clients’ ranking. Still, the natural reluctance of search engine companies to reveal their internal mechanisms and the lack of ways to validate SEO’s methods have created numerous myths and fallacies associated with ranking algorithms, Google’s in particular. In this paper, we focus on the Google ranking algorithm and design, implement, and evaluate a ranking system to systematically validate assumptions others have made about this popular ranking algorithm. We demonstrate that linear learning models, coupled with a recursive partitioning ranking scheme, are capable of reverse engineering Google’s ranking algorithm with high accuracy. As an example, we manage to correctly predict 7 out of the top 10 pages for 78% of evaluated keywords. Moreover, for content-only ranking, our system can correctly predict 9 or more pages out of the top 10 ones for 77% of search terms. We show how our ranking system can be used to reveal the relative importance of ranking features in Google’s ranking function, provide guidelines for SEOs and webmasters to optimize their web pages, validate or disprove new ranking features, and evaluate search engine ranking results for possible ranking bias.
Published: 2010
Full Text: View/download PDF

38. A Parallel Direct Solver for the Simulation of Large-Scale Power/Ground Networks

Author: Venkataramanan Balakrishnan, Cheng-Kok Koh, and Stephen F. Cauley
Subjects: Computer science, Iterative method, Solver, interconnections, Electrical and Computer Engineering, Computer Graphics and Computer-Aided Design, Capacitance, Matrix algebra, Computational science, Inductance, Matrix (mathematics), Electronic engineering, iterative methods, Electrical and Electronic Engineering, network analysis, Software, Sparse matrix
Abstract: We present an algorithm for the fast and accurate simulation of power/ground mesh structures. Our method is a direct (non-iterative) approach for simulation based upon a parallel matrix inversion algorithm. Through the use of additional computational resources, this distributed computing technique facilitates the simulation of large-scale power/ground networks. In addition, the new dimension of flexibility provided by our algorithm allows for a more accurate analysis of power/ground mesh structures using RLC interconnect models. Specifically, we offer a method that employs a sparse approximate inverse technique to consider more reluctance coupling terms for increased accuracy of simulation. The inclusion of additional coupling terms, however, does not lead to an increase in either time or memory requirements associated with the primary computational task in transient simulation, thus making the simulation procedure scalable. The parallel matrix inversion algorithm shows substantial computational improvement over the best known direct and iterative numerical techniques that are applicable to these large-scale simulation problems.
Published: 2009

39. The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies

Author: Cheng-Kok Koh, Weng-Fai Wong, Hai Li, and Yi Chen
Subjects: Random access memory, Hardware_MEMORYSTRUCTURES, CPU cache, business.industry, Computer science, Cache coloring, Cache-only memory architecture, Pipeline burst cache, Parallel computing, Cache pollution, Smart Cache, Tag RAM, Cache invalidation, Bus sniffing, Embedded system, Memory architecture, Page cache, Static random-access memory, Cache, business, Cache algorithms
Abstract: There has been much work on the next generation of memory technologies such as MRAM, RRAM and PRAM. Most of these are non-volatile in nature, and compared to SRAM, they are often denser, just as fast, and have much lower energy consumption. Using 3-D stacking technology, it has been proposed that they can be used instead of SRAM in large level 2 caches prevalent in today's microprocessors. However, one of the key challenges in the use of these technologies, such as MRAM, is their higher fault probabilities arising from the larger process variation, defects in its fabrication, and the fact that the cache is much larger. This seriously affects yield. In this paper, we propose a fault resilient set associative cache architecture which we call the salvage cache. In the salvage cache, a faulty cache block is sacrificed and used to repair faults found in other blocks. We will describe in detail the architecture of the salvage cache as well as provide results of yield simulations that show that a much higher yield can be achieved vis-à-vis other fault tolerant techniques. We will also show the performance savings that arise from the use of a large next-generation L2 cache.
Published: 2009
Full Text: View/download PDF

40. A direct integral-equation solver of linear complexity for large-scale 3D capacitance and impedance extraction

Author: Dan Jiao, Cheng-Kok Koh, and Wenwen Chai
Subjects: Computer aided design, Multiprocessing systems, Three dimensional, Capacitance, CPU time, Solver, System of linear equations, Integral equation, Matrix multiplication, Computational science, Electronic engineering, Digital integrated circuits, Linear equation, Sparse matrix, Mathematics
Abstract: State-of-the-art integral-equation-based solvers rely on techniques that can perform a matrix-vector multiplication in O(N) complexity. In this work, a fast inverse of linear complexity was developed to solve a dense system of linear equations directly for the capacitance extraction of any arbitrarily shaped 3D structure. The proposed direct solver has demonstrated clear advantages over state-of-the-art solvers such as FastCap and HiCap, with fast CPU time and modest memory consumption, and without sacrificing accuracy. It successfully inverts a dense matrix that involves more than one million unknowns associated with a large-scale on-chip 3D interconnect embedded in inhomogeneous materials. Moreover, we have successfully applied the proposed solver to full-wave extraction.
Published: 2009
Full Text: View/download PDF

41. A linear-time eigenvalue solver for finite-element-based analysis of large-scale wave propagation problems in on-chip interconnect structures

Author: Venkataramanan Balakrishnan, Cheng-Kok Koh, Jongwon Lee, and Dan Jiao
Subjects: Very-large-scale integration, Eigenvalues and eigenfunctions, Mathematical optimization, Electromagnetics, Computer science, finite element analysis, Solver, Matrix algebra, Matrix multiplication, Finite element method, VLSI, Computational science, Computer Science::Hardware Architecture, integrated circuit interconnections, Computational electromagnetics, Time complexity, Eigenvalues and eigenvectors
Abstract: The analysis and design of next-generation VLSI circuits using accurate electromagnetics-based models results in numerical problems of very large scale. Typically, the solution of a problem with N parameters requires at least O(N) computation. With next generation VLSI circuits, however, even O(N) is prohibitively high since N is very large. A method that partially addresses this issue was developed for full-wave modeling of large-scale interconnect structures. In this method, a number of seeds (a seed has a unique cross section) are first recognized from an interconnect structure. In each seed, the original wave propagation problem is represented as a generalized eigenvalue problem. The O(N) complexity of solving 3D interconnects is then overcome by seeking the solution of a few 2D seeds, which is then post-processed to obtain the solution of the original 3D problem through the development of an on-chip mode-matching technique. The computational bottleneck is the solution of a generalized eigenvalue problem. Efficient algorithms such as ARPACK [2] still require O(M²) storage and operations due to a dense matrix-vector multiplication. We present an algorithm that provides a solution to the generalized eigenvalue problem with O(M) complexity, thus paving the way for the full-wave simulation of next generation VLSI circuits.
Published: 2008
Full Text: View/download PDF

42. A performance and power co-optimization approach for modern processors

Author: Yongxin Zhu, Cheng-Kok Koh, and Weng-Fai Wong
Subjects: Flexibility (engineering), Computer architecture, Computer engineering, Computer science, Processor design, Workload, Space (commercial competition), Power (physics)
Abstract: In embedded systems, performance and power are important inter-related issues that cannot be decoupled. Expensive and extensive simulations in a processor design space are usually required to verify whether a design meets both performance and power requirements. In this paper, an analytical co-optimization approach based on an integrated workload, performance and power model for modern processors is described and studied. A design space consisting of more than 15 architectural and workload parameters can be quickly explored for co-optimization. Validation with measured results obtained from simulators as well as physical processors showed that the model has a good degree of accuracy. We shall describe the details of the approach and the model, and show how to apply the approach to the problem of co-optimizing the power and performance of processor design. With its completeness, flexibility and efficiency, our approach provides clear insights into the tradeoffs of designs for performance and power.
Published: 2005
Full Text: View/download PDF

43. A fast Newton/Smith algorithm for solving algebraic Riccati equations and its application in model order reduction

Author: Cheng-Kok Koh, Tung-Sang Ng, Venkataramanan Balakrishnan, and Ngai Wong
Subjects: Model order reduction, Iterative method, MathematicsofComputing_NUMERICALANALYSIS, Reduction (complexity), Algebraic equation, symbols.namesake, Rate of convergence, ComputingMethodologies_SYMBOLICANDALGEBRAICMANIPULATION, Singular value decomposition, symbols, Riccati equation, Newton's method, Algorithm, Mathematics
Abstract: A very fast Smith-method-based Newton algorithm is introduced for the solution of large-scale continuous-time algebraic Riccati equations (CAREs). When the CARE contains low-rank matrices, as is common in the modeling of physical systems, the proposed algorithm, called the Newton/Smith CARE or NSCARE algorithm, offers significant computational savings over conventional CARE solvers. The effectiveness of the algorithm is demonstrated in the context of VLSI model order reduction, wherein stochastic balanced truncation (SBT) is used to reduce large-scale passive circuits. It is shown that the NSCARE algorithm exhibits guaranteed quadratic convergence under mild assumptions. Moreover, two large-sized matrix factorizations and one large-scale singular value decomposition (SVD), necessary for SBT, can be omitted by utilizing the Smith method output in each Newton iteration, thereby significantly speeding up the model reduction process.
Published: 2004
Full Text: View/download PDF

44. A metric for analyzing effective on-chip inductive coupling

Author: Cheng-Kok Koh, Kaushik Roy, and Guoan Zhong
Subjects: TheoryofComputation_MISCELLANEOUS, Inductance, Matrix (mathematics), Computer science, Spice, Metric (mathematics), Equivalent series inductance, Electronic engineering, Shields, Derivation of self inductance, Topology, Inductive coupling
Abstract: In this paper, we propose a metric for effective inductive coupling: the matrix $(R + j\omega L)^{-1}$, where R and L are the resistance and inductance matrices. We use this metric to analyze the effectiveness of shields on reducing inductive coupling. Our analysis shows how the resistances of shields affect the effective inductive coupling between signal nets. SPICE simulations are carried out to validate the proposed metric.
Published: 2003
Full Text: View/download PDF

45. Guest Editorial Special Section on the 2011 International Symposium on Physical Design

Author: Cheng-Kok Koh and Jiang Hu
Subjects: Engineering, business.industry, Electrical engineering, Special section, Electrical and Electronic Engineering, Physical design, business, Computer Graphics and Computer-Aided Design, Software, Construction engineering
Abstract: The eight papers in this special section are extended versions of papers presented at the 2011 International Symposium on Physical Design (ISPD 2011), held in Santa Barbara, CA.
Published: 2012
Full Text: View/download PDF

46. Distributed non-equilibrium Green’s function algorithms for the simulation of nanoelectronic devices with scattering

Author: Stephen F. Cauley, Gerhard Klimeck, Venkataramanan Balakrishnan, Cheng-Kok Koh, and Mathieu Luisier
Subjects: Physics, Matrix (mathematics), Current (mathematics), Distribution (number theory), Scattering, Computation, General Physics and Astronomy, Function (mathematics), Computational problem, Topology, Representation (mathematics)
Abstract: Through the Non-Equilibrium Green's Function (NEGF) formalism, quantum-scale device simulation can be performed with the inclusion of electron-phonon scattering. However, the simulation of realistically sized devices under the NEGF formalism typically requires prohibitive amounts of memory and computation time. Two of the most demanding computational problems for NEGF simulation involve mathematical operations with structured matrices called semiseparable matrices. In this work, we present parallel approaches for these computational problems which allow for efficient distribution of both memory and computation based upon the underlying device structure. This is critical when simulating realistically sized devices due to the aforementioned computational burdens. First, we consider determining a distributed compact representation for the retarded Green's function matrix $G^{R}$. This compact representation is exact and allows for any entry in the matrix to be generated through the inherent semiseparable structure. The second parallel operation allows for the computation of electron density and current characteristics for the device. Specifically, matrix products between the distributed representation for the semiseparable matrix $G^{R}$ and the self-energy scattering terms in $\Sigma^{
Published: 2011
Full Text: View/download PDF

47. Corrections to 'Exact and numerically stable closed-form expressions for potential coefficients of rectangular conductors'

Author: Jitesh Jain, Cheng-Kok Koh, and Venkataramanan Balakrishnan
Subjects: Computer science, Mathematical analysis, Electronic engineering, Electrical and Electronic Engineering, Closed-form expression, Graphics, Electrical conductor, Electronic circuit
Published: 2007
Full Text: View/download PDF

48. From O(k²N) to O(N): A Fast and High-Capacity Eigenvalue Solver for Full-Wave Extraction of Very Large Scale On-Chip Interconnects.

Author: Jongwon Lee, Venkataramanan Balakrishnan, Cheng-Kok Koh, and Dan Jiao
Subjects: COMPUTATIONAL complexity, MICROPROCESSORS, EIGENVALUES, ITERATIVE methods (Mathematics), METHODOLOGY
Abstract: The wave-propagation problem in an on-chip interconnect network can be modeled as a generalized eigenvalue problem. For solving such a generalized eigenvalue problem, the computational complexity of Arnoldi iteration is at best O(k²N), where k is the number of dominant eigenvalues and N is the matrix size. In this paper, we reduce the computational complexity of the Arnoldi iteration for interconnect extraction from O(k²N) to O(N), thus paving the way for full-wave extraction of very large scale on-chip interconnects, of which a typical value of k is on the order of hundreds of thousands. Numerical and experimental results have demonstrated the accuracy and efficiency of the proposed fast eigenvalue solver. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

49. A Linear-Time Complex-Valued Eigenvalue Solver for Full-Wave Analysis of Large-Scale On-Chip Interconnect Structures.

Author: Jongwon Lee, Venkataramanan Balakrishnan, Cheng-Kok Koh, and Dan Jiao
Subjects: EIGENVALUES, FINITE element method, WAVE analysis, INTEGRATED circuit interconnections, MATRICES (Mathematics), COMPUTATIONAL complexity
Abstract: This paper proposes a linear-time complex-valued eigenvalue solver for solving large-scale on-chip interconnect problems. The fast eigenvalue solution is achieved by eigenvalue clustering, fast system reduction with negligible computational cost, and fast linear-time solution of the reduced system. Numerical and experimental results are presented to demonstrate the accuracy and efficiency of the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2009
Full Text: View/download PDF

50. Routability-Driven Placement and White Space Allocation.

Author: Chen Li, Min Xie, Cheng-Kok Koh, Jason Cong, and Patrick H. Madden
Subjects: AUTOMATION, ROUTING (Computer network management), PHYSICAL distribution of goods, INTEGRATED circuits, INTEGRATED circuit interconnections
Abstract: We present a two-stage congestion-driven placement flow. First, during each refinement stage of our multilevel global placement framework, we replace cells based on the wirelength weighted by congestion level to reduce the routing demands of congested regions. Second, after the global placement stage, we allocate appropriate amounts of white space into different regions of the chip according to a congestion map by shifting cut lines in a top-down fashion and apply a detailed placer to legalize the placement and further reduce the half-perimeter wirelength while preserving the distribution of white space. Experimental results show that our placement flow can achieve the best routability with the shortest routed wirelength among publicly available placement tools on IBM v2 benchmarks. Our placer obtains 100% successful routings on 16 IBM v2 benchmarks with shorter routed wirelengths by 3.1% to 24.5% compared to other placement tools. Moreover, our white space allocation approach can significantly improve the routability of placements generated by other placement tools. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF
