108 results for "Tze Meng Low"
Search Results
2. SMaLL: Software for Rapidly Instantiating Machine Learning Libraries.
3. Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines.
4. Families of Butterfly Counting Algorithms for Bipartite Graphs.
5. Modeling Matrix Engines for Portability and Performance.
6. SMaLL: A Software Framework for portable Machine Learning Libraries.
7. Delayed Asynchronous Iterative Graph Algorithms.
8. Linear Algebraic Louvain Method in Python.
9. 3D Coded SUMMA: Communication-Efficient and Robust Parallel Matrix Multiplication.
10. Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite.
11. Towards an Objective Metric for the Performance of Exact Triangle Count.
12. qLD: High-performance Computation of Linkage Disequilibrium on CPU and GPU.
13. Delta-Stepping SSSP: From Vertices and Edges to GraphBLAS Implementations.
14. A Portable GPU Framework for SNP Comparisons.
15. Efficient SpMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization.
16. Analytical cache modeling and tilesize optimization for tensor contractions.
17. Linear algebraic depth-first search.
18. Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU.
19. Exploiting Symmetries of Small Prime-Sized DFTs.
20. A Flexible Framework for Multidimensional DFTs.
21. Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing.
22. Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
23. Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems.
24. Masterless Coded Computing: A Fully-Distributed Coded FFT Algorithm.
25. FFTX and SpectralPack: A First Look.
26. PageRank Acceleration for Large Graphs with Scalable Hardware and Two-Step SpMV.
27. Linear Algebraic Formulation of Edge-centric K-truss Algorithms with Adjacency Matrices.
28. A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot codes.
29. High Performance Zero-Memory Overhead Direct Convolutions.
30. Fusing Non Element-wise Layers in DNNs.
31. A Family of Provably Correct Algorithms for Exact Triangle Counting.
32. High Assurance Code Generation for Cyber-Physical Systems.
33. First look: Linear algebra-based triangle counting without matrix multiplication.
34. Mixed data layout kernels for vectorized complex arithmetic.
35. Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU.
36. SPIRAL: Extreme Performance Portability.
37. Efficient Computation of Linkage Disequilibria as Dense Linear Algebra Operations.
38. Compilers, hands-off my hands-on optimizations.
39. A scale-free structure for power-law graphs.
40. CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors.
41. A Flexible Framework for Parallel Multi-Dimensional DFTs.
42. Enabling portable energy efficiency with memory accelerated library.
43. Optimizing Space Time Adaptive Processing through accelerating memory-bounded operations.
44. The BLIS Framework: Experiments in Portability.
45. Analytical Modeling Is Enough for High-Performance BLIS.
46. Coded FFT and Its Communication Overhead.
47. A Unified Coded Deep Neural Network Training Strategy Based on Generalized PolyDot Codes for Matrix Multiplication.
48. Zoom Out: Abstractions for Efficient Radar Algorithms on COTS architectures.
49. quickLD: An efficient software for linkage disequilibrium analyses.
50. Extracting SMP parallelism for dense linear algebra algorithms from high-level specifications.