Search

Your search for "computer architecture" returned 658 results

Search Constraints

You searched for: Descriptor: "computer architecture"; Journal: IEEE Transactions on Parallel & Distributed Systems

Search Results

1. Floating Point Calculation of the Cube Function on FPGAs.

2. Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors.

3. swMPAS-A: Scaling MPAS-A to 39 Million Heterogeneous Cores on the New Generation Sunway Supercomputer.

4. A Novel Compute-Efficient Tridiagonal Solver for Many-Core Architectures.

5. TC-Stream: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs.

6. Auto-GNAS: A Parallel Graph Neural Architecture Search Framework.

7. Predicting Throughput of Distributed Stochastic Gradient Descent.

8. ReHy: A ReRAM-Based Digital/Analog Hybrid PIM Architecture for Accelerating CNN Training.

9. Heterogeneous Systolic Array Architecture for Compact CNNs Hardware Accelerators.

10. Critique of “MemXCT: Memory-Centric X-Ray CT Reconstruction With Massive Parallelization” by SCC Team From the University of Texas at Austin.

11. Adaptive Resource Efficient Microservice Deployment in Cloud-Edge Continuum.

12. Efficient and Automated Deployment Architecture for OpenStack in TianHe SuperComputing Environment.

13. Exploring Data Analytics Without Decompression on Embedded GPU Systems.

14. SaPus: Self-Adaptive Parameter Update Strategy for DNN Training on Multi-GPU Clusters.

15. Compiler-Assisted Compaction/Restoration of SIMD Instructions.

16. Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution.

17. Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From National Tsing Hua University.

18. Overlapping Communication With Computation in Parameter Server for Scalable DL Training.

19. Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods.

20. A Hybrid Fuzzy Convolutional Neural Network Based Mechanism for Photovoltaic Cell Defect Detection With Electroluminescence Images.

21. A Distributed Framework for EA-Based NAS.

22. iMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures.

23. Accelerating Federated Learning Over Reliability-Agnostic Clients in Mobile Edge Computing Systems.

24. Reproducibility: Performance Evaluation of MemXCT on Azure CycleCloud Platform.

25. Middleware to Manage Fault Tolerance Using Semi-Coordinated Checkpoints.

26. Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling.

27. CURE: A High-Performance, Low-Power, and Reliable Network-on-Chip Design Using Reinforcement Learning.

28. Approximate NoC and Memory Controller Architectures for GPGPU Accelerators.

29. Thread-Level Locking for SIMT Architectures.

30. gMig: Efficient vGPU Live Migration with Overlapped Software-Based Dirty Page Verification.

31. cuTensor-Tubal: Efficient Primitives for Tubal-Rank Tensor Learning Operations on GPUs.

32. Exploring New Opportunities to Defeat Low-Rate DDoS Attack in Container-Based Cloud Environment.

33. FeatherCNN: Fast Inference Computation with TensorGEMM on ARM Architectures.

34. Optimizing Finite Volume Method Solvers on Nvidia GPUs.

35. ISEE: An Intelligent Scene Exploration and Evaluation Platform for Large-Scale Visual Surveillance.

36. SEALDB: An Efficient LSM-tree Based KV Store on SMR Drives with Sets and Dynamic Bands.

37. Coordinated DMA: Improving the DRAM Access Efficiency for Matrix Multiplication.

38. JSensor: A Parallel Simulator for Huge Wireless Sensor Networks Applications.

39. Parallelizing Word2Vec in Shared and Distributed Memory.

40. A Performance Model for GPU Architectures that Considers On-Chip Resources: Application to Medical Image Registration.

41. An Efficient Hybrid I/O Caching Architecture Using Heterogeneous SSDs.

42. A Bi-layered Parallel Training Architecture for Large-Scale Convolutional Neural Networks.

43. Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer.

44. Portable Programming with RAPID.

45. Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture.

46. Improving the Performance and Energy Efficiency of GPGPU Computing through Integrated Adaptive Cache Management.

47. Hardware Accelerated Semantic Declarative Memory Systems through CUDA and MapReduce.

48. A Virtual Multi-Channel GPU Fair Scheduling Method for Virtual Machines.

49. Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.

50. PHAST - A Portable High-Level Modern C++ Programming Library for GPUs and Multi-Cores.
