
National Natural Science Foundation of China (s61170049)

Works: 1  Citations: 0  H-index: 0
Funding: National Natural Science Foundation of China
Related fields: Science; Automation and Computer Technology

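Document type

  • 1 article: Chinese journal article

Field

  • 1 article: Automation and Comput...
  • 1 article: Science

Topic

  • 1 article: Dynamics
  • 1 article: Heterogeneous
  • 1 article: Heterogeneous systems
  • 1 article: Graphics processing unit
  • 1 article: Molecule
  • 1 article: Molecular dynamics
  • 1 article: OFF
  • 1 article: FAST
  • 1 article: MOLECU...

Media

  • 1 article: Tsingh...

Year

  • 1 article: 2012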
1 record; showing 1-1
Fast Parallel Cutoff Pair Interactions for Molecular Dynamics on Heterogeneous Systems
2012
Heterogeneous systems with both Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are frequently used to accelerate short-ranged Molecular Dynamics (MD) simulations. The most time-consuming task in short-ranged MD simulations is the computation of particle-to-particle interactions. Beyond a certain cutoff distance, these interactions decrease to zero. To minimize the number of distance checks, previous work has tiled the interactions by spatial attribute, which increases memory accesses and GPU computation and therefore lowers performance. Other studies ignore the spatial attribute and construct an all-versus-all interaction matrix, which scales poorly. This paper presents an improved algorithm. The algorithm first bins particles into voxels according to their spatial attributes, and then tiles the all-versus-all matrix into voxel-versus-voxel submatrices. Only the submatrices between neighboring voxels are computed on the GPU. The algorithm therefore reduces the number of distance checks while limiting the additional memory accesses and GPU computation. This paper also adopts a multi-level programming model to implement the algorithm on multiple nodes of Tianhe-1A. By employing (1) a patch design to exploit parallelism across the simulation domain, (2) a communication overlapping method to overlap the communications between CPUs and GPUs, and (3) a dynamic workload balancing method to adjust the workloads among compute nodes, the implementation achieves a speedup of 4.16x on one NVIDIA Tesla M2050 GPU compared to a 2.93 GHz six-core Intel Xeon X5670 CPU. In addition, it runs 2.41x faster on 256 compute nodes of Tianhe-1A (with two CPUs and one GPU inside a node) than on 256 GPU-excluded nodes.
Qiang Wu, Canqun Yang, Tao Tang, Kai Lu
Keywords: molecular dynamics; heterogeneous systems; graphics processing unit
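
The voxel binning and neighbor-only tiling described in the abstract can be illustrated with a minimal CPU sketch in Python. This is not the authors' GPU implementation: the function names (`bin_particles`, `cutoff_pairs_voxel`, `cutoff_pairs_brute`) are hypothetical, the voxel edge is assumed to equal the cutoff, and periodic boundaries, force evaluation, and the multi-node patch design are left out. It only shows why checking neighboring voxels finds the same pairs as the all-versus-all matrix.

```python
import itertools
import math
from collections import defaultdict


def bin_particles(positions, cutoff):
    """Map each voxel index (ix, iy, iz) to the ids of the particles inside it.

    Assumption: the voxel edge equals the cutoff, so any pair closer than the
    cutoff lies in the same voxel or in two adjacent voxels.
    """
    voxels = defaultdict(list)
    for pid, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff),
               math.floor(y / cutoff),
               math.floor(z / cutoff))
        voxels[key].append(pid)
    return voxels


def cutoff_pairs_voxel(positions, cutoff):
    """Return all pairs (i, j), i < j, closer than the cutoff.

    Only voxel-versus-voxel tiles between neighboring voxels are examined.
    """
    voxels = bin_particles(positions, cutoff)
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for (ix, iy, iz), members in voxels.items():
        for dx, dy, dz in offsets:
            neighbor = voxels.get((ix + dx, iy + dy, iz + dz))
            if neighbor is None:
                continue  # empty neighboring voxel, nothing to check
            # One tile of the all-versus-all matrix: members x neighbor.
            for i in members:
                for j in neighbor:
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


def cutoff_pairs_brute(positions, cutoff):
    """Reference result: check every entry of the all-versus-all matrix."""
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) < cutoff
    }
```

In this sketch each tile is a small dense block, which is the kind of work unit that maps naturally onto a GPU thread block; the brute-force function is included only as a correctness reference for small inputs.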