Public Cultural Service Platform

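National Natural Science Foundation of China (61003075): Publications: 6; Citations: 6; H-index: 2; Related authors: 张英, 李永进, 周宏伟, 张丽霞, 陈超; Related institutions: National University of Defense Technology, Hunan Normal University; Funding sources: National Natural Science Foundation of China, National High Technology Research and Development Program of China, National Science and Technology Major Project; Related fields: Automation and Computer Technology, Light Industry Technology and Engineering, Electrical Engineering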

Improving vertex-frontier based GPU breadth-first search: 2014; Breadth-first search (BFS) is an important kernel for graph traversal and is used by many graph processing applications. Extensive studies have been devoted to boosting the performance of BFS. As the most effective solution, GPU acceleration achieves the state-of-the-art result of 3.3×10^9 traversed edges per second on an NVIDIA Tesla C2050 GPU. A novel vertex-frontier based GPU BFS algorithm is proposed, with three main features. First, to obtain a better workload balance on irregular graphs, a virtual-queue task decomposition and mapping strategy is introduced for vertex frontier expansion. Second, a global duplicate-detection scheme is proposed to remove duplicate vertices from the vertex frontier effectively. Finally, a GPU-based bottom-up BFS approach is employed to process large frontiers. The experimental results demonstrate that the algorithm achieves a 10% improvement over the state-of-the-art method on diverse graphs. In particular, it exhibits a 2-3 times speedup over the state of the art on low-diameter and scale-free graphs on an NVIDIA Tesla K20c GPU, reaching a peak traversal rate of 11.2×10^9 edges/s.; 杨博, 卢凯, 高颖慧, 徐凯, 王小平, 程志权; Keywords: GPU
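
The combination of frontier expansion, duplicate removal, and a bottom-up step for large frontiers described in the abstract above can be illustrated with a minimal CPU-only sketch. This is not the authors' GPU algorithm: the function name `hybrid_bfs`, the switch threshold `bottom_up_fraction`, and the use of an undirected adjacency list are assumptions made purely for illustration.

```python
def hybrid_bfs(adj, source, bottom_up_fraction=0.25):
    """Return the BFS level of every vertex (-1 if unreachable).

    adj: list of neighbour lists for an UNDIRECTED graph (assumption).
    bottom_up_fraction: hypothetical threshold; when the frontier holds
    more than this fraction of all vertices, a bottom-up step is used.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        if len(frontier) > bottom_up_fraction * n:
            # Bottom-up step: every unvisited vertex searches its
            # neighbours for a parent that is in the current frontier.
            in_frontier = [False] * n
            for u in frontier:
                in_frontier[u] = True
            for v in range(n):
                if level[v] == -1 and any(in_frontier[u] for u in adj[v]):
                    level[v] = depth
                    next_frontier.append(v)
        else:
            # Top-down step: expand the frontier. Marking a vertex as
            # visited before queueing it keeps duplicates out of the
            # next frontier.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.append(v)
        frontier = next_frontier
    return level
```

Setting `bottom_up_fraction` to 1.0 forces pure top-down traversal and 0.0 forces pure bottom-up traversal; both give the same levels, which is a convenient sanity check for a sketch like this.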

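Multicast throughput and energy models for networks-on-chip: Networks-on-chip are becoming a very promising interconnect for on-chip many-core processors. Maintaining a directory-based cache coherence protocol requires efficient multicast support from the on-chip interconnection network. Drawing on the unicast network throughput model, a multicast-oriented network throughput model and an architecture-level energy model are established. Compared with traditional multicast...; 齐树波, 蒋江, 李晋文, 张民选; Keywords: network-on-chip, multicast, routing algorithm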

SPICE modeling of memristors with multilevel resistance states: 2012; With CMOS technologies approaching the scaling ceiling, novel memory technologies have thrived in recent years, among which the memristor is a rather promising candidate for future resistive memory (RRAM). The memristor's potential to store multiple bits of information as different resistance levels allows its application in multilevel cell (MLC) technology, which can significantly increase memory capacity. However, most existing memristor models are built for binary or continuous memristance switching. In this paper, we propose Simulation Program with Integrated Circuit Emphasis (SPICE) models of charge-controlled and flux-controlled memristors with multilevel resistance states, based on the memristance versus state map. In our model, the memristance switches abruptly between neighboring resistance states. The proposed model lets users set the number of resistance levels as a parameter, and makes the resistance switching time predictable when the input current/voltage waveform is given. The functionality of our models has been validated in HSPICE. The models can be used in multilevel RRAM modeling as well as in artificial neural network simulations.; 方旭东, 唐玉华, 吴俊杰; Keywords: MEMRISTOR
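
The idea of a memristance versus state map with abrupt switching between neighboring levels can be sketched numerically. The following is only an illustration in Python, not the SPICE model from the paper: the evenly spaced thresholds, the normalized state range, and all parameter values and names (`memristance`, `time_to_next_level`, `r_on`, `r_off`, `q_max`, `levels`) are assumptions.

```python
def level_index(q, q_max=1.0, levels=4):
    """Index of the resistance plateau that holds state q (clamped)."""
    q = min(max(q, 0.0), q_max)
    return min(int(q / q_max * levels), levels - 1)

def memristance(q, r_on=1000.0, r_off=4000.0, q_max=1.0, levels=4):
    """Staircase memristance-versus-charge map (hypothetical values).

    The resistance falls from r_off to r_on in (levels - 1) abrupt
    steps as the accumulated charge q grows from 0 to q_max.
    """
    step = (r_off - r_on) / (levels - 1)
    return r_off - level_index(q, q_max, levels) * step

def time_to_next_level(q, current, q_max=1.0, levels=4):
    """Time until a constant positive current reaches the next threshold.

    Returns None when the device is already on its last plateau or the
    current does not increase the charge.
    """
    idx = level_index(q, q_max, levels)
    if idx == levels - 1 or current <= 0:
        return None
    next_threshold = (idx + 1) * q_max / levels
    return (next_threshold - q) / current
```

Because the state of a charge-controlled device is the integral of the input current, the switching time under a known waveform follows directly from the distance to the next threshold, which is the predictability property the abstract mentions.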

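A 16-bit CMOS digitally configurable ring voltage-controlled oscillator: This paper proposes a 16-bit CMOS digitally configurable ring voltage-controlled oscillator. The oscillator is built from an array of NMOS transistor switches. The input digital control word turns the NMOS transistors on and off, which changes the drive current and reconfigures the circuit, thereby controlling the output signal's...; 周少华, 李思昆, 姚利俊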

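Research and implementation of key link-layer technologies for board-level high-speed transmission buses (Citations: 2): 2011; With the development of high-performance servers and very-large-scale computers, system designers place ever higher demands on on-board high-speed interconnect buses. Reducing chip-to-chip data transfer latency and improving the computation-to-communication ratio are important problems to solve. The paper studies the link-layer characteristics of the rapidly developing HyperTransport and PCI Express buses and, on this basis, proposes a 64-bit high-speed bus link-layer architecture and investigates its key technologies. It designs and implements a scrambler/descrambler that processes 16 bits of data per clock cycle, and a lane-to-lane deskew unit that corrects delay skew of up to 5 clock cycles between lanes. Functional verification shows that the proposed design is functionally correct.; 周宏伟, 陈超, 张丽霞, 张英, 李永进; Keywords: transmission bus, link layer, scrambling and descrambling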
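
A scrambler that handles one 16-bit word per clock can be modelled in software as an additive scrambler driven by a linear-feedback shift register (LFSR). The sketch below is a generic illustration, not the design from the paper: the tap positions, the seed, and the function names are assumptions, and the abstract does not state which polynomial the authors use.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR by one bit (illustrative taps)."""
    bit = ((state >> 15) ^ (state >> 4) ^ (state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | bit) & 0xFFFF

def keystream_word(state):
    """Collect 16 keystream bits and return (word, next_state)."""
    word = 0
    for _ in range(16):
        word = (word << 1) | (state >> 15)  # output the top bit
        state = lfsr_step(state)
    return word, state

def scramble(words, seed=0xFFFF):
    """XOR each 16-bit word with the keystream.

    Applying the function again with the same seed descrambles, because
    XOR with the same keystream is its own inverse.
    """
    out, state = [], seed
    for w in words:
        key, state = keystream_word(state)
        out.append((w ^ key) & 0xFFFF)
    return out
```

In hardware, the 16 shift steps inside `keystream_word` would typically be unrolled into one combinational XOR network so that a full word is processed in a single cycle; the loop here only models that behaviour.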

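A survey of on-chip RF-interconnect technology applied to NoC: On-chip RF interconnect (RF-Interconnect) has become a focus of current interconnect research because of its fast data transmission, low power consumption, high bandwidth, and flexible reconfigurability. The technology builds on the low loss of RF/microwave signals and on near-field capacitive coupling; data use amplitude modulation and phase modulation...; 周少华, 沈剑良, 李思昆, 姚利俊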

Fast image matching algorithm based on affine invariants: 2014; Feature-based image matching algorithms play an indispensable role in automatic target recognition (ATR). In this work, a fast image matching algorithm (FIMA) is proposed which uses the geometric feature of the extended centroid (EC) to build affine invariants. Based on the affine invariant given by the length ratio of two parallel line segments, FIMA overcomes the invalidation problem of state-of-the-art algorithms based on affine geometric features and increases the feature diversity of different targets, thus reducing the misjudgment rate during target recognition. However, FIMA is found to suffer from the parallelogram contour problem and the coincidence invalidation problem. An advanced FIMA is designed to cope with these problems. Experiments show that the proposed algorithms are more robust to Gaussian noise, gray-scale change, contrast change, illumination, and small three-dimensional rotation. Compared with the latest fast image matching algorithms based on geometric features, FIMA reaches a speedup of approximately 1.75 times. Thus, FIMA would be more suitable for practical ATR applications.; 张毅, 卢凯, 高颖慧; Keywords: ROBUSTNESS, PERFORMANCE
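
The invariant that the abstract builds on, namely that an affine map preserves the length ratio of two parallel line segments, can be checked with a small worked example. The sketch below shows only this geometric fact, not FIMA or the extended-centroid construction; the helper names and the sample matrix are assumptions.

```python
import math

def affine(p, m, t):
    """Apply the affine map x -> m*x + t to a 2-D point p."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y + t[0], c * x + d * y + t[1])

def length(p, q):
    """Euclidean length of the segment from p to q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def length_ratio(seg1, seg2):
    """Ratio of the lengths of two segments given as point pairs."""
    return length(*seg1) / length(*seg2)

def transform_segment(seg, m, t):
    """Map both endpoints of a segment through the affine map."""
    return (affine(seg[0], m, t), affine(seg[1], m, t))

# Two parallel segments: direction (2, 1) and direction (6, 3).
seg_a = ((0.0, 0.0), (2.0, 1.0))
seg_b = ((1.0, 3.0), (7.0, 6.0))

# An arbitrary invertible affine map (hypothetical values).
m = ((2.0, 1.0), (0.5, 3.0))
t = (5.0, -2.0)

ratio_before = length_ratio(seg_a, seg_b)
ratio_after = length_ratio(transform_segment(seg_a, m, t),
                           transform_segment(seg_b, m, t))
```

Both ratios equal 1/3 up to rounding, because the linear part of the map scales every vector along a given direction by the same factor. For non-parallel segments the ratio generally changes, which is why the parallel condition matters.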

Aware conflict detection of non-uniform memory access system and prevention for transactional memory (Citations: 3): 2012; Most transactional memory (TM) research has focused on multi-core processors, and some on clusters, leaving non-uniform memory access (NUMA) systems unexplored. Existing TM implementations suffer significant performance degradation on NUMA systems because they ignore the slower remote memory accesses. To solve this problem, a latency-based conflict detection method and a forecasting-based conflict prevention method are proposed. Using these techniques, a NUMA-aware TM system is presented. The experimental results show that, by reducing remote memory accesses and the transaction abort rate, the NUMA-aware strategies deliver good practical TM performance on NUMA systems.; 王睿伯, 卢凯, 卢锡城

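Research and implementation of key link-layer technologies for board-level high-speed transmission buses: With the development of high-performance servers and very-large-scale computers, system designers place ever higher demands on on-board high-speed interconnect buses. Reducing chip-to-chip data transfer latency and improving the computation-to-communication ratio are important problems to solve. This paper studies the rapidly developing HyperTransport and PCI Express bus link...; 周宏伟, 陈超, 张丽霞, 张英, 李永进; Keywords: transmission bus, link layer, scrambling and descrambling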

GPU acceleration of subgraph isomorphism search in large scale graph (Citations: 1): 2015; A novel framework for parallel subgraph isomorphism on GPUs, named GPUSI, is proposed. It consists of GPU region exploration and GPU subgraph matching. GPUSI iteratively enumerates subgraph instances and solves subgraph isomorphism in a divide-and-conquer fashion. The framework relies entirely on graph traversal and avoids the explicit join operation. Moreover, to improve performance, a task-queue based method and the virtual-CSR graph structure are used to balance the workload among warps, and a warp-centric programming model is used to balance the workload among the threads in a warp. A prototype of GPUSI is implemented, and comprehensive experiments on various graph isomorphism operations are carried out on diverse large graphs. The experiments demonstrate that GPUSI scales well and achieves a speed-up of 1.4 to 2.6 times over the state-of-the-art solutions.; 杨博, 卢凯, 高颖慧, 王小平, 徐凯; Keywords: GPU
