Public Cultural Service Platform


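Research on Semi-Supervised Learning Based on Ensemble Algorithms: Semi-supervised learning and ensemble learning are two important machine learning paradigms. Semi-supervised learning improves classifier performance by exploiting unlabeled samples, while ensemble learning further improves a classifier's generalization performance by combining multiple classifiers. Notably, the two paradigms have developed almost in parallel ...; Ge Jian (葛荐), Ma Tinghuai (马廷淮); Keywords: semi-supervised learning, TRI-TRAINING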

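Parallel Implementation of the KNN Classification Algorithm with MapReduce (cited by: 21): 2013; To improve the ability of the k-nearest neighbor (KNN) algorithm to handle large datasets, this paper uses the MapReduce parallel programming model, combined with the characteristics of KNN itself, to give a parallel implementation of KNN on the Hadoop platform. Parallelization is achieved by designing three functions: Map, Combine, and Reduce. The Map function computes the similarity between each test sample and the training samples. The Combine function acts as a local Reduce operation that cuts down intermediate computation and communication overhead. The Reduce function uses the intermediate results from these functions to determine the k nearest neighbors and make the classification decision. Experimental results show that, compared with earlier single-machine methods, the parallel KNN algorithm on a Hadoop cluster achieves good speedup and good scalability.; Yan Yonggang (闫永刚), Ma Tinghuai (马廷淮), Wang Jian (王建); Keywords: KNN classification, parallel computing, MAPREDUCE model, HADOOP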
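The Map, Combine, and Reduce roles described in this abstract can be illustrated with a small single-process sketch. It is not the authors' Hadoop code: the function names (`knn_map`, `knn_combine`, `knn_reduce`, `run_knn`), the use of Euclidean distance as the similarity measure, and the majority vote are all assumptions made for illustration, and the "splits" are plain Python lists standing in for HDFS input splits.

```python
import heapq
import math
from collections import Counter, defaultdict


def knn_map(test_id, test_vec, train_split):
    """Map: emit (test_id, (distance, label)) for each training sample in one split."""
    for train_vec, label in train_split:
        # Assumption: Euclidean distance stands in for the paper's similarity measure.
        yield test_id, (math.dist(test_vec, train_vec), label)


def knn_combine(pairs, k):
    """Combine: local reduce that keeps only the k nearest candidates per test sample."""
    grouped = defaultdict(list)
    for test_id, candidate in pairs:
        grouped[test_id].append(candidate)
    for test_id, candidates in grouped.items():
        for candidate in heapq.nsmallest(k, candidates):
            yield test_id, candidate


def knn_reduce(test_id, candidates, k):
    """Reduce: merge candidates from all splits, keep the k nearest, majority vote."""
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    return test_id, votes.most_common(1)[0][0]


def run_knn(test_samples, train_splits, k):
    """Simulate the job: map + combine per split, shuffle by test_id, then reduce."""
    shuffled = defaultdict(list)
    for split in train_splits:
        mapped = (
            pair
            for test_id, vec in test_samples.items()
            for pair in knn_map(test_id, vec, split)
        )
        for test_id, candidate in knn_combine(mapped, k):
            shuffled[test_id].append(candidate)
    return dict(knn_reduce(tid, cands, k) for tid, cands in shuffled.items())
```

Because the combiner forwards at most k candidates per test sample from each split, the data crossing the shuffle is bounded by k times the number of splits rather than by the training-set size, which is the kind of saving in intermediate data and communication that the abstract attributes to the Combine step.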

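Automatic Reassembly of Shredded Chinese Documents Based on Information Richness (cited by: 5): 2015; To address the problem that the shape features of paper scraps cannot be used in the automatic reassembly of shredded Chinese documents, a splicing algorithm based on the information richness of the content is proposed. First, ways of representing scrap features based on Chinese character content are analyzed. On this basis, a method is proposed for estimating the feature matching degree of scraps in both the horizontal and vertical directions. Finally, information richness is used to determine the splicing order, and the scraps are spliced efficiently one by one. Matching experiments with different numbers of scraps show that, compared with traditional methods, the horizontal and vertical feature matching-degree estimation methods improve accuracy by about 4.73% and 3.76%, respectively. Automatic reassembly experiments show that, compared with traditional algorithms, the splicing algorithm based on information richness lowers the error rate by about 18% and greatly reduces time complexity.; Zhao Bo (赵波), Zhou Yu (周宇), Zhang Zhengyu (张正宇), Na Ying (那莹), Ma Tinghuai (马廷淮); Keywords: Chinese documents, automatic splicing, algorithm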

Improved locality-sensitive hashing method for the approximate nearest neighbor problem: 2014; In recent years, the nearest neighbor search (NNS) problem has arisen in a wide range of applications. Locality-sensitive hashing (LSH), a popular algorithm for the approximate nearest neighbor problem, has proved to be an efficient way to solve the NNS problem in high-dimensional, large-scale databases. Building on the p-stable LSH scheme, this paper introduces an improved algorithm called randomness-based locality-sensitive hashing (RLSH). The proposed algorithm modifies the query strategy: during a nearest neighbor query it randomly selects a certain hash table onto which to project the query point, instead of mapping the query point into all hash tables, and it reconstructs the candidate points used to find the nearest neighbors. This strategy lets RLSH spend less time searching for the nearest neighbors than the p-stable LSH algorithm while keeping a high recall. In addition, the strategy is shown to promote the diversity of the candidate points even with fewer hash tables. Experiments are run on a synthetic dataset and an open dataset. The results show that the method requires less time and less space than p-stable LSH at the same recall.; Lu Yinghua (陆颖华), Ma Tinghuai (马廷淮), Zhong Shuiming (钟水明), Cao Jie (曹杰), Wang Xin (王新), Abdullah Al-Dhelaane
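The difference between the two query strategies in this abstract can be shown with a minimal sketch. It uses the standard p-stable hash for Euclidean distance, h(v) = floor((a·v + b) / w) with Gaussian a and uniform b, and contrasts probing every table with probing one randomly chosen table. The class name `PStableLSH`, its parameters, and its method names are hypothetical, and the paper's candidate-reconstruction step is not reproduced because the abstract does not describe it.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """Toy p-stable (Gaussian, p = 2) LSH index; all names here are illustrative."""

    def __init__(self, dim, num_tables=4, hashes_per_table=3, w=4.0, seed=0):
        self.rng = random.Random(seed)
        self.w = w
        self.points = []
        self.tables = []  # each entry: (list of (a, b) hash functions, buckets)
        for _ in range(num_tables):
            funcs = [
                ([self.rng.gauss(0.0, 1.0) for _ in range(dim)],
                 self.rng.uniform(0.0, w))
                for _ in range(hashes_per_table)
            ]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # Concatenate floor((a . v + b) / w) over the table's hash functions.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, v)) + b) / self.w)
            for a, b in funcs
        )

    def add(self, v):
        idx = len(self.points)
        self.points.append(v)
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)

    def candidates_all_tables(self, q):
        """Baseline strategy: hash the query into every table and union the buckets."""
        found = set()
        for funcs, buckets in self.tables:
            found.update(buckets.get(self._key(funcs, q), []))
        return found

    def candidates_random_table(self, q):
        """Randomized strategy: hash the query into one randomly selected table."""
        funcs, buckets = self.rng.choice(self.tables)
        return set(buckets.get(self._key(funcs, q), []))

    def nearest(self, q, candidates):
        """Rank candidates by exact Euclidean distance; None if there are none."""
        return min(candidates, key=lambda i: math.dist(q, self.points[i]), default=None)
```

Probing a single table costs one key computation per query instead of one per table, at the price of a smaller candidate set. How RLSH rebuilds candidates to keep recall high is specific to the paper and is left out of this sketch.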


National Natural Science Foundation of China (61173143)