Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications in which massive scientific datasets are stored in globally distributed scientific data centers and accessed frequently by scientists worldwide. A single data processing task may request multiple associated data items held in different scientific data centers, and data placement decisions must respect the storage capacity limits of those data centers. Therefore, minimizing data access cost when placing data items across globally distributed scientific data centers has become an increasingly important goal. Existing data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by accessing associated data items, or they overlook storage capacity limits, which are a very practical constraint in scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges; the resulting optimization problem is Non-deterministic Polynomial-time (NP)-hard. We then use a heuristic algorithm based on Lagrangian relaxation to obtain high-quality data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces the overall data access cost.
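The following Python sketch makes the Lagrangian relaxation idea concrete. It solves a deliberately simplified placement model, not the paper's.

- Each data item `i` has a size `size[i]` and a per-center access cost `cost[i][k]`.
- Each center `k` has a capacity `capacity[k]`.
- The joint access cost of associated data items is ignored.

The capacity constraints are moved into the objective with multipliers `lam`, so the relaxed problem splits into one independent choice per item. Subgradient steps then update the multipliers. A greedy `repair` step (a hypothetical helper) turns each relaxed solution into a feasible placement. All names, the greedy repair rule, and the diminishing step size are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def repair(relaxed: List[int], cost: Matrix, size: List[float],
           capacity: List[float]) -> Optional[List[int]]:
    """Hypothetical greedy repair: keep each item at its relaxed center if it fits,
    otherwise move it to the cheapest center with room left. None if stuck."""
    remaining = list(capacity)
    placement = [-1] * len(size)
    for i in sorted(range(len(size)), key=lambda i: -size[i]):  # largest items first
        others = sorted((k for k in range(len(capacity)) if k != relaxed[i]),
                        key=lambda k: cost[i][k])
        for k in [relaxed[i]] + others:
            if size[i] <= remaining[k]:
                placement[i] = k
                remaining[k] -= size[i]
                break
        else:  # no center can hold item i
            return None
    return placement


def lagrangian_placement(cost: Matrix, size: List[float], capacity: List[float],
                         iterations: int = 200, step0: float = 1.0
                         ) -> Tuple[Optional[List[int]], float, float]:
    """Subgradient search over multipliers for the relaxed capacity constraints
    sum_i size[i] * x[i][k] <= capacity[k]. Returns (placement, cost, lower bound)."""
    n, m = len(size), len(capacity)
    lam = [0.0] * m  # one nonnegative multiplier per capacity constraint
    best, best_cost, best_bound = None, float("inf"), float("-inf")
    for t in range(iterations):
        # With capacities relaxed, each item independently picks its cheapest
        # center under the penalized cost cost[i][k] + lam[k] * size[i].
        relaxed = [min(range(m), key=lambda k: cost[i][k] + lam[k] * size[i])
                   for i in range(n)]
        bound = (sum(cost[i][relaxed[i]] + lam[relaxed[i]] * size[i] for i in range(n))
                 - sum(lam[k] * capacity[k] for k in range(m)))
        best_bound = max(best_bound, bound)  # every L(lam) with lam >= 0 is a lower bound
        placement = repair(relaxed, cost, size, capacity)
        if placement is not None:
            total = sum(cost[i][placement[i]] for i in range(n))
            if total < best_cost:
                best, best_cost = placement, total
        # Subgradient = load minus capacity; diminishing step, projected onto lam >= 0.
        load = [0.0] * m
        for i, k in enumerate(relaxed):
            load[k] += size[i]
        step = step0 / (t + 1)
        lam = [max(0.0, lam[k] + step * (load[k] - capacity[k])) for k in range(m)]
    return best, best_cost, best_bound


if __name__ == "__main__":
    print(lagrangian_placement([[1, 5], [1, 5], [1, 5]], [1, 1, 1], [2, 2]))
```

Consider a toy instance with three unit-size items, two centers of capacity 2, and `cost = [[1, 5], [1, 5], [1, 5]]`. On it, the sketch returns the placement `[0, 0, 1]` with total cost 7, together with a Lagrangian lower bound that cannot exceed that cost.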
Network embedding is an important task that represents a high-dimensional network in a low-dimensional vector space while capturing and preserving the network structure. Most existing network embedding methods are based on shallow models. However, real-world network structures are complicated, so shallow models cannot capture the high-dimensional nonlinear features of a network well. The recently proposed unsupervised deep learning models, on the other hand, ignore label information. To address these challenges, in this paper we propose an effective network embedding method called Structural Labeled Locally Deep Nonlinear Embedding (SLLDNE). SLLDNE uses a deep neural network to obtain highly nonlinear features. It preserves the label information of the nodes through a semi-supervised classifier component that improves its discriminative ability. Moreover, we exploit linear reconstruction of neighborhood nodes so that the model captures more structural information. Vertex classification experiments on two real-world network datasets demonstrate that SLLDNE outperforms other state-of-the-art methods.
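The abstract does not define the neighborhood reconstruction term. This sketch assumes one plausible reading that follows locally linear embedding (LLE):

- each node is expressed as an affine combination of its neighbors, with weights fitted in the input space;
- the learned embedding is then penalized for breaking that reconstruction.

The NumPy code below computes such weights and the corresponding penalty. The names `reconstruction_weights` and `neighborhood_reconstruction_loss`, the neighbor lists, and the regularizer `reg` are all hypothetical. In SLLDNE this term would presumably be combined with the deep model's other loss terms rather than used alone.

```python
import numpy as np


def reconstruction_weights(X: np.ndarray, neighbors: list, reg: float = 1e-3) -> np.ndarray:
    """LLE-style weights: for each node i, W[i] minimizes
    ||X[i] - sum_j W[i, j] X[j]||^2 over j in neighbors[i], with sum_j W[i, j] = 1."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # isolated node: nothing to reconstruct from
        Z = X[nbrs] - X[i]             # neighbor features relative to node i
        C = Z @ Z.T                    # local Gram matrix, k x k
        trace = np.trace(C)
        C += np.eye(len(nbrs)) * (reg * trace if trace > 0 else reg)  # keep C invertible
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()       # enforce the sum-to-one constraint
    return W


def neighborhood_reconstruction_loss(Y: np.ndarray, W: np.ndarray) -> float:
    """Penalty sum_i ||Y[i] - sum_j W[i, j] Y[j]||^2 on the learned embeddings Y."""
    return float(np.sum((Y - W @ Y) ** 2))


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # three nodes on a line
    W = reconstruction_weights(X, [[1], [0, 2], [1]])
    print(W[1])  # node 1 sits midway between nodes 0 and 2
    print(neighborhood_reconstruction_loss(X, W))
```

Here `X` could be any node representation, such as rows of the adjacency matrix; that choice is also an assumption. In the example, the middle node receives weights of one half on each of its two neighbors and is reconstructed exactly.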
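Hidden Naive Bayes (HNB) is an improved naive Bayes classification algorithm obtained by extending the structure of naive Bayes, and its classification accuracy is considerably higher than that of the original algorithm. During classification, however, HNB does not consider how much the different values taken by each feature attribute of a test instance contribute to the classification. To address this problem, we construct a weighting function that computes the contribution of each feature attribute to the classification when it takes different values. We then use the results to weight the conditional probability formulas used in HNB, which yields an improved HNB algorithm. Numerical experiments run in Eclipse on the University of California, Irvine (UCI) benchmark datasets show that the improved HNB algorithm achieves considerably higher classification accuracy than the original HNB algorithm.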
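For reference, the following Python sketch shows where value-dependent weights could enter HNB. It uses the standard HNB structure: each attribute's hidden parent mixes the one-dependence estimates P(a_i | a_j, c), with mixing weights proportional to the conditional mutual information I(A_i; A_j | C).

The abstract does not give the paper's weighting function. Here, `value_weight(i, a_i)` is a hypothetical hook applied as an exponent on each attribute's conditional term; applying it as an exponent is itself an assumption. Returning 1.0 everywhere recovers plain HNB. Laplace smoothing and the class and method names are also assumptions.

```python
import math
from collections import Counter


class WeightedHNB:
    """Hidden naive Bayes over categorical attributes, with an optional
    hypothetical per-(attribute, value) weight on each conditional term."""

    def fit(self, X, y):
        self.n, self.d = len(X), len(X[0])          # assumes at least two attributes
        self.classes = sorted(set(y))
        self.n_values = [len({row[i] for row in X}) for i in range(self.d)]
        self.c_cnt = Counter(y)                      # n(c)
        self.ic_cnt = Counter()                      # n(a_i, c)
        self.ijc_cnt = Counter()                     # n(a_i, a_j, c)
        for row, c in zip(X, y):
            for i in range(self.d):
                self.ic_cnt[i, row[i], c] += 1
                for j in range(self.d):
                    if j != i:
                        self.ijc_cnt[i, row[i], j, row[j], c] += 1
        # HNB weights: W[i][j] proportional to I(A_i; A_j | C), normalized over j != i.
        self.W = []
        for i in range(self.d):
            cmi = [self._cmi(i, j) if j != i else 0.0 for j in range(self.d)]
            total = sum(cmi)
            self.W.append([0.0 if j == i else (cmi[j] / total if total > 0 else 1.0 / (self.d - 1))
                           for j in range(self.d)])
        return self

    def _cmi(self, i, j):
        """Empirical conditional mutual information I(A_i; A_j | C)."""
        result = 0.0
        for (ii, ai, jj, aj, c), n_ijc in self.ijc_cnt.items():
            if ii == i and jj == j:
                ratio = n_ijc * self.c_cnt[c] / (self.ic_cnt[i, ai, c] * self.ic_cnt[j, aj, c])
                result += n_ijc / self.n * math.log(ratio)
        return result

    def predict_proba(self, x, value_weight=lambda i, a: 1.0):
        """P(c | x) proportional to P(c) * prod_i P(x_i | hp_i, c) ** value_weight(i, x_i),
        where P(x_i | hp_i, c) = sum_{j != i} W[i][j] * P(x_i | x_j, c)."""
        log_scores = {}
        for c in self.classes:
            s = math.log((self.c_cnt[c] + 1) / (self.n + len(self.classes)))  # smoothed prior
            for i in range(self.d):
                hidden = sum(self.W[i][j] * (self.ijc_cnt[i, x[i], j, x[j], c] + 1)
                             / (self.ic_cnt[j, x[j], c] + self.n_values[i])
                             for j in range(self.d) if j != i)
                s += value_weight(i, x[i]) * math.log(hidden)
            log_scores[c] = s
        top = max(log_scores.values())  # subtract the max before exponentiating, for stability
        z = sum(math.exp(v - top) for v in log_scores.values())
        return {c: math.exp(v - top) / z for c, v in log_scores.items()}


if __name__ == "__main__":
    X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    y = ["pos", "pos", "neg", "neg"]
    model = WeightedHNB().fit(X, y)
    print(model.predict_proba(("a", "x")))
    print(model.predict_proba(("a", "x"), value_weight=lambda i, a: 2.0 if i == 0 else 1.0))
```

The example uses four training rows, `("a", "x")`, `("a", "y")`, `("b", "x")`, and `("b", "y")`, with classes `pos`, `pos`, `neg`, `neg`. On these rows, plain HNB gives P(pos | ("a", "x")) = 2/3. Doubling the weight on attribute 0 raises it to 0.8.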
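Averaged one-dependence estimators (AODE) is a more effective classification algorithm obtained by relaxing the attribute independence assumption of naive Bayes. However, AODE treats every parent attribute as contributing equally to the classification, which limits its classification performance. To address this problem, we use the correlation coefficients Tau-y and Lambda-y to compute the contribution of each feature attribute to the classification. We then weight the parent attributes with the results, obtaining two improved AODE algorithms, T-AODE and L-AODE. Classification experiments run in Eclipse on the University of California, Irvine (UCI) benchmark datasets show that both improved AODE algorithms are more accurate than the original AODE algorithm.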
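The abstract does not define Tau-y and Lambda-y. Presumably they are Goodman and Kruskal's association measures, with the class Y as the dependent variable:

- lambda-y is the proportional reduction in the error of predicting the class by its modal value once an attribute's value is known;
- tau-y is the analogous reduction in Gini-type error.

The Python sketch below computes both measures from raw counts and plugs one of them into a weighted AODE score. Several details are assumptions, not taken from the paper: the function names, Laplace smoothing, the omission of AODE's usual minimum-frequency threshold on parent values, and the fallback to equal weights when every weight is zero.

```python
from collections import Counter


def lambda_y(xs, ys):
    """Goodman-Kruskal lambda with Y (the class) as the dependent variable."""
    joint, cx, cy, n = Counter(zip(xs, ys)), Counter(xs), Counter(ys), len(ys)
    hits_given_x = sum(max(joint[x, y] for y in cy) for x in cx)  # modal-class hits per value of X
    hits_overall = max(cy.values())                                 # modal-class hits ignoring X
    return (hits_given_x - hits_overall) / (n - hits_overall) if n > hits_overall else 0.0


def tau_y(xs, ys):
    """Goodman-Kruskal tau with Y as the dependent variable (reduction in Gini-type error)."""
    joint, cx, cy, n = Counter(zip(xs, ys)), Counter(xs), Counter(ys), len(ys)
    sum_py2 = sum((v / n) ** 2 for v in cy.values())
    sum_cond = sum(joint[x, y] ** 2 / (n * cx[x]) for x in cx for y in cy)
    return (sum_cond - sum_py2) / (1 - sum_py2) if sum_py2 < 1 else 0.0


def waode_proba(X, y, x, weights):
    """Weighted AODE: P(c | x) proportional to
    sum_i weights[i] * P(x_i, c) * prod_{j != i} P(x_j | x_i, c), Laplace-smoothed."""
    n, d = len(X), len(X[0])
    if not any(w > 0 for w in weights):
        weights = [1.0] * d  # assumed fallback: plain AODE when every weight is zero
    classes = sorted(set(y))
    n_values = [len({row[i] for row in X}) for i in range(d)]
    scores = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        score = 0.0
        for i in range(d):
            parent = [row for row in rows if row[i] == x[i]]  # class c and A_i = x_i
            p = (len(parent) + 1) / (n + n_values[i] * len(classes))
            for j in range(d):
                if j != i:
                    hits = sum(1 for row in parent if row[j] == x[j])
                    p *= (hits + 1) / (len(parent) + n_values[j])
            score += weights[i] * p
        scores[c] = score
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


if __name__ == "__main__":
    X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
    y = ["pos", "pos", "neg", "neg"]
    cols = list(zip(*X))
    t_weights = [tau_y(col, y) for col in cols]     # T-AODE-style weights
    l_weights = [lambda_y(col, y) for col in cols]  # L-AODE-style weights
    print(t_weights, l_weights, waode_proba(X, y, ("a", "x"), t_weights))
```

The example uses four rows, `("a", "x")`, `("a", "y")`, `("b", "x")`, and `("b", "y")`, labeled `pos`, `pos`, `neg`, `neg`. Both measures give weight 1.0 to the first attribute, which determines the class, and 0.0 to the second. The weighted score for `("a", "x")` is then P(pos) = 0.75.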