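Current machine-learning approaches to Chinese semantic role labeling (SRL) fall roughly into two classes: those based on deep syntactic parsing and those based on shallow syntactic parsing. Because SRL systems built on these two approaches each have their own strengths and weaknesses in performance and robustness, this paper combines the outputs of SRL systems of both kinds. A combination model trained with several global features filters the candidate roles, and conflicting roles that violate the constraints on a sentence's argument structure are then resolved to produce the final labeling, with the aim of improving labeling performance. On the Chinese PropBank 1.0 corpus, the combination model achieves an F-score of 78.41%, a significant improvement over the 67.34% F-score of the SRL system based on deep syntactic parsing and the 71.67% F-score of the SRL system based on shallow syntactic parsing, which shows that our combination approach is highly effective.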
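The filter-then-resolve step described above can be illustrated with a minimal sketch. It is not the paper's actual procedure: the `Candidate` type, the `score` field, the 0.5 threshold, the greedy strategy, and the two constraints (no overlapping spans, no repeated core argument) are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate role pooled from the two SRL systems."""
    start: int    # index of the first token, inclusive
    end: int      # index of the last token, inclusive
    role: str     # e.g. "ARG0", "ARG1", "ARGM-TMP"
    score: float  # assumed confidence from the combination model


def is_core(role):
    # Assumption: numbered arguments (ARG0, ARG1, ...) are core roles,
    # while ARGM-* adjuncts may repeat.
    return role.startswith("ARG") and not role.startswith("ARGM")


def resolve_conflicts(candidates, threshold=0.5):
    """Filter low-scoring candidates, then greedily drop conflicting ones."""
    survivors = [c for c in candidates if c.score >= threshold]
    kept = []
    # Visit candidates from most to least confident.
    for cand in sorted(survivors, key=lambda c: c.score, reverse=True):
        overlaps = any(cand.start <= k.end and k.start <= cand.end for k in kept)
        repeated_core = is_core(cand.role) and any(k.role == cand.role for k in kept)
        if not overlaps and not repeated_core:
            kept.append(cand)
    # Return the surviving roles in sentence order.
    return sorted(kept, key=lambda c: c.start)


candidates = [
    Candidate(0, 1, "ARG0", 0.9),
    Candidate(1, 2, "ARG1", 0.6),      # overlaps the ARG0 span
    Candidate(3, 5, "ARG1", 0.8),
    Candidate(6, 7, "ARG1", 0.7),      # repeats a core role
    Candidate(8, 8, "ARGM-TMP", 0.3),  # below the threshold
]
final_roles = resolve_conflicts(candidates)
```

A greedy pass is only one way to enforce such constraints; the abstract does not say which resolution strategy the authors use.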
Automatic thread labeling for news events helps readers understand the different aspects of a news event. In this paper, we present a method for labeling the threads of a news event. We use the latent Dirichlet allocation (LDA) topic model to extract news threads from a news corpus. Our method first selects a subset of thread words and then extracts phrases based on co-occurrence statistics. Each extracted phrase is then used as the label of a news thread. Experimental results show that about 60% of the generated labels convey meaningful aspects of a news event. These labels can help readers quickly grasp the many different aspects of a news event.
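The phrase-extraction step can be sketched as follows, under stated assumptions. The abstract does not define its co-occurrence calculation, so this sketch simply counts how often two thread words appear next to each other and uses the most frequent pair as the label. The function name `label_thread`, the whitespace tokenization, and the sample data are hypothetical, and the thread words are assumed to have already been selected from an LDA topic.

```python
from collections import Counter


def label_thread(documents, thread_words):
    """Return the most frequent adjacent pair of thread words, or None."""
    vocab = set(thread_words)
    pair_counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()  # naive whitespace tokenization
        # Count each pair of neighbouring tokens that are both thread words.
        for left, right in zip(tokens, tokens[1:]):
            if left in vocab and right in vocab:
                pair_counts[(left, right)] += 1
    if not pair_counts:
        return None
    (left, right), _count = pair_counts.most_common(1)[0]
    return f"{left} {right}"


documents = [
    "oil spill reaches the gulf coast",
    "the oil spill cleanup continues",
    "coast residents protest",
]
thread_words = ["oil", "spill", "gulf", "coast", "cleanup"]
label = label_thread(documents, thread_words)
```

Raw adjacency counts favor frequent words; a real system might instead use an association measure such as pointwise mutual information, which is again an assumption rather than something the abstract states.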