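The nice thing about HanLP is that its data dictionaries are fairly complete.
On GitHub, a Chinese developer has written a plugin that lets Elasticsearch use HanLP.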
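1. I downloaded the plugin's release package, unzipped it and followed its installation instructions, but the hanlp.properties file was never found.
So I cloned the source with git and discovered that the path to hanlp.properties was computed incorrectly: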
package org.elasticsearch.index.analysis;
import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.utility.Predefine;
import com.hankcs.lucene4.HanLPIndexAnalyzer;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
public class HanLPAnalyzerProvider extends AbstractIndexAnalyzerProvider<HanLPIndexAnalyzer> {
    private final HanLPIndexAnalyzer analyzer;
    // Working directory of the Elasticsearch process
    private static String sysPath = String.valueOf(System.getProperties().get("user.dir"));
    @Inject
    public HanLPAnalyzerProvider(IndexSettings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
        super(indexSettings, name, settings);
        // Original path: strips the last 4 characters of user.dir, i.e. assumes it ends with "/bin"
        //Predefine.HANLP_PROPERTIES_PATH = sysPath.substring(0, sysPath.length()-4) + "/plugins/analysis-hanlp/hanlp.properties";
        // Corrected path: assumes user.dir is $ES_HOME itself
        Predefine.HANLP_PROPERTIES_PATH = sysPath + "/plugins/analysis-hanlp/hanlp.properties";
        analyzer = new HanLPIndexAnalyzer(true);
    }
    public static HanLPAnalyzerProvider getIndexAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
        return new HanLPAnalyzerProvider(indexSettings, env, name, settings);
    }
    public static HanLPAnalyzerProvider getSmartAnalyzerProvider(IndexSettings indexSettings, Environment env, String name, Settings settings) {
        return new HanLPAnalyzerProvider(indexSettings, env, name, settings);
    }
    @Override
    public HanLPIndexAnalyzer get() {
        return this.analyzer;
    }
}
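The difference between the two path computations can be seen with a small standalone sketch. It compares the original line (which drops the last four characters of user.dir), the corrected line (which appends to user.dir directly) and a hypothetical tolerantPath variant that strips a trailing bin directory only when one is actually present. The class and method names are made up for illustration and are not part of the plugin.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class for illustration only; not part of the plugin.
class PropertiesPathSketch {
    static final String SUFFIX = "/plugins/analysis-hanlp/hanlp.properties";

    // Original plugin logic: drop the last 4 characters, assuming user.dir ends with "/bin".
    static String originalPath(String userDir) {
        return userDir.substring(0, userDir.length() - 4) + SUFFIX;
    }

    // Corrected logic from the post: assumes user.dir is $ES_HOME itself.
    static String fixedPath(String userDir) {
        return userDir + SUFFIX;
    }

    // Hypothetical variant: strip a trailing "bin" directory only when it is present.
    static String tolerantPath(String userDir) {
        Path dir = Paths.get(userDir);
        Path last = dir.getFileName();
        if (last != null && last.toString().equals("bin") && dir.getParent() != null) {
            dir = dir.getParent();
        }
        return dir.resolve("plugins").resolve("analysis-hanlp").resolve("hanlp.properties").toString();
    }

    public static void main(String[] args) {
        for (String userDir : new String[] {"/opt/elasticsearch-5.5.1", "/opt/elasticsearch-5.5.1/bin"}) {
            System.out.println("user.dir = " + userDir);
            System.out.println("  original: " + originalPath(userDir));
            System.out.println("  fixed:    " + fixedPath(userDir));
            System.out.println("  tolerant: " + tolerantPath(userDir));
        }
    }
}
```

With user.dir set to /opt/elasticsearch-5.5.1, the original computation yields /opt/elasticsearch-5/plugins/analysis-hanlp/hanlp.properties, because the four stripped characters are ".5.1" rather than "/bin". Conversely, with user.dir ending in /bin, the corrected line yields a path under bin/, so it only works when user.dir is $ES_HOME.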
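Because the plugin uses HanLP 1.2.8 while the latest version at the time of writing is 1.5.4, change the HanLP dependency in pom.xml to:
<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.5.4</version>
</dependency>
Then compile and package the plugin.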
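Under $ES_HOME/plugins, create an analysis-hanlp directory.
Inside it, put a hanlp.properties file like the one below (you can take it straight from the release package and only change the root path):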
root=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp
CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt
BiGramDictionaryPath=data/dictionary/CoreNatureDictionary.ngram.txt
CoreStopWordDictionaryPath=data/dictionary/stopwords.txt
CoreSynonymDictionaryDictionaryPath=data/dictionary/synonym/CoreSynonym.txt
PersonDictionaryPath=data/dictionary/person/nr.txt
PersonDictionaryTrPath=data/dictionary/person/nr.tr.txt
tcDictionaryRoot=data/dictionary/tc
CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; 现代汉语补充词库.txt; 全国地名大全.txt ns; 人名词典.txt; 机构名词典.txt; 上海地名.txt ns;data/dictionary/person/nrf.txt nrf;
CRFSegmentModelPath=data/model/segment/CRFSegmentModel.txt
HMMSegmentModelPath=data/model/segment/HMMSegmentModel.bin
ShowTermNature=true
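As described in the comments of HanLP's bundled hanlp.properties, root plus each relative path gives the full path, which is why data/ must sit under the plugin directory. The same comments describe CustomDictionaryPath as a ';'-separated list in which an entry starting with a space lives in the same directory as the previous full path, and a trailing word such as ns sets that dictionary's default part of speech. The sketch below mimics those rules with java.util.Properties; it is a hypothetical re-implementation for illustration (class and method names are made up), not HanLP's actual loading code, and it assumes file names contain no spaces.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration of how root and the relative paths combine; not HanLP's loader.
class HanLPConfigSketch {
    // One custom dictionary: full path plus optional default part of speech (null if none).
    static final class CustomDict {
        final String path;
        final String nature;
        CustomDict(String path, String nature) {
            this.path = path;
            this.nature = nature;
        }
    }

    // root + relative path, inserting "/" when root does not already end with one.
    static String resolve(String root, String relative) {
        return (root.endsWith("/") ? root : root + "/") + relative;
    }

    // Parse a CustomDictionaryPath value: ';'-separated entries; a leading space means
    // "same directory as the previous full path"; "file nature" sets a default nature.
    static List<CustomDict> parseCustomDictionaryPath(String root, String value) {
        List<CustomDict> result = new ArrayList<>();
        String dir = resolve(root, "");
        for (String raw : value.split(";")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue;
            }
            String nature = null;
            int cut = entry.lastIndexOf(' ');
            if (cut > 0) {
                nature = entry.substring(cut + 1);
                entry = entry.substring(0, cut);
            }
            String full;
            if (raw.startsWith(" ")) {
                full = dir + entry; // reuse the previous directory
            } else {
                full = resolve(root, entry); // relative to root
                dir = full.substring(0, full.lastIndexOf('/') + 1);
            }
            result.add(new CustomDict(full, nature));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String config = "root=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp\n"
                + "CoreDictionaryPath=data/dictionary/CoreNatureDictionary.txt\n"
                + "CustomDictionaryPath=data/dictionary/custom/CustomDictionary.txt; 全国地名大全.txt ns;"
                + "data/dictionary/person/nrf.txt nrf;\n";
        Properties p = new Properties();
        p.load(new StringReader(config));
        String root = p.getProperty("root");
        System.out.println(resolve(root, p.getProperty("CoreDictionaryPath")));
        for (CustomDict d : parseCustomDictionaryPath(root, p.getProperty("CustomDictionaryPath"))) {
            System.out.println(d.path + (d.nature == null ? "" : "  (default nature: " + d.nature + ")"));
        }
    }
}
```

Under these rules, the last entry in the configuration above (data/dictionary/person/nrf.txt nrf) has no leading space, so it is resolved against root rather than the custom/ directory.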
Create plugin-descriptor.properties and plugin-security.policy in the same directory, adapting the ones shipped in the elasticsearch-analysis-hanlp release package.
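When adapting plugin-descriptor.properties, the field most likely to need changing is elasticsearch.version: Elasticsearch 5.x refuses to load a plugin whose declared version does not match the running node, so a descriptor copied from a release built for another version has to be edited to 5.5.1. The sketch below is a hypothetical pre-flight check, not an Elasticsearch API; the list of required keys is an assumption about the 5.x descriptor format, and the values in main are placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for plugin-descriptor.properties; not an Elasticsearch API.
class DescriptorCheckSketch {
    // Keys assumed to be required by the Elasticsearch 5.x descriptor format.
    static final String[] REQUIRED = {
        "name", "description", "version", "classname", "java.version", "elasticsearch.version"
    };

    // Returns a list of problems; empty if the descriptor looks usable for targetEsVersion.
    static List<String> check(Properties descriptor, String targetEsVersion) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String v = descriptor.getProperty(key);
            if (v == null || v.trim().isEmpty()) {
                problems.add("missing " + key);
            }
        }
        String es = descriptor.getProperty("elasticsearch.version");
        if (es != null && !es.trim().isEmpty() && !es.trim().equals(targetEsVersion)) {
            problems.add("elasticsearch.version is " + es.trim() + ", expected " + targetEsVersion);
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        // Placeholder descriptor; classname and version values are made up.
        p.load(new StringReader(
                "name=analysis-hanlp\n"
                + "description=HanLP analyzer\n"
                + "version=1.0\n"
                + "classname=com.example.HypotheticalPlugin\n"
                + "java.version=1.8\n"
                + "elasticsearch.version=5.4.0\n"));
        System.out.println(check(p, "5.5.1"));
    }
}
```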
Adjust the Elasticsearch startup options, then start Elasticsearch. Edit jvm.options:
vim /opt/elasticsearch-5.5.1/config/jvm.options
and add the line:
-Djava.security.policy=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp/plugin-security.policy
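If this step is scripted, the flag should be added idempotently so that repeated runs do not duplicate it. A minimal sketch using only java.nio.file; the class name is hypothetical, and the default paths are the ones used in this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: append the security-policy flag to jvm.options exactly once.
class JvmOptionsSketch {
    static final String FLAG =
            "-Djava.security.policy=/opt/elasticsearch-5.5.1/plugins/analysis-hanlp/plugin-security.policy";

    // Returns true if the flag was appended, false if an identical line already existed.
    static boolean ensureFlag(Path jvmOptions, String flag) throws IOException {
        String content = new String(Files.readAllBytes(jvmOptions), StandardCharsets.UTF_8);
        for (String line : content.split("\n")) {
            if (line.trim().equals(flag)) {
                return false; // already configured
            }
        }
        // Start on a fresh line if the file does not end with a newline.
        String prefix = (!content.isEmpty() && !content.endsWith("\n")) ? "\n" : "";
        Files.write(jvmOptions, (prefix + flag + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path jvmOptions = Paths.get(args.length > 0 ? args[0] : "/opt/elasticsearch-5.5.1/config/jvm.options");
        System.out.println(ensureFlag(jvmOptions, FLAG) ? "flag added" : "flag already present");
    }
}
```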
To test whether the installation succeeded, run:
GET /_analyze?analyzer=hanlp-index&pretty=true
{ "text":"公安部:各地校车将享最高路权"}
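The same check can be run from plain Java. This is a minimal sketch with java.net.HttpURLConnection, assuming the node listens on http://localhost:9200 without authentication; it posts an equivalent request with both analyzer and text in the JSON body, which the 5.x _analyze API accepts, and prints the response. The class name and the hand-rolled JSON escaping are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the _analyze smoke test; assumes an unsecured node on localhost:9200.
class AnalyzeSmokeTest {
    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":" + jsonString(analyzer) + ",\"text\":" + jsonString(text) + "}";
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://localhost:9200/_analyze?pretty=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        byte[] body = analyzeBody("hanlp-index", "公安部:各地校车将享最高路权").getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        System.out.println("HTTP " + status);
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return; // no response body to print
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        in.close();
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```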
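The data dictionary files only need to be downloaded and unzipped into the root directory configured in hanlp.properties.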
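Before restarting Elasticsearch, it is worth checking that the unzipped data directory really ended up under root, since the paths in hanlp.properties are resolved against it. A hypothetical sanity check (names are illustrative) that reads hanlp.properties, resolves every single-file key ending in Path against root, and lists the files that do not exist; CustomDictionaryPath is skipped because it is a list. It is only a rough check, not a replacement for watching the Elasticsearch log.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sanity check: which single-file *Path entries point at files that do not exist?
class DataFilesCheckSketch {
    static List<String> missingFiles(Properties config) {
        String root = config.getProperty("root", "");
        List<String> missing = new ArrayList<>();
        for (String key : new TreeSet<>(config.stringPropertyNames())) {
            // CustomDictionaryPath is a ';'-separated list, so it is skipped here.
            if (!key.endsWith("Path") || key.equals("CustomDictionaryPath")) {
                continue;
            }
            Path file = Paths.get(root, config.getProperty(key).trim());
            if (!Files.isRegularFile(file)) {
                missing.add(key + " -> " + file);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path props = Paths.get(args.length > 0 ? args[0]
                : "/opt/elasticsearch-5.5.1/plugins/analysis-hanlp/hanlp.properties");
        Properties config = new Properties();
        try (Reader r = Files.newBufferedReader(props, StandardCharsets.UTF_8)) {
            config.load(r);
        }
        List<String> missing = missingFiles(config);
        System.out.println(missing.isEmpty() ? "all dictionary/model files found" : "missing: " + missing);
    }
}
```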
This article originally appeared on 小白鸽's blog.
Reposted from: http://hzonl.baihongyu.com/