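The previous article covered the IK Chinese analyzer. The actual goal is for searches in either pinyin or Chinese to return results, similar to the input suggestions in the Baidu and Taobao search boxes.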
1, Install and configure analysis-pinyin
# Download
$ git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
$ cd elasticsearch-analysis-pinyin
$ git branch -a
* master    # master is at 6.2.3, matching ES 6.2.3
  remotes/origin/0.16.x
  remotes/origin/1.x
  remotes/origin/2.x
  remotes/origin/5.3.x
  remotes/origin/5.x
  remotes/origin/6.1.x
  remotes/origin/HEAD -> origin/master
  remotes/origin/master
$ mvn package    # build the plugin
$ ll target/releases/
total 4400
drwxr-xr-x  3 zhangying staff     102  4 24 13:46 ./
drwxr-xr-x 11 zhangying staff     374  4 24 13:32 ../
-rw-r--r--  1 zhangying staff 4501993  4 24 13:32 elasticsearch-analysis-pinyin-6.2.3.zip
$ cd target/releases/ && unzip elasticsearch-analysis-pinyin-6.2.3.zip
$ brew info elasticsearch
elasticsearch: stable 6.2.3, HEAD
Distributed search & analytics engine
https://www.elastic.co/products/elasticsearch
/usr/local/Cellar/elasticsearch/6.2.3 (112 files, 30.8MB) *
  Built from source on 2018-04-24 at 14:17:01
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/elasticsearch.rb
==> Requirements
Required: java = 1.8 ✔
==> Options
--HEAD
    Install HEAD version
==> Caveats
Data:    /usr/local/var/lib/elasticsearch/elasticsearch_zhangying/
Logs:    /usr/local/var/log/elasticsearch/elasticsearch_zhangying.log
Plugins: /usr/local/var/elasticsearch/plugins/    # plugin directory
Config:  /usr/local/etc/elasticsearch/
To have launchd start elasticsearch now and restart at login:
  brew services start elasticsearch
Or, if you don't want/need a background service you can just run:
  elasticsearch
# Move the unzipped plugin (built by mvn) into the ES plugin directory
$ mv elasticsearch /usr/local/var/elasticsearch/plugins/pinyin
$ elasticsearch    # start ES
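Before moving on, it may be worth confirming that the node actually loaded the plugin. The sketch below is not part of the original setup. It assumes Elasticsearch listens on localhost:9200 and that the plugin registers itself under the name analysis-pinyin; has_plugin is a hypothetical helper name. It relies on the _cat/plugins endpoint, which prints one installed plugin per line with the plugin name in the second column.

```shell
#!/bin/sh
# has_plugin NAME: read `_cat/plugins` output on stdin and succeed
# if some line lists NAME in its second (component) column.
has_plugin() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage against a running node (assumed to be at localhost:9200;
# the plugin name analysis-pinyin is an assumption):
#   curl -s 'http://localhost:9200/_cat/plugins' | has_plugin analysis-pinyin \
#     && echo "pinyin plugin loaded"
```

If the check fails, the plugin directory and the ES log paths printed by brew info are the first places to look.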
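2, Testing the pinyin analyzer
2.1, Testing tokenization
These requests go through the pinyin index, so that index must already exist (it is created in 2.2).
$ curl -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' -H 'Content-Type: application/json' -d '
{
"analyzer":"pinyin",
"text":"gaotie"
}'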
{
"tokens" : [
{
"token" : "gao",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "gaotie",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "tie",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 1
}
]
}
$ curl -XPOST 'http://localhost:9200/pinyin/_analyze?pretty=true' -H 'Content-Type: application/json' -d '
{
"analyzer":"pinyin",
"text":"高铁"
}'
{
"tokens" : [
{
"token" : "gao",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "gt",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 0
},
{
"token" : "tie",
"start_offset" : 0,
"end_offset" : 0,
"type" : "word",
"position" : 1
}
]
}
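As the output shows, the pinyin analyzer can tokenize both pinyin input and Chinese text, and the two produce different tokens: "gaotie" yields gao, gaotie and tie, while "高铁" yields gao, gt and tie.
2.2, Create the index and mapping, insert data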
curl -XPUT "http://127.0.0.1:9200/pinyin?pretty"
curl -XPOST "http://127.0.0.1:9200/pinyin/test/_mapping?pretty" -H "Content-Type: application/json" -d '
{
"test": {
"_all":{
"enabled":false
},
"properties": {
"id": {
"type": "integer"
},
"username": {
"type": "text",
"analyzer": "pinyin"
},
"description": {
"type": "text",
"analyzer": "pinyin"
}
}
}
}
'
curl -XPOST "http://127.0.0.1:9200/pinyin/test/?pretty" -H "Content-Type: application/json" -d '
{
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}'
curl -XPOST "http://127.0.0.1:9200/pinyin/test/?pretty" -H "Content-Type: application/json" -d '
{
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}'
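With more than a handful of documents, the one-request-per-document approach above gets tedious. As a minimal sketch, assuming the same local node, index and type, the documents could be sent through the _bulk API instead, which expects newline-delimited JSON: an action line followed by the document itself. The helper name bulk_lines is hypothetical.

```shell
#!/bin/sh
# bulk_lines: read one JSON document per line on stdin and emit the
# newline-delimited body expected by _bulk: an action line, then the document.
bulk_lines() {
  while IFS= read -r doc; do
    [ -n "$doc" ] || continue          # skip blank lines
    printf '{"index":{}}\n%s\n' "$doc"
  done
}

# Usage against a local node (assumed at 127.0.0.1:9200, index and type as above):
#   printf '%s\n' \
#     '{"id":1,"username":"中国高铁速度很快","description":"如果要修改一个字段的类型"}' \
#     '{"id":2,"username":"动车和复兴号,都属于高铁","description":"现在想要修改为string类型"}' \
#   | bulk_lines \
#   | curl -s -XPOST 'http://127.0.0.1:9200/pinyin/test/_bulk?pretty' \
#       -H 'Content-Type: application/x-ndjson' --data-binary @-
```

Using --data-binary rather than -d matters here, because -d strips the newlines that separate the bulk lines.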
2.3, Full pinyin search
$ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "gao tie"
}
}
}
'
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.4039931,
"hits" : [
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TGZ2AWMBlEkarXCPb7ED",
"_score" : 0.4039931,
"_source" : {
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}
},
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TWZ2AWMBlEkarXCPb7En",
"_score" : 0.35767543,
"_source" : {
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}
}
]
}
}
2.3, Pinyin analyzer, searching with Chinese characters
$ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "中国高铁"
}
}
}
'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.9398875,
"hits" : [
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TGZ2AWMBlEkarXCPb7ED",
"_score" : 1.9398875,
"_source" : {
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}
},
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TWZ2AWMBlEkarXCPb7En",
"_score" : 0.35767543,
"_source" : {
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}
}
]
}
}
2.4, Partial first-letter search
$ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "Gaot"
}
}
}
'
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.20199655,
"hits" : [
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TGZ2AWMBlEkarXCPb7ED",
"_score" : 0.20199655,
"_source" : {
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}
},
{
"_index" : "pinyin",
"_type" : "test",
"_id" : "TWZ2AWMBlEkarXCPb7En",
"_score" : 0.17883772,
"_source" : {
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}
}
]
}
}
// This query returns the same result as above
$ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "gtie"
}
}
}
'
2.5, Full first-letter search
$ curl -XPOST "http://127.0.0.1:9200/pinyin/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "gt"
}
}
}
'
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Searching for gt, the full first-letter abbreviation of 高铁, returns nothing.
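One plausible reason, which the post does not spell out: with its default settings the pinyin analyzer appears to build the first-letter token from the whole field value, so 中国高铁速度很快 would be indexed with one long abbreviation rather than a separate gt. The _analyze API used in 2.1 can check this. The helper names analyze_body and analyze below are hypothetical, and the host and index are taken from the earlier examples.

```shell
#!/bin/sh
# analyze_body ANALYZER TEXT: print the JSON body for an _analyze request.
# TEXT is inserted verbatim, so it must not contain double quotes or backslashes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# analyze INDEX ANALYZER TEXT: send the request to a local node
# (assumed to be at localhost:9200).
analyze() {
  curl -s -XPOST "http://localhost:9200/$1/_analyze?pretty=true" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}

# Usage: inspect the tokens produced for the first document's username.
#   analyze pinyin pinyin '中国高铁速度很快'
```

If the token list contains no gt, the empty result above is what a match query should be expected to return.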
3, Combining the pinyin and Chinese analyzers
3.1, Define a custom analyzer and configure the filter
$ curl -XPUT "http://localhost:9200/pinyin_ik/?pretty" -H "Content-Type: application/json" -d'
{
"index": {
"analysis": {
"analyzer": {
"ik_pinyin_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": ["my_pinyin", "word_delimiter"]
}
},
"filter": {
"my_pinyin": {
"type": "pinyin"
}
}
}
}
}'
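The my_pinyin filter above relies on the plugin's defaults. The plugin's README documents several switches for the pinyin token filter, among them keep_first_letter, keep_full_pinyin, keep_original and lowercase. The sketch below only shows where such options would go. The index name pinyin_ik_opts, the helper name pinyin_settings and the chosen values are illustrative assumptions, not settings tested in this post, and the body uses the "settings" wrapper of the create-index API rather than the "index" key used above.

```shell
#!/bin/sh
# pinyin_settings: print index settings with explicit pinyin filter options.
# The option values are illustrative, not recommendations.
pinyin_settings() {
  cat <<'EOF'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": true,
          "keep_original": true,
          "lowercase": true
        }
      }
    }
  }
}
EOF
}

# Usage (hypothetical index name, local node assumed):
#   pinyin_settings | curl -s -XPUT 'http://localhost:9200/pinyin_ik_opts?pretty' \
#     -H 'Content-Type: application/json' --data-binary @-
```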
$ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/_mapping?pretty" -H "Content-Type: application/json" -d '
{
"test": {
"_all":{
"enabled":false
},
"properties": {
"id": {
"type": "integer"
},
"username": {
"type": "text",
"analyzer": "ik_pinyin_analyzer"
},
"description": {
"type": "text",
"analyzer": "ik_pinyin_analyzer"
}
}
}
}
'
$ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/?pretty" -H "Content-Type: application/json" -d '
{
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}'
$ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/?pretty" -H "Content-Type: application/json" -d '
{
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}'
3.2, Full first-letter search
$ curl -XPOST "http://127.0.0.1:9200/pinyin_ik/test/_search?pretty" -H "Content-Type: application/json" -d '
{
"query": {
"match": {
"username": "gt"
}
}
}
'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.6935897,
"hits" : [
{
"_index" : "pinyin_ik",
"_type" : "test",
"_id" : "S2ZzAWMBlEkarXCPu7Hq",
"_score" : 0.6935897,
"_source" : {
"id" : 2,
"username" : "动车和复兴号,都属于高铁",
"description" : "现在想要修改为string类型"
}
},
{
"_index" : "pinyin_ik",
"_type" : "test",
"_id" : "SmZzAWMBlEkarXCPubHw",
"_score" : 0.6827974,
"_source" : {
"id" : 1,
"username" : "中国高铁速度很快",
"description" : "如果要修改一个字段的类型"
}
}
]
}
}
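To compare the two indices side by side, the same match query can be sent to both. This is a minimal sketch, assuming a local node at 127.0.0.1:9200 and the pinyin and pinyin_ik indices created earlier; match_body and search_username are hypothetical helper names.

```shell
#!/bin/sh
# match_body FIELD QUERY: print a match query body.
# QUERY is inserted verbatim, so it must not contain double quotes or backslashes.
match_body() {
  printf '{"query":{"match":{"%s":"%s"}}}' "$1" "$2"
}

# search_username INDEX QUERY: run the match query against INDEX/test.
search_username() {
  curl -s -XPOST "http://127.0.0.1:9200/$1/test/_search?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(match_body username "$2")"
}

# Usage: the results in this post suggest no hits for the first call
# and two hits for the second.
#   search_username pinyin gt
#   search_username pinyin_ik gt
```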
Please credit the source when reposting.
Author: 海底苍鹰
URL: http://blog.51yip.com/server/1894.html