ElasticSearch 实战
2. 深入功能
es中的数据是如何组织起来的
逻辑设计: 搜索应用所要注意的
- 用于索引和搜索的基本单位是文档, 可以将其理解为数据库的一行
- 文档以类型来分组, 类型包含若干文档, 类似表格包含若干行
- 一个或多个类型存在同一索引里, 类似SQL的数据库
物理设计: Es是如何处理数据的
- ES将每个索引划分片, 每份分片可以在集群中的不同服务间迁移
2.1 理解逻辑设计: 文档、类型和索引
2.1.1 文档
文档的重要属性
- 文档是自我包含的. 一篇文档同时包含字段(如name), 和它的取值(如es denver)
- 文档是层次型的. 文档中还可以包含文档
- 文档拥有灵活的结构. 文档并不依赖预先定义的模式, 例如并非所有的活动都需要描述这个字段值, 所以可以彻底忽略该字段
2.1.2 类型
类型是文档的逻辑容器, 类似于表格是行的容器
在不同类型中, 最好放入不同的结构的文档
例如可以一个类型定义聚会时的分组, 一个类型定义人们参加的活动
每个类型中字段的定义称为映射, 例如name的字段称为string
而location中的geolocation字段映射是geo_point
每个映射的搜索处理方式不一样
例如string映射可以搜索关键字, 而geo_point映射可以搜索哪个分组离你更近
如果一篇新旧索引的文档拥有一个映射尚不存在的字段, es会自动将新字段加入映射
es会对字段进行猜测映射, 例如值是7 就会猜测是长整型
假如索引了值7之后, 可能再想索引hello world, 索引就会失败
对于线上的环境, 最安全的方式是在索引数据之前, 就定义好好所需的映射
2.1.3 索引
索引是映射类型的容器, 一个es索引非常像关系型世界的数据库, 是独立的大量文档集合
每个索引存储在磁盘的同组文件中
所以存储了所有映射类型的字段, 还有一些设置
例如refresh_interval设置, 定义新近索引的文档对于搜索可见的时机间隔
刷新操作比较昂贵, 默认是每秒更新一次, 而不是每来一篇新的文档更新一次
可以设置某个索引的分片的数量, 索引是由一个/多个分片的数据块组成
2.2 理解物理设置: 节点和分片
一个索引创建时, 默认每个索引由5个主要分片组成, 而每个主要分片又有一个副本, 一共10份分片
一个分片是一个目录中的文件, 分片也是es将数据从一个节点移到另一个节点的最小单位
默认情况下, 可以连接集群中的任一节点并访问完整的数据集, 就好像集群只有单独的一个节点
当索引一篇文档时发生了什么
- 系统根据文档id的散列值选择一个主分片, 并把文档发送到该主分片
- 这份主分片可能存在位于另一个节点
- 然后文档被发送到该主分片的所有副本分片进行索引
- 在主分片无法访问时, 副分片自动升级为主分片
搜索索引时发生了什么
es需要在该索引的完整分片集合中进行查找, 这些分片可以是主分片/副分片
2.2.2 理解主分片和副本分片
es所处理的最小单元: 分片
一个分片是Lucene的索引
一个包含倒排索引的文件目录
倒排索引的结构使得es不扫描所有的文档情况下, 就能告知你哪些文档包含特定的词条
副本分片可以在运行时进行添加和删除, 而主分片不行
在创建索引之前, 必须决定分片的数量
过少的分片限制可扩展性, 过多的分片会影响性能, 默认为5是个不错的开始
2.3 索引新数据
手动索引第一篇文档, 使用put请求
http://location:9200/get-together/group/1
- get-together为索引名称
- group为类型的名称
- 1位文档id
2.3.1 通过cURL索引一篇文档
curl -XPUT -u elastic:xxxxxxx  -H "Content-Type: application/json"  'http://es-cn-xxxxxxx.elasticsearch.aliyuncs.com:9200/mzk-es-action/group/1?pretty' -d '{
    "name": "es denver",
    "organizer": "lee"
}'
加pretty的原因是为了得到格式化的返回
返回值
{
  "_index" : "mzk-es-action",
  "_type" : "group",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
2.3.2 创建索引和映射类型
上一节的命令为何会成功?
- 索引之前不存在: 并未发送任何命令来创建一个叫做mzk-es-action的索引
- 映射之前未定义: 每一定义任何称为group的映射类型来刻画文档中的字段
手动创建索引
curl -XPUT -u elastic:xxxxxxx  -H "Content-Type: application/json"  'http://es-cn-xxxxxxx.elasticsearch.aliyuncs.com:9200/mzk-es-action2'
{"acknowledged":true,"shards_acknowledged":true,"index":"mzk-es-action2"}%
获取映射
curl -XGET -u elastic:xxxxxxx  -H "Content-Type: application/json"  'http://es-cn-xxxxxxx.elasticsearch.aliyuncs.com:9200/mzk-es-action/_mapping/group?pretty'
{
  "mzk-es-action" : {
    "mappings" : {
      "group" : {
        "properties" : {
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "organizer" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}
2.4 搜索并获取数据
curl "localhost:9200/get-together/group/_search?\
q=elasticsearch\
&_source=name,location_group\
&size=1\
&pretty"
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0238004,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0238004,
        "_source" : {
          "location_group" : "Denver, Colorado, USA",
          "name" : "Elasticsearch Denver"
        }
      }
    ]
  }
}
- q=elasticsaerch: 查找包含elasticsearch的文档
- fields=name,location, 只需要name和location的信息
- size=1: 只需要排名靠前的
- pretty: 格式化json
要想搜索所有, 可以使用_all
搜索的3个内容
- 在哪里搜索
- 回复什么内容
- 搜索什么以及如何搜索
2.4.1 在哪里搜索
可以在同一索引的多个字段进行搜索, 也可以在多个索引中搜索
多个类型中搜索, 使用逗号分隔
curl "localhost:9200/get-together/_doc/_search\
?q=elasticsearch&pretty"
在所有索引中的某个类型进行搜索
curl "localhost:9200/_all/event/_search"
日志事件经常以基于事件的索引来组织, 例如logs-2021-03-14, 这种设计意味着当天的搜索很热门
回复的内容
时间
{
  "took" : 6,
  "timed_out" : false,
}
took表示查询所花费的时间, 默认情况下time_out永远为false
除非在请求时待timeout参数,
curl "localhost:9200/get-together/_doc/_search\
?q=elasticsearch\
&timeout=3s\
&pretty"
这样只会返回3S内查询到的数据, 并且超过3S则time_out为true
分片
{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
}
命中统计数据
{
    "total" : {
        "value" : 10, // 总命中数
        "relation" : "eq"
    },
    "max_score" : 1.0238004, // 匹配最高得分
}
结果文档
{
     "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0238004,
        "_source" : {
          "location_group" : "Denver, Colorado, USA",
          "name" : "Elasticsearch Denver"
        }
      }
    ]
}
2.4.3 如何搜索
设置查询的字符串选项
curl 'localhost:9200/get-together/_search?pretty'  -H "Content-Type: application/json" -d '{
  "query": {
      "query_string": {
          "query": "elasticsearch san francisco",
          "default_field": "name",
          "default_operator": "AND"
      }
  }
}'
获取同样结果的另一种写法:"query": "name:elasticsearch AND name:san AND name:francisco"
es默认人_all字段里查询, 如果想在分组的名称里查询这样指定"default_field": "name"
使用过滤器
如果对得分结果不感兴趣, 只关心是否有一条结果匹配条件
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
      "bool": {
          "filter": {
              "term": {
                  "name": "elasticsearch"
              }
          }
      }
  }
}'
返回的结果跟查询相同, 但没有根据得分来排序
应用聚集
查询统计数
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "aggregations": {
      "organizers": {
          "terms": { "field": "organizer" }
      }
  }
}'
返回异常
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [organizer] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    ...
  "status" : 400
}
需要给该字段增加属性
curl -X PUT "localhost:9200/get-together/_mapping/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "organizer": {
      "type": "text",
      "fielddata": true
    }
  }
}
'
{
  "acknowledged" : true
}
然后再次进行查询可得
{
  "took" : 227,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Denver Clojure",
          "organizer" : [
            "Daniel",
            "Lee"
          ],
          "description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
          "created_on" : "2012-06-15",
          "tags" : [
            "clojure",
            "denver",
            "functional programming",
            "jvm",
            "java"
          ],
          "members" : [
            "Lee",
            "Daniel",
            "Mike"
          ],
          "location_group" : "Denver, Colorado, USA"
        }
      }
      ...
    ]
  },
  "aggregations" : {
    "organizers" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "lee",
          "doc_count" : 2
        },
        {
          "key" : "andy",
          "doc_count" : 1
        },
        {
          "key" : "daniel",
          "doc_count" : 1
        },
        {
          "key" : "mik",
          "doc_count" : 1
        },
        {
          "key" : "tyler",
          "doc_count" : 1
        }
      ]
    }
  }
}
2.6 在集群中加入节点
查询分片信息
 curl  "localhost:9200/_cat/shards?v"
index                  shard prirep state      docs  store ip         node
get-together           1     p      STARTED       4 11.6kb 172.17.0.2 163bef384184
get-together           1     r      UNASSIGNED
get-together           0     p      STARTED      16 20.2kb 172.17.0.2 163bef384184
get-together           0     r      UNASSIGNED
myindex                0     p      STARTED       0   208b 172.17.0.2 163bef384184
myindex                0     r      UNASSIGNED
december_2014_invoices 0     p      STARTED       0   208b 172.17.0.2 163bef384184
december_2014_invoices 0     r      UNASSIGNED
.geoip_databases       0     p      STARTED      42 40.6mb 172.17.0.2 163bef384184
november_2014_invoices 0     p      STARTED       0   208b 172.17.0.2 163bef384184
november_2014_invoices 0     r      UNASSIGNED
3. 索引、更新和删除数据
本章重点介绍下面3种类型的字段
- 核心: 这些字段暴扣字符串和数值型
- 数组和多元字段: 这些字段在某个字段里存储相同核心类型的多个值
- 预定义: 自卸字段包括_ttl和_timestamp
可以使用_ttl字段让过期的文档自动被删除
3.1 使用映射来定义各种文档
3.1.1 检索和定义映射
> 获取目前映射
{
  "get-together" : {
    "mappings" : {
      "properties" : {
        "attendees" : {
          "type" : "text",
          "fields" : {
            "verbatim" : {
              "type" : "keyword"
            }
          }
        },
        "created_on" : {
          "type" : "date",
          "format" : "yyyy-MM-dd"
        },
        "date" : {
          "type" : "date",
          "format" : "date_hour_minute"
        },
        "description" : {
          "type" : "text",
          "term_vector" : "with_positions_offsets"
        },
        "host" : {
          "type" : "text"
        },
        "location_event" : {
          "properties" : {
            "geolocation" : {
              "type" : "geo_point"
            },
            "name" : {
              "type" : "text"
            }
          }
        },
        "location_group" : {
          "type" : "text"
        },
        "members" : {
          "type" : "text"
        },
        "name" : {
          "type" : "text"
        },
        "organizer" : {
          "type" : "text",
          "fielddata" : true
        },
        "relationship_type" : {
          "type" : "join",
          "eager_global_ordinals" : true,
          "relations" : {
            "group" : "event"
          }
        },
        "reviews" : {
          "type" : "integer",
          "null_value" : 0
        },
        "tags" : {
          "type" : "text",
          "fields" : {
            "verbatim" : {
              "type" : "keyword"
            }
          }
        },
        "title" : {
          "type" : "text"
        }
      }
    }
  }
}
索引一篇新的文档
curl -X PUT "localhost:9200/get-together/_doc/1" -H"Content-Type: application/json"  -d'
{
    "name": "Late Night with Elasticsearch",
    "date": "2013-10-25T19:00"
}
'
{"_index":"get-together","_type":"_doc","_id":"1","_version":2,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":16,"_primary_term":1
定义新的映射
curl -X PUT "localhost:9200/get-together/_mapping/_doc" -H"Content-Type: application/json"  -d'
{
    "_doc": {
        "properties": {
            "host": {
                "type": "text"
            }
        }
    }
}
'
{"acknowledged":true}
3.1.2 扩展现有的映射
无法改变现有字段的数据类型
也通常无法改变一个字段被索引的方式
唯一解决这个问题的办法是
- 移除索引里的数据
- 设置新的映射
- 再次索引所有的数据
3.2 用于定义文档字段的核心类型
| 核心类型 | 取值示例 | 
|---|---|
| 字符串 | 'lee' | 
| 数值 | 12, 3.2 | 
| 日期 | 2021-11-11T10:02:26.231+01:00 | 
| 布尔 | true或false | 
3.2.1 字符串类型
假如在索引字符串late night with elasticsearch
那么会生成4个词条late, night, with, elasticsearch
可以指定字段的索引方式
- true: 被索引
- false: 不被索引
curl -X PUT "localhost:9200/get-together/_mapping/_doc" -H"Content-Type: application/json"  -d'
{
    "_doc": {
        "properties": {
            "host": {
                "type": "text",
                "index": true
            }
        }
    }
}
'
3.2.2 数值类型
es自动检测映射更为安全, 为整数值分配long, 为浮点数值分配double
3.2.3 日期类型
curl "localhost:9200/get-together/_mapping/_doc?pretty"
{
  "get-together" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "created_on" : {
            "type" : "date",
            "format" : "yyyy-MM-dd"
          }
        }
      }
    }
  }
}
3.3 数组和多字段
3.3.1 数组
curl -X PUT "localhost:9200/get-together/_doc/22" -H"Content-Type: application/json"  -d'
{
    "tags_22": ["first", "initial"]
}
'
{"_index":"get-together","_type":"_doc","_id":"22","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":4,"_primary_term":1
查看到的映射是
{
    "tags_22" : {
        "type" : "text",
        "fields" : {
            "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
            }
        }
    }
}
3.4.1 控制如何存储和搜索文档
存储原有内容的_source
curl "localhost:9200/get-together/_doc/1?pretty"
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 16,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Late Night with Elasticsearch",
    "date" : "2013-10-25T19:00"
  }
}
仅返回源文档的某些字段
curl "localhost:9200/get-together/_doc/1?pretty&_source=name"
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "_seq_no" : 16,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "Late Night with Elasticsearch"
  }
}
3.4.2识别文档
为文档提供ID
不指定id时, es将自动生成id
curl -X POST "localhost:9200/get-together/_doc" -H"Content-Type: application/json"  -d'
{
    "name": "xxxxxx"
}
'
{"_index":"get-together","_type":"_doc","_id":"PPqkT3sBnVKKHVdH6BxL","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":17,"_primary_term":1}%
3.5 更新现有文档
3.5.1 使用更新api
发送部分文档
curl -X POST "localhost:9200/get-together/_doc/22/_update" -H"Content-Type: application/json"  -d'
{
    "doc": {
            "name": "yyyyy",
    }
}
'
使用upsert来创建尚不存在的文档
为了处理文档不存在的使用可以使用upsert
curl -X POST "localhost:9200/get-together/_doc/33/_update?pretty" -H"Content-Type: application/json"  -d'
{
    "doc": {
            "name": "xxxx"
    },
    "upsert": {
        "name": "yyy",
        "organizer": "bbb"
    }
}
'
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "33",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 18,
  "_primary_term" : 1
}
查询添加的文档:
curl "localhost:9200/get-together/_doc/33?pretty"
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "33",
  "_version" : 1,
  "_seq_no" : 18,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "yyy",
    "organizer" : "zxc"
  }
}
通过脚本更新文档
默认的脚本语言是Groovy, 语法和java类似
curl -X POST "localhost:9200/get-together/_doc/33/_update?pretty" -H"Content-Type: application/json"  -d'
{
    "script": {
        "source": "ctx._source.price += params.price_diff",
        "params": {
                "price_diff": 20
        }
    }
}
'
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "33",
  "_version" : 4,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 21,
  "_primary_term" : 1
}
3.5.2 通过版本来实现并发控制
如果同一时刻多次更新都在执行, 可能会出现并发问题
es支持并发控制, 为每篇文档设置了一个版本号, 在最初的文档版本是1
当更新操作重新索引它时, 版本号会设置为2
当一个文档先被更新为版本2, 与此同时, 一个更新版本也设置为2, 则更新失败
这种并发控制称为乐观锁
因为它允许并行的操作并假设冲突是很少出现的, 出现的时候就抛出错误
冲突发生时自动重试更新操作, 通过
retry_on_conflict参数, 让es自动重试
curl -X POST "localhost:9200/get-together/_doc/33/_update?pretty&retry_on_conflict=3" -H"Content-Type: application/json"  -d'
{
    "script": {
        "source": "ctx._source.price = 3"
    }
}
'
索引文档的时候使用版本号
更新文档的另一个方法是不使用更新api, 而是在同一索引, 类型和id之处索引一个新的文档
如果认为的版本已经是8了,1 一个重新索引的请求应该是这样的
curl -XPUT "localhost:9200/get-together/_doc/33?version=8&pretty" -H"Content-Type: application/json"  -d'
{
    "price": 10
}
'
如果版本号对不上则会报错"[_doc][33]: version conflict, current version [8] is different than the one provided [3]
使用外部版本号
es一般版本号是每次更改后递增, 也可以指定version
curl -XPUT "localhost:9200/get-together/_doc/33?version=8&version_type=external&pretty" -H"Content-Type: application/json"  -d'
{
    "price": 10
}
'
这将使es接受任何版本号, 只要比现在高, 而且es不会自己增加版本号
3.6 删除数据
3.6.1 删除文档
删除单个文档
 curl -X DELETE "localhost:9200/get-together/_doc/33?pretty" -H"Content-Type: application/json"
{
  "_index" : "get-together",
  "_type" : "_doc",
  "_id" : "33",
  "_version" : 10,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 27,
  "_primary_term" : 1
}
可能会出现删除了文档, 但是由于更新操作重新创建了该文档
为了防止这个问题, es将在一段时间内保留这篇文档的版本, 如此它就能拒绝比删除操作更低的更新操作了, 默认情况是60s, 通过修改index.gc_deletes修改它
删除映射类型和删除查询匹配的文档
 curl -X DELETE "localhost:9200/get-together/_doc" -H"Content-Type: application/json"
或根据查询结果删除
 curl -X DELETE "localhost:9200/get-together/_query?1=es" -H"Content-Type: application/json"
3.6.2 删除单个索引
curl -X DELETE 'localhost:9200/get-together/'
可以设置action.destructive_requires_name:true, 来防止删除_all索引
3.6.3 关闭索引
curl -X POST "localhost:9200/get-together/_close" -H"Content-Type: application/json"
curl -X POST "localhost:9200/get-together/_open" -H"Content-Type: application/json"
4. 搜索数据
4.1 搜索请求的结构
4.1.1 确定搜索范围
curl "localhost:9200/_search" // 搜索整个集群
curl "localhost:9200/get-together/_search" //搜索get-together索引
curl "localhost:9200/get-together/event/_search" //搜索get-together索引的event类型
curl "localhost:9200/_all/event" //所有所有索引的event类型
curl "localhost:9200/get-together,other/event,group/_search"
curl "localhost:9200/+get-toge*, -get-together/_search" //搜索所有get-toge开头的索引, 但不包括get-together
还可以用别名来搜索多个索引, 例如logstash-yymmdd格式命名的索引, 一个logstash别名就可以指向所有相关索引
4.1.2 搜索请求的基本模块
- query: 查询DSL和过滤器DSL
- size: 返回文档的数量
- from: 从第XX条开始查
- _source: 文档的存储值
- sort: 默认按得分排序
基于URL的搜索请求
按时间倒序返回
curl "localhost:9200/get-together/_search?sort=date:asc"
4.1.3 基于请求主体的搜索请求
过滤字段
curl "localhost:9200/get-together/_search?pretty" -H"Content-Type: application/json"  -d'
{
    "query": {
        "match_all": {}
    },
    "_source": {
        "include": ["location.*", "date"],
        "exclude": ["location.geolocation"]
    }
}
'
结果排序
curl "localhost:9200/get-together/_search?pretty" -H"Content-Type: application/json"  -d'
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {"created_on": "asc" },
        {"name": "desc"},
        "_score"
    ]
}
'
4.2 介绍查询和过滤器DSL
4.2.1 match查询和term过滤器
match查询
curl  http://localhost:9200/get-together/_search  -H"Content-Type: application/json" -d '{
  "query": {
      "match": {
          "title": "hadoop"
      }
  }
}'
match查询的是给特定的词条打分, 而过滤器只是为文档是否匹配这个查询 返回简单的是或不是
使用过滤器查询
curl  http://localhost:9200/get-together/_search  -H"Content-Type: application/json" -d '{
  "query": {
      "bool": {
          "must": {
              "match": {
                "title": "hadoop"
            }
          },
          "filter": {
            "term": {
                "host": "andy"
            }
          }
      }
  }
}'
4.2.2 常用的基础查询和过滤器
query_string查询
默认情况下, query_string会查询_all字段
可以通过设置default_field来设置特定请求字段
curl  http://localhost:9200/get-together/_search  -H"Content-Type: application/json" -d '{
  "query": {
      "query_string": {
          "default_field": "description",
          "query": "nosql"
      }
  }
}'
query_string的更多用法:
- 查询所有nosql的分组, 但排除mongodb: name:nosql AND -description:mongodb
查询1999年到2001年期间创建的搜索和Lucene分组
{tag: search OR tag:lucene} AND created_on:[1991-01-01 TO 2001-01-01]
term查询和term过滤器
term为词条
词条查询
curl  http://localhost:9200/get-together/_doc/_search?pretty  -H"Content-Type: application/json" -d '{
  "query": {
      "term": {
          "tags": "elasticsearch"
      }
  },
  "_source": ["name", "tags"]
}'
词条过滤器
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
      "bool": {
          "filter": {
              "term": {
                  "name": "elasticsearch"
              }
          }
      }
  },
   "_source": ["name", "tags"]
}'
terms查询
curl  http://localhost:9200/get-together/_doc/_search?pretty  -H"Content-Type: application/json" -d '{
  "query": {
      "terms": {
          "tags": ["jvm", "hadoop"]
      }
  },
  "_source": ["name", "tags"]
}'
限制每篇文档中匹配词条的最小数量
curl  http://localhost:9200/get-together/_doc/_search?pretty  -H"Content-Type: application/json" -d '{
  "query": {
      "bool": {
          "minimum_should_match": 2,
          "should": [
              { "term": { "tags": "hadoop" } },
              { "term": { "tags": "data" } }
          ]
      }
  },
  "_source": ["name", "tags"]
}'
4.2.3 match查询和term过滤器
和term查询类似, match查询是一个散列映射, 包含希望搜索的字段和字符串
布尔查询行为
默认情况下, match查询使用布尔行为是OR操作符
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "match": {
         "name": {
             "query": "Elasticsearch Denver",
             "operator": "and"
         }
     }
  },
   "_source": ["name", "tags"]
}'
词组查询行为
在文档中查询特定的值, phrase查询非常有用
例如只记得enterprise和london两个词, 中间有多少个词分离不太记得了, 
就可以设置slop为1或2, 而不是默认的
 curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "match_phrase": {
         "name": {
             "query": "enterprise london",
             "slop": "1"
         }
     }
  },
   "_source": ["name", "tags"]
}'
{
  "took" : 313,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.5762693,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.5762693,
        "_source" : {
          "name" : "Enterprise search London get-together",
          "tags" : [
            "enterprise search",
            "apache lucene",
            "solr",
            "open source",
            "text analytics"
          ]
        }
      }
    ]
  }
}
4.2.4 phrase_prefix查询
和词组最后一个词条进行前缀匹配, 可以通过max_expansions来设置最大的前缀扩展数量
 curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "match_phrase_prefix": {
         "name": {
             "query": "elasticsearch den",
             "max_expansions": 1
         }
     }
  },
   "_source": ["name", "tags"]
}'
{
  "took" : 59,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 2.0598295,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.0598295,
        "_source" : {
          "name" : "Elasticsearch Denver",
          "tags" : [
            "denver",
            "elasticsearch",
            "big data",
            "lucene",
            "solr"
          ]
        }
      }
    ]
  }
}
使用multi_match来匹配多个字段
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "multi_match": {
        "query": "elasticsearch hadoop",
        "fields": ["name", "description"]
     }
  }
}'
4.3 组合查询和复合查询
4.3.1 bool查询
bool查询可以组合任意数量的查询, 指定哪些部分是必须的must, 应该should, 不能must_not
- must匹配: 只有匹配上查询的才会返回
- should: 只有匹配上指定数量的子句文档才会返回
- 如果没有指定must匹配, 文档至少要匹配一个should子句才返回
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "bool": {
         "must": [
             { "term": { "attendees": "david"} }
         ],
         "should": [
             { "term": { "attendees": "clint" } },
             { "term": { "attendees": "andy" } }
         ],
         "must_not": [
             {
                 "range": {
                    "date": {
                        "lt": "2013-06-30T00:00"
                    }
                }
             }
         ],
         "minimum_should_match": 1
     }
  }
}'
{
  "took" : 131,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 2.5546992,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "110",
        "_score" : 2.5546992,
        "_routing" : "4",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "4"
          },
          "host" : "Andy",
          "title" : "Big Data and the cloud at Microsoft",
          "description" : "Discussion about the Microsoft Azure cloud and HDInsight.",
          "attendees" : [
            "Andy",
            "Michael",
            "Ben",
            "David"
          ],
          "date" : "2013-07-31T18:00",
          "location_event" : {
            "name" : "Bing Boulder office",
            "geolocation" : "40.018528,-105.275806"
          },
          "reviews" : 1
        }
      }
    ]
  }
}
4.3.2 bool过滤器
过滤器和查询版本基本一致, 但在过滤器中,不支持minimum_should_match属性, 默认为1
4.4 超越match和过滤器查询
4.4.1 range查询器
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "range": {
         "created_on": {
             "gt": "2012-06-01",
             "lte": "2012-09-01"
         }
     }
  }
}'
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Elasticsearch San Francisco",
          "organizer" : "Mik",
          "description" : "Elasticsearch group for ES users of all knowledge levels",
          "created_on" : "2012-08-07",
          "tags" : [
            "elasticsearch",
            "big data",
            "lucene",
            "open source"
          ],
          "members" : [
            "Lee",
            "Igor"
          ],
          "location_group" : "San Francisco, California, USA"
        }
      }
    ]
  }
}
4.4.2 prefix查询和过滤器
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "prefix": {
         "title": "liber"
     }
  },
  "_source": ["title"]
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "title" : "Liberator and Immutant"
        }
      }
    ]
  }
}
4.4.3 wildcard查询
类似于shell里的正则ls *foo?ar
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "wildcard": {
         "title": "ba*n"
     }
  },
  "_source": ["title"]
}'
4.5 使用过滤器查询字段的存在性
4.5.1 exists过滤器
过滤文档 是否拥有哪些字段
curl 'localhost:9200/get-together/_search?pretty' -H'Content-Type: application/json' -d '{
  "query": {
     "bool": {
         "filter": {
             "exists": { "field": "location_event.geolocation" }
         }
     }
  }
}'
5. 分析数据
5.1 分析数据
分析是在文档被发送并加入倒排搜索之前,