[Elasticsearch] CRUD Operations via the REST API

https://unsplash.com/photos/_DyUcJalGAc

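Once the Elasticsearch environment is ready, we can start learning how to operate it. This article collects a number of REST APIs: besides the basic CRUD operations on data, it also covers sorting and pagination. Readers can send the requests with Postman while practicing, or use a GUI tool such as Elasticvue, which is more convenient.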

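1. Elasticsearch indices

An ES index is not the kind of index that a typical database uses to speed up queries. It is closer to a "table" or a "collection", so it is where the data itself is stored.

(1) Creating an index

The following example creates an index named "student".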

PUT /student

(2) Deleting an index

The following example deletes the index named "student".

DELETE /student

To delete several indices at once, separate their names with ",". For example:

DELETE /student,course

When an index is deleted, the data in it is deleted as well.
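The create and delete calls above can also be issued from a script. The following is a minimal sketch using only the Python standard library. It assumes a local node reachable at http://localhost:9200 without authentication (the `BASE_URL` value and the `index_request` helper are illustrative names, not part of ES), and it only builds the requests without sending them.

```python
import urllib.request

# Assumption: a local node without authentication. Adjust for your own cluster.
BASE_URL = "http://localhost:9200"


def index_request(method: str, *index_names: str) -> urllib.request.Request:
    """Build (but do not send) a request that targets one or more indices."""
    # Several index names are joined with ",", as in the DELETE example.
    path = "/" + ",".join(index_names)
    return urllib.request.Request(BASE_URL + path, method=method)


create = index_request("PUT", "student")
delete_many = index_request("DELETE", "student", "course")

print(create.get_method(), create.full_url)
print(delete_many.get_method(), delete_many.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(create)`.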

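2. Adding data

ES stores data in JSON format. Each record is called a "document", much as in document-oriented NoSQL databases.

The following example adds a document to an index named student. If the index does not exist, ES creates it automatically.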

POST /student/_doc
{
    "name": "Vincent Zheng",
    "grade": 2,
    "conductScore": 86,
    "introduction": "I have a blog used to record what I learn in my career.",
    "job": {
        "name": "衛生股長",
        "isPrimary": true
    }
}

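Every document has its own id within the index. Because we did not specify one when adding the document, ES generates it automatically.

The following example adds another document and sets its id to "104".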

POST /student/_doc/104
{
    "name": "Winnie Kuo",
    "grade": 1,
    "conductScore": 71,
    "introduction": "To lead a team in company in career, learn to lead students in university first.",
    "job": {
        "name": "班長",
        "isPrimary": false
    }
}

Simply append the desired id to the API path. Although the id here consists only of digits, ES always stores a document's id as a string.
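As a rough illustration of how the two variants differ only in the path, here is a small Python sketch that builds the request with or without an id. The `index_document_request` helper and the `BASE_URL` value are assumptions made for this example, and the request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local node without authentication.
BASE_URL = "http://localhost:9200"


def index_document_request(
    index: str, document: dict, doc_id: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a request that adds a document to an index."""
    path = f"/{index}/_doc"
    if doc_id is not None:
        # The id becomes part of the path; ES treats it as a string.
        path += f"/{doc_id}"
    body = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


auto_id = index_document_request("student", {"name": "Vincent Zheng", "grade": 2})
fixed_id = index_document_request("student", {"name": "Winnie Kuo", "grade": 1}, "104")

print(auto_id.full_url)
print(fixed_id.full_url)
```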

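3. Querying data

The previous section added two documents. Now let's try querying them.

(1) No conditions

The following example queries all data, that is, with no conditions. GET also works as the HTTP method, but some GUI tools may warn that a GET request cannot carry a request body.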

POST /student/_search
{
    "query": {
        "match_all": {}
    }
}

Conditions go in the query field, and "match_all" matches every document.

(2) Containing keywords

The following example queries for documents whose introduction field contains "company" or "career".

POST /student/_search
{
    "query": {
        "match": {
            "introduction": "company career"
        }
    }
}

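Here "match" looks for documents that contain the specified keywords. Further query syntax is covered in the article "Writing Query Conditions with the DSL".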
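To make the "company or career" behaviour concrete, the sketch below imitates it locally in Python. It is a deliberate simplification and not how ES works internally: the `tokenize` and `matches` helpers are hypothetical, they only lowercase the text and split it into words, and they ignore scoring entirely.

```python
import re


def tokenize(text: str) -> set:
    # Rough stand-in for text analysis: lowercase, then split into words.
    return set(re.findall(r"\w+", text.lower()))


def matches(field_value: str, query_text: str) -> bool:
    # A match query on several words behaves like OR by default:
    # sharing at least one word is enough.
    return bool(tokenize(field_value) & tokenize(query_text))


introductions = {
    "Vincent Zheng": "I have a blog used to record what I learn in my career.",
    "Winnie Kuo": "To lead a team in company in career, "
                  "learn to lead students in university first.",
}

hits = [name for name, intro in introductions.items()
        if matches(intro, "company career")]
print(hits)
```

Both documents are found because each one contains at least one of the two words.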

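(3) Reading the query result

Taking the keyword query above as an example, the following is part of the response body. Incidentally, ES returns only the first 10 results by default.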

{
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.8506131,
        "hits": [
            {
                "_index": "student",
                "_id": "104",
                "_score": 0.8506131,
                "_source": {
                    "name": "Winnie Kuo",
                    "grade": 1,
                    "conductScore": 71,
                    "introduction": "To lead a team in company in career, learn to lead students in university first.",
                    "job": {
                        "name": "班長",
                        "isPrimary": false
                    }
                }
            },
            {
                "_index": "student",
                "_id": "eOL6zoEByNeVQnvbc4ea",
                "_score": 0.1878095,
                "_source": {
                    "name": "Vincent Zheng",
                    "grade": 2,
                    "conductScore": 86,
                    "introduction": "I have a blog used to record what I learn in my career.",
                    "job": {
                        "name": "衛生股長",
                        "isPrimary": true
                    }
                }
            }
        ]
    }
}

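Next, here is what several of the fields mean.

hits.total.value: the number of documents found
hits.max_score: the highest score among the results
hits.hits: an array containing the results
hits.hits._id: the document's id
hits.hits._score: the document's score
hits.hits._source: the document's actual content

Notice that the document with id "104" contains more of the keywords, so its score is higher. Also, because we did not specify a sort order, ES sorts by score by default, highest first.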
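The fields described above can be read out of the response with a few lines of Python. This is only a sketch: `response_text` is a trimmed copy of the response shown earlier, and `summarize_hits` is a helper name invented for this example.

```python
import json

# Trimmed copy of the response body shown above.
response_text = """
{
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 0.8506131,
        "hits": [
            {"_index": "student", "_id": "104", "_score": 0.8506131,
             "_source": {"name": "Winnie Kuo", "grade": 1}},
            {"_index": "student", "_id": "eOL6zoEByNeVQnvbc4ea", "_score": 0.1878095,
             "_source": {"name": "Vincent Zheng", "grade": 2}}
        ]
    }
}
"""


def summarize_hits(response: dict):
    """Return the total count and an (id, score, name) tuple per document."""
    hits = response["hits"]
    rows = [(hit["_id"], hit["_score"], hit["_source"]["name"])
            for hit in hits["hits"]]
    return hits["total"]["value"], rows


total, rows = summarize_hits(json.loads(response_text))
print(total)
for doc_id, score, name in rows:
    print(doc_id, score, name)
```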

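4. Updating data

(1) Updating some fields

The following example updates specific fields of the document with the given id. If a field does not already exist, ES adds it as a new field.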

POST /student/_update/104
{
    "doc": {
        "grade": 2,
        "job": {
            "isPrimary": true
        },
        "majority": "企業管理"
    }
}

This request sets the grade field to 2, sets the isPrimary field of the job object to true, and adds a majority field.
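The effect of a partial update can be pictured as merging the doc object into the stored document, with nested objects merged field by field. The Python sketch below imitates that merge locally. `merge_partial` is a hypothetical helper written for this illustration, not ES code.

```python
import copy


def merge_partial(existing: dict, partial: dict) -> dict:
    """Return a copy of `existing` with the fields of `partial` merged in."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged, so untouched sub-fields survive.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value (or add a field).
            merged[key] = copy.deepcopy(value)
    return merged


stored = {
    "name": "Winnie Kuo",
    "grade": 1,
    "conductScore": 71,
    "job": {"name": "班長", "isPrimary": False},
}
partial = {"grade": 2, "job": {"isPrimary": True}, "majority": "企業管理"}

updated = merge_partial(stored, partial)
print(updated)
```

Note that job.name is kept even though the request only mentions job.isPrimary.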

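(2) Overwriting the whole document

The following example overwrites the content of the document with the given id. The original data is replaced completely.

POST /student/_doc/104
{
    "name": "Winnie Pooh",
    "majority": [
        "經營管理"
    ]
}

Note the difference between the two APIs: _update expects the fields to change inside the doc field of the request body, whereas _doc treats the request body itself as the new document content. Wrapping the body in doc here would store a document whose only field is named doc.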

5. Deleting data

(1) Deleting by id

The following example deletes the document whose id is 104:

DELETE /student/_doc/104

(2) Deleting by condition

The following example deletes the data that matches a condition. Here it deletes documents whose name field contains "Vincent".

POST /student/_delete_by_query
{
    "query": {
        "match": {
            "name": "Vincent"
        }
    }
}

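In fact, when a delete operation completes, ES has only marked the document as "deleted". The physical deletion that frees the space happens later in the background, when segments are merged.

6. Sorting and pagination

This section uses the sample data below to demonstrate sorting and pagination. Before practicing, readers may want to delete the student index used earlier and then add these 3 documents one by one.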

{
    "name": "Vincent",
    "grade": 2,
    "conductScore": 86,
    "courses": [
        { "name": "計算機概論", "point": 3 },
        { "name": "程式設計", "point": 4 },
        { "name": "投資學", "point": 3 }
    ],
    "introduction": "company career"
}

{
    "name": "Winnie",
    "grade": 1,
    "conductScore": 71,
    "courses": [
        { "name": "會計學", "point": 3 },
        { "name": "商業概論", "point": 2 }
    ],
    "introduction": "company"
}

{
    "name": "Mario",
    "grade": 3,
    "conductScore": 86,
    "courses": [
        { "name": "會計學", "point": 5 },
        { "name": "審計學", "point": 3 },
        { "name": "企業資源規劃", "point": 3 }
    ],
    "introduction": "career"
}

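Besides the name and grade fields, these student documents also carry a conduct score (conductScore) and the courses taken (courses). Each course in turn has a credits (point) field.

(1) Sorting by a single field

The following example sorts by grade in ascending order.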

POST /student/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        { "grade": "asc" }
    ]
}

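The request body gains a sort array field. Each array element describes a field to sort by and its direction; the direction is case-insensitive.

The results of this example are, in order: Winnie, Vincent, Mario.

Besides the document's own fields, we can also sort by the score ES assigns to each document. The following example uses a match query to produce different scores and sorts by score in ascending order.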

POST /student/_search
{
    "query": {
        "match": {
            "introduction": "company career"
        }
    },
    "sort": [
        { "_score": "asc" }
    ]
}

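(2) Sorting by multiple fields

The following example sorts by conduct score in descending order first, then by grade in ascending order.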

POST /student/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        { "conductScore": "desc" },
        { "grade": "asc" }
    ]
}

Just place the fields that should be sorted first earlier in the array. The results of this example are, in order: Vincent, Mario, Winnie.
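The same ordering can be reproduced locally to check the expected result. The Python sketch below sorts the three sample students by conduct score descending and then by grade ascending. It only imitates the outcome of the sort array and does not talk to ES.

```python
students = [
    {"name": "Vincent", "grade": 2, "conductScore": 86},
    {"name": "Winnie", "grade": 1, "conductScore": 71},
    {"name": "Mario", "grade": 3, "conductScore": 86},
]

# conductScore descending (negated), then grade ascending as the tie-breaker.
ordered = sorted(students, key=lambda s: (-s["conductScore"], s["grade"]))
names = [s["name"] for s in ordered]
print(names)
```

Vincent and Mario tie on conductScore, so the second key (grade) decides their order.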

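(3) Sorting by an array

ES supports sorting by array elements. Because an array holds multiple elements, we have to choose a way to reduce them to a single value before documents can be sorted by it, such as the maximum, the minimum, or the sum.

The following example sorts by the sum of course credits in ascending order: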

POST /student/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        {
            "courses.point": {
                "order": "asc",
                "mode": "sum"
            }
        }
    ]
}

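The results of this example are, in order: Winnie (5 credits), Vincent (10 credits), Mario (11 credits).

In the request body, the document field name is now paired with an object that holds both the direction and the mode used to pick a value. The modes that ES supports are listed below; they are likewise case-insensitive.

min: the minimum value
max: the maximum value
sum: the sum
avg: the average
median: the median

Of these, sum, avg, and median apply only to numeric fields, while min and max can also be used with strings and dates. Although we can choose a mode, ES falls back to a default when none is given: min when sorting array elements in ascending order, and max when sorting in descending order.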
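To see how each mode reduces an array to one sort value, the following Python sketch applies the modes to the point values of the sample data. The `MODES` table and the `sort_by_mode` helper are assumptions made for this illustration. They mimic the described behaviour, including the default of min for ascending and max for descending, but they are not ES code.

```python
import statistics

# One reducer per mode name described above.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
}

# courses.point values of the three sample students.
points = {
    "Vincent": [3, 4, 3],
    "Winnie": [3, 2],
    "Mario": [5, 3, 3],
}


def sort_by_mode(points: dict, order: str = "asc", mode: str = None) -> list:
    """Sort student names by their point arrays reduced with `mode`."""
    if mode is None:
        # Default described above: min for ascending, max for descending.
        mode = "min" if order == "asc" else "max"
    reduce = MODES[mode]
    return sorted(points, key=lambda name: reduce(points[name]),
                  reverse=(order == "desc"))


by_sum = sort_by_mode(points, order="asc", mode="sum")
by_default_desc = sort_by_mode(points, order="desc")
print(by_sum)
print(by_default_desc)
```

The sums are 10 for Vincent, 5 for Winnie, and 11 for Mario, which gives the ascending order Winnie, Vincent, Mario.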

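(4) Pagination

Pagination in ES follows the "skip" and "limit" idea: skip a certain number of records first, then take the next few as the result.

The following example sorts by the grade field in ascending order, then takes 1 record starting from the 3rd.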

POST /student/_search
{
    "query": {
        "match_all": {}
    },
    "sort": [
        { "grade": "asc" }
    ],
    "from": 2,
    "size": 1
}

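The result of this example is: Mario.

Pagination uses two parameters: "from" is the position of the first document to return, and "size" is the number of documents to take. Note that positions are counted from 0. As another example, if from is 0 and size is 2, the results are, in order: Winnie, Vincent.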
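Applications usually think in page numbers rather than offsets. The sketch below shows one way to turn a 1-based page number into from and size, and imitates the skip/limit behaviour with a list slice. `page_to_from_size` and `take` are hypothetical helpers for this example, and the 1-based page numbering is an assumption.

```python
def page_to_from_size(page: int, page_size: int) -> dict:
    """Convert a 1-based page number into the from/size pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # Positions start at 0, so page 1 starts at from = 0.
    return {"from": (page - 1) * page_size, "size": page_size}


def take(items: list, from_: int, size: int) -> list:
    """Imitate skip/limit on a local list: skip `from_` items, keep `size`."""
    return items[from_:from_ + size]


# Names already sorted by grade ascending, as in the example above.
sorted_names = ["Winnie", "Vincent", "Mario"]

paging = page_to_from_size(page=3, page_size=1)
print(paging)
print(take(sorted_names, paging["from"], paging["size"]))
print(take(sorted_names, 0, 2))
```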
