STConvert Analysis for Elasticsearch

STConvert is an analyzer that converts Chinese characters between Traditional and Simplified. [Simplified-Traditional Chinese conversion][Simplified to Traditional][Traditional to Simplified][Simplified-Traditional query expansion]

You can download the pre-built package from the release page.

The plugin includes an analyzer, a tokenizer, a token filter, and a char filter, all named stconvert.

Supported config:

  • convert_type: default s2t. Options:
    1. s2t: convert characters from Simplified Chinese to Traditional Chinese
    2. t2s: convert characters from Traditional Chinese to Simplified Chinese
  • keep_both: default false. When true, the output keeps both the original text and the converted text, joined by the delimiter.
  • delimiter: default ",". The separator placed between the original and converted text when keep_both is enabled.
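For example, the effect of keep_both and delimiter can be observed with an inline tokenizer definition in the _analyze API (a sketch; the exact token text may depend on the plugin version):

GET _analyze
{
  "tokenizer" : {
    "type" : "stconvert",
    "convert_type" : "s2t",
    "keep_both" : true,
    "delimiter" : "#"
  },
  "text" : "国际"
}

With keep_both enabled, the resulting token should contain both the original 国际 and the converted 國際, separated by #.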

Custom example:

PUT /stconvert/
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "tsconvert" : {
                    "tokenizer" : "tsconvert"
                    }
            },
            "tokenizer" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "filter": {
                "tsconvert" : {
                    "type" : "stconvert",
                    "delimiter" : "#",
                    "keep_both" : false,
                    "convert_type" : "t2s"
                }
            },
            "char_filter" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "convert_type" : "t2s"
                }
            }
        }
    }
}
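With the settings above in place, the custom tsconvert analyzer can be exercised directly; since it is configured with convert_type t2s, the Traditional 國際 should come out as the Simplified 国际:

GET stconvert/_analyze
{
  "analyzer" : "tsconvert",
  "text" : "國際"
}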

Analyze tests

GET stconvert/_analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["tsconvert"],
  "text" : "国际國際"
}

Output:

{
  "tokens": [
    {
      "token": "国际国际",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    }
  ]
}

Normalizer usage

DELETE index
PUT index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "tsconvert": {
          "type": "stconvert",
          "convert_type": "t2s"
        }
      },
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [
            "tsconvert"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
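The normalizer can be checked before indexing any documents, since the _analyze API accepts a normalizer parameter in recent Elasticsearch versions; the Traditional 國際 should be normalized to the Simplified 国际:

GET index/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "國際"
}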

PUT index/_doc/1
{ "foo": "國際" }

PUT index/_doc/2
{ "foo": "国际" }

GET index/_search
{ "query": { "term": { "foo": "国际" } } }

GET index/_search
{ "query": { "term": { "foo": "國際" } } }

Because the normalizer converts Traditional to Simplified at both index time and query time, each of the two term queries should match both documents.
