August 8, 2011

Vass Vass
Hobby Entomologist
738 posts

[SOLVED] Need help with regexp for Kanji

 

I need to check a string for Kanji characters. Can anybody help me build a regexp for this?

Thanks.

 Signature 


Vasiliy

2 replies

August 8, 2011

takumiasaki takumiasaki
Lab Rat
7 posts

Chapter 12 of the Unicode Standard [unicode.org] will help you a lot.

Block                                      Range          Comment
CJK Unified Ideographs                     4E00–9FFF      Common
CJK Unified Ideographs Extension A         3400–4DBF      Rare
CJK Unified Ideographs Extension B         20000–2A6DF    Rare, historic
CJK Unified Ideographs Extension C         2A700–2B73F    Rare, historic
CJK Unified Ideographs Extension D         2B740–2B81F    Uncommon, some in current use
CJK Compatibility Ideographs               F900–FAFF      Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement    2F800–2FA1F    Unifiable variants

So the ranges for Kanji (Han) are, very roughly, U+3400–U+9FFF, U+F900–U+FAFF, and U+20000–U+2FFFF.
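As a rough sketch, the same three ranges can also be tested directly on code points, without any regexp. This uses only the C++ standard library; the names isHanCodePoint and isAllHan are made up for illustration, and the ranges are the deliberately rough ones quoted above, not an exact list of assigned ideographs.

```cpp
#include <string>

// Hypothetical helper: true if cp lies in one of the rough Han ranges
// quoted above (U+3400-U+9FFF, U+F900-U+FAFF, U+20000-U+2FFFF).
bool isHanCodePoint(char32_t cp)
{
    return (cp >= 0x3400 && cp <= 0x9FFF)
        || (cp >= 0xF900 && cp <= 0xFAFF)
        || (cp >= 0x20000 && cp <= 0x2FFFF);
}

// Hypothetical helper: true if the string is non-empty and every
// code point in it is Han (one or more Han characters, nothing else).
bool isAllHan(const std::u32string &text)
{
    if (text.empty())
        return false;
    for (char32_t cp : text) {
        if (!isHanCodePoint(cp))
            return false;
    }
    return true;
}
```

For example, isAllHan(U"\u6F22\u5B57") (the two characters of the word "kanji") is true, while a string that mixes in Hiragana such as U+3042 is rejected.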

QRegExp:

  QRegExp isHan("([\\x3400-\\x9FFF\\xF900-\\xFAFF]|[\\xD840-\\xD87F][\\xDC00-\\xDFFF])+");
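The second alternative in the pattern is there because QString stores UTF-16, so code points above U+FFFF appear as two code units (a surrogate pair). The sketch below, in plain C++ with a hypothetical function name toSurrogatePair, shows the standard UTF-16 arithmetic and why U+20000–U+2FFFF corresponds to a high surrogate in D840–D87F followed by a low surrogate in DC00–DFFF.

```cpp
#include <utility>

// Hypothetical helper: split a supplementary code point
// (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp)
{
    const char32_t v = cp - 0x10000;                                   // 20-bit offset
    const char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return {high, low};
}
```

U+20000 maps to D840 DC00 and U+2FFFF maps to D87F DFFF, which are exactly the bounds used in the character classes.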

Note: This regexp (isHan) does not cover CJK Symbols and Punctuation (U+3000–U+303F), Hiragana (U+3041–U+309F), or Katakana (U+30A0–U+30FF).

If you want to match those as well, add their ranges to the regexp.
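One possible way to add them, as a sketch only: put the three extra ranges into the same character class. The snippet builds just the pattern text so that it runs without Qt; the function name japanesePattern is an assumption for illustration.

```cpp
#include <string>

// Hypothetical helper: pattern text with the Han ranges from the post
// plus CJK Symbols, Hiragana and Katakana added to the character class.
std::string japanesePattern()
{
    return std::string("(")
        + "["
        + "\\x3000-\\x303F"   // CJK Symbols and Punctuation
        + "\\x3041-\\x309F"   // Hiragana
        + "\\x30A0-\\x30FF"   // Katakana
        + "\\x3400-\\x9FFF"   // Han, BMP (rough range from the post)
        + "\\xF900-\\xFAFF"   // CJK Compatibility Ideographs
        + "]"
        + "|[\\xD840-\\xD87F][\\xDC00-\\xDFFF]"  // U+20000-U+2FFFF as surrogate pairs
        + ")+";
}
```

In Qt the result could be passed on as QRegExp isJapanese(QString::fromStdString(japanesePattern())); and a whole string tested with isJapanese.exactMatch(text).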

August 8, 2011

Vass Vass
Hobby Entomologist
738 posts

Thank you for the fast and good answer.

 Signature 


Vasiliy

 