[SOLVED] Need help with regexp for Kanji
I need to check a string for Kanji characters. Can anybody help me build a regexp for this?
Thanks.
Chapter 12 of the Unicode Standard [unicode.org] will help you a lot. The blocks that contain Han ideographs are:
| Block | Code points (hex) | Usage |
|---|---|---|
| CJK Unified Ideographs | 4E00–9FFF | Common |
| CJK Unified Ideographs Extension A | 3400–4DBF | Rare |
| CJK Unified Ideographs Extension B | 20000–2A6DF | Rare, historic |
| CJK Unified Ideographs Extension C | 2A700–2B73F | Rare, historic |
| CJK Unified Ideographs Extension D | 2B740–2B81F | Uncommon, some in current use |
| CJK Compatibility Ideographs | F900–FAFF | Duplicates, unifiable variants, corporate characters |
| CJK Compatibility Ideographs Supplement | 2F800–2FA1F | Unifiable variants |
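So the Kanji (Han) ranges are, very roughly, U+3400–U+9FFF, U+F900–U+FAFF, and U+20000–U+2FFFF. These are supersets: U+3400–U+9FFF also covers the Yijing Hexagram Symbols block (U+4DC0–U+4DFF), and U+20000–U+2FFFF includes unassigned code points.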
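If a regexp turns out to be awkward, a plain code-point comparison against the same rough ranges also works. As far as I know, QRegExp matches UTF-16 code units (QChar), so characters from Extension B onward, which are stored as surrogate pairs, are hard to express in a simple character class. The sketch below is standard C++11 and assumes the text has already been converted to UTF-32 (for a QString, `QString::toStdU32String()` should do that). The function names `isHanCodePoint`, `containsHan`, and `isAllHan` are made up for this example.

```cpp
#include <algorithm>
#include <string>

// Rough Han (Kanji) test using the ranges above:
// U+3400-U+9FFF, U+F900-U+FAFF, U+20000-U+2FFFF.
// Like those ranges, it is a superset (e.g. it also accepts U+4DC0-U+4DFF).
bool isHanCodePoint(char32_t cp) {
    return (cp >= 0x3400 && cp <= 0x9FFF) ||
           (cp >= 0xF900 && cp <= 0xFAFF) ||
           (cp >= 0x20000 && cp <= 0x2FFFF);
}

// True if at least one code point in the string is (roughly) Han.
bool containsHan(const std::u32string& text) {
    return std::any_of(text.begin(), text.end(), isHanCodePoint);
}

// True if the string is non-empty and every code point is (roughly) Han.
bool isAllHan(const std::u32string& text) {
    return !text.empty() &&
           std::all_of(text.begin(), text.end(), isHanCodePoint);
}
```

`containsHan` answers "does the string have any Kanji?", while `isAllHan` answers "is it Kanji only?"; which one you need depends on what "check string for Kanji" means in your case.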
QRegExp:
Note: this regexp (isHan) does not match CJK Symbols and Punctuation (U+3000–U+303F), Hiragana (U+3041–U+309F), or Katakana (U+30A0–U+30FF).
- CJK Symbols and Punctuation [unicode.org]
- Hiragana [unicode.org]
- Katakana [unicode.org]
If you also want to match those, add their ranges to the regexp.
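When the set of ranges grows like this, keeping them in a table can be easier to maintain than a long chain of comparisons. A minimal sketch, assuming the input is UTF-8 in a `std::string`; the decoder is deliberately simple (it does not reject overlong encodings, and malformed bytes are treated as U+FFFD), and the names `kJapaneseRanges`, `inRanges`, `nextCodePoint`, and `containsJapanese` are hypothetical.

```cpp
#include <cstddef>
#include <string>

struct Range { char32_t first, last; };

// Rough Han ranges plus the three blocks listed in the note above.
// Add or remove rows to change what counts as a match.
const Range kJapaneseRanges[] = {
    {0x3000, 0x303F},    // CJK Symbols and Punctuation
    {0x3041, 0x309F},    // Hiragana
    {0x30A0, 0x30FF},    // Katakana
    {0x3400, 0x9FFF},    // Han, rough (Extension A + Unified Ideographs)
    {0xF900, 0xFAFF},    // CJK Compatibility Ideographs
    {0x20000, 0x2FFFF},  // Han in the supplementary planes, rough
};

bool inRanges(char32_t cp) {
    for (const Range& r : kJapaneseRanges) {
        if (cp >= r.first && cp <= r.last) return true;
    }
    return false;
}

// Decodes one UTF-8 sequence starting at s[i] and advances i past it.
// An invalid lead byte, a bad continuation byte, or a truncated
// sequence yields U+FFFD and advances i by one byte.
char32_t nextCodePoint(const std::string& s, std::size_t& i) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t cp;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else { ++i; return 0xFFFD; }
    if (i + len > s.size()) { ++i; return 0xFFFD; }
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; }
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}

// True if any code point in the UTF-8 string falls in kJapaneseRanges.
bool containsJapanese(const std::string& utf8) {
    std::size_t i = 0;
    while (i < utf8.size()) {
        if (inRanges(nextCodePoint(utf8, i))) return true;
    }
    return false;
}
```

With the table, dropping the punctuation row or splitting the rough U+3400–U+9FFF row into the exact blocks from the table above is a one-line change.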