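Supplementary Characters in the Java(TM) Platform
By Norbert Lindenberg and Masayoshi Okutsu, Sun Microsystems, Inc.
Abstract
This article describes how the Java platform supports supplementary characters. Supplementary characters are characters in the Unicode standard whose code points are above U+FFFF, and which therefore cannot be described as single 16-bit entities such as the char data type in the Java programming language. Such characters are generally rare, but some are used, for example, in Chinese and Japanese personal names, so government applications in East Asian countries commonly require support for them.
The Java platform is being enhanced to enable processing of supplementary characters with minimal impact on existing applications. New low-level APIs operate on individual characters where necessary. Most text-processing APIs, however, use character sequences, such as the String class or character arrays. These are now interpreted as UTF-16 sequences, and the implementations of these APIs have been changed to handle supplementary characters correctly. The enhancements are part of version 1.5 of the Java 2 Platform, Standard Edition (J2SE).
Besides explaining these enhancements in detail, this article also gives application developers guidelines for identifying and implementing the changes needed to support the complete Unicode character set.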
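Background
Unicode was originally designed as a fixed-width 16-bit character encoding. The primitive data type char in the Java programming language was intended to take advantage of this design by providing a simple data type that could hold any character. However, it turned out that the 65,536 characters possible in a 16-bit encoding are not sufficient to represent all characters that are or have been used around the world. The Unicode standard has therefore been extended to allow up to 1,112,064 characters. The characters that go beyond the original 16-bit limit are called supplementary characters. Version 2.0 of the Unicode standard was the first to include a design that enables supplementary characters, but the first supplementary characters were only assigned in version 3.1. Version 1.5 of J2SE is required to support version 4.0 of the Unicode standard, so it has to support supplementary characters.
Support for supplementary characters is also likely to become a common business requirement in East Asian markets. Government applications will need them to correctly represent names that include rare Chinese characters. Publishing applications may need them to represent the full set of historical and variant characters. The Chinese government requires support for GB18030, a character encoding standard that encodes the entire Unicode character set, so it includes supplementary characters if Unicode 3.1 or later is used. The Taiwanese standard CNS-11643 includes many characters that are listed as supplementary characters in Unicode 3.1. The Hong Kong government has defined a collection of characters needed for Cantonese, and some of these are supplementary characters in Unicode. Finally, some vendors in Japan plan to use the large private use area in the supplementary character space for over 50,000 kanji character variants, in order to migrate from their proprietary systems to solutions based on the Java platform.
The Java platform therefore not only needs to support supplementary characters, but also has to make it easy for applications to do so. Because supplementary characters break a fundamental design assumption of the Java programming language and may require fundamental changes to the programming model, a Java Community Process expert group was convened to find an appropriate solution. The group was called the JSR-204 expert group, after the number of the Java Specification Request for Unicode Supplementary Character Support. Technically, the decisions of the expert group apply only to the J2SE platform, but because the Java 2 Platform, Enterprise Edition (J2EE) is layered on top of the J2SE platform, it benefits directly, and we expect configurations of the Java 2 Platform, Micro Edition (J2ME) to adopt the same design approach.
Before looking at the solution the JSR-204 expert group arrived at, however, we need to get some terminology in place.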
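Code Points, Character Encoding Schemes, UTF-16: What's All This?
Unfortunately, the introduction of supplementary characters makes the character model more complicated. In the past we could simply talk about "characters" and, in a Unicode-based environment such as the Java platform, assume that a character has 16 bits; now we need more terminology. We'll try to keep it relatively simple. For a full discussion with all the details, see Chapter 2 of The Unicode Standard or Unicode Technical Report 17, "Character Encoding Model". Unicode experts may skip everything but the last definitions in this section.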
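A character is an abstract minimal unit of text. It doesn't have a fixed shape (that would be a glyph), and it doesn't have a value. "A" is a character, and so is "€", the symbol for the common currency of Germany, France, and numerous other European countries.
A character set is a collection of characters. For example, the Han characters are the characters originally invented by the Chinese, which are used to write Chinese, Japanese, Korean, and Vietnamese.
A coded character set is a character set in which each character has been assigned a unique number. At the core of the Unicode standard is a coded character set that assigns the letter "A" the number 0041₁₆ and the character "€" the number 20AC₁₆. The Unicode standard always uses hexadecimal numbers and writes them with the prefix "U+", so the number for "A" is written as "U+0041".
Code points are the numbers that can be used in a coded character set. A coded character set defines a range of valid code points, but doesn't necessarily assign characters to all of them. The valid code points for Unicode are U+0000 to U+10FFFF. Unicode 4.0 assigns characters to 96,382 of these more than one million code points.
Supplementary characters are characters with code points in the range U+10000 to U+10FFFF, that is, the characters that could not be represented in the original 16-bit design of Unicode. The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Thus, each Unicode character is either in the BMP or a supplementary character.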
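A character encoding scheme is a mapping from the numbers of one or more coded character sets to sequences of one or more fixed-width code units. The most commonly used code units are bytes, but 16-bit or 32-bit integers can also be used for internal processing. UTF-32, UTF-16, and UTF-8 are character encoding schemes for the coded character set of the Unicode standard.
UTF-32 simply represents each Unicode code point as the 32-bit integer of the same value. It is clearly the most convenient representation for internal processing, but it uses considerably more memory than necessary when used as a general string representation.
UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units, the first from the high-surrogates range (U+D800 to U+DBFF), the second from the low-surrogates range (U+DC00 to U+DFFF). This may seem similar in concept to multi-byte encodings, but there is an important difference: the values U+D800 to U+DFFF are reserved for use in UTF-16, and no characters are assigned to them as code points. This means that, for each individual code unit in a string, software can tell whether it represents a one-unit character or whether it is the first or second unit of a two-unit character. This is a significant improvement over some traditional multi-byte character encodings, where the byte value 0x41 could mean the letter "A" or be the second byte of a two-byte character.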
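The surrogate arithmetic behind the UTF-16 description above can be sketched in a few lines. This is an illustrative sketch only: the class and method names (SurrogateMath, toSurrogates, fromSurrogates, classify) are made up for this example, and in real code the platform methods Character.toChars and Character.toCodePoint would normally be used instead.

```java
class SurrogateMath {
    /** Splits a supplementary code point (U+10000 to U+10FFFF) into a surrogate pair. */
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;                 // fits in 20 bits
        char high = (char) (0xD800 + (offset >> 10));     // upper 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));    // lower 10 bits
        return new char[] { high, low };
    }

    /** Recombines a high and a low surrogate into the code point they encode. */
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    /** Classifies a single code unit without looking at its neighbors. */
    static String classify(char unit) {
        if (unit >= 0xD800 && unit <= 0xDBFF) return "first unit of a pair";
        if (unit >= 0xDC00 && unit <= 0xDFFF) return "second unit of a pair";
        return "one-unit character";
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x10400);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D801 DC00
        System.out.printf("U+%X%n", fromSurrogates(pair[0], pair[1]));  // U+10400
        System.out.println(classify('A'));                              // one-unit character
    }
}
```

Because the two surrogate ranges do not overlap with each other or with any assigned character, classify needs only the value of the unit itself, which is the property the paragraph above contrasts with older multi-byte encodings.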
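UTF-8 uses sequences of one to four bytes to encode Unicode code points. U+0000 to U+007F are encoded in one byte, U+0080 to U+07FF in two bytes, U+0800 to U+FFFF in three bytes, and U+10000 to U+10FFFF in four bytes. UTF-8 is designed so that the byte values 0x00 to 0x7F always represent code points U+0000 to U+007F (the Basic Latin block, which corresponds to the ASCII character set). These byte values never occur in the representation of other code points, a characteristic that makes UTF-8 convenient to use in software that assigns special meanings to certain ASCII characters.
The following table compares the different representations of a few characters: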
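| Unicode code point | U+0041 | U+00DF | U+6771 | U+10400 |
| Representative glyph | A | ß | 東 | 𐐀 |
| UTF-32 code units | 00000041 | 000000DF | 00006771 | 00010400 |
| UTF-16 code units | 0041 | 00DF | 6771 | D801 DC00 |
| UTF-8 code units | 41 | C3 9F | E6 9D B1 | F0 90 90 80 |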
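The UTF-16 and UTF-8 rows of such a table can be checked with standard library calls. The sketch below is only an illustration: the class name EncodingTable and its helper methods are invented here, and it uses java.nio.charset.StandardCharsets, which was added to the platform after the J2SE 1.5 release this article discusses.

```java
import java.nio.charset.StandardCharsets;

class EncodingTable {
    /** Hex dump of the UTF-16 code units that encode one code point. */
    static String utf16Hex(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    /** Hex dump of the standard UTF-8 bytes that encode one code point. */
    static String utf8Hex(int codePoint) {
        String text = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] samples = { 0x0041, 0x00DF, 0x6771, 0x10400 };
        for (int cp : samples) {
            // UTF-32 is simply the code point value padded to eight hex digits.
            System.out.printf("U+%04X | %08X | %s | %s%n", cp, cp, utf16Hex(cp), utf8Hex(cp));
        }
    }
}
```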
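In addition, this article uses the terms character sequence and char sequence in many places to cover all containers of character sequences known to the Java 2 platform: char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator.
That's a lot of terminology. How does all of this relate to supplementary character support in the Java platform?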
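The Design Approach for Supplementary Characters in the Java Platform
The main decision the JSR-204 expert group had to make was how to represent supplementary characters in Java APIs, both for individual characters and for character sequences in all forms. The expert group considered and rejected a number of approaches: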
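- Redefining the primitive type char to have 32 bits, which would also turn all forms of char sequences into UTF-32 sequences.
- Introducing a new primitive 32-bit type for characters (say, char32) in addition to the existing 16-bit type char. All forms of char sequences would be based on UTF-16.
- Introducing a new primitive 32-bit type for characters (say, char32) in addition to the existing 16-bit type char. String and StringBuffer would receive parallel APIs that interpret them as UTF-16 sequences or as UTF-32 sequences; other char sequences would continue to be based on UTF-16.
- Using int to represent supplementary code points. String and StringBuffer would receive parallel APIs that interpret them as UTF-16 sequences or as UTF-32 sequences; other char sequences would continue to be based on UTF-16.
- Using surrogate char pairs to represent supplementary code points. All forms of char sequences would be based on UTF-16.
- Introducing a class that encapsulates a character. String and StringBuffer would receive new APIs that interpret them as sequences of such characters.
- Using a combination of a CharSequence instance and an index to represent a code point.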
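Some of these approaches were rejected early on. Redefining the primitive type char to have 32 bits, for example, might have been attractive for a completely new platform, but for J2SE it would have been incompatible with existing Java virtual machines, serialization, and other interfaces, not to mention that UTF-32 based strings use twice as much memory as UTF-16 based strings. Adding a new type char32 would have been simpler, but would still have caused problems for the virtual machine and serialization. Also, language changes typically require longer lead times than API changes, so either of these two approaches would have delayed supplementary character support unacceptably. To narrow down the remaining approaches, the implementation team implemented supplementary character support in a body of code that does a lot of low-level character processing (the java.util.regex package) using four different approaches, and compared the ease of coding and the runtime performance of the four.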
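In the end, the expert group decided on a tiered approach:
- Use the primitive type int to represent code points in low-level APIs, such as the static methods of the Character class.
- Interpret char sequences in all forms as UTF-16 sequences, and promote their use in higher-level APIs.
- Provide APIs that make it easy to convert between the various char and code point based representations.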
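This approach provides a conceptually simple and efficient representation for individual characters where one is needed, while also taking advantage of existing APIs that can be enhanced to support supplementary characters. At the same time, it promotes the use of character sequences over individual characters, which generally benefits internationalized software.
With this approach, a char represents a UTF-16 code unit, which is not always sufficient to represent a code point. You will notice that the J2SE specifications now use the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion. APIs usually use the name codePoint for variables of type int that represent code points, while UTF-16 code units of course have type char.
The next two sections look at the actual changes to the J2SE platform: one covers the low-level APIs that work on individual code points, the other the higher-level interfaces that take character sequences.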
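Supplementary Characters in the Open: Code Point-Based APIs
The new low-level APIs fall into two broad categories: methods that convert between the various char and code point based representations, and methods that analyze and map code points.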
The most basic conversion methods are Character.toCodePoint(char high, char low), which converts two UTF-16 code units to a code point, and Character.toChars(int codePoint), which converts the given code point into one or two UTF-16 code units, wrapped in a char[]. However, because text mostly occurs in the form of character sequences, there are also codePointAt and codePointBefore methods that extract code points from the various character sequence representations: Character.codePointAt(char[] a, int index) and String.codePointBefore(int index) are two typical examples. For inserting code points into character sequences, there are appendCodePoint(int codePoint) methods for the StringBuffer and StringBuilder classes, as well as a String constructor that takes an int[] of code points.
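The conversion methods named above can be exercised together in a short program. This is a minimal sketch: the class name CodePointConversions and the helper sample are illustrative, and U+10400 is used only as a convenient supplementary character.

```java
class CodePointConversions {
    /** Builds "A", then U+10400, then "B" using appendCodePoint. */
    static String sample() {
        // Two UTF-16 code units combined into one code point (U+10400).
        int deseret = Character.toCodePoint((char) 0xD801, (char) 0xDC00);
        return new StringBuilder("A").appendCodePoint(deseret).append('B').toString();
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());                     // 4: length counts code units
        System.out.printf("U+%X%n", s.codePointAt(1));      // code point starting at index 1
        System.out.printf("U+%X%n", s.codePointBefore(3));  // code point ending before index 3

        char[] units = Character.toChars(0x10400);           // one code point as a char[]
        System.out.printf("U+%X%n", Character.codePointAt(units, 0));

        // The String constructor that takes code points, an offset, and a count.
        String fromInts = new String(new int[] { 0x41, 0x10400, 0x42 }, 0, 3);
        System.out.println(fromInts.equals(s));              // true
    }
}
```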
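A few methods that analyze code units and code points help in the conversion process: the isHighSurrogate and isLowSurrogate methods in the Character class identify char values that are used to represent supplementary characters, and the charCount(int codePoint) method determines whether a code point needs to be converted into one or two chars.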
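One common use of these analysis methods is stepping through text one code point at a time, or checking that an index does not fall in the middle of a surrogate pair. The following is a sketch under that assumption; CodePointWalk and its two methods are hypothetical names, not part of the platform.

```java
class CodePointWalk {
    /** Counts code points by advancing charCount code units per step. */
    static int countCodePoints(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = Character.codePointAt(text, i);
            i += Character.charCount(codePoint);   // 1 for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    /** True if the index points at the low-surrogate half of a complete pair. */
    static boolean splitsPair(CharSequence text, int index) {
        return index > 0 && index < text.length()
                && Character.isLowSurrogate(text.charAt(index))
                && Character.isHighSurrogate(text.charAt(index - 1));
    }

    public static void main(String[] args) {
        String text = "a" + new String(Character.toChars(0x10400)) + "b";
        System.out.println(text.length());            // 4 code units
        System.out.println(countCodePoints(text));    // 3 code points
        System.out.println(splitsPair(text, 2));      // true: index 2 is the low surrogate
    }
}
```

For strings, String.codePointCount(int, int) gives the same count directly; the explicit loop is shown because the same stepping pattern applies whenever each code point needs to be inspected.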
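Most of the code point based methods, however, provide for the full range of Unicode characters the functionality that the older char based methods provide for BMP characters. Here are some typical examples:
- Character.isLetter(int codePoint) identifies letters according to the Unicode standard.
- Character.isJavaIdentifierStart(int codePoint) determines whether a code point can start an identifier according to the Java Language Specification.
- Character.UnicodeBlock.of(int codePoint) looks up the Unicode block that the code point belongs to.
- Character.toUpperCase(int codePoint) converts the given code point to its uppercase equivalent. While this method does support supplementary characters, it still doesn't work around the fundamental problem that in some cases case conversion cannot be done correctly character by character. The German character "ß", for example, should be converted to "SS", which requires using the String.toUpperCase method.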
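The difference between per-code-point and whole-string case conversion can be made concrete with a small sketch. The class CaseMappingDemo and its methods are illustrative names; Locale.ENGLISH is chosen here only to keep the result independent of the default locale.

```java
import java.util.Locale;

class CaseMappingDemo {
    /** Uppercases code point by code point: one output code point per input code point. */
    static String upperPerCodePoint(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            sb.appendCodePoint(Character.toUpperCase(codePoint));
            i += Character.charCount(codePoint);
        }
        return sb.toString();
    }

    /** Uppercases the whole string, which allows one-to-many mappings. */
    static String upperWholeString(String text) {
        return text.toUpperCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // U+10428 is a lowercase Deseret letter; its uppercase form is U+10400.
        System.out.printf("U+%X%n", Character.toUpperCase(0x10428));
        System.out.println(Character.isLetter(0x10400));

        String street = "stra\u00DFe";                      // contains U+00DF, sharp s
        System.out.println(upperPerCodePoint(street).length()); // 6: sharp s is left as is
        System.out.println(upperWholeString(street));            // STRASSE
    }
}
```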
Note that most methods that accept code points do not check whether the given int value is in the valid Unicode code point range (as noted above, only the range from 0x0 to 0x10FFFF is valid). In most cases, the value has been produced in a way that guarantees its validity, and checking it over and over in these low-level APIs could hurt system performance. Where validity cannot be guaranteed, applications have to use the Character.isValidCodePoint method to make sure the code point is valid. The behavior of most methods for invalid code points is not specified and may vary between implementations.
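For values of uncertain origin, such as numbers parsed from user input, a guard along the following lines can be placed in front of the code point based methods. This is a sketch only; CodePointGuard, requireCodePoint, and isLetterChecked are hypothetical helper names.

```java
class CodePointGuard {
    /** Returns the value unchanged if it lies in 0x0 to 0x10FFFF, otherwise throws. */
    static int requireCodePoint(int value) {
        if (!Character.isValidCodePoint(value)) {
            throw new IllegalArgumentException(
                    String.format("0x%X is not a valid Unicode code point", value));
        }
        return value;
    }

    /** Letter test for int values that have not been validated elsewhere. */
    static boolean isLetterChecked(int value) {
        return Character.isLetter(requireCodePoint(value));
    }

    public static void main(String[] args) {
        System.out.println(isLetterChecked(0x10400));   // true
        try {
            isLetterChecked(0x110000);                  // one past the last code point
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```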
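The API contains a number of convenience methods that could be implemented using other low-level APIs, but that the expert group felt would be used commonly enough to justify adding them to the J2SE platform. However, the expert group also rejected some proposed convenience methods, which gives us an opportunity to show how you can implement such methods yourself. For example, the expert group discussed and rejected a new constructor for the String class that would have created a String holding a single code point. Here is a simple way for an application to provide this functionality using the existing API: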
/**
 * Creates a new String containing just the given code point.
 */
String newString(int codePoint) {
    return new String(Character.toChars(codePoint));
}
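You will notice that in this simple implementation, the toChars method always creates an intermediate array that is used just once and then immediately discarded. If this method shows up in your performance measurements, you may want to optimize it for the most common case, where the code point is a BMP character: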
/**
 * Creates a new String containing just the given code point.
 * Version that optimizes for BMP characters.
 */
String newString(int codePoint) {
    if (Character.charCount(codePoint) == 1) {
        return String.valueOf((char) codePoint);
    } else {
        return new String(Character.toChars(codePoint));
    }
}
»òÕߣ¬Èç¹ûÄúÐèÒª´´½¨Ðí¶à¸öÕâÑùµÄ string£¬Ôò¿ÉÄÜÏ£Íû±àдһ¸öÖØ¸´Ê¹Óà toChars ·½·¨ËùʹÓõÄÊýÁеÄͨÓð汾£º
/**
 * Creates new Strings, each containing one of the given
 * code points.
 * Bulk version that reuses a single char[] buffer for toChars.
 */
String[] newStrings(int[] codePoints) {
    String[] result = new String[codePoints.length];
    char[] codeUnits = new char[2];
    for (int i = 0; i < codePoints.length; i++) {
        // toChars writes one or two code units and returns how many it wrote
        int count = Character.toChars(codePoints[i], codeUnits, 0);
        result[i] = new String(codeUnits, 0, count);
    }
    return result;
}
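Ultimately, however, you may find that you need an entirely different solution. The new constructor String(int codePoint) was actually proposed as a code point based alternative to String.valueOf(char). In many cases, this method is used in the context of message generation, for example:
System.out.println("Character " + String.valueOf(ch) + " is invalid.");
The new formatting API, which supports supplementary characters, provides a much simpler alternative:
System.out.printf("Character %c is invalid.%n", codePoint);
Using this higher-level API is not only more concise, it also has specific advantages: it avoids concatenation, which makes messages very difficult to localize, and it reduces the number of strings that need to be moved into resource bundles from two to one.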
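Supplementary Characters Under the Hood: Enhanced Functionality
Most of the changes made to the Java 2 platform to support supplementary characters are not reflected in new APIs. The general expectation is that all interfaces that work with character sequences handle supplementary characters in a way that is appropriate to their functionality. This section highlights some of the enhancements that were made to meet that expectation.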
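Identifiers in the Java Programming Language
The Java Language Specification states that all Unicode letters and digits can be used in identifiers. Many supplementary characters are letters or digits, so the Java Language Specification has been updated to refer to the new code point based methods when defining the legal characters in identifiers. The javac compiler and other tools that need to detect identifiers have been changed to use these new methods.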
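Supplementary Character Support in Libraries
Many of the J2SE libraries have been enhanced to support supplementary characters through their existing interfaces. Here are some examples:
- The string case conversion functionality has been updated to handle supplementary characters and to implement the special casing rules defined in the Unicode standard.
- The java.util.regex package has been updated so that both pattern strings and target strings can contain supplementary characters, which are handled as complete units.
- Collation handling in the java.text package now treats supplementary characters as complete units.
- The java.text.Bidi class has been updated to handle supplementary characters as well as the other characters newly added in Unicode 4.0. Note that the supplementary characters in the Cypriot Syllabary block have right-to-left directionality.
- The font rendering and printing technology in the Java 2D API has been enhanced to correctly render and measure strings containing supplementary characters.
- The Swing text component implementation has been updated to handle text containing supplementary characters.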
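As one illustration of the library behavior listed above, the java.util.regex package treats a supplementary character as a single unit even though it occupies two chars. The sketch below assumes nothing beyond the standard Pattern and String APIs; the class name RegexCodePoints and its helpers are made up for the example.

```java
import java.util.regex.Pattern;

class RegexCodePoints {
    /** A one-character string holding the supplementary character U+10400. */
    static final String DESERET = new String(Character.toChars(0x10400));

    /** True when the regex engine sees the whole input as exactly one character. */
    static boolean isOneCharacter(String input) {
        return Pattern.matches(".", input);
    }

    /** Replaces every character matched by "." with the letter x. */
    static String mask(String input) {
        return input.replaceAll(".", "x");
    }

    public static void main(String[] args) {
        System.out.println(DESERET.length());                // 2 code units
        System.out.println(isOneCharacter(DESERET));         // true: matched as one unit
        System.out.println(mask("a" + DESERET + "b"));       // xxx, not xxxx
    }
}
```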
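Character Conversion
Only very few character encodings can represent supplementary characters. For Unicode based encodings (such as UTF-8 and UTF-16LE), the character converters in older versions of the J2RE were already implemented in a way that handles supplementary characters correctly. For J2RE 1.5, the converters for other encodings that can represent supplementary characters have been updated: GB18030, x-EUC-TW (which now implements all CNS 11643 planes), and Big5-HKSCS (which now implements HKSCS-2001).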
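Representing Supplementary Characters in Source Files
In Java programming language source files, supplementary characters are easiest to use if you use a character encoding that can represent them directly. UTF-8 is an excellent choice. For cases where the character encoding in use cannot represent the characters directly, the Java programming language provides a Unicode escape syntax. This syntax has not been enhanced to express supplementary characters directly. Instead, they are represented by two consecutive Unicode escapes for the two code units in the UTF-16 representation of the character. For example, the character U+20000 is written as "\uD840\uDC00". You probably don't want to work out these escape sequences yourself; it is best to write in an encoding that supports the supplementary characters you need and then use a tool such as native2ascii to convert them to escape sequences.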
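The relationship between a supplementary character and its pair of escapes can be sketched as follows. The toEscapes helper only approximates what a conversion tool does for non-ASCII text and is not the actual native2ascii implementation; the class and method names are illustrative.

```java
class UnicodeEscapes {
    /** Rewrites every non-ASCII UTF-16 code unit as a backslash-u escape with four hex digits. */
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char unit = text.charAt(i);
            if (unit < 0x80) {
                sb.append(unit);                                   // ASCII stays as is
            } else {
                sb.append(String.format("\\u%04X", (int) unit));   // one escape per code unit
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String han = "\uD840\uDC00";                        // U+20000 written as two escapes
        System.out.println(han.length());                   // 2 code units
        System.out.printf("U+%X%n", han.codePointAt(0));    // U+20000
        System.out.println(toEscapes("A" + han));           // A followed by two escapes
    }
}
```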
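Unfortunately, because of how their encoding is defined, property files are still limited to ISO 8859-1 (unless your application uses the new XML format). This means that you always have to use escape sequences for supplementary characters, and you will probably want to author in a different encoding and then convert using a tool such as native2ascii.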
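Modified UTF-8
Modified UTF-8 has long been part of the Java platform, but application developers need to be more careful about it when converting text that may contain supplementary characters to and from UTF-8. The main thing to remember is that some J2SE interfaces use an encoding that is similar to UTF-8 but incompatible with it. In the past, this encoding has sometimes been called "Java modified UTF-8" or (incorrectly) just "UTF-8". For J2SE 1.5, the documentation is being updated to uniformly call it "modified UTF-8".
The incompatibility between modified UTF-8 and standard UTF-8 stems from two differences. First, modified UTF-8 represents the character U+0000 as the two-byte sequence 0xC0 0x80, whereas standard UTF-8 uses the single byte value 0x0. Second, modified UTF-8 represents supplementary characters by separately encoding the two surrogate code units of their UTF-16 representation. Each surrogate code unit is represented by three bytes, for a total of six bytes. Standard UTF-8, on the other hand, uses a single four-byte sequence for the complete character.
Modified UTF-8 is used by the Java virtual machine and the interfaces attached to it (such as the Java Native Interface, the various tool interfaces, and Java class files), by the java.io.DataInput and DataOutput interfaces and the classes that implement or use them, and for serialization. The Java Native Interface provides routines that convert to and from modified UTF-8. Standard UTF-8, on the other hand, is supported by the String class, by the java.io.InputStreamReader and OutputStreamWriter classes, by the java.nio.charset facilities, and by many APIs layered on top of them.
Because modified UTF-8 is incompatible with standard UTF-8, it is critical not to use one where the other is needed. Modified UTF-8 can only be used with the Java interfaces described above. In all other cases, in particular for data streams that may come from or be interpreted by software that is not based on the Java platform, standard UTF-8 must be used. The Java Native Interface routines that convert to and from modified UTF-8 cannot be used when standard UTF-8 is required.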
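The two differences between the encodings can be observed by encoding the same text through String.getBytes and through DataOutputStream.writeUTF. This is a sketch for illustration: the class Utf8Flavors and its methods are invented names, and StandardCharsets and Arrays.copyOfRange come from platform releases later than J2SE 1.5.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Flavors {
    /** Standard UTF-8, as produced by String.getBytes. */
    static byte[] standardUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8, as produced by DataOutputStream.writeUTF, without its length prefix. */
    static byte[] modifiedUtf8(String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(text);                  // 2-byte length, then the encoded text
            out.flush();
            byte[] all = buffer.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new IllegalStateException(e);  // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(standardUtf8(deseret).length);   // 4: one four-byte sequence
        System.out.println(modifiedUtf8(deseret).length);   // 6: three bytes per surrogate
        System.out.println(standardUtf8("\0").length);      // 1: the single byte 0x00
        System.out.println(modifiedUtf8("\0").length);      // 2: the bytes 0xC0 0x80
    }
}
```

Data written with writeUTF should be read back with the matching DataInput.readUTF rather than decoded as standard UTF-8.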
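Supporting Supplementary Characters in Applications
Now for the question that matters most to most readers: what changes do you have to make to your application to support supplementary characters?
The answer depends on what kind of text processing the application does and which Java platform APIs it uses.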
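Applications that deal with text only as char sequences in their various forms (char[], java.lang.CharSequence implementations, java.text.CharacterIterator implementations) and that only use Java APIs that accept and return such char sequences will probably not need any changes at all. The implementations of the Java platform APIs should take care of supplementary characters.
Applications that interpret individual characters themselves, pass individual characters to Java platform APIs, or call methods that return individual characters need to consider the valid values for these characters. In many cases, support for supplementary characters is not required. For example, if an application searches for HTML tags in a char sequence by looking at each char individually, it knows that these tags only use characters from the Basic Latin block. If the text being searched contains supplementary characters, these characters cannot be confused with the tag characters, because UTF-16 represents supplementary characters using code units whose values are not used for BMP characters.
An application has to be changed only where it interprets individual characters itself, passes individual characters to Java platform APIs, or calls methods that return individual characters, and these characters can be supplementary characters. Where parallel APIs that use char sequences are available, it is best to switch to those APIs. In other cases, it will be necessary to use the new APIs to convert between char and code point based representations and to call the code point based APIs. Of course, if J2SE 1.5 offers a newer, more convenient API that lets you support supplementary characters and simplify your code at the same time (as in the formatting example above), you can use that instead.
You may wonder whether it is better to convert all text into a code point representation (say, an int[]) and process it in that form, or to stay with char sequences most of the time and convert to code points only when needed. In general, the Java platform APIs certainly favor char sequences, and using char sequences also saves memory.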
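For code that does decide to process a whole text as code points, the conversion in both directions can be sketched with the APIs introduced earlier. CodePointArrays and its two methods are hypothetical names used only for this illustration.

```java
class CodePointArrays {
    /** Converts a String to an int[] holding one element per code point. */
    static int[] toCodePoints(String text) {
        int[] result = new int[text.codePointCount(0, text.length())];
        for (int i = 0, j = 0; i < text.length(); j++) {
            int codePoint = text.codePointAt(i);
            result[j] = codePoint;
            i += Character.charCount(codePoint);
        }
        return result;
    }

    /** Converts an int[] of code points back to a String. */
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String text = "A" + new String(Character.toChars(0x10400));
        int[] codePoints = toCodePoints(text);
        System.out.println(text.length());        // 3 chars, 2 bytes each
        System.out.println(codePoints.length);    // 2 ints, 4 bytes each
        System.out.println(fromCodePoints(codePoints).equals(text));  // true
    }
}
```

The int[] form gives constant-time access to the n-th code point, but for mostly-BMP text it takes roughly twice the storage of the char sequence, which is the memory trade-off mentioned above.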
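Applications that need to convert to or from UTF-8 also need to consider carefully whether standard UTF-8 or modified UTF-8 is required, and use the appropriate Java platform API for each. The section "Modified UTF-8" provides the information needed to make the right choice.
Testing Applications with Supplementary Characters
After working through the previous sections, and whether or not you had to revise your application, it is always a good idea to test that the application works correctly. For applications without a graphical user interface, the information in "Representing Supplementary Characters in Source Files" helps in designing test cases. Here is some additional information about testing with a graphical user interface.
For text input, the Java 2 SDK provides a code point input method that accepts strings of the form "\Uxxxxxx", where the uppercase "U" indicates that the escape sequence contains six hexadecimal digits and therefore allows for supplementary characters. A lowercase "u" indicates the original form of the escape sequence, "\uxxxx". You can find this input method and its documentation in the directory demo/jfc/CodePointIM of the J2SDK (starting with J2SE 1.5.0 Beta 2).
For font rendering, you need a font that can render at least some supplementary characters. One such font is James Kass's Code2001 font, which provides glyphs for scripts such as Deseret and Old Italic. Thanks to a new feature in the Java 2D library, you can simply install the font into the lib/fonts/fallback directory of your J2RE, and it is automatically added to all logical fonts used in 2D and XAWT rendering; there is no need to edit font configuration files.
And with that, you can verify that your application fully supports supplementary characters!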
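Conclusions
Support for supplementary characters has been introduced into the Java platform in a way that lets most applications handle these characters without code changes. Applications that interpret individual characters can use the new code point based APIs in the Character class and the various CharSequence subclasses.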
Acknowledgments
Supplementary character support in the Java platform was designed by the JSR-204 expert group of the Java Community Process. The specification leads were Masayoshi Okutsu and Brian Beck (Sun Microsystems); the other expert group members were Craig Cummings (Oracle), Mark Davis (IBM), Markus Eble (SAP AG), Jere Käpyaho (Nokia Corp.), Kazuhiro Kazama (NTT), Kenji Kazumura (Fujitsu Limited), Eiichi Kimura (NEC Corp.), Changshin Lee (Tmax Soft Inc.), and Toshiki Murata (Oki Electric Industry Co.). The reference implementation was created by the Java Internationalization team at Sun Microsystems, with help from the IBM Globalization Center of Competency in San Jose. The technology compatibility kit for the specification is the Java Compatibility Kit, implemented by the JCK team at Sun Microsystems.
References
Masayoshi Okutsu, Brian Beck (ed.): Unicode Supplementary Character Support. Public Review Draft. Sun Microsystems, 2004.
Java 2 Platform, Standard Edition, v 1.5.0 API Specification. Sun Microsystems, 2004.
The Unicode Consortium: The Unicode Standard, Version 4.0. Addison-Wesley, 2003.
Ken Whistler, Mark Davis: Character Encoding Model. Unicode Technical Report #17. The Unicode Consortium, 2000.
James Kass: Code2001, a Plane 1 Unicode-based Font.
About the Authors
Norbert Lindenberg is the technical lead for Java Internationalization in the Java Web Services group at Sun Microsystems. Before joining Sun, he worked at General Magic and Apple Computer, where he took part in several internationalization projects. He holds an M.S. in computer science from the University of Karlsruhe, Germany.
Masayoshi Okutsu is an internationalization engineer in the Java Web Services group at Sun Microsystems and currently serves as the specification lead for Java Specification Request 204, Unicode Supplementary Character Support. Before joining Sun Microsystems, he worked at Digital Equipment Corporation, where he was involved in a number of internationalization projects. He holds a B.S. in electronic engineering from Yamagata University, Japan.