获取汉字首字母

作者在 2010-12-15 10:34:53 发布以下内容
此工具用来获取拼音首字母,原理是获取汉字编码的首字节,对比特殊汉字的首字节,并设定对应的拼音首字母关系来获取。

 

此算法的来自网上,我做了修改与封装,拿出来以方便使用。

 

/**
*
拼音首字母的工具
*
* @author leizhimin 2009-7-4 16:41:28
*/

public class PinyinToolkit {
        
private PinyinToolkit() {
        }

        
//字母Z使用了两个标签,这里有27个值
        
//i, u, v都不做声母, 跟随前面的字母
        
private static char[] chartable = {
                        '
', '', '', '', '', '', '', '', '',
                        '
', '', '', '', '', '', '', '', '',
                        '
', '', '', '', '', '', '', '', ''
        };
        
//首字母表
        
private static char[] initialtable = {
                        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                        's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
        };

        
private static int[] table = new int[27];

        
//初始化
        
static {
                
for (int i = 0; i < 27; ++i) {
                        table[i] = gbValue(chartable[i]);
                }
        }

        
/**
         *
根据一个包含汉字的字符串返回一个汉字拼音首字母的字符串
         *
         * @param SourceStr
源字符串
         * @return
拼音首字母的字符串
         */

        
public static String cn2Pinyin(String SourceStr) {
                String Result = "";
                
int StrLength = SourceStr.length();
                
int i;
                
try {
                        
for (i = 0; i < StrLength; i++) {
                                Result += Char2Initial(SourceStr.charAt(i));
                        }
                }
catch (Exception e) {
                        Result = "";
                }
                
return Result;
        }

        
//------------------------private方法区------------------------
        
/**
         *
输入字符,得到他的声母,英文字母返回对应的大写字母,其他非简体汉字返回 '0'
         *
         * @param ch
字符
         * @return
拼音首字母
         */

        
private static char Char2Initial(char ch) {
                
if (ch >= 'a' && ch <= 'z')
                        
return (char) (ch - 'a' + 'A');
                
if (ch >= 'A' && ch <= 'Z')
                        
return ch;
                
int gb = gbValue(ch);
                
if (gb < table[0])
                        
return ch;
                
int i;
                
for (i = 0; i < 26; ++i) {
                        
if (match(i, gb))
                                
break;
                }
                
if (i >= 26)
                        
return ch;
                
else
                        
return initialtable[i];
        }


        
private static boolean match(int i, int gb) {
                
if (gb < table[i])
                        
return false;
                
int j = i + 1;
                
//字母Z使用了两个标签
                
while (j < 26 && (table[j] == table[i]))
                        ++j;
                
if (j == 26)
                        
return gb <= table[j];
                
else
                        
return gb < table[j];

        }

        
/**
         *
取出汉字的编码
         * cn
汉字
         */

        
private static int gbValue(char ch) {
                String str =
new String();
                str += ch;
                
try {
                        
byte[] bytes = str.getBytes("GB2312");
                        
if (bytes.length < 2)
                                
return 0;
                        
return (bytes[0] << 8 & 0xff00) + (bytes[1] & 0xff);
                }
catch (Exception e) {
                        
return 0;
                }
        }

        
public static void main(String[] args) throws Exception {
                System.out.println(cn2Pinyin(
"熔岩"));
                System.out.println(cn2Pinyin(""));
                System.out.println(cn2Pinyin(
"熔岩abc123"));
        }
}

 

运行结果:

ry

ryABC123

Process finished with exit code 0

 

特别说明:本工具支持的中文文字有限,如果您有更好的算法,也请奉献出来,或与我交流。

 

当然,如果要获取完整的拼音,则需要借助拼音库来实现了。

专业文章 | 阅读 1172 次
文章评论,共0条
游客请输入验证码
浏览275927次