public final class EncodingUtils
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
(package private) static interface | EncodingUtils.GetBytes - Getter callback: called to retrieve 1 or more additional UTF-8 bytes. |
(package private) static interface | EncodingUtils.PutBytes - Putter callback: called to store 1 or more additional UTF-8 bytes. |
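The getter callback lets a UTF-8 decoder pull successor bytes on demand instead of requiring the whole sequence up front. The following is a minimal sketch of that pattern, not JTidy's implementation. The `ByteGetter` interface, its `nextByte()` method and the class name `Utf8GetterSketch` are hypothetical, because the real `EncodingUtils.GetBytes` method signature is not listed on this page. The sketch follows the documented `decodeUTF8BytesToChar` conventions: the decoded char and the byte count are returned through `int[]` out-parameters, and the method returns `true` on error.

```java
public final class Utf8GetterSketch {

    /** Hypothetical pull callback standing in for EncodingUtils.GetBytes. */
    interface ByteGetter {
        /** Returns the next byte as 0..255, or -1 when no more bytes are available. */
        int nextByte();
    }

    /**
     * Decodes one UTF-8 sequence that starts with firstByte, pulling successor bytes from
     * getter. Stores the code point in c[0] (U+FFFD on error) and the number of bytes
     * consumed in count[0]. Returns true on error, false on success.
     */
    static boolean decode(int[] c, int firstByte, ByteGetter getter, int[] count) {
        int b = firstByte & 0xFF;
        int len; // total sequence length announced by the lead byte
        int cp;  // code point being assembled
        int min; // smallest code point allowed for this length (rejects overlong forms)
        if (b < 0x80) {
            len = 1; cp = b; min = 0;
        } else if ((b & 0xE0) == 0xC0) {
            len = 2; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            len = 3; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            len = 4; cp = b & 0x07; min = 0x10000;
        } else {
            // Stray continuation byte or invalid lead byte.
            c[0] = 0xFFFD;
            count[0] = 1;
            return true;
        }
        count[0] = 1;
        for (int i = 1; i < len; i++) {
            int next = getter.nextByte();
            if (next < 0 || (next & 0xC0) != 0x80) {
                // Truncated or malformed sequence; this sketch does not push the
                // offending byte back to the getter.
                c[0] = 0xFFFD;
                return true;
            }
            cp = (cp << 6) | (next & 0x3F);
            count[0]++;
        }
        if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            c[0] = 0xFFFD; // overlong form, surrogate code point, or beyond U+10FFFF
            return true;
        }
        c[0] = cp;
        return false;
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // UTF-8 for U+20AC (euro sign)
        int[] pos = {1}; // index of the next successor byte to hand out
        ByteGetter getter = () -> pos[0] < input.length ? input[pos[0]++] & 0xFF : -1;
        int[] c = new int[1];
        int[] count = new int[1];
        boolean error = decode(c, input[0], getter, count);
        System.out.printf("error=%b char=U+%04X bytes=%d%n", error, c[0], count[0]);
        // prints: error=false char=U+20AC bytes=3
    }
}
```

Passing the first byte separately mirrors the `firstByte` parameter of `decodeUTF8BytesToChar`; the getter is only consulted when the lead byte announces a multi-byte sequence.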
Modifier and Type | Field and Description |
---|---|
static int | FSM_ASCII - States for ISO 2022. A document in an ISO-2022 based encoding uses ESC sequences called "designators" to switch character sets. |
static int | FSM_ESC - State ESC. |
static int | FSM_ESCD - State ESCD. |
static int | FSM_ESCDP - State ESCDP. |
static int | FSM_ESCP - State ESCP. |
static int | FSM_NONASCII - State NONASCII. |
static int | HIGH_UTF16_SURROGATE - UTF-16 high surrogate. |
static int | LOW_UTF16_SURROGATE - UTF-16 low surrogate. |
private static int[] | MAC2UNICODE - John Love-Jensen contributed this table for mapping the MacRoman character set to Unicode. |
static int | MAX_UTF16_FROM_UCS4 - Max UTF-16 value. |
static int | MAX_UTF8_FROM_UCS4 - Max UTF-8 valid char value. |
private static int | NUM_UTF8_SEQUENCES - Number of valid UTF-8 sequences. |
private static int[] | OFFSET_UTF8_SEQUENCES - Offsets for UTF-8 sequences. |
private static int[] | SYMBOL2UNICODE - Table to map Symbol font characters to Unicode; undefined characters are mapped to 0x0000 and characters without any Unicode equivalent are mapped to '?'. |
static int | UNICODE_BOM - The default (big-endian) Unicode BOM. |
static int | UNICODE_BOM_BE - The big-endian (default) Unicode BOM. |
static int | UNICODE_BOM_LE - The little-endian Unicode BOM. |
static int | UNICODE_BOM_UTF8 - The UTF-8 Unicode BOM. |
static int | UTF16_HIGH_SURROGATE_BEGIN - UTF-16 surrogate pair areas: high surrogates begin. |
static int | UTF16_HIGH_SURROGATE_END - UTF-16 surrogate pair areas: high surrogates end. |
static int | UTF16_LOW_SURROGATE_BEGIN - UTF-16 surrogate pair areas: low surrogates begin. |
static int | UTF16_LOW_SURROGATE_END - UTF-16 surrogate pair areas: low surrogates end. |
static int | UTF16_SURROGATES_BEGIN - UTF-16 surrogates begin. |
private static int | UTF8_BYTE_SWAP_NOT_A_CHAR - UTF-8 byte swap: invalid char. |
private static int | UTF8_NOT_A_CHAR - UTF-8 invalid char. |
private static ValidUTF8Sequence[] | VALID_UTF8 - Array of valid UTF-8 sequences. |
private static int[] | WIN2UNICODE - Mapping of the Windows Western character set (128-159) to Unicode. |
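The UTF-16 surrogate constants delimit the code unit ranges used to represent code points above U+FFFF as a high/low pair. The sketch below assumes they carry the standard Unicode values (high surrogates 0xD800-0xDBFF, low surrogates 0xDC00-0xDFFF, maximum code point 0x10FFFF); the actual values are not shown on this page. The class name, the method names and the `SUPPLEMENTARY_BASE` constant (0x10000, possibly what `UTF16_SURROGATES_BEGIN` denotes, though that is not confirmed here) are hypothetical.

```java
public final class SurrogateSketch {

    // Standard Unicode values, assumed to match the EncodingUtils constants of the same names.
    static final int UTF16_HIGH_SURROGATE_BEGIN = 0xD800;
    static final int UTF16_HIGH_SURROGATE_END = 0xDBFF;
    static final int UTF16_LOW_SURROGATE_BEGIN = 0xDC00;
    static final int UTF16_LOW_SURROGATE_END = 0xDFFF;
    static final int MAX_UTF16_FROM_UCS4 = 0x10FFFF;

    // First code point that needs a surrogate pair (hypothetical helper constant).
    private static final int SUPPLEMENTARY_BASE = 0x10000;

    static boolean isHighSurrogate(int ch) {
        return ch >= UTF16_HIGH_SURROGATE_BEGIN && ch <= UTF16_HIGH_SURROGATE_END;
    }

    static boolean isLowSurrogate(int ch) {
        return ch >= UTF16_LOW_SURROGATE_BEGIN && ch <= UTF16_LOW_SURROGATE_END;
    }

    /** Combines a high/low surrogate pair into a code point in U+10000..U+10FFFF. */
    static int combine(int high, int low) {
        if (!isHighSurrogate(high) || !isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return ((high - UTF16_HIGH_SURROGATE_BEGIN) << 10)
                + (low - UTF16_LOW_SURROGATE_BEGIN)
                + SUPPLEMENTARY_BASE;
    }

    /** Splits a code point in U+10000..U+10FFFF into {high, low} surrogates. */
    static int[] split(int cp) {
        if (cp < SUPPLEMENTARY_BASE || cp > MAX_UTF16_FROM_UCS4) {
            throw new IllegalArgumentException("no surrogate pair needed or out of range");
        }
        int v = cp - SUPPLEMENTARY_BASE; // 20-bit value
        return new int[] {
            UTF16_HIGH_SURROGATE_BEGIN + (v >> 10),  // top 10 bits
            UTF16_LOW_SURROGATE_BEGIN + (v & 0x3FF)  // bottom 10 bits
        };
    }

    public static void main(String[] args) {
        int[] pair = split(0x1F600);
        System.out.printf("U+1F600 -> %04X %04X%n", pair[0], pair[1]); // prints: U+1F600 -> D83D DE00
        // Cross-check against the JDK's own implementation; prints: true
        System.out.println(combine(pair[0], pair[1]) == Character.toCodePoint((char) pair[0], (char) pair[1]));
    }
}
```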
Modifier | Constructor and Description |
---|---|
private | EncodingUtils() - Don't instantiate. |
Modifier and Type | Method and Description |
---|---|
protected static int | decodeMacRoman(int c) - Function to convert from MacRoman to Unicode. |
(package private) static int | decodeSymbolFont(int c) - Function to convert from Symbol Font chars to Unicode. |
(package private) static boolean | decodeUTF8BytesToChar(int[] c, int firstByte, byte[] successorBytes, EncodingUtils.GetBytes getter, int[] count, int startInSuccessorBytesArray) - Decodes an array of bytes to a char. |
protected static int | decodeWin1252(int c) - Function for conversion from Windows-1252 to Unicode. |
(package private) static boolean | encodeCharToUTF8Bytes(int c, byte[] encodebuf, EncodingUtils.PutBytes putter, int[] count) - Encode a char to an array of bytes. |
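`decodeWin1252` converts Windows-1252 to Unicode, and the `WIN2UNICODE` table covers only bytes 128-159. A minimal sketch of that table-lookup approach follows, assuming every other byte value maps to itself; that holds for Windows-1252, whose remaining positions coincide with ISO-8859-1. The table entries are the standard Windows-1252 assignments; mapping the five undefined bytes to 0x0000 is an assumption borrowed from the convention documented for `SYMBOL2UNICODE`, and the class name is hypothetical.

```java
public final class Win1252Sketch {

    // Windows-1252 assignments for bytes 0x80..0x9F. The five undefined positions
    // (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to 0x0000 here, which is an assumption.
    private static final int[] WIN2UNICODE = {
        0x20AC, 0x0000, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x0000, 0x017D, 0x0000, // 0x88-0x8F
        0x0000, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x0000, 0x017E, 0x0178  // 0x98-0x9F
    };

    /** Maps a Windows-1252 byte value (0..255) to a Unicode code point. */
    static int decodeWin1252(int c) {
        if (c >= 128 && c <= 159) {
            return WIN2UNICODE[c - 128];
        }
        // Every other byte value coincides with ISO-8859-1, i.e. with U+0000..U+00FF.
        return c;
    }

    public static void main(String[] args) {
        int[] sample = {0x41, 0x80, 0x93, 0x94, 0xE9};
        StringBuilder sb = new StringBuilder();
        for (int b : sample) {
            sb.append(String.format("U+%04X ", decodeWin1252(b)));
        }
        System.out.println(sb.toString().trim()); // prints: U+0041 U+20AC U+201C U+201D U+00E9
    }
}
```

Keeping only the 32-entry table for 128-159 is what makes the lookup cheap: the other 224 byte values need no table at all.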
public static final int UNICODE_BOM_BE
public static final int UNICODE_BOM
public static final int UNICODE_BOM_LE
public static final int UNICODE_BOM_UTF8
public static final int FSM_ASCII
public static final int FSM_ESC
public static final int FSM_ESCD
public static final int FSM_ESCDP
public static final int FSM_ESCP
public static final int FSM_NONASCII
public static final int MAX_UTF8_FROM_UCS4
public static final int MAX_UTF16_FROM_UCS4
public static final int LOW_UTF16_SURROGATE
public static final int UTF16_SURROGATES_BEGIN
public static final int UTF16_LOW_SURROGATE_BEGIN
public static final int UTF16_LOW_SURROGATE_END
public static final int UTF16_HIGH_SURROGATE_BEGIN
public static final int UTF16_HIGH_SURROGATE_END
public static final int HIGH_UTF16_SURROGATE
private static final int UTF8_BYTE_SWAP_NOT_A_CHAR
private static final int UTF8_NOT_A_CHAR
private static final int[] WIN2UNICODE
private static final int[] MAC2UNICODE
private static final int[] SYMBOL2UNICODE
private static final ValidUTF8Sequence[] VALID_UTF8
private static final int NUM_UTF8_SEQUENCES
private static final int[] OFFSET_UTF8_SEQUENCES
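The `FSM_*` constants name the states of a small machine that tracks ISO-2022 designator sequences. The sketch below is one plausible reading of those states, not JTidy's code: the numeric state values, the class name and the exact transitions are assumptions, loosely modeled on HTML Tidy's ISO-2022 input handling. `ESC ( B` returns to ASCII, other designators switch to a non-ASCII set, and bytes read in `FSM_NONASCII` get their high bit set so later stages can tell them apart from ASCII.

```java
public final class Iso2022FsmSketch {

    // State names follow the EncodingUtils fields; the numeric values are assumptions.
    static final int FSM_ASCII = 0;
    static final int FSM_ESC = 1;
    static final int FSM_ESCD = 2;
    static final int FSM_ESCDP = 3;
    static final int FSM_ESCP = 4;
    static final int FSM_NONASCII = 5;

    private static final int ESC = 0x1B;

    private int state = FSM_ASCII;

    int state() {
        return state;
    }

    /** Feeds one byte (0..255) through the designator state machine and returns the byte to pass on. */
    int next(int c) {
        if (c == ESC) {
            state = FSM_ESC; // start of a designator sequence
            return c;
        }
        switch (state) {
            case FSM_ESC:
                // ESC $ ... designates a multibyte set; ESC ( ... a single-byte set.
                state = (c == '$') ? FSM_ESCD : (c == '(') ? FSM_ESCP : FSM_ASCII;
                break;
            case FSM_ESCD:
                // ESC $ ( F needs one more byte; ESC $ F is already complete.
                state = (c == '(') ? FSM_ESCDP : FSM_NONASCII;
                break;
            case FSM_ESCDP:
                state = FSM_NONASCII;
                break;
            case FSM_ESCP:
                // ESC ( B designates ASCII; any other final byte selects a non-ASCII set.
                state = (c == 'B') ? FSM_ASCII : FSM_NONASCII;
                break;
            case FSM_NONASCII:
                c |= 0x80; // mark bytes of the designated non-ASCII set
                break;
            default: // FSM_ASCII: pass the byte through unchanged
                break;
        }
        return c;
    }

    public static void main(String[] args) {
        // ESC $ B, two JIS X 0208 bytes, then ESC ( B and an ASCII letter.
        int[] input = {0x1B, '$', 'B', 0x30, 0x22, 0x1B, '(', 'B', 'A'};
        Iso2022FsmSketch fsm = new Iso2022FsmSketch();
        StringBuilder sb = new StringBuilder();
        for (int b : input) {
            sb.append(String.format("%02X ", fsm.next(b)));
        }
        System.out.println(sb.toString().trim()); // prints: 1B 24 42 B0 A2 1B 28 42 41
    }
}
```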
protected static int decodeWin1252(int c)
Parameters:
c - char to decode

protected static int decodeMacRoman(int c)
Parameters:
c - char to decode

static int decodeSymbolFont(int c)
Parameters:
c - char to decode

static boolean decodeUTF8BytesToChar(int[] c, int firstByte, byte[] successorBytes, EncodingUtils.GetBytes getter, int[] count, int startInSuccessorBytesArray)
Parameters:
c - will contain the decoded char
firstByte - first input byte
successorBytes - array containing successor bytes (can be null if a getter is provided)
getter - callback used to get new bytes if successorBytes doesn't contain enough bytes
count - will contain the number of bytes read
startInSuccessorBytesArray - starting offset for bytes in successorBytes
Returns:
true if error

static boolean encodeCharToUTF8Bytes(int c, byte[] encodebuf, EncodingUtils.PutBytes putter, int[] count)
Parameters:
c - char to encode
encodebuf - will contain the encoded bytes
putter - if not null, it will be called to write bytes to out
count - number of bytes written
Returns:
false = ok, true = error
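`encodeCharToUTF8Bytes` writes the encoded bytes into `encodebuf`, optionally hands them to a putter, reports the byte count through `count`, and returns `true` on error. A minimal sketch with the same conventions follows. The `BytePutter` interface and the class name are hypothetical stand-ins for `EncodingUtils.PutBytes`, whose method signature is not shown on this page, and treating surrogate code points and values above 0x10FFFF as errors is an assumption of the sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class Utf8PutterSketch {

    /** Hypothetical push callback standing in for EncodingUtils.PutBytes. */
    interface BytePutter {
        void put(byte[] buf, int len);
    }

    /**
     * Encodes code point c as UTF-8 into encodebuf (at least 4 bytes long). If putter is
     * non-null the bytes are also handed to it. count[0] receives the number of bytes
     * produced. Returns false on success and true on error.
     */
    static boolean encode(int c, byte[] encodebuf, BytePutter putter, int[] count) {
        int len;
        if (c < 0 || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            count[0] = 0;
            return true; // not a Unicode scalar value
        } else if (c < 0x80) {
            encodebuf[0] = (byte) c;
            len = 1;
        } else if (c < 0x800) {
            encodebuf[0] = (byte) (0xC0 | (c >> 6));
            encodebuf[1] = (byte) (0x80 | (c & 0x3F));
            len = 2;
        } else if (c < 0x10000) {
            encodebuf[0] = (byte) (0xE0 | (c >> 12));
            encodebuf[1] = (byte) (0x80 | ((c >> 6) & 0x3F));
            encodebuf[2] = (byte) (0x80 | (c & 0x3F));
            len = 3;
        } else {
            encodebuf[0] = (byte) (0xF0 | (c >> 18));
            encodebuf[1] = (byte) (0x80 | ((c >> 12) & 0x3F));
            encodebuf[2] = (byte) (0x80 | ((c >> 6) & 0x3F));
            encodebuf[3] = (byte) (0x80 | (c & 0x3F));
            len = 4;
        }
        if (putter != null) {
            putter.put(encodebuf, len);
        }
        count[0] = len;
        return false;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BytePutter putter = (buf, len) -> out.write(buf, 0, len); // collects every sequence
        byte[] encodebuf = new byte[4];
        int[] count = new int[1];
        int[] codePoints = {'A', 0xE9, 0x20AC, 0x1F600};
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            if (encode(cp, encodebuf, putter, count)) {
                System.out.println("cannot encode U+" + Integer.toHexString(cp));
            }
            sb.appendCodePoint(cp);
        }
        byte[] expected = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(out.toByteArray(), expected)); // prints: true
    }
}
```

The final check compares the putter's output with the JDK's own UTF-8 encoder, which is a cheap way to validate a hand-written encoder like this one.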