Package org.w3c.tidy

Class EncodingUtils

java.lang.Object
org.w3c.tidy.EncodingUtils

public final class EncodingUtils extends Object
Version:
$Revision: 622 $ ($Author: fgiust $)
Author:
Fabrizio Giustina
  • Field Details

    • UNICODE_BOM_BE

      public static final int UNICODE_BOM_BE
      the big-endian (default) UNICODE BOM.
    • UNICODE_BOM

      public static final int UNICODE_BOM
      the default (big-endian) UNICODE BOM.
    • UNICODE_BOM_LE

      public static final int UNICODE_BOM_LE
      the little-endian UNICODE BOM.
    • UNICODE_BOM_UTF8

      public static final int UNICODE_BOM_UTF8
      the UTF-8 UNICODE BOM.
    • FSM_ASCII

      public static final int FSM_ASCII
      States for ISO 2022. A document in an ISO-2022-based encoding uses ESC sequences called "designators" to switch character sets. The designators defined and used in ISO-2022-JP are: "ESC" + "(" + ? for ISO646 variants; "ESC" + "$" + ? and "ESC" + "$" + "(" + ? for multibyte character sets. This constant is the ASCII state.
    • FSM_ESC

      public static final int FSM_ESC
      state ESC.
    • FSM_ESCD

      public static final int FSM_ESCD
      state ESCD.
    • FSM_ESCDP

      public static final int FSM_ESCDP
      state ESCDP.
    • FSM_ESCP

      public static final int FSM_ESCP
      state ESCP.
    • FSM_NONASCII

      public static final int FSM_NONASCII
      state NONASCII.
    • MAX_UTF8_FROM_UCS4

      public static final int MAX_UTF8_FROM_UCS4
      Max UTF-8 valid char value.
    • MAX_UTF16_FROM_UCS4

      public static final int MAX_UTF16_FROM_UCS4
      Max UTF-16 value.
    • LOW_UTF16_SURROGATE

      public static final int LOW_UTF16_SURROGATE
      UTF-16 low surrogate.
    • UTF16_SURROGATES_BEGIN

      public static final int UTF16_SURROGATES_BEGIN
      UTF-16 surrogates begin.
    • UTF16_LOW_SURROGATE_BEGIN

      public static final int UTF16_LOW_SURROGATE_BEGIN
      UTF-16 surrogate pair areas: low surrogates begin.
    • UTF16_LOW_SURROGATE_END

      public static final int UTF16_LOW_SURROGATE_END
      UTF-16 surrogate pair areas: low surrogates end.
    • UTF16_HIGH_SURROGATE_BEGIN

      public static final int UTF16_HIGH_SURROGATE_BEGIN
      UTF-16 surrogate pair areas: high surrogates begin.
    • UTF16_HIGH_SURROGATE_END

      public static final int UTF16_HIGH_SURROGATE_END
      UTF-16 surrogate pair areas: high surrogates end.
    • HIGH_UTF16_SURROGATE

      public static final int HIGH_UTF16_SURROGATE
      UTF-16 high surrogate.
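The UTF-16 surrogate constants above describe the two code-unit ranges used to encode code points beyond U+FFFF. This page does not list their numeric values, so the following is only a minimal sketch of the surrogate arithmetic using the ranges defined by the Unicode standard. The class name SurrogateSketch and the constants LEAD_BEGIN, TRAIL_BEGIN and so on are hypothetical local names, not the EncodingUtils fields; check the actual constant values, and which range each "low"/"high" name denotes, before relying on them.

```java
final class SurrogateSketch {
    // Ranges from the Unicode standard. These are hypothetical local names,
    // NOT the EncodingUtils fields, whose values are not shown on this page.
    static final int LEAD_BEGIN = 0xD800;   // first code unit of a pair
    static final int LEAD_END = 0xDBFF;
    static final int TRAIL_BEGIN = 0xDC00;  // second code unit of a pair
    static final int TRAIL_END = 0xDFFF;
    static final int SUPPLEMENTARY_BASE = 0x10000;
    static final int MAX_CODE_POINT = 0x10FFFF;

    // Splits a supplementary code point into {lead, trail} code units.
    static int[] split(int codePoint) {
        if (codePoint < SUPPLEMENTARY_BASE || codePoint > MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - SUPPLEMENTARY_BASE;
        // The upper 10 bits go to the lead unit, the lower 10 bits to the trail unit.
        return new int[] {LEAD_BEGIN + (offset >> 10), TRAIL_BEGIN + (offset & 0x3FF)};
    }

    // Recombines a {lead, trail} pair into the original code point.
    static int combine(int lead, int trail) {
        if (lead < LEAD_BEGIN || lead > LEAD_END
                || trail < TRAIL_BEGIN || trail > TRAIL_END) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return SUPPLEMENTARY_BASE + ((lead - LEAD_BEGIN) << 10) + (trail - TRAIL_BEGIN);
    }
}
```

For example, SurrogateSketch.split(0x1F600) yields the pair 0xD83D, 0xDE00, and combine reverses it.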
  • Method Details

    • decodeWin1252

      protected static int decodeWin1252(int c)
      Function for conversion from Windows-1252 to Unicode.
      Parameters:
      c - char to decode
      Returns:
      decoded char
    • decodeMacRoman

      protected static int decodeMacRoman(int c)
      Function to convert from MacRoman to Unicode.
      Parameters:
      c - char to decode
      Returns:
      decoded char
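Because decodeWin1252 is protected, it cannot be called from outside the org.w3c.tidy package. The sketch below only illustrates the kind of mapping such a function performs, using the JDK's own windows-1252 charset as a stand-in. The class Win1252Sketch and its decode method are hypothetical and are not part of JTidy; how EncodingUtils treats bytes that Windows-1252 leaves undefined is not stated on this page and may differ from the JDK.

```java
import java.nio.charset.Charset;

final class Win1252Sketch {
    private static final Charset WIN1252 = Charset.forName("windows-1252");

    // Decodes a single Windows-1252 byte value (0..255) to a Unicode code point.
    // Stand-in for illustration only; this is not the JTidy implementation.
    static int decode(int c) {
        if (c < 0 || c > 0xFF) {
            throw new IllegalArgumentException("expected a single byte value");
        }
        String decoded = new String(new byte[] {(byte) c}, WIN1252);
        return decoded.codePointAt(0);
    }
}
```

Values below 0x80 and from 0xA0 upward map to the same code point, while most of the 0x80 to 0x9F range is remapped: decode(0x80) gives U+20AC (the euro sign) and decode(0x99) gives U+2122 (the trade mark sign).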