Package org.w3c.tidy

Class Configuration

java.lang.Object
org.w3c.tidy.Configuration
All Implemented Interfaces:
Serializable

public class Configuration extends Object implements Serializable
Reads a configuration file and manages configuration properties. Configuration files associate a property name with a value. The format is that of a Java .properties file.
Version:
$Revision: 817 $ ($Author: steffenyount $)
Author:
Dave Raggett dsr@w3.org, Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina
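Because the configuration format is that of a Java .properties file, its syntax can be explored with the standard java.util.Properties class alone. The following is a minimal sketch, not JTidy code: the class name TidyPropertiesSketch and the option names in SAMPLE are illustrative assumptions, and the authoritative list of option names is whatever printConfigOptions reports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class TidyPropertiesSketch {

    // Illustrative configuration text. The option names are assumptions;
    // the .properties syntax (comments, '=' or ':' separators) is standard Java.
    static final String SAMPLE =
        "# lines starting with '#' are comments\n"
        + "indent-spaces = 4\n"
        + "wrap: 72\n"
        + "tidy-mark=no\n";

    // Parses .properties-formatted text into name/value pairs.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SAMPLE);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " -> " + props.getProperty(name));
        }
    }
}
```

A Properties object built this way has the type that addProps(Properties) accepts, while parseFile(String) takes the name of a file in the same format.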
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected String
    altText
    default text for alt attribute.
    static final int
    ASCII
    Deprecated.
    protected boolean
    asciiChars
    convert quotes and dashes to nearest ASCII char.
    static final int
    BIG5
    Deprecated.
    protected boolean
    bodyOnly
    output BODY content only.
    protected boolean
    breakBeforeBR
    output a newline before br or not?
    protected boolean
    burstSlides
    create slides on each h2 element.
    protected String
    cssPrefix
    CSS class naming for -clean option.
    protected int
    definedTags
    track what types of tags user has defined to eliminate unnecessary searches.
    static final int
    DOCTYPE_AUTO
    treatment of doctype: auto.
    static final int
    DOCTYPE_LOOSE
    treatment of doctype: loose.
    static final int
    DOCTYPE_OMIT
    treatment of doctype: omit.
    static final int
    DOCTYPE_STRICT
    treatment of doctype: strict.
    static final int
    DOCTYPE_USER
    treatment of doctype: user.
    protected int
    docTypeMode
    see doctype property.
    protected String
    docTypeStr
    user specified doctype.
    protected boolean
    dropEmptyParas
    discard empty p elements.
    protected boolean
    dropFontTags
    discard presentation tags.
    protected boolean
    dropProprietaryAttributes
    discard proprietary attributes.
    protected int
    duplicateAttrs
    Keep first or last duplicate attribute.
    protected boolean
    emacs
    if true format error output for GNU Emacs.
    protected boolean
    encloseBlockText
    if yes text in blocks is wrapped in p's.
    protected boolean
    encloseBodyText
    if yes text at body is wrapped in p's.
    protected String
    errfile
    file name to write errors to.
    protected boolean
    escapeCdata
    replace CDATA sections with escaped text.
    protected boolean
    fixBackslash
    fix URLs by replacing \ with /.
    protected boolean
    fixComments
    fix comments with adjacent hyphens.
    protected boolean
    fixUri
    properly escape URLs.
    protected boolean
    forceOutput
    output document even if errors were found.
    protected boolean
    hideComments
    hides all (real) comments in output.
    protected boolean
    hideEndTags
    suppress optional end tags.
    protected boolean
    htmlOut
    output plain-old HTML, even for XHTML input.
    protected boolean
    indentAttributes
    newline+indent before each attribute.
    protected boolean
    indentCdata
    indent CDATA sections.
    protected boolean
    indentContent
    indent content of appropriate tags.
    static final int
    ISO2022
    Deprecated.
    protected boolean
    joinClasses
    join multiple class attributes.
    protected boolean
    joinStyles
    join multiple style attributes.
    static final int
    KEEP_FIRST
    Keep first duplicate attribute.
    static final int
    KEEP_LAST
    Keep last duplicate attribute.
    protected boolean
    keepFileTimes
    if yes last modified time is preserved.
    protected String
    language
    RJ language property.
    static final int
    LATIN1
    Deprecated.
    protected boolean
    literalAttribs
    if true attributes may use newlines.
    protected boolean
    logicalEmphasis
    replace i by em and b by strong.
    protected boolean
    lowerLiterals
    folds known attribute values to lower case.
    static final int
    MACROMAN
    Deprecated.
    protected boolean
    makeBare
    Make bare HTML: remove Microsoft cruft.
    protected boolean
    makeClean
    remove presentational clutter.
    protected boolean
    ncr
    allow numeric character references.
    protected char[]
    newline
    bytes for the newline marker.
    protected boolean
    numEntities
    use numeric entities.
    protected boolean
    onlyErrors
    if true normal output is suppressed.
    protected boolean
    quiet
    no 'Parsing X', guessed DTD or summary.
    protected boolean
    quoteAmpersand
    output naked ampersand as &amp;.
    protected boolean
    quoteMarks
    output " marks as &quot;.
    protected boolean
    quoteNbsp
    output non-breaking space as entity.
    static final int
    RAW
    Deprecated.
    use Tidy.setRawOut(true) for raw output
    protected boolean
    rawOut
    Avoid mapping values > 127 to entities.
    protected boolean
    replaceColor
    replace hex color attribute values with names.
    protected String
    replacementCharEncoding
    char encoding used when replacing illegal SGML chars, regardless of specified encoding.
    protected Report
    report
    Report instance.
    static final int
    SHIFTJIS
    Deprecated.
    protected int
    showErrors
    number of errors to put out.
    protected boolean
    showWarnings
    however errors are always shown.
    protected String
    slidestyle
    Deprecated.
    does nothing
    protected boolean
    smartIndent
    does text/block level content affect indentation.
    protected int
    spaces
    default indentation.
    protected int
    tabsize
    default tab size (8).
    protected boolean
    tidyMark
    add meta element indicating tidied doc.
    protected boolean
    trimEmpty
    trim empty elements.
    protected TagTable
    tt
    TagTable associated with this Configuration.
    protected boolean
    upperCaseAttrs
    output attributes in upper not lower case.
    protected boolean
    upperCaseTags
    output tags in upper not lower case.
    static final int
    UTF16
    Deprecated.
    static final int
    UTF16BE
    Deprecated.
    static final int
    UTF16LE
    Deprecated.
    static final int
    UTF8
    Deprecated.
    static final int
    WIN1252
    Deprecated.
    protected boolean
    word2000
    draconian cleaning for Word2000.
    protected boolean
    wrapAsp
    wrap within ASP pseudo elements.
    protected boolean
    wrapAttVals
    wrap within attribute values.
    protected boolean
    wrapJste
    wrap within JSTE pseudo elements.
    protected int
    wraplen
    default wrap margin (68).
    protected boolean
    wrapPhp
    wrap within PHP pseudo elements.
    protected boolean
    wrapScriptlets
    wrap within JavaScript string literals.
    protected boolean
    wrapSection
    wrap within CDATA section tags.
    protected boolean
    writeback
    if true then output tidied markup.
    protected boolean
    xHTML
    output extensible HTML.
    protected boolean
    xmlOut
    create output as XML.
    protected boolean
    xmlPi
    add <?xml?> for XML docs.
    protected boolean
    xmlPIs
    If set to yes PIs must end with ?>.
    protected boolean
    xmlSpace
    if set to yes adds xml:space attr as needed.
    protected boolean
    xmlTags
    treat input as XML.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    Configuration(Report report)
    Instantiates a new Configuration.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addProps(Properties p)
    adds configuration Properties.
    void
    adjust()
    Ensure that config is self consistent.
    protected String
    convertCharEncoding(int code)
    Convert a char encoding from the deprecated tidy constant to a standard java encoding name.
    protected String
    getInCharEncodingName()
    Getter for inCharEncodingName.
    protected String
    getOutCharEncodingName()
    Getter for outCharEncodingName.
    static boolean
    isKnownOption(String name)
    Is the given String a valid configuration flag?
    void
    parseFile(String filename)
    Parses a property file.
    void
    printConfigOptions(Writer errout, boolean showActualConfiguration)
    prints available configuration options.
    protected void
    setInCharEncoding(int encoding)
    Deprecated.
    use setInCharEncodingName(String)
    protected void
    setInCharEncodingName(String encoding)
    Setter for inCharEncodingName.
    protected void
    setInOutEncodingName(String encoding)
    Setter for inOutCharEncodingName.
    protected void
    setOutCharEncoding(int encoding)
    Deprecated.
    use setOutCharEncodingName(String)
    protected void
    setOutCharEncodingName(String encoding)
    Setter for outCharEncodingName.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • RAW

      public static final int RAW
      Deprecated.
      use Tidy.setRawOut(true) for raw output
      character encoding = RAW.
    • ASCII

      public static final int ASCII
      Deprecated.
      character encoding = ASCII.
    • LATIN1

      public static final int LATIN1
      Deprecated.
      character encoding = LATIN1.
    • UTF8

      public static final int UTF8
      Deprecated.
      character encoding = UTF8.
    • ISO2022

      public static final int ISO2022
      Deprecated.
      character encoding = ISO2022.
    • MACROMAN

      public static final int MACROMAN
      Deprecated.
      character encoding = MACROMAN.
    • UTF16LE

      public static final int UTF16LE
      Deprecated.
      character encoding = UTF16LE.
    • UTF16BE

      public static final int UTF16BE
      Deprecated.
      character encoding = UTF16BE.
    • UTF16

      public static final int UTF16
      Deprecated.
      character encoding = UTF16.
    • WIN1252

      public static final int WIN1252
      Deprecated.
      character encoding = WIN1252.
    • BIG5

      public static final int BIG5
      Deprecated.
      character encoding = BIG5.
    • SHIFTJIS

      public static final int SHIFTJIS
      Deprecated.
      character encoding = SHIFTJIS.
    • DOCTYPE_OMIT

      public static final int DOCTYPE_OMIT
      treatment of doctype: omit.
      To do:
      should be an enumeration DocTypeMode
    • DOCTYPE_AUTO

      public static final int DOCTYPE_AUTO
      treatment of doctype: auto.
    • DOCTYPE_STRICT

      public static final int DOCTYPE_STRICT
      treatment of doctype: strict.
    • DOCTYPE_LOOSE

      public static final int DOCTYPE_LOOSE
      treatment of doctype: loose.
    • DOCTYPE_USER

      public static final int DOCTYPE_USER
      treatment of doctype: user.
    • KEEP_LAST

      public static final int KEEP_LAST
      Keep last duplicate attribute.
      To do:
      should be an enumeration DupAttrMode
    • KEEP_FIRST

      public static final int KEEP_FIRST
      Keep first duplicate attribute.
    • spaces

      protected int spaces
      default indentation.
    • wraplen

      protected int wraplen
      default wrap margin (68).
    • tabsize

      protected int tabsize
      default tab size (8).
    • docTypeMode

      protected int docTypeMode
      see doctype property.
    • duplicateAttrs

      protected int duplicateAttrs
      Keep first or last duplicate attribute.
    • altText

      protected String altText
      default text for alt attribute.
    • slidestyle

      protected String slidestyle
      Deprecated.
      does nothing
      style sheet for slides.
    • language

      protected String language
      RJ language property.
    • docTypeStr

      protected String docTypeStr
      user specified doctype.
    • errfile

      protected String errfile
      file name to write errors to.
    • writeback

      protected boolean writeback
      if true then output tidied markup.
    • onlyErrors

      protected boolean onlyErrors
      if true normal output is suppressed.
    • showWarnings

      protected boolean showWarnings
      however errors are always shown.
    • quiet

      protected boolean quiet
      no 'Parsing X', guessed DTD or summary.
    • indentContent

      protected boolean indentContent
      indent content of appropriate tags.
    • smartIndent

      protected boolean smartIndent
      does text/block level content affect indentation.
    • hideEndTags

      protected boolean hideEndTags
      suppress optional end tags.
    • xmlTags

      protected boolean xmlTags
      treat input as XML.
    • xmlOut

      protected boolean xmlOut
      create output as XML.
    • xHTML

      protected boolean xHTML
      output extensible HTML.
    • htmlOut

      protected boolean htmlOut
      output plain-old HTML, even for XHTML input. Yes means set explicitly.
    • xmlPi

      protected boolean xmlPi
      add <?xml?> for XML docs.
    • upperCaseTags

      protected boolean upperCaseTags
      output tags in upper not lower case.
    • upperCaseAttrs

      protected boolean upperCaseAttrs
      output attributes in upper not lower case.
    • makeClean

      protected boolean makeClean
      remove presentational clutter.
    • makeBare

      protected boolean makeBare
      Make bare HTML: remove Microsoft cruft.
    • logicalEmphasis

      protected boolean logicalEmphasis
      replace i by em and b by strong.
    • dropFontTags

      protected boolean dropFontTags
      discard presentation tags.
    • dropProprietaryAttributes

      protected boolean dropProprietaryAttributes
      discard proprietary attributes.
    • dropEmptyParas

      protected boolean dropEmptyParas
      discard empty p elements.
    • fixComments

      protected boolean fixComments
      fix comments with adjacent hyphens.
    • trimEmpty

      protected boolean trimEmpty
      trim empty elements.
    • breakBeforeBR

      protected boolean breakBeforeBR
      output a newline before br or not?
    • burstSlides

      protected boolean burstSlides
      create slides on each h2 element.
    • numEntities

      protected boolean numEntities
      use numeric entities.
    • quoteMarks

      protected boolean quoteMarks
      output " marks as &quot;.
    • quoteNbsp

      protected boolean quoteNbsp
      output non-breaking space as entity.
    • quoteAmpersand

      protected boolean quoteAmpersand
      output naked ampersand as &amp;.
    • wrapAttVals

      protected boolean wrapAttVals
      wrap within attribute values.
    • wrapScriptlets

      protected boolean wrapScriptlets
      wrap within JavaScript string literals.
    • wrapSection

      protected boolean wrapSection
      wrap within CDATA section tags.
    • wrapAsp

      protected boolean wrapAsp
      wrap within ASP pseudo elements.
    • wrapJste

      protected boolean wrapJste
      wrap within JSTE pseudo elements.
    • wrapPhp

      protected boolean wrapPhp
      wrap within PHP pseudo elements.
    • fixBackslash

      protected boolean fixBackslash
      fix URLs by replacing \ with /.
    • indentAttributes

      protected boolean indentAttributes
      newline+indent before each attribute.
    • xmlPIs

      protected boolean xmlPIs
      If set to yes PIs must end with ?>.
    • xmlSpace

      protected boolean xmlSpace
      if set to yes adds xml:space attr as needed.
    • encloseBodyText

      protected boolean encloseBodyText
      if yes text at body is wrapped in p's.
    • encloseBlockText

      protected boolean encloseBlockText
      if yes text in blocks is wrapped in p's.
    • keepFileTimes

      protected boolean keepFileTimes
      if yes last modified time is preserved.
    • word2000

      protected boolean word2000
      draconian cleaning for Word2000.
    • tidyMark

      protected boolean tidyMark
      add meta element indicating tidied doc.
    • emacs

      protected boolean emacs
      if true format error output for GNU Emacs.
    • literalAttribs

      protected boolean literalAttribs
      if true attributes may use newlines.
    • bodyOnly

      protected boolean bodyOnly
      output BODY content only.
    • fixUri

      protected boolean fixUri
      properly escape URLs.
    • lowerLiterals

      protected boolean lowerLiterals
      folds known attribute values to lower case.
    • replaceColor

      protected boolean replaceColor
      replace hex color attribute values with names.
    • hideComments

      protected boolean hideComments
      hides all (real) comments in output.
    • indentCdata

      protected boolean indentCdata
      indent CDATA sections.
    • forceOutput

      protected boolean forceOutput
      output document even if errors were found.
    • showErrors

      protected int showErrors
      number of errors to put out.
    • asciiChars

      protected boolean asciiChars
      convert quotes and dashes to nearest ASCII char.
    • joinClasses

      protected boolean joinClasses
      join multiple class attributes.
    • joinStyles

      protected boolean joinStyles
      join multiple style attributes.
    • escapeCdata

      protected boolean escapeCdata
      replace CDATA sections with escaped text.
    • ncr

      protected boolean ncr
      allow numeric character references.
    • cssPrefix

      protected String cssPrefix
      CSS class naming for -clean option.
    • replacementCharEncoding

      protected String replacementCharEncoding
      char encoding used when replacing illegal SGML chars, regardless of specified encoding.
    • tt

      protected TagTable tt
      TagTable associated with this Configuration.
    • report

      protected Report report
      Report instance. Used for messages.
    • definedTags

      protected int definedTags
      track what types of tags user has defined to eliminate unnecessary searches.
    • newline

      protected char[] newline
      bytes for the newline marker.
    • rawOut

      protected boolean rawOut
      Avoid mapping values > 127 to entities.
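The "To do" notes on DOCTYPE_OMIT and KEEP_LAST suggest replacing the int constants with enumerations named DocTypeMode and DupAttrMode. The following is a minimal sketch of what such enumerations might look like, not part of this class: the wrapper class DocTypeSketch, both helper methods, and the keyword strings accepted by parseDocType are assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;

class DocTypeSketch {

    // Hypothetical enum mirroring the DOCTYPE_* int constants.
    enum DocTypeMode { OMIT, AUTO, STRICT, LOOSE, USER }

    // Hypothetical enum mirroring KEEP_FIRST and KEEP_LAST.
    enum DupAttrMode { KEEP_FIRST, KEEP_LAST }

    // Maps a doctype option value to a mode. The keywords are an assumption;
    // any other value is treated as a user specified doctype string.
    static DocTypeMode parseDocType(String value) {
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "omit":   return DocTypeMode.OMIT;
            case "auto":   return DocTypeMode.AUTO;
            case "strict": return DocTypeMode.STRICT;
            case "loose":  return DocTypeMode.LOOSE;
            default:       return DocTypeMode.USER;
        }
    }

    // Picks which of several duplicate attribute values survives.
    // The list must be non-empty and in document order.
    static String pick(DupAttrMode mode, List<String> values) {
        return mode == DupAttrMode.KEEP_FIRST
            ? values.get(0)
            : values.get(values.size() - 1);
    }
}
```

With enumerations, fields such as docTypeMode and duplicateAttrs could be typed so that the compiler rejects an out-of-range value, which plain int constants cannot guarantee.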
  • Constructor Details

    • Configuration

      protected Configuration(Report report)
      Instantiates a new Configuration. This constructor should be called by Tidy only.
      Parameters:
      report - Report instance
  • Method Details

    • addProps

      public void addProps(Properties p)
      adds configuration Properties.
      Parameters:
      p - Properties
    • parseFile

      public void parseFile(String filename)
      Parses a property file.
      Parameters:
      filename - file name
    • isKnownOption

      public static boolean isKnownOption(String name)
      Is the given String a valid configuration flag?
      Parameters:
      name - configuration parameter name
      Returns:
      true if the given String is a valid config option
    • adjust

      public void adjust()
      Ensure that config is self consistent.
    • printConfigOptions

      public void printConfigOptions(Writer errout, boolean showActualConfiguration)
      prints available configuration options.
      Parameters:
      errout - where to write
      showActualConfiguration - print actual configuration values
    • getInCharEncodingName

      protected String getInCharEncodingName()
      Getter for inCharEncodingName.
      Returns:
      Returns the inCharEncodingName.
    • setInCharEncodingName

      protected void setInCharEncodingName(String encoding)
      Setter for inCharEncodingName.
      Parameters:
      encoding - The inCharEncodingName to set.
    • getOutCharEncodingName

      protected String getOutCharEncodingName()
      Getter for outCharEncodingName.
      Returns:
      Returns the outCharEncodingName.
    • setOutCharEncodingName

      protected void setOutCharEncodingName(String encoding)
      Setter for outCharEncodingName.
      Parameters:
      encoding - The outCharEncodingName to set.
    • setInOutEncodingName

      protected void setInOutEncodingName(String encoding)
      Setter for inOutCharEncodingName.
      Parameters:
      encoding - The CharEncodingName to set.
    • setOutCharEncoding

      protected void setOutCharEncoding(int encoding)
      Deprecated.
      use setOutCharEncodingName(String)
      Setter for outCharEncoding.
      Parameters:
      encoding - The outCharEncoding to set.
    • setInCharEncoding

      protected void setInCharEncoding(int encoding)
      Deprecated.
      use setInCharEncodingName(String)
      Setter for inCharEncoding.
      Parameters:
      encoding - The inCharEncoding to set.
    • convertCharEncoding

      protected String convertCharEncoding(int code)
      Convert a char encoding from the deprecated tidy constant to a standard java encoding name.
      Parameters:
      code - encoding code
      Returns:
      encoding name
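The idea behind convertCharEncoding, translating a deprecated encoding constant into a standard Java encoding name, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual mapping: the enum LegacyEncoding stands in for the deprecated int constants (whose numeric values are not given here), only a subset of them is covered, and the charset names are the ones every Java platform is required to support rather than the names JTidy returns.

```java
import java.nio.charset.Charset;

class EncodingNameSketch {

    // Hypothetical stand-in for a subset of the deprecated int constants.
    // Each entry carries a charset name that every Java platform supports.
    enum LegacyEncoding {
        ASCII("US-ASCII"),
        LATIN1("ISO-8859-1"),
        UTF8("UTF-8"),
        UTF16LE("UTF-16LE"),
        UTF16BE("UTF-16BE"),
        UTF16("UTF-16");

        final String javaName;

        LegacyEncoding(String javaName) {
            this.javaName = javaName;
        }
    }

    // Returns the Java charset name for a legacy encoding constant.
    static String toJavaName(LegacyEncoding code) {
        return code.javaName;
    }

    public static void main(String[] args) {
        for (LegacyEncoding code : LegacyEncoding.values()) {
            String name = toJavaName(code);
            // Charset.forName resolves the name to a usable Charset.
            System.out.println(code + " -> " + Charset.forName(name));
        }
    }
}
```

New code has no need for such a translation step, since setInCharEncodingName(String) and setOutCharEncodingName(String) take the encoding name directly.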