Package org.w3c.tidy

Class Node

java.lang.Object
org.w3c.tidy.Node

public class Node extends Object
Represents element and text nodes in the parse tree. For text nodes the element name is null. start and end are offsets into the lexer buffer (lexbuf), which holds the textual content of all elements in the parse tree. The parent and content links allow traversal of the parse tree in any direction. Attributes are represented as a linked list of AttVal nodes, which hold the strings for attribute/value pairs.
Version:
$Revision: 1107 $ ($Author: aditsu $)
Author:
Dave Raggett dsr@w3.org , Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina
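The linking scheme described in the class summary can be pictured with a small, self-contained sketch. SketchNode and ParseTreeSketch are hypothetical stand-ins, not JTidy classes. The sketch also assumes that last points at the last child and that end is an exclusive offset, neither of which this page states.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the documented Node; not the JTidy implementation.
class SketchNode {
    SketchNode parent, prev, next, content, last;
    final String element;   // null for text nodes
    final byte[] textarray; // buffer holding the text (assumed shared between nodes)
    final int start, end;   // span into textarray; end is assumed to be exclusive

    SketchNode(String element, byte[] textarray, int start, int end) {
        this.element = element;
        this.textarray = textarray;
        this.start = start;
        this.end = end;
    }

    // Append a child, keeping parent, sibling and first/last links consistent.
    void append(SketchNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Concatenate the text of all text nodes below this node, in document order.
    String collectText() {
        StringBuilder sb = new StringBuilder();
        for (SketchNode c = content; c != null; c = c.next) {
            if (c.element == null) {
                sb.append(new String(c.textarray, c.start, c.end - c.start, StandardCharsets.UTF_8));
            } else {
                sb.append(c.collectText());
            }
        }
        return sb.toString();
    }
}

class ParseTreeSketch {
    // Builds the tree for <p>hello <em>world</em></p> over one shared buffer.
    static SketchNode build() {
        byte[] buf = "hello world".getBytes(StandardCharsets.UTF_8);
        SketchNode p = new SketchNode("p", null, 0, 0);
        SketchNode em = new SketchNode("em", null, 0, 0);
        p.append(new SketchNode(null, buf, 0, 6));   // "hello "
        p.append(em);
        em.append(new SketchNode(null, buf, 6, 11)); // "world"
        return p;
    }

    public static void main(String[] args) {
        System.out.println(build().collectText());
    }
}
```

Text nodes carry only a span into the buffer, so the tree itself stores no copies of the character data.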
  • Field Details

    • ROOT_NODE

      public static final short ROOT_NODE
      node type: root.
    • DOCTYPE_TAG

      public static final short DOCTYPE_TAG
      node type: doctype.
    • COMMENT_TAG

      public static final short COMMENT_TAG
      node type: comment.
    • PROC_INS_TAG

      public static final short PROC_INS_TAG
      node type: processing instruction.
    • TEXT_NODE

      public static final short TEXT_NODE
      node type: text.
    • START_TAG

      public static final short START_TAG
      Start tag.
    • END_TAG

      public static final short END_TAG
      End tag.
    • START_END_TAG

      public static final short START_END_TAG
      Start-end tag (an empty element written in the <br/> style).
    • CDATA_TAG

      public static final short CDATA_TAG
      node type: CDATA.
    • SECTION_TAG

      public static final short SECTION_TAG
      node type: section tag.
    • ASP_TAG

      public static final short ASP_TAG
      node type: asp tag.
    • JSTE_TAG

      public static final short JSTE_TAG
      node type: jste tag.
    • PHP_TAG

      public static final short PHP_TAG
      node type: php tag.
    • XML_DECL

      public static final short XML_DECL
      node type: XML declaration.
    • parent

      protected Node parent
      parent node.
    • prev

      protected Node prev
      previous node.
    • next

      protected Node next
      next node.
    • last

      protected Node last
      last node.
    • start

      protected int start
      start of span in the text array.
    • end

      protected int end
      end of span in the text array.
    • textarray

      protected byte[] textarray
      the text array.
    • type

      protected short type
      node type: TEXT_NODE, START_TAG, END_TAG, etc.
    • closed

      protected boolean closed
      true if closed by explicit end tag.
    • implicit

      protected boolean implicit
      true if inferred.
    • linebreak

      protected boolean linebreak
      true if followed by a line break.
    • was

      protected Dict was
      old tag when it was changed.
    • tag

      protected Dict tag
      tag's dictionary definition.
    • element

      protected String element
      Tag name.
    • attributes

      protected AttVal attributes
      Attribute/Value linked list.
    • content

      protected Node content
      Contained node.
    • adapter

      protected Node adapter
      DOM adapter.
  • Constructor Details

    • Node

      public Node()
      Instantiates a new text node.
    • Node

      public Node(short type, byte[] textarray, int start, int end)
      Instantiates a new node.
      Parameters:
      type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
      textarray - array of bytes contained in the Node
      start - start position
      end - end position
    • Node

      public Node(short type, byte[] textarray, int start, int end, String element, TagTable tt)
      Instantiates a new node.
      Parameters:
      type - node type: Node.ROOT_NODE | Node.DOCTYPE_TAG | Node.COMMENT_TAG | Node.PROC_INS_TAG | Node.TEXT_NODE | Node.START_TAG | Node.END_TAG | Node.START_END_TAG | Node.CDATA_TAG | Node.SECTION_TAG | Node.ASP_TAG | Node.JSTE_TAG | Node.PHP_TAG | Node.XML_DECL
      textarray - array of bytes contained in the Node
      start - start position
      end - end position
      element - tag name
      tt - tag table instance
  • Method Details

    • getAttrByName

      public AttVal getAttrByName(String name)
      Returns an attribute with the given name in the current node.
      Parameters:
      name - attribute name.
      Returns:
      AttVal instance or null if no attribute with the given name is found
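Since attributes are held as a linked list, a lookup by name amounts to a walk from the head of the list. The following is a minimal sketch of that idea. AttrSketch and AttrLookupSketch are hypothetical names, and case-insensitive matching is an assumption of the sketch rather than documented behavior.

```java
// Hypothetical stand-in for AttVal: one attribute/value pair in a singly linked list.
class AttrSketch {
    final String attribute;
    final String value;
    AttrSketch next;

    AttrSketch(String attribute, String value, AttrSketch next) {
        this.attribute = attribute;
        this.value = value;
        this.next = next;
    }
}

class AttrLookupSketch {
    // Walk the list from its head and return the first match, or null if none.
    // Case-insensitive comparison is an assumption made for this sketch.
    static AttrSketch find(AttrSketch head, String name) {
        for (AttrSketch a = head; a != null; a = a.next) {
            if (a.attribute != null && a.attribute.equalsIgnoreCase(name)) {
                return a;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        AttrSketch head = new AttrSketch("href", "a.html", new AttrSketch("class", "nav", null));
        AttrSketch hit = find(head, "class");
        System.out.println(hit == null ? "not found" : hit.value);
    }
}
```

Callers need a null check on the result, as the Returns clause above indicates.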
    • checkAttributes

      public void checkAttributes(Lexer lexer)
      Default method for checking an element's attributes.
      Parameters:
      lexer - Lexer
    • repairDuplicateAttributes

      public void repairDuplicateAttributes(Lexer lexer)
      The same attribute name can't be used more than once in each element. Discard or join attributes according to configuration.
      Parameters:
      lexer - Lexer
    • addAttribute

      public void addAttribute(String name, String value)
      Adds an attribute to the node.
      Parameters:
      name - attribute name
      value - attribute value
    • removeAttribute

      public void removeAttribute(AttVal attr)
      Remove an attribute from node and then free it.
      Parameters:
      attr - attribute to remove
    • findDocType

      public Node findDocType()
      Find the doctype element.
      Returns:
      doctype node or null if not found
    • discardDocType

      public void discardDocType()
      Discard the doctype node.
    • discardElement

      public static Node discardElement(Node element)
      Remove node from markup tree and discard it.
      Parameters:
      element - discarded node
      Returns:
      next node
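Removing a node from a doubly linked sibling list means repairing up to four links: the neighbors' prev and next, and the parent's first and last child. The sketch below illustrates this with a hypothetical UnlinkNode class. It is not the JTidy code, and it assumes that content is the first child and last is the last child.

```java
// Hypothetical stand-in for the documented Node; only the link fields are modeled.
class UnlinkNode {
    UnlinkNode parent, prev, next, content, last;
    final String name;

    UnlinkNode(String name) {
        this.name = name;
    }

    // Append a child at the end of this node's child list.
    void append(UnlinkNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Unlink element from its parent and siblings; return its former next sibling.
    static UnlinkNode discard(UnlinkNode element) {
        if (element == null) {
            return null;
        }
        UnlinkNode following = element.next;
        if (element.prev != null) {
            element.prev.next = element.next;
        } else if (element.parent != null) {
            element.parent.content = element.next; // element was the first child
        }
        if (element.next != null) {
            element.next.prev = element.prev;
        } else if (element.parent != null) {
            element.parent.last = element.prev;    // element was the last child
        }
        element.parent = null;
        element.prev = null;
        element.next = null;
        return following;
    }
}
```

Returning the following sibling lets a caller keep iterating over the child list while discarding nodes.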
    • insertNodeAtStart

      public void insertNodeAtStart(Node node)
      Insert a node into markup tree as the first child of this node.
      Parameters:
      node - to insert
    • insertNodeAtEnd

      public void insertNodeAtEnd(Node node)
      Insert a node into markup tree as the last child of this node.
      Parameters:
      node - Node to insert
    • insertNodeAsParent

      public static void insertNodeAsParent(Node element, Node node)
      Insert node into markup tree in place of element, which is moved to become the child of node.
      Parameters:
      element - child node. Will be inserted as a child of node
      node - parent node
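The operation can be read as "wrap element in node": node takes over element's position among its siblings, and element becomes node's only child. A minimal sketch of that relinking follows. WrapNode is a hypothetical class, and the sketch assumes node starts out empty and detached.

```java
// Hypothetical stand-in for the documented Node; only the link fields are modeled.
class WrapNode {
    WrapNode parent, prev, next, content, last;
    final String name;

    WrapNode(String name) {
        this.name = name;
    }

    // Append a child at the end of this node's child list.
    void append(WrapNode child) {
        child.parent = this;
        child.prev = last;
        child.next = null;
        if (last != null) {
            last.next = child;
        } else {
            content = child;
        }
        last = child;
    }

    // Put node where element was, then make element the sole child of node.
    // Assumes node has no children and is not yet linked into a tree.
    static void insertAsParent(WrapNode element, WrapNode node) {
        node.parent = element.parent;
        node.prev = element.prev;
        node.next = element.next;
        if (node.parent != null) {
            if (node.parent.content == element) {
                node.parent.content = node;
            }
            if (node.parent.last == element) {
                node.parent.last = node;
            }
        }
        if (node.prev != null) {
            node.prev.next = node;
        }
        if (node.next != null) {
            node.next.prev = node;
        }
        node.content = element;
        node.last = element;
        element.parent = node;
        element.prev = null;
        element.next = null;
    }
}
```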
    • insertNodeBeforeElement

      public static void insertNodeBeforeElement(Node element, Node node)
      Insert node into markup tree before element.
      Parameters:
      element - reference node; node is inserted before it
      node - node to insert
    • insertNodeAfterElement

      public void insertNodeAfterElement(Node node)
      Insert node into markup tree after this element.
      Parameters:
      node - new node to insert
    • trimEmptyElement

      public static void trimEmptyElement(Lexer lexer, Node element)
      Trim an empty element.
      Parameters:
      lexer - Lexer
      element - empty node to be removed
    • trimTrailingSpace

      public static void trimTrailingSpace(Lexer lexer, Node element, Node last)
      This maps <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>. If the last child of element is a text node, then trim the trailing white space character, moving it to after element's end tag.
      Parameters:
      lexer - Lexer
      element - node
      last - last child of element
    • escapeTag

      protected static Node escapeTag(Lexer lexer, Node element)
      Escapes the given tag.
      Parameters:
      lexer - Lexer
      element - node to be escaped
      Returns:
      escaped node
    • isBlank

      public boolean isBlank(Lexer lexer)
      Is the node content empty or blank? Assumes node is a text node.
      Parameters:
      lexer - Lexer
      Returns:
      true if the node content is empty or blank
    • trimInitialSpace

      public static void trimInitialSpace(Lexer lexer, Node element, Node text)
      This maps <p>hello<em> world</em> to <p>hello <em>world</em>. Trims initial space by moving it before the start tag or, if this element is the first in its parent's content, by discarding the space.
      Parameters:
      lexer - Lexer
      element - parent node
      text - text node
    • trimSpaces

      public static void trimSpaces(Lexer lexer, Node element)
      Move initial and trailing space out. This routine maps hello<em> world</em> to hello <em>world</em>, and <em>hello </em><strong>world</strong> to <em>hello</em> <strong>world</strong>.
      Parameters:
      lexer - Lexer
      element - Node
    • isDescendantOf

      public boolean isDescendantOf(Dict tag)
      Is this node contained in a given tag?
      Parameters:
      tag - dictionary definition of the candidate ancestor tag
      Returns:
      true if node is contained in tag
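A containment check of this kind can be sketched as a walk up the parent chain. The example below uses a hypothetical AncestorNode class and compares plain tag names instead of Dict entries. Starting the walk at the parent, so that a node is not counted as its own descendant, is an assumption of the sketch.

```java
// Hypothetical stand-in for the documented Node; tags are modeled as plain names.
class AncestorNode {
    final String tagName;      // null for text nodes
    final AncestorNode parent; // null for the root

    AncestorNode(String tagName, AncestorNode parent) {
        this.tagName = tagName;
        this.parent = parent;
    }

    // Walk up from the parent; true as soon as an ancestor carries the given name.
    // The name argument must not be null.
    boolean isDescendantOf(String name) {
        for (AncestorNode n = parent; n != null; n = n.parent) {
            if (name.equals(n.tagName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AncestorNode table = new AncestorNode("table", null);
        AncestorNode tr = new AncestorNode("tr", table);
        AncestorNode td = new AncestorNode("td", tr);
        AncestorNode text = new AncestorNode(null, td);
        System.out.println(text.isDescendantOf("table"));
    }
}
```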
    • insertDocType

      public static void insertDocType(Lexer lexer, Node element, Node doctype)
      The doctype has been found after other tags, and needs moving to before the html element.
      Parameters:
      lexer - Lexer
      element - document
      doctype - doctype node to insert at the beginning of element
    • findBody

      public Node findBody(TagTable tt)
      Find the body node.
      Parameters:
      tt - tag table
      Returns:
      body node
    • isElement

      public boolean isElement()
      Is the node an element?
      Returns:
      true if type is START_TAG or START_END_TAG
    • moveBeforeTable

      public static void moveBeforeTable(Node row, Node node, TagTable tt)
      Unexpected content in table row is moved to just before the table in accordance with Netscape and IE. This code assumes that node hasn't been inserted into the row.
      Parameters:
      row - Row node
      node - Node which should be moved before the table
      tt - tag table
    • fixEmptyRow

      public static void fixEmptyRow(Lexer lexer, Node row)
      If a table row is empty then insert an empty cell. This practice is consistent with browser behavior and avoids potential problems with row spanning cells.
      Parameters:
      lexer - Lexer
      row - row node
    • coerceNode

      public static void coerceNode(Lexer lexer, Node node, Dict tag)
      Coerce a node.
      Parameters:
      lexer - Lexer
      node - Node
      tag - tag dictionary reference
    • removeNode

      public void removeNode()
      Extract this node and its children from a markup tree.
    • insertMisc

      public static boolean insertMisc(Node element, Node node)
      Insert a node at the end.
      Parameters:
      element - parent node
      node - will be inserted at the end of element
      Returns:
      true if the node has been inserted
    • isNewNode

      public boolean isNewNode()
      Is this a new (user defined) node? Used to determine how attributes without values should be printed. This was introduced to deal with user defined tags e.g. Cold Fusion.
      Returns:
      true if this node represents a user-defined tag.
    • hasOneChild

      public boolean hasOneChild()
      Does the node have one (and only one) child?
      Returns:
      true if the node has one child
    • findHTML

      public Node findHTML(TagTable tt)
      Find the "html" element.
      Parameters:
      tt - tag table
      Returns:
      html node
    • findHEAD

      public Node findHEAD(TagTable tt)
      Find the head tag.
      Parameters:
      tt - tag table
      Returns:
      head node
    • findTITLE

      public Node findTITLE(TagTable tt)
      Find the title tag.
      Parameters:
      tt - tag table
      Returns:
      title node
    • checkNodeIntegrity

      public boolean checkNodeIntegrity()
      Checks for node integrity.
      Returns:
      false if node is not consistent
    • addClass

      public void addClass(String classname)
      Add a css class to the node. If a class attribute already exists adds the value to the existing attribute.
      Parameters:
      classname - css class name
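The merge rule described here (create the attribute if it is missing, otherwise extend its value) can be sketched as follows. ClassAttrSketch is a hypothetical helper that keeps attributes in a Map rather than an AttVal list, and joining the values with a single space is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; attributes are modeled as a name-to-value map for brevity.
class ClassAttrSketch {
    // Add classname to the "class" attribute, creating the attribute if needed.
    // Separating multiple class names with one space is an assumption.
    static void addClass(Map<String, String> attributes, String classname) {
        String existing = attributes.get("class");
        if (existing == null || existing.isEmpty()) {
            attributes.put("class", classname);
        } else {
            attributes.put("class", existing + " " + classname);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        addClass(attributes, "c1");
        addClass(attributes, "c2");
        System.out.println(attributes.get("class"));
    }
}
```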
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getAdapter

      protected Node getAdapter()
      Returns a DOM Node which wraps the current tidy Node.
      Returns:
      org.w3c.dom.Node instance
    • cloneNode

      protected Node cloneNode(boolean deep)
      Clone this node.
      Parameters:
      deep - if true deep clone the node (also clones all the contained nodes)
      Returns:
      cloned node
    • setType

      protected void setType(short newType)
      Setter for node type.
      Parameters:
      newType - a valid node type constant
    • isJavaScript

      public boolean isJavaScript()
      Used to check script node for script language.
      Returns:
      true if the script node contains javascript
    • expectsContent

      public boolean expectsContent()
      Does the node expect contents?
      Returns:
      false if this node should be empty