public class Clean
extends java.lang.Object
...
....
Such rules are applied to the element's content and then to the element itself until none of the rules apply any more. Once all the rules have been applied, the element will have a style attribute with one or more properties. Other rules strip the element they apply to, replacing it with style properties on its content, e.g. ...
... These rules are applied to an element before its content is processed, and they replace the current element with the first element of the exposed content. After both sets of rules have been applied, you can replace the style attribute with a class value and a style rule in the document head. To support this, an association of styles and class names is built. A naive approach is to rely on string matching to test whether two property lists are the same. A better approach is to sort the properties first, so that lists that differ only in the order of their properties still match.
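The sorted-matching idea can be sketched as follows. Each property list is parsed, sorted by property name and re-serialized, so lists that differ only in order map to the same class. This is an illustration, not JTidy's code. JTidy keeps a sorted linked list (StyleProp), whereas this sketch swaps in a `TreeMap`. `StyleClassRegistry`, `canonical()`, `classFor()` and the `c1`, `c2`, ... naming scheme are hypothetical. The key includes the tag name, mirroring the `tag` parameter of findStyle().

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: associates (tag, sorted property list) with a generated class.
// StyleClassRegistry and the "c1", "c2", ... scheme are hypothetical.
final class StyleClassRegistry {
    // Insertion-ordered so the emitted style sheet follows first use.
    private final Map<String, String> classByKey = new LinkedHashMap<>();
    private int classNum = 0; // sequential number for generated classes

    // Parse "name: value; name: value", sort by property name, re-serialize.
    static String canonical(String style) {
        TreeMap<String, String> props = new TreeMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed declarations
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            props.put(name, decl.substring(colon + 1).trim()); // a later duplicate wins
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append(": ").append(e.getValue());
        }
        return sb.toString();
    }

    // The same tag and the same (sorted) properties give the same class name.
    String classFor(String tag, String style) {
        String key = tag + "|" + canonical(style);
        return classByKey.computeIfAbsent(key, k -> "c" + (++classNum));
    }

    // One rule per line, e.g. "p.c1 {color: red; font-weight: bold}",
    // suitable for a style element in the document head.
    String styleSheet() {
        StringBuilder css = new StringBuilder();
        for (Map.Entry<String, String> e : classByKey.entrySet()) {
            String[] parts = e.getKey().split("\\|", 2);
            css.append(parts[0]).append('.').append(e.getValue())
               .append(" {").append(parts[1]).append("}\n");
        }
        return css.toString();
    }
}
```

For instance, `classFor("p", "font-weight: bold; color: red")` and `classFor("p", "color: red;font-weight:bold")` both return `c1`. `styleSheet()` then contains `p.c1 {color: red; font-weight: bold}`.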
| Modifier and Type | Field | Description |
|---|---|---|
| private int | classNum | Sequential number for generated CSS classes. |
| private TagTable | tt | Tag table. |

| Constructor | Description |
|---|---|
| Clean(TagTable tagTable) | Instantiates a new Clean. |

| Modifier and Type | Method | Description |
|---|---|---|
| private void | addAlign(Node node, java.lang.String align) | Adds an align style. |
| private void | addColorRule(Lexer lexer, java.lang.String selector, java.lang.String color) | Adds a CSS rule for color. |
| private void | addFontColor(Node node, java.lang.String color) | Adds a font color style. |
| private void | addFontFace(Node node, java.lang.String face) | Adds a font-family style. |
| private void | addFontSize(Node node, java.lang.String size) | Adds a font size style. |
| private void | addFontStyles(Node node, AttVal av) | Add style properties to node corresponding to the font face, size and color attributes. |
| private java.lang.String | addProperty(java.lang.String style, java.lang.String property) | Creates a string with merged properties. |
| private void | addStyleProperty(Node node, java.lang.String property) | Add a style property to the element, creating the style attribute as needed and adding the `;` delimiter. |
| private boolean | blockStyle(Lexer lexer, Node node) | Symptom: the only child of a block-level element is a presentation element such as B, I or FONT. |
| void | bQ2Div(Node node) | Replace implicit blockquote by div with an indent, taking care to reduce nested blockquotes to a single div with the indent set to match the nesting depth. |
| (package private) static void | bumpObject(Lexer lexer, Node html) | Where appropriate, move object elements from head to body. |
| private boolean | center2Div(Lexer lexer, Node node, Node[] pnode) | Symptom: `<center>`. |
| private void | cleanBodyAttrs(Lexer lexer, Node body) | Move presentation attributes from body to the style element. |
| private Node | cleanNode(Lexer lexer, Node node) | Applies all matching rules to a node. |
| void | cleanTree(Lexer lexer, Node doc) | Clean an HTML tree. |
| void | cleanWord2000(Lexer lexer, Node node) | This is a major clean up to strip out all the extra stuff you get when you save as web page from Word 2000. |
| private StyleProp | createProps(StyleProp prop, java.lang.String style) | Create a sorted linked list of properties from a style string. |
| private java.lang.String | createPropString(StyleProp props) | Create a CSS property string. |
| private void | createStyleElement(Lexer lexer, Node doc) | Create the style element using rules from the dictionary. |
| private Node | createStyleProperties(Lexer lexer, Node node, Node[] prepl) | Special case: if the current node is destroyed by cleanNode() lower in the tree, this node and its parent no longer exist. |
| private void | defineStyleRules(Lexer lexer, Node node) | Find the style attribute in node content and replace it by the corresponding class attribute. |
| private boolean | dir2Div(Lexer lexer, Node node) | Symptom: `<dir><li>` where `<li>` is the only child. |
| private void | discardContainer(Node element, Node[] pnode) | Used to strip font start and end tags. |
| void | dropSections(Lexer lexer, Node node) | Drop if/endif sections inserted by Word 2000. |
| void | emFromI(Node node) | Replace i by em and b by strong. |
| (package private) Node | findEnclosingCell(Node node) | Find the enclosing table cell for the given node. |
| private java.lang.String | findStyle(Lexer lexer, java.lang.String tag, java.lang.String properties) | Finds a CSS style. |
| private void | fixNodeLinks(Node node) | Ensure bidirectional links are consistent. |
| private boolean | font2Span(Lexer lexer, Node node, Node[] pnode) | Replace font elements by span elements, deleting the font element's attributes and replacing them by a single style attribute. |
| private java.lang.String | fontSize2Name(java.lang.String size) | Map a % font size to a named font size. |
| private java.lang.String | gensymClass(Lexer lexer) | Generates a new CSS class name. |
| private boolean | inlineStyle(Lexer lexer, Node node, Node[] pnode) | If the node has only one b, i, or font child, remove the child node and add the appropriate style attributes to the parent. |
| private StyleProp | insertProperty(StyleProp props, java.lang.String name, java.lang.String value) | Insert a CSS style property. |
| boolean | isWord2000(Node root) | Check if the current document is a converted Word document. |
| void | list2BQ(Node node) | Some people use dir or ul without an li to indent the content. |
| private void | mergeClasses(Node node, Node child) | Merge class attributes from 2 nodes. |
| private boolean | mergeDivs(Lexer lexer, Node node) | Symptom: `<div><div>...</div></div>`. Action: merge the two divs. |
| private java.lang.String | mergeProperties(java.lang.String s1, java.lang.String s2) | Create a new string that consists of the combined style properties in s1 and s2. |
| private void | mergeStyles(Node node, Node child) | Merge styles from 2 nodes. |
| void | nestedEmphasis(Node node) | Simplifies ... |
| private boolean | nestedList(Lexer lexer, Node node, Node[] pnode) | Symptom: ... |
| private boolean | niceBody(Lexer lexer, Node doc) | Check for deprecated attributes in the body tag. |
| (package private) boolean | noMargins(Node node) | Used to hunt for hidden preformatted sections. |
| private void | normalizeSpaces(Lexer lexer, Node node) | Map non-breaking spaces to regular spaces. |
| Node | pruneSection(Lexer lexer, Node node) | Node is `<![if ...]>`: prune up to `<![endif]>`. |
| void | purgeWord2000Attributes(Node node) | Remove Word 2000 attributes from node. |
| (package private) boolean | singleSpace(Lexer lexer, Node node) | Does the element have a single space as its content? |
| private void | stripOnlyChild(Node node) | Used to strip the child of a node when the node has one and only one child. |
| Node | stripSpan(Lexer lexer, Node span) | Word 2000 uses span excessively, so we strip span out. |
| private void | style2Rule(Lexer lexer, Node node) | Find the style attribute in node and replace it by the corresponding class attribute. |
| private void | tableBgColor(Node node) | |
| private void | textAlign(Lexer lexer, Node node) | Symptom: `<p align=center>`. |

private int classNum

private TagTable tt

public Clean(TagTable tagTable)
Parameters:
tagTable - tag table instance

private StyleProp insertProperty(StyleProp props, java.lang.String name, java.lang.String value)
Parameters:
props - StyleProp instance
name - property name
value - property value

private StyleProp createProps(StyleProp prop, java.lang.String style)
Parameters:
prop - StyleProp
style - style string

private java.lang.String createPropString(StyleProp props)
Parameters:
props - StyleProp

private java.lang.String addProperty(java.lang.String style, java.lang.String property)
Parameters:
style - CSS style
property - CSS properties

private java.lang.String gensymClass(Lexer lexer)
Parameters:
lexer - Lexer

private java.lang.String findStyle(Lexer lexer, java.lang.String tag, java.lang.String properties)
Parameters:
lexer - Lexer
tag - tag name
properties - CSS properties

private void style2Rule(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - node with a style attribute

private void addColorRule(Lexer lexer, java.lang.String selector, java.lang.String color)
Parameters:
lexer - Lexer
selector - CSS selector
color - color value

private void cleanBodyAttrs(Lexer lexer, Node body)
background="foo" → body { background-image: url(foo) }
bgcolor="foo" → body { background-color: foo }
text="foo" → body { color: foo }
link="foo" → :link { color: foo }
vlink="foo" → :visited { color: foo }
alink="foo" → :active { color: foo }
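Below is a minimal sketch of the mapping above. It operates on a plain map of body attributes and returns the CSS text. It assumes that declarations for the same selector are combined into one rule. `BodyAttrsToCss` and `toCss()` are hypothetical names. JTidy's cleanBodyAttrs() works on Lexer and Node objects instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: maps deprecated body attributes to CSS rule text.
// BodyAttrsToCss is hypothetical; JTidy's cleanBodyAttrs() uses Lexer/Node.
final class BodyAttrsToCss {
    static String toCss(Map<String, String> bodyAttrs) {
        // Declarations grouped by selector, in first-seen order.
        Map<String, StringBuilder> rules = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bodyAttrs.entrySet()) {
            String v = e.getValue();
            switch (e.getKey().toLowerCase(Locale.ROOT)) {
                case "background": add(rules, "body", "background-image: url(" + v + ")"); break;
                case "bgcolor":    add(rules, "body", "background-color: " + v); break;
                case "text":       add(rules, "body", "color: " + v); break;
                case "link":       add(rules, ":link", "color: " + v); break;
                case "vlink":      add(rules, ":visited", "color: " + v); break;
                case "alink":      add(rules, ":active", "color: " + v); break;
                default:           break; // other attributes are left alone
            }
        }
        StringBuilder css = new StringBuilder();
        for (Map.Entry<String, StringBuilder> r : rules.entrySet()) {
            css.append(r.getKey()).append(" { ").append(r.getValue()).append(" }\n");
        }
        return css.toString();
    }

    // Append one declaration to the rule for selector, using "; " between them.
    private static void add(Map<String, StringBuilder> rules, String selector, String decl) {
        StringBuilder sb = rules.computeIfAbsent(selector, k -> new StringBuilder());
        if (sb.length() > 0) sb.append("; ");
        sb.append(decl);
    }
}
```

Suppose the input is `bgcolor="white"`, `text="black"` and `link="blue"`, in that order. `toCss` then returns `body { background-color: white; color: black }` followed by `:link { color: blue }`.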
Parameters:
lexer - Lexer
body - body node

private boolean niceBody(Lexer lexer, Node doc)
Parameters:
lexer - Lexer
doc - document root node
Returns:
true if the body doesn't contain deprecated attributes, false otherwise.

private void createStyleElement(Lexer lexer, Node doc)
Parameters:
lexer - Lexer
doc - root node

private void fixNodeLinks(Node node)
Parameters:
node - root node

private void stripOnlyChild(Node node)
Parameters:
node - parent node

private void discardContainer(Node element, Node[] pnode)
Parameters:
element - original node
pnode - passed in as an array to allow modification; pnode[0] will contain the final node

private void addStyleProperty(Node node, java.lang.String property)
Parameters:
node - node
property - property added to node

private java.lang.String mergeProperties(java.lang.String s1, java.lang.String s2)
Parameters:
s1 - first property
s2 - second property

private void mergeClasses(Node node, Node child)
Parameters:
node - Node
child - child node

private void mergeStyles(Node node, Node child)
Parameters:
node - Node
child - child node

private java.lang.String fontSize2Name(java.lang.String size)
Parameters:
size - size in %

private void addFontFace(Node node, java.lang.String face)
Parameters:
node - Node
face - font face

private void addFontSize(Node node, java.lang.String size)
Parameters:
node - Node
size - font size

private void addFontColor(Node node, java.lang.String color)
Parameters:
node - Node
color - color value

private void addAlign(Node node, java.lang.String align)
Parameters:
node - Node
align - align value

private void addFontStyles(Node node, AttVal av)
Parameters:
node - font tag
av - attribute list for node

private void textAlign(Lexer lexer, Node node)
Symptom: `<p align=center>`. Action: `<p style="text-align: center">`.
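The following sketches the textAlign action together with the delimiter handling that the method summary attributes to addStyleProperty(). The style attribute is created if it is missing. Otherwise the new property is appended after `; `. A plain `Map` stands in for JTidy's Node and AttVal types, and `AlignToStyle` is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;

// Sketch only: a plain attribute map stands in for JTidy's Node/AttVal types,
// and AlignToStyle is a hypothetical name.
final class AlignToStyle {
    // Create the style attribute if absent, otherwise append with "; ".
    static void addStyleProperty(Map<String, String> attrs, String property) {
        String style = attrs.get("style");
        if (style == null || style.trim().isEmpty()) {
            attrs.put("style", property);
            return;
        }
        String existing = style.trim();
        if (existing.endsWith(";")) { // avoid a doubled delimiter
            existing = existing.substring(0, existing.length() - 1).trim();
        }
        attrs.put("style", existing + "; " + property);
    }

    // <p align=center> becomes <p style="text-align: center">.
    static void textAlign(Map<String, String> attrs) {
        String align = attrs.remove("align");
        if (align != null) {
            addStyleProperty(attrs, "text-align: " + align.toLowerCase(Locale.ROOT));
        }
    }
}
```

Consider an element with `align="CENTER"` and no style: it ends up with only `style` set to `text-align: center`. An existing `color: red;` becomes `color: red; text-align: right` when `align="right"` is folded in.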
Parameters:
lexer - Lexer
node - node with the center attribute; it will be modified to use a CSS style.

private void tableBgColor(Node node)

private boolean dir2Div(Lexer lexer, Node node)
Symptom: `<dir><li>` where `<li>` is the only child. Action: coerce `<dir> <li>` to `<div>` with indent. The clean up rules use the pnode argument to return the next node when the original node has been deleted.
Parameters:
lexer - Lexer
node - dir tag
Returns:
true if a dir tag has been coerced to a div

private boolean center2Div(Lexer lexer, Node node, Node[] pnode)
Symptom: `<center>`. Action: replace `<center>` by `<div style="text-align: center">`.
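The center2Div action can be illustrated with the JDK's standard `org.w3c.dom` API on well-formed markup. This swaps in the W3C DOM for JTidy's own Node class. `CenterToDiv` is a hypothetical helper and, unlike the documented method, it does not report the next node through pnode.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch using the JDK's W3C DOM as a stand-in for JTidy's Node class.
// CenterToDiv is a hypothetical helper; input must be well-formed XML.
final class CenterToDiv {
    // Replace every <center> with <div style="text-align: center">, keeping its
    // children. Any attributes on <center> itself are not copied.
    static int replaceCenters(Document doc) {
        NodeList centers = doc.getElementsByTagName("center"); // live list
        int count = 0;
        while (centers.getLength() > 0) {
            Element center = (Element) centers.item(0);
            Element div = doc.createElement("div");
            div.setAttribute("style", "text-align: center");
            while (center.getFirstChild() != null) {
                div.appendChild(center.getFirstChild()); // moves, not copies
            }
            center.getParentNode().replaceChild(div, center);
            count++;
        }
        return count;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

`getElementsByTagName` returns a live list, so the loop always takes the first remaining match. Nested `<center>` elements are therefore handled on later iterations.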
Parameters:
lexer - Lexer
node - center tag
pnode - pnode[0] is the same as node, passed in as an array to allow modification
Returns:
true if a center tag has been replaced by a div

private boolean mergeDivs(Lexer lexer, Node node)
Symptom: `<div><div>...</div></div>`. Action: merge the two divs. This is useful after nested `<dir>`s used by Word for indenting have been converted to `<div>`s.
Parameters:
lexer - Lexer
node - first div

private boolean nestedList(Lexer lexer, Node node, Node[] pnode)
Parameters:
lexer - Lexer
node - Node
pnode - passed in as an array to allow modifications
Returns:
true if nested lists have been found and replaced

private boolean blockStyle(Lexer lexer, Node node)
Symptom: the only child of a block-level element is a presentation element such as B, I or FONT. For example,
`<p> <b><font face="Arial" size="6">Draft Recommended Practice</font></b> </p>`
becomes:
`<p style="font-weight: bold; font-family: Arial; font-size: 6"> Draft Recommended Practice </p>`
This code also replaces the align attribute by a style attribute. However, to avoid CSS problems with Navigator 4, this isn't done for the elements caption, tr and table.
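The example above can be reproduced with a small sketch that also follows the rule order from the class overview. Rules run on an element's content first, then repeatedly on the element itself until none applies. Everything here is hypothetical and is not JTidy's blockStyle() implementation: `SimpleNode`, `BlockStyleSketch`, the block-level tag list and the tag-to-property mapping. The font size is copied through unchanged only to match the example; a bare `6` is not a valid CSS length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified node model (not JTidy's Node/AttVal classes).
final class SimpleNode {
    final String tag;
    final Map<String, String> attrs = new LinkedHashMap<>();
    final List<SimpleNode> children = new ArrayList<>();
    String text; // content of "#text" nodes only

    SimpleNode(String tag, SimpleNode... kids) {
        this.tag = tag;
        children.addAll(Arrays.asList(kids));
    }

    static SimpleNode text(String s) {
        SimpleNode n = new SimpleNode("#text");
        n.text = s;
        return n;
    }
}

final class BlockStyleSketch {
    // Assumed list of block-level parents the rule applies to.
    private static final Set<String> BLOCKS =
        new HashSet<>(Arrays.asList("p", "div", "li", "td", "th", "h1", "h2", "h3"));
    private static final Set<String> PRESENTATION =
        new HashSet<>(Arrays.asList("b", "i", "font"));

    // Declarations implied by a presentation element (simplified mapping).
    static List<String> presentationProps(SimpleNode n) {
        List<String> props = new ArrayList<>();
        if (n.tag.equals("b")) props.add("font-weight: bold");
        if (n.tag.equals("i")) props.add("font-style: italic");
        if (n.tag.equals("font")) {
            if (n.attrs.containsKey("face")) props.add("font-family: " + n.attrs.get("face"));
            if (n.attrs.containsKey("size")) props.add("font-size: " + n.attrs.get("size"));
            if (n.attrs.containsKey("color")) props.add("color: " + n.attrs.get("color"));
        }
        return props;
    }

    // The rule: a block-level node whose only child is b, i or font absorbs
    // that child's styling and content. Returns true if the tree changed.
    static boolean foldOnlyChild(SimpleNode node) {
        if (!BLOCKS.contains(node.tag) || node.children.size() != 1) return false;
        SimpleNode child = node.children.get(0);
        if (!PRESENTATION.contains(child.tag)) return false;
        List<String> props = new ArrayList<>();
        if (node.attrs.containsKey("style")) props.add(node.attrs.get("style"));
        props.addAll(presentationProps(child));
        if (child.attrs.containsKey("style")) props.add(child.attrs.get("style"));
        node.attrs.put("style", String.join("; ", props));
        node.children.clear();
        node.children.addAll(child.children); // hoist the child's content
        return true;
    }

    // Content first, then the node itself, repeating until the rule stops firing.
    static void clean(SimpleNode node) {
        for (SimpleNode c : node.children) clean(c);
        while (foldOnlyChild(node)) {
            // each successful pass removes one level of nesting
        }
    }
}
```

The rule is retried on the same p after each fold. The nested b and font therefore collapse one level per iteration, giving `font-weight: bold; font-family: Arial; font-size: 6` as in the example.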
Parameters:
lexer - Lexer
node - parent node
Returns:
true if the child node has been removed

private boolean inlineStyle(Lexer lexer, Node node, Node[] pnode)
Parameters:
lexer - Lexer
node - parent node
pnode - passed as an array to allow modifications
Returns:
true if the child node has been stripped and replaced by style attributes

private boolean font2Span(Lexer lexer, Node node, Node[] pnode)
Parameters:
lexer - Lexer
node - font tag
pnode - passed as an array to allow modifications
Returns:
true if a font tag has been dropped and replaced by style attributes

private Node cleanNode(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - original node

private Node createStyleProperties(Lexer lexer, Node node, Node[] prepl)
Parameters:
lexer - Lexer
node - Node
prepl - passed in as an array to allow modifications

private void defineStyleRules(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - parent node

public void cleanTree(Lexer lexer, Node doc)
Parameters:
lexer - Lexer
doc - root node

public void nestedEmphasis(Node node)
Parameters:
node - root Node

public void emFromI(Node node)
Parameters:
node - root Node

public void list2BQ(Node node)
Parameters:
node - root Node

public void bQ2Div(Node node)
Parameters:
node - root Node

Node findEnclosingCell(Node node)
Parameters:
node - Node

public Node pruneSection(Lexer lexer, Node node)
Node is `<![if ...]>`: prune up to `<![endif]>`.
Parameters:
lexer - Lexer
node - Node

public void dropSections(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - root node

public void purgeWord2000Attributes(Node node)
Parameters:
node - node to clean up

public Node stripSpan(Lexer lexer, Node span)
Parameters:
lexer - Lexer
span - span node

private void normalizeSpaces(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - Node

boolean noMargins(Node node)
Parameters:
node - checked node
Returns:
true if the node has a "margin-top: 0" or "margin-bottom: 0" style

boolean singleSpace(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - checked node
Returns:
true if the element has a single space as its content

public void cleanWord2000(Lexer lexer, Node node)
Parameters:
lexer - Lexer
node - node to clean up

public boolean isWord2000(Node root)
Parameters:
root - root Node
Returns:
true if the document has been generated by Microsoft Word.