Class OfficeReader
This class reads and collects global information about an OOo document. This includes styles, forms, information about indexes and references etc.
-
Constructor Summary
Constructors
Constructor / Description
OfficeReader(OfficeDocument oooDoc, boolean bAllParagraphsAreSoft, boolean bDestructive): Constructor; read a document
-
Method Summary
Modifier and Type / Method / Description
void addFigureSequenceName(String sName): Add a sequence name for figure captions.
void addTableSequenceName(String sName): Add a sequence name for table captions.
boolean bookmarkInHeading(String sName): Is this bookmark contained in a heading?
boolean bookmarkInList(String sName): Is this bookmark contained in a list?
fixRelativeLink(String sLink): In OpenDocument package format ../ means "leave the package".
getBibliographyConfiguration(): Get the text:bibliography-configuration element.
getBibliographyMarks(): Get the raw list of all text:bibliography-mark elements.
int getBookmarkHeadingLevel(String sName): Get the level of the heading associated with this bookmark.
int getBookmarkListLevel(String sName): Get the list level associated with a bookmark in a list.
getBookmarkListStyle(String sName): Get the list style name associated with a bookmark in a list.
getCellStyle(String sName)
static int getCharacterCount(Node node): Counts the number of characters (text nodes) in this element excluding footnotes etc.
getColumnStyle(String sName)
Element getContent(): Get the content element.
getDrawingPageStyle(String sName)
getEmbeddedObject(String sName): Get an embedded object in this office document.
getFirstImage(): Get the very first image in this document, if any.
MasterPage getFirstMasterPage(): Returns the first master page used in the document.
FontDeclaration getFontDeclaration(String sName): Get a specific font declaration.
OfficeStyleFamily getFontDeclarations(): Get the collection of all font declarations.
FormsReader getForms(): Get the forms belonging to this document.
getFrameStyle(String sName)
StyleWithProperties getHeadingStyle(int nLevel): Returns the paragraph style associated with headings of a specific level.
getListStyle(String sName)
getMajorityLanguage(): Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style.
getMasterPage(String sName)
static char getNextChar(Node node): Return the next character in logical order.
getPageLayout(String sName)
static Element getParagraph(Element node): Get the paragraph or heading containing a node.
getParStyle(String sName)
getPresentationStyle(String sName)
getRowStyle(String sName)
getSectionStyle(String sName)
getSequenceFromRef(String sRefName): Get the sequence name associated with a reference name.
getSequenceName(Element par): Get the sequence name associated with a paragraph.
TableReader getTableReader(Element node): Read a table from a table:table node.
getTableStyle(String sName)
static String getTextContent(Node node)
getTextStyle(String sName)
getTocReader(Element onode): Returns a reader for a specific toc.
boolean hasBookmarkRefTo(String sName): Is there a reference to this bookmark?
boolean hasEndnoteRefTo(String sId): Is there a reference to this endnote?
boolean hasFootnoteRefTo(String sId): Is there a reference to this footnote id?
boolean hasLinkTo(String sName): Is there a link to this sequence anchor name?
boolean hasNoteRefTo(String sId): Is there a reference to this note id?
boolean hasReferenceRefTo(String sName): Is there a reference to this reference mark?
boolean hasSequenceRefTo(String sId): Is there a reference to this sequence field?
static boolean isDrawElement(Node node): Checks whether a node is an element in the draw namespace.
boolean isFigureSequenceName(String sName): Does this sequence name belong to a list of figures (lof)?
boolean isIndexSourceStyle(String sStyleName): Is this style used in some toc as an index source style?
boolean isInPackage(String sUrl): Checks whether this url is internal to the package.
static boolean isNoteElement(Node node): Checks whether a node is an element representing a note (footnote/endnote).
static boolean isNoTextPar(Node node): Checks whether the only text content of this node is whitespace.
boolean isOpenDocument(): Is this an OASIS OpenDocument or an OOo 1.0 document?
boolean isPackageFormat(): Checks whether or not this document is in package format.
boolean isPresentation(): Is this a presentation document?
static boolean isSingleParagraph(Node node): Checks whether this node contains at most one element, and whether that element is a paragraph.
boolean isSpreadsheet(): Is this a spreadsheet document?
static boolean isTableElement(Node node): Checks whether a node is an element in the table namespace.
boolean isTableSequenceName(String sName): Does this sequence name belong to a list of tables (lot)?
boolean isText(): Is this a text document?
static boolean isTextElement(Node node): Checks whether a node is an element in the text namespace.
static boolean isWhitespace(String s): Checks whether this text is whitespace.
static boolean isWhitespaceContent(Node node): Checks whether the only text content of this node is whitespace.
boolean referenceMarkInHeading(String sName): Is this reference mark contained in a heading?
-
Constructor Details
-
OfficeReader
Constructor; read a document
-
-
Method Details
-
isTextElement
Checks whether a node is an element in the text namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a text element
-
isTableElement
Checks whether a node is an element in the table namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a table element
-
isDrawElement
Checks whether a node is an element in the draw namespace.
- Parameters:
node - the node to check
- Returns:
- true if this is a draw element
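The three namespace checks above (isTextElement, isTableElement, isDrawElement) might be approximated as follows. This is a minimal sketch and not the library's implementation: it assumes the document uses the conventional text:, table: and draw: prefixes and only inspects the qualified name. The class NamespaceCheckSketch and the helpers hasPrefix and newDocument are hypothetical names introduced for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical sketch; not the actual OfficeReader code.
final class NamespaceCheckSketch {
    // Assumption: elements carry the conventional prefixes "text", "table" and "draw".
    static boolean hasPrefix(Node node, String prefix) {
        return node != null
                && node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().startsWith(prefix + ":");
    }

    static boolean isTextElement(Node node) { return hasPrefix(node, "text"); }
    static boolean isTableElement(Node node) { return hasPrefix(node, "table"); }
    static boolean isDrawElement(Node node) { return hasPrefix(node, "draw"); }

    // Creates an empty DOM document, wrapping the checked parser exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        System.out.println(isTextElement(doc.createElement("text:p")));      // true
        System.out.println(isTableElement(doc.createElement("text:p")));     // false
        System.out.println(isDrawElement(doc.createTextNode("draw:frame"))); // false: not an element
    }
}
```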
-
isNoteElement
Checks whether a node is an element representing a note (footnote/endnote).
- Parameters:
node - the node to check
- Returns:
- true if this is a note element
-
getParagraph
Get the paragraph or heading containing a node.
- Parameters:
node - the node in question
- Returns:
- the paragraph or heading
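One plausible way to find the containing paragraph or heading is to walk up the ancestor chain. The sketch below is an illustration under assumptions, not the library's code: it assumes paragraphs and headings are named text:p and text:h, it treats the starting node itself as a candidate, and it returns null when no such ancestor exists (the documentation above does not say what the real method does in that case). ParagraphSketch and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the actual OfficeReader code.
final class ParagraphSketch {
    // Walks from the node towards the root and returns the first text:p or text:h found.
    static Element getParagraph(Node node) {
        for (Node current = node; current != null; current = current.getParentNode()) {
            if (current.getNodeType() == Node.ELEMENT_NODE) {
                String name = current.getNodeName();
                if (name.equals("text:p") || name.equals("text:h")) {
                    return (Element) current;
                }
            }
        }
        return null; // assumption: no enclosing paragraph or heading
    }

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```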
-
isSingleParagraph
Checks whether this node contains at most one element, and whether that element is a paragraph.
- Parameters:
node - the node to check
- Returns:
- true if the node contains a single paragraph or nothing
-
isNoTextPar
Checks whether the only text content of this node is whitespace. Other (draw) content is allowed.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
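The check described above (whitespace-only text, with other content such as draw elements allowed) can be sketched as a recursive walk over text nodes. This is only an illustration under assumptions: the method name hasOnlyWhitespaceText, the class WhitespaceContentSketch and the parse helper are hypothetical, and the real method may treat notes or other elements differently.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the actual OfficeReader code.
final class WhitespaceContentSketch {
    // True if every text node below (or equal to) the given node is whitespace only.
    // Elements without text, such as a draw:frame, do not affect the result.
    static boolean hasOnlyWhitespaceText(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().trim().isEmpty();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!hasOnlyWhitespaceText(child)) {
                return false;
            }
        }
        return true;
    }

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```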
-
isWhitespaceContent
Checks whether the only text content of this node is whitespace.
- Parameters:
node - the node to check (should be a paragraph node or a child of a paragraph node)
- Returns:
- true if the node contains whitespace only
-
isWhitespace
Checks whether this text is whitespace.
- Parameters:
s - the String to check
- Returns:
- true if the String contains whitespace only
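A whitespace test of this kind can be written in a few lines. The sketch below is an assumption about the behaviour, not the library's code: it uses Character.isWhitespace as the definition of whitespace and treats the empty string as whitespace only. The class name WhitespaceSketch is hypothetical.

```java
// Hypothetical sketch; not the actual OfficeReader code.
final class WhitespaceSketch {
    // Returns true if the string is empty or consists only of whitespace characters.
    // Assumption: Character.isWhitespace defines what counts as whitespace.
    static boolean isWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(" \t\n")); // true
        System.out.println(isWhitespace(" a "));   // false
    }
}
```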
-
getCharacterCount
Counts the number of characters (text nodes) in this element excluding footnotes etc.
- Parameters:
node - the node to count in
- Returns:
- the number of characters
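Counting characters while skipping notes could look roughly like the following. This is a hedged sketch, not the library's algorithm: it assumes notes are text:note (OpenDocument) or text:footnote/text:endnote (OOo 1.0) elements and skips them entirely. Which other elements the real method excludes ("etc.") is not stated, so nothing else is skipped here. CharacterCountSketch and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the actual OfficeReader code.
final class CharacterCountSketch {
    // Assumption: these element names identify footnotes and endnotes.
    static boolean isNoteElement(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return false;
        }
        String name = node.getNodeName();
        return name.equals("text:note") || name.equals("text:footnote") || name.equals("text:endnote");
    }

    // Sums the lengths of all text nodes below the node, ignoring note subtrees.
    static int getCharacterCount(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().length();
        }
        if (isNoteElement(node)) {
            return 0;
        }
        int count = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += getCharacterCount(child);
        }
        return count;
    }

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a paragraph such as `<text:p>abc<text:note>xyz</text:note>de</text:p>` this sketch counts 5 characters, because the note text is ignored.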
-
getTextContent
-
getNextChar
Return the next character in logical order
-
isPackageFormat
public boolean isPackageFormat()
Checks whether or not this document is in package format.
- Returns:
- true if it's in package format
-
isInPackage
Checks whether this url is internal to the package.
- Parameters:
sUrl - the url to check
- Returns:
- true if the url is internal to the package
-
fixRelativeLink
In OpenDocument package format ../ means "leave the package". Consequently this prefix must be removed to obtain a valid link.
- Parameters:
sLink - the link to correct
- Returns:
- the corrected link
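The prefix removal described above can be illustrated with a short sketch. It is not the library's implementation: it assumes that exactly one leading ../ is stripped and that links without this prefix are returned unchanged. The class name RelativeLinkSketch is hypothetical.

```java
// Hypothetical sketch; not the actual OfficeReader code.
final class RelativeLinkSketch {
    // Removes a single leading "../", which in package format means "leave the package".
    // Assumption: links without the prefix are returned as they are.
    static String fixRelativeLink(String link) {
        if (link.startsWith("../")) {
            return link.substring(3);
        }
        return link;
    }

    public static void main(String[] args) {
        System.out.println(fixRelativeLink("../images/plot.png")); // images/plot.png
        System.out.println(fixRelativeLink("images/plot.png"));    // images/plot.png
    }
}
```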
-
getEmbeddedObject
Get an embedded object in this office document
-
getFontDeclarations
Get the collection of all font declarations.
- Returns:
- the OfficeStyleFamily of font declarations
-
getFontDeclaration
Get a specific font declaration
- Parameters:
sName - the name of the font declaration
- Returns:
- a FontDeclaration representing the font
-
getTextStyles
-
getTextStyle
-
getParStyles
-
getParStyle
-
getDefaultParStyle
-
getSectionStyles
-
getSectionStyle
-
getTableStyles
-
getTableStyle
-
getColumnStyles
-
getColumnStyle
-
getRowStyles
-
getRowStyle
-
getCellStyles
-
getCellStyle
-
getDefaultCellStyle
-
getFrameStyles
-
getFrameStyle
-
getDefaultFrameStyle
-
getPresentationStyles
-
getPresentationStyle
-
getDefaultPresentationStyle
-
getDrawingPageStyles
-
getDrawingPageStyle
-
getDefaultDrawingPageStyle
-
getListStyles
-
getListStyle
-
getPageLayouts
-
getPageLayout
-
getMasterPages
-
getMasterPage
-
getOutlineStyle
-
getFootnotesConfiguration
-
getEndnotesConfiguration
-
getHeadingStyle
Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.
In principle, different styles can be used for each heading; in practice the same (soft) style is used for all headings of a specific level.
- Parameters:
nLevel - the level of the heading
- Returns:
- a StyleWithProperties object representing the style
-
getFirstMasterPage
Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master page exists.
- Returns:
- a MasterPage object representing the master page
-
getMajorityLanguage
Return the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style.
- Returns:
- the iso language
-
getTocReader
Returns a reader for a specific toc
- Parameters:
onode - the text:table-of-content node
- Returns:
- the reader, or null
-
isIndexSourceStyle
Is this style used in some toc as an index source style?
- Parameters:
sStyleName - the name of the style
- Returns:
- true if this is an index source style
-
getBibliographyConfiguration
Get the text:bibliography-configuration element.
- Returns:
- the bibliography configuration
-
isFigureSequenceName
Does this sequence name belong to a list of figures (lof)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
isTableSequenceName
Does this sequence name belong to a list of tables (lot)?
- Parameters:
sName - the name of the sequence
- Returns:
- true if it belongs to an index
-
addTableSequenceName
Add a sequence name for table captions.
OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of tables. If there's no list of tables, captions cannot be identified. Thus this method lets the user add a sequence name to identify the table captions.
- Parameters:
sName - the name to add
-
addFigureSequenceName
Add a sequence name for figure captions.
OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of figures. If there's no list of figures, captions cannot be identified. Thus this method lets the user add a sequence name to identify the figure captions.
- Parameters:
sName - the name to add
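To illustrate why registering a sequence name helps, the sketch below identifies a caption as a paragraph whose text:sequence element carries a registered text:name. It is a simplified model under assumptions, not the library's code: the class CaptionSketch, the method isFigureCaption and the parse helper are hypothetical, and the real class also derives sequence names from the list(s) of figures, which is omitted here.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the actual OfficeReader code.
final class CaptionSketch {
    private final Set<String> figureSequenceNames = new HashSet<>();

    void addFigureSequenceName(String name) { figureSequenceNames.add(name); }

    boolean isFigureSequenceName(String name) { return figureSequenceNames.contains(name); }

    // Returns the text:name of the first text:sequence in the paragraph, or null if there is none.
    static String getSequenceName(Element par) {
        NodeList sequences = par.getElementsByTagName("text:sequence");
        if (sequences.getLength() == 0) {
            return null;
        }
        String name = ((Element) sequences.item(0)).getAttribute("text:name");
        return name.isEmpty() ? null : name;
    }

    // Hypothetical helper: a paragraph is a figure caption if its sequence name is registered.
    boolean isFigureCaption(Element par) {
        String name = getSequenceName(par);
        return name != null && isFigureSequenceName(name);
    }

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this model, a document without a list of figures yields no caption until the caller registers the sequence name (for example "Illustration") explicitly.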
-
getSequenceName
Get the sequence name associated with a paragraph
- Parameters:
par - the paragraph to look up
- Returns:
- the sequence name or null
-
getSequenceFromRef
Get the sequence name associated with a reference name
- Parameters:
sRefName - the reference name to use
- Returns:
- the sequence name or null
-
hasNoteRefTo
Is there a reference to this note id?
- Parameters:
sId - the id of the note
- Returns:
- true if there is a reference
-
hasFootnoteRefTo
Is there a reference to this footnote id?
- Parameters:
sId - the id of the footnote
- Returns:
- true if there is a reference
-
hasEndnoteRefTo
Is there a reference to this endnote?
- Parameters:
sId - the id of the endnote
- Returns:
- true if there is a reference
-
referenceMarkInHeading
Is this reference mark contained in a heading?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if so
-
hasReferenceRefTo
Is there a reference to this reference mark?
- Parameters:
sName - the name of the reference mark
- Returns:
- true if there is a reference
-
bookmarkInHeading
Is this bookmark contained in a heading?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkHeadingLevel
Get the level of the heading associated with this bookmark.
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
bookmarkInList
Is this bookmark contained in a list?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if so
-
getBookmarkListStyle
Get the list style name associated with a bookmark in a list.
- Parameters:
sName - the name of the bookmark
- Returns:
- the list style name or null if the bookmark does not exist or the list does not have a style name
-
getBookmarkListLevel
Get the list level associated with a bookmark in a list.
- Parameters:
sName - the name of the bookmark
- Returns:
- the level or 0 if the bookmark does not exist
-
hasBookmarkRefTo
Is there a reference to this bookmark?
- Parameters:
sName - the name of the bookmark
- Returns:
- true if there is a reference
-
getBibliographyMarks
Get the raw list of all text:bibliography-mark elements. The marks are returned in document order and include any duplicates.
- Returns:
- the list
-
hasSequenceRefTo
Is there a reference to this sequence field?
- Parameters:
sId - the id of the sequence field
- Returns:
- true if there is a reference
-
hasLinkTo
Is there a link to this sequence anchor name?
- Parameters:
sName - the name of the anchor
- Returns:
- true if there is a link
-
isOpenDocument
public boolean isOpenDocument()
Is this an OASIS OpenDocument or an OOo 1.0 document?
- Returns:
- true if it's an OASIS OpenDocument
-
isText
public boolean isText()
Is this a text document?
- Returns:
- true if it's a text document
-
isSpreadsheet
public boolean isSpreadsheet()
Is this a spreadsheet document?
- Returns:
- true if it's a spreadsheet document
-
isPresentation
public boolean isPresentation()
Is this a presentation document?
- Returns:
- true if it's a presentation document
-
getContent
Get the content element
In the old file format this means the office:body element.
In the OpenDocument format this means an office:text, office:spreadsheet or office:presentation element.
- Returns:
- the content Element
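The difference between the two formats can be sketched as a lookup that either stops at office:body or descends one level further. This is an illustration under assumptions, not the library's code: it assumes office:body is a direct child of the root element, and the class ContentSketch, the openDocument flag and the helpers firstChild and parse are hypothetical.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the actual OfficeReader code.
final class ContentSketch {
    // Old format: return office:body. OpenDocument: return the document-type element inside it.
    static Element getContent(Document doc, boolean openDocument) {
        Element body = firstChild(doc.getDocumentElement(), "office:body");
        if (body == null || !openDocument) {
            return body;
        }
        return firstChild(body, "office:text", "office:spreadsheet", "office:presentation");
    }

    // Returns the first child element whose qualified name is one of the given names, or null.
    private static Element firstChild(Node parent, String... names) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && Arrays.asList(names).contains(child.getNodeName())) {
                return (Element) child;
            }
        }
        return null;
    }

    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```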
-
getForms
Get the forms belonging to this document.
- Returns:
- a FormsReader representing the forms
-
getTableReader
Read a table from a table:table node
- Parameters:
node - the table:table Element node
- Returns:
- a TableReader object representing the table
-
getFirstImage
Get the very first image in this document, if any.
- Returns:
- the first image, or null if no images exists
-