API Reference

This section provides detailed documentation for all classes and methods in the Confluence Content Parser.

Core Classes

ConfluenceDocument

class confluence_content_parser.document.ConfluenceDocument(*, root: Node | None = None, metadata: dict[str, Any] = <factory>)

Bases: BaseModel

A parsed Confluence document with convenient access to content.

root: Node | None
metadata: dict[str, Any]
property text: str

Get all text content from the document with proper line breaks.

find_all() → list[Node]
find_all(node_type: type[T1]) → list[T1]
find_all(t1: type[T1], t2: type[T2]) → tuple[list[T1], list[T2]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3]) → tuple[list[T1], list[T2], list[T3]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3], t4: type[T4]) → tuple[list[T1], list[T2], list[T3], list[T4]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3], t4: type[T4], t5: type[T5]) → tuple[list[T1], list[T2], list[T3], list[T4], list[T5]]

Find all nodes of the given type(s) in the document. Typed overloads cover calls with up to five node types.

Parameters:

*node_types – No arguments (all nodes), a single node class, or several node classes.

Returns:

  • No arguments: list[Node] containing every node

  • Single type: list[T], where T is the requested type

  • Multiple types: a tuple of lists, one list per requested type, each typed accordingly

Examples

    # All nodes
    all_nodes = document.find_all()

    # Single type
    headings = document.find_all(HeadingElement)

    # Multiple types
    headings, panels = document.find_all(HeadingElement, PanelMacro)

walk() → list[Node]

Get all nodes in the document.

ConfluenceParser

class confluence_content_parser.parser.ConfluenceParser(*, raise_on_finish: bool = True)

Bases: object

Efficient Confluence storage-format XML parser with generic element handling.

NS_AC = 'http://www.atlassian.com/schema/confluence/4/ac/'
NS_RI = 'http://www.atlassian.com/schema/confluence/4/ri/'
NS_AT = 'http://www.atlassian.com/schema/confluence/4/at/'
parse(content: str) → ConfluenceDocument

Parse Confluence storage-format XML into a ConfluenceDocument.
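The three constants are the namespace URIs behind the ac:, ri: and at: prefixes used in storage-format XML. Storage-format page bodies normally arrive as fragments without xmlns declarations, so a namespace-aware XML parser cannot resolve those prefixes on its own. The sketch below uses only the standard library's xml.etree.ElementTree (not ConfluenceParser) and a hypothetical helper, wrap_storage, that wraps a fragment in a root element declaring the same URIs. Whether ConfluenceParser does something equivalent internally is an assumption this reference does not confirm.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from ConfluenceParser.NS_AC / NS_RI / NS_AT.
NS = {
    "ac": "http://www.atlassian.com/schema/confluence/4/ac/",
    "ri": "http://www.atlassian.com/schema/confluence/4/ri/",
    "at": "http://www.atlassian.com/schema/confluence/4/at/",
}


def wrap_storage(body: str) -> ET.Element:
    """Hypothetical helper: wrap a fragment in a root that declares the prefixes."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return ET.fromstring(f"<root {decls}>{body}</root>")


body = (
    "<p>Intro</p>"
    '<ac:structured-macro ac:name="info">'
    "<ac:rich-text-body><p>Heads up</p></ac:rich-text-body>"
    "</ac:structured-macro>"
    '<ac:link><ri:page ri:content-title="Home" /></ac:link>'
)

root = wrap_storage(body)
# Prefixed paths resolve through the NS map; attributes are stored as {uri}name.
macro = root.find("ac:structured-macro", NS)
macro_name = macro.get(f"{{{NS['ac']}}}name")
page = root.find(".//ri:page", NS)
page_title = page.get(f"{{{NS['ri']}}}content-title")
print(macro_name, page_title)  # info Home
```

Without the wrapping step, ElementTree rejects the fragment with an unbound-prefix error, which is why some form of namespace declaration is needed before storage XML can be read this way.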

ParsingError

class confluence_content_parser.parser.ParsingError(diagnostics: list[str])

Bases: Exception

Raised when parsing fails. It is constructed with the list of diagnostic messages (diagnostics) that describe the failure.

Base Node Classes

Node

class confluence_content_parser.nodes.Node(*, is_block_level: bool = False)

Bases: BaseModel, ABC

Base class for all content nodes in the Confluence document tree.

is_block_level: bool
walk() → Iterator[Node]

Walk through this node and all its descendants.

get_children() → list[Node]

Get direct children of this node. Override in subclasses.

to_text() → str

Get text representation of this node. Override in subclasses.

find_all() → list[Node]
find_all(node_type: type[T1]) → list[T1]
find_all(t1: type[T1], t2: type[T2]) → tuple[list[T1], list[T2]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3]) → tuple[list[T1], list[T2], list[T3]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3], t4: type[T4]) → tuple[list[T1], list[T2], list[T3], list[T4]]
find_all(t1: type[T1], t2: type[T2], t3: type[T3], t4: type[T4], t5: type[T5]) → tuple[list[T1], list[T2], list[T3], list[T4], list[T5]]

Find all nodes of the given type(s) in this subtree. Typed overloads cover calls with up to five node types.

Parameters:

*node_types – No arguments (all nodes), a single node class, or several node classes.

Returns:

  • No arguments: list[Node] containing every node in the subtree

  • Single type: list[T], where T is the requested type

  • Multiple types: a tuple of lists, one list per requested type, each typed accordingly

Examples

    # All nodes
    all_nodes = node.find_all()

    # Single type: returns list[HeadingElement]
    headings = node.find_all(HeadingElement)

    # Multiple types: returns a tuple of typed lists
    headings, panels = node.find_all(HeadingElement, PanelMacro)
    headings, panels, links = node.find_all(HeadingElement, PanelMacro, LinkElement)

ContainerElement

class confluence_content_parser.nodes.ContainerElement(*, is_block_level: bool = False, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: Node

Base for container elements.

children: list[Node]
styles: dict[str, str]
get_children() → list[Node]

Get the direct children of this container.

to_text() → str

Get the text representation of this container and its children.
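The traversal contract above (walk() yields a node and its descendants, get_children() supplies the direct children, and find_all() returns a list for zero or one type but a tuple of lists for several) can be mirrored with a small stdlib-only model. The Mini* classes are hypothetical stand-ins, not the library's pydantic models; the pre-order traversal and the isinstance-based matching (so subclasses also match) are assumptions about how the real methods behave.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, TypeVar, overload

T1 = TypeVar("T1", bound="MiniNode")
T2 = TypeVar("T2", bound="MiniNode")


class MiniNode:
    """Hypothetical stand-in for Node; leaves have no children."""

    def get_children(self) -> list[MiniNode]:
        return []

    def walk(self) -> Iterator[MiniNode]:
        # Pre-order, depth-first: this node first, then each child's subtree.
        yield self
        for child in self.get_children():
            yield from child.walk()

    @overload
    def find_all(self) -> list[MiniNode]: ...
    @overload
    def find_all(self, node_type: type[T1]) -> list[T1]: ...
    @overload
    def find_all(self, t1: type[T1], t2: type[T2]) -> tuple[list[T1], list[T2]]: ...

    def find_all(self, *node_types):
        nodes = list(self.walk())
        if not node_types:
            return nodes
        # One list per requested type; isinstance() lets subclasses match too.
        buckets = tuple([n for n in nodes if isinstance(n, t)] for t in node_types)
        return buckets[0] if len(buckets) == 1 else buckets


@dataclass
class MiniText(MiniNode):
    text: str


@dataclass
class MiniContainer(MiniNode):
    children: list[MiniNode] = field(default_factory=list)

    def get_children(self) -> list[MiniNode]:
        return self.children


class MiniHeading(MiniContainer):
    pass


class MiniPanel(MiniContainer):
    pass


doc = MiniContainer(children=[
    MiniHeading(children=[MiniText("Title")]),
    MiniPanel(children=[MiniText("Note"), MiniHeading(children=[MiniText("Inner")])]),
])

names = [type(n).__name__ for n in doc.walk()]
headings, panels = doc.find_all(MiniHeading, MiniPanel)
texts = [t.text for t in doc.find_all(MiniText)]
print(names)
print(len(headings), len(panels), texts)  # 2 1 ['Title', 'Note', 'Inner']
```

The @overload stubs exist only for type checkers; at runtime the final find_all definition handles every case, which is the usual way to give one variadic method several precise return types.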

Text and Formatting Nodes

Text

class confluence_content_parser.nodes.Text(*, is_block_level: bool = False, text: str)

Bases: Node

A node containing plain text content.

text: str
to_text() → str

Get the text representation of this node.

TextEffectElement

class confluence_content_parser.nodes.TextEffectElement(*, is_block_level: bool = False, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, type: TextEffectType)

Bases: ContainerElement

Base for inline formatting elements like bold, italic, etc.

type: TextEffectType

TextEffectType

class confluence_content_parser.nodes.TextEffectType(*values)

Bases: Enum

Type of inline element.

Enumeration of text effect types.

Values:

  • STRONG - Bold text

  • EMPHASIS - Italic text

  • UNDERLINE - Underlined text

  • STRIKETHROUGH - Strikethrough text

  • MONOSPACE - Monospace/code text

  • SUBSCRIPT - Subscript text

  • SUPERSCRIPT - Superscript text

  • BLOCKQUOTE - Block quotation

  • SPAN - Generic inline container
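Storage format expresses most of these effects with ordinary XHTML tags. The sketch below maps tag names to a hypothetical Effect enum that copies the TextEffectType member names. The member values and the tag mapping (including the legacy b and i tags) are assumptions for illustration, not the library's own lookup table.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from enum import Enum


class Effect(Enum):
    """Hypothetical mirror of TextEffectType; member values are assumed."""
    STRONG = "strong"
    EMPHASIS = "emphasis"
    UNDERLINE = "underline"
    STRIKETHROUGH = "strikethrough"
    MONOSPACE = "monospace"
    SUBSCRIPT = "subscript"
    SUPERSCRIPT = "superscript"
    BLOCKQUOTE = "blockquote"
    SPAN = "span"


# Assumed mapping from XHTML tag names to effect types.
TAG_TO_EFFECT = {
    "strong": Effect.STRONG, "b": Effect.STRONG,
    "em": Effect.EMPHASIS, "i": Effect.EMPHASIS,
    "u": Effect.UNDERLINE,
    "s": Effect.STRIKETHROUGH, "del": Effect.STRIKETHROUGH,
    "code": Effect.MONOSPACE,
    "sub": Effect.SUBSCRIPT,
    "sup": Effect.SUPERSCRIPT,
    "blockquote": Effect.BLOCKQUOTE,
    "span": Effect.SPAN,
}


def effect_for(tag: str) -> Effect | None:
    """Return the effect for a tag name, or None if it is not a text effect."""
    return TAG_TO_EFFECT.get(tag.lower())


fragment = ET.fromstring("<p>Plain <strong>bold <em>both</em></strong> <code>x</code></p>")
# iter() walks in document order, so nested effects appear outer-first.
effects = [e for e in (effect_for(el.tag) for el in fragment.iter()) if e is not None]
print([e.name for e in effects])  # ['STRONG', 'EMPHASIS', 'MONOSPACE']
```

Nesting (strong around em) becomes two nested effect elements rather than a combined "bold italic" value, which matches TextEffectElement being a container with a single type.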

TextBreakElement

class confluence_content_parser.nodes.TextBreakElement(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, type: TextBreakType)

Bases: ContainerElement

A text break element.

type: TextBreakType
is_block_level: bool
to_text() → str

Generate text representation of text break elements.

TextBreakType

class confluence_content_parser.nodes.TextBreakType(*values)

Bases: Enum

Type of text break element.

Enumeration of text break types.

Values:

  • PARAGRAPH - Paragraph element

  • LINE_BREAK - Line break element

  • HORIZONTAL_RULE - Horizontal rule element

Structure Nodes

HeadingElement

class confluence_content_parser.nodes.HeadingElement(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, type: HeadingType)

Bases: ContainerElement

A heading element.

type: HeadingType
is_block_level: bool

HeadingType

class confluence_content_parser.nodes.HeadingType(*values)

Bases: Enum

Type of heading element.

Enumeration of heading types.

Values:

  • H1 - Level 1 heading

  • H2 - Level 2 heading

  • H3 - Level 3 heading

  • H4 - Level 4 heading

  • H5 - Level 5 heading

  • H6 - Level 6 heading
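A common use of HeadingElement.type is building an outline from document.find_all(HeadingElement). The sketch below uses a hypothetical Heading enum whose integer values are the heading levels (the real HeadingType member values are not documented here) and an illustrative two-spaces-per-level indentation format.

```python
from __future__ import annotations

import re
from enum import Enum


class Heading(Enum):
    """Hypothetical mirror of HeadingType; values are the heading level."""
    H1 = 1
    H2 = 2
    H3 = 3
    H4 = 4
    H5 = 5
    H6 = 6


_HEADING_TAG = re.compile(r"h([1-6])", re.IGNORECASE)


def heading_for(tag: str) -> Heading | None:
    """Map an XHTML tag name (h1..h6) to a Heading, or None for other tags."""
    match = _HEADING_TAG.fullmatch(tag)
    return Heading(int(match.group(1))) if match else None


def outline(entries: list[tuple[Heading, str]]) -> str:
    """Indent each title two spaces per level below H1 (illustrative format)."""
    return "\n".join("  " * (level.value - 1) + title for level, title in entries)


toc = outline([(Heading.H1, "Guide"), (Heading.H2, "Install"), (Heading.H3, "Linux")])
print(toc)
```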

List Nodes

ListElement

class confluence_content_parser.nodes.ListElement(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, type: ListType, start: int | None = None)

Bases: ContainerElement

A list element.

type: ListType
start: int | None
is_block_level: bool
to_text(indent_level: int = 0) → str

Convert list to text with appropriate markers and indentation.

ListType

class confluence_content_parser.nodes.ListType(*values)

Bases: Enum

Type of list element.

Enumeration of list types.

Values:

  • UNORDERED - Unordered/bulleted list

  • ORDERED - Ordered/numbered list

  • TASK - Task list with checkboxes

ListItem

class confluence_content_parser.nodes.ListItem(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, task_id: str | None = None, uuid: str | None = None, status: TaskListItemStatus | None = None)

Bases: ContainerElement

A list item element that can be regular or task item.

task_id: str | None
uuid: str | None
status: TaskListItemStatus | None
is_block_level: bool

TaskListItemStatus

class confluence_content_parser.nodes.TaskListItemStatus(*values)

Bases: Enum

Type of task list item status.

Enumeration of task list item statuses.

Values:

  • COMPLETE - Task is completed

  • INCOMPLETE - Task is not completed
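ListElement.to_text(indent_level=0) is documented as producing markers and indentation, and ListElement.start suggests that ordered lists may begin at a number other than 1. The stdlib-only sketch below shows one way that recursion can work, with hypothetical Kind, Status, Item and MiniList types mirroring ListType, TaskListItemStatus, ListItem and ListElement. The marker strings (-, 1., [x] and [ ]) and the two-space indent are assumptions, not the library's documented output.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):      # mirrors ListType member names
    UNORDERED = "unordered"
    ORDERED = "ordered"
    TASK = "task"


class Status(Enum):    # mirrors TaskListItemStatus member names
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


@dataclass
class Item:
    text: str
    status: Status | None = None
    sublist: MiniList | None = None


@dataclass
class MiniList:
    kind: Kind
    items: list[Item] = field(default_factory=list)
    start: int | None = None


def render(lst: MiniList, indent_level: int = 0) -> str:
    """Render a list and its nested lists, indenting two spaces per level."""
    lines = []
    pad = "  " * indent_level
    number = lst.start if lst.start is not None else 1
    for item in lst.items:
        if lst.kind is Kind.ORDERED:
            marker = f"{number}."
            number += 1
        elif lst.kind is Kind.TASK:
            marker = "[x]" if item.status is Status.COMPLETE else "[ ]"
        else:
            marker = "-"
        lines.append(f"{pad}{marker} {item.text}")
        if item.sublist is not None:
            lines.append(render(item.sublist, indent_level + 1))
    return "\n".join(lines)


steps = MiniList(Kind.ORDERED, start=3, items=[
    Item("Install"),
    Item("Configure", sublist=MiniList(Kind.TASK, items=[
        Item("Set token", status=Status.COMPLETE),
        Item("Pick space", status=Status.INCOMPLETE),
    ])),
])
print(render(steps))
```

Passing indent_level + 1 into the recursive call is what lets a task list nested inside an ordered list pick up its own markers while inheriting the parent's indentation.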

Table Nodes

Table

class confluence_content_parser.nodes.Table(*, is_block_level: bool = False, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, width: str | None = None, layout: str | None = None, local_id: str | None = None, display_mode: str | None = None)

Bases: ContainerElement

A table element with metadata and rows.

width: str | None
layout: str | None
local_id: str | None
display_mode: str | None
to_text() → str

Generate text representation of table.

TableRow

class confluence_content_parser.nodes.TableRow(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

A table row.

is_block_level: bool
to_text() → str

Format row as text with | separators.

TableCell

class confluence_content_parser.nodes.TableCell(*, is_block_level: bool = False, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, is_header: bool = False, rowspan: int | None = None, colspan: int | None = None)

Bases: ContainerElement

A table cell.

is_header: bool
rowspan: int | None
colspan: int | None
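TableRow.to_text() joins cells with | separators, and TableCell.is_header marks header cells. The sketch below uses a hypothetical Cell dataclass to show that row formatting and a common follow-up: turning a table whose first row is all header cells into a list of records. The " | " spacing is assumed, and the sketch ignores rowspan and colspan, which a faithful conversion would have to expand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical mirror of TableCell (text content plus is_header)."""
    text: str
    is_header: bool = False


def row_to_text(cells: list[Cell]) -> str:
    # Join cell texts with " | " (assumed spacing).
    return " | ".join(cell.text.strip() for cell in cells)


def table_to_text(rows: list[list[Cell]]) -> str:
    return "\n".join(row_to_text(row) for row in rows)


def to_records(rows: list[list[Cell]]) -> list[dict[str, str]]:
    """Use the first row as keys if every cell in it is a header cell.

    rowspan/colspan are ignored here.
    """
    if not rows or not all(cell.is_header for cell in rows[0]):
        return []
    keys = [cell.text for cell in rows[0]]
    return [dict(zip(keys, (cell.text for cell in row))) for row in rows[1:]]


rows = [
    [Cell("Name", True), Cell("Owner", True)],
    [Cell("API"), Cell("Ana")],
    [Cell("UI"), Cell("Ben")],
]
print(table_to_text(rows))
print(to_records(rows))
```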

Layout Nodes

LayoutElement

class confluence_content_parser.nodes.LayoutElement(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

A page layout container containing sections.

is_block_level: bool

LayoutSection

class confluence_content_parser.nodes.LayoutSection(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, section_type: LayoutSectionType, breakout_mode: str | None = None, breakout_width: str | None = None)

Bases: ContainerElement

A layout section (row) containing cells.

section_type: LayoutSectionType
breakout_mode: str | None
breakout_width: str | None
is_block_level: bool

LayoutSectionType

class confluence_content_parser.nodes.LayoutSectionType(*values)

Bases: Enum

Type of layout section.

Enumeration of layout section types.

Values:

  • SINGLE - Single column layout

  • FIXED_WIDTH - Fixed width layout

  • TWO_EQUAL - Two equal columns

  • TWO_LEFT_SIDEBAR - Two columns with left sidebar

  • TWO_RIGHT_SIDEBAR - Two columns with right sidebar

  • THREE_EQUAL - Three equal columns

  • THREE_WITH_SIDEBARS - Three columns with sidebars

  • THREE_LEFT_SIDEBARS - Three columns with left sidebars

  • THREE_RIGHT_SIDEBARS - Three columns with right sidebars

  • FOUR_EQUAL - Four equal columns

  • FIVE_EQUAL - Five equal columns
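Each section type implies how many LayoutCell columns a LayoutSection should contain, which makes a simple consistency check possible. The sketch below uses a hypothetical Section enum that copies the LayoutSectionType member names; the string values are assumed to be the ac:type values Confluence writes on ac:layout-section, and treating FIXED_WIDTH as one column is likewise an assumption.

```python
from enum import Enum


class Section(Enum):
    """Hypothetical mirror of LayoutSectionType; values assumed to be ac:type strings."""
    SINGLE = "single"
    FIXED_WIDTH = "fixed-width"
    TWO_EQUAL = "two_equal"
    TWO_LEFT_SIDEBAR = "two_left_sidebar"
    TWO_RIGHT_SIDEBAR = "two_right_sidebar"
    THREE_EQUAL = "three_equal"
    THREE_WITH_SIDEBARS = "three_with_sidebars"
    THREE_LEFT_SIDEBARS = "three_left_sidebars"
    THREE_RIGHT_SIDEBARS = "three_right_sidebars"
    FOUR_EQUAL = "four_equal"
    FIVE_EQUAL = "five_equal"


_PREFIX_COLUMNS = {"single": 1, "fixed": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def expected_columns(section: Section) -> int:
    # The leading word of the type string encodes the column count.
    first_word = section.value.replace("-", "_").split("_")[0]
    return _PREFIX_COLUMNS[first_word]


def check_section(section: Section, cell_count: int) -> bool:
    """True when a section has as many cells as its type implies."""
    return expected_columns(section) == cell_count


print({s.name: expected_columns(s) for s in Section})
```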

LayoutCell

class confluence_content_parser.nodes.LayoutCell(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

A layout cell (column) containing content.

is_block_level: bool

Macro Nodes

PanelMacro

class confluence_content_parser.nodes.PanelMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, type: PanelMacroType, bg_color: str | None = None, panel_icon: str | None = None, panel_icon_id: str | None = None, panel_icon_text: str | None = None)

Bases: ContainerElement

A panel macro element with background color and optional icon.

type: PanelMacroType
bg_color: str | None
panel_icon: str | None
panel_icon_id: str | None
panel_icon_text: str | None
is_block_level: bool
to_text() → str

Generate text representation of panel with content.

PanelMacroType

class confluence_content_parser.nodes.PanelMacroType(*values)

Bases: Enum

Type of panel macro based on visual presentation.

Enumeration of panel macro types.

Values:

  • PANEL - Generic panel

  • NOTE - Note panel

  • SUCCESS - Success panel

  • WARNING - Warning panel

  • ERROR - Error panel

  • INFO - Info panel
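Macro settings such as a panel's background color and icon live in ac:parameter children of an ac:structured-macro element. The stdlib-only sketch below collects those parameters into a dict with a hypothetical helper, macro_parameters. The parameter names bgColor and panelIcon are assumed from Confluence's panel macro and presumably correspond to PanelMacro.bg_color and PanelMacro.panel_icon; this is not the library's own extraction code.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

AC = "http://www.atlassian.com/schema/confluence/4/ac/"  # ConfluenceParser.NS_AC


def macro_parameters(macro: ET.Element) -> dict[str, str]:
    """Hypothetical helper: collect <ac:parameter ac:name="..."> children into a dict."""
    params = {}
    for param in macro.findall(f"{{{AC}}}parameter"):
        params[param.get(f"{{{AC}}}name")] = (param.text or "").strip()
    return params


storage = f"""
<root xmlns:ac="{AC}">
  <ac:structured-macro ac:name="panel">
    <ac:parameter ac:name="bgColor">#E3FCEF</ac:parameter>
    <ac:parameter ac:name="panelIcon">:check_mark:</ac:parameter>
    <ac:rich-text-body><p>Deployed</p></ac:rich-text-body>
  </ac:structured-macro>
</root>
""".strip()

macro = ET.fromstring(storage).find(f"{{{AC}}}structured-macro")
params = macro_parameters(macro)
print(macro.get(f"{{{AC}}}name"), params)
```

The rich-text body is a separate child, so parameters and panel content can be read independently; the body would become the panel's children.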

CodeMacro

class confluence_content_parser.nodes.CodeMacro(*, is_block_level: bool = True, language: str | None = None, breakout_mode: str | None = None, breakout_width: str | None = None, code: str)

Bases: Node

A code macro element with syntax highlighting.

language: str | None
breakout_mode: str | None
breakout_width: str | None
code: str
is_block_level: bool
to_text() → str

Generate text representation of code block.
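CodeMacro.code holds the source as a plain string. In storage format the code macro typically carries that source in an ac:plain-text-body CDATA section and the language in an ac:parameter, which is why characters such as < and & survive unescaped. The stdlib-only sketch below, with a hypothetical read_code_macro helper, shows that extraction; the parameter name language is assumed from Confluence's code macro, and this is not the library's implementation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

AC = "http://www.atlassian.com/schema/confluence/4/ac/"  # ConfluenceParser.NS_AC

storage = f"""<root xmlns:ac="{AC}">
<ac:structured-macro ac:name="code">
  <ac:parameter ac:name="language">python</ac:parameter>
  <ac:plain-text-body><![CDATA[if a < b:
    print("a & b")]]></ac:plain-text-body>
</ac:structured-macro>
</root>"""


def read_code_macro(macro: ET.Element) -> tuple[str | None, str]:
    """Hypothetical helper: return (language, code) from a code macro element."""
    language = None
    for param in macro.findall(f"{{{AC}}}parameter"):
        if param.get(f"{{{AC}}}name") == "language":
            language = (param.text or "").strip() or None
    body = macro.find(f"{{{AC}}}plain-text-body")
    # CDATA content arrives as ordinary element text, with no entity unescaping needed.
    code = body.text if body is not None and body.text else ""
    return language, code


macro = ET.fromstring(storage).find(f"{{{AC}}}structured-macro")
language, code = read_code_macro(macro)
print(language)  # python
print(code)
```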

StatusMacro

class confluence_content_parser.nodes.StatusMacro(*, is_block_level: bool = False, title: str | None = None, colour: str | None = None)

Bases: Node

A status macro element with title and color.

title: str | None
colour: str | None
to_text() → str

Generate text representation of status.

Other Macro Nodes

ExpandMacro

class confluence_content_parser.nodes.ExpandMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, title: str | None = None, breakout_width: str | None = None)

Bases: ContainerElement

An expand/collapse macro element.

title: str | None
breakout_width: str | None
is_block_level: bool
to_text() → str

Generate text representation of expand macro.

DetailsMacro

class confluence_content_parser.nodes.DetailsMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

A details macro element for collapsible content sections.

is_block_level: bool
to_text() → str

Generate text representation of details macro.

TocMacro

class confluence_content_parser.nodes.TocMacro(*, is_block_level: bool = True, style: str | None = None)

Bases: Node

A table of contents macro element.

style: str | None
is_block_level: bool
to_text() → str

Generate text representation of table of contents.

JiraMacro

class confluence_content_parser.nodes.JiraMacro(*, is_block_level: bool = False, key: str | None = None, server_id: str | None = None, server: str | None = None)

Bases: Node

A JIRA issue macro element.

key: str | None
server_id: str | None
server: str | None
to_text() → str

Generate text representation of JIRA issue.

IncludeMacro

class confluence_content_parser.nodes.IncludeMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, space_key: str | None = None, content_title: str | None = None, version_at_save: str | None = None)

Bases: ContainerElement

An include macro element for including other pages.

space_key: str | None
content_title: str | None
version_at_save: str | None
is_block_level: bool
to_text() → str

Generate text representation of include macro.

TasksReportMacro

class confluence_content_parser.nodes.TasksReportMacro(*, is_block_level: bool = True, spaces: str | None = None, is_missing_required_parameters: bool = False)

Bases: Node

A tasks report macro element.

spaces: str | None
is_missing_required_parameters: bool
is_block_level: bool
to_text() → str

Generate text representation of tasks report.

ExcerptIncludeMacro

class confluence_content_parser.nodes.ExcerptIncludeMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, space_key: str | None = None, content_title: str | None = None, posting_day: str | None = None, version_at_save: str | None = None)

Bases: ContainerElement

An excerpt include macro element.

space_key: str | None
content_title: str | None
posting_day: str | None
version_at_save: str | None
is_block_level: bool
to_text() → str

Generate text representation of excerpt include.

AttachmentsMacro

class confluence_content_parser.nodes.AttachmentsMacro(*, is_block_level: bool = True)

Bases: Node

An attachments macro element for listing page attachments.

is_block_level: bool
to_text() → str

Generate text representation of attachments macro.

ViewPdfMacro

class confluence_content_parser.nodes.ViewPdfMacro(*, is_block_level: bool = True, filename: str | None = None, version_at_save: str | None = None)

Bases: Node

A view PDF macro element.

filename: str | None
version_at_save: str | None
is_block_level: bool
to_text() → str

Generate text representation of PDF viewer.

ViewFileMacro

class confluence_content_parser.nodes.ViewFileMacro(*, is_block_level: bool = True, filename: str | None = None, version_at_save: str | None = None)

Bases: Node

A view file macro element for displaying files inline.

filename: str | None
version_at_save: str | None
is_block_level: bool
to_text() → str

Generate text representation of file viewer.

ProfileMacro

class confluence_content_parser.nodes.ProfileMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, account_id: str | None = None)

Bases: ContainerElement

A profile macro element for displaying user profiles.

account_id: str | None
is_block_level: bool
to_text() → str

Generate text representation of profile macro.

AnchorMacro

class confluence_content_parser.nodes.AnchorMacro(*, is_block_level: bool = False, anchor_name: str | None = None)

Bases: Node

An anchor macro element for creating page anchors.

anchor_name: str | None
to_text() → str

Generate text representation of anchor.

ExcerptMacro

class confluence_content_parser.nodes.ExcerptMacro(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

An excerpt macro element for marking excerptable content.

is_block_level: bool
to_text() → str

Generate text representation of excerpt.

Decision and Task Nodes

DecisionListItemState

class confluence_content_parser.nodes.DecisionListItemState(*values)

Bases: Enum

State of decision list item.

Enumeration of decision list item states.

Values:

  • DECIDED - Decision has been made

  • PENDING - Decision is pending

DecisionList

class confluence_content_parser.nodes.DecisionList(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, local_id: str | None = None)

Bases: ContainerElement

A decision list element containing decision items.

local_id: str | None
is_block_level: bool
to_text() → str

Generate text representation of decision list.

DecisionListItem

class confluence_content_parser.nodes.DecisionListItem(*, is_block_level: bool = True, children: list[Node] = <factory>, styles: dict[str, str] = <factory>, local_id: str | None = None, state: DecisionListItemState | None = None)

Bases: ContainerElement

A decision item element.

local_id: str | None
state: DecisionListItemState | None
is_block_level: bool
to_text() → str

Generate text representation of decision item.

Utility Nodes

Fragment

class confluence_content_parser.nodes.Fragment(*, is_block_level: bool = False, children: list[Node] = <factory>, styles: dict[str, str] = <factory>)

Bases: ContainerElement

Neutral container for multiple top-level nodes (non-rendering).

Emoticon

class confluence_content_parser.nodes.Emoticon(*, is_block_level: bool = False, name: str, emoji_shortname: str | None = None, emoji_id: str | None = None, emoji_fallback: str | None = None)

Bases: Node

An emoticon element.

name: str
emoji_shortname: str | None
emoji_id: str | None
emoji_fallback: str | None
to_text() → str

Return the best text representation of the emoticon.
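"Best text representation" implies a preference order across the emoticon fields. The sketch below shows one plausible order using a hypothetical Emo dataclass that mirrors Emoticon's fields: the rendered fallback character first, then the shortname, then the legacy name wrapped in colons. The order and the :name: format are assumptions, not the library's documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Emo:
    """Hypothetical mirror of Emoticon's fields."""
    name: str
    emoji_shortname: str | None = None
    emoji_id: str | None = None
    emoji_fallback: str | None = None


def best_text(e: Emo) -> str:
    # Assumed preference: fallback glyph, then shortname, then ":name:".
    if e.emoji_fallback:
        return e.emoji_fallback
    if e.emoji_shortname:
        return e.emoji_shortname
    return f":{e.name}:"


full = Emo("smile", ":grinning:", "1f600", "\U0001F600")
print(best_text(full), best_text(Emo("smile", emoji_shortname=":grinning:")), best_text(Emo("tick")))
```

emoji_id is carried but unused here; it identifies the emoji rather than providing displayable text.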

Time

class confluence_content_parser.nodes.Time(*, is_block_level: bool = False, datetime: str | None = None)

Bases: Node

A time element with datetime.

datetime: str | None
to_text() → str

Generate text representation of time.

PlaceholderElement

class confluence_content_parser.nodes.PlaceholderElement(*, is_block_level: bool = False, text: str)

Bases: Node

A placeholder element for content hints.

text: str
to_text() → str

Generate text representation of placeholder.

ResourceIdentifierType

class confluence_content_parser.nodes.ResourceIdentifierType(*values)

Bases: Enum

Type of resource identifier.

Enumeration of resource identifier types.

Values:

  • PAGE - Page reference

  • BLOG_POST - Blog post reference

  • ATTACHMENT - Attachment reference

  • URL - URL reference

  • SHORTCUT - Shortcut reference

  • USER - User reference

  • SPACE - Space reference

  • CONTENT_ENTITY - Content entity reference

ResourceIdentifier

class confluence_content_parser.nodes.ResourceIdentifier(*, is_block_level: bool = False, type: ResourceIdentifierType, space_key: str | None = None, content_title: str | None = None, content_id: str | None = None, posting_day: str | None = None, filename: str | None = None, value: str | None = None, key: str | None = None, parameter: str | None = None, account_id: str | None = None, local_id: str | None = None, userkey: str | None = None, version_at_save: str | None = None)

Bases: Node

A resource identifier element.

type: ResourceIdentifierType
space_key: str | None
content_title: str | None
content_id: str | None
posting_day: str | None
filename: str | None
value: str | None
key: str | None
parameter: str | None
account_id: str | None
local_id: str | None
userkey: str | None
version_at_save: str | None
to_text() → str

Generate appropriate text representation based on type.
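The ResourceIdentifier fields line up with the attributes of ri:* elements in storage format (for example ri:space-key and ri:content-title become space_key and content_title). The stdlib-only sketch below, with a hypothetical describe helper, derives a type name from the element name and converts each attribute to a snake_case field. The element-name-to-type mapping is an assumption based on the ResourceIdentifierType member names, not the library's parser.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

AC = "http://www.atlassian.com/schema/confluence/4/ac/"  # ConfluenceParser.NS_AC
RI = "http://www.atlassian.com/schema/confluence/4/ri/"  # ConfluenceParser.NS_RI

# Assumed mapping from ri: element names to ResourceIdentifierType member names.
RI_TYPES = {
    "page": "PAGE",
    "blog-post": "BLOG_POST",
    "attachment": "ATTACHMENT",
    "url": "URL",
    "shortcut": "SHORTCUT",
    "user": "USER",
    "space": "SPACE",
    "content-entity": "CONTENT_ENTITY",
}


def describe(ri_element: ET.Element) -> dict[str, str]:
    """Hypothetical helper: type name plus snake_case fields for an ri:* element."""
    local_name = ri_element.tag.split("}", 1)[1]
    info = {"type": RI_TYPES[local_name]}
    for attr, value in ri_element.attrib.items():
        # "{uri}content-title" -> "content_title"
        info[attr.split("}", 1)[-1].replace("-", "_")] = value
    return info


storage = f"""<root xmlns:ac="{AC}" xmlns:ri="{RI}">
<ac:link><ri:page ri:space-key="DEV" ri:content-title="Release notes" /></ac:link>
<ac:image><ri:attachment ri:filename="diagram.png" /></ac:image>
</root>"""

root = ET.fromstring(storage)
found = [describe(el) for el in root.iter() if el.tag.startswith(f"{{{RI}}}")]
print(found)
```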

Usage Examples

For practical usage examples, see the Examples section and the User Guide.