Reviewers: I recommend reviewing commit-by-commit, or just looking at the final version of `partition/docx.py` using View File. This refactor solves a few problems, but mostly it lays the groundwork for refining further aspects such as page-break detection and list-item detection, and for moving python-docx internals upstream to that library so our work doesn't depend on that domain knowledge.
Showing 61 changed files with 1,286 additions and 434 deletions.
@@ -0,0 +1,25 @@
[tool.black]
line-length = 100

[tool.ruff]
line-length = 100
select = [
    "C4",  # -- flake8-comprehensions --
    "COM",  # -- flake8-commas --
    "E",  # -- pycodestyle errors --
    "F",  # -- pyflakes --
    "I",  # -- isort (imports) --
    "PLR0402",  # -- Name compared with itself like `foo == foo` --
    "PT",  # -- flake8-pytest-style --
    "SIM",  # -- flake8-simplify --
    "UP015",  # -- redundant `open()` mode parameter (like "r" is default) --
    "UP018",  # -- Unnecessary {literal_type} call like `str("abc")`. (rewrite as a literal) --
    "UP032",  # -- Use f-string instead of `.format()` call --
    "UP034",  # -- Avoid extraneous parentheses --
]
ignore = [
    "COM812",  # -- over-aggressively insists on trailing commas where not desirable --
    "PT011",  # -- pytest.raises({exc}) too broad, use match param or more specific exception --
    "PT012",  # -- pytest.raises() block should contain a single simple statement --
    "SIM117",  # -- merge `with` statements for context managers that have same scope --
]

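To make a few of the selected rules concrete, here is a small sketch of code written the way they would prefer, with the flagged form noted in a comment above each line. The variable names are illustrative only and do not come from this commit; the SIM108 rule is picked as one example from the `SIM` family.

```python
# Illustrative values only; none of these names appear in the commit.
name = "docx"

# UP032 would flag: "partition/{}.py".format(name)
path = f"partition/{name}.py"

# UP018 would flag: str("abc")
label = "abc"

# C4 (flake8-comprehensions) would flag: list(n * n for n in range(4))
squares = [n * n for n in range(4)]

# SIM108 (from flake8-simplify) would flag a four-line if/else assignment
# that can be written as a single conditional expression.
kind = "even" if len(squares) % 2 == 0 else "odd"
```

Running `ruff check` against a file containing the commented-out forms is one way to see the corresponding diagnostics, assuming the configuration above is picked up from the project root.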
@@ -0,0 +1,3 @@
from docx.api import Document

__all__ = ["Document"]

@@ -0,0 +1,5 @@
from typing import BinaryIO, Optional, Union

import docx.document

def Document(docx: Optional[Union[str, BinaryIO]] = None) -> docx.document.Document: ...

@@ -0,0 +1,12 @@
from typing import Sequence

from docx.oxml.xmlchemy import BaseOxmlElement
from docx.table import Table
from docx.text.paragraph import Paragraph

class BlockItemContainer:
    _element: BaseOxmlElement
    @property
    def paragraphs(self) -> Sequence[Paragraph]: ...
    @property
    def tables(self) -> Sequence[Table]: ...

@@ -0,0 +1,22 @@
# pyright: reportPrivateUsage=false

from typing import BinaryIO, Optional, Union

from docx.blkcntnr import BlockItemContainer
from docx.oxml.document import CT_Document
from docx.section import Sections
from docx.settings import Settings
from docx.styles.style import _ParagraphStyle
from docx.text.paragraph import Paragraph

class Document(BlockItemContainer):
    def add_paragraph(
        self, text: str = "", style: Optional[Union[_ParagraphStyle, str]] = None
    ) -> Paragraph: ...
    @property
    def element(self) -> CT_Document: ...
    def save(self, path_or_stream: Union[str, BinaryIO]) -> None: ...
    @property
    def sections(self) -> Sections: ...
    @property
    def settings(self) -> Settings: ...

@@ -0,0 +1,11 @@
import enum

class WD_SECTION_START(enum.Enum):
    CONTINUOUS: enum.Enum
    EVEN_PAGE: enum.Enum
    NEW_COLUMN: enum.Enum
    NEW_PAGE: enum.Enum
    ODD_PAGE: enum.Enum

# -- alias --
WD_SECTION = WD_SECTION_START

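The stub above exposes `WD_SECTION` as a second name for the same enum class. A minimal sketch of that alias pattern, using a stand-in enum: `SectionStart`, `Section`, and the integer values are assumptions chosen for illustration, not the library's definition.

```python
import enum


class SectionStart(enum.Enum):
    """Hypothetical stand-in for WD_SECTION_START; values are illustrative."""

    CONTINUOUS = 0
    NEW_COLUMN = 1
    NEW_PAGE = 2
    EVEN_PAGE = 3
    ODD_PAGE = 4


# -- alias: a second module-level name bound to the same class object --
Section = SectionStart

# Members reached through either name are the very same objects.
same_member = Section.NEW_PAGE is SectionStart.NEW_PAGE
```

Because the alias is plain assignment, code that imports either name compares and type-checks identically.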
@@ -0,0 +1,7 @@
# pyright: reportPrivateUsage=false

from typing import Union

from lxml import etree

def parse_xml(xml: Union[str, bytes]) -> etree._Element: ...

@@ -0,0 +1,10 @@
from typing import Iterator

from docx.oxml.xmlchemy import BaseOxmlElement

class CT_Body(BaseOxmlElement):
    def __iter__(self) -> Iterator[BaseOxmlElement]: ...

class CT_Document(BaseOxmlElement):
    @property
    def body(self) -> CT_Body: ...

@@ -0,0 +1,5 @@
from typing import Dict

nsmap: Dict[str, str]

def qn(tag: str) -> str: ...

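The stub only gives the signatures of `nsmap` and `qn`. As a rough sketch of the idea behind them, a namespace-prefixed tag such as `w:p` can be expanded into the Clark notation (`{uri}local`) that lxml uses for element names. `NSMAP` and `qualified_name` below are hypothetical stand-ins written for this example, with a single namespace entry; they are not the library's code.

```python
from typing import Dict

# Assumption: a one-entry stand-in for the library's larger prefix-to-URI map.
NSMAP: Dict[str, str] = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}


def qualified_name(tag: str) -> str:
    """Expand a 'prefix:local' tag into Clark notation, '{uri}local'."""
    prefix, local = tag.split(":")
    return f"{{{NSMAP[prefix]}}}{local}"


paragraph_tag = qualified_name("w:p")
```

An unknown prefix raises `KeyError` in this sketch, which keeps a typo in a tag name from silently producing a wrong element name.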
@@ -0,0 +1,7 @@
from typing import Optional

from docx.oxml.xmlchemy import BaseOxmlElement

class CT_SectPr(BaseOxmlElement):
    @property
    def preceding_sectPr(self) -> Optional[CT_SectPr]: ...

@@ -0,0 +1,3 @@
from docx.oxml.xmlchemy import BaseOxmlElement

class CT_Tbl(BaseOxmlElement): ...

@@ -0,0 +1,3 @@
from docx.oxml.xmlchemy import BaseOxmlElement

class CT_P(BaseOxmlElement): ...

@@ -0,0 +1,3 @@
from docx.oxml.xmlchemy import BaseOxmlElement

class CT_PPr(BaseOxmlElement): ...

@@ -0,0 +1,9 @@
from typing import Optional

from docx.oxml.xmlchemy import BaseOxmlElement

class CT_Br(BaseOxmlElement):
    type: Optional[str]
    clear: Optional[str]

class CT_R(BaseOxmlElement): ...

@@ -0,0 +1,17 @@
from typing import Any, Iterator

from lxml import etree

class BaseOxmlElement(etree.ElementBase):
    def __iter__(self) -> Iterator[BaseOxmlElement]: ...
    @property
    def xml(self) -> str: ...
    def xpath(self, xpath_str: str) -> Any:
        """Return type is typically Sequence[ElementBase], but ...

        lxml.etree.XPath has many possible return types including bool, (a "smart") str,
        float. The return type can also be a list containing ElementBase, comments,
        processing instructions, str, and tuple. So you need to cast the result based on
        the XPath expression you use.
        """
        ...

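The docstring above says callers need to cast the `Any` result according to the XPath expression they used. One possible way to do that with a runtime check rather than a bare `typing.cast` is sketched below. `as_list_of` is a hypothetical helper written for this example, and the sample input is a plain Python list standing in for an XPath result; neither comes from the commit.

```python
from typing import Any, List, Type, TypeVar

T = TypeVar("T")


def as_list_of(result: Any, item_type: Type[T]) -> List[T]:
    """Check that an `Any`-typed XPath result is a list of `item_type` items."""
    if not isinstance(result, list):
        raise TypeError(f"expected a list, got {type(result).__name__}")
    for item in result:
        if not isinstance(item, item_type):
            raise TypeError(
                f"expected {item_type.__name__} items, got {type(item).__name__}"
            )
    return result


# Stand-in for the result of an expression that selects text nodes.
texts = as_list_of(["Heading", "Body"], str)
```

The trade-off is a small runtime cost in exchange for failing loudly when an expression returns a `bool` or `float` where a list of elements was expected; `typing.cast` alone would silence the type checker without verifying anything.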