[BUG] Should fail hard when installing packages that want 2to3 #2769

Closed
1 task done
plumdog opened this issue Sep 6, 2021 · 13 comments · Fixed by #2770


plumdog commented Sep 6, 2021

setuptools version

setuptools==58.0.0

Python version

Python 3.9

OS

Linux

Additional environment information

No response

Description

I tried to install a package that uses 2to3 (specifically https://pypi.org/project/demjson/) and got a message from pip like:

Successfully installed demjson-2.2.3

But the install wasn't really successful: importing the package then raises confusing SyntaxError exceptions.

Expected behavior

I have now worked out that this is because of the change to remove 2to3 from setuptools, which I think is a perfectly reasonable change, and because my setuptools version wasn't pinned. But setuptools is not doing something the package is asking for, and the result is going to be something that does not run (though I guess there are caveats here).

I would have expected to get an error like "This module wants to use 2to3, but this is not supported [link to github issue about removing 2to3 support]". At least, that would certainly have saved me time.
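As a rough illustration of the fail-hard behaviour being asked for, here is a minimal sketch. It is not setuptools code: `Unsupported2to3Error` and `check_use_2to3` are hypothetical names, and the sketch assumes the package's `setup()` keywords are available as a plain dict.

```python
class Unsupported2to3Error(Exception):
    """Hypothetical error for packages that request the removed 2to3 step."""


def check_use_2to3(setup_kwargs):
    """Raise at build time if the setup() keywords ask for 2to3 conversion.

    Illustrative only: `setup_kwargs` stands in for the keyword arguments
    a package passes to setup(); this is not how setuptools is structured.
    """
    if setup_kwargs.get("use_2to3"):
        raise Unsupported2to3Error(
            "This module wants to use 2to3, but this is not supported"
        )


# A package that does not request 2to3 passes the check silently.
check_use_2to3({"name": "example"})

# A package that requests 2to3 fails at build time instead of at import time.
try:
    check_use_2to3({"name": "demjson", "use_2to3": True})
except Unsupported2to3Error as exc:
    print(f"build aborted: {exc}")
```

The point of failing here is that the error appears during installation, next to the package that caused it, rather than later as a SyntaxError in site-packages.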

How to Reproduce

Install demjson==2.2.3 with setuptools==58.0.0, then attempt to import it. The import fails with a SyntaxError.

Output

$ pip install -U setuptools==58.0.0 && pip install demjson==2.2.3 && python -c 'import demjson'
Collecting setuptools==58.0.0
  Using cached setuptools-58.0.0-py3-none-any.whl (816 kB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 56.0.0
    Uninstalling setuptools-56.0.0:
      Successfully uninstalled setuptools-56.0.0
Successfully installed setuptools-58.0.0
Collecting demjson==2.2.3
  Using cached demjson-2.2.3.tar.gz (131 kB)
Using legacy 'setup.py install' for demjson, since package 'wheel' is not installed.
Installing collected packages: demjson
    Running setup.py install for demjson ... done
Successfully installed demjson-2.2.3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/user/example/venv/lib/python3.9/site-packages/demjson.py", line 645
    class json_int( (1L).__class__ ):    # Have to specify base this way to satisfy 2to3
                      ^
SyntaxError: invalid syntax

Code of Conduct

  • I agree to follow the PSF Code of Conduct
plumdog added the bug and Needs Triage labels Sep 6, 2021
plumdog changed the title from "[BUG] Should Fail hard when installing packages that want 2to3" to "[BUG] Should fail hard when installing packages that want 2to3" Sep 6, 2021

jaraco commented Sep 6, 2021

Good suggestion. I think I agree.


arpheno commented Sep 7, 2021

I think this broke a lot of builds unintentionally.

@ikapelyukhin

> I think this broke a lot of builds unintentionally.

It sure did. 😬


plumdog commented Sep 7, 2021

My intention when raising this issue was to make it so builds fail, rather than getting runtime errors because 2to3 wasn't applied to a dependency that needs it. That is, this behaviour breaks builds intentionally where the setuptools version and the dependencies are incompatible.

webknjaz removed the Needs Triage label Sep 7, 2021

mattclay commented Sep 7, 2021

@plumdog Unfortunately the implementation in #2770 also breaks builds which explicitly disable the use_2to3 feature, which would otherwise install correctly.
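The distinction raised here can be sketched as follows. This is not the code from #2770, which is not shown in this thread; both function names are hypothetical, and the presence-based check is only an assumed example of how a package that sets `use_2to3=False` could end up rejected.

```python
def rejects_by_presence(setup_kwargs):
    """Reject any package that mentions use_2to3 at all (too strict)."""
    return "use_2to3" in setup_kwargs


def rejects_by_value(setup_kwargs):
    """Reject only packages that actually ask for 2to3 conversion."""
    return bool(setup_kwargs.get("use_2to3", False))


# Hypothetical setup() keyword sets for three kinds of package.
cases = {
    "never mentions 2to3": {},
    "explicitly disables 2to3": {"use_2to3": False},
    "requests 2to3": {"use_2to3": True},
}

for label, kwargs in cases.items():
    print(label, rejects_by_presence(kwargs), rejects_by_value(kwargs))
```

Only the value-based check lets a package that explicitly disables the feature keep installing, which is the case described as broken.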


plumdog commented Sep 7, 2021

@mattclay Oh. 😬

langmm added a commit to cropsinsilico/yggdrasil that referenced this issue Sep 9, 2021
…he no longer maintained uses use_2to3 which was deprecated in the most recent release of setuptools; see pypa/setuptools#2769)
ajnelson-nist added a commit to ajnelson-nist/rdflib-jsonld that referenced this issue Sep 10, 2021
The resolution of setuptools 2769 caused any package using `use_2to3` to
fail its build.  This patch removes the flag, in support of outroducing
rdflib-jsonld.

References:
* pypa/setuptools#2769
* RDFLib/rdflib#1405

Reported-by: Ralf Grubenmann
Signed-off-by: Alex Nelson
ajnelson-nist added a commit to ajnelson-nist/rdflib-jsonld that referenced this issue Sep 10, 2021
The resolution of setuptools 2769 caused any package using `use_2to3` to
fail its build.  This patch removes the flag, in support of outroducing
rdflib-jsonld.

The test suite is showing other follow-on patches will be necessary to
fix matters 2to3 had been quietly fixing along the way.

References:
* pypa/setuptools#2769
* RDFLib/rdflib#1405

Reported-by: Ralf Grubenmann
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this issue Sep 16, 2021
= 4.10.0 (20210907)

* This is the first release of Beautiful Soup to only support Python
  3. I dropped Python 2 support to maintain support for newer versions
  (58 and up) of setuptools. See:
  pypa/setuptools#2769 [bug=1942919]

* The behavior of methods like .get_text() and .strings now differs
  depending on the type of tag. The change is visible with HTML tags
  like <script>, <style>, and <template>. Starting in 4.9.0, methods
  like get_text() returned no results on such tags, because the
  contents of those tags are not considered 'text' within the document
  as a whole.

  But a user who calls script.get_text() is working from a different
  definition of 'text' than a user who calls div.get_text()--otherwise
  there would be no need to call script.get_text() at all. In 4.10.0,
  the contents of (e.g.) a <script> tag are considered 'text' during a
  get_text() call on the tag itself, but not considered 'text' during
  a get_text() call on the tag's parent.

  Because of this change, calling get_text() on each child of a tag
  may now return a different result than calling get_text() on the tag
  itself. That's because different tags now have different
  understandings of what counts as 'text'. [bug=1906226] [bug=1868861]

* NavigableString and its subclasses now implement the get_text()
  method, as well as the properties .strings and
  .stripped_strings. These methods will either return the string
  itself, or nothing, so the only reason to use this is when iterating
  over a list of mixed Tag and NavigableString objects. [bug=1904309]

* The 'html5' formatter now treats attributes whose values are the
  empty string as HTML boolean attributes. Previously (and in other
  formatters), an attribute value must be set as None to be treated as
  a boolean attribute. In a future release, I plan to also give this
  behavior to the 'html' formatter. Patch by Isaac Muse. [bug=1915424]

* The 'replace_with()' method now takes a variable number of arguments,
  and can be used to replace a single element with a sequence of elements.
  Patch by Bill Chandos. [rev=605]

* Corrected output when the namespace prefix associated with a
  namespaced attribute is the empty string, as opposed to
  None. [bug=1915583]

* Performance improvement when processing tags that speeds up overall
  tree construction by 2%. Patch by Morotti. [bug=1899358]

* Corrected the use of special string container classes in cases when a
  single tag may contain strings with different containers; such as
  the <template> tag, which may contain both TemplateString objects
  and Comment objects. [bug=1913406]

* The html.parser tree builder can now handle named entities
  found in the HTML5 spec in much the same way that the html5lib
  tree builder does. Note that the lxml HTML tree builder doesn't handle
  named entities this way. [bug=1924908]

* Added a second way to specify encodings to UnicodeDammit and
  EncodingDetector, based on the order of precedence defined in the
  HTML5 spec, starting at:
  https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding

  Encodings in 'known_definite_encodings' are tried first, then
  byte-order-mark sniffing is run, then encodings in 'user_encodings'
  are tried. The old argument, 'override_encodings', is now a
  deprecated alias for 'known_definite_encodings'.

  This changes the default behavior of the html.parser and lxml tree
  builders, in a way that may slightly improve encoding
  detection but will probably have no effect. [bug=1889014]

* Improve the warning issued when a directory name (as opposed to
  the name of a regular file) is passed as markup into the BeautifulSoup
  constructor. [bug=1913628]

Signed-off-by: Zang Ruochen
Signed-off-by: Khem Raj
halstead pushed a commit to openembedded/meta-openembedded that referenced this issue Sep 17, 2021
kpawar-sap added a commit to sapcc/requirements that referenced this issue Sep 22, 2021
setuptools 58.0.0 and above fails to install due to issue
pypa/setuptools#2769
kpawar-sap added a commit to sapcc/loci that referenced this issue Sep 22, 2021
honzajavorek added a commit to juniorguru/junior.guru that referenced this issue Sep 23, 2021
alexeagle pushed a commit to alexeagle/rules_python that referenced this issue Sep 30, 2021
With the recent change in pypa/setuptools#2769, some wheels started to
fail build immediately with an unpinned setuptools in isolation mode.

Signed-off-by: Thulio Ferraz Assis
@ajnelson-nist

@jaraco, I apologize; I just found I'd forgotten to reply.

Your reasoning makes sense, and I won't push the issue further. What I hadn't appreciated fully is that 2to3-related breakage was supposed to be introduced with this major release. Thank you for the kind discussion.

f0rmiga added a commit to alexeagle/rules_python that referenced this issue Oct 27, 2021
alexeagle pushed a commit to alexeagle/rules_python that referenced this issue Nov 10, 2021
f0rmiga added a commit to alexeagle/rules_python that referenced this issue Nov 16, 2021
alexeagle added a commit to bazelbuild/rules_python that referenced this issue Nov 17, 2021
Gazelle plugin

* Add new example to --deleted_packages

* Update examples/build_file_generation/BUILD

Co-authored-by: Jonathon Belotti

* fix: gazelle:exclude on coarse-grained

Signed-off-by: Thulio Ferraz Assis

* fix: comment on Kinds()

Co-authored-by: Jonathon Belotti

* owner: f0rmiga

Signed-off-by: Thulio Ferraz Assis

* fix: build and setuptools pinned versions

With the recent change in pypa/setuptools#2769, some wheels started to
fail build immediately with an unpinned setuptools in isolation mode.

Signed-off-by: Thulio Ferraz Assis

* refactor: use local_repository in examples

Signed-off-by: Thulio Ferraz Assis

* bump: examples Bazel version

Signed-off-by: Thulio Ferraz Assis

* fix: add missing .gitignore to example

Signed-off-by: Thulio Ferraz Assis

* refactor: remove python_coarse_grained_generation

Also add the python_generation_mode directive.

Signed-off-by: Thulio Ferraz Assis

* fix: gazelle spam from org_golang_x_tools

Signed-off-by: Thulio Ferraz Assis

* revert: example .bazelversion

Signed-off-by: Thulio Ferraz Assis

* fix: simplify std_modules.py

Signed-off-by: Thulio Ferraz Assis

* feat: test py_library without __init__.py

Signed-off-by: Thulio Ferraz Assis

* feat: manifest generation tag manual

Signed-off-by: Thulio Ferraz Assis

* fix: check std modules last

Performing the check last is more correct and yields better performance,
noticeable on large repositories.

Signed-off-by: Thulio Ferraz Assis

Co-authored-by: Alex Eagle
Co-authored-by: Jonathon Belotti
GuillemCalidae added a commit to calidae/authentication-dummy that referenced this issue Feb 3, 2022
New setuptools versions deliberately stop building packages that use 2to3. See pypa/setuptools#2769

This option should have been removed in commit 3aeb6ff
n1ngu pushed a commit to calidae/authentication-dummy that referenced this issue Feb 4, 2022
Carthaca pushed a commit to sapcc/requirements that referenced this issue Apr 12, 2022
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Nov 30, 2022
4.11.1 (20220408)

This release was done to ensure that the unit tests are packaged along
with the released source. There are no functionality changes in this
release, but there are a few other packaging changes:

* The Japanese and Korean translations of the documentation are included.
* The changelog is now packaged as CHANGELOG, and the license file is
  packaged as LICENSE. NEWS.txt and COPYING.txt are still present,
  but may be removed in the future.
* TODO.txt is no longer packaged, since a TODO is not relevant for released
  code.

4.11.0 (20220407)

* Ported unit tests to use pytest.

* Added special string classes, RubyParenthesisString and RubyTextString,
  to make it possible to treat ruby text specially in get_text() calls.


* It's now possible to customize the way output is indented by
  providing a value for the 'indent' argument to the Formatter
  constructor. The 'indent' argument works very similarly to the
  argument of the same name in the Python standard library's
  json.dump() function.

* If the charset-normalizer Python module
  (https://pypi.org/project/charset-normalizer/) is installed, Beautiful
  Soup will use it to detect the character sets of incoming documents.
  This is also the module used by newer versions of the Requests library.
  For the sake of backwards compatibility, chardet and cchardet both take
  precedence if installed.

* Added a workaround for an lxml bug
  (https://bugs.launchpad.net/lxml/+bug/1948551) that causes
  problems when parsing a Unicode string beginning with BYTE ORDER MARK.


* Issue a warning when an HTML parser is used to parse a document that
  looks like XML but not XHTML.

* Do a better job of keeping track of namespaces as an XML document is
  parsed, so that CSS selectors that use namespaces will do the right
  thing more often.

* Some time ago, the misleadingly named "text" argument to find-type
  methods was renamed to the more accurate "string." But this supposed
  "renaming" didn't make it into important places like the method
  signatures or the docstrings. That's corrected in this
  version. "text" still works, but will give a DeprecationWarning.


* Fixed a crash when pickling a BeautifulSoup object that has no
  tree builder.

* Fixed a crash when overriding multi_valued_attributes and using the
  html5lib parser.

* Standardized the wording of the MarkupResemblesLocatorWarning
  warnings to omit untrusted input and make the warnings less
  judgmental about what you ought to be doing.

* Removed support for the iconv_codec library, which doesn't seem
  to exist anymore and was never put up on PyPI. (The closest
  replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
  it--it's also quite old.)

4.10.0 (20210907)

* This is the first release of Beautiful Soup to only support Python
  3. I dropped Python 2 support to maintain support for newer versions
  (58 and up) of setuptools. See:
  pypa/setuptools#2769
nikhil pushed a commit to mskcc/rdflib-jsonld that referenced this issue Jan 5, 2024
The resolution of setuptools 2769 caused any package using `use_2to3` to
fail its build.  This patch removes the flag, in support of outroducing
rdflib-jsonld.

The test suite is showing other follow-on patches will be necessary to
fix matters 2to3 had been quietly fixing along the way.  However, this
first patch does restore a working call to `pip install .` with
up-to-date setuptools.

Unit test results: This causes only the same five tests as were
previously failing to fail.

setuptools versions tested:
* 41.2.0
* 58.0.4

References:
* pypa/setuptools#2769
* RDFLib/rdflib#1405

Reported-by: Ralf Grubenmann
Signed-off-by: Alex Nelson
daregit pushed a commit to daregit/yocto-combined that referenced this issue May 22, 2024