
Remove uses of Charset name parsing #85795

Merged: rjernst merged 2 commits into elastic:master from utf8_charset on Apr 12, 2022


@rjernst (Member) commented Apr 11, 2022

There are many places in Elasticsearch that must decode a stream of
bytes into characters. Most of the time the data is expected to be UTF-8
encoded, and we hardcode that charset name. However, JDK methods that
take a String charset name declare the checked
UnsupportedEncodingException, which every caller must catch. Yet most of
these APIs also have a variant of the same method that takes a Charset
instance, for which we can pass StandardCharsets.UTF_8. This commit
converts most places that pass a string charset name to use a Charset
instance instead.
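A minimal sketch of the pattern the description refers to, using only standard JDK APIs (the class and method names are illustrative and not taken from the actual Elasticsearch diff): the String-name overload of the `String` constructor forces a try/catch for an exception that cannot occur for "UTF-8", while the `Charset` overload needs none. `InputStreamReader` follows the same pattern.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetExample {

    // Before: the charset-name overload declares the checked
    // UnsupportedEncodingException, even though "UTF-8" is always supported.
    static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Unreachable in practice: every JVM is required to support UTF-8.
            throw new AssertionError(e);
        }
    }

    // After: the Charset overload declares no checked exception.
    static String decodeWithCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Readers work the same way: InputStreamReader has an
    // (InputStream, Charset) constructor alongside the (InputStream, String) one.
    static String firstLine(byte[] bytes) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains a two-byte UTF-8 character.
        byte[] utf8 = "h\u00e9llo\nworld".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeByName(utf8).equals(decodeWithCharset(utf8))); // prints true
        System.out.println(firstLine(utf8));
    }
}
```

Both decode paths yield the same string; the only difference is that the `Charset` version removes the unreachable catch block from every call site.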

@rjernst rjernst added :Core/Infra/Core Core issues without another label >refactoring v8.3.0 labels Apr 11, 2022
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Apr 11, 2022
@elasticmachine (Collaborator) commented

Pinging @elastic/es-core-infra (Team:Core/Infra)

@pugnascotia (Contributor) left a comment


LGTM. Is there any scope for adding some forbidden APIs rules to avoid more instances being added?

@rjernst (Member, Author) commented Apr 12, 2022

Unfortunately this doesn't quite get rid of all of the uses, because some older APIs simply don't have methods taking a Charset. I will separately investigate whether we can forbid the String-name variants of the APIs I can find that do have a Charset overload.
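One way such candidates could be found, sketched under assumptions: the hypothetical `CharsetOverloadFinder` below uses reflection to list the public constructors and methods of a class that have a sibling overload with a `Charset` in place of exactly one `String` parameter. It is a heuristic (a String parameter is not always a charset name) and is not how the Elasticsearch forbidden-APIs rules are actually produced.

```java
import java.io.InputStreamReader;
import java.lang.reflect.Constructor;
import java.lang.reflect.Executable;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CharsetOverloadFinder {

    // True if 'byCharset' has the same parameter list as 'byName' except that
    // exactly one String parameter has been replaced by a Charset parameter.
    static boolean isCharsetVariant(Class<?>[] byName, Class<?>[] byCharset) {
        if (byName.length != byCharset.length) {
            return false;
        }
        int swapped = 0;
        for (int i = 0; i < byName.length; i++) {
            if (byName[i] == byCharset[i]) {
                continue;
            }
            if (byName[i] == String.class && byCharset[i] == Charset.class) {
                swapped++;
            } else {
                return false;
            }
        }
        return swapped == 1;
    }

    // Returns the String-name overloads of 'type' (sorted) that have a
    // Charset-taking sibling, i.e. candidates a forbidden-APIs rule could target.
    static Set<String> findCandidates(Class<?> type) {
        List<Executable> all = new ArrayList<>();
        all.addAll(Arrays.asList(type.getConstructors()));
        all.addAll(Arrays.asList(type.getMethods()));
        Set<String> result = new TreeSet<>();
        for (Executable byName : all) {
            for (Executable byCharset : all) {
                boolean bothConstructors = byName instanceof Constructor && byCharset instanceof Constructor;
                boolean sameMethod = !(byName instanceof Constructor) && !(byCharset instanceof Constructor)
                        && byName.getName().equals(byCharset.getName());
                if ((bothConstructors || sameMethod)
                        && isCharsetVariant(byName.getParameterTypes(), byCharset.getParameterTypes())) {
                    result.add(byName.toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        findCandidates(String.class).forEach(System.out::println);
        findCandidates(InputStreamReader.class).forEach(System.out::println);
    }
}
```

Run against `String` and `InputStreamReader`, it should report overloads such as `getBytes(java.lang.String)` and the `InputStreamReader(java.io.InputStream,java.lang.String)` constructor. The output would still need manual review before anything is added to a forbidden-APIs signatures file, and APIs with no Charset overload at all simply won't appear.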

@rjernst rjernst merged commit f0d0c37 into elastic:master Apr 12, 2022
@rjernst rjernst deleted the utf8_charset branch April 12, 2022 19:05
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Apr 13, 2022
* upstream/master: (40 commits)
  Fix BuildTests serialization (elastic#85827)
  Use urgent priority for node shutdown cluster state update (elastic#85838)
  Remove Task classes from HLRC (elastic#85835)
  Remove unused migration classes (elastic#85834)
  Remove uses of Charset name parsing (elastic#85795)
  Remove legacy versioned logic for DefaultSystemMemoryInfo (elastic#85761)
  Expose proxy settings for GCS repositories (elastic#85785)
  Remove SLM classes from HLRC (elastic#85825)
  TSDB: fix the time_series in order collect priority (elastic#85526)
  Remove ILM classes from HLRC (elastic#85822)
  FastVectorHighlighter should use ValueFetchers to load source data (elastic#85815)
  Iteratively execute synchronous ingest processors (elastic#84250)
  Remove TransformClient from HLRC  (elastic#85787)
  Mute XPackRestIT deprecation/10_basic/Test Deprecations (elastic#85807)
  Unmute Lintian packaging test (elastic#85778)
  Add a highlighter unit test base class (elastic#85719)
  Remove NIO Transport Plugin (elastic#82085)
  [TEST] Remove token methods from HLRC SecurityClient (elastic#85515)
  [Test] Use thread-safe hashSet for result collection (elastic#85653)
  [TEST] Mute BuildTests.testSerialization (elastic#85801)
  ...

# Conflicts:
#	server/src/test/java/org/elasticsearch/search/aggregations/timeseries/TimeSeriesIndexSearcherTests.java
@ChrisHegarty (Contributor) left a comment


LGTM.

weizijun added a commit to weizijun/elasticsearch that referenced this pull request Apr 13, 2022
…n/elasticsearch into datastream-reuse-pipeline-source

* 'datastream-reuse-pipeline-source' of github.com:weizijun/elasticsearch: (28 commits)
  Add JDK 19 to Java testing matrix
  [ML] add nlp config update serialization tests (elastic#85867)
  [ML] A text categorization aggregation that works like ML categorization (elastic#80867)
  [ML] Fix serialisation of text embedding updates (elastic#85863)
  TSDB: fix wrong initial value of tsidOrd in TimeSeriesIndexSearcher (elastic#85713)
  Enforce external id uniqueness during DesiredNode construction (elastic#84227)
  Fix Intellij integration (elastic#85866)
  Upgrade Azure SDK to version 12.14.4 (elastic#83884)
  [discovery-gce] Fix initialisation of transport in FIPS mode (elastic#85817)
  Remove unnecessary docs/changelog/85534.yaml
  Prevent ThreadContext header leak when sending response (elastic#68649)
  Add support for impact_areas to health impacts  (elastic#85830)
  Reduce port range re-use in tests (elastic#85777)
  Fix TranslogTests#testStats (elastic#85828)
  Remove hppc from cat allocation api (elastic#85842)
  Fix BuildTests serialization (elastic#85827)
  Use urgent priority for node shutdown cluster state update (elastic#85838)
  Remove Task classes from HLRC (elastic#85835)
  Remove unused migration classes (elastic#85834)
  Remove uses of Charset name parsing (elastic#85795)
  ...
Labels
:Core/Infra/Core Core issues without another label >refactoring Team:Core/Infra Meta label for core/infra team v8.3.0

4 participants