Remove uses of Charset name parsing #85795
There are many places in Elasticsearch that must decode a stream of bytes into characters. Most of the time the data is expected to be UTF-8 encoded, and we hardcode that charset name. However, JDK methods that take a String charset name require catching UnsupportedEncodingException. Most of these APIs also have variants that take a known Charset instance, for which we can use StandardCharsets.UTF_8. This commit converts most instances of passing string charset names to use a Charset instance.
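As a rough illustration of the kind of conversion described above (not code taken from this PR), the sketch below contrasts the two `String` constructors. The class and method names are hypothetical and exist only for this example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the Elasticsearch code base.
public class CharsetDecodeExample {

    // Before: the charset is looked up by name, so the checked
    // UnsupportedEncodingException must be handled even though
    // UTF-8 is guaranteed to exist on every JVM.
    static String decodeByName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // After: passing a Charset instance needs no lookup and no try/catch.
    static String decodeByCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Both variants decode to the same string.
        System.out.println(decodeByName(bytes).equals(decodeByCharset(bytes)));
    }
}
```

The two methods behave identically for valid input; the second simply removes an exception path that can never be taken.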
Pinging @elastic/es-core-infra (Team:Core/Infra)
LGTM. Is there any scope for adding some forbidden APIs rules to avoid more instances being added?
Unfortunately this doesn't quite get rid of all of the uses, because some older APIs simply don't have methods taking a Charset. I will separately investigate whether we can forbid the ones I can find that do take a Charset.
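For APIs that only accept a charset name, one possible approach is to derive the name from the `Charset` constant rather than retyping the string literal. The sketch below uses `java.util.logging.StreamHandler.setEncoding(String)` as an assumed example of such an API; the comment above does not say which APIs are affected, and the class and method names here are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

// Hypothetical example class; not part of the Elasticsearch code base.
public class NameOnlyApiExample {

    // setEncoding only takes a String name, so the checked exception
    // cannot be avoided here. Using StandardCharsets.UTF_8.name()
    // at least keeps the name tied to a known Charset constant.
    static StreamHandler utf8Handler(ByteArrayOutputStream out) {
        StreamHandler handler = new StreamHandler(out, new SimpleFormatter());
        try {
            handler.setEncoding(StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        StreamHandler handler = utf8Handler(new ByteArrayOutputStream());
        System.out.println(handler.getEncoding());
    }
}
```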
* upstream/master: (40 commits) Fix BuildTests serialization (elastic#85827) Use urgent priority for node shutdown cluster state update (elastic#85838) Remove Task classes from HLRC (elastic#85835) Remove unused migration classes (elastic#85834) Remove uses of Charset name parsing (elastic#85795) Remove legacy versioned logic for DefaultSystemMemoryInfo (elastic#85761) Expose proxy settings for GCS repositories (elastic#85785) Remove SLM classes from HLRC (elastic#85825) TSDB: fix the time_series in order collect priority (elastic#85526) Remove ILM classes from HLRC (elastic#85822) FastVectorHighlighter should use ValueFetchers to load source data (elastic#85815) Iteratively execute synchronous ingest processors (elastic#84250) Remove TransformClient from HLRC (elastic#85787) Mute XPackRestIT deprecation/10_basic/Test Deprecations (elastic#85807) Unmute Lintian packaging test (elastic#85778) Add a highlighter unit test base class (elastic#85719) Remove NIO Transport Plugin (elastic#82085) [TEST] Remove token methods from HLRC SecurityClient (elastic#85515) [Test] Use thread-safe hashSet for result collection (elastic#85653) [TEST] Mute BuildTests.testSerialization (elastic#85801) ... # Conflicts: # server/src/test/java/org/elasticsearch/search/aggregations/timeseries/TimeSeriesIndexSearcherTests.java
LGTM.
…n/elasticsearch into datastream-reuse-pipeline-source * 'datastream-reuse-pipeline-source' of github.com:weizijun/elasticsearch: (28 commits) Add JDK 19 to Java testing matrix [ML] add nlp config update serialization tests (elastic#85867) [ML] A text categorization aggregation that works like ML categorization (elastic#80867) [ML] Fix serialisation of text embedding updates (elastic#85863) TSDB: fix wrong initial value of tsidOrd in TimeSeriesIndexSearcher (elastic#85713) Enforce external id uniqueness during DesiredNode construction (elastic#84227) Fix Intellij integration (elastic#85866) Upgrade Azure SDK to version 12.14.4 (elastic#83884) [discovery-gce] Fix initialisation of transport in FIPS mode (elastic#85817) Remove unnecessary docs/changelog/85534.yaml Prevent ThreadContext header leak when sending response (elastic#68649) Add support for impact_areas to health impacts (elastic#85830) Reduce port range re-use in tests (elastic#85777) Fix TranslogTests#testStats (elastic#85828) Remove hppc from cat allocation api (elastic#85842) Fix BuildTests serialization (elastic#85827) Use urgent priority for node shutdown cluster state update (elastic#85838) Remove Task classes from HLRC (elastic#85835) Remove unused migration classes (elastic#85834) Remove uses of Charset name parsing (elastic#85795) ...