Add support for alternative encodings #39

Merged on Jul 20, 2020 (1 commit)

camdencheek
Contributor

Description of Changes

This commit adds support for all encodings supported by the
x/text/encoding package. This can be configured with the new
encoding parameter of the file input.

A limitation of the current implementation is that it does not respect
the BOM in UTF-16, so users will have to explicitly choose utf-16le or
utf-16be.
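To illustrate why the byte order has to be chosen explicitly when the BOM is not consulted, here is a minimal sketch that uses only the Go standard library (`unicode/utf16` and `encoding/binary`). It is a stand-in for illustration, not the code from this PR, which relies on the `x/text/encoding` package. The helper names `decodeUTF16`, `decodeUTF16LE`, and `decodeUTF16BE` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 converts raw UTF-16 bytes to a Go string using the byte order
// named by the caller. It never inspects a BOM: a leading U+FEFF is decoded
// like any other code unit, and a trailing odd byte is ignored.
// (Hypothetical helper for illustration only.)
func decodeUTF16(raw []byte, order binary.ByteOrder) string {
	units := make([]uint16, 0, len(raw)/2)
	for i := 0; i+1 < len(raw); i += 2 {
		units = append(units, order.Uint16(raw[i:i+2]))
	}
	return string(utf16.Decode(units))
}

// Thin wrappers mirroring the two explicit choices a user would make.
func decodeUTF16LE(raw []byte) string { return decodeUTF16(raw, binary.LittleEndian) }
func decodeUTF16BE(raw []byte) string { return decodeUTF16(raw, binary.BigEndian) }

func main() {
	// "hi\n" encoded as UTF-16LE, with no BOM.
	line := []byte{0x68, 0x00, 0x69, 0x00, 0x0a, 0x00}

	fmt.Printf("%q\n", decodeUTF16LE(line)) // "hi\n"

	// The same bytes read with the wrong byte order decode without error,
	// but to unrelated characters, so the mistake is silent.
	fmt.Printf("%q\n", decodeUTF16BE(line))
}
```

Because a wrong byte order still decodes without error, a misconfigured encoding shows up as garbled log entries rather than a failure.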

Please check that the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Add a changelog entry (for non-trivial bug fixes / features)
  • CI passes

@camdencheek force-pushed the multibyte-character branch from a2db099 to 6b0fccc on July 20, 2020 15:14
@djaglowski
Member

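| Log Files | Logs / Second | CPU Average (%) | Memory Average (MB) |
|-----------|---------------|-----------------|---------------------|
| 1         | 1000          | 2.672506        | 27.110857           |
| 1         | 5000          | 6.413973        | 34.212418           |
| 1         | 10000         | 10.276125       | 44.73141            |
| 1         | 50000         | 35.895184       | 131.51764           |
| 1         | 100000        | 64.899315       | 205.3909            |
| 10        | 100           | 4.0863647       | 29.544855           |
| 10        | 500           | 7.3966722       | 34.274784           |
| 10        | 1000          | 13.086602       | 44.794044           |
| 10        | 5000          | 44.208054       | 104.66851           |
| 10        | 10000         | 76.07151        | 178.36543           |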

codecov bot commented Jul 20, 2020

Codecov Report

Merging #39 into master will decrease coverage by 0.14%.
The diff coverage is 69.23%.

@@            Coverage Diff             @@
##           master      #39      +/-   ##
==========================================
- Coverage   75.44%   75.30%   -0.14%     
==========================================
  Files          61       61              
  Lines        3718     3753      +35     
==========================================
+ Hits         2805     2826      +21     
- Misses        690      697       +7     
- Partials      223      230       +7     
| Impacted Files | Coverage Δ |
|----------------|------------|
| plugin/builtin/input/file/file.go | 78.34% <60.00%> (-1.52%) ⬇️ |
| plugin/builtin/input/file/read_to_end.go | 68.42% <73.33%> (-1.73%) ⬇️ |
| plugin/builtin/input/file/line_splitter.go | 92.59% <76.47%> (-2.86%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 89dbd2e...6b0fccc.

@djaglowski left a comment

This looks great. Nice tests.

@@ -4,6 +4,9 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

Good idea to capture it this way

@camdencheek merged commit 3133779 into master on Jul 20, 2020
@camdencheek deleted the multibyte-character branch on July 20, 2020 16:34