Add conversion from Reader to InputStream #5376

Closed
Marcono1234 opened this issue Jan 2, 2021 · 9 comments

Comments

@Marcono1234
Contributor

It appears there are use cases where one wants to convert a java.io.Reader to a java.io.InputStream using a given charset, see this Stack Overflow question or this Baeldung article.

Guava already has this functionality implemented as ReaderInputStream, but only exposes it for CharSource (therefore not for Reader), see also #642 (comment).

Would it make sense to expose it also for Reader, e.g. as com.google.common.io.CharStreams.asInputStream(Reader, Charset)?

Considerations:

  • Users might want to specify buffer size?
  • Users very likely want to specify behavior on unencodable chars, currently ReaderInputStream uses CodingErrorAction.REPLACE for them
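
To make the second consideration concrete, here is a minimal JDK-only sketch of how the two error actions differ. The class and method names (`EncoderActionDemo`, `encode`) are made up for illustration and are not part of Guava; the sketch only assumes the standard `CharsetEncoder` behavior.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Guava.
final class EncoderActionDemo {

  /** Encodes text, applying the same action to malformed and unmappable input. */
  static byte[] encode(String text, Charset charset, CodingErrorAction action)
      throws CharacterCodingException {
    ByteBuffer out = charset.newEncoder()
        .onMalformedInput(action)
        .onUnmappableCharacter(action)
        .encode(CharBuffer.wrap(text));
    byte[] bytes = new byte[out.remaining()];
    out.get(bytes);
    return bytes;
  }

  public static void main(String[] args) throws CharacterCodingException {
    String text = "caf\u00e9"; // the last character has no US-ASCII encoding

    // REPLACE substitutes the encoder's replacement bytes and carries on.
    byte[] replaced = encode(text, StandardCharsets.US_ASCII, CodingErrorAction.REPLACE);
    System.out.println(new String(replaced, StandardCharsets.US_ASCII)); // prints "caf?"

    // REPORT surfaces the problem as an exception instead.
    try {
      encode(text, StandardCharsets.US_ASCII, CodingErrorAction.REPORT);
    } catch (CharacterCodingException e) {
      System.out.println("unencodable input reported: " + e);
    }
  }
}
```

With REPLACE the caller gets bytes back and no signal that data was lost, which is why a way to choose the action seems worth considering.
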
@pt657407064

  public static InputStream asInputStream(Reader reader, Charset charset, int bufferSize,
                                          CodingErrorAction newAction) {
    return new ReaderInputStream(reader,
            charset.newEncoder()
                    .onMalformedInput(newAction)
                    .onUnmappableCharacter(newAction),
            bufferSize);
  }

Maybe something like this, @Marcono1234?

@Marcono1234
Contributor Author

Yes, that looks reasonable, but it would probably be good to also have an asInputStream(Reader, Charset) method for simple use cases.
Though I am not sure whether this feature is something the maintainers actually want to add.

@Marcono1234
Contributor Author

Marcono1234 commented Oct 30, 2021

@ManishOffi, I am not a member of this project, so I cannot assign you, sorry.

@ghost ghost deleted a comment Nov 8, 2021
@louiscb

louiscb commented Feb 28, 2022

Hi @Marcono1234 & @pt657407064, is this issue still relevant? I'd like to contribute and try to fix it with a PR.

@eamonnmcmanus
Member

It is possible for anyone to do this with the existing API, like this:

public static InputStream asInputStream(Reader reader, Charset charset) throws IOException {
  return new CharSource() {
    @Override public Reader openStream() {
      return reader;
    }
  }.asByteSource(charset).openStream();
}

Not very elegant, but the com.google.common.io API does put the emphasis on reusable sources and sinks rather than one-shot streams like Reader and InputStream.

@louiscb

louiscb commented Mar 1, 2022

Great, thanks @eamonnmcmanus. Can I be assigned to the issue?

@eamonnmcmanus
Member

The Guava team has talked this over and we don't think there's enough evidence of demand for such a method. The com.google.common.io package has existed at Google for 15 years and we seem to have got by just fine all that time without the method. People who do need it for some reason can do what I suggested above.

@kevinb9n
Contributor

kevinb9n commented Mar 2, 2022

I think also that the functionality is slightly unusual. Normally encoding is part of writing, and decoding associated with reading. (Where "encode/decode" really mean "transcode from/to utf-16".) It's enough that I would at least inquire into how the user got into that situation.
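
For reference, the usual directions look roughly like the JDK-only sketch below: bytes are produced when writing through an `OutputStreamWriter` and consumed when reading through an `InputStreamReader`. The class and method names (`UsualDirections`, `encodeOnWrite`, `decodeOnRead`) are invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Guava.
final class UsualDirections {

  /** Encoding happens on the writing side: chars go in, bytes come out. */
  static byte[] encodeOnWrite(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
      writer.write(text);
    }
    return bytes.toByteArray();
  }

  /** Decoding happens on the reading side: bytes go in, chars come out. */
  static String decodeOnRead(byte[] bytes) throws IOException {
    StringBuilder text = new StringBuilder();
    try (Reader reader =
        new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
      char[] buffer = new char[1024];
      int n;
      while ((n = reader.read(buffer)) != -1) {
        text.append(buffer, 0, n);
      }
    }
    return text.toString();
  }
}
```

A Reader-to-InputStream adapter runs the encoder on the reading side instead, which is the unusual part.
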

@bjmi

bjmi commented Mar 27, 2023

@eamonnmcmanus Unfortunately, CharSource.AsByteSource.openStream() creates a ReaderInputStream that silently replaces malformed or unmappable characters and thus hides conversion errors. This can lead to corrupt data, which is especially bad if it goes undetected for a long time.

.onMalformedInput(CodingErrorAction.REPLACE)
.onUnmappableCharacter(CodingErrorAction.REPLACE),

It would be great if there were an overload of CharSource.asByteSource(...) that takes a CharsetEncoder. Alternatively, ReaderInputStream could be made public so that a CharsetEncoder can be passed to its constructor:

ReaderInputStream(Reader reader, CharsetEncoder encoder, int bufferSize) {

@kevinb9n [...] It's enough that I would at least inquire into how the user got into that situation.

The documentation of ReaderInputStream from Apache Commons explains it quite well. A third-party API requires an InputStream, but the actual data is character-based, such as a String or a Reader, so all that is needed is an adapter.

Note that while there are use cases where there is no alternative to using this class, very often the need to use this class is an indication of a flaw in the design of the code. This class is typically used in situations where an existing API only accepts an InputStream, but where the most natural way to produce the data is as a character stream, i.e. by providing a Reader instance. [...]

Guava should provide such an adapter too.
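
Until then, an adapter that takes a caller-supplied encoder can be sketched with the JDK alone. The following is only an illustration under my own assumptions, not Guava's ReaderInputStream: the class name `StrictReaderInputStream` and the buffer sizing are made up, and the error behavior is whatever the encoder passed in is configured to do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.Objects;

// Hypothetical adapter, not Guava's class: encodes chars from a Reader on demand.
final class StrictReaderInputStream extends InputStream {
  private final Reader reader;
  private final CharsetEncoder encoder;
  private final CharBuffer chars; // pending chars, kept in "drain" (flipped) mode
  private final ByteBuffer bytes; // encoded bytes, kept in "drain" (flipped) mode
  private boolean endOfInput;     // the reader has reported end of stream
  private boolean flushed;        // the encoder has been fully flushed

  StrictReaderInputStream(Reader reader, CharsetEncoder encoder, int bufferSize) {
    if (bufferSize < 2) {
      throw new IllegalArgumentException("bufferSize must hold a surrogate pair");
    }
    this.reader = Objects.requireNonNull(reader);
    this.encoder = encoder.reset();
    this.chars = CharBuffer.allocate(bufferSize);
    this.chars.flip();
    int byteCapacity = (int) Math.ceil(bufferSize * (double) encoder.maxBytesPerChar());
    this.bytes = ByteBuffer.allocate(Math.max(16, byteCapacity));
    this.bytes.flip();
  }

  @Override
  public int read() throws IOException {
    while (!bytes.hasRemaining()) {
      if (!fill()) {
        return -1;
      }
    }
    return bytes.get() & 0xFF;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    Objects.checkFromIndexSize(off, len, b.length);
    if (len == 0) {
      return 0;
    }
    while (!bytes.hasRemaining()) {
      if (!fill()) {
        return -1;
      }
    }
    int n = Math.min(len, bytes.remaining());
    bytes.get(b, off, n);
    return n;
  }

  /** Refills the byte buffer; returns false once everything has been delivered. */
  private boolean fill() throws IOException {
    if (flushed) {
      return false;
    }
    bytes.clear();
    if (!endOfInput) {
      chars.compact(); // keep leftovers such as a lone high surrogate
      if (reader.read(chars) == -1) {
        endOfInput = true;
      }
      chars.flip();
    }
    CoderResult result = encoder.encode(chars, bytes, endOfInput);
    if (result.isError()) {
      result.throwException(); // only reached if the encoder is set to REPORT
    }
    if (endOfInput && result.isUnderflow() && encoder.flush(bytes).isUnderflow()) {
      flushed = true;
    }
    bytes.flip();
    return true;
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }
}
```

Passing an encoder configured with CodingErrorAction.REPORT makes read() throw a CharacterCodingException (an IOException) on unencodable input instead of silently substituting replacement bytes.
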
