It has been over a year since this article was last updated.
Recently I’ve been working on a project involving IO streams; here are some notes.
How does a stream get transferred?
Basically, streams can be copied via a method called transferTo, which allocates an 8k memory chunk where bytes accumulate and are written out in one go.
```java
// transferTo, IOUtils.copyLarge, and BufferedInputStream are roughly the same
```
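For reference, here is a minimal sketch of what such a copy loop looks like. It mirrors the general shape of `InputStream.transferTo` (Java 9+), not its exact OpenJDK source:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoop {
    private static final int DEFAULT_BUFFER_SIZE = 8192; // the 8k chunk

    // Read into a fixed 8k chunk and write it out, repeating until EOF.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        long transferred = 0;
        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
        int read;
        while ((read = in.read(buffer, 0, DEFAULT_BUFFER_SIZE)) >= 0) {
            out.write(buffer, 0, read);
            transferred += read;
        }
        return transferred;
    }
}
```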
In most cases, a single 8k chunk is enough when handling a continuous data stream, such as calling an API with JSON, downloading a blob, or forwarding a socket. For example, JFrog Artifactory, a production-proven binary storage solution, uses FileUtils.copyInputStreamToFile internally.
Should I use a buffer for performance?
We might add a redundant buffer via BufferedInputStream for “performance” in transmission. But a misused buffer often degrades performance instead. Here is an example.
```java
InputStream fis = Files.newInputStream(Paths.get("/512MB.dmg"));
InputStream in = new BufferedInputStream(fis, 8192 + 1); // misaligned buffer size
```
The unaligned chunk hits the fseek and fstat system calls on every loop iteration, causing a performance loss in transmission. Even if the buffer size is fixed to a multiple of 8k, the inefficiency cannot be eliminated, due to the overhead of the memcpy call in each loop.
Next, let’s profile some popular open source frameworks with the following code.
```java
// only the Okio example is shown here
```
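The original profiling snippets are elided here. As a stdlib-only stand-in, below is a rough sketch of how such a comparison might be timed; the file path, the `8192 + 1` wrapper, and the harness structure are illustrative assumptions, not the author’s exact benchmark:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyBenchmark {
    // Copy `in` to `out` in 8k chunks and return the elapsed nanoseconds.
    public static long timeCopy(InputStream in, OutputStream out) throws IOException {
        long start = System.nanoTime();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) >= 0) {
            out.write(chunk, 0, read);
        }
        return System.nanoTime() - start;
    }

    // Compare a bare stream against a BufferedInputStream-wrapped one.
    public static void main(String[] args) throws IOException {
        Path file = Path.of("/512MB.dmg"); // hypothetical test file
        try (InputStream bare = Files.newInputStream(file);
             OutputStream sink = OutputStream.nullOutputStream()) {
            System.out.println("no buffer:   " + timeCopy(bare, sink) + " ns");
        }
        try (InputStream wrapped = new BufferedInputStream(Files.newInputStream(file), 8192 + 1);
             OutputStream sink = OutputStream.nullOutputStream()) {
            System.out.println("8k+1 buffer: " + timeCopy(wrapped, sink) + " ns");
        }
    }
}
```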
Here are the results. The percentages below indicate how much of the execution time ChannelInputStream takes.
- No buffer, 96%
- BufferedInputStream, 8k buffer, 94.6% (actually no cache is used)
- BufferedInputStream, 8k + 1 buffer, 59.38% (wastes on fseek, fstat, memcpy)
- BufferedInputStream, 16k and more buffer, 85.86% (wastes on memcpy)
- Okio, 8k buffer, 84% (wastes on segment maintenance)
As seen above, no buffer is the best for simple and continuous transmission.
It is important to profile your use case before introducing a framework.
Searching StackOverflow, the only use case for BufferedInputStream is writing codecs and parsers that read byte by byte. But anyone with the skill to write a parser can also build a more sophisticated buffering mechanism than a simple array.
Here are some real use cases in famous projects:
- OkHttp, a popular HTTP client written in Java, uses Okio internally.
- Grasscutter, a game server, uses KcpChannel and io.netty.buffer internally.
- opentelemetry-java, a Java metrics client, uses Protobuf and manages flushing and buffering by itself.
- Apache MINA, a Java SSH implementation, implements pointers and arrays by itself.
Appendix
Timeout
Timeout can be implemented by:
- checking throwIfReached in each while loop (this is what Okio does)
- Guava’s SimpleTimeLimiter, which is backed by Future.get()
- a synchronized code fragment with Object.wait()
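The Future.get() approach can be sketched with plain java.util.concurrent; the helper name and timeout value here are illustrative:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CopyWithTimeout {
    // Run `task` on a worker thread and give up after `timeoutMillis`,
    // interrupting the worker so a blocking (interruptible) read can abort.
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the copying thread
            throw e;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Note that interruption only helps if the underlying read is interruptible; a plain blocking socket read may also need a socket-level timeout.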
Zero-Copy
- If you want to reduce memcpy and context-switch overhead, low-level techniques such as Java NIO or mmap will be required.
- If you need zero-copy at the CPU level, RDMA (e.g. RoCE Ethernet cards) will be required for network offload.
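The Java NIO route can be sketched with FileChannel.transferTo, which on Linux may be served by sendfile(2) so the bytes never pass through user space. A minimal sketch with hypothetical file paths:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopy {
    // Copy src to dst entirely inside the kernel where the OS supports it.
    public static long copyFile(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may transfer fewer bytes than requested, so loop.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }
}
```

Usage would be e.g. `ZeroCopy.copyFile(Path.of("/512MB.dmg"), Path.of("/tmp/copy.dmg"))`.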