Memory diagnosis for Java applications on Kubernetes

2023-02-08 / modified at 2024-07-17 / 1k words / 6 mins

This post introduces how to analyse native and JVM memory in containers.

Prerequisites

The following terms should be understood before starting the diagnosis.

You should know what RSS means in Linux.
You should know the JVM memory layout specification.
RssAnon: mapped anonymous pages, or RSS; it doesn't contain any file cache.
RssFile: file-backed cache pages, or Cache; the JVM takes roughly 15MB of it.
CGroup memory footprint: accounts for RssAnon and RssFile (when used frequently).
Swap: swapping is disabled by default; we use RAM only, for performance.

How memory is managed

Memory footprint in JVM

The JVM can be divided into its specification and its implementations. The open-source implementations for OpenJDK are HotSpot (originally from Sun, now Oracle) and OpenJ9 (originally from IBM). HotSpot-based JVMs are the most common choice, and a free build can be downloaded from Eclipse Temurin.

Keep in mind that every memory area listed above can throw OutOfMemoryError when it runs short. Also note that, at the specification level, there is no concept of young-generation GC or JVM options yet.

https://docs.oracle.com/javase/specs/jvms/se17/html/index.html

Memory footprint in a container

To see the RAM limit inside a container

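# kubectl exec <podname> -n <ns_name> -it -- sh
# these are cgroup v1 paths; cgroup v2 exposes /sys/fs/cgroup/memory.max instead
# current usage in bytes, for comparison with the limits below
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
# hard limit in MiB; the container is OOM-killed when it is exceeded
echo "$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)/1024/1024" | bc
# soft limit in MiB; not used in Kubernetes but used in Nomad
echo "$(cat /sys/fs/cgroup/memory/memory.soft_limit_in_bytes)/1024/1024" | bc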
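The paths above exist only on cgroup v1 hosts. As a minimal sketch, assuming the cgroup filesystem is mounted at /sys/fs/cgroup, the hypothetical helper below reads the hard limit on either cgroup version. The optional directory argument is an assumption added so the function can be pointed at a test directory.

```shell
# Print the container memory limit in MiB, or "unlimited".
# $1 is the cgroup mount point (defaults to /sys/fs/cgroup).
cgroup_mem_limit_mib() {
  root="${1:-/sys/fs/cgroup}"
  if [ -r "$root/memory.max" ]; then
    # cgroup v2: a byte count, or the literal string "max"
    raw=$(cat "$root/memory.max")
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1: a byte count; "no limit" shows up as a very large number
    raw=$(cat "$root/memory/memory.limit_in_bytes")
  else
    echo "no cgroup memory limit file under $root" >&2
    return 1
  fi
  case "$raw" in
    max) echo "unlimited" ;;
    *)   echo $(( raw / 1024 / 1024 )) ;;
  esac
}
```

Inside a pod, calling cgroup_mem_limit_mib without arguments should print the value configured under limits.memory, expressed in MiB.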

To see current RAM usage

# an approximate ("fuzz") value
echo "$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)/1024/1024" | bc
# exact memory usage calculated by cgroup
echo "$(grep '^rss ' /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')/1024/1024" | bc
# exact RAM usage from /proc
grep '^Rss' /proc/1/status
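To compare RssAnon and RssFile side by side, the /proc output can be summarised in MiB. This is a small sketch, assuming the JVM runs as PID 1 in the container; the function name is hypothetical.

```shell
# Print RssAnon, RssFile and their sum in MiB for one process.
# $1 is a status file; defaults to /proc/1/status (PID 1 assumed to be the JVM).
rss_breakdown_mib() {
  awk '
    /^RssAnon:/ { anon = $2 }   # values in /proc/<pid>/status are in kB
    /^RssFile:/ { file = $2 }
    END {
      printf "RssAnon=%dMiB RssFile=%dMiB total=%dMiB\n",
             anon / 1024, file / 1024, (anon + file) / 1024
    }' "${1:-/proc/1/status}"
}
```

RssShmem is deliberately left out of the sum, since the post only discusses the anonymous and file-backed parts.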

Reading up on cgroups gives a better understanding of these numbers. Java Native Memory Tracking (NMT) can be used together with the cgroup values.
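NMT is enabled by starting the JVM with -XX:NativeMemoryTracking=summary and queried with jcmd <pid> VM.native_memory summary. As a sketch, the hypothetical helper below extracts the committed total from that summary so it can be compared with the cgroup rss value. It assumes the default KB scale of the Total line.

```shell
# Read an NMT summary on stdin and print the committed total in MiB.
# Assumes the default KB scale, i.e. a line such as:
#   Total: reserved=1470626KB, committed=170826KB
nmt_committed_mib() {
  sed -n 's/^Total: .*committed=\([0-9][0-9]*\)KB.*/\1/p' | head -n 1 |
    awk '{ printf "%d\n", $1 / 1024 }'
}
```

Typical use inside the container would be: jcmd 1 VM.native_memory summary | nmt_committed_mib. A large gap between this number and RssAnon hints at memory allocated outside the JVM's own accounting.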

Memory footprint in an orchestrator

Applications running on Kubernetes can't make full use of the node's memory.

On an 8GB VM, we have to reserve about 1GB for the OS and Kubernetes. Click here for more.

Build a java based docker image

Choose the base image

eclipse-temurin, formerly known as AdoptOpenJDK, is the base image most widely used among developers. Here are some examples:

Spring Boot documentation for beginners.

Here are some production-proven open-source images:

SonarQube, properly configured with a non-root user, GPG signature checks and chmod.
Jenkins, with a multi-stage build.
PlantUML, for WAR applications.

Reduce JRE size by 100MB

A stock JRE-based Docker image is suitable for development use only. In production, we can use multi-stage builds to reduce the JRE size by about 100MB.

Configuration in orchestrator

Configure JVM heap Max RAM Percentage through CGroup limit

Java 11+ can derive the memory it may use directly from the hard limit in cgroups.

## show what the JVM reads from the cgroup files,
## mainly /sys/fs/cgroup/memory/memory.limit_in_bytes
java -Xlog:os+container=trace -version
## RAM (MiB) as seen by the JVM: the default MaxHeapSize is 1/4 of it
echo $(java -XX:+PrintFlagsFinal -version 2>&1 | grep -E 'MaxHeapSize' | grep 'product' | awk '{print $4}')/1024/1024*4 | bc

Since the JVM knows the limit, the -Xmx parameter is no longer required; use -XX:MaxRAMPercentage instead. For example, you can check the resulting heap size with the following command:

java -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=70 -version \
  | grep -E "MaxRAMPercentage | MaxHeapSize"

To change the default permanently, pass it as an environment variable in the pod spec or the Nomad HCL.

JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=70"

The percentage needs an accurate estimate that depends on your workload. Allocating 90% of the memory to the heap is not better than the default value. In my tests, allocating more than 70% of RAM to the heap made the process more likely to be OOM-killed by the kernel, which makes it impossible for the developer to dump an hprof file.
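To see why a high percentage is risky, it helps to look at what is left outside the heap for metaspace, thread stacks, code cache and direct buffers. The arithmetic can be sketched as below; the function names are hypothetical, and the real JVM rounds the heap to its own alignment, so treat the results as estimates.

```shell
# Approximate max heap (MiB) for a container limit (MiB) and a MaxRAMPercentage.
heap_for_limit_mib() {
  echo $(( $1 * $2 / 100 ))
}

# Memory (MiB) left for everything that is not Java heap.
non_heap_headroom_mib() {
  echo $(( $1 - $1 * $2 / 100 ))
}
```

With a 520MiB limit, 70% leaves 156MiB outside the heap, while 90% leaves only 52MiB, which native memory can exhaust before the JVM ever throws an OutOfMemoryError.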

Configure RSS in orchestrator

To configure a Nomad job, set the memory attributes for oversubscription.

# values are in mebibytes (MiB)
resources {
  # the equivalent of memory.soft_limit_in_bytes/2^20
  memory = 400
  # the equivalent of memory.limit_in_bytes/2^20
  memory_max = 520
}

To configure a pod in Kubernetes, set the resources attribute.

# Mi means Mebibyte
# controlled by memory.limit_in_bytes
resources:
  requests:
    memory: 400Mi
  limits:
    memory: 520Mi

For example, for an application with 3GB of heap memory reserved, we might consider the following:

Heap size: 3GB, with MaxRAMPercentage=70~75%
- for the typical workload: 2.4GB
- extra for requests in peak periods: 0.6GB
Hard limit (limits in Kubernetes): 3GB/0.75=4GB
Soft limit (requests in Kubernetes): 2.4GB/0.75=3.2GB
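The sizing above runs the percentage backwards: start from the heap you need and divide by MaxRAMPercentage. A minimal sketch of that arithmetic, using whole megabytes to stay within shell integer math (the function name is hypothetical):

```shell
# Container memory (MB) needed so that a heap of $1 MB is $2 percent of it.
limit_for_heap_mb() {
  echo $(( $1 * 100 / $2 ))
}

# Reproduce the worked example: 3000MB peak heap, 2400MB typical heap, 75%.
sizing_example() {
  echo "limits.memory:   $(limit_for_heap_mb 3000 75)MB"
  echo "requests.memory: $(limit_for_heap_mb 2400 75)MB"
}
```

Calling sizing_example prints the 4GB hard limit and 3.2GB soft limit from the example, expressed in MB.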

To use a calculator, click here

The soft limit is not designed to absorb peak requests; use Horizontal Pod Autoscaling for that instead.

Improve observability

Collect performance metrics with APM

Here are some Java agent based solutions that collect real-time JVM memory usage and send it to a centralized database. The agent jar needs to be packed into the image as a sidecar.

Free: Uber JVM Profiler, sending metrics to Kafka. It no longer seems to be maintained, but its small codebase makes it easy to customize.
Free: OpenTelemetry
Free: Prometheus JMX Exporter
Paid: DataDog, with a managed dashboard service.
Paid: an enterprise JVM from Azul, with Azul Mission Control.

The free options require an SRE team to maintain the TSDB and dashboards. For more solutions, check out OpenAPM.

The following metrics are important:

JVM heap eden space committed/used
JVM heap young space committed/used
RssAnon and RssFile (cache): used to estimate the percentage taken by the heap.

OOM Killer in JVM

Pass -XX:+HeapDumpOnOutOfMemoryError to save hprof files in the pod. However, your pod might be destroyed when the health check fails.
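Because the pod's filesystem disappears with the pod, the dump is only useful if it lands on a volume that survives a restart. A hedged sketch, assuming a volume is mounted at /dumps (a hypothetical path, for example an emptyDir or a persistent volume):

```shell
# DUMP_DIR is an assumed mount point; override it to match your volume.
DUMP_DIR="${DUMP_DIR:-/dumps}"

# -XX:HeapDumpPath tells the JVM where to write the hprof file.
JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=70 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${DUMP_DIR}"
export JAVA_TOOL_OPTIONS
```

The heap dump can be as large as the heap itself, so the volume needs at least that much free space.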

Troubleshoot memory issues on kubernetes

First, read the official documentation before diving in.

When your application slows down or crashes with an OutOfMemoryError, the cause is usually one of the following:

Heap exceeds the Linux memory quota.
- Heavy/bursty workloads
- Memory leak
Thread count exceeds the Linux process quota: unable to create new native thread
- Native threads, stacks or descriptors leak, or exceed the quota set in ulimit.
- Misuse of -Xss; we could use -Xss512k to reduce the stack size.
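For the thread-related causes, two quick numbers help: how many threads the process currently has, and how much stack memory that count could reserve. The sketch below assumes the JVM is PID 1; both function names are hypothetical.

```shell
# Print the thread count from a /proc/<pid>/status file (default: PID 1).
thread_count() {
  awk '/^Threads:/ { print $2 }' "${1:-/proc/1/status}"
}

# Upper bound (MiB) of stack memory for $1 threads with a stack size of $2 KiB.
stack_reserve_mib() {
  echo $(( $1 * $2 / 1024 ))
}
```

For example, 200 threads at the usual 1MB default stack size on 64-bit Linux can reserve up to 200MiB, and -Xss512k halves that to 100MiB. Compare the thread count with the output of ulimit -u to see how close the process is to its quota.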
