Diagnosing Java + Oracle Performance with Dumps and Logs

Performance issues in Java applications backed by Oracle DB often present as generic "slowness." In this post, we walk through a practical workflow: capturing thread dumps, analyzing GC logs, taking heap histograms/dumps, correlating with Oracle AWR/ASH, and mapping symptoms to root causes.


0) Quick Triage Flow (what to grab first)

1. High CPU or “app is hung” right now?
   - Capture 3 thread dumps 10 s apart → analyze for deadlocks, hotspots, blocked JDBC.
   - Grab the GC log segment covering the slowdown.
   - If memory pressure is suspected, trigger a heap histogram (fast), then a heap dump (slower; do it off-peak if the heap is huge).

2. Spikes when calling Oracle?
   - Capture thread dumps (look for oracle.jdbc.* frames waiting on socket I/O).
   - From the DB side, pull AWR/ASH for the same time window.

3. Latency every few minutes?
   - Check the GC log for STW pauses, mixed cycles, promotion failures, and humongous allocations (G1).

4. Steady memory climb?
   - Heap dumps (before/after) → MAT dominator tree → paths to GC roots.

---

1) Safe & repeatable data capture

1.1 Identify the JVM

```bash
jcmd                   # lists running JVMs
ps -ef | grep java     # find the target PID
export PID=<java-pid>
```

1.2 Thread dumps (low overhead)

```bash
jcmd $PID Thread.print > tdump_$(date +%F_%T).txt
```

or

```bash
jstack -l $PID > tdump_$(date +%F_%T).txt
```

Best practice: take 3 dumps 10 seconds apart to distinguish transient vs. persistent states.

1.3 Heap snapshot options

Fast first look (histogram):

```bash
jcmd $PID GC.class_histogram > histo_$(date +%F_%T).txt
```

or

```bash
jmap -histo $PID > histo_$(date +%F_%T).txt
```

Full heap dump (can pause briefly, large files):

```bash
jcmd $PID GC.heap_dump filename=/tmp/heap_$(date +%F_%T).hprof
```

or (Java 8):

```bash
jmap -dump:format=b,file=/tmp/heap_$(date +%F_%T).hprof $PID
```

1.4 GC logs

Java 9+:

```bash
-Xlog:gc,gc+age=trace:file=/var/log/app/gc.log:time,uptime,level,tags
```

Java 8:

```bash
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log -XX:+PrintTenuringDistribution
```

1.5 JFR (Java Flight Recorder)

```bash
jcmd $PID JFR.start name=diag settings=profile filename=/tmp/jfr_$(date +%F_%T).jfr duration=120s
```

1.6 App & OS event logs

- App logs (ERROR/WARN around the incident).
- Container/host logs: dmesg, journalctl, sar, vmstat, iostat, top.
- Connection pool debug logging (timeouts, wait times).

1.7 Oracle DB

- AWR report for the incident window.
- ASH report (fine-grained timeline).
- Top SQL by elapsed time / buffer gets / executions.
- Wait event breakdown.
- Optionally SQL Trace (event 10046) + tkprof.

---

2) Analyzing thread dumps

What to look for

- Thread states: RUNNABLE, BLOCKED, WAITING.
- Hot methods at the top of many stacks.
- Locks/monitors:
  - One monitor owner with many blocked threads → lock contention.
  - Deadlock section (jstack prints it automatically).
- JDBC: oracle.jdbc.driver.* frames or socket reads = DB latency.
- Many threads stuck in getConnection = connection pool starvation.

Rapid checklist

- GC threads dominating? (GC Thread, G1 Conc)
- Many threads in synchronized blocks?
- Workers stuck on CompletableFuture.join() / CountDownLatch.await()?
- N+1 query patterns?
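The deadlock detection that jstack performs is also available in-process through the standard java.lang.management API, which is handy for a health endpoint when shell access is slow to arrange. A minimal sketch (the class name is mine; the API calls are standard):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Finds cycles of threads blocked on monitors or ownable synchronizers,
        // the same cycles jstack reports under "Found one Java-level deadlock".
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            System.out.println("No deadlocked threads.");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.printf("%s waits for %s held by %s%n",
                    info.getThreadName(), info.getLockName(), info.getLockOwnerName());
        }
    }
}
```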

---

3) Heap histogram & dump

Histogram first

- Instance counts and shallow sizes of byte[], char[], HashMap, ArrayList, and cache classes (a histogram does not show retained size; take a full dump for that).
- Classloader leaks: many Class / ClassLoader instances.
- Lots of duplicate char[] (consider enabling string deduplication).

Full dump (MAT/VisualVM)

1. Open the .hprof in Eclipse MAT → run Leak Suspects.
2. Dominator Tree: sort by retained heap.
3. Common leak patterns (a sketch follows this list):
   - Unbounded caches.
   - Static maps holding sessions.
   - ThreadLocals never cleared.
   - Large arrays never freed.
   - Classloader leaks after redeploys.
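To make the first three patterns concrete, here is a deliberately leaky sketch (all names hypothetical) of the shapes MAT's dominator tree will surface:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LeakPatterns {
    // Unbounded cache: the static map is reachable from a GC root forever,
    // so every entry it holds shows up under it in the dominator tree.
    static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    // ThreadLocal: the value lives as long as the pooled worker thread does,
    // unless remove() is called after each task.
    static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[1 << 20]);

    static void handleRequest(String key) {
        CACHE.put(key, new byte[4096]); // never evicted -> steady heap growth
        byte[] buf = BUFFER.get();
        try {
            // ... use buf ...
        } finally {
            BUFFER.remove(); // without this line, the 1 MB buffer leaks per worker thread
        }
    }
}
```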

Memory pressure patterns

- High allocation churn → optimize hot paths.
- G1 humongous objects → tune -XX:G1HeapRegionSize (see the sketch below).
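For context on the humongous case: in G1, a single object of at least half a region size is allocated as humongous, straight into contiguous old-gen regions. A toy sketch that provokes this (run with something like -XX:+UseG1GC -XX:G1HeapRegionSize=4m -Xlog:gc* and look for humongous entries in the log):

```java
public class HumongousDemo {
    public static void main(String[] args) {
        // With 4 MB regions, any array of 2 MB or more is humongous
        // and bypasses the young generation entirely.
        byte[][] keep = new byte[64][];
        for (int i = 0; i < keep.length; i++) {
            keep[i] = new byte[3 * 1024 * 1024]; // 3 MB each
        }
        System.out.println("Holding " + keep.length + " humongous arrays");
    }
}
```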

---

4) GC log analysis

Tools

GCViewer, GCeasy.

Key signals

- STW pause duration & frequency.
- Allocation rate.
- Promotion failures.
- CMS concurrent mode failure / G1 mixed GC storms.
- Humongous allocations.

G1 quick mapping

- Old gen rising across mixed GC cycles → leak or large retained set.
- Long Remark/Cleanup phases → huge live set.
- Frequent Full GCs → leak or undersized heap.

---

5) Application stack traces

- Timeout storms (JDBC/socket).
- OutOfMemoryError or StackOverflowError.
- Retry loops, circuit-breaker opens.
- Profile hot methods with JFR.

---

6) Oracle DB angle

DB-side

- AWR Top Events: db file scattered read, log file sync, row lock contention, cursor: mutex S.
- Top SQL by elapsed time and buffer gets.
- ASH timeline: correlate with app pauses.
- Temp/sort usage: spills to disk.
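If pulling a full ASH report is slow to arrange, a quick look at the top wait events for the incident window can come straight over JDBC. A hedged sketch, assuming you have SELECT access to V$ACTIVE_SESSION_HISTORY (Diagnostics Pack licensed) and Oracle 12c+ for FETCH FIRST; the connection details are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class AshTopEvents {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"; // placeholder
        String sql =
                "SELECT event, COUNT(*) AS samples " +
                "FROM v$active_session_history " +
                "WHERE sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE " +
                "  AND event IS NOT NULL " +
                "GROUP BY event ORDER BY samples DESC " +
                "FETCH FIRST 10 ROWS ONLY";
        try (Connection c = DriverManager.getConnection(url, "diag_user", "***");
             PreparedStatement ps = c.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // ASH samples active sessions roughly once per second,
                // so counts approximate seconds spent in each wait event.
                System.out.printf("%-40s %d%n", rs.getString("event"), rs.getLong("samples"));
            }
        }
    }
}
```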

JVM-side

- Threads waiting in JDBC + high DB waits = the DB is the bottleneck.
- Pool metrics: wait time, active vs. max connections.

Fixes:

- Add indexes, avoid full scans.
- Batch ops, fewer commits (see the sketch below).
- Resize the pool only if the DB has headroom.
- Backpressure and timeouts.
- Use SQL tracing + EXPLAIN PLAN.
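On the batching point, the standard JDBC pattern turns N network round trips into one (table and column are hypothetical):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

public class BatchInsert {
    static void insertEvents(DataSource ds, List<String> payloads) throws SQLException {
        String sql = "INSERT INTO events (payload) VALUES (?)"; // hypothetical table
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            c.setAutoCommit(false);   // one commit per batch, not per row
            for (String p : payloads) {
                ps.setString(1, p);
                ps.addBatch();
            }
            ps.executeBatch();        // single round trip for the whole batch
            c.commit();
            c.setAutoCommit(true);    // restore before returning the connection to the pool
        }
    }
}
```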

---

7) Symptom → Cause map

| Symptom | Likely Cause | Confirm | Fix |
| --- | --- | --- | --- |
| Long GC pauses | Heap undersized / high allocation | GC log | Right-size heap, tune G1 |
| RSS growth → OOM | Memory leak | MAT dominator tree | Fix leak, add eviction |
| CPU pegged | Hot loop | JFR hotspot | Optimize/caching |
| BLOCKED threads | Lock contention | Thread dumps | Refactor locking |
| JDBC waits | DB bottleneck | Thread dumps + AWR | Index/tune SQL, size pool |
| Frequent Full GC | Old gen pressure | GC logs | Increase heap, reduce long-lived allocation |
| Metaspace OOM | Classloader leak | Heap dump | Fix redeploy pattern |

---

8) Minimal capture script

```bash
#!/usr/bin/env bash
set -euo pipefail

PID="${1:-$(jcmd | awk '/java/{print $1; exit}')}"
STAMP=$(date +%F_%H%M%S)
OUT="diag_$STAMP"; mkdir -p "$OUT"

echo "[*] Thread dumps..."
for i in 1 2 3; do
  jcmd "$PID" Thread.print > "$OUT/tdump_${i}.txt"
  sleep 10
done

echo "[*] Heap histogram..."
jcmd "$PID" GC.class_histogram > "$OUT/histo.txt"

echo "[*] 120s JFR..."
# Use an absolute path: the target JVM resolves relative paths against its own cwd.
jcmd "$PID" JFR.start name=diag settings=profile filename="$PWD/$OUT/prof.jfr" duration=120s

echo "[*] JVM & OS stats..."
jcmd "$PID" VM.flags > "$OUT/vm_flags.txt"
jcmd "$PID" VM.system_properties > "$OUT/sysprops.txt"
(top -b -n 1; free -m; vmstat 1 5; iostat -xz 1 5) > "$OUT/os_snapshot.txt"

echo "Copy GC + app logs into $OUT/"
echo "Done: $OUT"
```

---

9) Quick tuning

JVM (G1 starter flags)

```bash
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=30
-XX:+ParallelRefProcEnabled
-XX:+UseStringDeduplication
```

Connection pools (HikariCP)

```ini
; guidance ranges -- HikariCP itself takes values in milliseconds
maximumPoolSize   = match DB cores
connectionTimeout = 3s-10s
idleTimeout       = 60s-300s
maxLifetime       = 30m
```
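The same settings expressed in code, for apps that build the pool programmatically (values illustrative):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    static HikariDataSource newPool(String jdbcUrl, String user, String pass) {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl(jdbcUrl);
        cfg.setUsername(user);
        cfg.setPassword(pass);
        cfg.setMaximumPoolSize(16);            // roughly match DB CPU headroom
        cfg.setConnectionTimeout(5_000);       // ms; fail fast instead of queueing forever
        cfg.setIdleTimeout(120_000);           // ms
        cfg.setMaxLifetime(30 * 60 * 1_000L);  // recycle before the network/DB kills the socket
        return new HikariDataSource(cfg);
    }
}
```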

Resilience

- Per-call timeouts (see the sketch below).
- Bulkheads & circuit breakers.
- Backpressure.
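For per-call timeouts, plain JDBC already gives you the cheapest defense: Statement.setQueryTimeout makes the driver cancel a runaway call instead of pinning a worker thread. A minimal sketch (the query is hypothetical):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class QueryWithTimeout {
    static long countOrders(Connection c) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement("SELECT COUNT(*) FROM orders")) {
            ps.setQueryTimeout(5); // seconds; the driver typically throws SQLTimeoutException on expiry
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```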

---

10) Example: 2–4s latency spikes

1. GC log: mixed GC every ~2 min, 1.8 s STW.
2. Histogram: many byte[] from JSON; allocation rate ~1–2 GB/s.
3. JFR hotspot: Jackson serialization.
4. Fix: cache/precompute the JSON → GC pauses < 200 ms, p99 back to normal.

---

11) Example: Throughput tanks at peak

1. Thread dumps: many threads in oracle.jdbc.* socket reads; others waiting on the pool.
2. AWR: db file sequential read, one bad SQL_ID.
3. Plan: full scan due to a missing index.
4. Fix: composite index + gathered stats → pool waits gone, CPU back to normal.

---

12) Tooling shortlist

- JDK: jcmd, jstack, jmap, jstat, JFR/JMC
- Heap: Eclipse MAT, VisualVM
- GC: GCViewer, GCeasy
- Profiling: JFR, async-profiler
- DB: AWR, ASH, ADDM, tkprof, SQL Monitor, EXPLAIN PLAN

---

Wrap-up

Follow the flow: thread dumps → GC logs → heap histograms/dumps → JFR → AWR/ASH. Correlate timestamps ruthlessly. Section 7’s table maps symptoms to root causes and fixes.
