Diagnosing Java + Oracle Performance with Dumps and Logs
Performance issues in Java applications backed by Oracle DB often present as generic "slowness." In this post, we walk through a practical workflow: capturing thread dumps, analyzing GC logs, taking heap histograms/dumps, correlating with Oracle AWR/ASH, and mapping symptoms to root causes.
0) Quick Triage Flow (what to grab first)
1. High CPU or "app is hung" right now?
   - Capture three thread dumps 10 s apart → analyze for deadlocks, hotspots, blocked JDBC.
   - Grab the GC log segment covering the slowdown.
   - If memory pressure is suspected, take a heap histogram (fast), then a heap dump (slower; do it off-peak if the heap is huge).
2. Spikes when calling Oracle?
   - Capture thread dumps (look for oracle.jdbc.* frames waiting on socket I/O).
   - On the DB side, pull AWR/ASH for the same time window.
3. Latency every few minutes?
   - Check the GC log for STW pauses, mixed cycles, promotion failures, and humongous allocations (G1).
4. Steady memory climb?
   - Take heap dumps (before/after) → MAT dominator tree → GC roots.
---
1) Safe & repeatable data capture
1.1 Identify the JVM
```bash
jcmd                      # lists running JVMs
ps -ef | grep java        # find the target PID
export PID=<java-pid>
```
1.2 Thread dumps (low overhead)
```bash
jcmd $PID Thread.print > tdump_$(date +%F_%T).txt
# or
jstack -l $PID > tdump_$(date +%F_%T).txt
```
Best practice: take 3 dumps 10 seconds apart to distinguish transient vs. persistent states.
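If the attach mechanism is blocked (different user, hardened container, `-XX:+DisableAttachMechanism`), SIGQUIT is a fallback: the JVM writes the thread dump to its own stdout, so check the application console/log rather than your terminal.

```bash
# Fallback: SIGQUIT makes the JVM print a thread dump to its own stdout
kill -3 "$PID"
```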
1.3 Heap snapshot options
Fast first look (histogram):
```bash
jcmd $PID GC.class_histogram > histo_$(date +%F_%T).txt
# or
jmap -histo $PID > histo_$(date +%F_%T).txt
```
Full heap dump (briefly pauses the JVM; files can be large):
```bash
jcmd $PID GC.heap_dump filename=/tmp/heap_$(date +%F_%T).hprof
# or (Java 8)
jmap -dump:format=b,file=/tmp/heap_$(date +%F_%T).hprof $PID
```
1.4 GC logs
Java 9+:
```bash
-Xlog:gc,gc+age=trace:file=/var/log/app/gc.log:time,uptime,level,tags
```
Java 8:
```bash
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/app/gc.log -XX:+PrintTenuringDistribution
```
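For long-running JVMs it is worth adding log rotation so the GC log cannot grow without bound. A sketch of the rotation options (file count and size are illustrative):

```bash
# Java 9+: unified logging with rotation (5 files x 20 MB)
-Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags:filecount=5,filesize=20m

# Java 8: equivalent rotation flags (combine with -Xloggc above)
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M
```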
1.5 JFR (Java Flight Recorder)
```bash
jcmd $PID JFR.start name=diag settings=profile filename=/tmp/jfr_$(date +%F_%T).jfr duration=120s
```
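With `duration=120s` the file is written automatically when the recording ends. If you need data sooner, or started an open-ended recording, these companion commands help (paths are illustrative):

```bash
jcmd $PID JFR.check                                          # list active recordings
jcmd $PID JFR.dump name=diag filename=/tmp/jfr_partial.jfr   # snapshot without stopping
jcmd $PID JFR.stop name=diag                                 # stop the recording
```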
1.6 App & OS event logs
- App logs: ERROR/WARN entries around the incident.
- Container/host logs and OS stats: dmesg, journalctl, sar, vmstat, iostat, top.
- Connection pool debug logging (timeouts, wait times).
1.7 Oracle DB
- AWR report for the incident window.
- ASH report (fine-grained timeline).
- Top SQL by elapsed time / buffer gets / executions.
- Wait event breakdown.
- Optionally SQL Trace (event 10046) + tkprof.
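The usual way to produce the AWR/ASH reports is the scripts shipped with the database, run interactively in SQL*Plus on the DB host (they prompt for the snapshot range or time window; AWR/ASH require the Diagnostics Pack license and suitable privileges):

```bash
sqlplus / as sysdba
SQL> @?/rdbms/admin/awrrpt.sql    -- AWR report for a snapshot range
SQL> @?/rdbms/admin/ashrpt.sql    -- ASH report for a finer-grained window
```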
---
2) Analyzing thread dumps
What to look for
- Thread states: RUNNABLE, BLOCKED, WAITING.
- Hot methods at the top of many stacks.
- Locks/monitors: one monitor owner with many blocked threads → lock contention.
- The deadlock section (jstack prints it automatically).
- JDBC: stacks in oracle.jdbc.driver.* or socket reads = DB latency.
- Many threads stuck in getConnection = connection pool starvation.
Rapid checklist
- GC threads dominating? (GC Thread, G1 Conc)
- Many threads queued on synchronized blocks?
- Workers stuck on CompletableFuture.join() / CountDownLatch.await()?
- N+1 query patterns?
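Before opening the dumps in a viewer, a few greps give a quick comparison across the three captures (filenames follow the convention from section 1.2):

```bash
# Thread-state breakdown, aggregated over all dumps
grep -h "java.lang.Thread.State" tdump_*.txt | sort | uniq -c | sort -rn
# How many stacks touch the Oracle driver (rough proxy for DB waits)
grep -c "oracle.jdbc" tdump_*.txt
# Threads queued behind a monitor
grep -c "waiting to lock" tdump_*.txt
```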
---
3) Heap histogram & dump
Histogram first
- Top consumers by instance count and shallow size: byte[], char[], HashMap, ArrayList, cache entries. (A histogram shows shallow sizes only; retained size needs a full dump.)
- Classloader leaks: unusually many Class / ClassLoader instances.
- Lots of duplicate char[] backing Strings → consider enabling string deduplication (G1).
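For a suspected leak, two histograms a few minutes apart are usually enough to see which classes are growing; a rough sketch (the interval is illustrative):

```bash
jcmd $PID GC.class_histogram > histo_before.txt
sleep 300   # let the suspected leak accumulate
jcmd $PID GC.class_histogram > histo_after.txt
# Compare the top entries; growing counts/bytes point at the leaking classes
diff <(head -40 histo_before.txt) <(head -40 histo_after.txt)
```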
Full dump (MAT/VisualVM)
1. Open the .hprof in Eclipse MAT → run the Leak Suspects report.
2. Dominator Tree: sort by retained heap.
3. Common leak patterns:
   - Unbounded caches.
   - Static maps holding sessions.
   - ThreadLocals never cleared.
   - Large arrays never released.
   - Classloader leaks after redeploys.
Memory pressure patterns
- High allocation churn → optimize hot allocation paths.
- G1 humongous objects (≥ half a region) → tune -XX:G1HeapRegionSize.
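For multi-gigabyte dumps it can be easier to parse on a separate machine; MAT ships a headless script for this (the install path and dump name below are illustrative):

```bash
# Parse the dump and generate the Leak Suspects report without opening the GUI
/opt/mat/ParseHeapDump.sh /tmp/heap_<stamp>.hprof org.eclipse.mat.api:suspects
```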
---
4) GC log analysis
Tools
GCViewer, GCeasy.
Key signals
- STW pause duration & frequency.
- Allocation rate.
- Promotion failures.
- CMS concurrent mode failure / G1 mixed-GC storms.
- Humongous allocations.
G1 quick mapping
- Old gen rising across Mixed GC cycles → leak or large retained set.
- Long Remark/Cleanup phases → huge live set.
- Frequent Full GCs → leak or undersized heap.
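Before reaching for a GUI, a quick pass over a unified (Java 9+) log gives a feel for the pause distribution. A rough sketch; the exact line format depends on JDK version and the tags you enabled:

```bash
# Twenty longest stop-the-world pauses (the last field of each pause line is the duration)
grep -E "Pause (Young|Full|Remark|Cleanup)" gc.log | awk '{print $NF}' | sort -n | tail -20
```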
---
5) Application stack traces
- Timeout storms (JDBC/socket).
- OutOfMemoryError or StackOverflowError.
- Retry loops, circuit breakers opening.
- Hot methods: profile with JFR (see the CLI sketch below).
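If JDK Mission Control is not at hand, JDK 12+ ships a `jfr` command that can summarize a recording straight from the shell (the file name is illustrative):

```bash
jfr summary /tmp/jfr_diag.jfr                                          # event counts per type
jfr print --events jdk.ExecutionSample /tmp/jfr_diag.jfr | head -100   # sampled stacks
```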
---
6) Oracle DB angle
DB-side
- AWR Top Events: db file scattered read, log file sync, enq: TX - row lock contention, cursor: mutex S.
- Top SQL by elapsed time and buffer gets.
- ASH timeline: correlate DB activity with app-side pauses.
- Temp/sort usage: spills to disk.
JVM-side
- Threads waiting in JDBC calls + high DB wait events = DB bottleneck.
- Pool metrics: connection wait time, active vs. max.
Fix
- Add indexes, avoid unnecessary full scans.
- Batch operations, fewer commits.
- Resize the pool only if the DB has headroom.
- Backpressure and timeouts on the app side.
- Use SQL tracing + EXPLAIN PLAN (sketched below).
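A minimal sketch of the plan-inspection step in SQL*Plus; the connection string, table, and column names are purely illustrative:

```bash
sqlplus app_user@//dbhost:1521/APPDB
SQL> EXPLAIN PLAN FOR
  2    SELECT * FROM orders WHERE customer_id = :1 AND status = 'OPEN';
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```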
---
7) Symptom → Cause map
| Symptom | Likely Cause | Confirm | Fix |
| ---------------- | ---------------------------- | ------------------ | -------------------------------------- |
| Long GC pauses | Heap undersized / high alloc | GC log | Right-size, tune G1 |
| RSS growth → OOM | Leak | MAT dominator tree | Fix leak, evict |
| CPU pegged | Hot loop | JFR hotspot | Optimize/caching |
| BLOCKED threads | Lock contention | Thread dumps | Refactor locking |
| JDBC waits | DB bottleneck | Thread dumps + AWR | Index/tune, pool |
| Frequent Full GC | Old gen pressure | GC logs | Increase heap, reduce long-lived alloc |
| Metaspace OOM | Classloader leak | Heap dump | Fix redeploy pattern |
---
8) Minimal capture script
```bash
#!/usr/bin/env bash
set -euo pipefail

PID="${1:-$(jcmd | awk '/java/{print $1; exit}')}"
STAMP=$(date +%F_%H%M%S)
OUT="diag_$STAMP"
mkdir -p "$OUT"

echo "[*] Thread dumps..."
for i in 1 2 3; do
  jcmd "$PID" Thread.print > "$OUT/tdump_${i}.txt"
  sleep 10
done

echo "[*] Heap histogram..."
jcmd "$PID" GC.class_histogram > "$OUT/histo.txt"

echo "[*] 120s JFR..."
# Note: a relative filename is resolved against the JVM's working directory,
# not this script's; use an absolute path if they differ.
jcmd "$PID" JFR.start name=diag settings=profile filename="$OUT/prof.jfr" duration=120s

echo "[*] JVM & OS stats..."
jcmd "$PID" VM.flags > "$OUT/vm_flags.txt"
jcmd "$PID" VM.system_properties > "$OUT/sysprops.txt"
(top -b -n 1; free -m; vmstat 1 5; iostat -xz 1 5) > "$OUT/os_snapshot.txt"

echo "Copy GC + app logs into $OUT/"
echo "Done: $OUT"
```
---
9) Quick tuning
JVM (G1 starter flags)
```bash
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:InitiatingHeapOccupancyPercent=30
-XX:+ParallelRefProcEnabled
-XX:+UseStringDeduplication
```
Connection pools (HikariCP)
```ini
maximumPoolSize   = match DB cores (do not oversize)
connectionTimeout = 3s–10s
idleTimeout       = 60s–300s
maxLifetime       = 30m
```
(These ranges are guidance; HikariCP itself expects the timeout settings in milliseconds.)
Resilience
Per-call timeouts. Bulkheads & circuit breakers. Backpressure.
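For the per-call timeout piece on the JDBC path, the Oracle thin driver exposes connect and socket-read timeouts. A sketch, assuming a recent thin driver that honors these as JVM system properties (values are in milliseconds and purely illustrative); if yours does not, set them as connection properties on the DataSource instead:

```bash
JAVA_OPTS="$JAVA_OPTS -Doracle.net.CONNECT_TIMEOUT=5000"   # fail connect attempts fast
JAVA_OPTS="$JAVA_OPTS -Doracle.jdbc.ReadTimeout=30000"     # cap socket reads on slow SQL
```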
---
10) Example: 2–4s latency spikes
1. GC log: Mixed GC every ~2m, 1.8s STW.
2. Histogram: many byte[] from JSON; alloc ~1–2 GB/s.
3. JFR hotspot: Jackson serialization.
4. Fix: cache/precompute JSON → GCs <200ms, p99 normalized.
---
11) Example: Throughput tanks at peak
1. Thread dumps: many threads in oracle.jdbc.* socket reads; others waiting on the pool.
2. AWR: heavy db file sequential read, dominated by one bad SQL_ID.
3. Plan: full table scan due to a missing index.
4. Fix: composite index + fresh statistics (sketched below) → pool waits gone, CPU back to normal.
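A sketch of what that fix looks like in SQL*Plus (schema, table, and column names are hypothetical):

```bash
sqlplus app_user@//dbhost:1521/APPDB
SQL> CREATE INDEX orders_cust_status_ix ON orders (customer_id, status);
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'ORDERS', cascade => TRUE);
```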
---
12) Tooling shortlist
- JDK: jcmd, jstack, jmap, jstat, JFR/JMC
- Heap: Eclipse MAT, VisualVM
- GC: GCViewer, GCeasy
- Profiling: JFR, async-profiler
- DB: AWR, ASH, ADDM, tkprof, SQL Monitor, EXPLAIN PLAN
---
Wrap-up
Follow the flow: thread dumps → GC logs → heap histograms/dumps → JFR → AWR/ASH. Correlate timestamps ruthlessly. Section 7’s table maps symptoms to root causes and fixes.