起因
最近排查一个历史老问题,历史问题就是大家都知道,但是没人愿意处理。
线上节点 16 和 32 G 内存频繁的出现GC,并且由于 GC 还导至两个问题,1.CPU使用率在GC时变高,2.业务出现异常。
这个问题变得无法忍耐了,跟踪程序线程使用,CMS GC 线程CPU 使用在初始化时都 11-99% 之间,严重影响业务处理。
通过 jstat
观察,FullGC 不多,但是 YoungGC 非常频繁,YoungGC 多不影响服务性能吗?也影响。
计算一个服务GC的好坏,不是看 FullGC 多少,而是看整体服务的吞吐量。
1 2 3 4 5 6 7 8 9 10 11 12 13
| S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 68096.0 68096.0 16840.1 0.0 545344.0 115481.1 6658496.0 4835834.6 63280.0 60712.9 7084.0 6615.1 1930261 53981.101 4804 1248.371 55229.471 68096.0 68096.0 0.0 17347.7 545344.0 12072.1 6658496.0 4836565.4 63280.0 60712.9 7084.0 6615.1 1930262 53981.125 4804 1248.371 55229.496 68096.0 68096.0 0.0 12854.1 545344.0 252246.5 6658496.0 4838818.8 63280.0 60712.9 7084.0 6615.1 1930264 53981.168 4804 1248.371 55229.539 68096.0 68096.0 19501.0 0.0 545344.0 187933.5 6658496.0 4838869.4 63280.0 60712.9 7084.0 6615.1 1930265 53981.189 4804 1248.371 55229.559 68096.0 68096.0 19501.0 0.0 545344.0 445775.5 6658496.0 4838869.4 63280.0 60712.9 7084.0 6615.1 1930265 53981.189 4804 1248.371 55229.559 68096.0 68096.0 11858.3 0.0 545344.0 515043.9 6658496.0 4845614.9 63280.0 60712.9 7084.0 6615.1 1930267 53981.239 4804 1248.371 55229.610 68096.0 68096.0 0.0 16469.0 545344.0 361227.7 6658496.0 4846356.5 63280.0 60712.9 7084.0 6615.1 1930268 53981.263 4804 1248.371 55229.634 68096.0 68096.0 0.0 20075.6 545344.0 226522.4 6658496.0 4847100.6 63280.0 60712.9 7084.0 6615.1 1930270 53981.316 4804 1248.371 55229.687 68096.0 68096.0 19591.2 0.0 545344.0 325872.6 6658496.0 4852312.8 63280.0 60712.9 7084.0 6615.1 1930271 53981.344 4804 1248.371 55229.715 68096.0 68096.0 0.0 19282.2 545344.0 206195.9 6658496.0 4853497.6 63280.0 60712.9 7084.0 6615.1 1930272 53981.368 4804 1248.371 55229.739 68096.0 68096.0 0.0 16319.6 545344.0 520305.2 6658496.0 4858184.1 63280.0 60712.9 7084.0 6615.1 1930274 53981.414 4804 1248.371 55229.785 68096.0 68096.0 19240.3 0.0 545344.0 349365.5 6658496.0 4859016.3 63280.0 60712.9 7084.0 6615.1 1930275 53981.445 4804 1248.371 55229.815
|
计算吞吐量
排查
先排除环境原因,从开发环境到线上环境发现情况是相同的,可以初步认定是一个普遍问题。
定位 16G 内存服务情况
先从 16G 内存开始,收集一些基本信息,再观察运行情况。
1 2 3 4 5 6 7 8 9 10 11
| java -Xms9g -Xmx9g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -Xloggc:./gc.log -XX:+PrintGCDateStamps -XX:+CMSParallelRemarkEnabled -XX:ReservedCodeCacheSize=256m -XX:+CMSScavengeBeforeRemark -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=1g -jar Node.jar -c config.conf
|
从这上面能看到关于 CMS 关相的几个参数。
1 2 3
| -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
|
内存使用情况
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
| Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 9663676416 (9216.0MB) NewSize = 697892864 (665.5625MB) MaxNewSize = 697892864 (665.5625MB) OldSize = 8965783552 (8550.4375MB) NewRatio = 2 SurvivorRatio = 8 MetaspaceSize = 21807104 (20.796875MB) CompressedClassSpaceSize = 1073741824 (1024.0MB) MaxMetaspaceSize = 17592186044415 MB G1HeapRegionSize = 0 (0.0MB)
Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 628162560 (599.0625MB) used = 511774816 (488.0664978027344MB) free = 116387744 (110.99600219726562MB) 81.47171585648148% used Eden Space: capacity = 558432256 (532.5625MB) used = 487164440 (464.5962142944336MB) free = 71267816 (67.9662857055664MB) 87.23787617311275% used From Space: capacity = 69730304 (66.5MB) used = 24610376 (23.47028350830078MB) free = 45119928 (43.02971649169922MB) 35.29365941097862% used To Space: capacity = 69730304 (66.5MB) used = 0 (0.0MB) free = 69730304 (66.5MB) 0.0% used concurrent mark-sweep generation: capacity = 8965783552 (8550.4375MB) used = 4737287088 (4517.829025268555MB) free = 4228496464 (4032.6084747314453MB) 52.837401890471156% used
20983 interned Strings occupying 2039024 bytes.
|
这里看出一个问题
1 2 3 4 5 6 7 8
| Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 9663676416 (9216.0MB) NewSize = 697892864 (665.5625MB) MaxNewSize = 697892864 (665.5625MB) OldSize = 8965783552 (8550.4375MB) NewRatio = 2
|
NewRatio=2 但是实际新生代获得的内存空间只有 665.5625MB,这就有点奇怪了,不是说好的老年代:新生代 2 : 1 吗。
这是一个问题,另外从当前的 GC 回收时间推算出,当前吞吐量为: 96%,还没有达到最优的情况。
修复新生代比例大小
添加启动参数 -XX:NewRatio=2,完整启动命令:
1 2 3 4 5 6 7 8 9 10 11 12
| java -Xms9g -Xmx9g -XX:NewRatio=2 -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -Xloggc:./gc.log -XX:+PrintGCDateStamps -XX:+CMSParallelRemarkEnabled -XX:ReservedCodeCacheSize=256m -XX:+CMSScavengeBeforeRemark -XX:+HeapDumpOnOutOfMemoryError -XX:MaxDirectMemorySize=1g -jar Node.jar -c config.conf
|
通过 jstat -gc pid 1000
观察一下 GC 情况
添加 -XX:NewRatio=2
启动参数的机器情况
jmap -heap 查看一下调整后的新生代占比,现在 NewSize=3072.0MB
OldSize=6144.0MB
这时比例生效了。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
| Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 9663676416 (9216.0MB) NewSize = 3221225472 (3072.0MB) MaxNewSize = 3221225472 (3072.0MB) OldSize = 6442450944 (6144.0MB) NewRatio = 2 SurvivorRatio = 8 MetaspaceSize = 21807104 (20.796875MB) CompressedClassSpaceSize = 1073741824 (1024.0MB) MaxMetaspaceSize = 17592186044415 MB G1HeapRegionSize = 0 (0.0MB)
Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 2899116032 (2764.8125MB) used = 2516795744 (2400.2034606933594MB) free = 382320288 (364.6090393066406MB) 86.81252203154317% used Eden Space: capacity = 2577006592 (2457.625MB) used = 2194686304 (2093.0159606933594MB) free = 382320288 (364.6090393066406MB) 85.16417112836008% used From Space: capacity = 322109440 (307.1875MB) used = 322109440 (307.1875MB) free = 0 (0.0MB) 100.0% used To Space: capacity = 322109440 (307.1875MB) used = 0 (0.0MB) free = 322109440 (307.1875MB) 0.0% used concurrent mark-sweep generation: capacity = 6442450944 (6144.0MB) used = 525052336 (500.7289276123047MB) free = 5917398608 (5643.271072387695MB) 8.149884889523188% used
13727 interned Strings occupying 1224528 bytes.
|
1 2 3 4 5 6 7 8 9 10 11 12
| S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 314560.0 314560.0 0.0 85539.9 2516608.0 1975197.4 6291456.0 4613067.7 63012.0 59752.3 7024.0 6520.3 9317 494.820 222 22.566 517.386 314560.0 314560.0 0.0 85539.9 2516608.0 2228635.2 6291456.0 4613067.7 63012.0 59752.3 7024.0 6520.3 9317 494.820 222 22.566 517.386 314560.0 314560.0 87689.1 0.0 2516608.0 681984.2 6291456.0 4621812.0 63012.0 59752.3 7024.0 6520.3 9318 494.875 222 22.566 517.441 314560.0 314560.0 87689.1 0.0 2516608.0 1965292.1 6291456.0 4621812.0 63012.0 59752.3 7024.0 6520.3 9318 494.875 222 22.566 517.441 314560.0 314560.0 0.0 95789.9 2516608.0 145469.7 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494 314560.0 314560.0 0.0 95789.9 2516608.0 1741481.8 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494 314560.0 314560.0 0.0 95789.9 2516608.0 1978876.4 6291456.0 4629480.3 63012.0 59752.3 7024.0 6520.3 9319 494.928 222 22.566 517.494 314560.0 314560.0 74735.8 0.0 2516608.0 306828.0 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549 314560.0 314560.0 74735.8 0.0 2516608.0 1513196.3 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549 314560.0 314560.0 74735.8 0.0 2516608.0 1941220.4 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549 314560.0 314560.0 74735.8 0.0 2516608.0 2415940.5 6291456.0 4638677.5 63012.0 59752.3 7024.0 6520.3 9320 494.982 222 22.566 517.549
|
对比存在问题的情况
1 2 3 4 5 6 7 8 9 10 11 12
| S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 68096.0 68096.0 0.0 16853.2 545344.0 371165.3 8755648.0 5890791.6 62312.0 59746.5 7076.0 6608.5 43879 1312.752 60 5.206 1317.958 68096.0 68096.0 19377.9 0.0 545344.0 420827.3 8755648.0 5891528.9 62312.0 59746.5 7076.0 6608.5 43880 1312.785 60 5.206 1317.991 68096.0 68096.0 0.0 22147.2 545344.0 295435.0 8755648.0 5891577.4 62312.0 59746.5 7076.0 6608.5 43881 1312.816 60 5.206 1318.022 68096.0 68096.0 21257.5 0.0 545344.0 461020.2 8755648.0 5898299.9 62312.0 59746.5 7076.0 6608.5 43882 1312.850 60 5.206 1318.056 68096.0 68096.0 0.0 13895.9 545344.0 419534.8 8755648.0 5901746.3 62312.0 59746.5 7076.0 6608.5 43885 1312.961 60 5.206 1318.167 68096.0 68096.0 0.0 17542.7 545344.0 63902.2 8755648.0 5906856.1 62312.0 59746.5 7076.0 6608.5 43887 1313.028 60 5.206 1318.234 68096.0 68096.0 0.0 17542.7 545344.0 444878.0 8755648.0 5906856.1 62312.0 59746.5 7076.0 6608.5 43887 1313.028 60 5.206 1318.234 68096.0 68096.0 15241.8 0.0 545344.0 403475.0 8755648.0 5909706.0 62312.0 59746.5 7076.0 6608.5 43890 1313.108 60 5.206 1318.314 68096.0 68096.0 24031.2 0.0 545344.0 52019.0 8755648.0 5916571.4 62312.0 59746.5 7076.0 6608.5 43892 1313.174 60 5.206 1318.380 68096.0 68096.0 17381.4 0.0 545344.0 349457.1 8755648.0 5919082.7 62312.0 59746.5 7076.0 6608.5 43894 1313.232 60 5.206 1318.438 68096.0 68096.0 0.0 20887.7 545344.0 442091.1 8755648.0 5919151.3 62312.0 59746.5 7076.0 6608.5 43895 1313.261 60 5.206 1318.467
|
这两组数据能看出什么?
YGC : 新生代垃圾回收次数
YGCT : 新生代垃圾回收时间
FGC : 老年代垃圾回收次数
FGCT : 老年代垃圾回收时间
GCT : 垃圾回收总消耗时间
对比两组GC回收时间数据
- YGC
9320 : 43895
,次数减少 4.7倍
- YGCT
494.982 : 1313.261
, 减少 2.6倍
- GCT
517.549: 1318.467
,减少 2.5 倍
- FGC
222 : 60
却增加了 3.7 倍的次数
这有点出乎意料,如果新生代空间够大之后,对象有足够的新生代空间可以分配对象,保证尽可能多的对象在新生代中被回收,熬不到老年代。