December 23, 2023

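A Nerve-Racking HBase Cluster Crash

This is an account of an earlier HBase cluster incident. Because of a non-standard operation, the cluster could not start. With help from an expert at Tencent Cloud, it took an entire weekend to repair, and it remains a hard experience to forget.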

Version information

cdh-6.0.1 hadoop-3.0 hbase-2.0.0

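Problem

During an idle period I wanted to restart HBase to free some memory and, while at it, change a few YARN settings. After stopping HBase, it would not come back up. The error was that the hbase:namespace table is not online, and the Master stayed stuck in initialization. The detailed error output: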

15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node4,16020,1589648302672, table=real_time_data, region=74cac15d22e99800ad0ace14c9ed74d6
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=8e68891d5826c09974d81ad5d705c3b6
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=75c42d75e2556bf70ff527f2425e8509
15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager - STUCK Region-In-Transition rit=OPENING, location=node3,16020,1596598630022, table=real_time_data, region=2eee04869ac2c35984d4d22e6e9f2f31
15:42:08.264 [master/node3:16000] INFO org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception, tries=15, retries=15, started=128887 ms ago, cancelled=false, msg=org.apache.hadoop.hbase.NotServingRegionException: hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a. is not online on node1,16020,1596957741742
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3273)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3250)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1414)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2446)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
, details=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1558205786137.40562c48c9210c06813adce48773cb6a., hostname=node1,16020,1589648239142, seqNum=55
… …
15:44:58.229 [qtp1792826268-435] WARN org.eclipse.jetty.servlet.ServletHandler - /master-status
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2827) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:271) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:389) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:380) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:81) ~[hbase-server-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:112) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1374) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49) ~[hbase-http-2.0.0.3.0.0.0-1634.jar:2.0.0.3.0.0.0-1634]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) ~[jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) [jetty-security-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
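When a Master log holds many of these warnings, a small script can help summarize which regions are stuck and on which RegionServers. The following is only a rough sketch in Python: the field layout is inferred from the STUCK Region-In-Transition lines in the log above, and the function names (stuck_regions, count_by_server) are hypothetical helpers, not part of HBase. Matching is case-insensitive as a precaution.

```python
import re
from collections import Counter

# Pattern inferred from the AssignmentManager warnings in the log above.
# Assumption: location is "host,port,startcode" and region is a hex hash.
STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<host>[^,]+),(?P<port>\d+),(?P<startcode>\d+),\s*"
    r"table=(?P<table>[^,\s]+),\s*region=(?P<region>[0-9a-f]+)",
    re.IGNORECASE,
)


def stuck_regions(log_text):
    """Return one dict per STUCK Region-In-Transition entry in log_text."""
    return [m.groupdict() for m in STUCK_RIT.finditer(log_text)]


def count_by_server(entries):
    """Count stuck regions per RegionServer host."""
    return Counter(entry["host"] for entry in entries)


if __name__ == "__main__":
    prefix = ("15:41:59.313 [ProcExecTimeout] WARN org.apache.hadoop.hbase."
              "master.assignment.AssignmentManager - ")
    sample = "\n".join([
        prefix + "STUCK Region-In-Transition rit=OPENING, "
        "location=node4,16020,1589648302672, table=real_time_data, "
        "region=74cac15d22e99800ad0ace14c9ed74d6",
        prefix + "STUCK Region-In-Transition rit=OPENING, "
        "location=node3,16020,1596598630022, table=real_time_data, "
        "region=8e68891d5826c09974d81ad5d705c3b6",
    ])
    entries = stuck_regions(sample)
    for host, count in sorted(count_by_server(entries).items()):
        print(host, count)
```

A summary like this only shows where regions are stuck; it does not repair anything.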

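The usual approach

At this point I tried the hbck command to inspect the details and repair the cluster, only to find that in HBase 2.0.0 the repair commands of hbck have been deprecated.

Then, after some searching, I came across hbck2 (official repository: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2). I thought I had found a lifeline, but it turned out:

I gave up. hbck2 does not apply to HBase 2.0.0 ~ 2.0.2 or to HBase 2.1.0. On these versions you can use neither the hbck repair commands nor hbck2, so there is a gap in tooling.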
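To make that gap concrete, here is a minimal Python sketch that checks whether a version string falls into it. The ranges are copied from the statement above (2.0.0 to 2.0.2, plus 2.1.0) and were not verified independently against the hbck2 documentation. The function name in_hbck_gap is hypothetical.

```python
import re


def parse_version(version):
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version: " + version)
    return tuple(int(part) for part in match.groups())


def in_hbck_gap(version):
    """True if, per the ranges stated above, neither hbck repair nor hbck2 applies.

    Assumption: the gap is exactly 2.0.0 ~ 2.0.2 and 2.1.0.
    """
    v = parse_version(version)
    return (2, 0, 0) <= v <= (2, 0, 2) or v == (2, 1, 0)


if __name__ == "__main__":
    # The second entry is the vendor build string seen in the stack trace.
    for candidate in ["2.0.0", "2.0.0.3.0.0.0-1634", "2.0.3", "2.1.0", "2.1.1"]:
        print(candidate, in_hbck_gap(candidate))
```

Only the first three numeric components are compared, so vendor suffixes are ignored.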

Solution

1. Repair the Master so that the cluster can start normally

Because the Master currently cannot initialize
