About Us

Quality as our foundation, customers as our root, the courage to strive, pragmatic innovation


Explanation of the January 2, 2022 Outage in the HK3 Zone

Published: 2022-01-05 16:09:29

Dear Users,

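Between 12:00 and 13:10 on January 2, 2022, the Hong Kong data center operated by our company suffered a serious failure in which all network connectivity was lost. We explain the incident below.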

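1. Our engineers' investigation identified the following cause. We connect to a large number of international carriers, all of them running IPv4/IPv6 dual stack, and we also operate four different BGP combinations. As a result, our routing table is several times the size of a normal BGP routing table, which kept memory usage on the core routing device persistently high. The core routing device was eventually overwhelmed and went down completely at around 12:00 on January 2. After emergency work by our engineers, it was fully restored at around 13:10 on January 2. In total, all network services were interrupted for 70 minutes.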
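The cause described in the first point, where many carrier feeds and dual-stack sessions multiply the number of routes the core router must hold, can be sketched as a simple count. This is a hypothetical model, not our monitoring tooling: the `BgpSession` type, the assumed per-path byte cost, the three-carrier setup, and the prefix counts are all illustrative placeholders rather than measurements from this incident.

```python
from dataclasses import dataclass


@dataclass
class BgpSession:
    """One BGP feed into the core router (hypothetical model)."""
    peer: str      # illustrative label for the upstream carrier
    afi: str       # address family: "ipv4" or "ipv6"
    prefixes: int  # number of prefixes received on this session


def total_paths(sessions: list[BgpSession]) -> int:
    # Each session that delivers a full table contributes its own copy of
    # every route, so the path count grows roughly linearly with the number
    # of feeds, and dual-stack operation adds a second table per carrier.
    return sum(s.prefixes for s in sessions)


def estimated_memory_mib(sessions: list[BgpSession], bytes_per_path: int) -> float:
    # bytes_per_path is an assumed average; the real per-route cost depends
    # on the router platform, route attributes, and software version.
    return total_paths(sessions) * bytes_per_path / (1024 * 1024)


# Hypothetical setup: 3 carriers, each sending a full IPv4 and IPv6 table.
# Prefix counts are round placeholders, not figures from this incident.
sessions = []
for i in range(3):
    sessions.append(BgpSession(f"carrier-{i}", "ipv4", 900_000))
    sessions.append(BgpSession(f"carrier-{i}", "ipv6", 150_000))

single_table = 900_000 + 150_000
print(total_paths(sessions))                  # 3150000
print(total_paths(sessions) // single_table)  # 3
print(f"{estimated_memory_mib(sessions, 256):.1f} MiB")
```

The point of the sketch is the scaling, not the absolute numbers: every extra full-table feed adds roughly one more copy of the table to the router's memory.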

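2. We had anticipated this kind of failure and had been working to bring new equipment online to relieve the load on the core router; the capacity expansion and cutover were originally scheduled for between 09:00 and 21:00 on January 2. With the full cooperation of our on-site and remote engineers, all expansion work was completed at 23:00 on January 2. The entire network architecture now uses a dual-redundancy design: every layer, from the routers to the access-network core switches, runs as a redundant pair. We believe an extreme network failure like this will not recur; the only remaining scenario is a loss of power to the entire core network cabinet, which is far less likely than an equipment failure. We also plan to pursue a multi-data-center redundancy scheme in the future. We cannot make our SLA 100%, but we will do everything we can to come as close to 100% as possible.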
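The SLA discussion in the second point can be made concrete with a little availability arithmetic. The helpers below are a hypothetical sketch, not drawn from any provider tooling: they compute the availability implied by a 70-minute outage in a 31-day month, the downtime budget for an assumed 99.99% target, and the effect of pairing devices under the assumption that the two units fail independently. The 99.99% target and the 0.999 unit availability are illustrative values only.

```python
# Minutes in January 2022 (31 days), the month in which the outage occurred.
MINUTES_IN_JANUARY = 31 * 24 * 60


def availability(downtime_min: float, period_min: float) -> float:
    """Fraction of the period during which the service was up."""
    return 1.0 - downtime_min / period_min


def downtime_budget(target: float, period_min: float) -> float:
    """Minutes of downtime allowed in the period for an availability target."""
    return (1.0 - target) * period_min


def parallel_availability(unit_availability: float, copies: int = 2) -> float:
    """Availability of N redundant units, assuming failures are independent.

    A shared cause, such as loss of power to the whole core cabinet, takes
    out every copy at once and is not captured by this formula.
    """
    return 1.0 - (1.0 - unit_availability) ** copies


print(f"{availability(70, MINUTES_IN_JANUARY):.5%}")              # 99.84319%
print(f"{downtime_budget(0.9999, MINUTES_IN_JANUARY):.2f} min")   # 4.46 min
print(parallel_availability(0.999))  # hypothetical single-unit availability
```

The independence assumption is the key caveat: a redundant pair removes single-device failures, but a common-mode event such as the cabinet power loss described in the text affects both units at once, which is also why multi-data-center redundancy is the next step.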

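3. We sincerely apologize for the inconvenience this outage has caused, and once again extend our heartfelt thanks to all our friends for their support and concern!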