Addressing Data Center Relocation Challenges: Insights from H3C's bilibili Project

2025-03-06 6 min read

    Bilibili is a popular Chinese platform that blends social media with video sharing. Its content spans lifestyle, gaming, entertainment, and technology, among other topics. Users engage with community-driven content through professional user-generated videos (PUGV) and with studio-produced content through occupationally generated videos (OGV). Beyond content consumption, the platform supports commercial video production, letting creators reach their audiences directly and monetize their work.

    After 18 months of work across multiple regions, involving the relocation of tens of thousands of servers and switches, bilibili has completed its data center relocation project. The new data center offers more advanced infrastructure and stronger technical support. The upgrade is intended to optimize the business layout, support geo-distributed multi-active operations, improve resource utilization and operational stability, and give bilibili users a better access experience.

    Data center relocation is a complex systems engineering effort. Beyond physically moving servers, switches, routers, firewalls, and storage devices, it requires planning for data and service migration, network connectivity migration, and adjustments to the equipment room environment. Mistakes during relocation can have serious consequences, including equipment damage, data loss, and service interruption.

    To minimize downtime and avoid any impact on user services, bilibili worked with H3C to plan the migration and relocation process in detail, drawing on H3C's extensive experience in data center migrations and its strong team support. The project team drew up a comprehensive emergency plan covering scenarios such as unexpected changes in business operations, adjustments to data center policies, network outages under special circumstances, changes to site entry and exit procedures, and emergency repairs to transport equipment in the data center. Each identified risk had a clear, detailed response strategy, so the relocation could proceed smoothly while protecting data security and business continuity.

    Over the 18-month, multi-batch rolling migration, the project team dealt with complex scenarios, long timelines, many coordinating parties, and demanding execution, carrying out each step of the process on-site.

    For instance, one batch involved more than 1,700 devices, and the entire process, from equipment removal to service relaunch, had to be completed within one week. The work included shutting down services, backing up data, dismantling servers and switches, and transporting, installing, and racking the equipment.

    All steps were carried out in an orderly manner, and the team completed every relocation task on time. The failure rate stayed below 0.1%, and services restarted smoothly.

    In line with China's national "dual carbon" strategic goal (carbon peaking and carbon neutrality), bilibili's new data center focuses on energy efficiency. The project builds the principles of a low-carbon economy, energy conservation, and emission reduction into its design and construction. Through careful layout planning, advanced energy-saving equipment, and efficient operations and maintenance, the overall Power Usage Effectiveness (PUE) of the equipment room has been reduced from 1.5 to below 1.25. This improvement significantly lowers energy consumption and carbon emissions while improving the equipment room's service level agreement (SLA).
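    To put the PUE figures in perspective: PUE is the ratio of total facility power to IT equipment power, so for a fixed IT load the change from 1.5 to 1.25 can be worked through directly. The sketch below is illustrative only. The 1,000 kW IT load is a hypothetical value, not a figure from the project, and 1.25 is used as the upper bound of the "below 1.25" result.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by a PUE value (PUE = total power / IT power)."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


# Hypothetical IT load; the article does not state the real figure.
IT_LOAD_KW = 1000.0

old_total = facility_power_kw(IT_LOAD_KW, 1.5)
new_total = facility_power_kw(IT_LOAD_KW, 1.25)

# Relative reduction in total facility power for the same IT load.
total_saving = (old_total - new_total) / old_total

# Overhead is everything that is not IT load (cooling, power distribution, lighting).
old_overhead = old_total - IT_LOAD_KW
new_overhead = new_total - IT_LOAD_KW
overhead_saving = (old_overhead - new_overhead) / old_overhead

print(f"Total facility power: {old_total:.0f} kW -> {new_total:.0f} kW")
print(f"Total reduction: {total_saving:.1%}, overhead reduction: {overhead_saving:.1%}")
```

    Under these assumptions, total facility power drops by about one sixth for the same IT load, and the non-IT overhead is cut in half. A PUE below 1.25 would mean savings at least this large.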

    In addition, the new data center uses state-of-the-art network equipment, which improves network transmission efficiency and response times. An optimized network topology and stronger security measures have also reduced the risk of network failures and downtime.

    Notably, H3C also used the relocation as an opportunity to help bilibili carry out a comprehensive overhaul of server management. This included replacing faulty hardware in batches, updating problematic firmware versions, standardizing host BMC/BIOS configurations, and aligning kernel versions and system environments. These steps ensure consistency across the fleet, simplify operations and maintenance, and ultimately improve the efficiency and stability of the new equipment room.
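    As a rough illustration of what standardizing BMC/BIOS and kernel versions involves, the sketch below flags hosts that deviate from a baseline. It is not the tooling used in the project. The host names, field names, and version strings are all hypothetical, and deriving the baseline from the most common value is an assumption; in practice the target versions would more likely be chosen explicitly.

```python
from collections import Counter

# Hypothetical inventory records; hosts, fields, and versions are illustrative only.
inventory = [
    {"host": "node-01", "bios": "2.4.1", "bmc": "1.12", "kernel": "5.10.0"},
    {"host": "node-02", "bios": "2.4.1", "bmc": "1.12", "kernel": "5.10.0"},
    {"host": "node-03", "bios": "2.3.0", "bmc": "1.12", "kernel": "5.10.0"},
    {"host": "node-04", "bios": "2.4.1", "bmc": "1.12", "kernel": "4.19.0"},
]

FIELDS = ("bios", "bmc", "kernel")


def majority_baseline(hosts, fields):
    """Use the most common value of each field as the baseline (an assumption)."""
    return {f: Counter(h[f] for h in hosts).most_common(1)[0][0] for f in fields}


def find_drift(hosts, baseline):
    """Map each non-conforming host to the fields that differ from the baseline."""
    drift = {}
    for h in hosts:
        diffs = {f: h[f] for f, want in baseline.items() if h[f] != want}
        if diffs:
            drift[h["host"]] = diffs
    return drift


baseline = majority_baseline(inventory, FIELDS)
drift = find_drift(inventory, baseline)

for host, diffs in drift.items():
    print(host, "deviates from baseline:", diffs)
```

    A report like this gives a worklist of hosts to update before or during a migration window, which is one way the consistency described above could be tracked.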
