1. Symptoms
Over the weekend, a well-known systems integrator reported a problem at a new bank building they were working on. During the network acceptance test, they could not achieve the 40% capacity margin required by the contract, and repeated checks had not revealed the cause. The entire system used Category 5e cabling, and all other issues had been resolved. Only the server acceptance test remained, and every result in its report was unsatisfactory. The final acceptance deadline was the following Wednesday. If the problem was not solved by Tuesday, the end users would not be able to use the system as planned, and the integrator's reputation would suffer.
The systems integrator was responsible for overall system integration, while a reputable specialist cabling company was responsible for the cabling system, which had passed Category 5e certification testing. The network acceptance test checks the connectivity and channel capacity of key devices, including the servers. The contract requires each server to retain a 40% capacity margin, so the test loads each server with 60% background traffic and then measures access speed. To pass, Ping times across the network must be under 2 ms, and a 20 MB file must download in under 10 seconds. In the actual test, Ping returned 5 ms, and the download with 60% background traffic took 80 seconds, so server access felt noticeably slow for no apparent reason. With background traffic reduced to 15%, the results met the required parameters. The integrator asked us to help identify the problem.
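The pass criteria can be translated into throughput figures, which makes the size of the shortfall easier to see. The short Python sketch below does that arithmetic. It assumes "20 MB" means 20 * 10^6 bytes (decimal megabytes); the constant and function names are illustrative and do not come from the acceptance test itself.

```python
# Sketch: convert the acceptance-test download figures into average throughput.
# Assumption: "20 MB" means 20 * 10**6 bytes (decimal megabytes).

FILE_SIZE_BYTES = 20 * 10**6  # 20 MB test file from the contract
TARGET_SECONDS = 10.0         # contractual upper bound for the download
MEASURED_SECONDS = 80.0       # observed with 60% background traffic


def throughput_mbps(size_bytes: float, seconds: float) -> float:
    """Average throughput in megabits per second."""
    return size_bytes * 8 / seconds / 1e6


def meets_target(seconds: float, limit: float = TARGET_SECONDS) -> bool:
    """True if the download finished in less than the contractual limit."""
    return seconds < limit


required = throughput_mbps(FILE_SIZE_BYTES, TARGET_SECONDS)
observed = throughput_mbps(FILE_SIZE_BYTES, MEASURED_SECONDS)
print(f"required > {required:.1f} Mbit/s, observed {observed:.1f} Mbit/s")
print(f"download took {MEASURED_SECONDS / TARGET_SECONDS:.0f}x the allowed time")
print("pass" if meets_target(MEASURED_SECONDS) else "fail")
```

Under this assumption, the contract calls for more than 16 Mbit/s of sustained throughput per download, while the measured run achieved only about 2 Mbit/s, one eighth of the target.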
2. Diagnostic Process
Slow server channel performance can have many causes: network configuration errors, mismatched network card driver versions, poor or conflicting protocol bindings on the network card, incorrect settings or faults in network equipment (e.g., gateways, bridges, switches, routers), excessive junk traffic, interference signals entering the system, incorrect system platform settings, poorly optimized application programs, mismatches between the platform and terminal devices, and mismatches between server and network protocols. We needed to determine which of these applied here. Fault localization generally starts with connectivity and protocol compatibility, because these checks are relatively simple and quick.
According to the engineering team, the platform had been reinstalled three times, and the network settings and network card drivers had been adjusted repeatedly. Because the Ping test showed the network was connected, we first suspected poor protocol compatibility between the servers and the network. We connected a network tester to the network and repeated the acceptance test, which confirmed that the earlier results were essentially accurate. Since almost all servers showed the same symptoms, we had to look for something they had in common. First, we disconnected the servers from the network, randomly selected four of the 14, and ran an “expert-level” test on their NIC interfaces with the network tester. All results were normal. We then observed the network's operating parameters and protocols, and everything appeared normal as well. This indicated that the network and server network settings, protocol settings, physical operating parameters, and protocol compatibility were all essentially acceptable.
However, many network performance problems stay hidden at very light loads (around 1%) and only become apparent under heavy traffic. We therefore ran the following test: we picked one server link, injected simulated traffic toward the server at the switch-port end of the link, and monitored the channel with a network fault locator or cable tester. At a simulated load of just 3%, the collision rate on the link already exceeded the 5% health threshold; at 40% load it reached 98%, and at 60% load it was 99.8%. Clearly the link itself had a serious performance problem, one that had little to do with network equipment settings.
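The load-stepping test comes down to comparing a link's collision rate against the 5% health threshold at each load level. Below is a minimal Python sketch of that check. It assumes the collision rate is computed as collisions divided by transmission attempts (the original does not say how the tester defines it). The `LoadStep` type and the frame counts are hypothetical, chosen only to reproduce the reported 98% and 99.8% rates; they are not the tester's raw counters.

```python
# Sketch: flag load steps where a link's collision rate exceeds a health threshold.
# Assumption: collision rate = collisions / transmission attempts.
from dataclasses import dataclass

HEALTH_THRESHOLD = 0.05  # 5% threshold cited in the diagnosis


@dataclass
class LoadStep:
    load: float      # offered load as a fraction of link capacity
    collisions: int  # collisions counted during this step
    attempts: int    # frames the traffic generator tried to send

    @property
    def collision_rate(self) -> float:
        # Guard against an empty step instead of dividing by zero.
        return self.collisions / self.attempts if self.attempts else 0.0


def unhealthy_steps(steps, threshold=HEALTH_THRESHOLD):
    """Return the load steps whose collision rate exceeds the threshold."""
    return [s for s in steps if s.collision_rate > threshold]


# Hypothetical counts that reproduce the reported 98% and 99.8% collision rates.
steps = [
    LoadStep(load=0.40, collisions=980, attempts=1000),
    LoadStep(load=0.60, collisions=998, attempts=1000),
]
for s in unhealthy_steps(steps):
    print(f"load {s.load:.0%}: collision rate {s.collision_rate:.1%} "
          f"exceeds {HEALTH_THRESHOLD:.0%}")
```

Stepping the load upward and checking each step separately is what exposed the fault here: a single light-load snapshot would not have shown it.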
When we asked the engineers, they insisted that the cabling system had passed rigorous Category 5e testing and that the cabling company had confidently assured them there would be no problems with the links. The cabling system's certification test report showed that every link had passed the Category 5e Basic Link test. The servers had been installed and configured by a distributor designated by the server supplier, who claimed to have installed hundreds of servers without ever encountering a similar issue.
Everyone seemed to have a valid argument, but the links clearly had a problem, so we decided to retest them on-site. Every randomly sampled link failed, and the cable tester reported a “wiring error.” When we ran the cable tester's HDTDX analysis, it located severe crosstalk within the last 2-3 meters at the far end of the link. To establish responsibility, we then reran the Basic Link test on the horizontal cabling, and every link passed. Because a Basic Link test covers the fixed horizontal cabling but excludes the equipment jumpers, this showed that the cabling company's work met specification and that the problem most likely lay with the server installation provider. When we replaced the jumper on a server link, the fault disappeared immediately. We then replaced all the server jumpers and re-verified the network; every parameter passed.
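HDTDX reports where along the cable the crosstalk occurs. Time-domain tools generally convert a measured round-trip delay into distance with the relation d = NVP * c * t / 2, where NVP is the cable's nominal velocity of propagation, c is the speed of light, and t is the round-trip delay. The Python sketch below applies that relation and then expresses the result relative to the far end of the link. The NVP of 0.69 and the 50 m link length are assumed values for illustration, not figures from this job, and the sketch does not reproduce the tester's actual HDTDX algorithm.

```python
# Sketch: convert a time-domain round-trip delay into a fault location.
# Assumptions: NVP = 0.69 (use the cable datasheet value in practice) and a
# hypothetical 50 m link; neither figure comes from the original report.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_tester_m(round_trip_s: float, nvp: float) -> float:
    """Distance to the event; the signal travels out and back, so halve the path."""
    return nvp * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2


def distance_from_far_end_m(link_length_m: float, round_trip_s: float,
                            nvp: float) -> float:
    """The same event expressed relative to the far end of the link."""
    return link_length_m - distance_from_tester_m(round_trip_s, nvp)


nvp = 0.69            # hypothetical NVP
link_length_m = 50.0  # hypothetical total link length
# Round-trip delay that corresponds to a point 47.5 m from the tester.
delay_s = 2 * 47.5 / (nvp * SPEED_OF_LIGHT_M_PER_S)
print(f"{distance_from_far_end_m(link_length_m, delay_s, nvp):.1f} m from the far end")
```

An event a few meters from the far end, as in this case, points at the jumper and connector rather than at the horizontal cable run.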
3. Diagnostic Recommendations
The server installation provider had used Category 6 cable to make the Category 5e jumpers but terminated them with an incorrect wiring sequence. As a result, server links that should not have worked at all still connected to the network, but with extremely high collision rates. In general, jumpers made from Category 6 cable perform better than those made from Category 5e cable, so we recommended that the user keep the Category 6 jumpers and simply correct their wiring sequence.
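The article does not say exactly which wiring fault the tester reported. One well-known fault that fits the symptoms (pin-to-pin continuity is intact, so the link comes up, but crosstalk near the jumper is severe) is a split pair, where the conductors on a pin pair such as 3 and 6 come from different twisted pairs. The Python sketch below checks a termination against the T568B pair assignments (pins 1-2, 3-6, 4-5, 7-8). The function names and the "sequential" example are illustrative assumptions, not a record of how these jumpers were actually wired.

```python
# Sketch: detect split pairs by checking which twisted pair feeds each pin pair.
# T568B color order; T568A swaps orange and green but uses the same pin pairs.
T568B = {
    1: "white/orange", 2: "orange",
    3: "white/green",  4: "blue",
    5: "white/blue",   6: "green",
    7: "white/brown",  8: "brown",
}
PIN_PAIRS = [(1, 2), (3, 6), (4, 5), (7, 8)]  # pins that must share one twisted pair


def pair_of(color: str) -> str:
    """Twisted pair a conductor belongs to, e.g. 'white/green' -> 'green'."""
    return color.split("/")[-1]


def split_pairs(termination: dict) -> list:
    """Return the pin pairs whose two conductors come from different twisted pairs."""
    return [(a, b) for a, b in PIN_PAIRS
            if pair_of(termination[a]) != pair_of(termination[b])]


# Hypothetical mistake: pairs laid down in simple sequence on both connectors.
sequential = {
    1: "white/orange", 2: "orange",
    3: "white/green",  4: "green",
    5: "white/blue",   6: "blue",
    7: "white/brown",  8: "brown",
}
print(split_pairs(T568B))       # -> []
print(split_pairs(sequential))  # -> [(3, 6), (4, 5)]
```

Because both ends of a sequentially wired jumper match each other, a simple continuity checker passes it; only a tester that measures crosstalk, or a check of the pairings like this one, exposes the split pairs.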
4. Afterword
The systems integrator later called to tell us that, in the end, they had not kept the original server jumpers but had replaced them all with qualified Category 5e jumpers. They also said that not having cable testers of their own had left them at a disadvantage throughout this incident. They have since equipped themselves with a complete set of cabling system certification tools and network acceptance testing tools, so that they can handle network performance testing, maintenance, and acceptance tests with confidence.