Disclaimer: Matrox Electronic Systems Ltd. reserves the right to make changes in
these specifications at any time and without notice. The information provided by this
document is believed to be accurate and reliable at the time of its publication. However, no responsibility is assumed by Matrox Electronic Systems Ltd. for its use; nor
for any infringements of patents or other rights of third parties resulting from its use.
No license is granted under any patents or patent rights of Matrox Electronic Systems
Ltd. by the ownership of this document.
The Matrox NS-FNIC/4 is a multiple port Network Interface Card (NIC). The name
stands for Network Server Fast NIC with four ports. It is designed to support 10BaseT
and 100BaseTx technology in both half and full duplex modes.
The Matrox NS-FNIC/4 allows four subnets to be connected to the same server while
consuming fewer server resources (PCI slots, power, etc.) than four single-port NICs
and delivering the same, if not better, performance. Since the Matrox NS-FNIC/4
uses only one PCI slot per four subnets, you can add more subnets to the same server
by adding more Matrox NS-FNIC/4 cards. The Matrox NS-FNIC/4 can also be used
as part of a software router.
This paper describes the performance numbers obtained for the Matrox
NS-FNIC/4 under Red Hat Linux 6.2® with Linux kernel version 2.2.16.
The benchmarking utilities that were used to obtain the performance numbers for this
paper are:
•NetPerf v2.1pl3 – available at http://www.netperf.org.
NetPerf is a benchmark utility that can be used to measure the performance of
many different types of networking equipment. It provides tests for both
unidirectional throughput and end-to-end latency. Test #1 used NetPerf to obtain
a throughput and a server CPU utilization measurement for each port, while the
other ports remained idle.
•TTCP - TTCP times the transmission and reception of data between two systems using the
UDP or TCP protocols. It differs from common “blast” tests, which tend to
measure the remote client as much as the network performance and which usually
do not allow measurements at the remote end of a UDP transmission. The TTCP
utility only measured throughput. These tests were done using different packet
and file sizes whenever file transfers were involved. A minimal sketch of this style
of measurement is given after this list.
•SmartBits - SmartBits is a hardware device that is the industry standard for network
performance analysis for 10/100 Ethernet and TCP/IP communications.
•Top - Top is a plain-text process manager that displays relevant information on all
processes that are active on the machine. The information that can be obtained
includes per-process CPU utilization and total user and system CPU utilization.
Since this information changes constantly, overall CPU utilization could only be
approximated from the measured values. Running the utility did not add errors to
our values because its own CPU utilization was also reported. For all the tests,
“Top” consumed 1.9% of the CPU on average.
•Packet Sniffer - A Packet Sniffer is a network monitoring tool that captures data
packets and decodes them using built-in knowledge of common protocols. Sniffers
are used to debug and monitor networking problems.
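The exact TTCP command lines used for these tests are not reproduced in this paper.
Purely as an illustration of the kind of measurement TTCP performs, the minimal
Python sketch below sends a fixed number of fixed-size buffers over TCP and derives a
throughput figure from the transfer size and the elapsed time; the buffer size, buffer
count and port number are example values only, not the actual test parameters.

import socket
import time

BUF_SIZE = 8192      # 8KB buffers, the packet size used in Test #2
NUM_BUFS = 2048      # 2048 x 8192 bytes = 16MB total, as in Test #2
PORT = 5001          # arbitrary port chosen for this example

def receive(port=PORT):
    # Accept one connection and discard everything that arrives on it.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", port))
    srv.listen(1)
    conn, _ = srv.accept()
    while conn.recv(BUF_SIZE):
        pass
    conn.close()
    srv.close()

def transmit(host, port=PORT):
    # Send NUM_BUFS buffers of BUF_SIZE bytes and report the throughput.
    payload = b"\x00" * BUF_SIZE
    sock = socket.create_connection((host, port))
    start = time.time()
    for _ in range(NUM_BUFS):
        sock.sendall(payload)
    sock.close()
    elapsed = time.time() - start
    mbits = NUM_BUFS * BUF_SIZE * 8 / (elapsed * 1e6)
    print("%d bytes in %.2f s = %.2f Mbits/s" % (NUM_BUFS * BUF_SIZE, elapsed, mbits))

One machine runs receive() and the other runs transmit() pointed at it; TTCP itself
additionally supports UDP and other options not shown here.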
Table 1: Server and client configuration
Server: 1 Matrox NS-FNIC/4, 700MHz Pentium III, AGP Matrox G200, 128MB of RAM, 20GB hard disk drive
Client 1: 64MB of RAM, 1GB hard disk drive
Client 2: 64MB of RAM, 1GB hard disk drive
Client 3: 128MB of RAM, 20GB hard disk drive
The first test used the NetPerf benchmarking utility to obtain the throughput as well as
the server CPU utilization. A packet size of 8 kilobytes (8192 bytes) was used
throughout the test. The test bench was set up as shown in Figure 1, below.
Figure 1: Throughput and server CPU utilization - bench configuration. A Linux PC with a
Matrox NS-FNIC/4 (the server) is connected to four Linux PCs with 10/100 Ethernet cards,
one on each of Subnets 1 through 4.
For this test, the clients were all 166MHz Pentiums. The results for throughput and
server CPU utilization are summarized in Table 2. The throughput is reported in megabits
per second (1 million bits per second) and the CPU utilization as a percentage.
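The paper does not list the NetPerf command line that produced the Table 2 figures. The
sketch below shows one plausible way to drive a NetPerf 2.x TCP_STREAM test from a
client, wrapped in Python for consistency with the other examples; the option choices
(test length, 8192-byte send size, remote CPU measurement) and the server address are
assumptions, not the documented procedure.

import subprocess

def run_netperf(server_ip, seconds=60, send_size=8192):
    # TCP_STREAM throughput test; -C asks netperf to report the CPU
    # utilization of the remote (server) side, as measured in Test #1.
    cmd = [
        "netperf",
        "-H", server_ip,       # server address on this port's subnet
        "-t", "TCP_STREAM",    # unidirectional throughput test
        "-l", str(seconds),    # test duration in seconds (assumed value)
        "-C",                  # measure remote (server-side) CPU utilization
        "--",
        "-m", str(send_size),  # 8192-byte sends, matching the 8KB packet size
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

print(run_netperf("192.168.1.1"))  # hypothetical address for the port 1 subnet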
The second test used the same test bench as Test #1 (see Figure 1). The purpose of
this test was to verify the throughput measurements of Test #1 with a second,
independent application. The only limitation of this test is that it provides no CPU
utilization measurements. TTCP gives a throughput measurement by sending 16
megabytes (16,777,216 bytes) in packets of 8 kilobytes (8192 bytes) to the server. It
then divides the total transfer size by the time the transfer took and converts the result to Mbits/s.
The throughput obtained, summarized in Table 3, is similar to that obtained from
Test #1. The major difference between Test #1 and Test #2 lies in how the table
results were computed. The results from Test #1 are averages computed by the utility
itself after automatically performing the test many times. TTCP, however, only
performs one data transfer and then displays the computed result. In order to get a
reliable value, ten such tests were performed on each port and the arithmetic mean of
those ten results was computed. Table 3 displays only those computed averages.
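As a concrete illustration of the computation described above, the snippet below
converts one 16MB transfer into Mbits/s and then averages ten runs; the elapsed time
and the ten per-run values are placeholders, since only the averaged results were
published.

# Throughput of a single TTCP run: 16MB sent in 8KB packets.
total_bytes = 16 * 1024 * 1024        # 16,777,216 bytes
elapsed_s = 1.5                       # placeholder elapsed time in seconds
mbits_per_s = total_bytes * 8 / (elapsed_s * 1e6)

# Each Table 3 entry is the arithmetic mean of ten such runs on one port.
runs_mbits = [89.1, 89.3, 89.2, 88.9, 89.4, 89.2, 89.3, 89.1, 89.4, 89.5]  # placeholders
print("Average throughput: %.2f Mbits/s" % (sum(runs_mbits) / len(runs_mbits)))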
Table 3: Throughput validation of Test #1 - TTCP benchmark results

                         Port 1      Port 2      Port 3      Port 4
                         (Mbits/s)   (Mbits/s)   (Mbits/s)   (Mbits/s)
10Mbits/s Half duplex    6.49        6.51        6.54        6.53
10Mbits/s Full duplex    8.94        8.94        8.94        8.94
100Mbits/s Half duplex   62.34       61.56       62.29       62.05
100Mbits/s Full duplex   89.24       89.14       89.26       88.51
At 100Mbits/s full duplex, the results were identical at 89Mbits/s in both Test #1 (as
shown in Table 2) and Test #2 (as shown in Table 3, above).
At 100Mbits/s half duplex, the results were similar in both Test #1 (as shown in
Table 2 at 66Mbits/s) and Test #2 (as shown in Table 3, above, at 62Mbits/s).
At the end of Test #2, it is clear that the four ports behave in a similar fashion and give
similar throughput.
The third test also used the TTCP benchmarking utility, but this time it was used to
perform transfers of different total sizes with different packet sizes. The total transfer
sizes corresponded to files that were transferred by the software. The file sizes were
1MB, 5MB and 10MB. The packet sizes were 1KB (1024 bytes), 4KB (4096 bytes) and
8KB (8192 bytes). The test bench was set up as shown in Figure 2, below.
Figure 2: Throughput by transfer and packet size - bench configuration. The layout is the
same as in Figure 1: a Linux PC with a Matrox NS-FNIC/4 (the server) is connected to four
Linux PCs with 10/100 Ethernet cards on Subnets 1 through 4, with the TTCP application
running on the machines.
Table 4 only shows the results for a packet size of 8KB and a file size of 10MB for
each port.
Table 4: TTCP results - throughput by transfer and packet size (Client 3)

                         Port 1      Port 2      Port 3      Port 4
                         (Mbits/s)   (Mbits/s)   (Mbits/s)   (Mbits/s)
10Mbits/s Half duplex    6.52        6.55        6.40        6.67
10Mbits/s Full duplex    8.94        8.94        8.94        8.94
100Mbits/s Half duplex   84.54       84.53       84.51       84.51
100Mbits/s Full duplex   86.06       86.05       86.05       86.04
At 100Mbits/s half and full duplex, the resulting numbers were a full 20 points lower
across all four ports when using Client 1 and Client 2. To verify that this was not a
limitation of the Matrox NS-FNIC/4, this test used a different client configuration (see
Client 3 in “Benchmark Utilities Description” on page 2). The higher-performing
client provided results that more closely resembled those of Test #1 and Test #2;
therefore, the Matrox NS-FNIC/4 was not the cause of this performance drop.
The fourth test was similar to Test #3 in that it used the same file and packet sizes.
The difference was that the receiving part of the TTCP utility was not run on the
server but on a client. In other words, IP forwarding was added to this test, as shown in
Figure 3, below.
Figure 3: Adding IP forwarding - bench configuration. Clients 1 through 4, each running the
TTCP application, are connected to ports 1 through 4 of the Matrox NS-FNIC/4 in the server,
which forwards IP packets between its ports.
This test verified whether the Matrox NS-FNIC/4 gives similar throughput when used
in a router configuration. The reported throughputs were similar to those of the first
three tests and are shown in Table 5. This table only summarizes the results for a
packet size of 8KB and a file size of 10MB. Table 5 displays the average of the results
obtained in routing packets from one port to the other three. For example, the result
displayed for port 1 is the average of the throughput measured on each of the three
other ports.
Table 5: Adding IP forwarding - TTCP routing

                         Port 1      Port 2      Port 3      Port 4
                         (Mbits/s)   (Mbits/s)   (Mbits/s)   (Mbits/s)
10Mbits/s Half duplex    6.32        6.32        6.28        6.28
10Mbits/s Full duplex    8.94        8.94        8.94        8.94
100Mbits/s Half duplex   61.14       61.13       61.32       62.00
100Mbits/s Full duplex   61.17       61.54       63.47       64.59
The results in Table 5 are similar to those of Table 4, showing that the Matrox
NS-FNIC/4 gives similar performance irrespective of the routing configuration.
However, as in Table 4, the results at 100Mbits/s full duplex seem low. After further
investigation, the clients' CPU and hard disk drive speed proved to be causing the
reduction. Routing, performed by software running on the host CPU rather than by
dedicated hardware, also contributed to slowing the system down.
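The paper does not show how the server was configured for routing. On a 2.2-series
Linux kernel, IP forwarding between interfaces is normally enabled through the /proc
filesystem (the equivalent of echo 1 > /proc/sys/net/ipv4/ip_forward); the sketch below
is that general Linux step, not a procedure taken from the paper, and must be run as
root.

def enable_ip_forwarding():
    # Standard Linux /proc interface; lets the kernel route packets
    # arriving on one NS-FNIC/4 port out through another port.
    with open("/proc/sys/net/ipv4/ip_forward", "w") as f:
        f.write("1\n")

enable_ip_forwarding()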
The fifth test used the Netcom Systems SmartBits 1000® and a network ‘sniffer’ to
generate a routing efficiency rating. The routing efficiency refers to the percentage of
packets that are correctly echoed back by the Matrox NS-FNIC/4. The test setup is
shown in Figure 4. The SmartBits generates packets, which must be sent back to the
SmartBits by the Matrox NS-FNIC/4 in the server. The sniffer is used to ensure that
the statistics are valid and that packets have not been corrupted in transit.
Figure 4: SmartBits 1000 test - bench setup. SmartBits ports 1 and 2 are connected through
Managed Switch 1 and Managed Switch 2 to two of the NS-FNIC/4 ports in the server, with
the sniffer monitoring the traffic.
Note: Figure 4 shows the connection for two ports. Ports 3 and 4 are connected
in the same fashion as port 1.
The test used a packet size of 64 bytes and the results are presented in Table 6. The
Load column represents the percentage of the maximum throughput achievable by the
network; a load of 50%, for example, represents a theoretical bandwidth of about
50Mbits/s in the 100Mbits/s modes. The results displayed in Table 6 are the efficiency
percentages computed as explained above. For the four-port results, two rates were
computed and the values displayed are their average.
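To make the Load column concrete, the arithmetic below (not taken from the paper)
converts a load percentage into an offered frame rate using the standard Ethernet
per-frame overhead of 20 bytes (8-byte preamble plus 12-byte inter-frame gap), and
shows how the efficiency percentage is computed from the frame counts.

LINE_RATE = 100e6        # bits per second (100Mbits/s mode)
FRAME_BYTES = 64         # packet size used in this test
OVERHEAD_BYTES = 8 + 12  # preamble + inter-frame gap per frame

wire_speed_fps = LINE_RATE / ((FRAME_BYTES + OVERHEAD_BYTES) * 8)  # ~148,810 frames/s

def offered_rate(load_percent):
    # Frames per second generated by the SmartBits at a given load.
    return wire_speed_fps * load_percent / 100.0

def efficiency(frames_sent, frames_returned):
    # Percentage of packets correctly echoed back by the server.
    return 100.0 * frames_returned / frames_sent

print("50%% load offers about %.0f frames/s per port" % offered_rate(50))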
The 100Mbits/s results indicate that the system is not able to maintain wire speed on
all four ports with small (64-byte) packets. This is not surprising, as small packets
place a heavy demand on the driver and the applications that reside above it.
However, two issues must be kept in mind. First, this is not a limitation of the card,
but rather a limitation of the CPU power available in the system. Secondly, small packets
are not the typical case on a network. Generally, the average packet size will be much
closer to the Ethernet maximum of 1500 bytes. Another test was performed with the same
setup and a packet size of 1000 bytes. The results shown below, indicating 100%
efficiency, confirm that there are no dropped packets.
The sixth, seventh and eighth tests were meant to measure the combined throughput
of the four ports. The aggregate throughput is the sum of all individual throughputs
from these tests, as the short example below illustrates. Top was used to provide the
resulting CPU usage for these tests.
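The aggregate figures are obtained by simply summing the per-port throughputs. As a
worked example, applying this to the 100Mbits/s full-duplex row of Table 3 gives
roughly 356Mbits/s:

# Aggregate throughput is the sum of the individual port throughputs.
# Worked example using the 100Mbits/s full-duplex row of Table 3:
per_port_mbits = [89.24, 89.14, 89.26, 88.51]
print("Aggregate: %.2f Mbits/s" % sum(per_port_mbits))  # 356.15 Mbits/s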
Test #6 used the same test bench setup as the original NetPerf test (see Figure 1). It
was exactly like Test #1, except that all the clients were active at the same time. Test #6
was only performed in the two 100Mbits/s modes (half and full duplex). The
individual throughputs (see Table 8) are not very different from the results of the
previous tests.
The CPU utilization for this test can be described as follows:
•At 100Mbits/s full duplex, the individual CPU utilization by client ranges from
15% to 40% with an average of about 24.5%. This average is consistent with the
original test, which resulted in 24% CPU utilization. As for the overall CPU
utilization, it ranges from 60% to 100% with an average of 98%.
•At 100Mbits/s half duplex, the individual CPU utilization ranges from 5 to 20%
with an average of 11.25%. The overall value ranges from 20% to 55% with an
average of 45%.
Test #7 was executed to obtain the combined throughput. It was similar to Test #6,
but switches were used to put two clients on each subnet. The final setup for this test
is shown in Figure 5, below.
Figure 5: Combined throughput testing - combined throughput setup. Four switches connect
the Linux PC with a Matrox NS-FNIC/4 to eight Linux PCs with 10/100 Ethernet cards, two
clients per switch (one switch on each NS-FNIC/4 port).
The four available 166MHz Pentiums with Linux installed were used together with four
additional 100MHz Pentiums to provide the eight clients needed. In order not to skew
the data obtained, one 166MHz Pentium and one 100MHz Pentium were placed on each
subnet. The individual throughput was expected to be lower than with only one client
per subnet, while the overall throughput was expected to be about the same. These
expectations were confirmed by the results shown in Table 9.
Table 9 shows that the individual throughputs achieved are about half those measured
when only one client was present on each server port, but the combined throughput is
about the same as with only one client. At half duplex, it is 30Mbits/s higher.
•At 100Mbits/s full duplex, individual client values ranged from 8% to 17%,
averaging 12.25%. Overall, CPU utilization ranged from 64% to 100%, averaging
98%. Since there were twice as many clients as in the previous test, the same
individual average per port was obtained as in Test #6, while the CPU utilization
per client was halved.
•At 100Mbits/s half duplex, the CPU usage ranged from 6% to 20%, averaging 11%.
Overall, it ranged from 48% to 95%, averaging 88%. Unlike in the full-duplex case,
this overall average is much higher than in Test #6, where it was 45%. Since there
are now two clients per port, both the bandwidth available and the CPU usage per
client are halved.
Test #8 used two Matrox NS-FNIC/4 cards in the server. One client was connected
to each available port. The final setup is shown in Figure 6. As in Test #7, there
were two 100MHz Pentiums and two 166MHz Pentiums on each Matrox NS-FNIC/4.
Figure 6: Testing two cards - bench setup. A Linux PC with two Matrox NS-FNIC/4 cards
(card 1 and card 2); each card serves four subnets (subnets 1 through 4), with one Linux PC
with a 10/100 Ethernet card on each subnet, for a total of eight clients.
Since having eight ports active simultaneously is very demanding on the server, the
individual results of this test were expected to be a little lower than those of the two
previous tests, while the combined throughputs were expected to be close to the ones
obtained in those same tests. These expectations were met, as shown in Table 10.
The CPU utilization for this test can be described as follows:
•At 100Mbits/s full duplex, the CPU usage for each port ranged from 9% to 15%,
averaging 12.38%; overall, it ranged from 72% to 100%, averaging 99%.
•At 100Mbits/s half duplex, the CPU usage for each client ranged from 8% to 15%,
averaging 12.38%; overall, it ranged from 64% to 100%, averaging 99%.
The combined throughput result shows that even with eight ports active at the same
time, the Matrox NS-FNIC/4 is able to deliver around 300Mbits/s shared almost
equally between its ports.
In conclusion, the Matrox NS-FNIC/4 is able to deliver at least 88Mbits/s per port at
100Mbits/s full duplex, which means a combined throughput of at least 300Mbits/s.
It can also deliver 60Mbits/s per port at 100Mbits/s half duplex, which means a
combined throughput of at least 240Mbits/s.
It attains similar performance even when used as part of a routing configuration in a
fast machine, giving the Matrox NS-FNIC/4 added value.
In congested networks, the Matrox NS-FNIC/4 in the server splits the network into
four smaller subnets without sacrificing bandwidth. Refer to the Matrox NS-FNIC/4
Case Study #1, available on the Matrox Networks web site, for more information.