I don’t have an answer to the following TCP performance difference between two machines, but thought I’d post about what I’ve seen as I think it’s interesting, and maybe someone else has ideas. I’m running netio (http://freshmeat.net/projects/netio/) on one machine (opensolaris) and contacting two different Linux machines (both on 2.6.18-128.el5 ), machine A and machine B. Machine A has a network throughput of 10MB/sec with netio and machine B 100MB/sec with netio. Netio is set to send 32K chunks:
Linux machine: netio -s -b 32k -t -p 1234 Opensolaris: netio -b 32k -t -p 1234 linuxmachine
On the opensolaris machine I used dtrace to trace the connections. From dtrace, all the interactions TCP settings look the same – same windows sizes on the receive and send, same ssthresh, same congestion window sizes, but the slow machine is sending an ACK for every 2 or 3 receives whereas the fast machine is sending an ACK every 12 receives. All three machines are on the same switch. Here is the Dtrace output: Fast Machine:
delta send recd
(us) bytes bytes swnd snd_ws rwnd rcv_ws cwnd ssthresh
122 1448 \ 195200 7 131768 2 128872 1073725440
37 1448 \ 195200 7 131768 2 128872 1073725440
20 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
19 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
18 1448 \ 195200 7 131768 2 128872 1073725440
57 1448 \ 195200 7 131768 2 128872 1073725440
171 1448 \ 195200 7 131768 2 128872 1073725440
29 912 \ 195200 7 131768 2 128872 1073725440
30 / 0 195200 7 131768 2 128872 1073725440
slow machine:
delta send recd
(us) bytes bytes swnd snd_ws rwnd rcv_ws cwnd ssthresh
161 / 0 195200 7 131768 2 127424 1073725440
52 1448 \ 195200 7 131768 2 128872 1073725440
33 1448 \ 195200 7 131768 2 128872 1073725440
11 1448 \ 195200 7 131768 2 128872 1073725440
143 / 0 195200 7 131768 2 128872 1073725440
46 1448 \ 195200 7 131768 2 130320 1073725440
31 1448 \ 195200 7 131768 2 130320 1073725440
11 1448 \ 195200 7 131768 2 130320 1073725440
157 / 0 195200 7 131768 2 130320 1073725440
46 1448 \ 195200 7 131768 2 131768 1073725440
18 1448 \ 195200 7 131768 2 131768 1073725440
Dtrace code
dtrace: 130717 drops on CPU 0
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
inline int TICKS=$1;
inline string ADDR=$$2;
dtrace:::BEGIN
{
TIMER = ( TICKS != NULL ) ? TICKS : 1 ;
ticks = TIMER;
TITLE = 10;
title = 0;
walltime=timestamp;
printf("starting up ...\n");
}
tcp:::send
/ ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
nfs[args[1]->cs_cid]=1; /* this is an NFS thread */
delta= timestamp-walltime;
walltime=timestamp;
printf("%6d %8d \ %8s %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
delta/1000,
args[2]->ip_plength - args[4]->tcp_offset,
"",
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
args[3]->tcps_sack_fack,
args[3]->tcps_sack_snxt,
args[3]->tcps_rto,
args[3]->tcps_mss,
args[3]->tcps_retransmit
);
flag=0;
title--;
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
delta=timestamp-walltime;
walltime=timestamp;
printf("%6d %8s / %8d %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
delta/1000,
"",
args[2]->ip_plength - args[4]->tcp_offset,
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
args[3]->tcps_sack_fack,
args[3]->tcps_sack_snxt,
args[3]->tcps_rto,
args[3]->tcps_mss,
args[3]->tcps_retransmit
);
flag=0;
title--;
}
Followup , since I did the above, I have since changed the dtrace code to include the number of unacknowledged bytes and it turns out the slow code does run up it’s unacknowleged bytes until it hits the congestion window, where as the fast machine never hits it’s congestion window:
unack unack delta bytes bytes send receive cong ssthresh
bytes byte us sent recieved window window window
sent recieved
139760 0 31 1448 \ 195200 131768 144800 1073725440
139760 0 33 1448 \ 195200 131768 144800 1073725440
144104 0 29 1448 \ 195200 131768 146248 1073725440
145552 0 31 / 0 195200 131768 144800 1073725440
145552 0 41 1448 \ 195200 131768 147696 1073725440
147000 0 30 / 0 195200 131768 144800 1073725440
147000 0 22 1448 \ 195200 131768 76744 72400
147000 0 28 / 0 195200 131768 76744 72400
147000 0 18 1448 \ 195200 131768 76744 72400
147000 0 26 / 0 195200 131768 76744 72400
147000 0 17 1448 \ 195200 131768 76744 72400
147000 0 27 / 0 195200 131768 76744 72400
147000 0 18 1448 \ 195200 131768 76744 72400
147000 0 56 / 0 195200 131768 76744 72400
147000 0 22 1448 \ 195200 131768 76744 72400
dtrace code:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option defaultargs
inline int TICKS=$1;
inline string ADDR=$$2;
tcp:::send, tcp:::receive
/ ( args[2]->ip_daddr == ADDR || ADDR == NULL ) /
{
nfs[args[1]->cs_cid]=1; /* this is an NFS thread */
delta= timestamp-walltime;
walltime=timestamp;
printf("%6d %6d %6d %8d \ %8s %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
args[3]->tcps_snxt - args[3]->tcps_suna ,
args[3]->tcps_rnxt - args[3]->tcps_rack,
delta/1000,
args[2]->ip_plength - args[4]->tcp_offset,
"",
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
args[3]->tcps_sack_fack,
args[3]->tcps_sack_snxt,
args[3]->tcps_rto,
args[3]->tcps_mss,
args[3]->tcps_retransmit
);
}
tcp:::receive
/ ( args[2]->ip_saddr == ADDR || ADDR == NULL ) && nfs[args[1]->cs_cid] /
{
delta=timestamp-walltime;
walltime=timestamp;
printf("%6d %6d %6d %8s / %-8d %8d %8d %8d %8d %8d %12d %12d %12d %8d %8d %d \n",
args[3]->tcps_snxt - args[3]->tcps_suna ,
args[3]->tcps_rnxt - args[3]->tcps_rack,
delta/1000,
"",
args[2]->ip_plength - args[4]->tcp_offset,
args[3]->tcps_swnd,
args[3]->tcps_snd_ws,
args[3]->tcps_rwnd,
args[3]->tcps_rcv_ws,
args[3]->tcps_cwnd,
args[3]->tcps_cwnd_ssthresh,
args[3]->tcps_sack_fack,
args[3]->tcps_sack_snxt,
args[3]->tcps_rto,
args[3]->tcps_mss,
args[3]->tcps_retransmit
);
}
So the fact that it looked like the slow machine was acknowledging every second or third send was due to the fact that the receiver was already behind on acknowledging previous packets.
Now the question remains is why did the receiver get so far behind on one machine and not the other.
I’ve check the rmem values on both machines and set the to the same:
net.core.rmem_default=4194304
net.core.rmem_max=4194304
To be continued
תגובות