Newsgroups: comp.dcom.lans.ethernet
From: mart@csri.toronto.edu (Mart Molle)
Subject: 15% collisions on Ethernet OK? (was: Switching vs. Routing)
Message-ID: <1993Apr30.173455.12194@jarvis.csri.toronto.edu>
References: <1rnrlu$nd0@access.digex.net>
Date: 30 Apr 93 21:34:55 GMT
Lines: 202

rl@access.digex.net (Rich) writes:

>On 04-27-19, Vjs@rhyolite.wpd.sgi.com had this to say to All:

>V>In article <1993Apr26.122114.28216@cyantic.com>, mark@cyantic.com (Mark
>V>> ...
>V>> The only visible problem on the network is a high level of collisions on
>V>> the accounting system, sometimes reaching 15% during operations peaks
>V>
>V>hmmph.  classic "gotta have the latest technology" disease.  
>V>15% collisions is not a "high level" by any stretch, unless it means
>V>15% of the packets suffer 16 consecutive collisions and are dropped, so
>V>that the per transmission attempt collision rate is close to 100%.
>V>(Yes, an ethernet is very sick at much less than 100% collisions, but
>V>if I did the arithmetic right, 15% "excessive" collisions can't happen
>V>much below 100%.)

>Normally you and I are in sync, Vernon, but I have to question on this
>one (BTW folks, this is my new address, previously you knew me as
>rich@grebyn.com, and yes, I work for Synoptics communications and the
>following is not official advice from myself as a Synoptics employee and
>is not to be misconstrued as Synoptics policy or recommendation, etc etc
>etc).

>Consider the following graph:

>                  Where you would want to be, hopefully
>                 \/
>Efficiency
>----------
>    |              XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>    |           XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>    |         XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>    |        XXOOOOOOXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>    |      XOOO......OOOOXXXXXXXXXXXXXXXXXXXXXXXX
>    |    XOOO..............OOOXXXXXXXXXXXXXXXXXXX
>    |   OO......................XXXXXXXXXXXXXXXXX
>    |  O...........................XXXXXXXXXXXXXX
>    |........................................XXXX
>    --------------------------------------------------
>                Utilization ( % bandwidth )


>Where X=Collisions, O=Packets, .=Actual data 
>(sorry about the lousy scaling)

Interesting ASCII graphics.  But impossible to interpret without
knowing the scales.  Normally these sorts of curves plot "G"
("offered load", in transmission attempts per unit time; note that
this measure may count individual packets more than once if they
suffer collisions) on the horizontal axis and "S" ("throughput",
in successful packet transmissions per unit time; this *never* counts
an individual packet more than once).  If the time unit is taken to
be the transmission time for an average packet (rather than a second,
millisecond, etc), normalized throughput is a number between 0 and 1
and is sometimes called efficiency.  Is that what you mean here????
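(If you want to see why such an S-versus-G curve rises, peaks, and falls, a few lines of code will sketch it. I'm using the classic slotted-Aloha formula S = G*exp(-G) here purely as a stand-in for the shape -- Ethernet's CSMA/CD does considerably better than Aloha, but the axes mean exactly the same thing:)

```python
import math

def aloha_throughput(G):
    """Throughput S (successes per packet time) at normalized offered
    load G, using the slotted-Aloha formula S = G * exp(-G).  This is
    only an illustration of the curve's shape, NOT an Ethernet model."""
    return G * math.exp(-G)

for G in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"G = {G:4.2f}  ->  S = {aloha_throughput(G):.3f}")
```

Note that S peaks (at G = 1 for this formula) and then *falls* as offered load keeps growing: pushing more attempts onto the channel past the peak buys you collisions, not throughput.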

>Of course, this shows the stochastic nature of ethernet which I
>know you are already familiar with. The only reason I draw it is
>we appear to disagree on what is an acceptable delta between the
>number of packets transmitted and the percentage of those packets
>that are collisions. In the above graph, this would be the delta
>between the top of the "O" level and the top of the "X" curve.

This is an understatement!

>Obviously, and as shown in the graph, collisions are an expected
>result of traffic and therefore are no cause for immediate alarm.

Obviously!

>The point we should be interested in is the marked top of the curve,
>where the highest efficiency for the least cost in collisions is
>obtained. The popular figure for that point is around 2-3% (!).

There is something terribly wrong with what you are saying.  Consider
the following:

1) Assuming I understand what your axes mean, you are saying that the
correct operating point for the system (i.e., the rate at which your
hosts should attempt to schedule packet transmissions) is where the
throughput curve (i.e., number of *successful* packet transmissions
completed per unit time) is maximized.  Am I right?

Well, in that case, your system would behave like a saturated queue,
where the waiting times blow up to unacceptable levels. (Remember, we're
talking about the *average* arrival rate of new packets being set
equal to the *average* departure rate of successful packets...)  If you
don't want to get swamped with queueing delays, you need to make the
arrival rate *smaller* than the departure rate.
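(To see the blow-up numerically, take the textbook M/M/1 mean-time-in-system formula W = 1/(mu - lambda) -- a deliberately simple queueing model, not an Ethernet-specific result, but it shows exactly what happens as the arrival rate creeps up on the departure rate:)

```python
# Illustrative sketch using the M/M/1 queue: mean time in system
# W = 1 / (mu - lambda) diverges as arrival rate lambda -> service
# rate mu.  A simplifying model, not an Ethernet analysis.
def mean_delay(lam, mu):
    assert lam < mu, "queue is unstable when lam >= mu"
    return 1.0 / (mu - lam)

mu = 1.0  # one packet served per unit time
for lam in (0.5, 0.9, 0.99, 0.999):
    print(f"rho = {lam/mu:5.3f}  ->  W = {mean_delay(lam, mu):8.1f}")
```

At 50% load the delay is a modest 2 packet times; at 99.9% load it is 1000. That is what "operating at the top of the throughput curve" costs you.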

2) Again, since the vertical axis of your graph is labelled "efficiency",
I understand your comment to mean that at your suggested operating point,
2-3% OF THE CHANNEL TIME is being spent on collisions (as opposed to idle
periods or successful packet transmissions).  This is NOT the same as
saying that only 2-3% of the packet transmission attempts result in a
collision.

Remember that each collision event takes up a small amount of time
(on the order of a few hundred bit times) whereas a successful packet
transmission can occupy the channel for much longer (on the order of
thousands of bit times, for a reasonable segment size).  For example,
consider a scenario where every packet is 10,000 bits long and every
collision event occupies the channel for 1,000 bit times (pretty much
the worst case, since you know that everyone will have detected the
collision and aborted their transmissions in half that time....).  If
the collision rate is 50% (i.e., half the time you succeed and half
the time you suffer a collision when attempting to send a packet), the
proportion of time occupied by collisions in the worst case would only be:

	(50% * 1,000 bits) / (50% * 1,000 bit times + 50% * 10,000 bit times)

which is 1/11 or about 9%.
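(The arithmetic is easy to check:)

```python
# Check of the worst-case arithmetic above: half the attempts collide,
# each collision event costs 1,000 bit times of channel time, each
# successful packet costs 10,000 bit times.
p_collision = 0.50
collision_cost = 1_000    # bit times per collision event
success_cost = 10_000     # bit times per successful packet

time_in_collisions = p_collision * collision_cost
total_time = p_collision * collision_cost + (1 - p_collision) * success_cost
fraction = time_in_collisions / total_time
print(f"fraction of channel time lost to collisions: {fraction:.3f}")  # 1/11, about 0.091
```

So even a 50% per-attempt collision rate wastes less than a tenth of the channel.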

>That is, 2-3% of your total packets should be collisions, and you
>are then at the top of that curve. My own unscientific surveys of 
>multi-protocol networks suggest that this is higher - 5%, maybe
>more depending on protocol, but I've never considered it to be 15%.
>In the above graph, 15% would still be a usable network but well
>on the way to the right of the optimal point. Once you pass that 
>point, you are headed along the exponential curve of additional
>collisions.

This isn't reasonable.  Right now when I typed "netstat -i" on my
Sun workstation, I got:

 Name  Mtu  Net/Dest      Address        Ipkts  Ierrs Opkts  Oerrs Collis Queue 
 le0   1500 csri2-ether   genie.csri     1658181 0    608422  4    52114  0     
 lo0   1536 loopback      localhost      115487  0    115487  0    0      0     

Modulo various comments about the inaccuracies of the collision counter
(since it may only report the number of packets that suffered one or more
collisions, not the total number of collisions...), it can be seen that
roughly 8.6% of the time my workstation tried to send a packet over our
ether, a collision occurred.  This is not a busy network, with average
daytime utilization under 5% and peak utilizations over 1-minute intervals
under 20-30%.
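(The figure comes straight from dividing the counters above, with the caveat already noted about what "Collis" actually counts:)

```python
# Collision rate from the netstat output above (le0 interface):
opkts = 608_422   # Opkts: output packets
collis = 52_114   # Collis: packets that suffered one or more collisions
rate = collis / opkts
print(f"collision rate: {rate:.1%}")  # about 8.6%
```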

>I'm basing my much lower percentage on the "LAN and Enterprise
>Network Performance Queuing Model", SST Associates, and "Performance
>of a Stochastically Optimized CSMA Network" from Mitre (also
>summarized fairly well in Nemzow's "The Ethernet Management Guide").
>I'm aware that these are biased towards TCP/IP, but the original
>poster is naming that as one of his primary protocols.

I've never heard of any of these guys.  Show me where I can see the
measurements and/or analysis to back up these numbers.  (Remember,
the world is full of snake oil salesmen who quote the throughput formula
for unslotted Aloha, or non-persistent CSMA *without collision detection*
and say it is Ethernet...)

>In any case, do you have a more recent study that you are
>obtaining the 15% figure from for multi-protocol networks? My own
>empirical evidence suggests that 15% as acceptable would be a rare
>case.

>V>15% collisions is an almost idle network.

I agree with Vernon here.  15% collisions *is* an almost idle network,
especially if it has a bunch of machines with fast Ethernet interfaces
on it (like SGIs) where one host sending, say, the fragments of an NFS
page as multiple packets goes around colliding with everybody else who
was deferring to him...

>If 15% of the number of packets sent are collisions, it's almost
>certainly *not* an idle network, no matter if you agree with the 
>curve above or not, wouldn't you say?

Not at all.  You really have to think more about what happens on a
shared contention medium like Ethernet.  If the network is mostly
busy, then almost every time *I* decide to generate a new packet,
I'll find the network busy, and have to defer my transmission until
end-of-carrier.  Now if we are talking about a "real" network with
a mixture of long (data) and short (acknowledgement) packets, it
is much more likely that I will find the network busy with a long
packet than a short one.  (Trust me, or go look up anybody's measurement
study showing the histograms of % packets versus % bytes sent in packets
of various lengths -- and remember that if you decide to xmit at a random
time, you are equally likely to do so when a random byte is
being sent, rather than a random packet.)  Because of the short packets
in the traffic mix, there is a good chance that more than one station
will decide to transmit a packet during the transmission time for a
long one.  QED, you suffer a collision.

In other words, the only way *not* to have many collisions is for most
of the packets to arrive to the system to find the channel mostly idle,
since it is the deferred packets (common when the network is busy) that
suffer most of the collisions.
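(Here is a back-of-the-envelope version of that argument -- my own assumed Poisson model, not a measurement. If stations decide to transmit at aggregate rate lam while a long packet of duration T holds the channel, a collision at end-of-carrier happens whenever two or more of them queued up behind it, which has probability 1 - exp(-lam*T)*(1 + lam*T):)

```python
import math

# Assumed model: transmission decisions arrive as a Poisson process at
# aggregate rate lam while one long packet of duration T occupies the
# channel.  P(>= 2 stations deferred) = P(collision at end-of-carrier).
def p_collision_at_eoc(lam, T):
    x = lam * T
    return 1.0 - math.exp(-x) * (1.0 + x)

for load in (0.1, 0.5, 1.0):
    print(f"lam*T = {load:3.1f}  ->  P(collision at EOC) = "
          f"{p_collision_at_eoc(load, 1.0):.3f}")
```

Even at modest loads this probability is far from negligible, which is exactly why a busy network generates collisions while a mostly-idle one does not.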

>                                      People may differ on what they
>think the optimal "break point" on the curve is, but everybody seems
>to agree that the slope starts out very small and grows slowly until
>it reaches a break point.

Sorry, it's not that simple.  If it were, analysts like me would have
more respect on the net...  :-(

Mart L. Molle
Computer Systems Research Institute
University of Toronto
Toronto, Canada M5S 1A4
(416)978-4928
