
Packet Classification Using Extended TCAMs

Appears in Proceedings of ICNP, 2003

Ed Spitznagel, David Taylor, Jonathan Turner
Applied Research Laboratory, Washington University, St. Louis, MO
Abstract

TCAMs are the most popular practical method for implementing packet classification in high performance routers. Their principal drawbacks are high power consumption and inefficient representation of filters with port ranges. A recent paper [] showed how partitioned TCAMs can be used to implement IP route lookup with dramatically lower power consumption. We extend the ideas in [] to address the more challenging problem of general packet classification. We describe two extensions to the standard TCAM architecture. The first organizes the TCAM as a two-level hierarchy in which an index block is used to enable or disable the querying of the main storage blocks. The second incorporates circuits for range comparisons directly within the TCAM memory array. Extended TCAMs can deliver high performance (100 million lookups per second) for large filter sets (100,000 filters), while reducing power consumption by a factor of ten and improving space efficiency by a factor of three.

1. Introduction

Packet classification is a key technology for modern high performance routers. Packets received at a router input are classified to determine both the output port the packet should be sent to and what special handling, if any, it should receive. Packet classification can be used to provide expedited forwarding of certain types of packets, to enforce security restrictions or to trigger traffic monitoring. The growing complexity of the Internet is creating new applications for packet classification, placing additional demands on the packet classification subsystem of routers and other packet handling devices.

In the general packet classification problem, packets are classified according to a set of packet filters, which define patterns that are matched against incoming packets.
Typically, packet filters specify possible values of the source and destination address fields of the IP header, the protocol field (often including flags) and the source and destination port numbers (for TCP and UDP). The address fields are often specified as address prefixes. Packet filters also commonly allow arbitrary bit masks on the address fields, and real filter sets use this feature, though relatively infrequently. Filters typically specify a range of port numbers for matching packets. Protocols can be specified either exactly or as a wildcard. Some systems also allow protocol values to be specified by bit masks, although it is not clear how useful that feature is.

This work was supported by the National Science Foundation, ANI and the Defense Advanced Research Projects Agency, Contract N

A small example of a filter set appears in Figure 1. Here, the address fields are shown as four bits rather than 32, to simplify the example. A dash in an address field indicates a bit position where the mask bit is zero. A dash for an entire entry indicates a wildcard, which is matched by any packet. When a packet is received, the filter set is consulted to find the first matching filter in the set, and the packet is then processed according to the specified action. For example, a packet with a source address of 00, a destination address of 00, a protocol field specifying TCP, a source port of 4 and a destination port of 6 would be forwarded to output port 5, since it matches the second filter in the set but not the first. On the other hand, a packet with a source address of and a destination address of 000 would be dropped (regardless of other fields), since the first matching filter is number 6.

Historically, the applications of general packet classification have been limited to relatively low performance systems with relatively small numbers of packet filters.
This has made it possible to match every incoming packet against the ordered list of filters and stop when the first matching filter in the list is encountered.

Figure 1. Example packet filter set (columns: source address, destination address, protocol, source port, destination port, action)

This method does not scale effectively to high performance systems that must process tens of millions of packets per second and that may have much larger filter sets. Most general packet filter sets are currently fairly small: most have a few hundred entries, and very few exceed a few thousand. Their size is, however, expected to grow substantially in the future.

The packet classification problem has been studied extensively in recent years. One early effort [3] proposed a grid-of-tries to look up 2D filters defined on source and destination address. While very effective for 2D filters, it could not be applied directly to larger numbers of dimensions. The tuple-space search technique [4] is another general approach, in which filters are separated into classes based on the number of bits specified in each dimension. This allows a given class to be probed quickly using hashing, but since many classes may have to be probed for a given packet, it does not yield very high performance. The Recursive Flow Classification (RFC) algorithm [GU99a] has received much attention in recent years. RFC trades off memory space against time to achieve faster lookups. While this is a legitimate choice, the space efficiency of RFC can be surprisingly poor: it can use more than a kilobyte per filter, roughly 50 times the memory needed to represent the filter. Another recent algorithm, HiCuts [GU99b], is similarly profligate in its use of memory. The Extended Grid of Tries [2] and HyperCuts [7] algorithms are the first algorithms for the general problem that show some promise of achieving high performance without requiring excessive amounts of memory.
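The baseline mentioned above scans the ordered filter list and stops at the first match. It can be sketched in a few lines of Python. This is an illustrative model only. The `Filter` record, its field layout (address prefixes, an exact-or-wildcard protocol, inclusive port ranges) and the helper names are hypothetical, chosen to mirror the filter fields described in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

ADDR_BITS = 32  # IPv4 addresses

@dataclass(frozen=True)
class Filter:
    """One rule of an ordered filter list (hypothetical field layout)."""
    src: Tuple[int, int]    # (address, prefix length)
    dst: Tuple[int, int]    # (address, prefix length)
    proto: Optional[int]    # exact protocol number, or None for a wildcard
    sport: Tuple[int, int]  # inclusive source port range (lo, hi)
    dport: Tuple[int, int]  # inclusive destination port range (lo, hi)
    action: str

def prefix_match(addr: int, prefix: Tuple[int, int]) -> bool:
    """Compare only the top `length` bits; length 0 matches everything."""
    value, length = prefix
    shift = ADDR_BITS - length
    return (addr >> shift) == (value >> shift)

def matches(f: Filter, src: int, dst: int, proto: int, sport: int, dport: int) -> bool:
    return (prefix_match(src, f.src) and prefix_match(dst, f.dst)
            and (f.proto is None or f.proto == proto)
            and f.sport[0] <= sport <= f.sport[1]
            and f.dport[0] <= dport <= f.dport[1])

def classify_linear(filters: Sequence[Filter], src: int, dst: int,
                    proto: int, sport: int, dport: int) -> Optional[str]:
    """Return the action of the first matching filter, or None if none matches."""
    for f in filters:  # list order is priority order; cost is linear in len(filters)
        if matches(f, src, dst, proto, sport, dport):
            return f.action
    return None

TCP = 6
ANY = (0, 65535)
rules = [
    Filter((0x0A000000, 8), (0, 0), TCP, ANY, (80, 80), "fwd 1"),  # 10.0.0.0/8, TCP to port 80
    Filter((0, 0), (0, 0), None, ANY, ANY, "drop"),                 # catch-all
]
print(classify_linear(rules, 0x0A010203, 0xC0A80001, TCP, 40000, 80))  # fwd 1
```

Every packet may touch every filter, so the per-packet cost grows with the filter set. That is why this approach stops scaling once filter sets and packet rates grow.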
Perhaps the most popular method for packet classification in practice is to use Ternary Content Addressable Memory (TCAM). TCAMs store data patterns in the form of (value, bit mask) pairs. A query word can be compared simultaneously against all the stored patterns. A query word q is said to match a stored pattern (v, m) if q & m = v & m, where the ampersand denotes the bitwise logical AND operation. One bit of TCAM storage can be implemented using 16 transistors [MO94], compared to 6 transistors for a bit of SRAM. This 2.7x penalty (16/6 ≈ 2.7) makes TCAMs less attractive than SRAM-based algorithms that use the same number of bits. However, as discussed earlier, high performance algorithms using SRAM typically use very large amounts of memory per stored filter, which offsets the cost advantage of SRAM.

TCAMs suffer from two other shortcomings in addition to their relatively high cost per bit. First, TCAMs require large amounts of power, more than 100 times the power of a similar amount of SRAM, and they can account for a major part of the power consumption of a router line card. A recent paper [] showed how partitioned TCAMs could significantly reduce TCAM power consumption in IP route lookup. While this is of some interest, the availability of efficient SRAM-based route lookup algorithms [3, 2, 5, 8] limits its impact. In this paper, we explore how similar ideas can be applied to the more difficult problem of general packet classification.

Another significant shortcoming of TCAMs is their inability to efficiently handle filters containing port number ranges. Such filters must be handled using multiple TCAM entries, and in the worst case it may take hundreds of TCAM entries to represent a single filter. We propose an extension to TCAMs that enables them to handle port ranges directly. We argue that the improved handling of port ranges amply compensates for the added implementation cost of this extension.

Figure 2. Effect of ranges on TCAM efficiency (columns: filter set, filters with ranges, TCAM entries, storage efficiency)
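The (value, mask) matching rule q & m = v & m can be modeled in software as below. This is a sequential sketch of what a TCAM does in parallel. The names `ternary_match`, `tcam_lookup` and `from_pattern` are hypothetical, and `from_pattern` reads patterns written with '-' for don't-care bits. We assume the lowest matching index wins, as a typical TCAM priority encoder would select.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int]  # (value, mask); a 0 bit in the mask means "don't care"

def ternary_match(q: int, entry: Entry) -> bool:
    """q matches (v, m) exactly when q & m == v & m."""
    v, m = entry
    return (q & m) == (v & m)

def tcam_lookup(entries: List[Entry], q: int) -> Optional[int]:
    """Index of the first matching entry. Hardware compares all entries at once;
    this model scans in order, which yields the same lowest-index winner."""
    for i, entry in enumerate(entries):
        if ternary_match(q, entry):
            return i
    return None

def from_pattern(pattern: str) -> Entry:
    """Convert a pattern such as '10--' ('-' = don't care) to (value, mask)."""
    value = int(pattern.replace("-", "0"), 2)
    mask = int("".join("0" if c == "-" else "1" for c in pattern), 2)
    return value, mask

table = [from_pattern("10--"), from_pattern("1---"), from_pattern("----")]
print(tcam_lookup(table, 0b1011), tcam_lookup(table, 0b1100), tcam_lookup(table, 0b0110))  # 0 1 2
```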
Section 2 describes the extensions to TCAMs that are needed to enable high performance and cost-effective solutions to the general packet classification problem. Section 3 describes a general algorithm for organizing a filter set in an extended TCAM to enable fast lookup. Section 4 briefly presents the method we use to generate large packet filter sets that reflect the characteristics of the much smaller filter sets typically available to researchers. In Section 5 we present results evaluating the performance of our algorithm under a wide range of conditions. Concluding remarks are provided in Section 6.

2. Extended TCAMs

Ternary CAMs are perhaps the most popular implementation method for packet classification in high performance routers. TCAMs are becoming available in configurations with up to 18 Mbits, roughly half the size of the largest SRAMs. An 18 Mbit TCAM offers enough storage for up to 128K IPv4 filters, which is large enough to meet most near-term needs for general packet classification. As discussed above, TCAMs have two major drawbacks: they consume a large amount of power, and they are inefficient when applied to filters with port number ranges.

Some TCAMs have a feature that allows a query to be applied to a subset of the TCAM entries instead of the entire set. Reference [] has shown how such partitioned TCAMs can provide a lower power solution to the problem of IP lookup. We extend the partitioned TCAM concept. We show that if the filters are organized appropriately in this extended TCAM, a lookup for a single packet can use a limited number of the TCAM blocks rather than the entire TCAM. This reduces power consumption by more than an order of magnitude. To make this strategy effective, the TCAM must have a fairly large number of independent storage blocks. For example, we might organize a 128K-filter TCAM into 512 blocks of 256 filters each.
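As a quick check on the capacity arithmetic, assume an 18 Mbit device where a megabit means 2^20 bits, and one 144-bit TCAM word per IPv4 filter. The last term is the fraction of the 512 storage blocks touched when a lookup searches at most ten of them, ignoring the index block.

```latex
\frac{18 \times 2^{20}\ \text{bits}}{144\ \text{bits per filter}}
  = \frac{18\,874\,368}{144} = 131\,072 = 128\text{K filters},
\qquad 512 \times 256 = 131\,072,
\qquad \frac{10}{512} \approx 2\%.
```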
A lookup algorithm that limited its search to no more than, say, ten blocks would use just a few percent of the power required if every lookup were applied to the entire TCAM.

Our modification to the TCAM architecture adds a special storage block called an index to an ordinary partitioned TCAM. Each word in the index is associated with one of the main storage blocks. Conceptually, a lookup on the modified TCAM consults the index first. Then, for each index word that matches the query word, a lookup is performed on the corresponding storage block. The lookups in the storage blocks are done in parallel, and the address of the first matching entry in each block is returned. To resolve matches across multiple blocks, we associate a priority with each filter, which corresponds to its position in the original ordered list. The priority field of the first matching filter in a block is returned along with the action field of that filter. The priorities are then compared to determine which of the matching filters has the highest priority, and the action field of this filter is returned as the result of the lookup.

The modified TCAM can be pipelined to maintain the same operating frequency as a conventional TCAM. The index lookup is done on the first clock tick, the lookup in the storage blocks on the second, and the priority resolution on the third. An extended TCAM with a clock rate of 100 MHz can therefore perform 100 million lookups per second.

Figure 3. Iterative structure of range check circuit

As mentioned above, a TCAM lookup is performed by comparing a query word against a set of (value, mask) pairs. A word q matches a stored pair (v, m) if q & m = v & m. The (value, mask) matching paradigm works well for matching IP addresses, but is not well-suited to matching port number ranges.
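The three pipeline steps (index lookup, parallel block search, priority resolution) can be modeled in software as follows. This is a behavioral sketch under our own assumptions, not a hardware description. Filters are represented as predicates, and `Block`, `StoredFilter`, `box` and `extended_tcam_lookup` are hypothetical names. A smaller priority value means an earlier position in the original filter list. The function also returns how many storage blocks were enabled, as a rough proxy for power.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Packet = Tuple[int, ...]
Matcher = Callable[[Packet], bool]

def box(*ranges: Tuple[int, int]) -> Matcher:
    """Matcher for a filter given as one inclusive (lo, hi) range per field."""
    return lambda pkt: all(lo <= x <= hi for x, (lo, hi) in zip(pkt, ranges))

@dataclass
class StoredFilter:
    match: Matcher
    priority: int   # position in the original ordered list (smaller wins)
    action: str

@dataclass
class Block:
    index: Matcher               # the index word associated with this storage block
    filters: List[StoredFilter]

def extended_tcam_lookup(blocks: List[Block], pkt: Packet) -> Tuple[Optional[str], int]:
    """Return (action of the winning filter, number of storage blocks searched)."""
    # Step 1: the index lookup enables only blocks whose index filter matches.
    enabled = [b for b in blocks if b.index(pkt)]
    # Step 2: each enabled block reports its best match (done in parallel in hardware;
    # equivalent to "first match" when a block is stored in priority order).
    hits = []
    for b in enabled:
        best = min((f for f in b.filters if f.match(pkt)),
                   key=lambda f: f.priority, default=None)
        if best is not None:
            hits.append(best)
    # Step 3: priority resolution across blocks.
    if not hits:
        return None, len(enabled)
    return min(hits, key=lambda f: f.priority).action, len(enabled)

blocks = [
    Block(box((0, 7), (0, 15)), [StoredFilter(box((0, 3), (0, 15)), 2, "A"),
                                 StoredFilter(box((0, 7), (8, 15)), 5, "B")]),
    Block(box((8, 15), (0, 15)), [StoredFilter(box((8, 15), (0, 3)), 1, "C")]),
    Block(box((0, 15), (0, 15)), [StoredFilter(box((0, 15), (0, 15)), 9, "default")]),
]
print(extended_tcam_lookup(blocks, (2, 9)))  # ('A', 2)
```

Here only two of the three storage blocks are searched for each packet. A good filter organization aims to keep that count small as the number of blocks grows.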
The usual way to handle a port number range in a filter is to replace each filter with several filters, each covering a portion of the desired port range. This requires splitting the range into smaller ranges that can be expressed as (value, mask) pairs. For example, the range 2–10 can be partitioned into the set of patterns 001-, 01--, 100- and 1010, where the dashes denote bit positions where the mask is zero. In general, any sub-range of a k-bit field can be partitioned into at most 2(k−1) such patterns. Since port numbers are 16 bits each, a range in either the source or destination port number field can require as many as 30 distinct TCAM entries.

Figure 4. Range check sub-circuit

The problem becomes much worse if ranges are present in both the source and destination port number fields. In this case, we need a filter for every combination of the sub-ranges for the two fields, so a single packet filter may require 30 × 30 = 900 TCAM entries. In practice, things aren't nearly this bad, but they are still bad enough. Filter sets often use the port range 1024–65,535. This range can be split into just six patterns, but a filter containing it in both the source and destination port number fields still needs 6 × 6 = 36 TCAM entries. Suppose just 10% of the filters in a large filter set contained such port number ranges, and every other filter needed a single entry. The average number of TCAM entries per filter would then be 0.9 × 1 + 0.1 × 36 = 4.5, greatly increasing the effective cost of a TCAM-based solution. (We note that reference [5] describes a more efficient way to represent a set of ranges, but this method cannot be applied to matching multi-dimensional filters in TCAMs.)

To better understand the magnitude of this issue, we studied five real-world filter sets. For each, we determined the minimum number of TCAM entries needed to represent it, assuming port ranges were decomposed in the most efficient way. The results appear in Figure 2, which shows the number of TCAM entries required to represent each of the filter sets and the resulting storage efficiency.
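The decomposition of a range into (value, mask) patterns can be sketched with the standard greedy prefix split. The procedure repeatedly takes the largest aligned power-of-two block that starts at the low end of the remaining range and still fits inside it. This is a common technique shown as an illustration, and the function name `range_to_patterns` is hypothetical. On a 4-bit field it splits 2–10 into 001-, 01--, 100- and 1010. On a 16-bit port field it splits 1024–65,535 into six patterns, while 1–65,534 needs the worst-case 30.

```python
from typing import List

def range_to_patterns(lo: int, hi: int, k: int) -> List[str]:
    """Split the inclusive range [lo, hi] of a k-bit field into ternary patterns
    ('-' marks a masked-out bit), using the greedy aligned-prefix split."""
    assert 0 <= lo <= hi < (1 << k)
    patterns = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lo & -lo is lo's lowest set bit).
        size = lo & -lo if lo else 1 << k
        while lo + size - 1 > hi:      # shrink until the block fits inside [lo, hi]
            size >>= 1
        free = size.bit_length() - 1   # number of wildcard low-order bits
        fixed = format(lo >> free, f"0{k - free}b") if k - free else ""
        patterns.append(fixed + "-" * free)
        lo += size
    return patterns

print(range_to_patterns(2, 10, 4))  # ['001-', '01--', '100-', '1010']
```

A filter with ranges in both port fields needs one TCAM entry per pair of patterns. That is the product of the two list lengths, which is where the 36 and 900 figures come from.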
The storage efficiency ranges from as little as 6% to 53%, with an average of 34%, roughly tripling the effective cost of TCAM-based solutions.

One way to handle port ranges better is to extend the TCAM functionality to incorporate port range comparisons directly in the device. Such a TCAM would store a pair of 16-bit values (lo, hi) for each port number field and include circuitry to compare a query word q against the stored values. Figure 3 shows the iterative structure of the required range check circuit. The circuit consists of a separate stage for each bit, and the comparison proceeds from the most significant bit to the least significant bit. The inter-stage signal g_i is high whenever the value of the high-order bits of q (down to bit i) is numerically larger than the value of the high-order bits of hi. Similarly, the inter-stage signal l_i is high whenever the value of the high-order bits of q is smaller than the value of the high-order bits of lo. The query value is in range if both g_0 and l_0 are low. Figure 4 shows a circuit implementing the required logic for each stage, along with the storage elements for lo and hi.

Figure 5. Search in an extended TCAM

A standard CMOS implementation of this circuit uses 32 transistors, twice as many as the standard TCAM storage cell. However, the impact on the cost of the entire TCAM is much smaller. Consider a 144-bit TCAM word suitable for IPv4, allowing 16 bits for the protocol field including flags plus 32 bits for an action field and priority. In such a word, the two port number fields occupy 32 bits, or 22% of the word. Doubling the transistor count for this portion of the TCAM increases the total number of transistors per word by 22%. While this is a nontrivial increase, it is a far smaller price to pay than the price implied by the inefficient representation of port ranges in standard TCAMs.
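The bit-serial comparison can be modeled as below. This is our own behavioral sketch, not a transcription of the gate-level circuit in Figure 4. Besides g and l it keeps two hypothetical "still equal" flags, which is one straightforward way to realize the stated meaning of g_i and l_i in software. The loop scans from the most significant bit and reports a match when both g and l end up low.

```python
def in_range(q: int, lo: int, hi: int, k: int = 16) -> bool:
    """Model of a k-stage range check, scanning from the MSB to the LSB.

    g: the bits of q seen so far are numerically greater than those of hi.
    l: the bits of q seen so far are numerically smaller than those of lo.
    eq_hi / eq_lo are helper flags (our assumption, not the paper's signals)
    recording that no differing bit has been seen yet.
    """
    g = l = False
    eq_hi = eq_lo = True
    for i in range(k - 1, -1, -1):
        qb, hb, lb = (q >> i) & 1, (hi >> i) & 1, (lo >> i) & 1
        if eq_hi and qb != hb:   # the first bit that differs from hi decides g
            g = qb > hb
            eq_hi = False
        if eq_lo and qb != lb:   # the first bit that differs from lo decides l
            l = qb < lb
            eq_lo = False
    return not g and not l       # in range iff g_0 and l_0 are both low

print(in_range(8080, 1024, 65535), in_range(80, 1024, 65535))  # True False
```

One such check per port field replaces the whole pattern expansion. A filter with ranges in both port fields then occupies a single entry.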
In any application where port number ranges are present in more than a few percent of the filters, the added cost is easily justified. We use the term extended TCAM to refer to a TCAM with both of the modifications described above.

3. Classifying Packets with Extended TCAMs

To use extended TCAMs for packet classification, we need to partition the filter set into storage blocks and then associate each storage block with an appropriate index filter. The extended TCAM search first identifies all matching index filters and then queries the storage blocks associated with those index filters. Figure 5 illustrates this process. It shows a set of two-dimensional filters on four-bit fields, one field defined using ranges and one using bit masks. It also shows how those filters are organized into TCAM blocks, with an index block to the left. The figure does not show the priority and action fields. To perform a lookup on a packet with field values (2,0), we first check the index block and discover that the second and fourth index filters match the packet. Searching the second block, we find the matching filter (-2, xx), and searching the fourth block, we find the matching filter (0-4, 00). In this example, the TCAM blocks are large enough for just four filters each, but a realistic implementation of the method would use TCAMs with blocks capable of storing hundreds of filters. Note also that the index block need not have the same number of entries as the storage blocks.

The key to making the search power-efficient is to organize the filters so that only a few TCAM blocks must be searched to find the desired matching filter for a given packet. We define the problem of organizing the filters precisely below, but first we introduce the following definition.

Definition. Let f1 and f2 be filters defined on the same multidimensional space. We say that f1 covers f2 if the region of the space defined by f1 completely contains the region defined by f2.
Similarly, we say a set of filters F covers a filter f if the region defined by the union of the filters in F completely contains the region defined by f.

Filter Grouping Problem. Given a set F of filters and integers k, m and r, find a set S of at most m filters and a bipartite graph G = (V, E) with V = F ∪ S and E ⊆ F × S that satisfy the following conditions:
(1) for every f in F, the neighbors of f in G cover f;
(2) for every s in S, the degree of s in G is at most k;
(3) no point in the multidimensional space on which the filters are defined is covered by more than r members of S.

S defines the set of index filters. The graph specifies the assignment of original filters to index filters and their associated storage blocks. The degree of a vertex for an index filter equals the number of filters in the storage block associated with that index filter. The bound k on the degree limits the number of filters per block. The bound m on the size of S limits the number of index filters, and hence the number of TCAM blocks needed to hold the index. The bound r on the number of index filters covering any point in the space limits the number of TCAM storage blocks that must be searched (in addition to the index). The problem can be converted into an optimization problem by minimizing any one of the three parameters while treating the other two as bounds.

We use a heuristic filter grouping algorithm to organize the filters. The algorithm proceeds in a series of phases. Each phase recursively divides the multidimensional space into ever smaller regions, so each phase produces a separate partition of the space. During each step in a phase, a region of the space is selected and divided into two parts with approximately the same number of filters. The algorithm returns a set S of index filters and, for each index filter, a subset of the original filter set.
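The three conditions of the Filter Grouping Problem can be checked by brute force on a small space, such as two four-bit fields. The sketch below assumes every filter and index filter is a box, with one inclusive range per dimension. This covers prefixes but not arbitrary bit masks. The function `check_grouping`, its arguments and the example data are hypothetical. The first condition is tested point by point, so a filter straddling two index regions passes only when it is assigned to both.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Box = Tuple[Tuple[int, int], ...]  # one inclusive (lo, hi) range per dimension

def contains(box: Box, point: Sequence[int]) -> bool:
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, box))

def points(box: Box) -> Iterator[Tuple[int, ...]]:
    return product(*(range(lo, hi + 1) for lo, hi in box))

def check_grouping(filters: Dict[str, Box], index: Dict[str, Box],
                   edges: Dict[str, List[str]], k: int, m: int, r: int,
                   space: Box) -> bool:
    """Brute-force test of the Filter Grouping conditions (small spaces only).

    edges maps each original filter to the index filters (storage blocks)
    it is assigned to, i.e. its neighbours in the bipartite graph G.
    """
    if len(index) > m:                      # at most m index filters
        return False
    for name, f in filters.items():         # the neighbours of f must cover f
        nbrs = [index[s] for s in edges.get(name, [])]
        if not all(any(contains(s, p) for s in nbrs) for p in points(f)):
            return False
    degree = {s: 0 for s in index}          # number of filters in each block
    for targets in edges.values():
        for s in targets:
            degree[s] += 1
    if any(d > k for d in degree.values()):
        return False
    # no point of the space lies in more than r index filters
    return all(sum(contains(s, p) for s in index.values()) <= r
               for p in points(space))

SPACE: Box = ((0, 15), (0, 15))  # two four-bit fields
filters = {"a": ((0, 3), (0, 15)), "b": ((8, 15), (0, 3)), "c": ((6, 9), (4, 7))}
index = {"s1": ((0, 7), (0, 15)), "s2": ((8, 15), (0, 15))}
edges = {"a": ["s1"], "b": ["s2"], "c": ["s1", "s2"]}  # c straddles the cut
print(check_grouping(filters, index, edges, k=2, m=2, r=1, space=SPACE))  # True
```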
In all but the last phase, each sub-region created in that phase is associated with the set of filters contained entirely within it. These filters are assigned to this enclosing sub-region and are then ignored in later phases. The last phase also partitions the space, but some of the filters that remain at this stage may not fall entirely within any of the sub-regions. Such filters are assigned to every sub-region created in the last phase that they intersect, so multiple TCAM blocks will contain copies of these filters.

A basic operation of the algorithm is to cut a region r of the multidimensional space into two sub-regions r1 and r2 along one of its dimensions. We represent each region by an index filter. To cut a filter along a bit-mask dimensi