Merge branch 'oz-trie-table'

This commit is contained in:
Ondrej Zajicek (work) 2022-02-06 23:32:15 +01:00
commit 53a2540687
23 changed files with 2986 additions and 399 deletions

View file

@ -252,16 +252,9 @@ The global best route selection algorithm is (roughly) as follows:
</itemize>
<p><label id="dsc-table-sorted">Usually, a routing table just chooses a selected
route from a list of entries for one network. But if the <cf/sorted/ option is
activated, these lists of entries are kept completely sorted (according to
preference or some protocol-dependent metric). This is needed for some features
of some protocols (e.g. <cf/secondary/ option of BGP protocol, which allows to
accept not just a selected route, but the first route (in the sorted list) that
is accepted by filters), but it is incompatible with some other features (e.g.
<cf/deterministic med/ option of BGP protocol, which activates a way of choosing
selected route that cannot be described using comparison and ordering). Minor
advantage is that routes are shown sorted in <cf/show route/, minor disadvantage
is that it is slightly more computationally expensive.
route from a list of entries for one network. Optionally, these lists of entries
are kept completely sorted (according to preference or some protocol-dependent
metric). See <ref id="rtable-sorted" name="sorted"> table option for details.
<sect>Routes and network types
<label id="routes">
@ -628,18 +621,73 @@ include "tablename.conf";;
<cf/protocol/ times, and the <cf/iso long ms/ format for <cf/base/ and
<cf/log/ times.
<tag><label id="opt-table"><m/nettype/ table <m/name/ [sorted]</tag>
Create a new routing table. The default routing tables <cf/master4/ and
<cf/master6/ are created implicitly, other routing tables have to be
added by this command. Option <cf/sorted/ can be used to enable sorting
of routes, see <ref id="dsc-table-sorted" name="sorted table">
description for details.
<tag><label id="opt-table"><m/nettype/ table <m/name/ [ { <m/option/; [<m/.../] } ]</tag>
Define a new routing table. The default routing tables <cf/master4/ and
<cf/master6/ are defined implicitly, other routing tables have to be
defined by this option. See the <ref id="rtable-opts"
name="routing table configuration section"> for routing table options.
<tag><label id="opt-eval">eval <m/expr/</tag>
Evaluates given filter expression. It is used by the developers for testing of filters.
</descrip>
<sect>Routing table options
<label id="rtable-opts">
<p>Most routing tables do not need any options and are defined without an option
block, but there are still some options to tweak routing table behavior. Note
that implicit tables (<cf/master4/ and <cf/master6/) can be redefined in order
to set options.
<descrip>
<tag><label id="rtable-sorted">sorted <m/switch/</tag>
Usually, a routing table just chooses the selected (best) route from a
list of routes for each network, while keeping remaining routes unsorted.
If enabled, these lists of routes are kept completely sorted (according
to preference or some protocol-dependent metric).
This is needed for some protocol features (e.g. <cf/secondary/ option of
BGP protocol, which allows to accept not just a selected route, but the
first route (in the sorted list) that is accepted by filters), but it is
incompatible with some other features (e.g. <cf/deterministic med/
option of BGP protocol, which activates a way of choosing selected route
that cannot be described using comparison and ordering). Minor advantage
is that routes are shown sorted in <cf/show route/, minor disadvantage
is that it is slightly more computationally expensive. Default: off.
<tag><label id="rtable-trie">trie <m/switch/</tag>
BIRD routing tables are implemented with hash tables, which is efficient
for exact-match lookups, but inconvenient for longest-match lookups or
interval lookups (finding superprefix or subprefixes). This option
activates additional trie structure that is used to accelerate these
lookups, while using the hash table for exact-match lookups.
This has advantage for <ref id="rpki" name="RPKI"> (on ROA tables),
for <ref id="bgp-gateway" name="recursive next-hops"> (on IGP tables),
and is required for <ref id="bgp-validate" name="flowspec validation">
(on base IP tables). Another advantage is that interval results (like
from <cf/show route in .../ command) are lexicographically sorted. The
disadvantage is that trie-enabled routing tables require more memory,
which may be an issue especially in multi-table setups. Default: off.
<tag><label id="rtable-min-settle-time">min settle time <m/time/</tag>
Specify a minimum value of the settle time. When a ROA table changes,
automatic <ref id="proto-rpki-reload" name="RPKI reload"> may be
triggered, after a short settle time. Minimum settle time is a delay
from the last ROA table change to wait for more updates. Default: 1 s.
<tag><label id="rtable-max-settle-time">max settle time <m/time/</tag>
Specify a maximum value of the settle time. When a ROA table changes,
automatic <ref id="proto-rpki-reload" name="RPKI reload"> may be
triggered, after a short settle time. Maximum settle time is an upper
limit to the settle time from the initial ROA table change even if
there are consecutive updates gradually renewing the settle time.
Default: 20 s.
</descrip>
<sect>Protocol options
<label id="protocol-opts">
@ -2290,6 +2338,7 @@ avoid routing loops.
<item> <rfc id="8092"> - BGP Large Communities Attribute
<item> <rfc id="8203"> - BGP Administrative Shutdown Communication
<item> <rfc id="8212"> - Default EBGP Route Propagation Behavior without Policies
<item> <rfc id="9117"> - Revised Validation Procedure for BGP Flow Specifications
</itemize>
<sect1>Route selection rules
@ -2674,7 +2723,7 @@ using the following configuration parameters:
<tag><label id="bgp-error-wait-time">error wait time <m/number/,<m/number/</tag>
Minimum and maximum delay in seconds between a protocol failure (either
local or reported by the peer) and automatic restart. Doesn't apply
local or reported by the peer) and automatic restart. Doesn not apply
when <cf/disable after error/ is configured. If consecutive errors
happen, the delay is increased exponentially until it reaches the
maximum. Default: 60, 300.
@ -2852,6 +2901,31 @@ be used in explicit configuration.
explicitly (to conserve memory). This option requires that the connected
routing table is <ref id="dsc-table-sorted" name="sorted">. Default: off.
<tag><label id="bgp-validate">validate <m/switch/</tag>
Apply flowspec validation procedure as described in <rfc id="8955">
section 6 and <rfc id="9117">. The Validation procedure enforces that
only routers in the forwarding path for a network can originate flowspec
rules for that network. The validation procedure should be used for EBGP
to prevent injection of malicious flowspec rules from outside, but it
should also be used for IBGP to ensure that selected flowspec rules are
consistent with selected IP routes. The validation procedure uses an IP
routing table (<ref id="bgp-base-table" name="base table">, see below)
against which flowspec rules are validated. This option is limited to
flowspec channels. Default: off (for compatibility reasons).
Note that currently the flowspec validation does not work reliably
together with <ref id="bgp-import-table" name="import table"> option
enabled on flowspec channels.
<tag><label id="bgp-base-table">base table <m/name/</tag>
Specifies an IP table used for the flowspec validation procedure. The
table must have enabled <cf/trie/ option, otherwise the validation
procedure would not work. The type of the table must be <cf/ipv4/ for
<cf/flow4/ channels and <cf/ipv6/ for <cf/flow6/ channels. This option
is limited to flowspec channels. Default: the main table of the
<cf/ipv4/ / <cf/ipv6/ channel of the same BGP instance, or the
<cf/master4/ / <cf/master6/ table if there is no such channel.
<tag><label id="bgp-extended-next-hop">extended next hop <m/switch/</tag>
BGP expects that announced next hops have the same address family as
associated network prefixes. This option provides an extension to use

View file

@ -140,18 +140,23 @@ struct f_tree {
void *data;
};
#define TRIE_STEP 4
#define TRIE_STACK_LENGTH 33
struct f_trie_node4
{
ip4_addr addr, mask, accept;
uint plen;
struct f_trie_node4 *c[2];
u16 plen;
u16 local;
struct f_trie_node4 *c[1 << TRIE_STEP];
};
struct f_trie_node6
{
ip6_addr addr, mask, accept;
uint plen;
struct f_trie_node6 *c[2];
u16 plen;
u16 local;
struct f_trie_node6 *c[1 << TRIE_STEP];
};
struct f_trie_node
@ -168,9 +173,20 @@ struct f_trie
u8 zero;
s8 ipv4; /* -1 for undefined / empty */
u16 data_size; /* Additional data for each trie node */
u32 prefix_count; /* Works only for restricted tries (pxlen == l == h) */
struct f_trie_node root; /* Root trie node */
};
struct f_trie_walk_state
{
u8 ipv4;
u8 accept_length; /* Current inter-node prefix position */
u8 start_pos; /* Initial prefix position in stack[0] */
u8 local_pos; /* Current intra-node prefix position */
u8 stack_pos; /* Current node in stack below */
const struct f_trie_node *stack[TRIE_STACK_LENGTH];
};
struct f_tree *f_new_tree(void);
struct f_tree *build_tree(struct f_tree *);
const struct f_tree *find_tree(const struct f_tree *t, const struct f_val *val);
@ -181,9 +197,70 @@ void tree_walk(const struct f_tree *t, void (*hook)(const struct f_tree *, void
struct f_trie *f_new_trie(linpool *lp, uint data_size);
void *trie_add_prefix(struct f_trie *t, const net_addr *n, uint l, uint h);
int trie_match_net(const struct f_trie *t, const net_addr *n);
int trie_match_longest_ip4(const struct f_trie *t, const net_addr_ip4 *net, net_addr_ip4 *dst, ip4_addr *found0);
int trie_match_longest_ip6(const struct f_trie *t, const net_addr_ip6 *net, net_addr_ip6 *dst, ip6_addr *found0);
void trie_walk_init(struct f_trie_walk_state *s, const struct f_trie *t, const net_addr *from);
int trie_walk_next(struct f_trie_walk_state *s, net_addr *net);
int trie_same(const struct f_trie *t1, const struct f_trie *t2);
void trie_format(const struct f_trie *t, buffer *buf);
static inline int
trie_match_next_longest_ip4(net_addr_ip4 *n, ip4_addr *found)
{
while (n->pxlen)
{
n->pxlen--;
ip4_clrbit(&n->prefix, n->pxlen);
if (ip4_getbit(*found, n->pxlen))
return 1;
}
return 0;
}
static inline int
trie_match_next_longest_ip6(net_addr_ip6 *n, ip6_addr *found)
{
while (n->pxlen)
{
n->pxlen--;
ip6_clrbit(&n->prefix, n->pxlen);
if (ip6_getbit(*found, n->pxlen))
return 1;
}
return 0;
}
#define TRIE_WALK_TO_ROOT_IP4(trie, net, dst) ({ \
net_addr_ip4 dst; \
ip4_addr _found; \
for (int _n = trie_match_longest_ip4(trie, net, &dst, &_found); \
_n; \
_n = trie_match_next_longest_ip4(&dst, &_found))
#define TRIE_WALK_TO_ROOT_IP6(trie, net, dst) ({ \
net_addr_ip6 dst; \
ip6_addr _found; \
for (int _n = trie_match_longest_ip6(trie, net, &dst, &_found); \
_n; \
_n = trie_match_next_longest_ip6(&dst, &_found))
#define TRIE_WALK_TO_ROOT_END })
#define TRIE_WALK(trie, net, from) ({ \
net_addr net; \
struct f_trie_walk_state tws_; \
trie_walk_init(&tws_, trie, from); \
while (trie_walk_next(&tws_, &net))
#define TRIE_WALK_END })
#define F_CMP_ERROR 999
const char *f_type_name(enum f_type t);

View file

@ -499,6 +499,33 @@ prefix set pxs;
bt_assert(1.2.0.0/16 ~ [ 1.0.0.0/8{ 15 , 17 } ]);
bt_assert([ 10.0.0.0/8{ 15 , 17 } ] != [ 11.0.0.0/8{ 15 , 17 } ]);
/* Formatting of prefix sets, some cases are a bit strange */
bt_assert(format([ 0.0.0.0/0 ]) = "[0.0.0.0/0]");
bt_assert(format([ 10.10.0.0/32 ]) = "[10.10.0.0/32{0.0.0.1}]");
bt_assert(format([ 10.10.0.0/17 ]) = "[10.10.0.0/17{0.0.128.0}]");
bt_assert(format([ 10.10.0.0/17{17,19} ]) = "[10.10.0.0/17{0.0.224.0}]"); # 224 = 128+64+32
bt_assert(format([ 10.10.128.0/17{18,19} ]) = "[10.10.128.0/18{0.0.96.0}, 10.10.192.0/18{0.0.96.0}]"); # 96 = 64+32
bt_assert(format([ 10.10.64.0/18- ]) = "[0.0.0.0/0, 0.0.0.0/1{128.0.0.0}, 0.0.0.0/2{64.0.0.0}, 0.0.0.0/3{32.0.0.0}, 10.10.0.0/16{255.255.0.0}, 10.10.0.0/17{0.0.128.0}, 10.10.64.0/18{0.0.64.0}]");
bt_assert(format([ 10.10.64.0/18+ ]) = "[10.10.64.0/18{0.0.96.0}, 10.10.64.0/20{0.0.31.255}, 10.10.80.0/20{0.0.31.255}, 10.10.96.0/20{0.0.31.255}, 10.10.112.0/20{0.0.31.255}]");
bt_assert(format([ 10.10.160.0/19 ]) = "[10.10.160.0/19{0.0.32.0}]");
bt_assert(format([ 10.10.160.0/19{19,22} ]) = "[10.10.160.0/19{0.0.32.0}, 10.10.160.0/20{0.0.28.0}, 10.10.176.0/20{0.0.28.0}]"); # 28 = 16+8+4
bt_assert(format([ 10.10.160.0/19+ ]) = "[10.10.160.0/19{0.0.32.0}, 10.10.160.0/20{0.0.31.255}, 10.10.176.0/20{0.0.31.255}]");
bt_assert(format([ ::/0 ]) = "[::/0]");
bt_assert(format([ 11:22:33:44:55:66:77:88/128 ]) = "[11:22:33:44:55:66:77:88/128{::1}]");
bt_assert(format([ 11:22:33:44::/64 ]) = "[11:22:33:44::/64{0:0:0:1::}]");
bt_assert(format([ 11:22:33:44::/64+ ]) = "[11:22:33:44::/64{::1:ffff:ffff:ffff:ffff}]");
bt_assert(format([ 11:22:33:44::/65 ]) = "[11:22:33:44::/65{::8000:0:0:0}]");
bt_assert(format([ 11:22:33:44::/65{65,67} ]) = "[11:22:33:44::/65{::e000:0:0:0}]"); # e = 8+4+2
bt_assert(format([ 11:22:33:44:8000::/65{66,67} ]) = "[11:22:33:44:8000::/66{::6000:0:0:0}, 11:22:33:44:c000::/66{::6000:0:0:0}]"); # 6 = 4+2
bt_assert(format([ 11:22:33:44:4000::/66- ]) = "[::/0, ::/1{8000::}, ::/2{4000::}, ::/3{2000::}, 11:22:33:44::/64{ffff:ffff:ffff:ffff::}, 11:22:33:44::/65{::8000:0:0:0}, 11:22:33:44:4000::/66{::4000:0:0:0}]");
bt_assert(format([ 11:22:33:44:4000::/66+ ]) = "[11:22:33:44:4000::/66{::6000:0:0:0}, 11:22:33:44:4000::/68{::1fff:ffff:ffff:ffff}, 11:22:33:44:5000::/68{::1fff:ffff:ffff:ffff}, 11:22:33:44:6000::/68{::1fff:ffff:ffff:ffff}, 11:22:33:44:7000::/68{::1fff:ffff:ffff:ffff}]");
bt_assert(format([ 11:22:33:44:c000::/67 ]) = "[11:22:33:44:c000::/67{::2000:0:0:0}]");
bt_assert(format([ 11:22:33:44:c000::/67{67,71} ]) = "[11:22:33:44:c000::/67{::2000:0:0:0}, 11:22:33:44:c000::/68{::1e00:0:0:0}, 11:22:33:44:d000::/68{::1e00:0:0:0}]");
bt_assert(format([ 11:22:33:44:c000::/67+ ]) = "[11:22:33:44:c000::/67{::2000:0:0:0}, 11:22:33:44:c000::/68{::1fff:ffff:ffff:ffff}, 11:22:33:44:d000::/68{::1fff:ffff:ffff:ffff}]");
}
bt_test_suite(t_prefix_set, "Testing prefix sets");
@ -578,6 +605,12 @@ prefix set pxs;
bt_assert(2000::/29 !~ pxs);
bt_assert(1100::/10 !~ pxs);
bt_assert(2010::/26 !~ pxs);
pxs = [ 52E0::/13{13,128} ];
bt_assert(52E7:BE81:379B:E6FD:541F:B0D0::/93 ~ pxs);
pxs = [ 41D8:8718::/30{0,30}, 413A:99A8:6C00::/38{38,128} ];
bt_assert(4180::/9 ~ pxs);
}
bt_test_suite(t_prefix6_set, "Testing prefix IPv6 sets");

File diff suppressed because it is too large Load diff

View file

@ -14,9 +14,12 @@
#include "conf/conf.h"
#define TESTS_NUM 10
#define PREFIXES_NUM 10
#define PREFIXES_NUM 32
#define PREFIX_TESTS_NUM 10000
#define PREFIX_BENCH_NUM 100000000
#define TRIE_BUFFER_SIZE 1024
#define TEST_BUFFER_SIZE (1024*1024)
#define BIG_BUFFER_SIZE 10000
/* Wrapping structure for storing f_prefixes structures in list */
@ -31,146 +34,860 @@ xrandom(u32 max)
return (bt_random() % max);
}
static inline uint
get_exp_random(void)
{
uint r, n = 0;
for (r = bt_random(); r & 1; r = r >> 1)
n++;
return n;
}
static int
is_prefix_included(list *prefixes, struct f_prefix *needle)
compare_prefixes(const void *a, const void *b)
{
return net_compare(&((const struct f_prefix *) a)->net,
&((const struct f_prefix *) b)->net);
}
static inline int
matching_ip4_nets(const net_addr_ip4 *a, const net_addr_ip4 *b)
{
ip4_addr cmask = ip4_mkmask(MIN(a->pxlen, b->pxlen));
return ip4_compare(ip4_and(a->prefix, cmask), ip4_and(b->prefix, cmask)) == 0;
}
static inline int
matching_ip6_nets(const net_addr_ip6 *a, const net_addr_ip6 *b)
{
ip6_addr cmask = ip6_mkmask(MIN(a->pxlen, b->pxlen));
return ip6_compare(ip6_and(a->prefix, cmask), ip6_and(b->prefix, cmask)) == 0;
}
static inline int
matching_nets(const net_addr *a, const net_addr *b)
{
if (a->type != b->type)
return 0;
return (a->type == NET_IP4) ?
matching_ip4_nets((const net_addr_ip4 *) a, (const net_addr_ip4 *) b) :
matching_ip6_nets((const net_addr_ip6 *) a, (const net_addr_ip6 *) b);
}
static int
is_prefix_included(list *prefixes, const net_addr *needle)
{
struct f_prefix_node *n;
WALK_LIST(n, *prefixes)
if (matching_nets(&n->prefix.net, needle) &&
(n->prefix.lo <= needle->pxlen) && (needle->pxlen <= n->prefix.hi))
{
ip6_addr cmask = ip6_mkmask(MIN(n->prefix.net.pxlen, needle->net.pxlen));
char buf[64];
bt_format_net(buf, 64, &n->prefix.net);
bt_debug("FOUND %s %d-%d\n", buf, n->prefix.lo, n->prefix.hi);
ip6_addr ip = net6_prefix(&n->prefix.net);
ip6_addr needle_ip = net6_prefix(&needle->net);
if ((ipa_compare(ipa_and(ip, cmask), ipa_and(needle_ip, cmask)) == 0) &&
(n->prefix.lo <= needle->net.pxlen) && (needle->net.pxlen <= n->prefix.hi))
{
bt_debug("FOUND\t" PRIip6 "/%d %d-%d\n", ARGip6(net6_prefix(&n->prefix.net)), n->prefix.net.pxlen, n->prefix.lo, n->prefix.hi);
return 1; /* OK */
}
}
return 0; /* FAIL */
}
static struct f_prefix
get_random_ip6_prefix(void)
static void
get_random_net(net_addr *net, int v6)
{
struct f_prefix p;
u8 pxlen = xrandom(120)+8;
ip6_addr ip6 = ip6_build(bt_random(),bt_random(),bt_random(),bt_random());
net_addr_ip6 net6 = NET_ADDR_IP6(ip6, pxlen);
p.net = *((net_addr*) &net6);
if (bt_random() % 2)
if (!v6)
{
p.lo = 0;
p.hi = p.net.pxlen;
uint pxlen = xrandom(24)+8;
ip4_addr ip4 = ip4_from_u32((u32) bt_random());
net_fill_ip4(net, ip4_and(ip4, ip4_mkmask(pxlen)), pxlen);
}
else
{
p.lo = p.net.pxlen;
p.hi = net_max_prefix_length[p.net.type];
uint pxlen = xrandom(120)+8;
ip6_addr ip6 = ip6_build(bt_random(), bt_random(), bt_random(), bt_random());
net_fill_ip6(net, ip6_and(ip6, ip6_mkmask(pxlen)), pxlen);
}
return p;
}
static void
generate_random_ipv6_prefixes(list *prefixes)
get_random_prefix(struct f_prefix *px, int v6, int tight)
{
int i;
for (i = 0; i < PREFIXES_NUM; i++)
get_random_net(&px->net, v6);
if (tight)
{
struct f_prefix f = get_random_ip6_prefix();
struct f_prefix_node *px = calloc(1, sizeof(struct f_prefix_node));
px->prefix = f;
bt_debug("ADD\t" PRIip6 "/%d %d-%d\n", ARGip6(net6_prefix(&px->prefix.net)), px->prefix.net.pxlen, px->prefix.lo, px->prefix.hi);
add_tail(prefixes, &px->n);
px->lo = px->hi = px->net.pxlen;
}
else if (bt_random() % 2)
{
px->lo = 0;
px->hi = px->net.pxlen;
}
else
{
px->lo = px->net.pxlen;
px->hi = net_max_prefix_length[px->net.type];
}
}
static void
get_random_ip4_subnet(net_addr_ip4 *net, const net_addr_ip4 *src, int pxlen)
{
*net = NET_ADDR_IP4(ip4_and(src->prefix, ip4_mkmask(pxlen)), pxlen);
if (pxlen > src->pxlen)
{
ip4_addr rnd = ip4_from_u32((u32) bt_random());
ip4_addr mask = ip4_xor(ip4_mkmask(src->pxlen), ip4_mkmask(pxlen));
net->prefix = ip4_or(net->prefix, ip4_and(rnd, mask));
}
}
static void
get_random_ip6_subnet(net_addr_ip6 *net, const net_addr_ip6 *src, int pxlen)
{
*net = NET_ADDR_IP6(ip6_and(src->prefix, ip6_mkmask(pxlen)), pxlen);
if (pxlen > src->pxlen)
{
ip6_addr rnd = ip6_build(bt_random(), bt_random(), bt_random(), bt_random());
ip6_addr mask = ip6_xor(ip6_mkmask(src->pxlen), ip6_mkmask(pxlen));
net->prefix = ip6_or(net->prefix, ip6_and(rnd, mask));
}
}
static void
get_random_subnet(net_addr *net, const net_addr *src, int pxlen)
{
if (src->type == NET_IP4)
get_random_ip4_subnet((net_addr_ip4 *) net, (const net_addr_ip4 *) src, pxlen);
else
get_random_ip6_subnet((net_addr_ip6 *) net, (const net_addr_ip6 *) src, pxlen);
}
static void
get_inner_net(net_addr *net, const struct f_prefix *src)
{
int pxlen, step;
if (bt_random() % 2)
{
step = get_exp_random();
step = MIN(step, src->hi - src->lo);
pxlen = (bt_random() % 2) ? (src->lo + step) : (src->hi - step);
}
else
pxlen = src->lo + bt_random() % (src->hi - src->lo + 1);
get_random_subnet(net, &src->net, pxlen);
}
static void
swap_random_bits_ip4(net_addr_ip4 *net, int num)
{
for (int i = 0; i < num; i++)
{
ip4_addr swap = IP4_NONE;
ip4_setbit(&swap, bt_random() % net->pxlen);
net->prefix = ip4_xor(net->prefix, swap);
}
}
static void
swap_random_bits_ip6(net_addr_ip6 *net, int num)
{
for (int i = 0; i < num; i++)
{
ip6_addr swap = IP6_NONE;
ip6_setbit(&swap, bt_random() % net->pxlen);
net->prefix = ip6_xor(net->prefix, swap);
}
}
static void
swap_random_bits(net_addr *net, int num)
{
if (net->type == NET_IP4)
swap_random_bits_ip4((net_addr_ip4 *) net, num);
else
swap_random_bits_ip6((net_addr_ip6 *) net, num);
}
static void
get_outer_net(net_addr *net, const struct f_prefix *src)
{
int pxlen, step;
int inside = 0;
int max = net_max_prefix_length[src->net.type];
if ((src->lo > 0) && (bt_random() % 3))
{
step = 1 + get_exp_random();
step = MIN(step, src->lo);
pxlen = src->lo - step;
}
else if ((src->hi < max) && (bt_random() % 2))
{
step = 1 + get_exp_random();
step = MIN(step, max - src->hi);
pxlen = src->hi + step;
}
else
{
pxlen = src->lo + bt_random() % (src->hi - src->lo + 1);
inside = 1;
}
get_random_subnet(net, &src->net, pxlen);
/* Perhaps swap some bits in prefix */
if ((net->pxlen > 0) && (inside || (bt_random() % 4)))
swap_random_bits(net, 1 + get_exp_random());
}
static list *
make_random_prefix_list(linpool *lp, int num, int v6, int tight)
{
list *prefixes = lp_allocz(lp, sizeof(struct f_prefix_node));
init_list(prefixes);
for (int i = 0; i < num; i++)
{
struct f_prefix_node *px = lp_allocz(lp, sizeof(struct f_prefix_node));
get_random_prefix(&px->prefix, v6, tight);
add_tail(prefixes, &px->n);
char buf[64];
bt_format_net(buf, 64, &px->prefix.net);
bt_debug("ADD %s{%d,%d}\n", buf, px->prefix.lo, px->prefix.hi);
}
return prefixes;
}
static struct f_trie *
make_trie_from_prefix_list(linpool *lp, list *prefixes)
{
struct f_trie *trie = f_new_trie(lp, 0);
struct f_prefix_node *n;
WALK_LIST(n, *prefixes)
trie_add_prefix(trie, &n->prefix.net, n->prefix.lo, n->prefix.hi);
return trie;
}
/*
* Read sequence of prefixes from file handle and return prefix list.
* Each prefix is on one line, sequence terminated by empty line or eof.
* Arg @plus means prefix should include all longer ones.
*/
static list *
read_prefix_list(linpool *lp, FILE *f, int v6, int plus)
{
ASSERT(!v6);
uint a0, a1, a2, a3, pl;
char s[32];
int n;
list *pxlist = lp_allocz(lp, sizeof(struct f_prefix_node));
init_list(pxlist);
errno = 0;
while (fgets(s, 32, f))
{
if (s[0] == '\n')
return pxlist;
n = sscanf(s, "%u.%u.%u.%u/%u", &a0, &a1, &a2, &a3, &pl);
if (n != 5)
bt_abort_msg("Invalid content of trie_data");
struct f_prefix_node *px = lp_allocz(lp, sizeof(struct f_prefix_node));
net_fill_ip4(&px->prefix.net, ip4_build(a0, a1, a2, a3), pl);
px->prefix.lo = pl;
px->prefix.hi = plus ? IP4_MAX_PREFIX_LENGTH : pl;
add_tail(pxlist, &px->n);
char buf[64];
bt_format_net(buf, 64, &px->prefix.net);
bt_debug("ADD %s{%d,%d}\n", buf, px->prefix.lo, px->prefix.hi);
}
bt_syscall(errno, "fgets()");
return EMPTY_LIST(*pxlist) ? NULL : pxlist;
}
/*
* Open file, read multiple sequences of prefixes from it. Fill @data with
* prefix lists and @trie with generated tries. Return number of sequences /
* tries. Use separate linpool @lp0 for prefix lists and @lp1 for tries.
* Arg @plus means prefix should include all longer ones.
*/
static int
read_prefix_file(const char *filename, int plus,
linpool *lp0, linpool *lp1,
list *data[], struct f_trie *trie[])
{
FILE *f = fopen(filename, "r");
bt_syscall(!f, "fopen(%s)", filename);
int n = 0;
list *pxlist;
while (pxlist = read_prefix_list(lp0, f, 0, plus))
{
data[n] = pxlist;
trie[n] = make_trie_from_prefix_list(lp1, pxlist);
bt_debug("NEXT\n");
n++;
}
fclose(f);
bt_debug("DONE reading %d tries\n", n);
return n;
}
/*
* Select random subset of @dn prefixes from prefix list @src of length @sn,
* and store them to buffer @dst (of size @dn). Prefixes may be chosen multiple
* times. Randomize order of prefixes in @dst buffer.
*/
static void
select_random_prefix_subset(list *src[], net_addr dst[], int sn, int dn)
{
int pn = 0;
if (!dn)
return;
/* Compute total prefix number */
for (int i = 0; i < sn; i++)
pn += list_length(src[i]);
/* Change of selecting a prefix */
int rnd = (pn / dn) + 10;
int n = 0;
/* Iterate indefinitely over src array */
for (int i = 0; 1; i++, i = (i < sn) ? i : 0)
{
struct f_prefix_node *px;
WALK_LIST(px, *src[i])
{
if (xrandom(rnd) != 0)
continue;
net_copy(&dst[n], &px->prefix.net);
n++;
/* We have enough */
if (n == dn)
goto done;
}
}
done:
/* Shuffle networks */
for (int i = 0; i < dn; i++)
{
int j = xrandom(dn);
if (i == j)
continue;
net_addr tmp;
net_copy(&tmp, &dst[i]);
net_copy(&dst[i], &dst[j]);
net_copy(&dst[j], &tmp);
}
}
/* Fill @dst buffer with @dn randomly generated /32 prefixes */
static void
make_random_addresses(net_addr dst[], int dn)
{
for (int i = 0; i < dn; i++)
net_fill_ip4(&dst[i], ip4_from_u32((u32) bt_random()), IP4_MAX_PREFIX_LENGTH);
}
static void
test_match_net(list *prefixes, struct f_trie *trie, const net_addr *net)
{
char buf[64];
bt_format_net(buf, 64, net);
bt_debug("TEST %s\n", buf);
int should_be = is_prefix_included(prefixes, net);
int is_there = trie_match_net(trie, net);
bt_assert_msg(should_be == is_there, "Prefix %s %s match", buf,
(should_be ? "should" : "should not"));
}
static int
t_match_net(void)
t_match_random_net(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
uint round;
for (round = 0; round < TESTS_NUM; round++)
int v6 = 0;
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM; round++)
{
list prefixes; /* of structs f_extended_prefix */
init_list(&prefixes);
struct f_trie *trie = f_new_trie(config->mem, 0);
list *prefixes = make_random_prefix_list(lp, PREFIXES_NUM, v6, 0);
struct f_trie *trie = make_trie_from_prefix_list(lp, prefixes);
generate_random_ipv6_prefixes(&prefixes);
struct f_prefix_node *n;
WALK_LIST(n, prefixes)
for (int i = 0; i < PREFIX_TESTS_NUM; i++)
{
trie_add_prefix(trie, &n->prefix.net, n->prefix.lo, n->prefix.hi);
net_addr net;
get_random_net(&net, v6);
test_match_net(prefixes, trie, &net);
}
int i;
for (i = 0; i < PREFIX_TESTS_NUM; i++)
{
struct f_prefix f = get_random_ip6_prefix();
bt_debug("TEST\t" PRIip6 "/%d\n", ARGip6(net6_prefix(&f.net)), f.net.pxlen);
int should_be = is_prefix_included(&prefixes, &f);
int is_there = trie_match_net(trie, &f.net);
bt_assert_msg(should_be == is_there, "Prefix " PRIip6 "/%d %s", ARGip6(net6_prefix(&f.net)), f.net.pxlen, (should_be ? "should be found in trie" : "should not be found in trie"));
}
struct f_prefix_node *nxt;
WALK_LIST_DELSAFE(n, nxt, prefixes)
{
free(n);
}
v6 = !v6;
lp_flush(lp);
}
bt_bird_cleanup();
return 1;
}
static int
t_match_inner_net(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
int v6 = 0;
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM; round++)
{
list *prefixes = make_random_prefix_list(lp, PREFIXES_NUM, v6, 0);
struct f_trie *trie = make_trie_from_prefix_list(lp, prefixes);
struct f_prefix_node *n = HEAD(*prefixes);
for (int i = 0; i < PREFIX_TESTS_NUM; i++)
{
net_addr net;
get_inner_net(&net, &n->prefix);
test_match_net(prefixes, trie, &net);
n = NODE_VALID(NODE_NEXT(n)) ? NODE_NEXT(n) : HEAD(*prefixes);
}
v6 = !v6;
lp_flush(lp);
}
bt_bird_cleanup();
return 1;
}
static int
t_match_outer_net(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
int v6 = 0;
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM; round++)
{
list *prefixes = make_random_prefix_list(lp, PREFIXES_NUM, v6, 0);
struct f_trie *trie = make_trie_from_prefix_list(lp, prefixes);
struct f_prefix_node *n = HEAD(*prefixes);
for (int i = 0; i < PREFIX_TESTS_NUM; i++)
{
net_addr net;
get_outer_net(&net, &n->prefix);
test_match_net(prefixes, trie, &net);
n = NODE_VALID(NODE_NEXT(n)) ? NODE_NEXT(n) : HEAD(*prefixes);
}
v6 = !v6;
lp_flush(lp);
}
v6 = !v6;
bt_bird_cleanup();
return 1;
}
/*
* Read prefixes from @filename, build set of tries, prepare test data and do
* PREFIX_BENCH_NUM trie lookups. With @plus = 0, use random subset of known
* prefixes as test data, with @plus = 1, use randomly generated /32 prefixes
* as test data.
*/
static int
benchmark_trie_dataset(const char *filename, int plus)
{
int n = 0;
linpool *lp0 = lp_new_default(&root_pool);
linpool *lp1 = lp_new_default(&root_pool);
list *data[TRIE_BUFFER_SIZE];
struct f_trie *trie[TRIE_BUFFER_SIZE];
net_addr *nets;
bt_reset_suite_case_timer();
bt_log_suite_case_result(1, "Reading %s", filename, n);
n = read_prefix_file(filename, plus, lp0, lp1, data, trie);
bt_log_suite_case_result(1, "Read prefix data, %d lists, ", n);
size_t trie_size = rmemsize(lp1).effective * 1000 / (1024*1024);
bt_log_suite_case_result(1, "Trie size %u.%03u MB",
(uint) (trie_size / 1000), (uint) (trie_size % 1000));
int t = PREFIX_BENCH_NUM / n;
int tb = MIN(t, TEST_BUFFER_SIZE);
nets = lp_alloc(lp0, tb * sizeof(net_addr));
if (!plus)
select_random_prefix_subset(data, nets, n, tb);
else
make_random_addresses(nets, tb);
bt_log_suite_case_result(1, "Make test data, %d (%d) tests", t, tb);
bt_reset_suite_case_timer();
/*
int match = 0;
for (int i = 0; i < t; i++)
for (int j = 0; j < n; j++)
test_match_net(data[j], trie[j], &nets[i]);
*/
int match = 0;
for (int i = 0; i < t; i++)
for (int j = 0; j < n; j++)
if (trie_match_net(trie[j], &nets[i % TEST_BUFFER_SIZE]))
match++;
bt_log_suite_case_result(1, "Matching done, %d / %d matches", match, t * n);
rfree(lp0);
rfree(lp1);
return 1;
}
static int UNUSED
t_bench_trie_datasets_subset(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
/* Specific datasets, not included */
benchmark_trie_dataset("trie-data-bgp-1", 0);
benchmark_trie_dataset("trie-data-bgp-10", 0);
benchmark_trie_dataset("trie-data-bgp-100", 0);
benchmark_trie_dataset("trie-data-bgp-1000", 0);
bt_bird_cleanup();
return 1;
}
static int UNUSED
t_bench_trie_datasets_random(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
/* Specific datasets, not included */
benchmark_trie_dataset("trie-data-bgp-1", 1);
benchmark_trie_dataset("trie-data-bgp-10", 1);
benchmark_trie_dataset("trie-data-bgp-100", 1);
benchmark_trie_dataset("trie-data-bgp-1000", 1);
bt_bird_cleanup();
return 1;
}
static int
t_trie_same(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
int round;
for (round = 0; round < TESTS_NUM*4; round++)
int v6 = 0;
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM*4; round++)
{
struct f_trie * trie1 = f_new_trie(config->mem, 0);
struct f_trie * trie2 = f_new_trie(config->mem, 0);
list prefixes; /* a list of f_extended_prefix structures */
init_list(&prefixes);
int i;
for (i = 0; i < 100; i++)
generate_random_ipv6_prefixes(&prefixes);
list *prefixes = make_random_prefix_list(lp, 100 * PREFIXES_NUM, v6, 0);
struct f_trie *trie1 = f_new_trie(lp, 0);
struct f_trie *trie2 = f_new_trie(lp, 0);
struct f_prefix_node *n;
WALK_LIST(n, prefixes)
{
WALK_LIST(n, *prefixes)
trie_add_prefix(trie1, &n->prefix.net, n->prefix.lo, n->prefix.hi);
}
WALK_LIST_BACKWARDS(n, prefixes)
{
WALK_LIST_BACKWARDS(n, *prefixes)
trie_add_prefix(trie2, &n->prefix.net, n->prefix.lo, n->prefix.hi);
}
bt_assert(trie_same(trie1, trie2));
struct f_prefix_node *nxt;
WALK_LIST_DELSAFE(n, nxt, prefixes)
v6 = !v6;
lp_flush(lp);
}
bt_bird_cleanup();
return 1;
}
static inline void
log_networks(const net_addr *a, const net_addr *b)
{
free(n);
if (bt_verbose >= BT_VERBOSE_ABSOLUTELY_ALL)
{
char buf0[64];
char buf1[64];
bt_format_net(buf0, 64, a);
bt_format_net(buf1, 64, b);
bt_debug("Found %s expected %s\n", buf0, buf1);
}
}
static int
t_trie_walk(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM*8; round++)
{
int level = round / TESTS_NUM;
int v6 = level % 2;
int num = PREFIXES_NUM * (int[]){1, 10, 100, 1000}[level / 2];
int pos = 0, end = 0;
list *prefixes = make_random_prefix_list(lp, num, v6, 1);
struct f_trie *trie = make_trie_from_prefix_list(lp, prefixes);
struct f_prefix *pxset = malloc((num + 1) * sizeof(struct f_prefix));
struct f_prefix_node *n;
WALK_LIST(n, *prefixes)
pxset[pos++] = n->prefix;
memset(&pxset[pos], 0, sizeof (struct f_prefix));
qsort(pxset, num, sizeof(struct f_prefix), compare_prefixes);
/* Full walk */
bt_debug("Full walk (round %d, %d nets)\n", round, num);
pos = 0;
uint pxc = 0;
TRIE_WALK(trie, net, NULL)
{
log_networks(&net, &pxset[pos].net);
bt_assert(net_equal(&net, &pxset[pos].net));
/* Skip possible duplicates */
while (net_equal(&pxset[pos].net, &pxset[pos + 1].net))
pos++;
pos++;
pxc++;
}
TRIE_WALK_END;
bt_assert(pos == num);
bt_assert(pxc == trie->prefix_count);
bt_debug("Full walk done\n");
/* Prepare net for subnet walk - start with random prefix */
pos = bt_random() % num;
end = pos + (int[]){2, 2, 3, 4}[level / 2];
end = MIN(end, num);
struct f_prefix from = pxset[pos];
/* Find a common superprefix to several subsequent prefixes */
for (; pos < end; pos++)
{
if (net_equal(&from.net, &pxset[pos].net))
continue;
int common = !v6 ?
ip4_pxlen(net4_prefix(&from.net), net4_prefix(&pxset[pos].net)) :
ip6_pxlen(net6_prefix(&from.net), net6_prefix(&pxset[pos].net));
from.net.pxlen = MIN(from.net.pxlen, common);
if (!v6)
((net_addr_ip4 *) &from.net)->prefix =
ip4_and(net4_prefix(&from.net), net4_prefix(&pxset[pos].net));
else
((net_addr_ip6 *) &from.net)->prefix =
ip6_and(net6_prefix(&from.net), net6_prefix(&pxset[pos].net));
}
/* Fix irrelevant bits */
if (!v6)
((net_addr_ip4 *) &from.net)->prefix =
ip4_and(net4_prefix(&from.net), ip4_mkmask(net4_pxlen(&from.net)));
else
((net_addr_ip6 *) &from.net)->prefix =
ip6_and(net6_prefix(&from.net), ip6_mkmask(net6_pxlen(&from.net)));
/* Find initial position for final prefix */
for (pos = 0; pos < num; pos++)
if (compare_prefixes(&pxset[pos], &from) >= 0)
break;
int p0 = pos;
char buf0[64];
bt_format_net(buf0, 64, &from.net);
bt_debug("Subnet walk for %s (round %d, %d nets)\n", buf0, round, num);
/* Subnet walk */
TRIE_WALK(trie, net, &from.net)
{
log_networks(&net, &pxset[pos].net);
bt_assert(net_equal(&net, &pxset[pos].net));
bt_assert(net_in_netX(&net, &from.net));
/* Skip possible duplicates */
while (net_equal(&pxset[pos].net, &pxset[pos + 1].net))
pos++;
pos++;
}
TRIE_WALK_END;
bt_assert((pos == num) || !net_in_netX(&pxset[pos].net, &from.net));
bt_debug("Subnet walk done for %s (found %d nets)\n", buf0, pos - p0);
lp_flush(lp);
}
bt_bird_cleanup();
return 1;
}
static int
find_covering_nets(struct f_prefix *prefixes, int num, const net_addr *net, net_addr *found)
{
struct f_prefix key;
net_addr *n = &key.net;
int found_num = 0;
net_copy(n, net);
while (1)
{
struct f_prefix *px =
bsearch(&key, prefixes, num, sizeof(struct f_prefix), compare_prefixes);
if (px)
{
net_copy(&found[found_num], n);
found_num++;
}
if (n->pxlen == 0)
return found_num;
n->pxlen--;
if (n->type == NET_IP4)
ip4_clrbit(&((net_addr_ip4 *) n)->prefix, n->pxlen);
else
ip6_clrbit(&((net_addr_ip6 *) n)->prefix, n->pxlen);
}
}
static int
t_trie_walk_to_root(void)
{
bt_bird_init();
bt_config_parse(BT_CONFIG_SIMPLE);
linpool *lp = lp_new_default(&root_pool);
for (int round = 0; round < TESTS_NUM * 4; round++)
{
int level = round / TESTS_NUM;
int v6 = level % 2;
int num = PREFIXES_NUM * (int[]){32, 512}[level / 2];
int pos = 0;
int st = 0, sn = 0, sm = 0;
list *prefixes = make_random_prefix_list(lp, num, v6, 1);
struct f_trie *trie = make_trie_from_prefix_list(lp, prefixes);
struct f_prefix *pxset = malloc((num + 1) * sizeof(struct f_prefix));
struct f_prefix_node *pxn;
WALK_LIST(pxn, *prefixes)
pxset[pos++] = pxn->prefix;
memset(&pxset[pos], 0, sizeof (struct f_prefix));
qsort(pxset, num, sizeof(struct f_prefix), compare_prefixes);
int i;
for (i = 0; i < (PREFIX_TESTS_NUM / 10); i++)
{
net_addr from;
get_random_net(&from, v6);
net_addr found[129];
int found_num = find_covering_nets(pxset, num, &from, found);
int n = 0;
if (bt_verbose >= BT_VERBOSE_ABSOLUTELY_ALL)
{
char buf[64];
bt_format_net(buf, 64, &from);
bt_debug("Lookup for %s (expect %d)\n", buf, found_num);
}
/* Walk to root, separate for IPv4 and IPv6 */
if (!v6)
{
TRIE_WALK_TO_ROOT_IP4(trie, (net_addr_ip4 *) &from, net)
{
log_networks((net_addr *) &net, &found[n]);
bt_assert((n < found_num) && net_equal((net_addr *) &net, &found[n]));
n++;
}
TRIE_WALK_TO_ROOT_END;
}
else
{
TRIE_WALK_TO_ROOT_IP6(trie, (net_addr_ip6 *) &from, net)
{
log_networks((net_addr *) &net, &found[n]);
bt_assert((n < found_num) && net_equal((net_addr *) &net, &found[n]));
n++;
}
TRIE_WALK_TO_ROOT_END;
}
bt_assert(n == found_num);
/* Stats */
st += n;
sn += !!n;
sm = MAX(sm, n);
}
bt_debug("Success in %d / %d, sum %d, max %d\n", sn, i, st, sm);
lp_flush(lp);
}
bt_bird_cleanup();
return 1;
}
@ -179,8 +896,15 @@ main(int argc, char *argv[])
{
bt_init(argc, argv);
bt_test_suite(t_match_net, "Testing random prefix matching");
bt_test_suite(t_match_random_net, "Testing random prefix matching");
bt_test_suite(t_match_inner_net, "Testing random inner prefix matching");
bt_test_suite(t_match_outer_net, "Testing random outer prefix matching");
bt_test_suite(t_trie_same, "A trie filled forward should be same with a trie filled backward.");
bt_test_suite(t_trie_walk, "Testing TRIE_WALK() on random tries");
bt_test_suite(t_trie_walk_to_root, "Testing TRIE_WALK_TO_ROOT() on random tries");
// bt_test_suite(t_bench_trie_datasets_subset, "Benchmark tries from datasets by random subset of nets");
// bt_test_suite(t_bench_trie_datasets_random, "Benchmark tries from datasets by generated addresses");
return bt_exit_value();
}

View file

@ -32,6 +32,9 @@ struct align_probe { char x; long int y; };
#define MAX(a,b) MAX_(a,b)
#endif
#define ROUND_DOWN_POW2(a,b) ((a) & ~((b)-1))
#define ROUND_UP_POW2(a,b) (((a)+((b)-1)) & ~((b)-1))
#define U64(c) UINT64_C(c)
#define ABS(a) ((a)>=0 ? (a) : -(a))
#define DELTA(a,b) (((a)>=(b))?(a)-(b):(b)-(a))

View file

@ -279,11 +279,35 @@ static inline uint ip6_pxlen(ip6_addr a, ip6_addr b)
return 32 * i + 31 - u32_log2(a.addr[i] ^ b.addr[i]);
}
static inline int ip4_prefix_equal(ip4_addr a, ip4_addr b, uint n)
{
return (_I(a) ^ _I(b)) < ((u64) 1 << (32 - n));
}
static inline int ip6_prefix_equal(ip6_addr a, ip6_addr b, uint n)
{
uint n0 = n / 32;
uint n1 = n % 32;
return
((n0 <= 0) || (_I0(a) == _I0(b))) &&
((n0 <= 1) || (_I1(a) == _I1(b))) &&
((n0 <= 2) || (_I2(a) == _I2(b))) &&
((n0 <= 3) || (_I3(a) == _I3(b))) &&
(!n1 || ((a.addr[n0] ^ b.addr[n0]) < (1u << (32 - n1))));
}
static inline u32 ip4_getbit(ip4_addr a, uint pos)
{ return _I(a) & (0x80000000 >> pos); }
{ return (_I(a) >> (31 - pos)) & 1; }
static inline u32 ip4_getbits(ip4_addr a, uint pos, uint n)
{ return (_I(a) >> ((32 - n) - pos)) & ((1u << n) - 1); }
static inline u32 ip6_getbit(ip6_addr a, uint pos)
{ return a.addr[pos / 32] & (0x80000000 >> (pos % 32)); }
{ return (a.addr[pos / 32] >> (31 - (pos % 32))) & 0x1; }
static inline u32 ip6_getbits(ip6_addr a, uint pos, uint n)
{ return (a.addr[pos / 32] >> ((32 - n) - (pos % 32))) & ((1u << n) - 1); }
static inline u32 ip4_setbit(ip4_addr *a, uint pos)
{ return _I(*a) |= (0x80000000 >> pos); }
@ -297,6 +321,13 @@ static inline u32 ip4_clrbit(ip4_addr *a, uint pos)
static inline u32 ip6_clrbit(ip6_addr *a, uint pos)
{ return a->addr[pos / 32] &= ~(0x80000000 >> (pos % 32)); }
static inline ip4_addr ip4_setbits(ip4_addr a, uint pos, uint val)
{ _I(a) |= val << (31 - pos); return a; }
static inline ip6_addr ip6_setbits(ip6_addr a, uint pos, uint val)
{ a.addr[pos / 32] |= val << (31 - pos % 32); return a; }
static inline ip4_addr ip4_opposite_m1(ip4_addr a)
{ return _MI4(_I(a) ^ 1); }

View file

@ -167,6 +167,70 @@ t_ip6_ntop(void)
return bt_assert_batch(test_vectors, test_ipa_ntop, bt_fmt_ipa, bt_fmt_str);
}
static int
t_ip4_prefix_equal(void)
{
bt_assert( ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x1234ffff), 16));
bt_assert(!ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x1234ffff), 17));
bt_assert( ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x12345000), 21));
bt_assert(!ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x12345000), 22));
bt_assert( ip4_prefix_equal(ip4_from_u32(0x00000000), ip4_from_u32(0xffffffff), 0));
bt_assert( ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x12345678), 0));
bt_assert( ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x12345678), 32));
bt_assert(!ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x12345679), 32));
bt_assert(!ip4_prefix_equal(ip4_from_u32(0x12345678), ip4_from_u32(0x92345678), 32));
return 1;
}
static int
t_ip6_prefix_equal(void)
{
bt_assert( ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x1234ffff, 0xfefefefe, 0xdcdcdcdc),
48));
bt_assert(!ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x1234ffff, 0xfefefefe, 0xdcdcdcdc),
49));
bt_assert(!ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20020db8, 0x12345678, 0xfefefefe, 0xdcdcdcdc),
48));
bt_assert( ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x12345678, 0xfefefefe, 0xdcdcdcdc),
64));
bt_assert(!ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x1234567e, 0xfefefefe, 0xdcdcdcdc),
64));
bt_assert( ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20002020),
ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
106));
bt_assert(!ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20002020),
ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
107));
bt_assert( ip6_prefix_equal(ip6_build(0xfeef0db8, 0x87654321, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x12345678, 0xfefefefe, 0xdcdcdcdc),
0));
bt_assert( ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
128));
bt_assert(!ip6_prefix_equal(ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202020),
ip6_build(0x20010db8, 0x12345678, 0x10101010, 0x20202021),
128));
return 1;
}
int
main(int argc, char *argv[])
{
@ -176,6 +240,8 @@ main(int argc, char *argv[])
bt_test_suite(t_ip6_pton, "Converting IPv6 string to ip6_addr struct");
bt_test_suite(t_ip4_ntop, "Converting ip4_addr struct to IPv4 string");
bt_test_suite(t_ip6_ntop, "Converting ip6_addr struct to IPv6 string");
bt_test_suite(t_ip4_prefix_equal, "Testing ip4_prefix_equal()");
bt_test_suite(t_ip6_prefix_equal, "Testing ip6_prefix_equal()");
return bt_exit_value();
}

View file

@ -38,6 +38,7 @@
#define NB_IP (NB_IP4 | NB_IP6)
#define NB_VPN (NB_VPN4 | NB_VPN6)
#define NB_ROA (NB_ROA4 | NB_ROA6)
#define NB_FLOW (NB_FLOW4 | NB_FLOW6)
#define NB_DEST (NB_IP | NB_IP6_SADR | NB_VPN | NB_MPLS)
#define NB_ANY 0xffffffff

View file

@ -17,6 +17,7 @@ CF_HDR
CF_DEFINES
static struct rtable_config *this_table;
static struct proto_config *this_proto;
static struct channel_config *this_channel;
static struct iface_patt *this_ipatt;
@ -117,13 +118,14 @@ CF_KEYWORDS(IPV4, IPV6, VPN4, VPN6, ROA4, ROA6, FLOW4, FLOW6, SADR, MPLS)
CF_KEYWORDS(RECEIVE, LIMIT, ACTION, WARN, BLOCK, RESTART, DISABLE, KEEP, FILTERED, RPKI)
CF_KEYWORDS(PASSWORD, KEY, FROM, PASSIVE, TO, ID, EVENTS, PACKETS, PROTOCOLS, CHANNELS, INTERFACES)
CF_KEYWORDS(ALGORITHM, KEYED, HMAC, MD5, SHA1, SHA256, SHA384, SHA512, BLAKE2S128, BLAKE2S256, BLAKE2B256, BLAKE2B512)
CF_KEYWORDS(PRIMARY, STATS, COUNT, BY, FOR, COMMANDS, PREEXPORT, NOEXPORT, EXPORTED, GENERATE)
CF_KEYWORDS(BGP, PASSWORDS, DESCRIPTION, SORTED)
CF_KEYWORDS(PRIMARY, STATS, COUNT, BY, FOR, IN, COMMANDS, PREEXPORT, NOEXPORT, EXPORTED, GENERATE)
CF_KEYWORDS(BGP, PASSWORDS, DESCRIPTION)
CF_KEYWORDS(RELOAD, IN, OUT, MRTDUMP, MESSAGES, RESTRICT, MEMORY, IGP_METRIC, CLASS, DSCP)
CF_KEYWORDS(TIMEFORMAT, ISO, SHORT, LONG, ROUTE, PROTOCOL, BASE, LOG, S, MS, US)
CF_KEYWORDS(GRACEFUL, RESTART, WAIT, MAX, FLUSH, AS)
CF_KEYWORDS(MIN, IDLE, RX, TX, INTERVAL, MULTIPLIER, PASSIVE)
CF_KEYWORDS(CHECK, LINK)
CF_KEYWORDS(SORTED, TRIE, MIN, MAX, SETTLE, TIME)
/* For r_args_channel */
CF_KEYWORDS(IPV4, IPV4_MC, IPV4_MPLS, IPV6, IPV6_MC, IPV6_MPLS, IPV6_SADR, VPN4, VPN4_MC, VPN4_MPLS, VPN6, VPN6_MC, VPN6_MPLS, ROA4, ROA6, FLOW4, FLOW6, MPLS, PRI, SEC)
@ -141,7 +143,7 @@ CF_ENUM_PX(T_ENUM_AF, AF_, AFI_, IPV4, IPV6)
%type <s> optproto
%type <ra> r_args
%type <sd> sym_args
%type <i> proto_start echo_mask echo_size debug_mask debug_list debug_flag mrtdump_mask mrtdump_list mrtdump_flag export_mode limit_action net_type table_sorted tos password_algorithm
%type <i> proto_start echo_mask echo_size debug_mask debug_list debug_flag mrtdump_mask mrtdump_list mrtdump_flag export_mode limit_action net_type tos password_algorithm
%type <ps> proto_patt proto_patt2
%type <cc> channel_start proto_channel
%type <cl> limit_spec
@ -206,16 +208,37 @@ CF_ENUM(T_ENUM_NETTYPE, NET_, IP4, IP6, VPN4, VPN6, ROA4, ROA6, FLOW4, FLOW6, IP
conf: table ;
table_sorted:
{ $$ = 0; }
| SORTED { $$ = 1; }
table: table_start table_sorted table_opt_list ;
table_start: net_type TABLE symbol {
this_table = rt_new_table($3, $1);
}
;
table: net_type TABLE symbol table_sorted {
struct rtable_config *cf;
cf = rt_new_table($3, $1);
cf->sorted = $4;
table_sorted:
/* empty */
| SORTED { this_table->sorted = 1; }
;
table_opt:
SORTED bool { this_table->sorted = $2; }
| TRIE bool {
if (!net_val_match(this_table->addr_type, NB_IP | NB_VPN | NB_ROA | NB_IP6_SADR))
cf_error("Trie option not supported for %s table", net_label[this_table->addr_type]);
this_table->trie_used = $2;
}
| MIN SETTLE TIME expr_us { this_table->min_settle_time = $4; }
| MAX SETTLE TIME expr_us { this_table->max_settle_time = $4; }
;
table_opts:
/* empty */
| table_opts table_opt ';'
;
table_opt_list:
/* empty */
| '{' table_opts '}'
;
@ -621,12 +644,20 @@ r_args:
$$ = $1;
if ($$->addr) cf_error("Only one prefix expected");
$$->addr = $2;
$$->addr_mode = RSD_ADDR_EQUAL;
}
| r_args FOR r_args_for {
$$ = $1;
if ($$->addr) cf_error("Only one prefix expected");
$$->show_for = 1;
$$->addr = $3;
$$->addr_mode = RSD_ADDR_FOR;
}
| r_args IN net_any {
$$ = $1;
if ($$->addr) cf_error("Only one prefix expected");
if (!net_type_match($3, NB_IP)) cf_error("Only IP networks accepted for 'in' argument");
$$->addr = $3;
$$->addr_mode = RSD_ADDR_IN;
}
| r_args TABLE CF_SYM_KNOWN {
cf_assert_symbol($3, SYM_TABLE);

View file

@ -20,7 +20,10 @@ struct proto;
struct rte_src;
struct symbol;
struct timer;
struct fib;
struct filter;
struct f_trie;
struct f_trie_walk_state;
struct cli;
/*
@ -49,7 +52,7 @@ struct fib_iterator { /* See lib/slists.h for an explanation */
uint hash;
};
typedef void (*fib_init_fn)(void *);
typedef void (*fib_init_fn)(struct fib *, void *);
struct fib {
pool *fib_pool; /* Pool holding all our data */
@ -149,6 +152,7 @@ struct rtable_config {
int gc_min_time; /* Minimum time between two consecutive GC runs */
byte sorted; /* Routes of network are sorted according to rte_better() */
byte internal; /* Internal table of a protocol */
byte trie_used; /* Rtable has attached trie */
btime min_settle_time; /* Minimum settle time for notifications */
btime max_settle_time; /* Maximum settle time for notifications */
};
@ -158,6 +162,7 @@ typedef struct rtable {
node n; /* Node in list of all tables */
pool *rp; /* Resource pool to allocate everything from, including itself */
struct fib fib;
struct f_trie *trie; /* Trie of prefixes defined in fib */
char *name; /* Name of this table */
list channels; /* List of attached channels (struct channel) */
uint addr_type; /* Type of address data stored in table (NET_*) */
@ -180,13 +185,20 @@ typedef struct rtable {
btime gc_time; /* Time of last GC */
int gc_counter; /* Number of operations since last GC */
byte prune_state; /* Table prune state, 1 -> scheduled, 2-> running */
byte prune_trie; /* Prune prefix trie during next table prune */
byte hcu_scheduled; /* Hostcache update is scheduled */
byte nhu_state; /* Next Hop Update state */
struct fib_iterator prune_fit; /* Rtable prune FIB iterator */
struct fib_iterator nhu_fit; /* Next Hop Update FIB iterator */
struct f_trie *trie_new; /* New prefix trie defined during pruning */
struct f_trie *trie_old; /* Old prefix trie waiting to be freed */
u32 trie_lock_count; /* Prefix trie locked by walks */
u32 trie_old_lock_count; /* Old prefix trie locked by walks */
list subscribers; /* Subscribers for notifications */
struct timer *settle_timer; /* Settle time for notifications */
list flowspec_links; /* List of flowspec links, src for NET_IPx and dst for NET_FLOWx */
struct f_trie *flowspec_trie; /* Trie for evaluation of flowspec notifications */
} rtable;
struct rt_subscription {
@ -196,6 +208,13 @@ struct rt_subscription {
void *data;
};
struct rt_flowspec_link {
node n;
rtable *src;
rtable *dst;
u32 uc;
};
#define NHU_CLEAN 0
#define NHU_SCHEDULED 1
#define NHU_RUNNING 2
@ -262,6 +281,7 @@ typedef struct rte {
struct {
u8 suppressed; /* Used for deterministic MED comparison */
s8 stale; /* Route is LLGR_STALE, -1 if unknown */
struct rtable *base_table; /* Base table for Flowspec validation */
} bgp;
#endif
#ifdef CONFIG_BABEL
@ -315,8 +335,12 @@ void rt_preconfig(struct config *);
void rt_commit(struct config *new, struct config *old);
void rt_lock_table(rtable *);
void rt_unlock_table(rtable *);
struct f_trie * rt_lock_trie(rtable *tab);
void rt_unlock_trie(rtable *tab, struct f_trie *trie);
void rt_subscribe(rtable *tab, struct rt_subscription *s);
void rt_unsubscribe(struct rt_subscription *s);
void rt_flowspec_link(rtable *src, rtable *dst);
void rt_flowspec_unlink(rtable *src, rtable *dst);
rtable *rt_setup(pool *, struct rtable_config *);
static inline void rt_shutdown(rtable *r) { rfree(r->rp); }
@ -324,7 +348,8 @@ static inline net *net_find(rtable *tab, const net_addr *addr) { return (net *)
static inline net *net_find_valid(rtable *tab, const net_addr *addr)
{ net *n = net_find(tab, addr); return (n && rte_is_valid(n->routes)) ? n : NULL; }
static inline net *net_get(rtable *tab, const net_addr *addr) { return (net *) fib_get(&tab->fib, addr); }
void *net_route(rtable *tab, const net_addr *n);
net *net_get(rtable *tab, const net_addr *addr);
net *net_route(rtable *tab, const net_addr *n);
int net_roa_check(rtable *tab, const net_addr *n, u32 asn);
rte *rte_find(net *net, struct rte_src *src);
rte *rte_get_temp(struct rta *);
@ -356,6 +381,18 @@ void rt_prune_sync(rtable *t, int all);
int rte_update_out(struct channel *c, const net_addr *n, rte *new, rte *old0, int refeed);
struct rtable_config *rt_new_table(struct symbol *s, uint addr_type);
static inline int rt_is_ip(rtable *tab)
{ return (tab->addr_type == NET_IP4) || (tab->addr_type == NET_IP6); }
static inline int rt_is_vpn(rtable *tab)
{ return (tab->addr_type == NET_VPN4) || (tab->addr_type == NET_VPN6); }
static inline int rt_is_roa(rtable *tab)
{ return (tab->addr_type == NET_ROA4) || (tab->addr_type == NET_ROA6); }
static inline int rt_is_flow(rtable *tab)
{ return (tab->addr_type == NET_FLOW4) || (tab->addr_type == NET_FLOW6); }
/* Default limit for ECMP next hops, defined in sysdep code */
extern const int rt_default_ecmp;
@ -372,6 +409,8 @@ struct rt_show_data {
struct rt_show_data_rtable *tab; /* Iterator over table list */
struct rt_show_data_rtable *last_table; /* Last table in output */
struct fib_iterator fit; /* Iterator over networks in table */
struct f_trie_walk_state *walk_state; /* Iterator over networks in trie */
struct f_trie *walk_lock; /* Locked trie for walking */
int verbose, tables_defined_by;
const struct filter *filter;
struct proto *show_protocol;
@ -379,9 +418,10 @@ struct rt_show_data {
struct channel *export_channel;
struct config *running_on_config;
struct krt_proto *kernel;
int export_mode, primary_only, filtered, stats, show_for;
int export_mode, addr_mode, primary_only, filtered, stats;
int table_open; /* Iteration (fit) is open */
int trie_walk; /* Current table is iterated using trie */
int net_counter, rt_counter, show_counter, table_counter;
int net_counter_last, rt_counter_last, show_counter_last;
};
@ -398,6 +438,11 @@ struct rt_show_data_rtable * rt_show_add_table(struct rt_show_data *d, rtable *t
#define RSD_TDB_SET 0x1 /* internal: show empty tables */
#define RSD_TDB_NMN 0x2 /* internal: need matching net */
/* Value of addr_mode */
#define RSD_ADDR_EQUAL 1 /* Exact query - show route <addr> */
#define RSD_ADDR_FOR 2 /* Longest prefix match - show route for <addr> */
#define RSD_ADDR_IN 3 /* Interval query - show route in <addr> */
/* Value of export_mode in struct rt_show_data */
#define RSEM_NONE 0 /* Export mode not used */
#define RSEM_PREEXPORT 1 /* Routes ready for export, before filtering */
@ -718,6 +763,9 @@ rta_set_recursive_next_hop(rtable *dep, rta *a, rtable *tab, ip_addr gw, ip_addr
static inline void rt_lock_hostentry(struct hostentry *he) { if (he) he->uc++; }
static inline void rt_unlock_hostentry(struct hostentry *he) { if (he) he->uc--; }
int rt_flowspec_check(rtable *tab_ip, rtable *tab_flow, const net_addr *n, rta *a, int interior);
/*
* Default protocol preferences
*/

View file

@ -331,7 +331,7 @@ fib_get(struct fib *f, const net_addr *a)
memset(b, 0, f->node_offset);
if (f->init)
f->init(b);
f->init(f, b);
if (f->entries++ > f->entries_max)
fib_rehash(f, HASH_HI_STEP);

View file

@ -15,6 +15,7 @@
#include "nest/cli.h"
#include "nest/iface.h"
#include "filter/filter.h"
#include "filter/data.h"
#include "sysdep/unix/krt.h"
static void
@ -110,10 +111,9 @@ rt_show_net(struct cli *c, net *n, struct rt_show_data *d)
ASSUME(!d->export_mode || ec);
int first = 1;
int first_show = 1;
int pass = 0;
bsnprintf(ia, sizeof(ia), "%N", n->n.addr);
for (e = n->routes; e; e = e->next)
{
if (rte_is_filtered(e) != d->filtered)
@ -187,10 +187,17 @@ rt_show_net(struct cli *c, net *n, struct rt_show_data *d)
goto skip;
if (d->stats < 2)
{
if (first_show)
net_format(n->n.addr, ia, sizeof(ia));
else
ia[0] = 0;
rt_show_rte(c, ia, e, d, (e->net->routes == ee));
first_show = 0;
}
d->show_counter++;
ia[0] = 0;
skip:
if (e != ee)
@ -212,9 +219,12 @@ rt_show_cleanup(struct cli *c)
struct rt_show_data_rtable *tab;
/* Unlink the iterator */
if (d->table_open)
if (d->table_open && !d->trie_walk)
fit_get(&d->tab->table->fib, &d->fit);
if (d->walk_lock)
rt_unlock_trie(d->tab->table, d->walk_lock);
/* Unlock referenced tables */
WALK_LIST(tab, d->tables)
rt_unlock_table(tab->table);
@ -224,12 +234,13 @@ static void
rt_show_cont(struct cli *c)
{
struct rt_show_data *d = c->rover;
struct rtable *tab = d->tab->table;
#ifdef DEBUGGING
unsigned max = 4;
#else
unsigned max = 64;
#endif
struct fib *fib = &d->tab->table->fib;
struct fib *fib = &tab->fib;
struct fib_iterator *it = &d->fit;
if (d->running_on_config && (d->running_on_config != config))
@ -240,7 +251,22 @@ rt_show_cont(struct cli *c)
if (!d->table_open)
{
FIB_ITERATE_INIT(&d->fit, &d->tab->table->fib);
/* We use either trie-based walk or fib-based walk */
d->trie_walk = tab->trie &&
(d->addr_mode == RSD_ADDR_IN) &&
net_val_match(tab->addr_type, NB_IP);
if (d->trie_walk && !d->walk_state)
d->walk_state = lp_allocz(c->parser_pool, sizeof (struct f_trie_walk_state));
if (d->trie_walk)
{
d->walk_lock = rt_lock_trie(tab);
trie_walk_init(d->walk_state, tab->trie, d->addr);
}
else
FIB_ITERATE_INIT(&d->fit, &tab->fib);
d->table_open = 1;
d->table_counter++;
d->kernel = rt_show_get_kernel(d);
@ -253,16 +279,44 @@ rt_show_cont(struct cli *c)
rt_show_table(c, d);
}
if (d->trie_walk)
{
/* Trie-based walk */
net_addr addr;
while (trie_walk_next(d->walk_state, &addr))
{
net *n = net_find(tab, &addr);
if (!n)
continue;
rt_show_net(c, n, d);
if (!--max)
return;
}
rt_unlock_trie(tab, d->walk_lock);
d->walk_lock = NULL;
}
else
{
/* fib-based walk */
FIB_ITERATE_START(fib, it, net, n)
{
if ((d->addr_mode == RSD_ADDR_IN) && (!net_in_netX(n->n.addr, d->addr)))
goto next;
if (!max--)
{
FIB_ITERATE_PUT(it);
return;
}
rt_show_net(c, n, d);
next:;
}
FIB_ITERATE_END;
}
if (d->stats)
{
@ -271,7 +325,7 @@ rt_show_cont(struct cli *c)
cli_printf(c, -1007, "%d of %d routes for %d networks in table %s",
d->show_counter - d->show_counter_last, d->rt_counter - d->rt_counter_last,
d->net_counter - d->net_counter_last, d->tab->table->name);
d->net_counter - d->net_counter_last, tab->name);
}
d->kernel = NULL;
@ -402,7 +456,7 @@ rt_show(struct rt_show_data *d)
rt_show_prepare_tables(d);
if (!d->addr)
if (!d->addr || (d->addr_mode == RSD_ADDR_IN))
{
WALK_LIST(tab, d->tables)
rt_lock_table(tab->table);
@ -420,7 +474,7 @@ rt_show(struct rt_show_data *d)
d->tab = tab;
d->kernel = rt_show_get_kernel(d);
if (d->show_for)
if (d->addr_mode == RSD_ADDR_FOR)
n = net_route(tab->table, d->addr);
else
n = net_find(tab->table, d->addr);

File diff suppressed because it is too large Load diff

View file

@ -63,7 +63,7 @@ static inline void babel_iface_kick_timer(struct babel_iface *ifa);
*/
static void
babel_init_entry(void *E)
babel_init_entry(struct fib *f UNUSED, void *E)
{
struct babel_entry *e = E;

View file

@ -1683,6 +1683,10 @@ bgp_preexport(struct proto *P, rte **new, struct linpool *pool UNUSED)
if (src == NULL)
return 0;
/* Reject flowspec that failed validation */
if ((e->attrs->dest == RTD_UNREACHABLE) && net_is_flow(e->net->n.addr))
return -1;
/* IBGP route reflection, RFC 4456 */
if (p->is_internal && src->is_internal && (p->local_as == src->local_as))
{

View file

@ -101,6 +101,7 @@
* RFC 8203 - BGP Administrative Shutdown Communication
* RFC 8212 - Default EBGP Route Propagation Behavior without Policies
* RFC 8654 - Extended Message Support for BGP
* RFC 9117 - Revised Validation Procedure for BGP Flow Specifications
* draft-ietf-idr-ext-opt-param-07
* draft-uttaro-idr-bgp-persistence-04
* draft-walton-bgp-hostname-capability-02
@ -1740,6 +1741,9 @@ bgp_channel_init(struct channel *C, struct channel_config *CF)
if (cf->igp_table_ip6)
c->igp_table_ip6 = cf->igp_table_ip6->table;
if (cf->base_table)
c->base_table = cf->base_table->table;
}
static int
@ -1755,6 +1759,12 @@ bgp_channel_start(struct channel *C)
if (c->igp_table_ip6)
rt_lock_table(c->igp_table_ip6);
if (c->base_table)
{
rt_lock_table(c->base_table);
rt_flowspec_link(c->base_table, c->c.table);
}
c->pool = p->p.pool; // XXXX
bgp_init_bucket_table(c);
bgp_init_prefix_table(c);
@ -1839,6 +1849,12 @@ bgp_channel_cleanup(struct channel *C)
if (c->igp_table_ip6)
rt_unlock_table(c->igp_table_ip6);
if (c->base_table)
{
rt_flowspec_unlink(c->base_table, c->c.table);
rt_unlock_table(c->base_table);
}
c->index = 0;
/* Cleanup rest of bgp_channel starting at pool field */
@ -1886,6 +1902,25 @@ bgp_default_igp_table(struct bgp_config *cf, struct bgp_channel_config *cc, u32
cf_error("Undefined IGP table");
}
static struct rtable_config *
bgp_default_base_table(struct bgp_config *cf, struct bgp_channel_config *cc)
{
/* Expected table type */
u32 type = (cc->afi == BGP_AF_FLOW4) ? NET_IP4 : NET_IP6;
/* First, try appropriate IP channel */
u32 afi2 = BGP_AF(BGP_AFI(cc->afi), BGP_SAFI_UNICAST);
struct bgp_channel_config *cc2 = bgp_find_channel_config(cf, afi2);
if (cc2 && (cc2->c.table->addr_type == type))
return cc2->c.table;
/* Last, try default table of given type */
struct rtable_config *tab = cf->c.global->def_tables[type];
if (tab)
return tab;
cf_error("Undefined base table");
}
void
bgp_postconfig(struct proto_config *CF)
@ -2030,6 +2065,14 @@ bgp_postconfig(struct proto_config *CF)
cf_error("Mismatched IGP table type");
}
/* Default value of base table */
if ((BGP_SAFI(cc->afi) == BGP_SAFI_FLOW) && cc->validate && !cc->base_table)
cc->base_table = bgp_default_base_table(cf, cc);
if (cc->base_table && !cc->base_table->trie_used)
cf_error("Flowspec validation requires base table (%s) with trie",
cc->base_table->name);
if (cf->multihop && (cc->gw_mode == GW_DIRECT))
cf_error("Multihop BGP cannot use direct gateway mode");
@ -2098,7 +2141,7 @@ bgp_reconfigure(struct proto *P, struct proto_config *CF)
return same;
}
#define IGP_TABLE(cf, sym) ((cf)->igp_table_##sym ? (cf)->igp_table_##sym ->table : NULL )
#define TABLE(cf, NAME) ((cf)->NAME ? (cf)->NAME->table : NULL )
static int
bgp_channel_reconfigure(struct channel *C, struct channel_config *CC, int *import_changed, int *export_changed)
@ -2109,6 +2152,7 @@ bgp_channel_reconfigure(struct channel *C, struct channel_config *CC, int *impor
struct bgp_channel_config *old = c->cf;
if ((new->secondary != old->secondary) ||
(new->validate != old->validate) ||
(new->gr_able != old->gr_able) ||
(new->llgr_able != old->llgr_able) ||
(new->llgr_time != old->llgr_time) ||
@ -2116,8 +2160,9 @@ bgp_channel_reconfigure(struct channel *C, struct channel_config *CC, int *impor
(new->add_path != old->add_path) ||
(new->import_table != old->import_table) ||
(new->export_table != old->export_table) ||
(IGP_TABLE(new, ip4) != IGP_TABLE(old, ip4)) ||
(IGP_TABLE(new, ip6) != IGP_TABLE(old, ip6)))
(TABLE(new, igp_table_ip4) != TABLE(old, igp_table_ip4)) ||
(TABLE(new, igp_table_ip6) != TABLE(old, igp_table_ip6)) ||
(TABLE(new, base_table) != TABLE(old, base_table)))
return 0;
if (new->mandatory && !old->mandatory && (C->channel_state != CS_UP))
@ -2531,6 +2576,9 @@ bgp_show_proto_info(struct proto *P)
if (c->igp_table_ip6)
cli_msg(-1006, " IGP IPv6 table: %s", c->igp_table_ip6->name);
if (c->base_table)
cli_msg(-1006, " Base table: %s", c->base_table->name);
}
}
}

View file

@ -147,6 +147,7 @@ struct bgp_channel_config {
u8 mandatory; /* Channel is mandatory in capability negotiation */
u8 gw_mode; /* How we compute route gateway from next_hop attr, see GW_* */
u8 secondary; /* Accept also non-best routes (i.e. RA_ACCEPTED) */
u8 validate; /* Validate Flowspec per RFC 8955 (6) */
u8 gr_able; /* Allow full graceful restart for the channel */
u8 llgr_able; /* Allow full long-lived GR for the channel */
uint llgr_time; /* Long-lived graceful restart stale time */
@ -160,6 +161,7 @@ struct bgp_channel_config {
struct rtable_config *igp_table_ip4; /* Table for recursive IPv4 next hop lookups */
struct rtable_config *igp_table_ip6; /* Table for recursive IPv6 next hop lookups */
struct rtable_config *base_table; /* Base table for Flowspec validation */
};
#define BGP_PT_INTERNAL 1
@ -341,6 +343,7 @@ struct bgp_channel {
rtable *igp_table_ip4; /* Table for recursive IPv4 next hop lookups */
rtable *igp_table_ip6; /* Table for recursive IPv6 next hop lookups */
rtable *base_table; /* Base table for Flowspec validation */
/* Rest are zeroed when down */
pool *pool;
@ -451,6 +454,7 @@ struct bgp_parse_state {
jmp_buf err_jmpbuf;
struct hostentry *hostentry;
struct rtable *base_table;
adata *mpls_labels;
/* Cached state for bgp_rte_update() */
@ -517,7 +521,7 @@ struct rte_source *bgp_get_source(struct bgp_proto *p, u32 path_id);
static inline int
rte_resolvable(rte *rt)
{
return rt->attrs->dest == RTD_UNICAST;
return rt->attrs->dest != RTD_UNREACHABLE;
}

View file

@ -31,7 +31,7 @@ CF_KEYWORDS(BGP, LOCAL, NEIGHBOR, AS, HOLD, TIME, CONNECT, RETRY, KEEPALIVE,
STRICT, BIND, CONFEDERATION, MEMBER, MULTICAST, FLOW4, FLOW6, LONG,
LIVED, STALE, IMPORT, IBGP, EBGP, MANDATORY, INTERNAL, EXTERNAL, SETS,
DYNAMIC, RANGE, NAME, DIGITS, BGP_AIGP, AIGP, ORIGINATE, COST, ENFORCE,
FIRST, FREE)
FIRST, FREE, VALIDATE, BASE)
%type <i> bgp_nh
%type <i32> bgp_afi
@ -256,6 +256,11 @@ bgp_channel_item:
| GATEWAY DIRECT { BGP_CC->gw_mode = GW_DIRECT; }
| GATEWAY RECURSIVE { BGP_CC->gw_mode = GW_RECURSIVE; }
| SECONDARY bool { BGP_CC->secondary = $2; }
| VALIDATE bool {
BGP_CC->validate = $2;
if (BGP_SAFI(BGP_CC->afi) != BGP_SAFI_FLOW)
cf_error("Validate option limited to flowspec channels");
}
| GRACEFUL RESTART bool { BGP_CC->gr_able = $3; }
| LONG LIVED GRACEFUL RESTART bool { BGP_CC->llgr_able = $5; }
| LONG LIVED STALE TIME expr { BGP_CC->llgr_time = $5; }
@ -279,6 +284,16 @@ bgp_channel_item:
else
cf_error("Mismatched IGP table type");
}
| BASE TABLE rtable {
if (BGP_SAFI(BGP_CC->afi) != BGP_SAFI_FLOW)
cf_error("Base table option limited to flowspec channels");
if (((BGP_CC->afi == BGP_AF_FLOW4) && ($3->addr_type == NET_IP4)) ||
((BGP_CC->afi == BGP_AF_FLOW6) && ($3->addr_type == NET_IP6)))
BGP_CC->base_table = $3;
else
cf_error("Mismatched base table type");
}
;
bgp_channel_opts:

View file

@ -1018,6 +1018,23 @@ bgp_apply_mpls_labels(struct bgp_parse_state *s, rta *a, u32 *labels, uint lnum)
}
}
static void
bgp_apply_flow_validation(struct bgp_parse_state *s, const net_addr *n, rta *a)
{
struct bgp_channel *c = s->channel;
int valid = rt_flowspec_check(c->base_table, c->c.table, n, a, s->proto->is_interior);
a->dest = valid ? RTD_NONE : RTD_UNREACHABLE;
/* Set rte.bgp.base_table later from this state variable */
s->base_table = c->base_table;
/* Invalidate cached rta if dest changes */
if (s->cached_rta && (s->cached_rta->dest != a->dest))
{
rta_free(s->cached_rta);
s->cached_rta = NULL;
}
}
static int
bgp_match_src(struct bgp_export_state *s, int mode)
@ -1383,6 +1400,7 @@ bgp_rte_update(struct bgp_parse_state *s, const net_addr *n, u32 path_id, rta *a
e->pflags = 0;
e->u.bgp.suppressed = 0;
e->u.bgp.stale = -1;
e->u.bgp.base_table = s->base_table;
rte_update3(&s->channel->c, n, e, s->last_src);
}
@ -1897,6 +1915,10 @@ bgp_decode_nlri_flow4(struct bgp_parse_state *s, byte *pos, uint len, rta *a)
net_fill_flow4(n, px, pxlen, pos, flen);
ADVANCE(pos, len, flen);
/* Apply validation procedure per RFC 8955 (6) */
if (a && s->channel->cf->validate)
bgp_apply_flow_validation(s, n, a);
bgp_rte_update(s, n, path_id, a);
}
}
@ -1985,6 +2007,10 @@ bgp_decode_nlri_flow6(struct bgp_parse_state *s, byte *pos, uint len, rta *a)
net_fill_flow6(n, px, pxlen, pos, flen);
ADVANCE(pos, len, flen);
/* Apply validation procedure per RFC 8955 (6) */
if (a && s->channel->cf->validate)
bgp_apply_flow_validation(s, n, a);
bgp_rte_update(s, n, path_id, a);
}
}
@ -2438,6 +2464,8 @@ bgp_decode_nlri(struct bgp_parse_state *s, u32 afi, byte *nlri, uint len, ea_lis
s->last_id = 0;
s->last_src = s->proto->p.main_source;
s->base_table = NULL;
/*
* IPv4 BGP and MP-BGP may be used together in one update, therefore we do not
* add BA_NEXT_HOP in bgp_decode_attrs(), but we add it here independently for

View file

@ -81,7 +81,10 @@ pipe_rt_notify(struct proto *P, struct channel *src_ch, net *n, rte *new, rte *o
#ifdef CONFIG_BGP
/* Hack to cleanup cached value */
if (e->attrs->src->proto->proto == &proto_bgp)
{
e->u.bgp.stale = -1;
e->u.bgp.base_table = NULL;
}
#endif
src = a->src;

View file

@ -309,6 +309,12 @@ bt_log_suite_case_result(int result, const char *fmt, ...)
}
}
void
bt_reset_suite_case_timer(void)
{
clock_gettime(CLOCK_MONOTONIC, &bt_suite_case_begin);
}
int
bt_test_suite_base(int (*fn)(const void *), const char *id, const void *fn_arg, int forked, int timeout, const char *dsc, ...)
{
@ -501,6 +507,15 @@ bt_fmt_ipa(char *buf, size_t size, const void *data)
bsnprintf(buf, size, "(null)");
}
void
bt_format_net(char *buf, size_t size, const void *data)
{
if (data)
bsnprintf(buf, size, "%N", (const net_addr *) data);
else
bsnprintf(buf, size, "(null)");
}
int
bt_is_char(byte c)
{

View file

@ -32,6 +32,7 @@ extern const char *bt_test_id;
void bt_init(int argc, char *argv[]);
int bt_exit_value(void);
void bt_reset_suite_case_timer(void);
int bt_test_suite_base(int (*test_fn)(const void *), const char *test_id, const void *test_fn_argument, int forked, int timeout, const char *dsc, ...);
static inline u64 bt_random(void)
{ return ((u64) random() & 0xffffffff) | ((u64) random() << 32); }
@ -165,6 +166,8 @@ struct bt_batch {
void bt_fmt_str(char *buf, size_t size, const void *data);
void bt_fmt_unsigned(char *buf, size_t size, const void *data);
void bt_fmt_ipa(char *buf, size_t size, const void *data);
void bt_format_net(char *buf, size_t size, const void *data);
int bt_assert_batch__(struct bt_batch *opts);
int bt_is_char(byte c);