From 3d3be4333fdf6faa080947b331a6a19bce1a4f57 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 1 Sep 2010 00:50:51 +0000 Subject: gro: fix different skb headrooms Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We cant force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption headrooms of fragments are same than the head. When CHECKSUM_PARTIAL is used, this can give csum_start errors, and crash later in skb_copy_and_csum_dev() 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate an exact padding, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov, testing many debugging patches ! With help of Jarek Poplawski. Reported-by: Plamen Petrov Signed-off-by: Eric Dumazet CC: Jarek Poplawski Signed-off-by: David S. Miller --- net/core/skbuff.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'net/core') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3a2513f0d0c3..26396ff67cf9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2573,6 +2573,10 @@ struct sk_buff *skb_segment(struct sk_buff *skb, int features) __copy_skb_header(nskb, skb); nskb->mac_len = skb->mac_len; + /* nskb and skb might have different headroom */ + if (nskb->ip_summed == CHECKSUM_PARTIAL) + nskb->csum_start += skb_headroom(nskb) - headroom; + skb_reset_mac_header(nskb); skb_set_network_header(nskb, skb->mac_len); nskb->transport_header = (nskb->network_header + @@ -2702,8 +2706,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) } else if (skb_gro_len(p) != pinfo->gso_size) return -E2BIG; - headroom = skb_headroom(p); - nskb = netdev_alloc_skb(p->dev, headroom + skb_gro_offset(p)); + headroom = NET_SKB_PAD + NET_IP_ALIGN; + nskb = alloc_skb(headroom + skb_gro_offset(p), GFP_ATOMIC); if (unlikely(!nskb)) return -ENOMEM; -- cgit v1.2.3 From 0b5d404e349c0236b11466c0a4785520c0be6982 Mon Sep 17 00:00:00 2001 From: Jarek Poplawski Date: Thu, 2 Sep 2010 13:22:11 -0700 Subject: pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimator This patch fixes a lockdep warning: [ 516.287584] ========================================================= [ 516.288386] [ INFO: possible irq lock inversion dependency detected ] [ 516.288386] 2.6.35b #7 [ 516.288386] --------------------------------------------------------- [ 516.288386] swapper/0 just changed the state of lock: [ 516.288386] (&qdisc_tx_lock){+.-...}, at: [] est_timer+0x62/0x1b4 [ 516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 516.288386] (est_tree_lock){+.+...} [ 516.288386] [ 516.288386] and interrupts could create inverse lock ordering between them. ... So, est_tree_lock needs BH protection because it's taken by qdisc_tx_lock, which is used both in BH and process contexts. (Full warning with this patch at netdev, 02 Sep 2010.) Fixes commit: ae638c47dc040b8def16d05dc6acdd527628f231 ("pkt_sched: gen_estimator: add a new lock") Signed-off-by: Jarek Poplawski Signed-off-by: David S. Miller --- net/core/gen_estimator.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'net/core') diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c index 9fbe7f7429b0..6743146e4d6b 100644 --- a/net/core/gen_estimator.c +++ b/net/core/gen_estimator.c @@ -232,7 +232,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats, est->last_packets = bstats->packets; est->avpps = rate_est->pps<<10; - spin_lock(&est_tree_lock); + spin_lock_bh(&est_tree_lock); if (!elist[idx].timer.function) { INIT_LIST_HEAD(&elist[idx].list); setup_timer(&elist[idx].timer, est_timer, idx); @@ -243,7 +243,7 @@ int gen_new_estimator(struct gnet_stats_basic_packed *bstats, list_add_rcu(&est->list, &elist[idx].list); gen_add_node(est); - spin_unlock(&est_tree_lock); + spin_unlock_bh(&est_tree_lock); return 0; } @@ -270,7 +270,7 @@ void gen_kill_estimator(struct gnet_stats_basic_packed *bstats, { struct gen_estimator *e; - spin_lock(&est_tree_lock); + spin_lock_bh(&est_tree_lock); while ((e = gen_find_node(bstats, rate_est))) { rb_erase(&e->node, &est_root); @@ -281,7 +281,7 @@ void gen_kill_estimator(struct gnet_stats_basic_packed *bstats, list_del_rcu(&e->list); call_rcu(&e->e_rcu, __gen_kill_estimator); } - spin_unlock(&est_tree_lock); + spin_unlock_bh(&est_tree_lock); } EXPORT_SYMBOL(gen_kill_estimator); @@ -320,9 +320,9 @@ bool gen_estimator_active(const struct gnet_stats_basic_packed *bstats, ASSERT_RTNL(); - spin_lock(&est_tree_lock); + spin_lock_bh(&est_tree_lock); res = gen_find_node(bstats, rate_est) != NULL; - spin_unlock(&est_tree_lock); + spin_unlock_bh(&est_tree_lock); return res; } -- cgit v1.2.3 From deabc772f39405054a438d711f408d2d94d26d96 Mon Sep 17 00:00:00 2001 From: Helmut Schaa Date: Fri, 3 Sep 2010 02:39:56 +0000 Subject: net: fix tx queue selection for bridged devices implementing select_queue When a net device is implementing the select_queue callback and is part of a bridge, frames coming from the bridge already have a tx queue associated to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe, "net: Use sk_tx_queue_mapping for connected sockets"). The call to sk_tx_queue_get will then return the tx queue used by the bridge instead of calling the select_queue callback. In case of mac80211 this broke QoS which is implemented by using the select_queue callback. Furthermore it introduced problems with rt2x00 because frames with the same TID and RA sometimes appeared on different tx queues which the hw cannot handle correctly. Fix this by always calling select_queue first if it is available and only afterwards use the socket tx queue mapping. Signed-off-by: Helmut Schaa Signed-off-by: David S. Miller --- net/core/dev.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'net/core') diff --git a/net/core/dev.c b/net/core/dev.c index 3721fbb9a83c..b9b22a3c4c8f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2058,16 +2058,16 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev, struct sk_buff *skb) { int queue_index; - struct sock *sk = skb->sk; + const struct net_device_ops *ops = dev->netdev_ops; - queue_index = sk_tx_queue_get(sk); - if (queue_index < 0) { - const struct net_device_ops *ops = dev->netdev_ops; + if (ops->ndo_select_queue) { + queue_index = ops->ndo_select_queue(dev, skb); + queue_index = dev_cap_txqueue(dev, queue_index); + } else { + struct sock *sk = skb->sk; + queue_index = sk_tx_queue_get(sk); + if (queue_index < 0) { - if (ops->ndo_select_queue) { - queue_index = ops->ndo_select_queue(dev, skb); - queue_index = dev_cap_txqueue(dev, queue_index); - } else { queue_index = 0; if (dev->real_num_tx_queues > 1) queue_index = skb_tx_hash(dev, skb); -- cgit v1.2.3 From 64289c8e6851bca0e589e064c9a5c9fbd6ae5dd4 Mon Sep 17 00:00:00 2001 From: Jarek Poplawski Date: Sat, 4 Sep 2010 10:34:29 +0000 Subject: gro: Re-fix different skb headrooms The patch: "gro: fix different skb headrooms" in its part: "2) allocate a minimal skb for head of frag_list" is buggy. The copied skb has p->data set at the ip header at the moment, and skb_gro_offset is the length of ip + tcp headers. So, after the change the length of mac header is skipped. Later skb_set_mac_header() sets it into the NET_SKB_PAD area (if it's long enough) and ip header is misaligned at NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the original skb was wrongly allocated, so let's copy it as it was. bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57 Reported-by: Plamen Petrov Signed-off-by: Jarek Poplawski CC: Eric Dumazet Acked-by: Eric Dumazet Tested-by: Plamen Petrov Signed-off-by: David S. Miller --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'net/core') diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 26396ff67cf9..c83b421341c0 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -2706,7 +2706,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb) } else if (skb_gro_len(p) != pinfo->gso_size) return -E2BIG; - headroom = NET_SKB_PAD + NET_IP_ALIGN; + headroom = skb_headroom(p); nskb = alloc_skb(headroom + skb_gro_offset(p), GFP_ATOMIC); if (unlikely(!nskb)) return -ENOMEM; -- cgit v1.2.3