| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-23 16:50:42 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-04-23 16:50:42 -0700 |
| commit | e728258debd553c95d2e70f9cd97c9fde27c7130 (patch) | |
| tree | 18ef97c80f9923717f5cf6bdab44d77607ca0f4b | |
| parent | e8df5a0c0d041588e7f02781822d637d226cdbe8 (diff) | |
| parent | 5e6391da4539c35422c0df1d1d2d9a9bb97cd736 (diff) | |
Merge tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Steady stream of fixes. The last two weeks feel comparable to the two
weeks before the merge window. Lots of AI-aided bug discovery. A newer
big source is Sashiko/Gemini (Roman Gushchin's system), which points
out issues in existing code during patch review (maybe 25% of the fixes
here likely originate from Sashiko). The nice thing is that these are
often fixed by the respective maintainers rather than as drive-bys.
Current release - new code bugs:
- kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP
Previous releases - regressions:
- add an async ndo_set_rx_mode and switch the drivers which we promised
would be called under the per-netdev mutex over to it
- dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops
- hv_sock: report EOF instead of -EIO for FIN
- vsock/virtio: fix MSG_PEEK calculation on bytes to copy
Previous releases - always broken:
- ipv6: fix possible UAF in icmpv6_rcv()
- icmp: validate reply type before using icmp_pointers
- af_unix: drop all SCM attributes for SOCKMAP
- netfilter: fix a number of bugs in the osf (OS fingerprinting) support
- eth: intel: fix timestamp interrupt configuration for E825C
Misc:
- bunch of data-race annotations"
* tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits)
rxrpc: Fix error handling in rxgk_extract_token()
rxrpc: Fix re-decryption of RESPONSE packets
rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets
rxrpc: Fix missing validation of ticket length in non-XDR key preparsing
rxgk: Fix potential integer overflow in length check
rxrpc: Fix conn-level packet handling to unshare RESPONSE packets
rxrpc: Fix potential UAF after skb_unshare() failure
rxrpc: Fix rxkad crypto unalignment handling
rxrpc: Fix memory leaks in rxkad_verify_response()
net: rds: fix MR cleanup on copy error
m68k: mvme147: Make me the maintainer
net: txgbe: fix firmware version check
selftests/bpf: check epoll readiness during reuseport migration
tcp: call sk_data_ready() after listener migration
vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll()
ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim
tipc: fix double-free in tipc_buf_append()
llc: Return -EINPROGRESS from llc_ui_connect()
ipv4: icmp: validate reply type before using icmp_pointers
selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges
...
172 files changed, 3534 insertions, 1414 deletions
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst
index 83e28b96884f..93e06e8d51a9 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -289,6 +289,19 @@ ndo_tx_timeout:
 ndo_set_rx_mode:
 	Synchronization: netif_addr_lock spinlock.
 	Context: BHs disabled
+	Notes: Deprecated in favor of ndo_set_rx_mode_async which runs
+	in process context.
+
+ndo_set_rx_mode_async:
+	Synchronization: rtnl_lock() semaphore. In addition, netdev instance
+	lock if the driver implements queue management or shaper API.
+	Context: process (from a work queue)
+	Notes: Async version of ndo_set_rx_mode which runs in process
+	context. Receives snapshots of the unicast and multicast address lists.
+
+ndo_change_rx_flags:
+	Synchronization: rtnl_lock() semaphore. In addition, netdev instance
+	lock if the driver implements queue management or shaper API.
 
 ndo_setup_tc:
 	``TC_SETUP_BLOCK`` and ``TC_SETUP_FT`` are running under NFT locks
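The netdevices.rst hunk above defines the contract for the new hook: process context, rtnl held, and snapshot copies of the address lists instead of reads under netif_addr_lock. A minimal sketch of a driver implementing it, modeled on the dummy.c and bnxt conversions in this series — the mydrv_* names and the filter limit are hypothetical, only the hook signature and list helpers come from the patch:

```c
#include <linux/etherdevice.h>
#include <linux/netdevice.h>

#define MYDRV_MAX_UC_FILTERS	16	/* hypothetical hardware limit */

/* Placeholders for real register programming, provided elsewhere */
void mydrv_hw_set_promisc(struct net_device *dev, bool on);
void mydrv_hw_set_uc_filter(struct net_device *dev, int slot, const u8 *addr);
void mydrv_hw_add_mc_hash(struct net_device *dev, const u8 *addr);

static void mydrv_set_rx_mode_async(struct net_device *dev,
				    struct netdev_hw_addr_list *uc,
				    struct netdev_hw_addr_list *mc)
{
	struct netdev_hw_addr *ha;
	int slot = 0;

	/* uc/mc are snapshots taken by the core: they can be walked
	 * without netif_addr_lock, and this callback may sleep.
	 */
	if (netdev_hw_addr_list_count(uc) > MYDRV_MAX_UC_FILTERS) {
		mydrv_hw_set_promisc(dev, true);
		return;
	}

	netdev_hw_addr_list_for_each(ha, uc)
		mydrv_hw_set_uc_filter(dev, slot++, ha->addr);

	netdev_hw_addr_list_for_each(ha, mc)
		mydrv_hw_add_mc_hash(dev, ha->addr);
}

static const struct net_device_ops mydrv_netdev_ops = {
	.ndo_set_rx_mode_async	= mydrv_set_rx_mode_async,
};
```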
diff --git a/Documentation/process/maintainer-netdev.rst b/Documentation/process/maintainer-netdev.rst
index bda93b459a05..ec7b9aa2877f 100644
--- a/Documentation/process/maintainer-netdev.rst
+++ b/Documentation/process/maintainer-netdev.rst
@@ -528,7 +528,7 @@ The exact rules a driver must follow to acquire the ``Supported`` status:
    status will be withdrawn.
 
 5. Test failures due to bugs either in the driver or the test itself,
-   or lack of support for the feature the test is targgeting are
+   or lack of support for the feature the test is targeting are
    *not* a basis for losing the ``Supported`` status.
 
 netdev CI will maintain an official page of supported devices, listing their
diff --git a/MAINTAINERS b/MAINTAINERS
index c01e7ae900e5..53f7dc12ee95 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15337,6 +15337,13 @@ S:	Maintained
 W:	http://www.tazenda.demon.co.uk/phil/linux-hp
 F:	arch/m68k/hp300/
 
+M68K ON MVME147
+M:	Daniel Palmer <daniel@thingy.jp>
+S:	Maintained
+F:	arch/m68k/mvme147/
+F:	drivers/net/ethernet/amd/mvme147.c
+F:	drivers/scsi/mvme147.*
+
 M88DS3103 MEDIA DRIVER
 L:	linux-media@vger.kernel.org
 S:	Orphan
diff --git a/drivers/net/dsa/realtek/rtl8365mb.c b/drivers/net/dsa/realtek/rtl8365mb.c
index 31fa94dac627..c35cef01ec26 100644
--- a/drivers/net/dsa/realtek/rtl8365mb.c
+++ b/drivers/net/dsa/realtek/rtl8365mb.c
@@ -216,7 +216,7 @@
 	 (_extint) == 2 ? RTL8365MB_DIGITAL_INTERFACE_SELECT_REG1 : \
 	 0x0)
 #define RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_MASK(_extint) \
-	(0xF << (((_extint) % 2)))
+	(0xF << (((_extint) % 2) * 4))
 #define RTL8365MB_DIGITAL_INTERFACE_SELECT_MODE_OFFSET(_extint) \
 	(((_extint) % 2) * 4)
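The rtl8365mb.c one-liner above is pure bit arithmetic: each external interface occupies a 4-bit nibble in the select register, so the mask has to be shifted by (extint % 2) * 4 to line up with the companion OFFSET macro. A tiny self-contained check (plain C, not driver code):

```c
#include <stdio.h>

#define MODE_MASK_OLD(e)  (0xF << ((e) % 2))        /* buggy: shift by 0 or 1 */
#define MODE_MASK_NEW(e)  (0xF << (((e) % 2) * 4))  /* fixed: shift by 0 or 4 */
#define MODE_OFFSET(e)    (((e) % 2) * 4)

int main(void)
{
	/* For external interface 1 the mode field lives at bits 7:4 */
	printf("old mask: 0x%02x\n", MODE_MASK_OLD(1)); /* 0x1e, wrong bits */
	printf("new mask: 0x%02x\n", MODE_MASK_NEW(1)); /* 0xf0, matches offset */
	printf("offset:   %d\n", MODE_OFFSET(1));       /* 4 */
	return 0;
}
```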
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index d6bdad4baadd..f8a4eb365c3d 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -47,7 +47,9 @@ static int numdummies = 1;
 
 /* fake multicast ability */
-static void set_multicast_list(struct net_device *dev)
+static void set_multicast_list(struct net_device *dev,
+			       struct netdev_hw_addr_list *uc,
+			       struct netdev_hw_addr_list *mc)
 {
 }
 
@@ -87,7 +89,7 @@ static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_init		= dummy_dev_init,
 	.ndo_start_xmit		= dummy_xmit,
 	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_rx_mode	= set_multicast_list,
+	.ndo_set_rx_mode_async	= set_multicast_list,
 	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_get_stats64	= dummy_get_stats64,
 	.ndo_change_carrier	= dummy_change_carrier,
diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index e1ab15f1ee7d..2bb0a3ff9810 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -745,14 +745,18 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 	dma_addr_t dma_addr;
 
 	q->buf_size = PAGE_SIZE / 2;
-	q->ndesc = ndesc;
 	q->qdma = qdma;
 
-	q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
+	q->entry = devm_kzalloc(eth->dev, ndesc * sizeof(*q->entry),
 				GFP_KERNEL);
 	if (!q->entry)
 		return -ENOMEM;
 
+	q->desc = dmam_alloc_coherent(eth->dev, ndesc * sizeof(*q->desc),
+				      &dma_addr, GFP_KERNEL);
+	if (!q->desc)
+		return -ENOMEM;
+
 	q->page_pool = page_pool_create(&pp_params);
 	if (IS_ERR(q->page_pool)) {
 		int err = PTR_ERR(q->page_pool);
@@ -761,11 +765,7 @@ static int airoha_qdma_init_rx_queue(struct airoha_queue *q,
 		return err;
 	}
 
-	q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
-				      &dma_addr, GFP_KERNEL);
-	if (!q->desc)
-		return -ENOMEM;
-
+	q->ndesc = ndesc;
 	netif_napi_add(eth->napi_dev, &q->napi, airoha_qdma_rx_napi_poll);
 
 	airoha_qdma_wr(qdma, REG_RX_RING_BASE(qid), dma_addr);
@@ -843,6 +843,21 @@ static int airoha_qdma_init_rx(struct airoha_qdma *qdma)
 	return 0;
 }
 
+static void airoha_qdma_wake_netdev_txqs(struct airoha_queue *q)
+{
+	struct airoha_qdma *qdma = q->qdma;
+	struct airoha_eth *eth = qdma->eth;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(eth->ports); i++) {
+		struct airoha_gdm_port *port = eth->ports[i];
+
+		if (port && port->qdma == qdma)
+			netif_tx_wake_all_queues(port->dev);
+	}
+	q->txq_stopped = false;
+}
+
 static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct airoha_tx_irq_queue *irq_q;
@@ -919,12 +934,21 @@ static int airoha_qdma_tx_napi_poll(struct napi_struct *napi, int budget)
 			txq = netdev_get_tx_queue(skb->dev, queue);
 			netdev_tx_completed_queue(txq, 1, skb->len);
-			if (netif_tx_queue_stopped(txq) &&
-			    q->ndesc - q->queued >= q->free_thr)
-				netif_tx_wake_queue(txq);
-
 			dev_kfree_skb_any(skb);
 		}
+
+		if (q->txq_stopped && q->ndesc - q->queued >= q->free_thr) {
+			/* Since multiple net_device TX queues can share the
+			 * same hw QDMA TX queue, there is no guarantee we have
+			 * inflight packets queued in hw belonging to a
+			 * net_device TX queue stopped in the xmit path.
+			 * In order to avoid any potential net_device TX queue
+			 * stall, we need to wake all the net_device TX queues
+			 * feeding the same hw QDMA TX queue.
+			 */
+			airoha_qdma_wake_netdev_txqs(q);
+		}
+
 unlock:
 		spin_unlock_bh(&q->lock);
 	}
@@ -954,27 +978,27 @@ static int airoha_qdma_init_tx_queue(struct airoha_queue *q,
 	dma_addr_t dma_addr;
 
 	spin_lock_init(&q->lock);
-	q->ndesc = size;
 	q->qdma = qdma;
 	q->free_thr = 1 + MAX_SKB_FRAGS;
 	INIT_LIST_HEAD(&q->tx_list);
 
-	q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
+	q->entry = devm_kzalloc(eth->dev, size * sizeof(*q->entry),
 				GFP_KERNEL);
 	if (!q->entry)
 		return -ENOMEM;
 
-	q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
+	q->desc = dmam_alloc_coherent(eth->dev, size * sizeof(*q->desc),
 				      &dma_addr, GFP_KERNEL);
 	if (!q->desc)
 		return -ENOMEM;
 
-	for (i = 0; i < q->ndesc; i++) {
+	for (i = 0; i < size; i++) {
 		u32 val = FIELD_PREP(QDMA_DESC_DONE_MASK, 1);
 
 		list_add_tail(&q->entry[i].list, &q->tx_list);
 		WRITE_ONCE(q->desc[i].ctrl, cpu_to_le32(val));
 	}
+	q->ndesc = size;
 
 	/* xmit ring drop default setting */
 	airoha_qdma_set(qdma, REG_TX_RING_BLOCKING(qid),
@@ -996,8 +1020,6 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
 	struct airoha_eth *eth = qdma->eth;
 	dma_addr_t dma_addr;
 
-	netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
-			  airoha_qdma_tx_napi_poll);
 	irq_q->q = dmam_alloc_coherent(eth->dev, size * sizeof(u32),
 				       &dma_addr, GFP_KERNEL);
 	if (!irq_q->q)
@@ -1007,6 +1029,9 @@ static int airoha_qdma_tx_irq_init(struct airoha_tx_irq_queue *irq_q,
 	irq_q->size = size;
 	irq_q->qdma = qdma;
 
+	netif_napi_add_tx(eth->napi_dev, &irq_q->napi,
+			  airoha_qdma_tx_napi_poll);
+
 	airoha_qdma_wr(qdma, REG_TX_IRQ_BASE(id), dma_addr);
 	airoha_qdma_rmw(qdma, REG_TX_IRQ_CFG(id), TX_IRQ_DEPTH_MASK,
 			FIELD_PREP(TX_IRQ_DEPTH_MASK, size));
@@ -1039,12 +1064,15 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
 
 static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 {
-	struct airoha_eth *eth = q->qdma->eth;
-	int i;
+	struct airoha_qdma *qdma = q->qdma;
+	struct airoha_eth *eth = qdma->eth;
+	int i, qid = q - &qdma->q_tx[0];
+	u16 index = 0;
 
 	spin_lock_bh(&q->lock);
 	for (i = 0; i < q->ndesc; i++) {
 		struct airoha_queue_entry *e = &q->entry[i];
+		struct airoha_qdma_desc *desc = &q->desc[i];
 
 		if (!e->dma_addr)
 			continue;
@@ -1055,8 +1083,33 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
 		e->dma_addr = 0;
 		e->skb = NULL;
 		list_add_tail(&e->list, &q->tx_list);
+
+		/* Reset DMA descriptor */
+		WRITE_ONCE(desc->ctrl, 0);
+		WRITE_ONCE(desc->addr, 0);
+		WRITE_ONCE(desc->data, 0);
+		WRITE_ONCE(desc->msg0, 0);
+		WRITE_ONCE(desc->msg1, 0);
+		WRITE_ONCE(desc->msg2, 0);
+
 		q->queued--;
 	}
+
+	if (!list_empty(&q->tx_list)) {
+		struct airoha_queue_entry *e;
+
+		e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
+				     list);
+		index = e - q->entry;
+	}
+
+	/* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is
+	 * empty.
+	 */
+	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
+			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
+	airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK,
+			FIELD_PREP(TX_RING_DMA_IDX_MASK, index));
+
 	spin_unlock_bh(&q->lock);
 }
@@ -1398,8 +1451,12 @@ static void airoha_qdma_cleanup(struct airoha_qdma *qdma)
 		}
 	}
 
-	for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++)
+	for (i = 0; i < ARRAY_SIZE(qdma->q_tx_irq); i++) {
+		if (!qdma->q_tx_irq[i].size)
+			continue;
+
 		netif_napi_del(&qdma->q_tx_irq[i].napi);
+	}
 
 	for (i = 0; i < ARRAY_SIZE(qdma->q_tx); i++) {
 		if (!qdma->q_tx[i].ndesc)
 			continue;
@@ -1727,7 +1784,7 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 {
 	struct airoha_eth *eth = port->qdma->eth;
 	u32 val, pse_port, chan;
-	int src_port;
+	int i, src_port;
 
 	/* Forward the traffic to the proper GDM port */
 	pse_port = port->id == AIROHA_GDM3_IDX ? FE_PSE_PORT_GDM3
@@ -1769,6 +1826,9 @@ static int airoha_set_gdm2_loopback(struct airoha_gdm_port *port)
 			SP_CPORT_MASK(val),
 			__field_prep(SP_CPORT_MASK(val), FE_PSE_PORT_CDM2));
 
+	for (i = 0; i < eth->soc->num_ppe; i++)
+		airoha_ppe_set_cpu_port(port, i, AIROHA_GDM2_IDX);
+
 	if (port->id == AIROHA_GDM4_IDX && airoha_is_7581(eth)) {
 		u32 mask = FC_ID_OF_SRC_PORT_MASK(port->nbq);
@@ -1807,7 +1867,8 @@ static int airoha_dev_init(struct net_device *dev)
 	}
 
 	for (i = 0; i < eth->soc->num_ppe; i++)
-		airoha_ppe_set_cpu_port(port, i);
+		airoha_ppe_set_cpu_port(port, i,
+					airoha_get_fe_port(port));
 
 	return 0;
 }
@@ -1984,6 +2045,7 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 	if (q->queued + nr_frags >= q->ndesc) {
 		/* not enough space in the queue */
 		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
 		spin_unlock_bh(&q->lock);
 		return NETDEV_TX_BUSY;
 	}
@@ -2039,8 +2101,10 @@ static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
 			TX_RING_CPU_IDX_MASK,
 			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
 
-	if (q->ndesc - q->queued < q->free_thr)
+	if (q->ndesc - q->queued < q->free_thr) {
 		netif_tx_stop_queue(txq);
+		q->txq_stopped = true;
+	}
 
 	spin_unlock_bh(&q->lock);
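Both airoha ring-init hunks above follow the same "publish last" pattern: q->ndesc is only written once every allocation has succeeded, so the cleanup helpers (which skip rings with !q->ndesc) never touch a half-initialized ring. A generic, hedged sketch of the idea — my_ring and its helpers are illustrative, not kernel API:

```c
#include <linux/slab.h>
#include <linux/types.h>

struct my_ring {
	int	ndesc;	/* 0 means "never initialized" */
	void	*entry;
	void	*desc;
};

static int my_ring_init(struct my_ring *q, int ndesc)
{
	q->entry = kcalloc(ndesc, sizeof(u64), GFP_KERNEL);
	if (!q->entry)
		return -ENOMEM;

	q->desc = kcalloc(ndesc, sizeof(u64), GFP_KERNEL);
	if (!q->desc) {
		kfree(q->entry);
		return -ENOMEM;
	}

	q->ndesc = ndesc;	/* publish only after both allocations */
	return 0;
}

static void my_ring_cleanup(struct my_ring *q)
{
	if (!q->ndesc)		/* half-initialized rings are skipped */
		return;
	kfree(q->desc);
	kfree(q->entry);
	q->ndesc = 0;
}
```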
diff --git a/drivers/net/ethernet/airoha/airoha_eth.h b/drivers/net/ethernet/airoha/airoha_eth.h
index 95e557638617..e389d2fe3b86 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.h
+++ b/drivers/net/ethernet/airoha/airoha_eth.h
@@ -193,6 +193,7 @@ struct airoha_queue {
 	int ndesc;
 	int free_thr;
 	int buf_size;
+	bool txq_stopped;
 
 	struct napi_struct napi;
 	struct page_pool *page_pool;
@@ -653,7 +654,8 @@ int airoha_get_fe_port(struct airoha_gdm_port *port);
 bool airoha_is_valid_gdm_port(struct airoha_eth *eth,
 			      struct airoha_gdm_port *port);
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id);
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id,
+			     u8 fport);
 bool airoha_ppe_is_enabled(struct airoha_eth *eth, int index);
 void airoha_ppe_check_skb(struct airoha_ppe_dev *dev, struct sk_buff *skb,
 			  u16 hash, bool rx_wlan);
diff --git a/drivers/net/ethernet/airoha/airoha_ppe.c b/drivers/net/ethernet/airoha/airoha_ppe.c
index 03115c1c1063..5c9dff6bccd1 100644
--- a/drivers/net/ethernet/airoha/airoha_ppe.c
+++ b/drivers/net/ethernet/airoha/airoha_ppe.c
@@ -85,10 +85,9 @@ static u32 airoha_ppe_get_timestamp(struct airoha_ppe *ppe)
 	return FIELD_GET(AIROHA_FOE_IB1_BIND_TIMESTAMP, timestamp);
 }
 
-void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id)
+void airoha_ppe_set_cpu_port(struct airoha_gdm_port *port, u8 ppe_id, u8 fport)
 {
 	struct airoha_qdma *qdma = port->qdma;
-	u8 fport = airoha_get_fe_port(port);
 	struct airoha_eth *eth = qdma->eth;
 	u8 qdma_id = qdma - &eth->qdma[0];
 	u32 fe_cpu_port;
@@ -182,7 +181,8 @@ static void airoha_ppe_hw_init(struct airoha_ppe *ppe)
 			if (!port)
 				continue;
 
-			airoha_ppe_set_cpu_port(port, i);
+			airoha_ppe_set_cpu_port(port, i,
+						airoha_get_fe_port(port));
 		}
 	}
 }
@@ -1356,6 +1356,29 @@ static struct airoha_npu *airoha_ppe_npu_get(struct airoha_eth *eth)
 	return npu;
 }
 
+static int airoha_ppe_wait_for_npu_init(struct airoha_eth *eth)
+{
+	int err;
+	u32 val;
+
+	/* PPE_FLOW_CFG default register value is 0. Since we reset FE
+	 * during the device probe we can just check the configured value
+	 * is not 0 here.
+	 */
+	err = read_poll_timeout(airoha_fe_rr, val, val, USEC_PER_MSEC,
+				100 * USEC_PER_MSEC, false, eth,
+				REG_PPE_PPE_FLOW_CFG(0));
+	if (err)
+		return err;
+
+	if (airoha_ppe_is_enabled(eth, 1))
+		err = read_poll_timeout(airoha_fe_rr, val, val, USEC_PER_MSEC,
+					100 * USEC_PER_MSEC, false, eth,
+					REG_PPE_PPE_FLOW_CFG(1));
+
+	return err;
+}
+
 static int airoha_ppe_offload_setup(struct airoha_eth *eth)
 {
 	struct airoha_npu *npu = airoha_ppe_npu_get(eth);
@@ -1369,6 +1392,11 @@ static int airoha_ppe_offload_setup(struct airoha_eth *eth)
 	if (err)
 		goto error_npu_put;
 
+	/* Wait for NPU PPE configuration to complete */
+	err = airoha_ppe_wait_for_npu_init(eth);
+	if (err)
+		goto error_npu_put;
+
 	ppe_num_stats_entries = airoha_ppe_get_total_num_stats_entries(ppe);
 	if (ppe_num_stats_entries > 0) {
 		err = npu->ops.ppe_init_stats(npu, ppe->foe_stats_dma,
diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_core.c b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
index 1c14c5fe8d61..68b74eb2c3a2 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_core.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
@@ -74,6 +74,13 @@ static int bnge_func_qcaps(struct bnge_dev *bd)
 		return rc;
 	}
 
+	return 0;
+}
+
+static int bnge_func_qrcaps_qcfg(struct bnge_dev *bd)
+{
+	int rc;
+
 	rc = bnge_hwrm_func_resc_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "query resc caps failure rc: %d\n", rc);
@@ -133,23 +140,28 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
 
 	bnge_hwrm_fw_set_time(bd);
 
-	rc = bnge_hwrm_func_drv_rgtr(bd);
+	/* Get the resources and configuration from firmware */
+	rc = bnge_func_qcaps(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		dev_err(bd->dev, "Failed querying caps rc: %d\n", rc);
 		return rc;
 	}
 
 	rc = bnge_alloc_ctx_mem(bd);
 	if (rc) {
 		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
-		goto err_func_unrgtr;
+		goto err_free_ctx_mem;
 	}
 
-	/* Get the resources and configuration from firmware */
-	rc = bnge_func_qcaps(bd);
+	rc = bnge_hwrm_func_drv_rgtr(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
-		rc = -ENODEV;
+		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		goto err_free_ctx_mem;
+	}
+
+	rc = bnge_func_qrcaps_qcfg(bd);
+	if (rc) {
+		dev_err(bd->dev, "Failed querying resources rc: %d\n", rc);
 		goto err_func_unrgtr;
 	}
@@ -158,7 +170,9 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
 	return 0;
 
 err_func_unrgtr:
-	bnge_fw_unregister_dev(bd);
+	bnge_hwrm_func_drv_unrgtr(bd);
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
 	return rc;
 }
diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
index 94f15e08a88c..b066ee887a09 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
@@ -324,7 +324,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	u32 l2_qps, qp1_qps, max_qps;
 	u32 ena, entries_sp, entries;
 	u32 srqs, max_srqs, min;
-	u32 num_mr, num_ah;
 	u32 extra_srqs = 0;
 	u32 extra_qps = 0;
 	u32 fast_qpmd_qps;
@@ -390,21 +389,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	if (!bnge_is_roce_en(bd))
 		goto skip_rdma;
 
-	ctxm = &ctx->ctx_arr[BNGE_CTX_MRAV];
-	/* 128K extra is needed to accommodate static AH context
-	 * allocation by f/w.
-	 */
-	num_mr = min_t(u32, ctxm->max_entries / 2, 1024 * 256);
-	num_ah = min_t(u32, num_mr, 1024 * 128);
-	ctxm->split_entry_cnt = BNGE_CTX_MRAV_AV_SPLIT_ENTRY + 1;
-	if (!ctxm->mrav_av_entries || ctxm->mrav_av_entries > num_ah)
-		ctxm->mrav_av_entries = num_ah;
-
-	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, num_mr + num_ah, 2);
-	if (rc)
-		return rc;
-	ena |= FUNC_BACKING_STORE_CFG_REQ_ENABLES_MRAV;
-
 	ctxm = &ctx->ctx_arr[BNGE_CTX_TIM];
 	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, l2_qps + qp1_qps + extra_qps, 1);
 	if (rc)
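The bnge reordering above also straightens out the error unwinding: each step that acquires something gains a dedicated label, and a failure at step N jumps to the label that undoes steps N-1..1 in reverse. A condensed sketch of that idiom with placeholder names (step_*/undo_* are illustrative, not the driver's functions):

```c
struct my_dev;	/* opaque device handle */

int step_query_caps(struct my_dev *d);
int step_alloc_ctx_mem(struct my_dev *d);	/* may partially allocate */
int step_drv_rgtr(struct my_dev *d);
int step_query_resources(struct my_dev *d);
void undo_drv_unrgtr(struct my_dev *d);
void undo_free_ctx_mem(struct my_dev *d);

static int register_dev(struct my_dev *d)
{
	int rc;

	rc = step_query_caps(d);	/* nothing to undo on failure */
	if (rc)
		return rc;

	rc = step_alloc_ctx_mem(d);
	if (rc)
		goto err_free_ctx_mem;	/* free partial allocations */

	rc = step_drv_rgtr(d);
	if (rc)
		goto err_free_ctx_mem;

	rc = step_query_resources(d);
	if (rc)
		goto err_func_unrgtr;

	return 0;

err_func_unrgtr:
	undo_drv_unrgtr(d);
err_free_ctx_mem:
	undo_free_ctx_mem(d);
	return rc;
}
```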
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 58cf02bcb98a..8c55874f44ca 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -11132,8 +11132,9 @@ static int bnxt_setup_nitroa0_vnic(struct bnxt *bp)
 	return rc;
 }
 
-static int bnxt_cfg_rx_mode(struct bnxt *);
-static bool bnxt_mc_list_updated(struct bnxt *, u32 *);
+static int bnxt_cfg_rx_mode(struct bnxt *, struct netdev_hw_addr_list *, bool);
+static bool bnxt_mc_list_updated(struct bnxt *, u32 *,
+				 const struct netdev_hw_addr_list *);
 
 static int bnxt_init_chip(struct bnxt *bp, bool irq_re_init)
 {
@@ -11223,11 +11224,11 @@ static int bnxt_init_chip(struct bnxt *bp, bool irq_re_init)
 	} else if (bp->dev->flags & IFF_MULTICAST) {
 		u32 mask = 0;
 
-		bnxt_mc_list_updated(bp, &mask);
+		bnxt_mc_list_updated(bp, &mask, &bp->dev->mc);
 		vnic->rx_mask |= mask;
 	}
 
-	rc = bnxt_cfg_rx_mode(bp);
+	rc = bnxt_cfg_rx_mode(bp, &bp->dev->uc, true);
 	if (rc)
 		goto err_out;
@@ -13622,17 +13623,17 @@ void bnxt_get_ring_drv_stats(struct bnxt *bp,
 		bnxt_get_one_ring_drv_stats(bp, stats, &bp->bnapi[i]->cp_ring);
 }
 
-static bool bnxt_mc_list_updated(struct bnxt *bp, u32 *rx_mask)
+static bool bnxt_mc_list_updated(struct bnxt *bp, u32 *rx_mask,
+				 const struct netdev_hw_addr_list *mc)
 {
 	struct bnxt_vnic_info *vnic = &bp->vnic_info[BNXT_VNIC_DEFAULT];
-	struct net_device *dev = bp->dev;
 	struct netdev_hw_addr *ha;
 	u8 *haddr;
 	int mc_count = 0;
 	bool update = false;
 	int off = 0;
 
-	netdev_for_each_mc_addr(ha, dev) {
+	netdev_hw_addr_list_for_each(ha, mc) {
 		if (mc_count >= BNXT_MAX_MC_ADDRS) {
 			*rx_mask |= CFA_L2_SET_RX_MASK_REQ_MASK_ALL_MCAST;
 			vnic->mc_list_count = 0;
@@ -13656,17 +13657,17 @@ static bool bnxt_mc_list_updated(struct bnxt *bp, u32 *rx_mask)
 	return update;
 }
 
-static bool bnxt_uc_list_updated(struct bnxt *bp)
+static bool bnxt_uc_list_updated(struct bnxt *bp,
+				 const struct netdev_hw_addr_list *uc)
 {
-	struct net_device *dev = bp->dev;
 	struct bnxt_vnic_info *vnic = &bp->vnic_info[BNXT_VNIC_DEFAULT];
 	struct netdev_hw_addr *ha;
 	int off = 0;
 
-	if (netdev_uc_count(dev) != (vnic->uc_filter_count - 1))
+	if (netdev_hw_addr_list_count(uc) != (vnic->uc_filter_count - 1))
 		return true;
 
-	netdev_for_each_uc_addr(ha, dev) {
+	netdev_hw_addr_list_for_each(ha, uc) {
 		if (!ether_addr_equal(ha->addr, vnic->uc_list + off))
 			return true;
@@ -13675,7 +13676,9 @@ static bool bnxt_uc_list_updated(struct bnxt *bp)
 	return false;
 }
 
-static void bnxt_set_rx_mode(struct net_device *dev)
+static void bnxt_set_rx_mode(struct net_device *dev,
+			     struct netdev_hw_addr_list *uc,
+			     struct netdev_hw_addr_list *mc)
 {
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_vnic_info *vnic;
@@ -13696,7 +13699,7 @@ static void bnxt_set_rx_mode(struct net_device *dev)
 	if (dev->flags & IFF_PROMISC)
 		mask |= CFA_L2_SET_RX_MASK_REQ_MASK_PROMISCUOUS;
 
-	uc_update = bnxt_uc_list_updated(bp);
+	uc_update = bnxt_uc_list_updated(bp, uc);
 
 	if (dev->flags & IFF_BROADCAST)
 		mask |= CFA_L2_SET_RX_MASK_REQ_MASK_BCAST;
@@ -13704,27 +13707,23 @@ static void bnxt_set_rx_mode(struct net_device *dev)
 		mask |= CFA_L2_SET_RX_MASK_REQ_MASK_ALL_MCAST;
 		vnic->mc_list_count = 0;
 	} else if (dev->flags & IFF_MULTICAST) {
-		mc_update = bnxt_mc_list_updated(bp, &mask);
+		mc_update = bnxt_mc_list_updated(bp, &mask, mc);
 	}
 
 	if (mask != vnic->rx_mask || uc_update || mc_update) {
 		vnic->rx_mask = mask;
-		bnxt_queue_sp_work(bp, BNXT_RX_MASK_SP_EVENT);
+		bnxt_cfg_rx_mode(bp, uc, uc_update);
 	}
 }
 
-static int bnxt_cfg_rx_mode(struct bnxt *bp)
+static int bnxt_cfg_rx_mode(struct bnxt *bp, struct netdev_hw_addr_list *uc,
+			    bool uc_update)
 {
 	struct net_device *dev = bp->dev;
 	struct bnxt_vnic_info *vnic = &bp->vnic_info[BNXT_VNIC_DEFAULT];
 	struct netdev_hw_addr *ha;
 	int i, off = 0, rc;
-	bool uc_update;
-
-	netif_addr_lock_bh(dev);
-	uc_update = bnxt_uc_list_updated(bp);
-	netif_addr_unlock_bh(dev);
 
 	if (!uc_update)
 		goto skip_uc;
@@ -13739,10 +13738,10 @@ static int bnxt_cfg_rx_mode(struct bnxt *bp)
 	vnic->uc_filter_count = 1;
 
 	netif_addr_lock_bh(dev);
-	if (netdev_uc_count(dev) > (BNXT_MAX_UC_ADDRS - 1)) {
+	if (netdev_hw_addr_list_count(uc) > (BNXT_MAX_UC_ADDRS - 1)) {
 		vnic->rx_mask |= CFA_L2_SET_RX_MASK_REQ_MASK_PROMISCUOUS;
 	} else {
-		netdev_for_each_uc_addr(ha, dev) {
+		netdev_hw_addr_list_for_each(ha, uc) {
 			memcpy(vnic->uc_list + off, ha->addr, ETH_ALEN);
 			off += ETH_ALEN;
 			vnic->uc_filter_count++;
@@ -14708,6 +14707,7 @@ static void bnxt_ulp_restart(struct bnxt *bp)
 static void bnxt_sp_task(struct work_struct *work)
 {
 	struct bnxt *bp = container_of(work, struct bnxt, sp_task);
+	struct net_device *dev = bp->dev;
 
 	set_bit(BNXT_STATE_IN_SP_TASK, &bp->state);
 	smp_mb__after_atomic();
@@ -14721,9 +14721,6 @@ static void bnxt_sp_task(struct work_struct *work)
 		bnxt_reenable_sriov(bp);
 	}
 
-	if (test_and_clear_bit(BNXT_RX_MASK_SP_EVENT, &bp->sp_event))
-		bnxt_cfg_rx_mode(bp);
-
 	if (test_and_clear_bit(BNXT_RX_NTP_FLTR_SP_EVENT, &bp->sp_event))
 		bnxt_cfg_ntp_filters(bp);
 	if (test_and_clear_bit(BNXT_HWRM_EXEC_FWD_REQ_SP_EVENT, &bp->sp_event))
@@ -14788,6 +14785,13 @@ static void bnxt_sp_task(struct work_struct *work)
 	/* These functions below will clear BNXT_STATE_IN_SP_TASK. They
 	 * must be the last functions to be called before exiting.
 	 */
+	if (test_and_clear_bit(BNXT_RX_MASK_SP_EVENT, &bp->sp_event)) {
+		bnxt_lock_sp(bp);
+		if (test_bit(BNXT_STATE_OPEN, &bp->state))
+			bnxt_cfg_rx_mode(bp, &dev->uc, true);
+		bnxt_unlock_sp(bp);
+	}
+
 	if (test_and_clear_bit(BNXT_RESET_TASK_SP_EVENT, &bp->sp_event))
 		bnxt_reset(bp, false);
@@ -15989,7 +15993,7 @@ static const struct net_device_ops bnxt_netdev_ops = {
 	.ndo_start_xmit		= bnxt_start_xmit,
 	.ndo_stop		= bnxt_close,
 	.ndo_get_stats64	= bnxt_get_stats64,
-	.ndo_set_rx_mode	= bnxt_set_rx_mode,
+	.ndo_set_rx_mode_async	= bnxt_set_rx_mode,
 	.ndo_eth_ioctl		= bnxt_ioctl,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= bnxt_change_mac_addr,
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp.c b/drivers/net/ethernet/freescale/enetc/ntmp.c
index 703752995e93..c94a928622fd 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp.c
+++ b/drivers/net/ethernet/freescale/enetc/ntmp.c
@@ -7,6 +7,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/fsl/netc_global.h>
 #include <linux/iopoll.h>
+#include <linux/vmalloc.h>
 
 #include "ntmp_private.h"
@@ -42,6 +43,12 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 	if (!cbdr->addr_base)
 		return -ENOMEM;
 
+	cbdr->swcbd = vcalloc(cbd_num, sizeof(struct netc_swcbd));
+	if (!cbdr->swcbd) {
+		dma_free_coherent(dev, size, cbdr->addr_base, cbdr->dma_base);
+		return -ENOMEM;
+	}
+
 	cbdr->dma_size = size;
 	cbdr->bd_num = cbd_num;
 	cbdr->regs = *regs;
@@ -52,10 +59,10 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 	cbdr->addr_base_align = PTR_ALIGN(cbdr->addr_base,
 					  NTMP_BASE_ADDR_ALIGN);
 
-	spin_lock_init(&cbdr->ring_lock);
+	mutex_init(&cbdr->ring_lock);
 
 	cbdr->next_to_use = netc_read(cbdr->regs.pir);
-	cbdr->next_to_clean = netc_read(cbdr->regs.cir);
+	cbdr->next_to_clean = netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX;
 
 	/* Step 1: Configure the base address of the Control BD Ring */
 	netc_write(cbdr->regs.bar0, lower_32_bits(cbdr->dma_base_align));
@@ -71,10 +78,24 @@ int ntmp_init_cbdr(struct netc_cbdr *cbdr, struct device *dev,
 }
 EXPORT_SYMBOL_GPL(ntmp_init_cbdr);
 
+static void ntmp_free_data_mem(struct device *dev, struct netc_swcbd *swcbd)
+{
+	if (unlikely(!swcbd->buf))
+		return;
+
+	dma_free_coherent(dev, swcbd->size + NTMP_DATA_ADDR_ALIGN,
+			  swcbd->buf, swcbd->dma);
+}
+
 void ntmp_free_cbdr(struct netc_cbdr *cbdr)
 {
 	/* Disable the Control BD Ring */
 	netc_write(cbdr->regs.mr, 0);
+
+	for (int i = 0; i < cbdr->bd_num; i++)
+		ntmp_free_data_mem(cbdr->dev, &cbdr->swcbd[i]);
+
+	vfree(cbdr->swcbd);
 	dma_free_coherent(cbdr->dev, cbdr->dma_size, cbdr->addr_base,
 			  cbdr->dma_base);
 	memset(cbdr, 0, sizeof(*cbdr));
@@ -94,40 +115,59 @@ static union netc_cbd *ntmp_get_cbd(struct netc_cbdr *cbdr, int index)
 
 static void ntmp_clean_cbdr(struct netc_cbdr *cbdr)
 {
-	union netc_cbd *cbd;
-	int i;
+	int i = cbdr->next_to_clean;
+
+	while ((netc_read(cbdr->regs.cir) & NETC_CBDRCIR_INDEX) != i) {
+		union netc_cbd *cbd = ntmp_get_cbd(cbdr, i);
+		struct netc_swcbd *swcbd = &cbdr->swcbd[i];
 
-	i = cbdr->next_to_clean;
-	while (netc_read(cbdr->regs.cir) != i) {
-		cbd = ntmp_get_cbd(cbdr, i);
+		ntmp_free_data_mem(cbdr->dev, swcbd);
+		memset(swcbd, 0, sizeof(*swcbd));
 		memset(cbd, 0, sizeof(*cbd));
 
 		i = (i + 1) % cbdr->bd_num;
 	}
 
+	dma_wmb();
 	cbdr->next_to_clean = i;
 }
 
-static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
+static void ntmp_select_and_lock_cbdr(struct ntmp_user *user,
+				      struct netc_cbdr **cbdr)
+{
+	/* Currently only ENETC is supported, and it has only one command
+	 * BD ring.
+	 */
+	*cbdr = &user->ring[0];
+
+	mutex_lock(&(*cbdr)->ring_lock);
+}
+
+static void ntmp_unlock_cbdr(struct netc_cbdr *cbdr)
+{
+	mutex_unlock(&cbdr->ring_lock);
+}
+
+static int netc_xmit_ntmp_cmd(struct netc_cbdr *cbdr, union netc_cbd *cbd,
+			      struct netc_swcbd *swcbd)
 {
 	union netc_cbd *cur_cbd;
-	struct netc_cbdr *cbdr;
-	int i, err;
+	int i, err, used_bds;
 	u16 status;
 	u32 val;
 
-	/* Currently only i.MX95 ENETC is supported, and it only has one
-	 * command BD ring
-	 */
-	cbdr = &user->ring[0];
-
-	spin_lock_bh(&cbdr->ring_lock);
-
-	if (unlikely(!ntmp_get_free_cbd_num(cbdr)))
+	used_bds = cbdr->bd_num - ntmp_get_free_cbd_num(cbdr);
+	if (unlikely(used_bds >= NETC_CBDR_CLEAN_WORK)) {
 		ntmp_clean_cbdr(cbdr);
+		if (unlikely(!ntmp_get_free_cbd_num(cbdr))) {
+			ntmp_free_data_mem(cbdr->dev, swcbd);
+			return -EBUSY;
+		}
+	}
 
 	i = cbdr->next_to_use;
 	cur_cbd = ntmp_get_cbd(cbdr, i);
 	*cur_cbd = *cbd;
+	cbdr->swcbd[i] = *swcbd;
 	dma_wmb();
 
 	/* Update producer index of both software and hardware */
@@ -135,11 +175,17 @@ static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
 	i = (i + 1) % cbdr->bd_num;
 	cbdr->next_to_use = i;
 	netc_write(cbdr->regs.pir, i);
 
-	err = read_poll_timeout_atomic(netc_read, val, val == i,
-				       NETC_CBDR_DELAY_US, NETC_CBDR_TIMEOUT,
-				       true, cbdr->regs.cir);
+	err = read_poll_timeout(netc_read, val,
+				(val & NETC_CBDRCIR_INDEX) == i,
+				NETC_CBDR_DELAY_US, NETC_CBDR_TIMEOUT,
+				true, cbdr->regs.cir);
 	if (unlikely(err))
-		goto cbdr_unlock;
+		return err;
+
+	if (unlikely(val & NETC_CBDRCIR_SBE)) {
+		dev_err(cbdr->dev, "Command BD system bus error\n");
+		return -EIO;
+	}
 
 	dma_rmb();
 	/* Get the writeback command BD, because the caller may need
@@ -150,40 +196,29 @@ static int netc_xmit_ntmp_cmd(struct ntmp_user *user, union netc_cbd *cbd)
 	 */
 	*cbd = *cur_cbd;
 
 	/* Check the writeback error status */
 	status = le16_to_cpu(cbd->resp_hdr.error_rr) & NTMP_RESP_ERROR;
 	if (unlikely(status)) {
-		err = -EIO;
-		dev_err(user->dev, "Command BD error: 0x%04x\n", status);
+		dev_err(cbdr->dev, "Command BD error: 0x%04x\n", status);
+		return -EIO;
 	}
 
-	ntmp_clean_cbdr(cbdr);
-	dma_wmb();
-
-cbdr_unlock:
-	spin_unlock_bh(&cbdr->ring_lock);
-
-	return err;
+	return 0;
 }
 
-static int ntmp_alloc_data_mem(struct ntmp_dma_buf *data, void **buf_align)
+static int ntmp_alloc_data_mem(struct device *dev, struct netc_swcbd *swcbd,
+			       void **buf_align)
 {
 	void *buf;
 
-	buf = dma_alloc_coherent(data->dev, data->size + NTMP_DATA_ADDR_ALIGN,
-				 &data->dma, GFP_KERNEL);
+	buf = dma_alloc_coherent(dev, swcbd->size + NTMP_DATA_ADDR_ALIGN,
				 &swcbd->dma, GFP_KERNEL);
 	if (!buf)
 		return -ENOMEM;
 
-	data->buf = buf;
+	swcbd->buf = buf;
 	*buf_align = PTR_ALIGN(buf, NTMP_DATA_ADDR_ALIGN);
 
 	return 0;
 }
 
-static void ntmp_free_data_mem(struct ntmp_dma_buf *data)
-{
-	dma_free_coherent(data->dev, data->size + NTMP_DATA_ADDR_ALIGN,
-			  data->buf, data->dma);
-}
-
 static void ntmp_fill_request_hdr(union netc_cbd *cbd, dma_addr_t dma,
 				  int len, int table_id, int cmd,
 				  int access_method)
@@ -234,37 +269,39 @@ static int ntmp_delete_entry_by_id(struct ntmp_user *user, int tbl_id,
 				   u8 tbl_ver, u32 entry_id, u32 req_len,
 				   u32 resp_len)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = max(req_len, resp_len),
 	};
 	struct ntmp_req_by_eid *req;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	ntmp_fill_crd_eid(req, tbl_ver, 0, 0, entry_id);
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(req_len, resp_len),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(req_len, resp_len),
 			      tbl_id, NTMP_CMD_DELETE, NTMP_AM_ENTRY_ID);
 
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev,
 			"Failed to delete entry 0x%x of %s, err: %pe",
 			entry_id, ntmp_table_name(tbl_id), ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
 
-static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
-				  u32 len, struct ntmp_req_by_eid *req,
-				  dma_addr_t dma, bool compare_eid)
+static int ntmp_query_entry_by_id(struct netc_cbdr *cbdr, int tbl_id,
+				  struct ntmp_req_by_eid *req,
+				  struct netc_swcbd *swcbd,
+				  bool compare_eid)
 {
+	u32 len = NTMP_LEN(sizeof(*req), swcbd->size);
 	struct ntmp_cmn_resp_query *resp;
 	int cmd = NTMP_CMD_QUERY;
 	union netc_cbd cbd;
@@ -276,10 +313,11 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 		cmd = NTMP_CMD_QU;
 
 	/* Request header */
-	ntmp_fill_request_hdr(&cbd, dma, len, tbl_id, cmd, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_fill_request_hdr(&cbd, swcbd->dma, len, tbl_id, cmd,
+			      NTMP_AM_ENTRY_ID);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, swcbd);
 	if (err) {
-		dev_err(user->dev,
+		dev_err(cbdr->dev,
 			"Failed to query entry 0x%x of %s, err: %pe\n",
 			entry_id, ntmp_table_name(tbl_id), ERR_PTR(err));
 		return err;
@@ -293,7 +331,7 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 
 	resp = (struct ntmp_cmn_resp_query *)req;
 	if (unlikely(le32_to_cpu(resp->entry_id) != entry_id)) {
-		dev_err(user->dev,
+		dev_err(cbdr->dev,
 			"%s: query EID 0x%x doesn't match response EID 0x%x\n",
 			ntmp_table_name(tbl_id), entry_id,
 			le32_to_cpu(resp->entry_id));
 		return -EIO;
@@ -305,15 +343,15 @@ static int ntmp_query_entry_by_id(struct ntmp_user *user, int tbl_id,
 int ntmp_maft_add_entry(struct ntmp_user *user, u32 entry_id,
 			struct maft_entry_data *maft)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = sizeof(struct maft_req_add),
 	};
 	struct maft_req_add *req;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
@@ -322,14 +360,15 @@ int ntmp_maft_add_entry(struct ntmp_user *user, u32 entry_id,
 	req->keye = maft->keye;
 	req->cfge = maft->cfge;
 
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(data.size, 0),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(swcbd.size, 0),
 			      NTMP_MAFT_ID, NTMP_CMD_ADD, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev, "Failed to add MAFT entry 0x%x, err: %pe\n",
 			entry_id, ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -338,31 +377,31 @@ EXPORT_SYMBOL_GPL(ntmp_maft_add_entry);
 int ntmp_maft_query_entry(struct ntmp_user *user, u32 entry_id,
 			  struct maft_entry_data *maft)
 {
-	struct ntmp_dma_buf data = {
-		.dev = user->dev,
+	struct netc_swcbd swcbd = {
 		.size = sizeof(struct maft_resp_query),
 	};
 	struct maft_resp_query *resp;
 	struct ntmp_req_by_eid *req;
+	struct netc_cbdr *cbdr;
 	int err;
 
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	ntmp_fill_crd_eid(req, user->tbl.maft_ver, 0, 0, entry_id);
-	err = ntmp_query_entry_by_id(user, NTMP_MAFT_ID,
-				     NTMP_LEN(sizeof(*req), data.size),
-				     req, data.dma, true);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = ntmp_query_entry_by_id(cbdr, NTMP_MAFT_ID, req, &swcbd, true);
 	if (err)
-		goto end;
+		goto unlock_cbdr;
 
 	resp = (struct maft_resp_query *)req;
 	maft->keye = resp->keye;
 	maft->cfge = resp->cfge;
 
-end:
-	ntmp_free_data_mem(&data);
+unlock_cbdr:
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -378,8 +417,9 @@ EXPORT_SYMBOL_GPL(ntmp_maft_delete_entry);
 int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 			   int count)
 {
-	struct ntmp_dma_buf data = {.dev = user->dev};
 	struct rsst_req_update *req;
+	struct netc_swcbd swcbd;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err, i;
@@ -387,8 +427,8 @@ int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 		/* HW only takes in a full 64 entry table */
 		return -EINVAL;
 
-	data.size = struct_size(req, groups, count);
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	swcbd.size = struct_size(req, groups, count);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
@@ -398,15 +438,15 @@ int ntmp_rsst_update_entry(struct ntmp_user *user, const u32 *table,
 	for (i = 0; i < count; i++)
 		req->groups[i] = (u8)(table[i]);
 
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(data.size, 0),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(swcbd.size, 0),
 			      NTMP_RSST_ID, NTMP_CMD_UPDATE, NTMP_AM_ENTRY_ID);
 
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err)
 		dev_err(user->dev, "Failed to update RSST entry, err: %pe\n",
 			ERR_PTR(err));
-
-	ntmp_free_data_mem(&data);
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
@@ -414,8 +454,9 @@ EXPORT_SYMBOL_GPL(ntmp_rsst_update_entry);
 
 int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 {
-	struct ntmp_dma_buf data = {.dev = user->dev};
 	struct ntmp_req_by_eid *req;
+	struct netc_swcbd swcbd;
+	struct netc_cbdr *cbdr;
 	union netc_cbd cbd;
 	int err, i;
 	u8 *group;
@@ -424,21 +465,23 @@ int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 		/* HW only takes in a full 64 entry table */
 		return -EINVAL;
 
-	data.size = NTMP_ENTRY_ID_SIZE + RSST_STSE_DATA_SIZE(count) +
-		    RSST_CFGE_DATA_SIZE(count);
-	err = ntmp_alloc_data_mem(&data, (void **)&req);
+	swcbd.size = NTMP_ENTRY_ID_SIZE + RSST_STSE_DATA_SIZE(count) +
+		     RSST_CFGE_DATA_SIZE(count);
+	err = ntmp_alloc_data_mem(user->dev, &swcbd, (void **)&req);
 	if (err)
 		return err;
 
 	/* Set the request data buffer */
 	ntmp_fill_crd_eid(req, user->tbl.rsst_ver, 0, 0, 0);
-	ntmp_fill_request_hdr(&cbd, data.dma, NTMP_LEN(sizeof(*req), data.size),
+	ntmp_fill_request_hdr(&cbd, swcbd.dma, NTMP_LEN(sizeof(*req), swcbd.size),
 			      NTMP_RSST_ID, NTMP_CMD_QUERY, NTMP_AM_ENTRY_ID);
-	err = netc_xmit_ntmp_cmd(user, &cbd);
+
+	ntmp_select_and_lock_cbdr(user, &cbdr);
+	err = netc_xmit_ntmp_cmd(cbdr, &cbd, &swcbd);
 	if (err) {
 		dev_err(user->dev, "Failed to query RSST entry, err: %pe\n",
 			ERR_PTR(err));
-		goto end;
+		goto unlock_cbdr;
 	}
 
 	group = (u8 *)req;
@@ -446,8 +489,8 @@ int ntmp_rsst_query_entry(struct ntmp_user *user, u32 *table, int count)
 	for (i = 0; i < count; i++)
 		table[i] = group[i];
 
-end:
-	ntmp_free_data_mem(&data);
+unlock_cbdr:
+	ntmp_unlock_cbdr(cbdr);
 
 	return err;
 }
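The ntmp.c conversion above is a classic context change: once the ring lock is a mutex (a sleepable context) the completion poll can move from read_poll_timeout_atomic() to the sleeping read_poll_timeout(). Both macros live in <linux/iopoll.h> and share the shape read_poll_timeout(op, val, cond, sleep_us, timeout_us, sleep_before_read, args...). A minimal hedged sketch with placeholder hardware (my_hw, MY_DONE, MY_STATUS_REG are illustrative):

```c
#include <linux/bits.h>
#include <linux/io.h>
#include <linux/iopoll.h>
#include <linux/mutex.h>

#define MY_STATUS_REG	0x0	/* placeholder register offset */
#define MY_DONE		BIT(0)	/* placeholder completion bit */

struct my_hw {
	struct mutex lock;
	void __iomem *regs;
};

static u32 my_read_reg(struct my_hw *hw, u32 off)
{
	return readl(hw->regs + off);
}

static int wait_cmd_done(struct my_hw *hw)
{
	u32 val;
	int err;

	mutex_lock(&hw->lock);	/* sleepable context, unlike a spinlock... */

	/* ...so the sleeping poll is allowed: re-read the status register
	 * every 10us until MY_DONE is set, giving up after 100ms with
	 * -ETIMEDOUT. Under a spinlock this would have to stay
	 * read_poll_timeout_atomic(), which busy-waits instead of sleeping.
	 */
	err = read_poll_timeout(my_read_reg, val, val & MY_DONE,
				10, 100 * USEC_PER_MSEC, false,
				hw, MY_STATUS_REG);

	mutex_unlock(&hw->lock);
	return err;
}
```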
diff --git a/drivers/net/ethernet/freescale/enetc/ntmp_private.h b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
index 34394e40fddd..f8dff3ba2c28 100644
--- a/drivers/net/ethernet/freescale/enetc/ntmp_private.h
+++ b/drivers/net/ethernet/freescale/enetc/ntmp_private.h
@@ -12,6 +12,9 @@
 #define NTMP_EID_REQ_LEN		8
 #define NETC_CBDR_BD_NUM		256
+#define NETC_CBDRCIR_INDEX		GENMASK(9, 0)
+#define NETC_CBDRCIR_SBE		BIT(31)
+#define NETC_CBDR_CLEAN_WORK		16
 
 union netc_cbd {
 	struct {
@@ -54,13 +57,6 @@ union netc_cbd {
 	} resp_hdr; /* NTMP Response Message Header Format */
 };
 
-struct ntmp_dma_buf {
-	struct device *dev;
-	size_t size;
-	void *buf;
-	dma_addr_t dma;
-};
-
 struct ntmp_cmn_req_data {
 	__le16 update_act;
 	u8 dbg_opt;
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9befdacd6730..7ce0cc8ab8f4 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7706,6 +7706,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 err_register:
 	if (!(adapter->flags & FLAG_HAS_AMT))
 		e1000e_release_hw_control(adapter);
+	e1000e_ptp_remove(adapter);
 err_eeprom:
 	if (hw->phy.ops.check_reset_block && !hw->phy.ops.check_reset_block(hw))
 		e1000_phy_hw_reset(&adapter->hw);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 926d001b2150..028bd500603a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -13783,7 +13783,6 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
 	netdev->neigh_priv_len = sizeof(u32) * 4;
 
 	netdev->priv_flags |= IFF_UNICAST_FLT;
-	netdev->priv_flags |= IFF_SUPP_NOFCS;
 
 	/* Setup netdev TC information */
 	i40e_vsi_config_netdev_tc(vsi, vsi->tc_config.enabled_tc);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index dad001abc908..3c1465cf0515 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1150,14 +1150,18 @@ bool iavf_promiscuous_mode_changed(struct iavf_adapter *adapter)
 /**
  * iavf_set_rx_mode - NDO callback to set the netdev filters
  * @netdev: network interface device structure
+ * @uc: snapshot of uc address list
+ * @mc: snapshot of mc address list
  **/
-static void iavf_set_rx_mode(struct net_device *netdev)
+static void iavf_set_rx_mode(struct net_device *netdev,
+			     struct netdev_hw_addr_list *uc,
+			     struct netdev_hw_addr_list *mc)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
 
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
-	__dev_uc_sync(netdev, iavf_addr_sync, iavf_addr_unsync);
-	__dev_mc_sync(netdev, iavf_addr_sync, iavf_addr_unsync);
+	__hw_addr_sync_dev(uc, netdev, iavf_addr_sync, iavf_addr_unsync);
+	__hw_addr_sync_dev(mc, netdev, iavf_addr_sync, iavf_addr_unsync);
 	spin_unlock_bh(&adapter->mac_vlan_list_lock);
 
 	spin_lock_bh(&adapter->current_netdev_promisc_flags_lock);
@@ -1210,7 +1214,9 @@ static void iavf_configure(struct iavf_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int i;
 
-	iavf_set_rx_mode(netdev);
+	netif_addr_lock_bh(netdev);
+	iavf_set_rx_mode(netdev, &netdev->uc, &netdev->mc);
+	netif_addr_unlock_bh(netdev);
 
 	iavf_configure_tx(adapter);
 	iavf_configure_rx(adapter);
@@ -5153,7 +5159,7 @@ static const struct net_device_ops iavf_netdev_ops = {
 	.ndo_open		= iavf_open,
 	.ndo_stop		= iavf_close,
 	.ndo_start_xmit		= iavf_xmit_frame,
-	.ndo_set_rx_mode	= iavf_set_rx_mode,
+	.ndo_set_rx_mode_async	= iavf_set_rx_mode,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= iavf_set_mac,
 	.ndo_change_mtu		= iavf_change_mtu,
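The iavf hunk above shows the companion pattern for drivers that used __dev_uc_sync()/__dev_mc_sync(): with the snapshot lists passed in, __hw_addr_sync_dev() takes the list explicitly and invokes the same per-address sync/unsync callbacks. A hedged sketch of the callback pair (mydrv_* names and the fw filter helpers are illustrative):

```c
#include <linux/netdevice.h>

/* Placeholders for real filter programming, provided elsewhere */
int mydrv_fw_add_filter(void *priv, const u8 *addr);
void mydrv_fw_del_filter(void *priv, const u8 *addr);

static int mydrv_addr_sync(struct net_device *netdev, const u8 *addr)
{
	/* called once for each address newly added to the list */
	return mydrv_fw_add_filter(netdev_priv(netdev), addr);
}

static int mydrv_addr_unsync(struct net_device *netdev, const u8 *addr)
{
	/* called once for each address removed from the list */
	mydrv_fw_del_filter(netdev_priv(netdev), addr);
	return 0;
}

static void mydrv_set_rx_mode_async(struct net_device *netdev,
				    struct netdev_hw_addr_list *uc,
				    struct netdev_hw_addr_list *mc)
{
	__hw_addr_sync_dev(uc, netdev, mydrv_addr_sync, mydrv_addr_unsync);
	__hw_addr_sync_dev(mc, netdev, mydrv_addr_sync, mydrv_addr_unsync);
}
```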
diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 1d8cf29cb65a..5bb1de1cfd33 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -277,7 +277,7 @@ struct iavf_rx_desc {
 /* L2 Tag 2 Presence */
 #define IAVF_RXD_LEGACY_L2TAG2P_M	BIT(0)
 /* Stripped S-TAG VLAN from the receive packet */
-#define IAVF_RXD_LEGACY_L2TAG2_M	GENMASK_ULL(63, 32)
+#define IAVF_RXD_LEGACY_L2TAG2_M	GENMASK_ULL(63, 48)
 /* Stripped S-TAG VLAN from the receive packet */
 #define IAVF_RXD_FLEX_L2TAG2_2_M	GENMASK_ULL(63, 48)
 /* The packet is a UDP tunneled packet */
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index eb3a48330cc1..725b130dd3a2 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -753,7 +753,7 @@ static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi)
 
 static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
 {
-	ring->flags |= ICE_TX_FLAGS_RING_XDP;
+	set_bit(ICE_TX_RING_FLAGS_XDP, ring->flags);
 }
 
 /**
@@ -778,7 +778,7 @@ static inline bool ice_is_txtime_ena(const struct ice_tx_ring *ring)
  */
 static inline bool ice_is_txtime_cfg(const struct ice_tx_ring *ring)
 {
-	return !!(ring->flags & ICE_TX_FLAGS_TXTIME);
+	return test_bit(ICE_TX_RING_FLAGS_TXTIME, ring->flags);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 859e9c66f3e7..3cbb1b0582e3 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1252,7 +1252,7 @@ struct ice_aqc_get_link_status_data {
 #define ICE_AQ_LINK_PWR_QSFP_CLASS_3	2
 #define ICE_AQ_LINK_PWR_QSFP_CLASS_4	3
 	__le16 link_speed;
-#define ICE_AQ_LINK_SPEED_M		0x7FF
+#define ICE_AQ_LINK_SPEED_M		GENMASK(11, 0)
 #define ICE_AQ_LINK_SPEED_10MB		BIT(0)
 #define ICE_AQ_LINK_SPEED_100MB		BIT(1)
 #define ICE_AQ_LINK_SPEED_1000MB	BIT(2)
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index bd77f1c001ee..16aa25535152 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -943,7 +943,7 @@ ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
 	/* if this is not already set it means a VLAN 0 + priority needs
 	 * to be offloaded
 	 */
-	if (tx_ring->flags & ICE_TX_FLAGS_RING_VLAN_L2TAG2)
+	if (test_bit(ICE_TX_RING_FLAGS_VLAN_L2TAG2, tx_ring->flags))
 		first->tx_flags |= ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN;
 	else
 		first->tx_flags |= ICE_TX_FLAGS_HW_VLAN;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index e6a20af6f63d..f28416a707d7 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3290,6 +3290,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring,
 		tx_rings[i].desc = NULL;
 		tx_rings[i].tx_buf = NULL;
 		tx_rings[i].tstamp_ring = NULL;
+		clear_bit(ICE_TX_RING_FLAGS_TXTIME, tx_rings[i].flags);
 		tx_rings[i].tx_tstamps = &pf->ptp.port.tx;
 		err = ice_setup_tx_ring(&tx_rings[i]);
 		if (err) {
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 689c6025ea82..837b71b7b2b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1412,9 +1412,9 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
 		ring->count = vsi->num_tx_desc;
 		ring->txq_teid = ICE_INVAL_TEID;
 		if (dvm_ena)
-			ring->flags |= ICE_TX_FLAGS_RING_VLAN_L2TAG2;
+			set_bit(ICE_TX_RING_FLAGS_VLAN_L2TAG2, ring->flags);
 		else
-			ring->flags |= ICE_TX_FLAGS_RING_VLAN_L2TAG1;
+			set_bit(ICE_TX_RING_FLAGS_VLAN_L2TAG1, ring->flags);
 
 		WRITE_ONCE(vsi->tx_rings[i], ring);
 	}
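The ice changes above convert ring->flags from an open-coded bitmask (|=, &, !!) to the bitmap API, which gives atomic per-bit updates. A generic sketch of the pattern; the my_* names are illustrative, only set_bit/test_bit/DECLARE_BITMAP are kernel API:

```c
#include <linux/bitops.h>

enum my_ring_flags {
	MY_RING_FLAGS_XDP,
	MY_RING_FLAGS_TXTIME,
	MY_RING_NBITS,		/* must stay last */
};

struct my_ring {
	DECLARE_BITMAP(flags, MY_RING_NBITS);
};

static void my_ring_enable_xdp(struct my_ring *ring)
{
	/* atomic read-modify-write of a single bit */
	set_bit(MY_RING_FLAGS_XDP, ring->flags);
}

static bool my_ring_txtime_cfg(const struct my_ring *ring)
{
	return test_bit(MY_RING_FLAGS_TXTIME, ring->flags);
}
```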
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f93ee3b62391..5f92377d4dfc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1923,82 +1923,6 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
 }
 
 /**
- * ice_force_phys_link_state - Force the physical link state
- * @vsi: VSI to force the physical link state to up/down
- * @link_up: true/false indicates to set the physical link to up/down
- *
- * Force the physical link state by getting the current PHY capabilities from
- * hardware and setting the PHY config based on the determined capabilities. If
- * link changes a link event will be triggered because both the Enable Automatic
- * Link Update and LESM Enable bits are set when setting the PHY capabilities.
- *
- * Returns 0 on success, negative on failure
- */
-static int ice_force_phys_link_state(struct ice_vsi *vsi, bool link_up)
-{
-	struct ice_aqc_get_phy_caps_data *pcaps;
-	struct ice_aqc_set_phy_cfg_data *cfg;
-	struct ice_port_info *pi;
-	struct device *dev;
-	int retcode;
-
-	if (!vsi || !vsi->port_info || !vsi->back)
-		return -EINVAL;
-	if (vsi->type != ICE_VSI_PF)
-		return 0;
-
-	dev = ice_pf_to_dev(vsi->back);
-
-	pi = vsi->port_info;
-
-	pcaps = kzalloc_obj(*pcaps);
-	if (!pcaps)
-		return -ENOMEM;
-
-	retcode = ice_aq_get_phy_caps(pi, false, ICE_AQC_REPORT_ACTIVE_CFG, pcaps,
-				      NULL);
-	if (retcode) {
-		dev_err(dev, "Failed to get phy capabilities, VSI %d error %d\n",
-			vsi->vsi_num, retcode);
-		retcode = -EIO;
-		goto out;
-	}
-
-	/* No change in link */
-	if (link_up == !!(pcaps->caps & ICE_AQC_PHY_EN_LINK) &&
-	    link_up == !!(pi->phy.link_info.link_info & ICE_AQ_LINK_UP))
-		goto out;
-
-	/* Use the current user PHY configuration. The current user PHY
-	 * configuration is initialized during probe from PHY capabilities
-	 * software mode, and updated on set PHY configuration.
-	 */
-	cfg = kmemdup(&pi->phy.curr_user_phy_cfg, sizeof(*cfg), GFP_KERNEL);
-	if (!cfg) {
-		retcode = -ENOMEM;
-		goto out;
-	}
-
-	cfg->caps |= ICE_AQ_PHY_ENA_AUTO_LINK_UPDT;
-	if (link_up)
-		cfg->caps |= ICE_AQ_PHY_ENA_LINK;
-	else
-		cfg->caps &= ~ICE_AQ_PHY_ENA_LINK;
-
-	retcode = ice_aq_set_phy_cfg(&vsi->back->hw, pi, cfg, NULL);
-	if (retcode) {
-		dev_err(dev, "Failed to set phy config, VSI %d error %d\n",
-			vsi->vsi_num, retcode);
-		retcode = -EIO;
-	}
-
-	kfree(cfg);
-out:
-	kfree(pcaps);
-	return retcode;
-}
-
-/**
  * ice_init_nvm_phy_type - Initialize the NVM PHY type
  * @pi: port info structure
  *
@@ -2066,7 +1990,7 @@ static void ice_init_link_dflt_override(struct ice_port_info *pi)
  * first time media is available. The ICE_LINK_DEFAULT_OVERRIDE_PENDING state
  * is used to indicate that the user PHY cfg default override is initialized
  * and the PHY has not been configured with the default override settings. The
- * state is set here, and cleared in ice_configure_phy the first time the PHY is
+ * state is set here, and cleared in ice_phy_cfg the first time the PHY is
  * configured.
  *
  * This function should be called only if the FW doesn't support default
@@ -2172,14 +2096,18 @@ err_out:
 }
 
 /**
- * ice_configure_phy - configure PHY
+ * ice_phy_cfg - configure PHY
  * @vsi: VSI of PHY
+ * @link_en: true/false indicates to set link to enable/disable
  *
 * Set the PHY configuration. If the current PHY configuration is the same as
- * the curr_user_phy_cfg, then do nothing to avoid link flap. Otherwise
- * configure the based get PHY capabilities for topology with media.
+ * the curr_user_phy_cfg and link_en hasn't changed, then do nothing to avoid
+ * link flap. Otherwise configure the PHY based get PHY capabilities for
+ * topology with media and link_en.
+ *
+ * Return: 0 on success, negative on failure
  */
-static int ice_configure_phy(struct ice_vsi *vsi)
+static int ice_phy_cfg(struct ice_vsi *vsi, bool link_en)
 {
 	struct device *dev = ice_pf_to_dev(vsi->back);
 	struct ice_port_info *pi = vsi->port_info;
@@ -2199,9 +2127,6 @@ static int ice_configure_phy(struct ice_vsi *vsi)
 	    phy->link_info.topo_media_conflict == ICE_AQ_LINK_TOPO_UNSUPP_MEDIA)
 		return -EPERM;
 
-	if (test_bit(ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA, pf->flags))
-		return ice_force_phys_link_state(vsi, true);
-
 	pcaps = kzalloc_obj(*pcaps);
 	if (!pcaps)
 		return -ENOMEM;
@@ -2215,10 +2140,8 @@ static int ice_configure_phy(struct ice_vsi *vsi)
 		goto done;
 	}
 
-	/* If PHY enable link is configured and configuration has not changed,
-	 * there's nothing to do
-	 */
-	if (pcaps->caps & ICE_AQC_PHY_EN_LINK &&
+	/* Configuration has not changed. There's nothing to do. */
+	if (link_en == !!(pcaps->caps & ICE_AQC_PHY_EN_LINK) &&
 	    ice_phy_caps_equals_cfg(pcaps, &phy->curr_user_phy_cfg))
 		goto done;
@@ -2282,8 +2205,12 @@ static int ice_configure_phy(struct ice_vsi *vsi)
 	 */
 	ice_cfg_phy_fc(pi, cfg, phy->curr_user_fc_req);
 
-	/* Enable link and link update */
-	cfg->caps |= ICE_AQ_PHY_ENA_AUTO_LINK_UPDT | ICE_AQ_PHY_ENA_LINK;
+	/* Enable/Disable link and link update */
+	cfg->caps |= ICE_AQ_PHY_ENA_AUTO_LINK_UPDT;
+	if (link_en)
+		cfg->caps |= ICE_AQ_PHY_ENA_LINK;
+	else
+		cfg->caps &= ~ICE_AQ_PHY_ENA_LINK;
 
 	err = ice_aq_set_phy_cfg(&pf->hw, pi, cfg, NULL);
 	if (err)
@@ -2336,7 +2263,7 @@ static void ice_check_media_subtask(struct ice_pf *pf)
 		    test_bit(ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA, vsi->back->flags))
 			return;
 
-		err = ice_configure_phy(vsi);
+		err = ice_phy_cfg(vsi, true);
 		if (!err)
 			clear_bit(ICE_FLAG_NO_MEDIA, pf->flags);
@@ -4892,9 +4819,15 @@ static int ice_init_link(struct ice_pf *pf)
 		if (!test_bit(ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA, pf->flags)) {
 			struct ice_vsi *vsi = ice_get_main_vsi(pf);
+			struct ice_link_default_override_tlv *ldo;
+			bool link_en;
+
+			ldo = &pf->link_dflt_override;
+			link_en = !(ldo->options &
+				    ICE_LINK_OVERRIDE_AUTO_LINK_DIS);
 
 			if (vsi)
-				ice_configure_phy(vsi);
+				ice_phy_cfg(vsi, link_en);
 		}
 	} else {
 		set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
@@ -9707,7 +9640,7 @@ int ice_open_internal(struct net_device *netdev)
 		}
 	}
 
-	err = ice_configure_phy(vsi);
+	err = ice_phy_cfg(vsi, true);
 	if (err) {
 		netdev_err(netdev, "Failed to set physical link up, error %d\n",
 			   err);
@@ -9748,7 +9681,7 @@ int ice_stop(struct net_device *netdev)
 	}
 
 	if (test_bit(ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA, vsi->back->flags)) {
-		int link_err = ice_force_phys_link_state(vsi, false);
+		int link_err = ice_phy_cfg(vsi, false);
 
 		if (link_err) {
 			if (link_err == -ENOMEDIUM)
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index 6cb0cf7a9891..36df742c326c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -2710,7 +2710,7 @@ static bool ice_any_port_has_timestamps(struct ice_pf *pf)
 bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
 {
 	struct ice_hw *hw = &pf->hw;
-	unsigned int i;
+	int ret;
 
 	/* Check software indicator */
 	switch (pf->ptp.tx_interrupt_mode) {
@@ -2731,16 +2731,19 @@ bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf)
 	}
 
 	/* Check hardware indicator */
-	for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
-		u64 tstamp_ready = 0;
-		int err;
-
-		err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
-		if (err || tstamp_ready)
-			return true;
+	ret = ice_check_phy_tx_tstamp_ready(hw);
+	if (ret < 0) {
+		dev_dbg(ice_pf_to_dev(pf), "Unable to read PHY Tx timestamp ready bitmap, err %d\n",
+			ret);
+		/* Stop triggering IRQs if we're unable to read PHY */
+		return false;
 	}
 
-	return false;
+	/* ice_check_phy_tx_tstamp_ready() returns 1 if there are timestamps
+	 * available, 0 if there are no waiting timestamps, and a negative
+	 * value if there was an error (which we checked for above).
+	 */
+	return ret > 0;
 }
@@ -2824,8 +2827,7 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
-	bool trigger_oicr = false;
-	unsigned int i;
+	int ret;
 
 	if (!pf->ptp.port.tx.has_ready_bitmap)
 		return;
@@ -2833,21 +2835,11 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf)
 	if (!ice_pf_src_tmr_owned(pf))
 		return;
 
-	for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) {
-		u64 tstamp_ready;
-		int err;
-
-		err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready);
-		if (!err && tstamp_ready) {
-			trigger_oicr = true;
-			break;
-		}
-	}
-
-	if (trigger_oicr) {
-		/* Trigger a software interrupt, to ensure this data
-		 * gets processed.
-		 */
+	ret = ice_check_phy_tx_tstamp_ready(hw);
+	if (ret < 0) {
+		dev_dbg(dev, "PTP periodic task unable to read PHY timestamp ready bitmap, err %d\n",
+			ret);
+	} else if (ret) {
 		dev_dbg(dev, "PTP periodic task detected waiting timestamps. Triggering Tx timestamp interrupt now.\n");
 
 		wr32(hw, PFINT_OICR, PFINT_OICR_TSYN_TX_M);
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
index 19dddd9b53dd..4d298c27bfb2 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h
@@ -78,14 +78,14 @@ struct ice_eth56g_mac_reg_cfg eth56g_mac_cfg[NUM_ICE_ETH56G_LNK_SPD] = {
 		.blktime = 0x666, /* 3.2 */
 		.tx_offset = {
 			.serdes = 0x234c, /* 17.6484848 */
-			.no_fec = 0x8e80, /* 71.25 */
+			.no_fec = 0x93d9, /* 73 */
 			.fc = 0xb4a4, /* 90.32 */
 			.sfd = 0x4a4, /* 2.32 */
 			.onestep = 0x4ccd /* 38.4 */
 		},
 		.rx_offset = {
 			.serdes = 0xffffeb27, /* -10.42424 */
-			.no_fec = 0xffffcccd, /* -25.6 */
+			.no_fec = 0xffffc7b6, /* -28 */
 			.fc = 0xfffc557b, /* -469.26 */
 			.sfd = 0x4a4, /* 2.32 */
 			.bs_ds = 0x32 /* 0.0969697 */
@@ -118,17 +118,17 @@ struct ice_eth56g_mac_reg_cfg eth56g_mac_cfg[NUM_ICE_ETH56G_LNK_SPD] = {
 		.mktime = 0x147b, /* 10.24, only if RS-FEC enabled */
 		.tx_offset = {
 			.serdes = 0xe1e, /* 7.0593939 */
-			.no_fec = 0x3857, /* 28.17 */
+			.no_fec = 0x4266, /* 33 */
 			.fc = 0x48c3, /* 36.38 */
-			.rs = 0x8100, /* 64.5 */
+			.rs = 0x8a00, /* 69 */
 			.sfd = 0x1dc, /* 0.93 */
 			.onestep = 0x1eb8 /* 15.36 */
 		},
 		.rx_offset = {
 			.serdes = 0xfffff7a9, /* -4.1697 */
-			.no_fec = 0xffffe71a, /* -12.45 */
+			.no_fec = 0xffffe700, /* -12 */
 			.fc = 0xfffe894d, /* -187.35 */
-			.rs = 0xfffff8cd, /* -3.6 */
+			.rs = 0xfffff8cc, /* -3 */
 			.sfd = 0x1dc, /* 0.93 */
 			.bs_ds = 0x14 /* 0.0387879, RS-FEC 0 */
 		}
-2731,16 +2731,19 @@ bool ice_ptp_tx_tstamps_pending(struct ice_pf *pf) } /* Check hardware indicator */ - for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) { - u64 tstamp_ready = 0; - int err; - - err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready); - if (err || tstamp_ready) - return true; + ret = ice_check_phy_tx_tstamp_ready(hw); + if (ret < 0) { + dev_dbg(ice_pf_to_dev(pf), "Unable to read PHY Tx timestamp ready bitmap, err %d\n", + ret); + /* Stop triggering IRQs if we're unable to read PHY */ + return false; } - return false; + /* ice_check_phy_tx_tstamp_ready() returns 1 if there are timestamps + * available, 0 if there are no waiting timestamps, and a negative + * value if there was an error (which we checked for above). + */ + return ret > 0; } /** @@ -2824,8 +2827,7 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf) { struct device *dev = ice_pf_to_dev(pf); struct ice_hw *hw = &pf->hw; - bool trigger_oicr = false; - unsigned int i; + int ret; if (!pf->ptp.port.tx.has_ready_bitmap) return; @@ -2833,21 +2835,11 @@ static void ice_ptp_maybe_trigger_tx_interrupt(struct ice_pf *pf) if (!ice_pf_src_tmr_owned(pf)) return; - for (i = 0; i < ICE_GET_QUAD_NUM(hw->ptp.num_lports); i++) { - u64 tstamp_ready; - int err; - - err = ice_get_phy_tx_tstamp_ready(&pf->hw, i, &tstamp_ready); - if (!err && tstamp_ready) { - trigger_oicr = true; - break; - } - } - - if (trigger_oicr) { - /* Trigger a software interrupt, to ensure this data - * gets processed. - */ + ret = ice_check_phy_tx_tstamp_ready(hw); + if (ret < 0) { + dev_dbg(dev, "PTP periodic task unable to read PHY timestamp ready bitmap, err %d\n", + ret); + } else if (ret) { dev_dbg(dev, "PTP periodic task detected waiting timestamps. Triggering Tx timestamp interrupt now.\n"); wr32(hw, PFINT_OICR, PFINT_OICR_TSYN_TX_M); diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h index 19dddd9b53dd..4d298c27bfb2 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_consts.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp_consts.h @@ -78,14 +78,14 @@ struct ice_eth56g_mac_reg_cfg eth56g_mac_cfg[NUM_ICE_ETH56G_LNK_SPD] = { .blktime = 0x666, /* 3.2 */ .tx_offset = { .serdes = 0x234c, /* 17.6484848 */ - .no_fec = 0x8e80, /* 71.25 */ + .no_fec = 0x93d9, /* 73 */ .fc = 0xb4a4, /* 90.32 */ .sfd = 0x4a4, /* 2.32 */ .onestep = 0x4ccd /* 38.4 */ }, .rx_offset = { .serdes = 0xffffeb27, /* -10.42424 */ - .no_fec = 0xffffcccd, /* -25.6 */ + .no_fec = 0xffffc7b6, /* -28 */ .fc = 0xfffc557b, /* -469.26 */ .sfd = 0x4a4, /* 2.32 */ .bs_ds = 0x32 /* 0.0969697 */ @@ -118,17 +118,17 @@ struct ice_eth56g_mac_reg_cfg eth56g_mac_cfg[NUM_ICE_ETH56G_LNK_SPD] = { .mktime = 0x147b, /* 10.24, only if RS-FEC enabled */ .tx_offset = { .serdes = 0xe1e, /* 7.0593939 */ - .no_fec = 0x3857, /* 28.17 */ + .no_fec = 0x4266, /* 33 */ .fc = 0x48c3, /* 36.38 */ - .rs = 0x8100, /* 64.5 */ + .rs = 0x8a00, /* 69 */ .sfd = 0x1dc, /* 0.93 */ .onestep = 0x1eb8 /* 15.36 */ }, .rx_offset = { .serdes = 0xfffff7a9, /* -4.1697 */ - .no_fec = 0xffffe71a, /* -12.45 */ + .no_fec = 0xffffe700, /* -12 */ .fc = 0xfffe894d, /* -187.35 */ - .rs = 0xfffff8cd, /* -3.6 */ + .rs = 0xfffff8cc, /* -3 */ .sfd = 0x1dc, /* 0.93 */ .bs_ds = 0x14 /* 0.0387879, RS-FEC 0 */ } diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c index 61c0a0d93ea8..24fb7a3e14d6 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.c +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.c @@ -378,6 
+378,31 @@ static void ice_ptp_cfg_sync_delay(const struct ice_hw *hw, u32 delay) */ /** + * ice_ptp_init_phc_e825c - Perform E825C specific PHC initialization + * @hw: pointer to HW struct + * + * Perform E825C-specific PTP hardware clock initialization steps. + * + * Return: 0 on success, or a negative error value on failure. + */ +static int ice_ptp_init_phc_e825c(struct ice_hw *hw) +{ + int err; + + /* Soft reset all ports, to ensure everything is at a clean state */ + for (int port = 0; port < hw->ptp.num_lports; port++) { + err = ice_ptp_phy_soft_reset_eth56g(hw, port); + if (err) { + ice_debug(hw, ICE_DBG_PTP, "Failed to soft reset port %d, err %d\n", + port, err); + return err; + } + } + + return 0; +} + +/** * ice_ptp_get_dest_dev_e825 - get destination PHY for given port number * @hw: pointer to the HW struct * @port: destination port @@ -1847,6 +1872,8 @@ static int ice_phy_cfg_mac_eth56g(struct ice_hw *hw, u8 port) * @ena: enable or disable interrupt * @threshold: interrupt threshold * + * The threshold cannot be 0 while the interrupt is enabled. + * * Configure TX timestamp interrupt for the specified port * * Return: @@ -1858,19 +1885,45 @@ int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold) int err; u32 val; + if (ena && !threshold) + return -EINVAL; + err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val); if (err) return err; + val &= ~PHY_TS_INT_CONFIG_ENA_M; if (ena) { - val |= PHY_TS_INT_CONFIG_ENA_M; val &= ~PHY_TS_INT_CONFIG_THRESHOLD_M; val |= FIELD_PREP(PHY_TS_INT_CONFIG_THRESHOLD_M, threshold); - } else { - val &= ~PHY_TS_INT_CONFIG_ENA_M; + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, + val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, + "Failed to update 'threshold' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n", + port, !!ena, threshold); + return err; + } + val |= PHY_TS_INT_CONFIG_ENA_M; + } + + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, + "Failed to update 'ena' PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n", + port, !!ena, threshold); + return err; } - return ice_write_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, val); + err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_TS_INT_CONFIG, &val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, + "Failed to read PHY_REG_TS_INT_CONFIG port=%u ena=%u threshold=%u\n", + port, !!ena, threshold); + return err; + } + + return 0; } /** @@ -2116,6 +2169,35 @@ int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port) } /** + * ice_check_phy_tx_tstamp_ready_eth56g - Check Tx memory status for all ports + * @hw: pointer to the HW struct + * + * Check the PHY_REG_TX_MEMORY_STATUS for all ports. A set bit indicates + * a waiting timestamp. + * + * Return: 1 if any port has at least one timestamp ready bit set, + * 0 otherwise, and a negative error code if unable to read the bitmap. 
+ */ +static int ice_check_phy_tx_tstamp_ready_eth56g(struct ice_hw *hw) +{ + int port; + + for (port = 0; port < hw->ptp.num_lports; port++) { + u64 tstamp_ready; + int err; + + err = ice_get_phy_tx_tstamp_ready(hw, port, &tstamp_ready); + if (err) + return err; + + if (tstamp_ready) + return 1; + } + + return 0; +} + +/** * ice_ptp_read_tx_hwtstamp_status_eth56g - Get TX timestamp status * @hw: pointer to the HW struct * @ts_status: the timestamp mask pointer @@ -2137,13 +2219,19 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status) *ts_status = 0; for (phy = 0; phy < params->num_phys; phy++) { + u8 port; int err; - err = ice_read_phy_eth56g(hw, phy, PHY_PTP_INT_STATUS, &status); + /* ice_read_phy_eth56g expects a port index, so use the first + * port of the PHY + */ + port = phy * hw->ptp.ports_per_phy; + + err = ice_read_phy_eth56g(hw, port, PHY_PTP_INT_STATUS, &status); if (err) return err; - *ts_status |= (status & mask) << (phy * hw->ptp.ports_per_phy); + *ts_status |= (status & mask) << port; } ice_debug(hw, ICE_DBG_PTP, "PHY interrupt err: %x\n", *ts_status); @@ -2152,6 +2240,69 @@ int ice_ptp_read_tx_hwtstamp_status_eth56g(struct ice_hw *hw, u32 *ts_status) } /** + * ice_ptp_phy_soft_reset_eth56g - Perform a PHY soft reset on ETH56G + * @hw: pointer to the HW structure + * @port: PHY port number + * + * Trigger a soft reset of the ETH56G PHY by toggling the soft reset + * bit in the PHY global register. The reset sequence consists of: + * 1. Clearing the soft reset bit + * 2. Asserting the soft reset bit + * 3. Clearing the soft reset bit again + * + * Short delays are inserted between each step to allow the hardware + * to settle. This provides a controlled way to reinitialize the PHY + * without requiring a full device reset. + * + * Return: 0 on success, or a negative error code on failure when + * reading or writing the PHY register. 
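A worked example of the indexing fix in ice_ptp_read_tx_hwtstamp_status_eth56g() above, with illustrative counts of two PHYs and four ports per PHY: the status register must be addressed through, and the result shifted by, each PHY's first port index rather than the bare PHY number.

	/* Assumed layout: params->num_phys == 2, hw->ptp.ports_per_phy == 4 */
	port = 0 * hw->ptp.ports_per_phy;	/* PHY 0 reads via port 0, shift 0 */
	port = 1 * hw->ptp.ports_per_phy;	/* PHY 1 reads via port 4, shift 4 */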
+ */ +int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port) +{ + u32 global_val; + int err; + + err = ice_read_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, &global_val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, "Failed to read PHY_REG_GLOBAL for port %d, err %d\n", + port, err); + return err; + } + + global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M; + ice_debug(hw, ICE_DBG_PTP, "Clearing soft reset bit for port %d, val: 0x%x\n", + port, global_val); + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n", + port, err); + return err; + } + + usleep_range(5000, 6000); + + global_val |= PHY_REG_GLOBAL_SOFT_RESET_M; + ice_debug(hw, ICE_DBG_PTP, "Set soft reset bit for port %d, val: 0x%x\n", + port, global_val); + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val); + if (err) { + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n", + port, err); + return err; + } + usleep_range(5000, 6000); + + global_val &= ~PHY_REG_GLOBAL_SOFT_RESET_M; + ice_debug(hw, ICE_DBG_PTP, "Clear soft reset bit for port %d, val: 0x%x\n", + port, global_val); + err = ice_write_ptp_reg_eth56g(hw, port, PHY_REG_GLOBAL, global_val); + if (err) + ice_debug(hw, ICE_DBG_PTP, "Failed to write PHY_REG_GLOBAL for port %d, err %d\n", + port, err); + return err; +} + +/** * ice_get_phy_tx_tstamp_ready_eth56g - Read the Tx memory status register * @hw: pointer to the HW struct * @port: the PHY port to read from @@ -4203,6 +4354,35 @@ ice_get_phy_tx_tstamp_ready_e82x(struct ice_hw *hw, u8 quad, u64 *tstamp_ready) } /** + * ice_check_phy_tx_tstamp_ready_e82x - Check Tx memory status for all quads + * @hw: pointer to the HW struct + * + * Check the Q_REG_TX_MEMORY_STATUS for all quads. A set bit indicates + * a waiting timestamp. + * + * Return: 1 if any quad has at least one timestamp ready bit set, + * 0 otherwise, and a negative error value if unable to read the bitmap. + */ +static int ice_check_phy_tx_tstamp_ready_e82x(struct ice_hw *hw) +{ + int quad; + + for (quad = 0; quad < ICE_GET_QUAD_NUM(hw->ptp.num_lports); quad++) { + u64 tstamp_ready; + int err; + + err = ice_get_phy_tx_tstamp_ready(hw, quad, &tstamp_ready); + if (err) + return err; + + if (tstamp_ready) + return 1; + } + + return 0; +} + +/** * ice_phy_cfg_intr_e82x - Configure TX timestamp interrupt * @hw: pointer to the HW struct * @quad: the timestamp quad @@ -4755,6 +4935,23 @@ ice_get_phy_tx_tstamp_ready_e810(struct ice_hw *hw, u8 port, u64 *tstamp_ready) return 0; } +/** + * ice_check_phy_tx_tstamp_ready_e810 - Check Tx memory status register + * @hw: pointer to the HW struct + * + * The E810 devices do not have a Tx memory status register. Note this is + * intentionally different behavior from ice_get_phy_tx_tstamp_ready_e810 + * which always says that all bits are ready. This function is called in cases + * where code will trigger interrupts if timestamps are waiting, and should + * not be called for E810 hardware. + * + * Return: 0. + */ +static int ice_check_phy_tx_tstamp_ready_e810(struct ice_hw *hw) +{ + return 0; +} + /* E810 SMA functions * * The following functions operate specifically on E810 hardware and are used @@ -5010,6 +5207,21 @@ static void ice_get_phy_tx_tstamp_ready_e830(const struct ice_hw *hw, u8 port, } /** + * ice_check_phy_tx_tstamp_ready_e830 - Check Tx memory status register + * @hw: pointer to the HW struct + * + * Return: 1 if the device has waiting timestamps, 0 otherwise. 
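The soft-reset helper above is consumed from the E825C PHC init path added earlier in this patch; a minimal sketch of that call pattern:

	for (int port = 0; port < hw->ptp.num_lports; port++) {
		err = ice_ptp_phy_soft_reset_eth56g(hw, port);
		if (err)
			return err;
	}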
+ */ +static int ice_check_phy_tx_tstamp_ready_e830(struct ice_hw *hw) +{ + u64 tstamp_ready; + + ice_get_phy_tx_tstamp_ready_e830(hw, 0, &tstamp_ready); + + return !!tstamp_ready; +} + +/** * ice_ptp_init_phy_e830 - initialize PHY parameters * @ptp: pointer to the PTP HW struct */ @@ -5381,8 +5593,8 @@ int ice_ptp_write_incval_locked(struct ice_hw *hw, u64 incval) */ int ice_ptp_adj_clock(struct ice_hw *hw, s32 adj) { + int err = 0; u8 tmr_idx; - int err; tmr_idx = hw->func_caps.ts_func_info.tmr_index_owned; @@ -5399,8 +5611,8 @@ int ice_ptp_adj_clock(struct ice_hw *hw, s32 adj) err = ice_ptp_prep_phy_adj_e810(hw, adj); break; case ICE_MAC_E830: - /* E830 sync PHYs automatically after setting GLTSYN_SHADJ */ - return 0; + /* E830 sync PHYs automatically after setting cmd register */ + break; case ICE_MAC_GENERIC: err = ice_ptp_prep_phy_adj_e82x(hw, adj); break; @@ -5564,7 +5776,7 @@ int ice_ptp_init_phc(struct ice_hw *hw) case ICE_MAC_GENERIC: return ice_ptp_init_phc_e82x(hw); case ICE_MAC_GENERIC_3K_E825: - return 0; + return ice_ptp_init_phc_e825c(hw); default: return -EOPNOTSUPP; } @@ -5602,6 +5814,33 @@ int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready) } /** + * ice_check_phy_tx_tstamp_ready - Check PHY Tx timestamp memory status + * @hw: pointer to the HW struct + * + * Check the PHY for Tx timestamp memory status on all ports. If you need to + * see individual timestamp status for each index, use + * ice_get_phy_tx_tstamp_ready() instead. + * + * Return: 1 if any port has timestamps available, 0 if there are no timestamps + * available, and a negative error code on failure. + */ +int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw) +{ + switch (hw->mac_type) { + case ICE_MAC_E810: + return ice_check_phy_tx_tstamp_ready_e810(hw); + case ICE_MAC_E830: + return ice_check_phy_tx_tstamp_ready_e830(hw); + case ICE_MAC_GENERIC: + return ice_check_phy_tx_tstamp_ready_e82x(hw); + case ICE_MAC_GENERIC_3K_E825: + return ice_check_phy_tx_tstamp_ready_eth56g(hw); + default: + return -EOPNOTSUPP; + } +} + +/** * ice_cgu_get_pin_desc_e823 - get pin description array * @hw: pointer to the hw struct * @input: if request is done against input or output pin diff --git a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h index 9bfd3e79c580..1c9e77dbc770 100644 --- a/drivers/net/ethernet/intel/ice/ice_ptp_hw.h +++ b/drivers/net/ethernet/intel/ice/ice_ptp_hw.h @@ -300,6 +300,7 @@ void ice_ptp_reset_ts_memory(struct ice_hw *hw); int ice_ptp_init_phc(struct ice_hw *hw); void ice_ptp_init_hw(struct ice_hw *hw); int ice_get_phy_tx_tstamp_ready(struct ice_hw *hw, u8 block, u64 *tstamp_ready); +int ice_check_phy_tx_tstamp_ready(struct ice_hw *hw); int ice_ptp_one_port_cmd(struct ice_hw *hw, u8 configured_port, enum ice_ptp_tmr_cmd configured_cmd); @@ -374,6 +375,7 @@ int ice_stop_phy_timer_eth56g(struct ice_hw *hw, u8 port, bool soft_reset); int ice_start_phy_timer_eth56g(struct ice_hw *hw, u8 port); int ice_phy_cfg_intr_eth56g(struct ice_hw *hw, u8 port, bool ena, u8 threshold); int ice_phy_cfg_ptp_1step_eth56g(struct ice_hw *hw, u8 port); +int ice_ptp_phy_soft_reset_eth56g(struct ice_hw *hw, u8 port); #define ICE_ETH56G_NOMINAL_INCVAL 0x140000000ULL #define ICE_ETH56G_NOMINAL_PCS_REF_TUS 0x100000000ULL @@ -676,6 +678,9 @@ static inline u64 ice_get_base_incval(struct ice_hw *hw) #define ICE_P0_GNSS_PRSNT_N BIT(4) /* ETH56G PHY register addresses */ +#define PHY_REG_GLOBAL 0x0 +#define PHY_REG_GLOBAL_SOFT_RESET_M BIT(11) + /* Timestamp PHY 
incval registers */ #define PHY_REG_TIMETUS_L 0x8 #define PHY_REG_TIMETUS_U 0xC diff --git a/drivers/net/ethernet/intel/ice/ice_sf_eth.c b/drivers/net/ethernet/intel/ice/ice_sf_eth.c index 2cf04bc6edce..a730aa368c92 100644 --- a/drivers/net/ethernet/intel/ice/ice_sf_eth.c +++ b/drivers/net/ethernet/intel/ice/ice_sf_eth.c @@ -305,6 +305,8 @@ ice_sf_eth_activate(struct ice_dynamic_port *dyn_port, aux_dev_uninit: auxiliary_device_uninit(&sf_dev->adev); + return err; + sf_dev_free: kfree(sf_dev); xa_erase: diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index a2cd4cf37734..4ca1a0602307 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -190,9 +190,10 @@ void ice_free_tstamp_ring(struct ice_tx_ring *tx_ring) void ice_free_tx_tstamp_ring(struct ice_tx_ring *tx_ring) { ice_free_tstamp_ring(tx_ring); + clear_bit(ICE_TX_RING_FLAGS_TXTIME, tx_ring->flags); + smp_wmb(); /* order flag clear before pointer NULL */ kfree_rcu(tx_ring->tstamp_ring, rcu); - tx_ring->tstamp_ring = NULL; - tx_ring->flags &= ~ICE_TX_FLAGS_TXTIME; + WRITE_ONCE(tx_ring->tstamp_ring, NULL); } /** @@ -405,7 +406,7 @@ static int ice_alloc_tstamp_ring(struct ice_tx_ring *tx_ring) tx_ring->tstamp_ring = tstamp_ring; tstamp_ring->desc = NULL; tstamp_ring->count = ice_calc_ts_ring_count(tx_ring); - tx_ring->flags |= ICE_TX_FLAGS_TXTIME; + set_bit(ICE_TX_RING_FLAGS_TXTIME, tx_ring->flags); return 0; } @@ -1521,13 +1522,20 @@ ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first, return; if (ice_is_txtime_cfg(tx_ring)) { - struct ice_tstamp_ring *tstamp_ring = tx_ring->tstamp_ring; - u32 tstamp_count = tstamp_ring->count; - u32 j = tstamp_ring->next_to_use; + struct ice_tstamp_ring *tstamp_ring; + u32 tstamp_count, j; struct ice_ts_desc *ts_desc; struct timespec64 ts; u32 tstamp; + smp_rmb(); /* order flag read before pointer read */ + tstamp_ring = READ_ONCE(tx_ring->tstamp_ring); + if (unlikely(!tstamp_ring)) + goto ring_kick; + + tstamp_count = tstamp_ring->count; + j = tstamp_ring->next_to_use; + ts = ktime_to_timespec64(first->skb->tstamp); tstamp = ts.tv_nsec >> ICE_TXTIME_CTX_RESOLUTION_128NS; @@ -1555,6 +1563,7 @@ ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first, tstamp_ring->next_to_use = j; writel_relaxed(j, tstamp_ring->tail); } else { +ring_kick: writel_relaxed(i, tx_ring->tail); } return; @@ -1814,7 +1823,7 @@ ice_tx_prepare_vlan_flags(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first) */ if (skb_vlan_tag_present(skb)) { first->vid = skb_vlan_tag_get(skb); - if (tx_ring->flags & ICE_TX_FLAGS_RING_VLAN_L2TAG2) + if (test_bit(ICE_TX_RING_FLAGS_VLAN_L2TAG2, tx_ring->flags)) first->tx_flags |= ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN; else first->tx_flags |= ICE_TX_FLAGS_HW_VLAN; @@ -2158,6 +2167,9 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring) ice_trace(xmit_frame_ring, tx_ring, skb); + /* record the location of the first descriptor for this packet */ + first = &tx_ring->tx_buf[tx_ring->next_to_use]; + count = ice_xmit_desc_count(skb); if (ice_chk_linearize(skb, count)) { if (__skb_linearize(skb)) @@ -2183,8 +2195,6 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring) offload.tx_ring = tx_ring; - /* record the location of the first descriptor for this packet */ - first = &tx_ring->tx_buf[tx_ring->next_to_use]; first->skb = skb; first->type = ICE_TX_BUF_SKB; first->bytecount = max_t(unsigned int, skb->len, ETH_ZLEN); @@ -2249,6 +2259,7 @@ 
ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring) out_drop: ice_trace(xmit_frame_ring_drop, tx_ring, skb); dev_kfree_skb_any(skb); + first->type = ICE_TX_BUF_EMPTY; return NETDEV_TX_OK; } diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index b6547e1b7c42..5e517f219379 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -212,6 +212,14 @@ enum ice_rx_dtype { ICE_RX_DTYPE_SPLIT_ALWAYS = 2, }; +enum ice_tx_ring_flags { + ICE_TX_RING_FLAGS_XDP, + ICE_TX_RING_FLAGS_VLAN_L2TAG1, + ICE_TX_RING_FLAGS_VLAN_L2TAG2, + ICE_TX_RING_FLAGS_TXTIME, + ICE_TX_RING_FLAGS_NBITS, +}; + struct ice_pkt_ctx { u64 cached_phctime; __be16 vlan_proto; @@ -352,11 +360,7 @@ struct ice_tx_ring { u16 count; /* Number of descriptors */ u16 q_index; /* Queue number of ring */ - u8 flags; -#define ICE_TX_FLAGS_RING_XDP BIT(0) -#define ICE_TX_FLAGS_RING_VLAN_L2TAG1 BIT(1) -#define ICE_TX_FLAGS_RING_VLAN_L2TAG2 BIT(2) -#define ICE_TX_FLAGS_TXTIME BIT(3) + DECLARE_BITMAP(flags, ICE_TX_RING_FLAGS_NBITS); struct xsk_buff_pool *xsk_pool; @@ -398,7 +402,7 @@ static inline bool ice_ring_ch_enabled(struct ice_tx_ring *ring) static inline bool ice_ring_is_xdp(struct ice_tx_ring *ring) { - return !!(ring->flags & ICE_TX_FLAGS_RING_XDP); + return test_bit(ICE_TX_RING_FLAGS_XDP, ring->flags); } enum ice_container_type { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h index c3408b3f7010..091b80a67189 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs.h @@ -201,7 +201,10 @@ int mlx5e_add_vlan_trap(struct mlx5e_flow_steering *fs, int trap_id, int tir_nu void mlx5e_remove_vlan_trap(struct mlx5e_flow_steering *fs); int mlx5e_add_mac_trap(struct mlx5e_flow_steering *fs, int trap_id, int tir_num); void mlx5e_remove_mac_trap(struct mlx5e_flow_steering *fs); -void mlx5e_fs_set_rx_mode_work(struct mlx5e_flow_steering *fs, struct net_device *netdev); +void mlx5e_fs_set_rx_mode_work(struct mlx5e_flow_steering *fs, + struct net_device *netdev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc); int mlx5e_fs_vlan_rx_add_vid(struct mlx5e_flow_steering *fs, struct net_device *netdev, __be16 proto, u16 vid); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index fdfe9d1cfe21..12492c4a5d41 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -609,20 +609,26 @@ static void mlx5e_execute_l2_action(struct mlx5e_flow_steering *fs, } static void mlx5e_sync_netdev_addr(struct mlx5e_flow_steering *fs, - struct net_device *netdev) + struct net_device *netdev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { struct netdev_hw_addr *ha; - netif_addr_lock_bh(netdev); + if (!uc || !mc) { + netif_addr_lock_bh(netdev); + mlx5e_sync_netdev_addr(fs, netdev, &netdev->uc, &netdev->mc); + netif_addr_unlock_bh(netdev); + return; + } mlx5e_add_l2_to_hash(fs->l2.netdev_uc, netdev->dev_addr); - netdev_for_each_uc_addr(ha, netdev) + + netdev_hw_addr_list_for_each(ha, uc) mlx5e_add_l2_to_hash(fs->l2.netdev_uc, ha->addr); - netdev_for_each_mc_addr(ha, netdev) + netdev_hw_addr_list_for_each(ha, mc) mlx5e_add_l2_to_hash(fs->l2.netdev_mc, ha->addr); - - netif_addr_unlock_bh(netdev); } static void mlx5e_fill_addr_array(struct mlx5e_flow_steering *fs, int list_type, @@ -724,7 
+730,9 @@ static void mlx5e_apply_netdev_addr(struct mlx5e_flow_steering *fs) } static void mlx5e_handle_netdev_addr(struct mlx5e_flow_steering *fs, - struct net_device *netdev) + struct net_device *netdev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { struct mlx5e_l2_hash_node *hn; struct hlist_node *tmp; @@ -736,7 +744,7 @@ static void mlx5e_handle_netdev_addr(struct mlx5e_flow_steering *fs, hn->action = MLX5E_ACTION_DEL; if (fs->state_destroy) - mlx5e_sync_netdev_addr(fs, netdev); + mlx5e_sync_netdev_addr(fs, netdev, uc, mc); mlx5e_apply_netdev_addr(fs); } @@ -820,13 +828,15 @@ static void mlx5e_destroy_promisc_table(struct mlx5e_flow_steering *fs) } void mlx5e_fs_set_rx_mode_work(struct mlx5e_flow_steering *fs, - struct net_device *netdev) + struct net_device *netdev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5e_l2_table *ea = &fs->l2; if (mlx5e_is_uplink_rep(priv)) { - mlx5e_handle_netdev_addr(fs, netdev); + mlx5e_handle_netdev_addr(fs, netdev, uc, mc); goto update_vport_context; } @@ -856,7 +866,7 @@ void mlx5e_fs_set_rx_mode_work(struct mlx5e_flow_steering *fs, if (enable_broadcast) mlx5e_add_l2_flow_rule(fs, &ea->broadcast, MLX5E_FULLMATCH); - mlx5e_handle_netdev_addr(fs, netdev); + mlx5e_handle_netdev_addr(fs, netdev, uc, mc); if (disable_broadcast) mlx5e_del_l2_flow_rule(fs, &ea->broadcast); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 6c4eeb88588c..5a46870c4b74 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4145,11 +4145,13 @@ static void mlx5e_nic_set_rx_mode(struct mlx5e_priv *priv) queue_work(priv->wq, &priv->set_rx_mode_work); } -static void mlx5e_set_rx_mode(struct net_device *dev) +static void mlx5e_set_rx_mode(struct net_device *dev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { struct mlx5e_priv *priv = netdev_priv(dev); - mlx5e_nic_set_rx_mode(priv); + mlx5e_fs_set_rx_mode_work(priv->fs, dev, uc, mc); } static int mlx5e_set_mac(struct net_device *netdev, void *addr) @@ -5324,7 +5326,7 @@ const struct net_device_ops mlx5e_netdev_ops = { .ndo_setup_tc = mlx5e_setup_tc, .ndo_select_queue = mlx5e_select_queue, .ndo_get_stats64 = mlx5e_get_stats, - .ndo_set_rx_mode = mlx5e_set_rx_mode, + .ndo_set_rx_mode_async = mlx5e_set_rx_mode, .ndo_set_mac_address = mlx5e_set_mac, .ndo_vlan_rx_add_vid = mlx5e_vlan_rx_add_vid, .ndo_vlan_rx_kill_vid = mlx5e_vlan_rx_kill_vid, @@ -6309,8 +6311,11 @@ void mlx5e_set_rx_mode_work(struct work_struct *work) { struct mlx5e_priv *priv = container_of(work, struct mlx5e_priv, set_rx_mode_work); + struct net_device *dev = priv->netdev; - return mlx5e_fs_set_rx_mode_work(priv->fs, priv->netdev); + netdev_lock_ops(dev); + mlx5e_fs_set_rx_mode_work(priv->fs, dev, NULL, NULL); + netdev_unlock_ops(dev); } /* mlx5e generic netdev management API (move to en_common.c) */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 72d8ae2df33f..74827e8ca125 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1849,7 +1849,7 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) err = mlx5_notifiers_init(dev); if (err) - goto err_hca_caps; + goto err_notifiers_init; /* The conjunction of sw_vhca_id with sw_owner_id will be a global * unique id per 
function which uses mlx5_core. @@ -1865,6 +1865,8 @@ int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx) return 0; +err_notifiers_init: + mlx5_hca_caps_free(dev); err_hca_caps: mlx5_adev_cleanup(dev); err_adev_init: diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c index b4b396ca9bce..c406a3b56b37 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.c @@ -183,7 +183,9 @@ static int fbnic_mc_unsync(struct net_device *netdev, const unsigned char *addr) return ret; } -void __fbnic_set_rx_mode(struct fbnic_dev *fbd) +void __fbnic_set_rx_mode(struct fbnic_dev *fbd, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { bool uc_promisc = false, mc_promisc = false; struct net_device *netdev = fbd->netdev; @@ -213,10 +215,10 @@ void __fbnic_set_rx_mode(struct fbnic_dev *fbd) } /* Synchronize unicast and multicast address lists */ - err = __dev_uc_sync(netdev, fbnic_uc_sync, fbnic_uc_unsync); + err = __hw_addr_sync_dev(uc, netdev, fbnic_uc_sync, fbnic_uc_unsync); if (err == -ENOSPC) uc_promisc = true; - err = __dev_mc_sync(netdev, fbnic_mc_sync, fbnic_mc_unsync); + err = __hw_addr_sync_dev(mc, netdev, fbnic_mc_sync, fbnic_mc_unsync); if (err == -ENOSPC) mc_promisc = true; @@ -238,18 +240,21 @@ void __fbnic_set_rx_mode(struct fbnic_dev *fbd) fbnic_write_tce_tcam(fbd); } -static void fbnic_set_rx_mode(struct net_device *netdev) +static void fbnic_set_rx_mode(struct net_device *netdev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { struct fbnic_net *fbn = netdev_priv(netdev); struct fbnic_dev *fbd = fbn->fbd; /* No need to update the hardware if we are not running */ if (netif_running(netdev)) - __fbnic_set_rx_mode(fbd); + __fbnic_set_rx_mode(fbd, uc, mc); } static int fbnic_set_mac(struct net_device *netdev, void *p) { + struct fbnic_net *fbn = netdev_priv(netdev); struct sockaddr *addr = p; if (!is_valid_ether_addr(addr->sa_data)) @@ -257,7 +262,8 @@ static int fbnic_set_mac(struct net_device *netdev, void *p) eth_hw_addr_set(netdev, addr->sa_data); - fbnic_set_rx_mode(netdev); + if (netif_running(netdev)) + __fbnic_set_rx_mode(fbn->fbd, &netdev->uc, &netdev->mc); return 0; } @@ -551,7 +557,7 @@ static const struct net_device_ops fbnic_netdev_ops = { .ndo_features_check = fbnic_features_check, .ndo_set_mac_address = fbnic_set_mac, .ndo_change_mtu = fbnic_change_mtu, - .ndo_set_rx_mode = fbnic_set_rx_mode, + .ndo_set_rx_mode_async = fbnic_set_rx_mode, .ndo_get_stats64 = fbnic_get_stats64, .ndo_bpf = fbnic_bpf, .ndo_hwtstamp_get = fbnic_hwtstamp_get, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h index 9129a658f8fa..eded20b0e9e4 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h +++ b/drivers/net/ethernet/meta/fbnic/fbnic_netdev.h @@ -97,7 +97,9 @@ void fbnic_time_init(struct fbnic_net *fbn); int fbnic_time_start(struct fbnic_net *fbn); void fbnic_time_stop(struct fbnic_net *fbn); -void __fbnic_set_rx_mode(struct fbnic_dev *fbd); +void __fbnic_set_rx_mode(struct fbnic_dev *fbd, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc); void fbnic_clear_rx_mode(struct fbnic_dev *fbd); void fbnic_phylink_get_pauseparam(struct net_device *netdev, diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c index b7c0b7349d00..7e85b480203c 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_pci.c +++ 
b/drivers/net/ethernet/meta/fbnic/fbnic_pci.c @@ -135,7 +135,7 @@ void fbnic_up(struct fbnic_net *fbn) fbnic_rss_reinit_hw(fbn->fbd, fbn); - __fbnic_set_rx_mode(fbn->fbd); + __fbnic_set_rx_mode(fbn->fbd, &fbn->netdev->uc, &fbn->netdev->mc); /* Enable Tx/Rx processing */ fbnic_napi_enable(fbn); @@ -180,7 +180,7 @@ static int fbnic_fw_config_after_crash(struct fbnic_dev *fbd) } fbnic_rpc_reset_valid_entries(fbd); - __fbnic_set_rx_mode(fbd); + __fbnic_set_rx_mode(fbd, &fbd->netdev->uc, &fbd->netdev->mc); return 0; } diff --git a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c index 42a186db43ea..fe95b6f69646 100644 --- a/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c +++ b/drivers/net/ethernet/meta/fbnic/fbnic_rpc.c @@ -244,7 +244,7 @@ void fbnic_bmc_rpc_check(struct fbnic_dev *fbd) if (fbd->fw_cap.need_bmc_tcam_reinit) { fbnic_bmc_rpc_init(fbd); - __fbnic_set_rx_mode(fbd); + __fbnic_set_rx_mode(fbd, &fbd->netdev->uc, &fbd->netdev->mc); fbd->fw_cap.need_bmc_tcam_reinit = false; } diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h index 31f75b4a67fd..b795a3a60571 100644 --- a/drivers/net/ethernet/micrel/ks8851.h +++ b/drivers/net/ethernet/micrel/ks8851.h @@ -408,10 +408,8 @@ struct ks8851_net { struct gpio_desc *gpio; struct mii_bus *mii_bus; - void (*lock)(struct ks8851_net *ks, - unsigned long *flags); - void (*unlock)(struct ks8851_net *ks, - unsigned long *flags); + void (*lock)(struct ks8851_net *ks); + void (*unlock)(struct ks8851_net *ks); unsigned int (*rdreg16)(struct ks8851_net *ks, unsigned int reg); void (*wrreg16)(struct ks8851_net *ks, diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 8048770958d6..4afbb40bc0e4 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -28,25 +28,23 @@ /** * ks8851_lock - register access lock * @ks: The chip state - * @flags: Spinlock flags * * Claim chip register access lock */ -static void ks8851_lock(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_lock(struct ks8851_net *ks) { - ks->lock(ks, flags); + ks->lock(ks); } /** * ks8851_unlock - register access unlock * @ks: The chip state - * @flags: Spinlock flags * * Release chip register access lock */ -static void ks8851_unlock(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_unlock(struct ks8851_net *ks) { - ks->unlock(ks, flags); + ks->unlock(ks); } /** @@ -129,11 +127,10 @@ static void ks8851_set_powermode(struct ks8851_net *ks, unsigned pwrmode) static int ks8851_write_mac_addr(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; u16 val; int i; - ks8851_lock(ks, &flags); + ks8851_lock(ks); /* * Wake up chip in case it was powered off when stopped; otherwise, @@ -149,7 +146,7 @@ static int ks8851_write_mac_addr(struct net_device *dev) if (!netif_running(dev)) ks8851_set_powermode(ks, PMECR_PM_SOFTDOWN); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); return 0; } @@ -163,12 +160,11 @@ static int ks8851_write_mac_addr(struct net_device *dev) static void ks8851_read_mac_addr(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; u8 addr[ETH_ALEN]; u16 reg; int i; - ks8851_lock(ks, &flags); + ks8851_lock(ks); for (i = 0; i < ETH_ALEN; i += 2) { reg = ks8851_rdreg16(ks, KS_MAR(i)); @@ -177,7 +173,7 @@ static void ks8851_read_mac_addr(struct net_device *dev) } eth_hw_addr_set(dev, addr); - ks8851_unlock(ks, 
&flags); + ks8851_unlock(ks); } /** @@ -312,11 +308,10 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) { struct ks8851_net *ks = _ks; struct sk_buff_head rxq; - unsigned long flags; unsigned int status; struct sk_buff *skb; - ks8851_lock(ks, &flags); + ks8851_lock(ks); status = ks8851_rdreg16(ks, KS_ISR); ks8851_wrreg16(ks, KS_ISR, status); @@ -373,14 +368,17 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) ks8851_wrreg16(ks, KS_RXCR1, rxc->rxcr1); } - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); if (status & IRQ_LCI) mii_check_link(&ks->mii); - if (status & IRQ_RXI) + if (status & IRQ_RXI) { + local_bh_disable(); while ((skb = __skb_dequeue(&rxq))) netif_rx(skb); + local_bh_enable(); + } return IRQ_HANDLED; } @@ -405,7 +403,6 @@ static void ks8851_flush_tx_work(struct ks8851_net *ks) static int ks8851_net_open(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; int ret; ret = request_threaded_irq(dev->irq, NULL, ks8851_irq, @@ -418,7 +415,7 @@ static int ks8851_net_open(struct net_device *dev) /* lock the card, even if we may not actually be doing anything * else at the moment */ - ks8851_lock(ks, &flags); + ks8851_lock(ks); netif_dbg(ks, ifup, ks->netdev, "opening\n"); @@ -471,7 +468,7 @@ static int ks8851_net_open(struct net_device *dev) netif_dbg(ks, ifup, ks->netdev, "network device up\n"); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); mii_check_link(&ks->mii); return 0; } @@ -487,23 +484,22 @@ static int ks8851_net_open(struct net_device *dev) static int ks8851_net_stop(struct net_device *dev) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; netif_info(ks, ifdown, dev, "shutting down\n"); netif_stop_queue(dev); - ks8851_lock(ks, &flags); + ks8851_lock(ks); /* turn off the IRQs and ack any outstanding */ ks8851_wrreg16(ks, KS_IER, 0x0000); ks8851_wrreg16(ks, KS_ISR, 0xffff); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); /* stop any outstanding work */ ks8851_flush_tx_work(ks); flush_work(&ks->rxctrl_work); - ks8851_lock(ks, &flags); + ks8851_lock(ks); /* shutdown RX process */ ks8851_wrreg16(ks, KS_RXCR1, 0x0000); @@ -512,7 +508,7 @@ static int ks8851_net_stop(struct net_device *dev) /* set powermode to soft power down to save power */ ks8851_set_powermode(ks, PMECR_PM_SOFTDOWN); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); /* ensure any queued tx buffers are dumped */ while (!skb_queue_empty(&ks->txq)) { @@ -566,14 +562,13 @@ static netdev_tx_t ks8851_start_xmit(struct sk_buff *skb, static void ks8851_rxctrl_work(struct work_struct *work) { struct ks8851_net *ks = container_of(work, struct ks8851_net, rxctrl_work); - unsigned long flags; - ks8851_lock(ks, &flags); + ks8851_lock(ks); /* need to shutdown RXQ before modifying filter parameters */ ks8851_wrreg16(ks, KS_RXCR1, 0x00); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); } static void ks8851_set_rx_mode(struct net_device *dev) @@ -780,7 +775,6 @@ static int ks8851_set_eeprom(struct net_device *dev, { struct ks8851_net *ks = netdev_priv(dev); int offset = ee->offset; - unsigned long flags; int len = ee->len; u16 tmp; @@ -794,7 +788,7 @@ static int ks8851_set_eeprom(struct net_device *dev, if (!(ks->rc_ccr & CCR_EEPROM)) return -ENOENT; - ks8851_lock(ks, &flags); + ks8851_lock(ks); ks8851_eeprom_claim(ks); @@ -817,7 +811,7 @@ static int ks8851_set_eeprom(struct net_device *dev, eeprom_93cx6_wren(&ks->eeprom, false); ks8851_eeprom_release(ks); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); return 0; } @@ -827,7 +821,6 @@ static int 
ks8851_get_eeprom(struct net_device *dev, { struct ks8851_net *ks = netdev_priv(dev); int offset = ee->offset; - unsigned long flags; int len = ee->len; /* must be 2 byte aligned */ @@ -837,7 +830,7 @@ static int ks8851_get_eeprom(struct net_device *dev, if (!(ks->rc_ccr & CCR_EEPROM)) return -ENOENT; - ks8851_lock(ks, &flags); + ks8851_lock(ks); ks8851_eeprom_claim(ks); @@ -845,7 +838,7 @@ static int ks8851_get_eeprom(struct net_device *dev, eeprom_93cx6_multiread(&ks->eeprom, offset/2, (__le16 *)data, len/2); ks8851_eeprom_release(ks); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); return 0; } @@ -904,7 +897,6 @@ static int ks8851_phy_reg(int reg) static int ks8851_phy_read_common(struct net_device *dev, int phy_addr, int reg) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; int result; int ksreg; @@ -912,9 +904,9 @@ static int ks8851_phy_read_common(struct net_device *dev, int phy_addr, int reg) if (ksreg < 0) return ksreg; - ks8851_lock(ks, &flags); + ks8851_lock(ks); result = ks8851_rdreg16(ks, ksreg); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); return result; } @@ -949,14 +941,13 @@ static void ks8851_phy_write(struct net_device *dev, int phy, int reg, int value) { struct ks8851_net *ks = netdev_priv(dev); - unsigned long flags; int ksreg; ksreg = ks8851_phy_reg(reg); if (ksreg >= 0) { - ks8851_lock(ks, &flags); + ks8851_lock(ks); ks8851_wrreg16(ks, ksreg, value); - ks8851_unlock(ks, &flags); + ks8851_unlock(ks); } } diff --git a/drivers/net/ethernet/micrel/ks8851_par.c b/drivers/net/ethernet/micrel/ks8851_par.c index 78695be2570b..9f1c33f6ddec 100644 --- a/drivers/net/ethernet/micrel/ks8851_par.c +++ b/drivers/net/ethernet/micrel/ks8851_par.c @@ -55,29 +55,27 @@ struct ks8851_net_par { /** * ks8851_lock_par - register access lock * @ks: The chip state - * @flags: Spinlock flags * * Claim chip register access lock */ -static void ks8851_lock_par(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_lock_par(struct ks8851_net *ks) { struct ks8851_net_par *ksp = to_ks8851_par(ks); - spin_lock_irqsave(&ksp->lock, *flags); + spin_lock_bh(&ksp->lock); } /** * ks8851_unlock_par - register access unlock * @ks: The chip state - * @flags: Spinlock flags * * Release chip register access lock */ -static void ks8851_unlock_par(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_unlock_par(struct ks8851_net *ks) { struct ks8851_net_par *ksp = to_ks8851_par(ks); - spin_unlock_irqrestore(&ksp->lock, *flags); + spin_unlock_bh(&ksp->lock); } /** @@ -233,7 +231,6 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb, { struct ks8851_net *ks = netdev_priv(dev); netdev_tx_t ret = NETDEV_TX_OK; - unsigned long flags; unsigned int txqcr; u16 txmir; int err; @@ -241,7 +238,7 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb, netif_dbg(ks, tx_queued, ks->netdev, "%s: skb %p, %d@%p\n", __func__, skb, skb->len, skb->data); - ks8851_lock_par(ks, &flags); + ks8851_lock_par(ks); txmir = ks8851_rdreg16_par(ks, KS_TXMIR) & 0x1fff; @@ -262,7 +259,7 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb, ret = NETDEV_TX_BUSY; } - ks8851_unlock_par(ks, &flags); + ks8851_unlock_par(ks); return ret; } diff --git a/drivers/net/ethernet/micrel/ks8851_spi.c b/drivers/net/ethernet/micrel/ks8851_spi.c index a161ae45743a..b9e68520278d 100644 --- a/drivers/net/ethernet/micrel/ks8851_spi.c +++ b/drivers/net/ethernet/micrel/ks8851_spi.c @@ -71,11 +71,10 @@ struct ks8851_net_spi { /** * ks8851_lock_spi - register access lock * @ks: The chip state 
- * @flags: Spinlock flags * * Claim chip register access lock */ -static void ks8851_lock_spi(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_lock_spi(struct ks8851_net *ks) { struct ks8851_net_spi *kss = to_ks8851_spi(ks); @@ -85,11 +84,10 @@ static void ks8851_lock_spi(struct ks8851_net *ks, unsigned long *flags) /** * ks8851_unlock_spi - register access unlock * @ks: The chip state - * @flags: Spinlock flags * * Release chip register access lock */ -static void ks8851_unlock_spi(struct ks8851_net *ks, unsigned long *flags) +static void ks8851_unlock_spi(struct ks8851_net *ks) { struct ks8851_net_spi *kss = to_ks8851_spi(ks); @@ -309,7 +307,6 @@ static void ks8851_tx_work(struct work_struct *work) struct ks8851_net_spi *kss; unsigned short tx_space; struct ks8851_net *ks; - unsigned long flags; struct sk_buff *txb; bool last; @@ -317,7 +314,7 @@ static void ks8851_tx_work(struct work_struct *work) ks = &kss->ks8851; last = skb_queue_empty(&ks->txq); - ks8851_lock_spi(ks, &flags); + ks8851_lock_spi(ks); while (!last) { txb = skb_dequeue(&ks->txq); @@ -343,7 +340,7 @@ static void ks8851_tx_work(struct work_struct *work) ks->tx_space = tx_space; spin_unlock_bh(&ks->statelock); - ks8851_unlock_spi(ks, &flags); + ks8851_unlock_spi(ks); } /** diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index fbec952aa436..a654b3699c4c 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -3640,8 +3640,12 @@ int mana_probe(struct gdma_dev *gd, bool resuming) ac->gdma_dev = gd; gd->driver_data = ac; + + INIT_WORK(&ac->link_change_work, mana_link_state_handle); } + INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler); + err = mana_create_eq(ac); if (err) { dev_err(dev, "Failed to create EQs: %d\n", err); @@ -3657,8 +3661,6 @@ int mana_probe(struct gdma_dev *gd, bool resuming) if (!resuming) { ac->num_ports = num_ports; - - INIT_WORK(&ac->link_change_work, mana_link_state_handle); } else { if (ac->num_ports != num_ports) { dev_err(dev, "The number of vPorts changed: %d->%d\n", @@ -3687,10 +3689,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming) if (!resuming) { for (i = 0; i < ac->num_ports; i++) { err = mana_probe_port(ac, i, &ac->ports[i]); - /* we log the port for which the probe failed and stop - * probes for subsequent ports. - * Note that we keep running ports, for which the probes - * were successful, unless add_adev fails too + /* Log the port for which the probe failed, stop probing + * subsequent ports, and skip add_adev. + * mana_remove() will clean up already-probed ports. */ if (err) { dev_err(dev, "Probe Failed for port %d\n", i); @@ -3704,10 +3705,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming) enable_work(&apc->queue_reset_work); err = mana_attach(ac->ports[i]); rtnl_unlock(); - /* we log the port for which the attach failed and stop - * attach for subsequent ports - * Note that we keep running ports, for which the attach - * were successful, unless add_adev fails too + /* Log the port for which the attach failed, stop + * attaching subsequent ports, and skip add_adev. + * mana_remove() will clean up already-attached ports. 
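To make the reworked mana_probe() error handling concrete, a condensed sketch of the resulting control flow (simplified; the locking and attach steps from the hunk are elided):

	/* On the first probe/attach failure err stays set, add_adev() is
	 * skipped, and mana_remove() later tears down the ports that did
	 * come up.
	 */
	for (i = 0; i < ac->num_ports; i++) {
		err = mana_probe_port(ac, i, &ac->ports[i]);
		if (err)
			break;
	}
	if (!err)
		err = add_adev(gd, "eth");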
*/ if (err) { dev_err(dev, "Attach Failed for port %d\n", i); @@ -3716,9 +3716,9 @@ int mana_probe(struct gdma_dev *gd, bool resuming) } } - err = add_adev(gd, "eth"); + if (!err) + err = add_adev(gd, "eth"); - INIT_DELAYED_WORK(&ac->gf_stats_work, mana_gf_stats_work_handler); schedule_delayed_work(&ac->gf_stats_work, MANA_GF_STATS_PERIOD); out: @@ -3739,11 +3739,16 @@ void mana_remove(struct gdma_dev *gd, bool suspending) struct gdma_context *gc = gd->gdma_context; struct mana_context *ac = gd->driver_data; struct mana_port_context *apc; - struct device *dev = gc->dev; + struct device *dev; struct net_device *ndev; int err; int i; + if (!gc || !ac) + return; + + dev = gc->dev; + disable_work_sync(&ac->link_change_work); cancel_delayed_work_sync(&ac->gf_stats_work); @@ -3756,7 +3761,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending) if (!ndev) { if (i == 0) dev_err(dev, "No net device to remove\n"); - goto out; + break; } apc = netdev_priv(ndev); @@ -3787,7 +3792,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending) } mana_destroy_eq(ac); -out: + if (ac->per_port_queue_reset_wq) { destroy_workqueue(ac->per_port_queue_reset_wq); ac->per_port_queue_reset_wq = NULL; diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c index 79470f198a62..9cf19446657c 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_target.c @@ -435,12 +435,17 @@ static int nfp_encode_basic_qdr(u64 addr, int dest_island, int cpp_tgt, /* Full Island ID and channel bits overlap? */ ret = nfp_decode_basic(addr, &v, cpp_tgt, mode, addr40, isld1, isld0); - if (ret) + if (ret) { + pr_warn("%s: decode dest_island failed: %d\n", __func__, ret); return ret; + } /* The current address won't go where expected? */ - if (dest_island != -1 && dest_island != v) + if (dest_island != -1 && dest_island != v) { + pr_warn("%s: dest_island mismatch: current (%d) != decoded (%d)\n", + __func__, dest_island, v); return -EINVAL; + } /* If dest_island was -1, we don't care where it goes. */ return 0; @@ -493,7 +498,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt, * the address but we can verify if the existing * contents will point to a valid island. */ - return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island, + return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt, mode, addr40, isld1, isld0); iid_lsb = addr40 ? 34 : 26; @@ -504,7 +509,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt, return 0; case 1: if (cpp_tgt == NFP_CPP_TARGET_QDR && !addr40) - return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island, + return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt, mode, addr40, isld1, isld0); idx_lsb = addr40 ? 39 : 31; @@ -530,7 +535,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt, * be set before hand and with them select an island. * So we need to confirm that it's at least plausible. 
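The nfp_target.c change is purely an argument-order correction; spelled out against the callee's signature from the hunk above (the parameters past cpp_tgt are inferred from the call sites):

	/* static int nfp_encode_basic_qdr(u64 addr, int dest_island,
	 *                                 int cpp_tgt, ...);
	 */
	nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island, mode, addr40, isld1, isld0);	/* old: transposed */
	nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt, mode, addr40, isld1, isld0);	/* fixed */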
*/ - return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island, + return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt, mode, addr40, isld1, isld0); /* Make sure we compare against isldN values @@ -551,7 +556,7 @@ static int nfp_encode_basic(u64 *addr, int dest_island, int cpp_tgt, * iid<1> = addr<30> = channel<0> * channel<1> = addr<31> = Index */ - return nfp_encode_basic_qdr(*addr, cpp_tgt, dest_island, + return nfp_encode_basic_qdr(*addr, dest_island, cpp_tgt, mode, addr40, isld1, isld0); isld[0] &= ~3; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 01a983001ab4..ca68248dbc78 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -1410,8 +1410,6 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv) priv->tx_lpi_clk_stop = priv->plat->flags & STMMAC_FLAG_EN_TX_LPI_CLOCKGATING; - config->default_an_inband = priv->plat->default_an_inband; - /* Get the PHY interface modes (at the PHY end of the link) that * are supported by the platform. */ @@ -1419,6 +1417,8 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv) priv->plat->get_interfaces(priv, priv->plat->bsp_priv, config->supported_interfaces); + config->default_an_inband = priv->plat->default_an_inband; + /* Set the platform/firmware specified interface mode if the * supported interfaces have not already been provided using * phy_interface as a last resort. diff --git a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c index ec32a5f422f2..8b7c3753bb6a 100644 --- a/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c +++ b/drivers/net/ethernet/wangxun/txgbe/txgbe_main.c @@ -864,7 +864,8 @@ static int txgbe_probe(struct pci_dev *pdev, "0x%08x", etrack_id); } - if (etrack_id < 0x20010) + if (wx->mac.type == wx_mac_sp && + ((etrack_id & 0xfffff) < 0x20010)) dev_warn(&pdev->dev, "Please upgrade the firmware to 0x20010 or above.\n"); err = txgbe_test_hostif(wx); diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 70b9e58b9b78..5150f2e4f66b 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -2400,6 +2400,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info) return -ENODEV; } + local_bh_disable(); udp_tunnel_xmit_skb(rt, sk, skb_to_send, fl4.saddr, fl4.daddr, inet_dscp_to_dsfield(fl4.flowi4_dscp), @@ -2409,6 +2410,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info) !net_eq(sock_net(sk), dev_net(gtp->dev)), false, 0); + local_bh_enable(); return 0; } diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 9f90c598649d..c40fa331836b 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1689,6 +1689,7 @@ static size_t macvlan_get_size(const struct net_device *dev) + macvlan_get_size_mac(vlan) /* IFLA_MACVLAN_MACADDR */ + nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN */ + nla_total_size(4) /* IFLA_MACVLAN_BC_QUEUE_LEN_USED */ + + nla_total_size(4) /* IFLA_MACVLAN_BC_CUTOFF */ ); } diff --git a/drivers/net/mdio/Kconfig b/drivers/net/mdio/Kconfig index 516b0d05e16e..c71132f33f84 100644 --- a/drivers/net/mdio/Kconfig +++ b/drivers/net/mdio/Kconfig @@ -147,6 +147,7 @@ config MDIO_OCTEON config MDIO_PIC64HPSC tristate "PIC64-HPSC/HX MDIO interface support" + depends on ARCH_MICROCHIP || COMPILE_TEST depends on HAS_IOMEM && OF_MDIO help This driver supports the MDIO interface found on the PIC64-HPSC/HX diff --git a/drivers/net/netconsole.c 
b/drivers/net/netconsole.c index 3c9acd6e49e8..205384dab89a 100644 --- a/drivers/net/netconsole.c +++ b/drivers/net/netconsole.c @@ -497,6 +497,8 @@ static void trim_newline(char *s, size_t maxlen) size_t len; len = strnlen(s, maxlen); + if (!len) + return; if (s[len - 1] == '\n') s[len - 1] = '\0'; } diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index e1541ca76715..a05af192caf3 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -185,7 +185,9 @@ out_drop_cnt: return NETDEV_TX_OK; } -static void nsim_set_rx_mode(struct net_device *dev) +static void nsim_set_rx_mode(struct net_device *dev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { } @@ -623,7 +625,7 @@ static const struct net_shaper_ops nsim_shaper_ops = { static const struct net_device_ops nsim_netdev_ops = { .ndo_start_xmit = nsim_start_xmit, - .ndo_set_rx_mode = nsim_set_rx_mode, + .ndo_set_rx_mode_async = nsim_set_rx_mode, .ndo_set_mac_address = eth_mac_addr, .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = nsim_change_mtu, @@ -648,7 +650,7 @@ static const struct net_device_ops nsim_netdev_ops = { static const struct net_device_ops nsim_vf_netdev_ops = { .ndo_start_xmit = nsim_start_xmit, - .ndo_set_rx_mode = nsim_set_rx_mode, + .ndo_set_rx_mode_async = nsim_set_rx_mode, .ndo_set_mac_address = eth_mac_addr, .ndo_validate_addr = eth_validate_addr, .ndo_change_mtu = nsim_change_mtu, diff --git a/drivers/net/netkit.c b/drivers/net/netkit.c index 7b56a7ad7a49..5e2eecc3165d 100644 --- a/drivers/net/netkit.c +++ b/drivers/net/netkit.c @@ -186,7 +186,9 @@ static int netkit_get_iflink(const struct net_device *dev) return iflink; } -static void netkit_set_multicast(struct net_device *dev) +static void netkit_set_multicast(struct net_device *dev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc) { /* Nothing to do, we receive whatever gets pushed to us! */ } @@ -330,7 +332,7 @@ static const struct net_device_ops netkit_netdev_ops = { .ndo_open = netkit_open, .ndo_stop = netkit_close, .ndo_start_xmit = netkit_xmit, - .ndo_set_rx_mode = netkit_set_multicast, + .ndo_set_rx_mode_async = netkit_set_multicast, .ndo_set_rx_headroom = netkit_set_headroom, .ndo_set_mac_address = netkit_set_macaddr, .ndo_get_iflink = netkit_get_iflink, diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index b0d3bc49c685..57c68efa5ff8 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -2245,7 +2245,7 @@ ppp_do_recv(struct ppp *ppp, struct sk_buff *skb, struct channel *pch) */ static void __ppp_decompress_proto(struct sk_buff *skb) { - if (skb->data[0] & 0x01) + if (ppp_skb_is_compressed_proto(skb)) *(u8 *)skb_push(skb, 1) = 0x00; } diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c index d546a7af0d54..bdd61c504a1c 100644 --- a/drivers/net/ppp/pppoe.c +++ b/drivers/net/ppp/pppoe.c @@ -393,7 +393,7 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev, if (skb_mac_header_len(skb) < ETH_HLEN) goto drop; - if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr))) + if (!pskb_may_pull(skb, PPPOE_SES_HLEN)) goto drop; ph = pppoe_hdr(skb); @@ -403,6 +403,12 @@ static int pppoe_rcv(struct sk_buff *skb, struct net_device *dev, if (skb->len < len) goto drop; + /* skb->data points to the PPP protocol header after skb_pull_rcsum. + * Drop PFC frames. 
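A byte-level illustration of the PFC check above (protocol numbers per RFC 1661, payload bytes invented): after skb_pull_rcsum() the first remaining octet is the PPP protocol field, and an odd first octet means the peer compressed the protocol field, which pppoe_rcv() now drops.

	/* skb->data[0] after the pull:
	 *   0x00 0x21 ...  uncompressed IPv4 (LSB of first octet clear): accepted
	 *   0x21 ...       PFC-compressed IPv4 (LSB set): dropped
	 */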
+ */ + if (ppp_skb_is_compressed_proto(skb)) + goto drop; + if (pskb_trim_rcsum(skb, len)) goto drop; diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c index f6b94ac7a68a..87aa4f4e9724 100644 --- a/drivers/net/pse-pd/pse_core.c +++ b/drivers/net/pse-pd/pse_core.c @@ -1170,6 +1170,7 @@ struct pse_irq { struct pse_controller_dev *pcdev; struct pse_irq_desc desc; unsigned long *notifs; + unsigned long *notifs_mask; }; /** @@ -1247,7 +1248,6 @@ static int pse_set_config_isr(struct pse_controller_dev *pcdev, int id, static irqreturn_t pse_isr(int irq, void *data) { struct pse_controller_dev *pcdev; - unsigned long notifs_mask = 0; struct pse_irq_desc *desc; struct pse_irq *h = data; int ret, i; @@ -1257,14 +1257,15 @@ static irqreturn_t pse_isr(int irq, void *data) /* Clear notifs mask */ memset(h->notifs, 0, pcdev->nr_lines * sizeof(*h->notifs)); + bitmap_zero(h->notifs_mask, pcdev->nr_lines); mutex_lock(&pcdev->lock); - ret = desc->map_event(irq, pcdev, h->notifs, ¬ifs_mask); - if (ret || !notifs_mask) { + ret = desc->map_event(irq, pcdev, h->notifs, h->notifs_mask); + if (ret || bitmap_empty(h->notifs_mask, pcdev->nr_lines)) { mutex_unlock(&pcdev->lock); return IRQ_NONE; } - for_each_set_bit(i, ¬ifs_mask, pcdev->nr_lines) { + for_each_set_bit(i, h->notifs_mask, pcdev->nr_lines) { unsigned long notifs, rnotifs; struct pse_ntf ntf = {}; @@ -1340,6 +1341,10 @@ int devm_pse_irq_helper(struct pse_controller_dev *pcdev, int irq, if (!h->notifs) return -ENOMEM; + h->notifs_mask = devm_bitmap_zalloc(dev, pcdev->nr_lines, GFP_KERNEL); + if (!h->notifs_mask) + return -ENOMEM; + ret = devm_request_threaded_irq(dev, irq, NULL, pse_isr, IRQF_ONESHOT | irq_flags, irq_name, h); diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c index e3c785da3eef..1a9b27d5e256 100644 --- a/drivers/net/slip/slhc.c +++ b/drivers/net/slip/slhc.c @@ -80,9 +80,9 @@ #include <linux/unaligned.h> static unsigned char *encode(unsigned char *cp, unsigned short n); -static long decode(unsigned char **cpp); +static long decode(unsigned char **cpp, const unsigned char *end); static unsigned char * put16(unsigned char *cp, unsigned short x); -static unsigned short pull16(unsigned char **cpp); +static long pull16(unsigned char **cpp, const unsigned char *end); /* Allocate compression data structure * slots must be in range 0 to 255 (zero meaning no compression) @@ -190,30 +190,34 @@ encode(unsigned char *cp, unsigned short n) return cp; } -/* Pull a 16-bit integer in host order from buffer in network byte order */ -static unsigned short -pull16(unsigned char **cpp) +/* Pull a 16-bit integer in host order from buffer in network byte order. + * Returns -1 if the buffer is exhausted, otherwise the 16-bit value. + */ +static long +pull16(unsigned char **cpp, const unsigned char *end) { - short rval; + long rval; + if (*cpp + 2 > end) + return -1; rval = *(*cpp)++; rval <<= 8; rval |= *(*cpp)++; return rval; } -/* Decode a number */ +/* Decode a number. Returns -1 if the buffer is exhausted. 
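A small worked example of the bounds-checked decoder described above (buffer bytes invented): a zero lead byte selects the 16-bit form, so two more bytes must remain; otherwise the single byte is the value.

	unsigned char buf[] = { 0x00, 0x01, 0x2c };	/* 16-bit form: 0x012c */
	unsigned char *cp = buf;
	const unsigned char *end = buf + sizeof(buf);

	long v = decode(&cp, end);	/* 300 (0x012c); cp now equals end */
	long w = decode(&cp, end);	/* -1: buffer exhausted */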
*/ static long -decode(unsigned char **cpp) +decode(unsigned char **cpp, const unsigned char *end) { int x; + if (*cpp >= end) + return -1; x = *(*cpp)++; - if(x == 0){ - return pull16(cpp) & 0xffff; /* pull16 returns -1 on error */ - } else { - return x & 0xff; /* -1 if PULLCHAR returned error */ - } + if (x == 0) + return pull16(cpp, end); + return x & 0xff; } /* @@ -499,6 +503,7 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) struct cstate *cs; int len, hdrlen; unsigned char *cp = icp; + const unsigned char *end = icp + isize; /* We've got a compressed packet; read the change byte */ comp->sls_i_compressed++; @@ -506,6 +511,8 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) comp->sls_i_error++; return 0; } + if (!comp->rstate) + goto bad; changes = *cp++; if(changes & NEW_C){ /* Make sure the state index is in range, then grab the state. @@ -534,6 +541,8 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) thp = &cs->cs_tcp; ip = &cs->cs_ip; + if (cp + 2 > end) + goto bad; thp->check = *(__sum16 *)cp; cp += 2; @@ -564,26 +573,26 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) default: if(changes & NEW_U){ thp->urg = 1; - if((x = decode(&cp)) == -1) { + if((x = decode(&cp, end)) == -1) { goto bad; } thp->urg_ptr = htons(x); } else thp->urg = 0; if(changes & NEW_W){ - if((x = decode(&cp)) == -1) { + if((x = decode(&cp, end)) == -1) { goto bad; } thp->window = htons( ntohs(thp->window) + x); } if(changes & NEW_A){ - if((x = decode(&cp)) == -1) { + if((x = decode(&cp, end)) == -1) { goto bad; } thp->ack_seq = htonl( ntohl(thp->ack_seq) + x); } if(changes & NEW_S){ - if((x = decode(&cp)) == -1) { + if((x = decode(&cp, end)) == -1) { goto bad; } thp->seq = htonl( ntohl(thp->seq) + x); @@ -591,7 +600,7 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize) break; } if(changes & NEW_I){ - if((x = decode(&cp)) == -1) { + if((x = decode(&cp, end)) == -1) { goto bad; } ip->id = htons (ntohs (ip->id) + x); @@ -649,6 +658,10 @@ slhc_remember(struct slcompress *comp, unsigned char *icp, int isize) struct cstate *cs; unsigned int ihl; + if (!comp->rstate) { + comp->sls_i_error++; + return slhc_toss(comp); + } /* The packet is shorter than a legal IP header. * Also make sure isize is positive. */ diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index bfb566fecb92..f4adcfee7a80 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3759,6 +3759,12 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) queue_pairs); return -EINVAL; } + + /* Keep max_tx_vq in sync so that a later RSS command does not + * revert queue_pairs to a stale value. + */ + if (vi->has_rss) + vi->rss_trailer.max_tx_vq = cpu_to_le16(queue_pairs); succ: vi->curr_queue_pairs = queue_pairs; if (dev->flags & IFF_UP) { diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 80965181920c..c6536cad9c4f 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -560,7 +560,7 @@ static void vhost_net_busy_poll(struct vhost_net *net, busyloop_timeout = poll_rx ? 
rvq->busyloop_timeout: tvq->busyloop_timeout; - preempt_disable(); + migrate_disable(); endtime = busy_clock() + busyloop_timeout; while (vhost_can_busy_poll(endtime)) { @@ -577,7 +577,7 @@ static void vhost_net_busy_poll(struct vhost_net *net, cpu_relax(); } - preempt_enable(); + migrate_enable(); if (poll_rx || sock_has_rx_data(sock)) vhost_net_busy_poll_try_queue(net, vq); diff --git a/include/linux/fsl/ntmp.h b/include/linux/fsl/ntmp.h index 916dc4fe7de3..83a449b4d6ec 100644 --- a/include/linux/fsl/ntmp.h +++ b/include/linux/fsl/ntmp.h @@ -31,6 +31,12 @@ struct netc_tbl_vers { u8 rsst_ver; }; +struct netc_swcbd { + void *buf; + dma_addr_t dma; + size_t size; +}; + struct netc_cbdr { struct device *dev; struct netc_cbdr_regs regs; @@ -44,9 +50,10 @@ struct netc_cbdr { void *addr_base_align; dma_addr_t dma_base; dma_addr_t dma_base_align; + struct netc_swcbd *swcbd; /* Serialize the order of command BD ring */ - spinlock_t ring_lock; + struct mutex ring_lock; }; struct ntmp_user { diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index e6272f9c5e42..20cc16ea4e5a 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -147,11 +147,13 @@ extern __be16 vlan_dev_vlan_proto(const struct net_device *dev); * @priority: skb priority * @vlan_qos: vlan priority: (skb->priority << 13) & 0xE000 * @next: pointer to next struct + * @rcu: used for deferred freeing of mapping nodes */ struct vlan_priority_tci_mapping { u32 priority; u16 vlan_qos; - struct vlan_priority_tci_mapping *next; + struct vlan_priority_tci_mapping __rcu *next; + struct rcu_head rcu; }; struct proc_dir_entry; @@ -177,7 +179,7 @@ struct vlan_dev_priv { unsigned int nr_ingress_mappings; u32 ingress_priority_map[8]; unsigned int nr_egress_mappings; - struct vlan_priority_tci_mapping *egress_priority_map[16]; + struct vlan_priority_tci_mapping __rcu *egress_priority_map[16]; __be16 vlan_proto; u16 vlan_id; @@ -209,19 +211,24 @@ static inline u16 vlan_dev_get_egress_qos_mask(struct net_device *dev, u32 skprio) { struct vlan_priority_tci_mapping *mp; + u16 vlan_qos = 0; - smp_rmb(); /* coupled with smp_wmb() in vlan_dev_set_egress_priority() */ + rcu_read_lock(); - mp = vlan_dev_priv(dev)->egress_priority_map[(skprio & 0xF)]; + mp = rcu_dereference(vlan_dev_priv(dev)->egress_priority_map[skprio & 0xF]); while (mp) { if (mp->priority == skprio) { - return mp->vlan_qos; /* This should already be shifted - * to mask correctly with the - * VLAN's TCI */ + vlan_qos = READ_ONCE(mp->vlan_qos); + break; } - mp = mp->next; + mp = rcu_dereference(mp->next); } - return 0; + rcu_read_unlock(); + + /* This should already be shifted to mask correctly with + * the VLAN's TCI. + */ + return vlan_qos; } extern bool vlan_do_receive(struct sk_buff **skb); diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7969fcdd5ac4..97b435da5771 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1119,6 +1119,16 @@ struct netdev_net_notifier { * This function is called when the device changes address list filtering. * If driver handles unicast address filtering, it should set * IFF_UNICAST_FLT in its priv_flags. + * Cannot sleep, called with netif_addr_lock_bh held. + * Deprecated in favor of ndo_set_rx_mode_async. + * + * void (*ndo_set_rx_mode_async)(struct net_device *dev, + * struct netdev_hw_addr_list *uc, + * struct netdev_hw_addr_list *mc); + * Async version of ndo_set_rx_mode which runs in process context + * with rtnl_lock and netdev_lock_ops(dev) held.
The uc/mc parameters + * are snapshots of the address lists - iterate with + * netdev_hw_addr_list_for_each(ha, uc). * * int (*ndo_set_mac_address)(struct net_device *dev, void *addr); * This function is called when the Media Access Control address @@ -1439,6 +1449,10 @@ struct net_device_ops { void (*ndo_change_rx_flags)(struct net_device *dev, int flags); void (*ndo_set_rx_mode)(struct net_device *dev); + void (*ndo_set_rx_mode_async)( + struct net_device *dev, + struct netdev_hw_addr_list *uc, + struct netdev_hw_addr_list *mc); int (*ndo_set_mac_address)(struct net_device *dev, void *addr); int (*ndo_validate_addr)(struct net_device *dev); @@ -1903,6 +1917,9 @@ enum netdev_reg_state { * has been enabled due to the need to listen to * additional unicast addresses in a device that * does not implement ndo_set_rx_mode() + * @rx_mode_node: List entry for rx_mode work processing + * @rx_mode_tracker: Refcount tracker for rx_mode work + * @rx_mode_addr_cache: Recycled snapshot entries for rx_mode work * @uc: unicast mac addresses * @mc: multicast mac addresses * @dev_addrs: list of device hw addresses @@ -2294,6 +2311,9 @@ struct net_device { unsigned int promiscuity; unsigned int allmulti; bool uc_promisc; + struct list_head rx_mode_node; + netdevice_tracker rx_mode_tracker; + struct netdev_hw_addr_list rx_mode_addr_cache; #ifdef CONFIG_LOCKDEP unsigned char nested_level; #endif @@ -5004,6 +5024,14 @@ void __hw_addr_unsync_dev(struct netdev_hw_addr_list *list, int (*unsync)(struct net_device *, const unsigned char *)); void __hw_addr_init(struct netdev_hw_addr_list *list); +void __hw_addr_flush(struct netdev_hw_addr_list *list); +int __hw_addr_list_snapshot(struct netdev_hw_addr_list *snap, + const struct netdev_hw_addr_list *list, + int addr_len, struct netdev_hw_addr_list *cache); +void __hw_addr_list_reconcile(struct netdev_hw_addr_list *real_list, + struct netdev_hw_addr_list *work, + struct netdev_hw_addr_list *ref, int addr_len, + struct netdev_hw_addr_list *cache); /* Functions used for device addresses handling */ void dev_addr_mod(struct net_device *dev, unsigned int offset, diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h index b7e57fdbd413..b1d1f46d7d3b 100644 --- a/include/linux/ppp_defs.h +++ b/include/linux/ppp_defs.h @@ -8,6 +8,7 @@ #define _PPP_DEFS_H_ #include <linux/crc-ccitt.h> +#include <linux/skbuff.h> #include <uapi/linux/ppp_defs.h> #define PPP_FCS(fcs, c) crc_ccitt_byte(fcs, c) @@ -25,4 +26,19 @@ static inline bool ppp_proto_is_valid(u16 proto) return !!((proto & 0x0101) == 0x0001); } +/** + * ppp_skb_is_compressed_proto - checks if PPP protocol in a skb is compressed + * @skb: skb to check + * + * Check if the PPP protocol field is compressed (the least significant + * bit of the most significant octet is 1). skb->data must point to the PPP + * protocol header. + * + * Return: Whether the PPP protocol field is compressed. 
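Stepping back to the ndo_set_rx_mode_async plumbing above, a hedged sketch of a minimal driver conversion (the foo_* names are hypothetical; the real conversions in this pull are the mlx5e, fbnic, netdevsim and netkit hunks):

	static void foo_set_rx_mode(struct net_device *dev,
				    struct netdev_hw_addr_list *uc,
				    struct netdev_hw_addr_list *mc)
	{
		struct netdev_hw_addr *ha;

		/* Process context; rtnl_lock and netdev_lock_ops(dev) held.
		 * uc/mc are stable snapshots, so no netif_addr_lock_bh here.
		 */
		netdev_hw_addr_list_for_each(ha, uc)
			foo_program_filter(dev, ha->addr);	/* hypothetical */
		netdev_hw_addr_list_for_each(ha, mc)
			foo_program_filter(dev, ha->addr);	/* hypothetical */
	}

	static const struct net_device_ops foo_netdev_ops = {
		.ndo_set_rx_mode_async	= foo_set_rx_mode,
	};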
+ */ +static inline bool ppp_skb_is_compressed_proto(const struct sk_buff *skb) +{ + return unlikely(skb->data[0] & 0x01); +} + #endif /* _PPP_DEFS_H_ */ diff --git a/include/net/mctp.h b/include/net/mctp.h index e1e0a69afdce..d8bf9074110d 100644 --- a/include/net/mctp.h +++ b/include/net/mctp.h @@ -26,6 +26,9 @@ struct mctp_hdr { #define MCTP_VER_MIN 1 #define MCTP_VER_MAX 1 +/* Definitions for ver field */ +#define MCTP_HDR_VER_MASK GENMASK(3, 0) + /* Definitions for flags_seq_tag field */ #define MCTP_HDR_FLAG_SOM BIT(7) #define MCTP_HDR_FLAG_EOM BIT(6) diff --git a/include/net/pie.h b/include/net/pie.h index 01cbc66825a4..1f3db0c35514 100644 --- a/include/net/pie.h +++ b/include/net/pie.h @@ -104,7 +104,7 @@ static inline void pie_vars_init(struct pie_vars *vars) vars->dq_tstamp = DTIME_INVALID; vars->accu_prob = 0; vars->dq_count = DQCOUNT_INVALID; - vars->avg_dq_rate = 0; + WRITE_ONCE(vars->avg_dq_rate, 0); } static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb) diff --git a/include/net/tcp.h b/include/net/tcp.h index dfa52ceefd23..ecbadcb3a744 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1513,7 +1513,7 @@ static inline u32 tcp_snd_cwnd(const struct tcp_sock *tp) static inline void tcp_snd_cwnd_set(struct tcp_sock *tp, u32 val) { WARN_ON_ONCE((int)val <= 0); - tp->snd_cwnd = val; + WRITE_ONCE(tp->snd_cwnd, val); } static inline bool tcp_in_slow_start(const struct tcp_sock *tp) @@ -2208,10 +2208,14 @@ static inline void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new const u32 now = tcp_jiffies32; enum tcp_chrono old = tp->chrono_type; + /* Following WRITE_ONCE()s pair with READ_ONCE()s in + * tcp_get_info_chrono_stats(). + */ if (old > TCP_CHRONO_UNSPEC) - tp->chrono_stat[old - 1] += now - tp->chrono_start; - tp->chrono_start = now; - tp->chrono_type = new; + WRITE_ONCE(tp->chrono_stat[old - 1], + tp->chrono_stat[old - 1] + now - tp->chrono_start); + WRITE_ONCE(tp->chrono_start, now); + WRITE_ONCE(tp->chrono_type, new); } static inline void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type) diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h index e9a933641636..865d5c5a7718 100644 --- a/include/net/tcp_ecn.h +++ b/include/net/tcp_ecn.h @@ -181,7 +181,7 @@ static inline void tcp_accecn_third_ack(struct sock *sk, tcp_accecn_validate_syn_feedback(sk, ace, sent_ect)) { if ((tcp_accecn_extract_syn_ect(ace) == INET_ECN_CE) && !tp->delivered_ce) - tp->delivered_ce++; + WRITE_ONCE(tp->delivered_ce, 1); } break; } diff --git a/include/trace/events/net.h b/include/trace/events/net.h index fdd9ad474ce3..dbc2c5598e35 100644 --- a/include/trace/events/net.h +++ b/include/trace/events/net.h @@ -10,6 +10,7 @@ #include <linux/if_vlan.h> #include <linux/ip.h> #include <linux/tracepoint.h> +#include <net/busy_poll.h> TRACE_EVENT(net_dev_start_xmit, @@ -208,7 +209,8 @@ DECLARE_EVENT_CLASS(net_dev_rx_verbose_template, TP_fast_assign( __assign_str(name); #ifdef CONFIG_NET_RX_BUSY_POLL - __entry->napi_id = skb->napi_id; + __entry->napi_id = napi_id_valid(skb->napi_id) ? 
+ skb->napi_id : 0; #else __entry->napi_id = 0; #endif diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h index 578b8038b211..573f2df3a2c9 100644 --- a/include/trace/events/rxrpc.h +++ b/include/trace/events/rxrpc.h @@ -37,6 +37,7 @@ EM(rxkad_abort_1_short_encdata, "rxkad1-short-encdata") \ EM(rxkad_abort_1_short_header, "rxkad1-short-hdr") \ EM(rxkad_abort_2_short_check, "rxkad2-short-check") \ + EM(rxkad_abort_2_crypto_unaligned, "rxkad2-crypto-unaligned") \ EM(rxkad_abort_2_short_data, "rxkad2-short-data") \ EM(rxkad_abort_2_short_header, "rxkad2-short-hdr") \ EM(rxkad_abort_2_short_len, "rxkad2-short-len") \ @@ -161,8 +162,6 @@ E_(rxrpc_call_poke_timer_now, "Timer-now") #define rxrpc_skb_traces \ - EM(rxrpc_skb_eaten_by_unshare, "ETN unshare ") \ - EM(rxrpc_skb_eaten_by_unshare_nomem, "ETN unshar-nm") \ EM(rxrpc_skb_get_call_rx, "GET call-rx ") \ EM(rxrpc_skb_get_conn_secured, "GET conn-secd") \ EM(rxrpc_skb_get_conn_work, "GET conn-work") \ @@ -189,6 +188,7 @@ EM(rxrpc_skb_put_purge, "PUT purge ") \ EM(rxrpc_skb_put_purge_oob, "PUT purge-oob") \ EM(rxrpc_skb_put_response, "PUT response ") \ + EM(rxrpc_skb_put_response_copy, "PUT resp-cpy ") \ EM(rxrpc_skb_put_rotate, "PUT rotate ") \ EM(rxrpc_skb_put_unknown, "PUT unknown ") \ EM(rxrpc_skb_see_conn_work, "SEE conn-work") \ @@ -197,6 +197,7 @@ EM(rxrpc_skb_see_recvmsg_oob, "SEE recvm-oob") \ EM(rxrpc_skb_see_reject, "SEE reject ") \ EM(rxrpc_skb_see_rotate, "SEE rotate ") \ + EM(rxrpc_skb_see_unshare_nomem, "SEE unshar-nm") \ E_(rxrpc_skb_see_version, "SEE version ") #define rxrpc_local_traces \ @@ -284,7 +285,6 @@ EM(rxrpc_conn_put_unidle, "PUT unidle ") \ EM(rxrpc_conn_put_work, "PUT work ") \ EM(rxrpc_conn_queue_challenge, "QUE chall ") \ - EM(rxrpc_conn_queue_retry_work, "QUE retry-wk") \ EM(rxrpc_conn_queue_rx_work, "QUE rx-work ") \ EM(rxrpc_conn_see_new_service_conn, "SEE new-svc ") \ EM(rxrpc_conn_see_reap_service, "SEE reap-svc") \ diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index c40f7d5c4fca..7aa3af8b10ea 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -172,39 +172,42 @@ int vlan_dev_set_egress_priority(const struct net_device *dev, u32 skb_prio, u16 vlan_prio) { struct vlan_dev_priv *vlan = vlan_dev_priv(dev); - struct vlan_priority_tci_mapping *mp = NULL; + struct vlan_priority_tci_mapping __rcu **mpp; + struct vlan_priority_tci_mapping *mp; struct vlan_priority_tci_mapping *np; + u32 bucket = skb_prio & 0xF; u32 vlan_qos = (vlan_prio << VLAN_PRIO_SHIFT) & VLAN_PRIO_MASK; /* See if a priority mapping exists.. */ - mp = vlan->egress_priority_map[skb_prio & 0xF]; + mpp = &vlan->egress_priority_map[bucket]; + mp = rtnl_dereference(*mpp); while (mp) { if (mp->priority == skb_prio) { - if (mp->vlan_qos && !vlan_qos) + if (!vlan_qos) { + rcu_assign_pointer(*mpp, rtnl_dereference(mp->next)); vlan->nr_egress_mappings--; - else if (!mp->vlan_qos && vlan_qos) - vlan->nr_egress_mappings++; - mp->vlan_qos = vlan_qos; + kfree_rcu(mp, rcu); + } else { + WRITE_ONCE(mp->vlan_qos, vlan_qos); + } return 0; } - mp = mp->next; + mpp = &mp->next; + mp = rtnl_dereference(*mpp); } /* Create a new mapping then. */ - mp = vlan->egress_priority_map[skb_prio & 0xF]; + if (!vlan_qos) + return 0; + np = kmalloc_obj(struct vlan_priority_tci_mapping); if (!np) return -ENOBUFS; - np->next = mp; np->priority = skb_prio; np->vlan_qos = vlan_qos; - /* Before inserting this element in hash table, make sure all its fields - * are committed to memory. 
- * coupled with smp_rmb() in vlan_dev_get_egress_qos_mask() - */ - smp_wmb(); - vlan->egress_priority_map[skb_prio & 0xF] = np; + RCU_INIT_POINTER(np->next, rtnl_dereference(vlan->egress_priority_map[bucket])); + rcu_assign_pointer(vlan->egress_priority_map[bucket], np); if (vlan_qos) vlan->nr_egress_mappings++; return 0; @@ -604,11 +607,17 @@ void vlan_dev_free_egress_priority(const struct net_device *dev) int i; for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) { - while ((pm = vlan->egress_priority_map[i]) != NULL) { - vlan->egress_priority_map[i] = pm->next; - kfree(pm); + pm = rtnl_dereference(vlan->egress_priority_map[i]); + RCU_INIT_POINTER(vlan->egress_priority_map[i], NULL); + while (pm) { + struct vlan_priority_tci_mapping *next; + + next = rtnl_dereference(pm->next); + kfree_rcu(pm, rcu); + pm = next; } } + vlan->nr_egress_mappings = 0; } static void vlan_dev_uninit(struct net_device *dev) diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index a000b1ef0520..368d53ca7d87 100644 --- a/net/8021q/vlan_netlink.c +++ b/net/8021q/vlan_netlink.c @@ -260,13 +260,11 @@ static int vlan_fill_info(struct sk_buff *skb, const struct net_device *dev) goto nla_put_failure; for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) { - for (pm = vlan->egress_priority_map[i]; pm; - pm = pm->next) { - if (!pm->vlan_qos) - continue; - + for (pm = rcu_dereference_rtnl(vlan->egress_priority_map[i]); pm; + pm = rcu_dereference_rtnl(pm->next)) { + u16 vlan_qos = READ_ONCE(pm->vlan_qos); m.from = pm->priority; - m.to = (pm->vlan_qos >> 13) & 0x7; + m.to = (vlan_qos >> 13) & 0x7; if (nla_put(skb, IFLA_VLAN_QOS_MAPPING, sizeof(m), &m)) goto nla_put_failure; diff --git a/net/8021q/vlanproc.c b/net/8021q/vlanproc.c index fa67374bda49..0e424e0895b7 100644 --- a/net/8021q/vlanproc.c +++ b/net/8021q/vlanproc.c @@ -262,15 +262,19 @@ static int vlandev_seq_show(struct seq_file *seq, void *offset) vlan->ingress_priority_map[7]); seq_printf(seq, " EGRESS priority mappings: "); + rcu_read_lock(); for (i = 0; i < 16; i++) { - const struct vlan_priority_tci_mapping *mp - = vlan->egress_priority_map[i]; + const struct vlan_priority_tci_mapping *mp = + rcu_dereference(vlan->egress_priority_map[i]); while (mp) { + u16 vlan_qos = READ_ONCE(mp->vlan_qos); + seq_printf(seq, "%u:%d ", - mp->priority, ((mp->vlan_qos >> 13) & 0x7)); - mp = mp->next; + mp->priority, ((vlan_qos >> 13) & 0x7)); + mp = rcu_dereference(mp->next); } } + rcu_read_unlock(); seq_puts(seq, "\n"); return 0; diff --git a/net/bridge/br_arp_nd_proxy.c b/net/bridge/br_arp_nd_proxy.c index 0c8a06cdd46f..deb1ab1f24b0 100644 --- a/net/bridge/br_arp_nd_proxy.c +++ b/net/bridge/br_arp_nd_proxy.c @@ -201,11 +201,12 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br, f = br_fdb_find_rcu(br, n->ha, vid); if (f) { + const struct net_bridge_port *dst = READ_ONCE(f->dst); bool replied = false; if ((p && (p->flags & BR_PROXYARP)) || - (f->dst && (f->dst->flags & BR_PROXYARP_WIFI)) || - br_is_neigh_suppress_enabled(f->dst, vid)) { + (dst && (dst->flags & BR_PROXYARP_WIFI)) || + br_is_neigh_suppress_enabled(dst, vid)) { if (!vid) br_arp_send(br, p, skb->dev, sip, tip, sha, n->ha, sha, 0, 0); @@ -469,9 +470,10 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br, f = br_fdb_find_rcu(br, n->ha, vid); if (f) { + const struct net_bridge_port *dst = READ_ONCE(f->dst); bool replied = false; - if (br_is_neigh_suppress_enabled(f->dst, vid)) { + if (br_is_neigh_suppress_enabled(dst, vid)) { if (vid != 0) 
br_nd_send(br, p, skb, n, skb->vlan_proto, diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index e2c17f620f00..6eb3ab69a514 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -236,6 +236,7 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev, const unsigned char *addr, __u16 vid) { + const struct net_bridge_port *dst; struct net_bridge_fdb_entry *f; struct net_device *dev = NULL; struct net_bridge *br; @@ -248,8 +249,11 @@ struct net_device *br_fdb_find_port(const struct net_device *br_dev, br = netdev_priv(br_dev); rcu_read_lock(); f = br_fdb_find_rcu(br, addr, vid); - if (f && f->dst) - dev = f->dst->dev; + if (f) { + dst = READ_ONCE(f->dst); + if (dst) + dev = dst->dev; + } rcu_read_unlock(); return dev; @@ -346,7 +350,7 @@ static void fdb_delete_local(struct net_bridge *br, vg = nbp_vlan_group(op); if (op != p && ether_addr_equal(op->dev->dev_addr, addr) && (!vid || br_vlan_find(vg, vid))) { - f->dst = op; + WRITE_ONCE(f->dst, op); clear_bit(BR_FDB_ADDED_BY_USER, &f->flags); return; } @@ -357,7 +361,7 @@ static void fdb_delete_local(struct net_bridge *br, /* Maybe bridge device has same hw addr? */ if (p && ether_addr_equal(br->dev->dev_addr, addr) && (!vid || (v && br_vlan_should_use(v)))) { - f->dst = NULL; + WRITE_ONCE(f->dst, NULL); clear_bit(BR_FDB_ADDED_BY_USER, &f->flags); return; } @@ -928,6 +932,7 @@ int br_fdb_test_addr(struct net_device *dev, unsigned char *addr) int br_fdb_fillbuf(struct net_bridge *br, void *buf, unsigned long maxnum, unsigned long skip) { + const struct net_bridge_port *dst; struct net_bridge_fdb_entry *f; struct __fdb_entry *fe = buf; unsigned long delta; @@ -944,7 +949,8 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf, continue; /* ignore pseudo entry for local MAC address */ - if (!f->dst) + dst = READ_ONCE(f->dst); + if (!dst) continue; if (skip) { @@ -956,8 +962,8 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf, memcpy(fe->mac_addr, f->key.addr.addr, ETH_ALEN); /* due to ABI compat need to split into hi/lo */ - fe->port_no = f->dst->port_no; - fe->port_hi = f->dst->port_no >> 8; + fe->port_no = dst->port_no; + fe->port_hi = dst->port_no >> 8; fe->is_local = test_bit(BR_FDB_LOCAL, &f->flags); if (!test_bit(BR_FDB_STATIC, &f->flags)) { @@ -1083,9 +1089,11 @@ int br_fdb_dump(struct sk_buff *skb, rcu_read_lock(); hlist_for_each_entry_rcu(f, &br->fdb_list, fdb_node) { + const struct net_bridge_port *dst = READ_ONCE(f->dst); + if (*idx < ctx->fdb_idx) goto skip; - if (filter_dev && (!f->dst || f->dst->dev != filter_dev)) { + if (filter_dev && (!dst || dst->dev != filter_dev)) { if (filter_dev != dev) goto skip; /* !f->dst is a special case for bridge @@ -1093,10 +1101,10 @@ int br_fdb_dump(struct sk_buff *skb, * Therefore need a little more filtering * we only want to dump the !f->dst case */ - if (f->dst) + if (dst) goto skip; } - if (!filter_dev && f->dst) + if (!filter_dev && dst) goto skip; err = fdb_fill_info(skb, br, f, diff --git a/net/core/dev.c b/net/core/dev.c index e59f6025067c..d426c1beeb76 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9593,14 +9593,14 @@ static void dev_change_rx_flags(struct net_device *dev, int flags) ops->ndo_change_rx_flags(dev, flags); } -static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) +int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) { unsigned int old_flags = dev->flags; unsigned int promiscuity, flags; kuid_t uid; kgid_t gid; - ASSERT_RTNL(); + netdev_ops_assert_locked(dev); promiscuity = dev->promiscuity + 
inc; if (promiscuity == 0) { @@ -9636,16 +9636,8 @@ static int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify) dev_change_rx_flags(dev, IFF_PROMISC); } - if (notify) { - /* The ops lock is only required to ensure consistent locking - * for `NETDEV_CHANGE` notifiers. This function is sometimes - * called without the lock, even for devices that are ops - * locked, such as in `dev_uc_sync_multiple` when using - * bonding or teaming. - */ - netdev_ops_assert_locked(dev); + if (notify) __dev_notify_flags(dev, old_flags, IFF_PROMISC, 0, NULL); - } return 0; } @@ -9667,7 +9659,7 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify) unsigned int old_flags = dev->flags, old_gflags = dev->gflags; unsigned int allmulti, flags; - ASSERT_RTNL(); + netdev_ops_assert_locked(dev); allmulti = dev->allmulti + inc; if (allmulti == 0) { @@ -9697,46 +9689,6 @@ int netif_set_allmulti(struct net_device *dev, int inc, bool notify) return 0; } -/* - * Upload unicast and multicast address lists to device and - * configure RX filtering. When the device doesn't support unicast - * filtering it is put in promiscuous mode while unicast addresses - * are present. - */ -void __dev_set_rx_mode(struct net_device *dev) -{ - const struct net_device_ops *ops = dev->netdev_ops; - - /* dev_open will call this function so the list will stay sane. */ - if (!(dev->flags&IFF_UP)) - return; - - if (!netif_device_present(dev)) - return; - - if (!(dev->priv_flags & IFF_UNICAST_FLT)) { - /* Unicast addresses changes may only happen under the rtnl, - * therefore calling __dev_set_promiscuity here is safe. - */ - if (!netdev_uc_empty(dev) && !dev->uc_promisc) { - __dev_set_promiscuity(dev, 1, false); - dev->uc_promisc = true; - } else if (netdev_uc_empty(dev) && dev->uc_promisc) { - __dev_set_promiscuity(dev, -1, false); - dev->uc_promisc = false; - } - } - - if (ops->ndo_set_rx_mode) - ops->ndo_set_rx_mode(dev); -} - -void dev_set_rx_mode(struct net_device *dev) -{ - netif_addr_lock_bh(dev); - __dev_set_rx_mode(dev); - netif_addr_unlock_bh(dev); -} /** * netif_get_flags() - get flags reported to userspace @@ -9775,7 +9727,7 @@ int __dev_change_flags(struct net_device *dev, unsigned int flags, unsigned int old_flags = dev->flags; int ret; - ASSERT_RTNL(); + netdev_ops_assert_locked(dev); /* * Set the flags on our device. 
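These ASSERT_RTNL() to netdev_ops_assert_locked() conversions are what let the rx-mode worker call back into the promiscuity/allmulti helpers: for ops-locked devices the instance lock, rather than the RTNL semaphore alone, is what protects these paths, and the assertion now accepts either scheme depending on the device. For reference, the assertion helper amounts to roughly this:

static inline void netdev_ops_assert_locked(struct net_device *dev)
{
	if (netdev_need_ops_lock(dev))
		lockdep_assert_held(&dev->lock);
	else
		ASSERT_RTNL();
}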
@@ -11408,6 +11360,11 @@ int register_netdevice(struct net_device *dev) goto err_uninit; } + if (netdev_need_ops_lock(dev) && + dev->netdev_ops->ndo_set_rx_mode && + !dev->netdev_ops->ndo_set_rx_mode_async) + netdev_WARN(dev, "ops-locked drivers should use ndo_set_rx_mode_async\n"); + ret = netdev_do_alloc_pcpu_stats(dev); if (ret) goto err_uninit; @@ -12127,6 +12084,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, #endif mutex_init(&dev->lock); + INIT_LIST_HEAD(&dev->rx_mode_node); + __hw_addr_init(&dev->rx_mode_addr_cache); dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); @@ -12231,6 +12190,8 @@ void free_netdev(struct net_device *dev) kfree(rcu_dereference_protected(dev->ingress_queue, 1)); + __hw_addr_flush(&dev->rx_mode_addr_cache); + /* Flush device addresses */ dev_addr_flush(dev); diff --git a/net/core/dev.h b/net/core/dev.h index 628bdaebf0ca..0cf24b8f5008 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -78,6 +78,7 @@ void linkwatch_run_queue(void); void dev_addr_flush(struct net_device *dev); int dev_addr_init(struct net_device *dev); void dev_addr_check(struct net_device *dev); +void __hw_addr_flush(struct netdev_hw_addr_list *list); #if IS_ENABLED(CONFIG_NET_SHAPER) void net_shaper_flush_netdev(struct net_device *dev); @@ -164,6 +165,9 @@ int netif_change_carrier(struct net_device *dev, bool new_carrier); int dev_change_carrier(struct net_device *dev, bool new_carrier); void __dev_set_rx_mode(struct net_device *dev); +int __dev_set_promiscuity(struct net_device *dev, int inc, bool notify); +bool netif_rx_mode_clean(struct net_device *dev); +void netif_rx_mode_sync(struct net_device *dev); void __dev_notify_flags(struct net_device *dev, unsigned int old_flags, unsigned int gchanges, u32 portid, diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c index 76c91f224886..d73fcb0c6785 100644 --- a/net/core/dev_addr_lists.c +++ b/net/core/dev_addr_lists.c @@ -11,9 +11,18 @@ #include <linux/rtnetlink.h> #include <linux/export.h> #include <linux/list.h> +#include <linux/spinlock.h> +#include <linux/workqueue.h> +#include <kunit/visibility.h> #include "dev.h" +static void netdev_rx_mode_work(struct work_struct *work); + +static LIST_HEAD(rx_mode_list); +static DEFINE_SPINLOCK(rx_mode_lock); +static DECLARE_WORK(rx_mode_work, netdev_rx_mode_work); + /* * General list handling functions */ @@ -481,7 +490,7 @@ void __hw_addr_unsync_dev(struct netdev_hw_addr_list *list, } EXPORT_SYMBOL(__hw_addr_unsync_dev); -static void __hw_addr_flush(struct netdev_hw_addr_list *list) +void __hw_addr_flush(struct netdev_hw_addr_list *list) { struct netdev_hw_addr *ha, *tmp; @@ -492,6 +501,7 @@ static void __hw_addr_flush(struct netdev_hw_addr_list *list) } list->count = 0; } +EXPORT_SYMBOL_IF_KUNIT(__hw_addr_flush); void __hw_addr_init(struct netdev_hw_addr_list *list) { @@ -501,6 +511,133 @@ void __hw_addr_init(struct netdev_hw_addr_list *list) } EXPORT_SYMBOL(__hw_addr_init); +static void __hw_addr_splice(struct netdev_hw_addr_list *dst, + struct netdev_hw_addr_list *src) +{ + src->tree = RB_ROOT; + list_splice_init(&src->list, &dst->list); + dst->count += src->count; + src->count = 0; +} + +/** + * __hw_addr_list_snapshot - create a snapshot copy of an address list + * @snap: destination snapshot list (needs to be __hw_addr_init-initialized) + * @list: source address list to snapshot + * @addr_len: length of addresses + * @cache: entry cache to reuse entries from; falls back to GFP_ATOMIC + * + * Creates a copy of @list reusing 
entries from @cache when available. + * Must be called under a spinlock. + * + * Return: 0 on success, -errno on failure. + */ +int __hw_addr_list_snapshot(struct netdev_hw_addr_list *snap, + const struct netdev_hw_addr_list *list, + int addr_len, struct netdev_hw_addr_list *cache) +{ + struct netdev_hw_addr *ha, *entry; + + list_for_each_entry(ha, &list->list, list) { + if (cache->count) { + entry = list_first_entry(&cache->list, + struct netdev_hw_addr, list); + list_del(&entry->list); + cache->count--; + memcpy(entry->addr, ha->addr, addr_len); + entry->type = ha->type; + entry->global_use = false; + entry->synced = 0; + } else { + entry = __hw_addr_create(ha->addr, addr_len, ha->type, + false, false); + if (!entry) { + __hw_addr_flush(snap); + return -ENOMEM; + } + } + entry->sync_cnt = ha->sync_cnt; + entry->refcount = ha->refcount; + + list_add_tail(&entry->list, &snap->list); + __hw_addr_insert(snap, entry, addr_len); + snap->count++; + } + + return 0; +} +EXPORT_SYMBOL_IF_KUNIT(__hw_addr_list_snapshot); + +/** + * __hw_addr_list_reconcile - sync snapshot changes back and free snapshots + * @real_list: the real address list to update + * @work: the working snapshot (modified by driver via __hw_addr_sync_dev) + * @ref: the reference snapshot (untouched copy of original state) + * @addr_len: length of addresses + * @cache: entry cache to return snapshot entries to for reuse + * + * Walks the reference snapshot and compares each entry against the work + * snapshot to compute sync_cnt deltas. Applies those deltas to @real_list. + * Returns snapshot entries to @cache for reuse; frees both snapshots. + * Caller must hold netif_addr_lock_bh. + */ +void __hw_addr_list_reconcile(struct netdev_hw_addr_list *real_list, + struct netdev_hw_addr_list *work, + struct netdev_hw_addr_list *ref, int addr_len, + struct netdev_hw_addr_list *cache) +{ + struct netdev_hw_addr *ref_ha, *tmp, *work_ha, *real_ha; + int delta; + + list_for_each_entry_safe(ref_ha, tmp, &ref->list, list) { + work_ha = __hw_addr_lookup(work, ref_ha->addr, addr_len, + ref_ha->type); + if (work_ha) + delta = work_ha->sync_cnt - ref_ha->sync_cnt; + else + delta = -1; + + if (delta == 0) + continue; + + real_ha = __hw_addr_lookup(real_list, ref_ha->addr, addr_len, + ref_ha->type); + if (!real_ha) { + /* The real entry was concurrently removed. If the + * driver synced this addr to hardware (delta > 0), + * re-insert it as a stale entry so the next work + * run unsyncs it from hardware. 
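To make the delta bookkeeping concrete, here is a hand-worked trace of one address through a single work cycle. The numbers follow from the logic above rather than being quoted from the patch:

/* New address at snapshot time: ref->sync_cnt = 0, refcount = 1.
 * Driver run: __hw_addr_sync_dev() on the work copy syncs it to HW,
 * leaving work->sync_cnt = 1.
 * Reconcile: delta = work->sync_cnt - ref->sync_cnt = +1, so the real
 * entry goes from (sync_cnt 0, refcount 1) to (1, 2).
 *
 * Had the user deleted the address in the meantime, the real entry
 * would be gone; delta > 0 then re-inserts the ref entry with
 * sync_cnt = refcount = delta, i.e. a stale (1, 1) entry that the
 * next work run unsyncs from hardware and frees.
 */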
+ */ + if (delta > 0) { + rb_erase(&ref_ha->node, &ref->tree); + list_del(&ref_ha->list); + ref->count--; + ref_ha->sync_cnt = delta; + ref_ha->refcount = delta; + list_add_tail_rcu(&ref_ha->list, + &real_list->list); + __hw_addr_insert(real_list, ref_ha, + addr_len); + real_list->count++; + } + continue; + } + + real_ha->sync_cnt += delta; + real_ha->refcount += delta; + if (!real_ha->refcount) { + rb_erase(&real_ha->node, &real_list->tree); + list_del_rcu(&real_ha->list); + kfree_rcu(real_ha, rcu_head); + real_list->count--; + } + } + + __hw_addr_splice(cache, work); + __hw_addr_splice(cache, ref); +} +EXPORT_SYMBOL_IF_KUNIT(__hw_addr_list_reconcile); + /* * Device addresses handling functions */ @@ -1049,3 +1186,249 @@ void dev_mc_init(struct net_device *dev) __hw_addr_init(&dev->mc); } EXPORT_SYMBOL(dev_mc_init); + +static int netif_addr_lists_snapshot(struct net_device *dev, + struct netdev_hw_addr_list *uc_snap, + struct netdev_hw_addr_list *mc_snap, + struct netdev_hw_addr_list *uc_ref, + struct netdev_hw_addr_list *mc_ref) +{ + int err; + + err = __hw_addr_list_snapshot(uc_snap, &dev->uc, dev->addr_len, + &dev->rx_mode_addr_cache); + if (!err) + err = __hw_addr_list_snapshot(uc_ref, &dev->uc, dev->addr_len, + &dev->rx_mode_addr_cache); + if (!err) + err = __hw_addr_list_snapshot(mc_snap, &dev->mc, + dev->addr_len, + &dev->rx_mode_addr_cache); + if (!err) + err = __hw_addr_list_snapshot(mc_ref, &dev->mc, dev->addr_len, + &dev->rx_mode_addr_cache); + + if (err) { + __hw_addr_flush(uc_snap); + __hw_addr_flush(uc_ref); + __hw_addr_flush(mc_snap); + } + + return err; +} + +static void netif_addr_lists_reconcile(struct net_device *dev, + struct netdev_hw_addr_list *uc_snap, + struct netdev_hw_addr_list *mc_snap, + struct netdev_hw_addr_list *uc_ref, + struct netdev_hw_addr_list *mc_ref) +{ + __hw_addr_list_reconcile(&dev->uc, uc_snap, uc_ref, dev->addr_len, + &dev->rx_mode_addr_cache); + __hw_addr_list_reconcile(&dev->mc, mc_snap, mc_ref, dev->addr_len, + &dev->rx_mode_addr_cache); +} + +/** + * netif_uc_promisc_update() - evaluate whether uc_promisc should be toggled. + * @dev: device + * + * Must be called under netif_addr_lock_bh. + * Return: +1 to enter promisc, -1 to leave, 0 for no change. 
+ */ +static int netif_uc_promisc_update(struct net_device *dev) +{ + if (dev->priv_flags & IFF_UNICAST_FLT) + return 0; + + if (!netdev_uc_empty(dev) && !dev->uc_promisc) { + dev->uc_promisc = true; + return 1; + } + if (netdev_uc_empty(dev) && dev->uc_promisc) { + dev->uc_promisc = false; + return -1; + } + return 0; +} + +static void netif_rx_mode_run(struct net_device *dev) +{ + struct netdev_hw_addr_list uc_snap, mc_snap, uc_ref, mc_ref; + const struct net_device_ops *ops = dev->netdev_ops; + int promisc_inc; + int err; + + might_sleep(); + netdev_ops_assert_locked(dev); + + __hw_addr_init(&uc_snap); + __hw_addr_init(&mc_snap); + __hw_addr_init(&uc_ref); + __hw_addr_init(&mc_ref); + + if (!(dev->flags & IFF_UP) || !netif_device_present(dev)) + return; + + if (ops->ndo_set_rx_mode_async) { + netif_addr_lock_bh(dev); + err = netif_addr_lists_snapshot(dev, &uc_snap, &mc_snap, + &uc_ref, &mc_ref); + if (err) { + netdev_WARN(dev, "failed to sync uc/mc addresses\n"); + netif_addr_unlock_bh(dev); + return; + } + + promisc_inc = netif_uc_promisc_update(dev); + netif_addr_unlock_bh(dev); + } else { + netif_addr_lock_bh(dev); + promisc_inc = netif_uc_promisc_update(dev); + netif_addr_unlock_bh(dev); + } + + if (promisc_inc) + __dev_set_promiscuity(dev, promisc_inc, false); + + if (ops->ndo_set_rx_mode_async) { + ops->ndo_set_rx_mode_async(dev, &uc_snap, &mc_snap); + + netif_addr_lock_bh(dev); + netif_addr_lists_reconcile(dev, &uc_snap, &mc_snap, + &uc_ref, &mc_ref); + netif_addr_unlock_bh(dev); + } else if (ops->ndo_set_rx_mode) { + netif_addr_lock_bh(dev); + ops->ndo_set_rx_mode(dev); + netif_addr_unlock_bh(dev); + } +} + +static void netdev_rx_mode_work(struct work_struct *work) +{ + struct net_device *dev; + + rtnl_lock(); + + while (true) { + spin_lock_bh(&rx_mode_lock); + if (list_empty(&rx_mode_list)) { + spin_unlock_bh(&rx_mode_lock); + break; + } + dev = list_first_entry(&rx_mode_list, struct net_device, + rx_mode_node); + list_del_init(&dev->rx_mode_node); + /* We must free netdev tracker under + * the spinlock protection. + */ + netdev_tracker_free(dev, &dev->rx_mode_tracker); + spin_unlock_bh(&rx_mode_lock); + + netdev_lock_ops(dev); + netif_rx_mode_run(dev); + netdev_unlock_ops(dev); + /* Use __dev_put() because netdev_tracker_free() was already + * called above. Must be after netdev_unlock_ops() to prevent + * netdev_run_todo() from freeing the device while still in use. + */ + __dev_put(dev); + } + + rtnl_unlock(); +} + +static void netif_rx_mode_queue(struct net_device *dev) +{ + spin_lock_bh(&rx_mode_lock); + if (list_empty(&dev->rx_mode_node)) { + list_add_tail(&dev->rx_mode_node, &rx_mode_list); + netdev_hold(dev, &dev->rx_mode_tracker, GFP_ATOMIC); + } + spin_unlock_bh(&rx_mode_lock); + schedule_work(&rx_mode_work); +} + +/** + * __dev_set_rx_mode() - upload unicast and multicast address lists to device + * and configure RX filtering. + * @dev: device + * + * When the device doesn't support unicast filtering it is put in promiscuous + * mode while unicast addresses are present. + */ +void __dev_set_rx_mode(struct net_device *dev) +{ + const struct net_device_ops *ops = dev->netdev_ops; + int promisc_inc; + + /* dev_open will call this function so the list will stay sane. */ + if (!(dev->flags & IFF_UP)) + return; + + if (!netif_device_present(dev)) + return; + + if (ops->ndo_set_rx_mode_async || ops->ndo_change_rx_flags || + netdev_need_ops_lock(dev)) { + netif_rx_mode_queue(dev); + return; + } + + /* Legacy path for non-ops-locked HW devices. 
*/ + + promisc_inc = netif_uc_promisc_update(dev); + if (promisc_inc) + __dev_set_promiscuity(dev, promisc_inc, false); + + if (ops->ndo_set_rx_mode) + ops->ndo_set_rx_mode(dev); +} + +void dev_set_rx_mode(struct net_device *dev) +{ + netif_addr_lock_bh(dev); + __dev_set_rx_mode(dev); + netif_addr_unlock_bh(dev); +} + +bool netif_rx_mode_clean(struct net_device *dev) +{ + bool clean = false; + + spin_lock_bh(&rx_mode_lock); + if (!list_empty(&dev->rx_mode_node)) { + list_del_init(&dev->rx_mode_node); + clean = true; + /* We must release netdev tracker under + * the spinlock protection. + */ + netdev_tracker_free(dev, &dev->rx_mode_tracker); + } + spin_unlock_bh(&rx_mode_lock); + + return clean; +} + +/** + * netif_rx_mode_sync() - sync rx mode inline + * @dev: network device + * + * Drivers implementing ndo_set_rx_mode_async() have their rx mode callback + * executed from a workqueue. This allows the callback to sleep, but means + * the hardware update is deferred and may not be visible to userspace + * by the time the initiating syscall returns. netif_rx_mode_sync() steals the + * pending workqueue update and executes it inline. This preserves the + * atomicity of operations as observed by userspace. + */ +void netif_rx_mode_sync(struct net_device *dev) +{ + if (netif_rx_mode_clean(dev)) { + netif_rx_mode_run(dev); + /* Use __dev_put() because netdev_tracker_free() was already + * called inside netif_rx_mode_clean(). + */ + __dev_put(dev); + } +} diff --git a/net/core/dev_addr_lists_test.c b/net/core/dev_addr_lists_test.c index 8e1dba825e94..260e71a2399f 100644 --- a/net/core/dev_addr_lists_test.c +++ b/net/core/dev_addr_lists_test.c @@ -2,22 +2,31 @@ #include <kunit/test.h> #include <linux/etherdevice.h> +#include <linux/math64.h> #include <linux/netdevice.h> #include <linux/rtnetlink.h> static const struct net_device_ops dummy_netdev_ops = { }; +#define ADDR_A 1 +#define ADDR_B 2 +#define ADDR_C 3 + struct dev_addr_test_priv { u32 addr_seen; + u32 addr_synced; + u32 addr_unsynced; }; static int dev_addr_test_sync(struct net_device *netdev, const unsigned char *a) { struct dev_addr_test_priv *datp = netdev_priv(netdev); - if (a[0] < 31 && !memchr_inv(a, a[0], ETH_ALEN)) + if (a[0] < 31 && !memchr_inv(a, a[0], ETH_ALEN)) { datp->addr_seen |= 1 << a[0]; + datp->addr_synced |= 1 << a[0]; + } return 0; } @@ -26,11 +35,22 @@ static int dev_addr_test_unsync(struct net_device *netdev, { struct dev_addr_test_priv *datp = netdev_priv(netdev); - if (a[0] < 31 && !memchr_inv(a, a[0], ETH_ALEN)) + if (a[0] < 31 && !memchr_inv(a, a[0], ETH_ALEN)) { datp->addr_seen &= ~(1 << a[0]); + datp->addr_unsynced |= 1 << a[0]; + } return 0; } +static void dev_addr_test_reset(struct net_device *netdev) +{ + struct dev_addr_test_priv *datp = netdev_priv(netdev); + + datp->addr_seen = 0; + datp->addr_synced = 0; + datp->addr_unsynced = 0; +} + static int dev_addr_test_init(struct kunit *test) { struct dev_addr_test_priv *datp; @@ -225,6 +245,363 @@ static void dev_addr_test_add_excl(struct kunit *test) rtnl_unlock(); } +/* Snapshot test: basic sync with no concurrent modifications. + * Add one address, snapshot, driver syncs it, reconcile propagates + * sync_cnt delta back to real list.
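The KUnit tests added below stand in for a driver by calling __hw_addr_sync_dev() on the snapshot directly. That mirrors what an incremental ndo_set_rx_mode_async() implementation might look like, sketched here with hypothetical foo_* callbacks that each program or remove one hardware filter entry:

static void foo_set_rx_mode_async(struct net_device *dev,
				  struct netdev_hw_addr_list *uc,
				  struct netdev_hw_addr_list *mc)
{
	/* Sync/unsync deltas against the snapshots; the core reconciles
	 * the resulting sync_cnt changes back into dev->uc and dev->mc.
	 */
	__hw_addr_sync_dev(uc, dev, foo_sync_uc, foo_unsync_uc);
	__hw_addr_sync_dev(mc, dev, foo_sync_mc, foo_unsync_mc);
}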
+ */ +static void dev_addr_test_snapshot_sync(struct kunit *test) +{ + struct netdev_hw_addr_list snap, ref, cache; + struct net_device *netdev = test->priv; + struct dev_addr_test_priv *datp; + struct netdev_hw_addr *ha; + u8 addr[ETH_ALEN]; + + datp = netdev_priv(netdev); + + rtnl_lock(); + + memset(addr, ADDR_A, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + + /* Snapshot: ADDR_A has sync_cnt=0, refcount=1 (new) */ + netif_addr_lock_bh(netdev); + __hw_addr_init(&snap); + __hw_addr_init(&ref); + __hw_addr_init(&cache); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&snap, &netdev->uc, ETH_ALEN, + &cache)); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&ref, &netdev->uc, ETH_ALEN, + &cache)); + netif_addr_unlock_bh(netdev); + + /* Driver syncs ADDR_A to hardware */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&snap, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + + /* Reconcile: delta=+1 applied to real entry */ + netif_addr_lock_bh(netdev); + __hw_addr_list_reconcile(&netdev->uc, &snap, &ref, ETH_ALEN, + &cache); + netif_addr_unlock_bh(netdev); + + /* Real entry should now reflect the sync: sync_cnt=1, refcount=2 */ + KUNIT_EXPECT_EQ(test, 1, netdev->uc.count); + ha = list_first_entry(&netdev->uc.list, struct netdev_hw_addr, list); + KUNIT_EXPECT_MEMEQ(test, ha->addr, addr, ETH_ALEN); + KUNIT_EXPECT_EQ(test, 1, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 2, ha->refcount); + + /* Second work run: already synced, nothing to do */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 0, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + KUNIT_EXPECT_EQ(test, 1, netdev->uc.count); + + __hw_addr_flush(&cache); + rtnl_unlock(); +} + +/* Snapshot test: ADDR_A synced to hardware, then concurrently removed + * from the real list before reconcile runs. Reconcile re-inserts ADDR_A as + * a stale entry so the next work run unsyncs it from hardware. + */ +static void dev_addr_test_snapshot_remove_during_sync(struct kunit *test) +{ + struct netdev_hw_addr_list snap, ref, cache; + struct net_device *netdev = test->priv; + struct dev_addr_test_priv *datp; + struct netdev_hw_addr *ha; + u8 addr[ETH_ALEN]; + + datp = netdev_priv(netdev); + + rtnl_lock(); + + memset(addr, ADDR_A, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + + /* Snapshot: ADDR_A is new (sync_cnt=0, refcount=1) */ + netif_addr_lock_bh(netdev); + __hw_addr_init(&snap); + __hw_addr_init(&ref); + __hw_addr_init(&cache); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&snap, &netdev->uc, ETH_ALEN, + &cache)); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&ref, &netdev->uc, ETH_ALEN, + &cache)); + netif_addr_unlock_bh(netdev); + + /* Driver syncs ADDR_A to hardware */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&snap, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + + /* Concurrent removal: user deletes ADDR_A while driver was working */ + memset(addr, ADDR_A, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_del(netdev, addr)); + KUNIT_EXPECT_EQ(test, 0, netdev->uc.count); + + /* Reconcile: ADDR_A gone from real list but driver synced it, + * so it gets re-inserted as stale (sync_cnt=1, refcount=1). 
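All of these tests check the same two-counter encoding; summarized compactly (a reading aid, not text from the patch):

/* Entry state as encoded by the counters after a reconcile:
 *
 *   sync_cnt == 0, refcount >= 1   fresh: next run syncs it to HW
 *   sync_cnt == 1, refcount >= 2   in HW and still requested
 *   sync_cnt == 1, refcount == 1   stale: only the HW reference
 *                                  remains; next run unsyncs and frees
 */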
+ */ + netif_addr_lock_bh(netdev); + __hw_addr_list_reconcile(&netdev->uc, &snap, &ref, ETH_ALEN, + &cache); + netif_addr_unlock_bh(netdev); + + KUNIT_EXPECT_EQ(test, 1, netdev->uc.count); + ha = list_first_entry(&netdev->uc.list, struct netdev_hw_addr, list); + KUNIT_EXPECT_MEMEQ(test, ha->addr, addr, ETH_ALEN); + KUNIT_EXPECT_EQ(test, 1, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 1, ha->refcount); + + /* Second work run: stale entry gets unsynced from HW and removed */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 0, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_unsynced); + KUNIT_EXPECT_EQ(test, 0, netdev->uc.count); + + __hw_addr_flush(&cache); + rtnl_unlock(); +} + +/* Snapshot test: ADDR_A was stale (unsynced from hardware by driver), + * but concurrently re-added by the user. The re-add bumps refcount of + * the existing stale entry. Reconcile applies delta=-1, leaving ADDR_A + * as a fresh entry (sync_cnt=0, refcount=1) for the next work run. + */ +static void dev_addr_test_snapshot_readd_during_unsync(struct kunit *test) +{ + struct netdev_hw_addr_list snap, ref, cache; + struct net_device *netdev = test->priv; + struct dev_addr_test_priv *datp; + struct netdev_hw_addr *ha; + u8 addr[ETH_ALEN]; + + datp = netdev_priv(netdev); + + rtnl_lock(); + + memset(addr, ADDR_A, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + + /* Sync ADDR_A to hardware: sync_cnt=1, refcount=2 */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + + /* User removes ADDR_A: refcount=1, sync_cnt=1 -> stale */ + KUNIT_EXPECT_EQ(test, 0, dev_uc_del(netdev, addr)); + + /* Snapshot: ADDR_A is stale (sync_cnt=1, refcount=1) */ + netif_addr_lock_bh(netdev); + __hw_addr_init(&snap); + __hw_addr_init(&ref); + __hw_addr_init(&cache); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&snap, &netdev->uc, ETH_ALEN, + &cache)); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&ref, &netdev->uc, ETH_ALEN, + &cache)); + netif_addr_unlock_bh(netdev); + + /* Driver unsyncs stale ADDR_A from hardware */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&snap, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 0, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_unsynced); + + /* Concurrent: user re-adds ADDR_A. dev_uc_add finds the existing + * stale entry and bumps refcount from 1 -> 2. sync_cnt stays 1. + */ + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + KUNIT_EXPECT_EQ(test, 1, netdev->uc.count); + + /* Reconcile: ref sync_cnt=1 matches real sync_cnt=1, delta=-1 + * applied. Result: sync_cnt=0, refcount=1 (fresh). 
+ */ + netif_addr_lock_bh(netdev); + __hw_addr_list_reconcile(&netdev->uc, &snap, &ref, ETH_ALEN, + &cache); + netif_addr_unlock_bh(netdev); + + /* Entry survives as fresh: needs re-sync to HW */ + KUNIT_EXPECT_EQ(test, 1, netdev->uc.count); + ha = list_first_entry(&netdev->uc.list, struct netdev_hw_addr, list); + KUNIT_EXPECT_MEMEQ(test, ha->addr, addr, ETH_ALEN); + KUNIT_EXPECT_EQ(test, 0, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 1, ha->refcount); + + /* Second work run: fresh entry gets synced to HW */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 1 << ADDR_A, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + + __hw_addr_flush(&cache); + rtnl_unlock(); +} + +/* Snapshot test: ADDR_A is new (synced by driver), and independent ADDR_B + * is concurrently removed from the real list. A's sync delta propagates + * normally; B's absence doesn't interfere. + */ +static void dev_addr_test_snapshot_add_and_remove(struct kunit *test) +{ + struct netdev_hw_addr_list snap, ref, cache; + struct net_device *netdev = test->priv; + struct dev_addr_test_priv *datp; + struct netdev_hw_addr *ha; + u8 addr[ETH_ALEN]; + + datp = netdev_priv(netdev); + + rtnl_lock(); + + /* Add ADDR_A and ADDR_B (will be synced then removed) */ + memset(addr, ADDR_A, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + memset(addr, ADDR_B, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + + /* Sync both to hardware: sync_cnt=1, refcount=2 */ + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + + /* Add ADDR_C (new, will be synced by snapshot) */ + memset(addr, ADDR_C, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + + /* Snapshot: A,B synced (sync_cnt=1,refcount=2); C new (0,1) */ + netif_addr_lock_bh(netdev); + __hw_addr_init(&snap); + __hw_addr_init(&ref); + __hw_addr_init(&cache); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&snap, &netdev->uc, ETH_ALEN, + &cache)); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&ref, &netdev->uc, ETH_ALEN, + &cache)); + netif_addr_unlock_bh(netdev); + + /* Driver syncs snapshot: ADDR_C is new -> synced; A,B already synced */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&snap, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 1 << ADDR_C, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 0, datp->addr_unsynced); + + /* Concurrent: user removes addr B while driver was working */ + memset(addr, ADDR_B, sizeof(addr)); + KUNIT_EXPECT_EQ(test, 0, dev_uc_del(netdev, addr)); + + /* Reconcile: ADDR_C's delta=+1 applied to real list. + * ADDR_B's delta=0 (unchanged in snapshot), + * so nothing to apply to ADDR_B. 
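One design note on the @cache argument threaded through these tests: snapshots are taken under netif_addr_lock_bh(), where only GFP_ATOMIC allocation is possible, so __hw_addr_list_snapshot() prefers recycling entries, as its kernel-doc above states.

/* In the work path nothing is freed between passes: reconcile splices
 * both snapshot lists onto dev->rx_mode_addr_cache, so a steady-state
 * pass refills entries with memcpy() instead of allocating under the
 * BH-disabled address lock.
 */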
+ */ + netif_addr_lock_bh(netdev); + __hw_addr_list_reconcile(&netdev->uc, &snap, &ref, ETH_ALEN, + &cache); + netif_addr_unlock_bh(netdev); + + /* ADDR_A: unchanged (sync_cnt=1, refcount=2) + * ADDR_B: refcount went from 2->1 via dev_uc_del (still present, stale) + * ADDR_C: sync propagated (sync_cnt=1, refcount=2) + */ + KUNIT_EXPECT_EQ(test, 3, netdev->uc.count); + netdev_hw_addr_list_for_each(ha, &netdev->uc) { + u8 id = ha->addr[0]; + + if (!memchr_inv(ha->addr, id, ETH_ALEN)) { + if (id == ADDR_A) { + KUNIT_EXPECT_EQ(test, 1, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 2, ha->refcount); + } else if (id == ADDR_B) { + /* B: still present but now stale */ + KUNIT_EXPECT_EQ(test, 1, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 1, ha->refcount); + } else if (id == ADDR_C) { + KUNIT_EXPECT_EQ(test, 1, ha->sync_cnt); + KUNIT_EXPECT_EQ(test, 2, ha->refcount); + } + } + } + + /* Second work run: ADDR_B is stale, gets unsynced and removed */ + dev_addr_test_reset(netdev); + __hw_addr_sync_dev(&netdev->uc, netdev, dev_addr_test_sync, + dev_addr_test_unsync); + KUNIT_EXPECT_EQ(test, 0, datp->addr_synced); + KUNIT_EXPECT_EQ(test, 1 << ADDR_B, datp->addr_unsynced); + KUNIT_EXPECT_EQ(test, 2, netdev->uc.count); + + __hw_addr_flush(&cache); + rtnl_unlock(); +} + +static void dev_addr_test_snapshot_benchmark(struct kunit *test) +{ + struct net_device *netdev = test->priv; + struct netdev_hw_addr_list snap, cache; + u8 addr[ETH_ALEN]; + s64 duration = 0; + ktime_t start; + int i, iter; + + rtnl_lock(); + + for (i = 0; i < 1024; i++) { + memset(addr, 0, sizeof(addr)); + addr[0] = (i >> 8) & 0xff; + addr[1] = i & 0xff; + KUNIT_EXPECT_EQ(test, 0, dev_uc_add(netdev, addr)); + } + + __hw_addr_init(&cache); + + for (iter = 0; iter < 1000; iter++) { + netif_addr_lock_bh(netdev); + __hw_addr_init(&snap); + + start = ktime_get(); + KUNIT_EXPECT_EQ(test, 0, + __hw_addr_list_snapshot(&snap, &netdev->uc, + ETH_ALEN, &cache)); + duration += ktime_to_ns(ktime_sub(ktime_get(), start)); + + netif_addr_unlock_bh(netdev); + __hw_addr_flush(&snap); + } + + __hw_addr_flush(&cache); + + kunit_info(test, + "1024 addrs x 1000 snapshots: %lld ns total, %lld ns/iter", + duration, div_s64(duration, 1000)); + + rtnl_unlock(); +} + static struct kunit_case dev_addr_test_cases[] = { KUNIT_CASE(dev_addr_test_basic), KUNIT_CASE(dev_addr_test_sync_one), @@ -232,6 +609,11 @@ static struct kunit_case dev_addr_test_cases[] = { KUNIT_CASE(dev_addr_test_del_main), KUNIT_CASE(dev_addr_test_add_set), KUNIT_CASE(dev_addr_test_add_excl), + KUNIT_CASE(dev_addr_test_snapshot_sync), + KUNIT_CASE(dev_addr_test_snapshot_remove_during_sync), + KUNIT_CASE(dev_addr_test_snapshot_readd_during_unsync), + KUNIT_CASE(dev_addr_test_snapshot_add_and_remove), + KUNIT_CASE_SLOW(dev_addr_test_snapshot_benchmark), {} }; @@ -243,5 +625,6 @@ static struct kunit_suite dev_addr_test_suite = { }; kunit_test_suite(dev_addr_test_suite); +MODULE_IMPORT_NS("EXPORTED_FOR_KUNIT_TESTING"); MODULE_DESCRIPTION("KUnit tests for struct netdev_hw_addr_list"); MODULE_LICENSE("GPL"); diff --git a/net/core/dev_api.c b/net/core/dev_api.c index f28852078aa6..437947dd08ed 100644 --- a/net/core/dev_api.c +++ b/net/core/dev_api.c @@ -66,6 +66,7 @@ int dev_change_flags(struct net_device *dev, unsigned int flags, netdev_lock_ops(dev); ret = netif_change_flags(dev, flags, extack); + netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return ret; @@ -285,6 +286,7 @@ int dev_set_promiscuity(struct net_device *dev, int inc) netdev_lock_ops(dev); ret = netif_set_promiscuity(dev, inc); + 
netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return ret; @@ -311,6 +313,7 @@ int dev_set_allmulti(struct net_device *dev, int inc) netdev_lock_ops(dev); ret = netif_set_allmulti(dev, inc, true); + netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return ret; diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c index 7a8966544c9d..f3979b276090 100644 --- a/net/core/dev_ioctl.c +++ b/net/core/dev_ioctl.c @@ -586,24 +586,26 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, void __user *data, return err; case SIOCADDMULTI: - if (!ops->ndo_set_rx_mode || + if ((!ops->ndo_set_rx_mode && !ops->ndo_set_rx_mode_async) || ifr->ifr_hwaddr.sa_family != AF_UNSPEC) return -EINVAL; if (!netif_device_present(dev)) return -ENODEV; netdev_lock_ops(dev); err = dev_mc_add_global(dev, ifr->ifr_hwaddr.sa_data); + netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return err; case SIOCDELMULTI: - if (!ops->ndo_set_rx_mode || + if ((!ops->ndo_set_rx_mode && !ops->ndo_set_rx_mode_async) || ifr->ifr_hwaddr.sa_family != AF_UNSPEC) return -EINVAL; if (!netif_device_present(dev)) return -ENODEV; netdev_lock_ops(dev); err = dev_mc_del_global(dev, ifr->ifr_hwaddr.sa_data); + netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return err; diff --git a/net/core/filter.c b/net/core/filter.c index 5fa9189eb772..80a3b702a2d4 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5396,7 +5396,7 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk, int optname, if (val <= 0) return -EINVAL; tp->snd_cwnd_clamp = val; - tp->snd_ssthresh = val; + WRITE_ONCE(tp->snd_ssthresh, val); break; case TCP_BPF_DELACK_MAX: timeout = usecs_to_jiffies(val); diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 1b61bb25ba0e..2a98f5fa74eb 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -1374,16 +1374,13 @@ proto_again: break; } - /* least significant bit of the most significant octet - * indicates if protocol field was compressed + /* PFC (compressed 1-byte protocol) frames are not processed. + * A compressed protocol field has the least significant bit of + * the most significant octet set, which will fail the following + * ppp_proto_is_valid(), returning FLOW_DISSECT_RET_OUT_BAD. 
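A worked example of why the PFC case now falls out naturally (values per RFC 1661; the yy byte stands for whatever payload byte follows):

/* Uncompressed PPP/IP: proto reads as 0x0021.
 *   (0x0021 & 0x0101) == 0x0001  ->  ppp_proto_is_valid() passes.
 * With Protocol Field Compression the field is the single byte 0x21,
 * so a two-byte read yields 0x21yy: the most significant octet is odd,
 *   (0x21yy & 0x0101) != 0x0001  ->  the dissector returns
 * FLOW_DISSECT_RET_OUT_BAD instead of mis-parsing the payload.
 */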
*/ ppp_proto = ntohs(hdr->proto); - if (ppp_proto & 0x0100) { - ppp_proto = ppp_proto >> 8; - nhoff += PPPOE_SES_HLEN - 1; - } else { - nhoff += PPPOE_SES_HLEN; - } + nhoff += PPPOE_SES_HLEN; if (ppp_proto == PPP_IP) { proto = htons(ETH_P_IP); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 69daba3ddaf0..b613bb6e07df 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3431,6 +3431,7 @@ errout: dev->name); } + netif_rx_mode_sync(dev); netdev_unlock_ops(dev); return err; diff --git a/net/dsa/conduit.c b/net/dsa/conduit.c index a1b044467bd6..8398d72d7e4d 100644 --- a/net/dsa/conduit.c +++ b/net/dsa/conduit.c @@ -27,9 +27,7 @@ static int dsa_conduit_get_regs_len(struct net_device *dev) int len; if (ops && ops->get_regs_len) { - netdev_lock_ops(dev); len = ops->get_regs_len(dev); - netdev_unlock_ops(dev); if (len < 0) return len; ret += len; @@ -60,15 +58,11 @@ static void dsa_conduit_get_regs(struct net_device *dev, int len; if (ops && ops->get_regs_len && ops->get_regs) { - netdev_lock_ops(dev); len = ops->get_regs_len(dev); - if (len < 0) { - netdev_unlock_ops(dev); + if (len < 0) return; - } regs->len = len; ops->get_regs(dev, regs, data); - netdev_unlock_ops(dev); data += regs->len; } @@ -115,10 +109,8 @@ static void dsa_conduit_get_ethtool_stats(struct net_device *dev, int count, mcount = 0; if (ops && ops->get_sset_count && ops->get_ethtool_stats) { - netdev_lock_ops(dev); mcount = ops->get_sset_count(dev, ETH_SS_STATS); ops->get_ethtool_stats(dev, stats, data); - netdev_unlock_ops(dev); } list_for_each_entry(dp, &dst->ports, list) { @@ -149,10 +141,8 @@ static void dsa_conduit_get_ethtool_phy_stats(struct net_device *dev, if (count >= 0) phy_ethtool_get_stats(dev->phydev, stats, data); } else if (ops && ops->get_sset_count && ops->get_ethtool_phy_stats) { - netdev_lock_ops(dev); count = ops->get_sset_count(dev, ETH_SS_PHY_STATS); ops->get_ethtool_phy_stats(dev, stats, data); - netdev_unlock_ops(dev); } if (count < 0) @@ -176,13 +166,11 @@ static int dsa_conduit_get_sset_count(struct net_device *dev, int sset) struct dsa_switch_tree *dst = cpu_dp->dst; int count = 0; - netdev_lock_ops(dev); if (sset == ETH_SS_PHY_STATS && dev->phydev && (!ops || !ops->get_ethtool_phy_stats)) count = phy_ethtool_get_sset_count(dev->phydev); else if (ops && ops->get_sset_count) count = ops->get_sset_count(dev, sset); - netdev_unlock_ops(dev); if (count < 0) count = 0; @@ -239,7 +227,6 @@ static void dsa_conduit_get_strings(struct net_device *dev, u32 stringset, struct dsa_switch_tree *dst = cpu_dp->dst; int count, mcount = 0; - netdev_lock_ops(dev); if (stringset == ETH_SS_PHY_STATS && dev->phydev && !ops->get_ethtool_phy_stats) { mcount = phy_ethtool_get_sset_count(dev->phydev); @@ -253,7 +240,6 @@ static void dsa_conduit_get_strings(struct net_device *dev, u32 stringset, mcount = 0; ops->get_strings(dev, stringset, data); } - netdev_unlock_ops(dev); list_for_each_entry(dp, &dst->ports, list) { if (!dsa_port_is_dsa(dp) && !dsa_port_is_cpu(dp)) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 2f4fac22d1ab..7eeff658b467 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -64,6 +64,7 @@ #include <linux/jiffies.h> #include <linux/kernel.h> #include <linux/fcntl.h> +#include <linux/nospec.h> #include <linux/socket.h> #include <linux/in.h> #include <linux/inet.h> @@ -371,7 +372,9 @@ static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd, to, len); skb->csum = csum_block_add(skb->csum, csum, odd); - if (icmp_pointers[icmp_param->data.icmph.type].error) + if 
(icmp_param->data.icmph.type <= NR_ICMP_TYPES && + icmp_pointers[array_index_nospec(icmp_param->data.icmph.type, + NR_ICMP_TYPES + 1)].error) nf_ct_attach(skb, icmp_param->skb); return 0; } diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 4ac3ae1bc1af..928654c34156 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk) if (nreq) { refcount_set(&nreq->rsk_refcnt, 1); + rcu_read_lock(); if (inet_csk_reqsk_queue_add(nsk, nreq, child)) { __NET_INC_STATS(sock_net(nsk), LINUX_MIB_TCPMIGRATEREQSUCCESS); reqsk_migrate_reset(req); + READ_ONCE(nsk->sk_data_ready)(nsk); } else { __NET_INC_STATS(sock_net(nsk), LINUX_MIB_TCPMIGRATEREQFAILURE); reqsk_migrate_reset(nreq); __reqsk_free(nreq); } + rcu_read_unlock(); /* inet_csk_reqsk_queue_add() has already * called inet_child_forget() on failure case. diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c index a5db7c67d61b..625a1ca13b1b 100644 --- a/net/ipv4/netfilter/iptable_nat.c +++ b/net/ipv4/netfilter/iptable_nat.c @@ -79,7 +79,7 @@ static int ipt_nat_register_lookups(struct net *net) while (i) nf_nat_ipv4_unregister_fn(net, &ops[--i]); - kfree(ops); + kfree_rcu(ops, rcu); return ret; } } @@ -100,7 +100,7 @@ static void ipt_nat_unregister_lookups(struct net *net) for (i = 0; i < ARRAY_SIZE(nf_nat_ipv4_ops); i++) nf_nat_ipv4_unregister_fn(net, &ops[i]); - kfree(ops); + kfree_rcu(ops, rcu); } static int iptable_nat_table_init(struct net *net) diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c index 904a060a7330..f92fcc39fc4c 100644 --- a/net/ipv4/nexthop.c +++ b/net/ipv4/nexthop.c @@ -2469,10 +2469,10 @@ static int replace_nexthop_single(struct net *net, struct nexthop *old, goto err_notify; } - /* When replacing an IPv4 nexthop with an IPv6 nexthop, potentially + /* When replacing a nexthop with one of a different family, potentially * update IPv4 indication in all the groups using the nexthop. */ - if (oldi->family == AF_INET && newi->family == AF_INET6) { + if (oldi->family != newi->family) { list_for_each_entry(nhge, &old->grp_list, nh_list) { struct nexthop *nhp = nhge->nh_parent; struct nh_group *nhg; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 2014a6408e93..432fa28e47d4 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3424,7 +3424,7 @@ int tcp_disconnect(struct sock *sk, int flags) icsk->icsk_rto = TCP_TIMEOUT_INIT; WRITE_ONCE(icsk->icsk_rto_min, TCP_RTO_MIN); WRITE_ONCE(icsk->icsk_delack_max, TCP_DELACK_MAX); - tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; + WRITE_ONCE(tp->snd_ssthresh, TCP_INFINITE_SSTHRESH); tcp_snd_cwnd_set(tp, TCP_INIT_CWND); tp->snd_cwnd_cnt = 0; tp->is_cwnd_limited = 0; @@ -3622,7 +3622,8 @@ static void tcp_enable_tx_delay(struct sock *sk, int val) if (delta && sk->sk_state == TCP_ESTABLISHED) { s64 srtt = (s64)tp->srtt_us + delta; - tp->srtt_us = clamp_t(s64, srtt, 1, ~0U); + WRITE_ONCE(tp->srtt_us, + clamp_t(s64, srtt, 1, ~0U)); /* Note: does not deal with non zero icsk_backoff */ tcp_set_rto(sk); @@ -4190,12 +4191,18 @@ static void tcp_get_info_chrono_stats(const struct tcp_sock *tp, struct tcp_info *info) { u64 stats[__TCP_CHRONO_MAX], total = 0; - enum tcp_chrono i; + enum tcp_chrono i, cur; + /* Following READ_ONCE()s pair with WRITE_ONCE()s in tcp_chrono_set(). + * This is because socket lock might not be owned by us at this point. + * This is best effort, tcp_get_timestamping_opt_stats() can + * see wrong values. 
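The annotation idiom applied throughout these TCP hunks, shown in isolation (a sketch, not code from the patch):

/* A field written under the socket lock but read locklessly by the
 * stats paths is marked on both sides: this documents the intentional
 * data race for KCSAN and keeps the compiler from tearing the access.
 */
static void chrono_writer(struct tcp_sock *tp, u32 now)
{
	WRITE_ONCE(tp->chrono_start, now);	/* pairs with reader below */
}

static u32 chrono_reader(const struct tcp_sock *tp)
{
	return READ_ONCE(tp->chrono_start);	/* socket lock not required */
}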
A real fix would be too costly for TCP fast path. + */ + cur = READ_ONCE(tp->chrono_type); for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) { - stats[i] = tp->chrono_stat[i - 1]; - if (i == tp->chrono_type) - stats[i] += tcp_jiffies32 - tp->chrono_start; + stats[i] = READ_ONCE(tp->chrono_stat[i - 1]); + if (i == cur) + stats[i] += tcp_jiffies32 - READ_ONCE(tp->chrono_start); stats[i] *= USEC_PER_SEC / HZ; total += stats[i]; } @@ -4427,9 +4434,9 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk, nla_put_u64_64bit(stats, TCP_NLA_SNDBUF_LIMITED, info.tcpi_sndbuf_limited, TCP_NLA_PAD); nla_put_u64_64bit(stats, TCP_NLA_DATA_SEGS_OUT, - tp->data_segs_out, TCP_NLA_PAD); + READ_ONCE(tp->data_segs_out), TCP_NLA_PAD); nla_put_u64_64bit(stats, TCP_NLA_TOTAL_RETRANS, - tp->total_retrans, TCP_NLA_PAD); + READ_ONCE(tp->total_retrans), TCP_NLA_PAD); rate = READ_ONCE(sk->sk_pacing_rate); rate64 = (rate != ~0UL) ? rate : ~0ULL; @@ -4438,37 +4445,42 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk, rate64 = tcp_compute_delivery_rate(tp); nla_put_u64_64bit(stats, TCP_NLA_DELIVERY_RATE, rate64, TCP_NLA_PAD); - nla_put_u32(stats, TCP_NLA_SND_CWND, tcp_snd_cwnd(tp)); - nla_put_u32(stats, TCP_NLA_REORDERING, tp->reordering); - nla_put_u32(stats, TCP_NLA_MIN_RTT, tcp_min_rtt(tp)); + nla_put_u32(stats, TCP_NLA_SND_CWND, READ_ONCE(tp->snd_cwnd)); + nla_put_u32(stats, TCP_NLA_REORDERING, READ_ONCE(tp->reordering)); + nla_put_u32(stats, TCP_NLA_MIN_RTT, data_race(tcp_min_rtt(tp))); nla_put_u8(stats, TCP_NLA_RECUR_RETRANS, READ_ONCE(inet_csk(sk)->icsk_retransmits)); - nla_put_u8(stats, TCP_NLA_DELIVERY_RATE_APP_LMT, !!tp->rate_app_limited); - nla_put_u32(stats, TCP_NLA_SND_SSTHRESH, tp->snd_ssthresh); - nla_put_u32(stats, TCP_NLA_DELIVERED, tp->delivered); - nla_put_u32(stats, TCP_NLA_DELIVERED_CE, tp->delivered_ce); - - nla_put_u32(stats, TCP_NLA_SNDQ_SIZE, tp->write_seq - tp->snd_una); + nla_put_u8(stats, TCP_NLA_DELIVERY_RATE_APP_LMT, data_race(!!tp->rate_app_limited)); + nla_put_u32(stats, TCP_NLA_SND_SSTHRESH, READ_ONCE(tp->snd_ssthresh)); + nla_put_u32(stats, TCP_NLA_DELIVERED, READ_ONCE(tp->delivered)); + nla_put_u32(stats, TCP_NLA_DELIVERED_CE, READ_ONCE(tp->delivered_ce)); + + nla_put_u32(stats, TCP_NLA_SNDQ_SIZE, + max_t(int, 0, + READ_ONCE(tp->write_seq) - READ_ONCE(tp->snd_una))); nla_put_u8(stats, TCP_NLA_CA_STATE, inet_csk(sk)->icsk_ca_state); - nla_put_u64_64bit(stats, TCP_NLA_BYTES_SENT, tp->bytes_sent, - TCP_NLA_PAD); - nla_put_u64_64bit(stats, TCP_NLA_BYTES_RETRANS, tp->bytes_retrans, + nla_put_u64_64bit(stats, TCP_NLA_BYTES_SENT, READ_ONCE(tp->bytes_sent), TCP_NLA_PAD); - nla_put_u32(stats, TCP_NLA_DSACK_DUPS, tp->dsack_dups); - nla_put_u32(stats, TCP_NLA_REORD_SEEN, tp->reord_seen); - nla_put_u32(stats, TCP_NLA_SRTT, tp->srtt_us >> 3); - nla_put_u16(stats, TCP_NLA_TIMEOUT_REHASH, tp->timeout_rehash); + nla_put_u64_64bit(stats, TCP_NLA_BYTES_RETRANS, + READ_ONCE(tp->bytes_retrans), TCP_NLA_PAD); + nla_put_u32(stats, TCP_NLA_DSACK_DUPS, READ_ONCE(tp->dsack_dups)); + nla_put_u32(stats, TCP_NLA_REORD_SEEN, READ_ONCE(tp->reord_seen)); + nla_put_u32(stats, TCP_NLA_SRTT, READ_ONCE(tp->srtt_us) >> 3); + nla_put_u16(stats, TCP_NLA_TIMEOUT_REHASH, + READ_ONCE(tp->timeout_rehash)); nla_put_u32(stats, TCP_NLA_BYTES_NOTSENT, - max_t(int, 0, tp->write_seq - tp->snd_nxt)); + max_t(int, 0, + READ_ONCE(tp->write_seq) - READ_ONCE(tp->snd_nxt))); nla_put_u64_64bit(stats, TCP_NLA_EDT, orig_skb->skb_mstamp_ns, TCP_NLA_PAD); if (ack_skb) nla_put_u8(stats, TCP_NLA_TTL, 
tcp_skb_ttl_or_hop_limit(ack_skb)); - nla_put_u32(stats, TCP_NLA_REHASH, tp->plb_rehash + tp->timeout_rehash); + nla_put_u32(stats, TCP_NLA_REHASH, + READ_ONCE(tp->plb_rehash) + READ_ONCE(tp->timeout_rehash)); return stats; } diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 1ddc20a399b0..aec7805b1d37 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -897,8 +897,8 @@ static void bbr_check_drain(struct sock *sk, const struct rate_sample *rs) if (bbr->mode == BBR_STARTUP && bbr_full_bw_reached(sk)) { bbr->mode = BBR_DRAIN; /* drain queue we created */ - tcp_sk(sk)->snd_ssthresh = - bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT); + WRITE_ONCE(tcp_sk(sk)->snd_ssthresh, + bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT)); } /* fall through to check if in-flight is already small: */ if (bbr->mode == BBR_DRAIN && bbr_packets_in_net_at_edt(sk, tcp_packets_in_flight(tcp_sk(sk))) <= @@ -1043,7 +1043,7 @@ __bpf_kfunc static void bbr_init(struct sock *sk) struct bbr *bbr = inet_csk_ca(sk); bbr->prior_cwnd = 0; - tp->snd_ssthresh = TCP_INFINITE_SSTHRESH; + WRITE_ONCE(tp->snd_ssthresh, TCP_INFINITE_SSTHRESH); bbr->rtt_cnt = 0; bbr->next_rtt_delivered = tp->delivered; bbr->prev_ca_state = TCP_CA_Open; diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c index 58358bf92e1b..65444ff14241 100644 --- a/net/ipv4/tcp_bic.c +++ b/net/ipv4/tcp_bic.c @@ -74,7 +74,7 @@ static void bictcp_init(struct sock *sk) bictcp_reset(ca); if (initial_ssthresh) - tcp_sk(sk)->snd_ssthresh = initial_ssthresh; + WRITE_ONCE(tcp_sk(sk)->snd_ssthresh, initial_ssthresh); } /* diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c index ceabfd690a29..0812c390aee5 100644 --- a/net/ipv4/tcp_cdg.c +++ b/net/ipv4/tcp_cdg.c @@ -162,7 +162,7 @@ static void tcp_cdg_hystart_update(struct sock *sk) NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPHYSTARTTRAINCWND, tcp_snd_cwnd(tp)); - tp->snd_ssthresh = tcp_snd_cwnd(tp); + WRITE_ONCE(tp->snd_ssthresh, tcp_snd_cwnd(tp)); return; } } @@ -181,7 +181,7 @@ static void tcp_cdg_hystart_update(struct sock *sk) NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPHYSTARTDELAYCWND, tcp_snd_cwnd(tp)); - tp->snd_ssthresh = tcp_snd_cwnd(tp); + WRITE_ONCE(tp->snd_ssthresh, tcp_snd_cwnd(tp)); } } } diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index ab78b5ae8d0e..119bf8cbb007 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -136,7 +136,7 @@ __bpf_kfunc static void cubictcp_init(struct sock *sk) bictcp_hystart_reset(sk); if (!hystart && initial_ssthresh) - tcp_sk(sk)->snd_ssthresh = initial_ssthresh; + WRITE_ONCE(tcp_sk(sk)->snd_ssthresh, initial_ssthresh); } __bpf_kfunc static void cubictcp_cwnd_event_tx_start(struct sock *sk) @@ -420,7 +420,7 @@ static void hystart_update(struct sock *sk, u32 delay) NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPHYSTARTTRAINCWND, tcp_snd_cwnd(tp)); - tp->snd_ssthresh = tcp_snd_cwnd(tp); + WRITE_ONCE(tp->snd_ssthresh, tcp_snd_cwnd(tp)); } } } @@ -440,7 +440,7 @@ static void hystart_update(struct sock *sk, u32 delay) NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPHYSTARTDELAYCWND, tcp_snd_cwnd(tp)); - tp->snd_ssthresh = tcp_snd_cwnd(tp); + WRITE_ONCE(tp->snd_ssthresh, tcp_snd_cwnd(tp)); } } } diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 96c99999e09d..274e628e7cf8 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -177,7 +177,7 @@ static void dctcp_react_to_loss(struct sock *sk) struct tcp_sock *tp = tcp_sk(sk); ca->loss_cwnd = tcp_snd_cwnd(tp); - tp->snd_ssthresh = max(tcp_snd_cwnd(tp) >> 1U, 2U); + WRITE_ONCE(tp->snd_ssthresh, 
max(tcp_snd_cwnd(tp) >> 1U, 2U)); } __bpf_kfunc static void dctcp_state(struct sock *sk, u8 new_state) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 021f745747c5..d5c9e65d9760 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -476,14 +476,14 @@ static bool tcp_accecn_process_option(struct tcp_sock *tp, static void tcp_count_delivered_ce(struct tcp_sock *tp, u32 ecn_count) { - tp->delivered_ce += ecn_count; + WRITE_ONCE(tp->delivered_ce, tp->delivered_ce + ecn_count); } /* Updates the delivered and delivered_ce counts */ static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered, bool ece_ack) { - tp->delivered += delivered; + WRITE_ONCE(tp->delivered, tp->delivered + delivered); if (tcp_ecn_mode_rfc3168(tp) && ece_ack) tcp_count_delivered_ce(tp, delivered); } @@ -1132,7 +1132,7 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us) tcp_bpf_rtt(sk, mrtt_us, srtt); } - tp->srtt_us = max(1U, srtt); + WRITE_ONCE(tp->srtt_us, max(1U, srtt)); } void tcp_update_pacing_rate(struct sock *sk) @@ -1246,7 +1246,7 @@ static u32 tcp_dsack_seen(struct tcp_sock *tp, u32 start_seq, else if (tp->tlp_high_seq && tp->tlp_high_seq == end_seq) state->flag |= FLAG_DSACK_TLP; - tp->dsack_dups += dup_segs; + WRITE_ONCE(tp->dsack_dups, tp->dsack_dups + dup_segs); /* Skip the DSACK if dup segs weren't retransmitted by sender */ if (tp->dsack_dups > tp->total_retrans) return 0; @@ -1293,12 +1293,13 @@ static void tcp_check_sack_reordering(struct sock *sk, const u32 low_seq, tp->sacked_out, tp->undo_marker ? tp->undo_retrans : 0); #endif - tp->reordering = min_t(u32, (metric + mss - 1) / mss, - READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering)); + WRITE_ONCE(tp->reordering, + min_t(u32, (metric + mss - 1) / mss, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering))); } /* This exciting event is worth to be remembered. 8) */ - tp->reord_seen++; + WRITE_ONCE(tp->reord_seen, tp->reord_seen + 1); NET_INC_STATS(sock_net(sk), ts ? 
LINUX_MIB_TCPTSREORDER : LINUX_MIB_TCPSACKREORDER); } @@ -2439,9 +2440,10 @@ static void tcp_check_reno_reordering(struct sock *sk, const int addend) if (!tcp_limit_reno_sacked(tp)) return; - tp->reordering = min_t(u32, tp->packets_out + addend, - READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering)); - tp->reord_seen++; + WRITE_ONCE(tp->reordering, + min_t(u32, tp->packets_out + addend, + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_max_reordering))); + WRITE_ONCE(tp->reord_seen, tp->reord_seen + 1); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRENOREORDER); } @@ -2565,7 +2567,7 @@ void tcp_enter_loss(struct sock *sk) (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) { tp->prior_ssthresh = tcp_current_ssthresh(sk); tp->prior_cwnd = tcp_snd_cwnd(tp); - tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, icsk->icsk_ca_ops->ssthresh(sk)); tcp_ca_event(sk, CA_EVENT_LOSS); tcp_init_undo(tp); } @@ -2579,8 +2581,8 @@ void tcp_enter_loss(struct sock *sk) reordering = READ_ONCE(net->ipv4.sysctl_tcp_reordering); if (icsk->icsk_ca_state <= TCP_CA_Disorder && tp->sacked_out >= reordering) - tp->reordering = min_t(unsigned int, tp->reordering, - reordering); + WRITE_ONCE(tp->reordering, + min_t(unsigned int, tp->reordering, reordering)); tcp_set_ca_state(sk, TCP_CA_Loss); tp->high_seq = tp->snd_nxt; @@ -2858,7 +2860,7 @@ static void tcp_undo_cwnd_reduction(struct sock *sk, bool unmark_loss) tcp_snd_cwnd_set(tp, icsk->icsk_ca_ops->undo_cwnd(sk)); if (tp->prior_ssthresh > tp->snd_ssthresh) { - tp->snd_ssthresh = tp->prior_ssthresh; + WRITE_ONCE(tp->snd_ssthresh, tp->prior_ssthresh); tcp_ecn_withdraw_cwr(tp); } } @@ -2976,7 +2978,7 @@ static void tcp_init_cwnd_reduction(struct sock *sk) tp->prior_cwnd = tcp_snd_cwnd(tp); tp->prr_delivered = 0; tp->prr_out = 0; - tp->snd_ssthresh = inet_csk(sk)->icsk_ca_ops->ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, inet_csk(sk)->icsk_ca_ops->ssthresh(sk)); tcp_ecn_queue_cwr(tp); } @@ -3118,7 +3120,7 @@ static void tcp_non_congestion_loss_retransmit(struct sock *sk) if (icsk->icsk_ca_state != TCP_CA_Loss) { tp->high_seq = tp->snd_nxt; - tp->snd_ssthresh = tcp_current_ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_current_ssthresh(sk)); tp->prior_ssthresh = 0; tp->undo_marker = 0; tcp_set_ca_state(sk, TCP_CA_Loss); @@ -3910,7 +3912,7 @@ static void tcp_snd_una_update(struct tcp_sock *tp, u32 ack) sock_owned_by_me((struct sock *)tp); tp->bytes_acked += delta; tcp_snd_sne_update(tp, ack); - tp->snd_una = ack; + WRITE_ONCE(tp->snd_una, ack); } static void tcp_rcv_sne_update(struct tcp_sock *tp, u32 seq) @@ -4284,11 +4286,15 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) goto old_ack; } - /* If the ack includes data we haven't sent yet, discard - * this segment (RFC793 Section 3.9). + /* If the ack includes data we haven't sent yet, drop the + * segment. RFC 793 Section 3.9 and RFC 5961 Section 5.2 + * require us to send an ACK back in that case. 
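A minimal userspace sketch of the RFC 5961 5.2 behaviour this tcp_ack() change implements: an ACK for data that was never sent is dropped, but answered with a challenge ACK rather than silently discarded. seq_after() mirrors the kernel's wrap-safe after() compare; challenge_ack() is a hypothetical stand-in for tcp_send_challenge_ack(), which is rate limited in the real stack.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Wrap-safe "seq1 is after seq2" compare, as in the kernel's after(). */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
        return (int32_t)(seq1 - seq2) > 0;
}

/* Hypothetical stand-in for tcp_send_challenge_ack(). */
static void challenge_ack(void)
{
        puts("challenge ACK");
}

/* Returns true if the ACK may be processed further. */
static bool ack_usable(uint32_t ack, uint32_t snd_nxt)
{
        if (seq_after(ack, snd_nxt)) {
                challenge_ack();        /* acks unsent data: drop + challenge */
                return false;
        }
        return true;
}

int main(void)
{
        printf("%d\n", ack_usable(2000, 1500));  /* 0, challenge sent */
        printf("%d\n", ack_usable(1200, 1500));  /* 1 */
        return 0;
}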
*/ - if (after(ack, tp->snd_nxt)) + if (after(ack, tp->snd_nxt)) { + if (!(flag & FLAG_NO_CHALLENGE_ACK)) + tcp_send_challenge_ack(sk, false); return -SKB_DROP_REASON_TCP_ACK_UNSENT_DATA; + } if (after(ack, prior_snd_una)) { flag |= FLAG_SND_UNA_ADVANCED; @@ -6777,7 +6783,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack, NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPFASTOPENACTIVE); /* SYN-data is counted as two separate packets in tcp_ack() */ if (tp->delivered > 1) - --tp->delivered; + WRITE_ONCE(tp->delivered, tp->delivered - 1); } tcp_fastopen_add_skb(sk, synack); @@ -7210,7 +7216,7 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) SKB_DR_SET(reason, NOT_SPECIFIED); switch (sk->sk_state) { case TCP_SYN_RECV: - tp->delivered++; /* SYN-ACK delivery isn't tracked in tcp_ack */ + WRITE_ONCE(tp->delivered, tp->delivered + 1); /* SYN-ACK delivery isn't tracked in tcp_ack */ if (!tp->srtt_us) tcp_synack_rtt_meas(sk, req); @@ -7238,7 +7244,7 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) if (sk->sk_socket) sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT); - tp->snd_una = TCP_SKB_CB(skb)->ack_seq; + WRITE_ONCE(tp->snd_una, TCP_SKB_CB(skb)->ack_seq); tp->snd_wnd = ntohs(th->window) << tp->rx_opt.snd_wscale; tcp_init_wl(tp, TCP_SKB_CB(skb)->seq); diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 06b1d5d3b6df..dc0c081fc1f3 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -490,13 +490,13 @@ void tcp_init_metrics(struct sock *sk) val = READ_ONCE(net->ipv4.sysctl_tcp_no_ssthresh_metrics_save) ? 0 : tcp_metric_get(tm, TCP_METRIC_SSTHRESH); if (val) { - tp->snd_ssthresh = val; + WRITE_ONCE(tp->snd_ssthresh, val); if (tp->snd_ssthresh > tp->snd_cwnd_clamp) - tp->snd_ssthresh = tp->snd_cwnd_clamp; + WRITE_ONCE(tp->snd_ssthresh, tp->snd_cwnd_clamp); } val = tcp_metric_get(tm, TCP_METRIC_REORDERING); if (val && tp->reordering != val) - tp->reordering = val; + WRITE_ONCE(tp->reordering, val); crtt = tcp_metric_get(tm, TCP_METRIC_RTT); rcu_read_unlock(); diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c index a60662f4bdf9..f345897a68df 100644 --- a/net/ipv4/tcp_nv.c +++ b/net/ipv4/tcp_nv.c @@ -396,8 +396,8 @@ static void tcpnv_acked(struct sock *sk, const struct ack_sample *sample) /* We have enough data to determine we are congested */ ca->nv_allow_cwnd_growth = 0; - tp->snd_ssthresh = - (nv_ssthresh_factor * max_win) >> 3; + WRITE_ONCE(tp->snd_ssthresh, + (nv_ssthresh_factor * max_win) >> 3); if (tcp_snd_cwnd(tp) - max_win > 2) { /* gap > 2, we do exponential cwnd decrease */ int dec; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 8e99687526a6..f9d8755705f7 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -171,7 +171,7 @@ void tcp_cwnd_restart(struct sock *sk, s32 delta) tcp_ca_event(sk, CA_EVENT_CWND_RESTART); - tp->snd_ssthresh = tcp_current_ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_current_ssthresh(sk)); restart_cwnd = min(restart_cwnd, cwnd); while ((delta -= inet_csk(sk)->icsk_rto) > 0 && cwnd > restart_cwnd) @@ -1688,8 +1688,10 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, if (skb->len != tcp_header_size) { tcp_event_data_sent(tp, sk); - tp->data_segs_out += tcp_skb_pcount(skb); - tp->bytes_sent += skb->len - tcp_header_size; + WRITE_ONCE(tp->data_segs_out, + tp->data_segs_out + tcp_skb_pcount(skb)); + WRITE_ONCE(tp->bytes_sent, + tp->bytes_sent + skb->len - tcp_header_size); } if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq) @@ 
-2142,7 +2144,7 @@ static void tcp_cwnd_application_limited(struct sock *sk) u32 init_win = tcp_init_cwnd(tp, __sk_dst_get(sk)); u32 win_used = max(tp->snd_cwnd_used, init_win); if (win_used < tcp_snd_cwnd(tp)) { - tp->snd_ssthresh = tcp_current_ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_current_ssthresh(sk)); tcp_snd_cwnd_set(tp, (tcp_snd_cwnd(tp) + win_used) >> 1); } tp->snd_cwnd_used = 0; @@ -3642,8 +3644,8 @@ start: TCP_ADD_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS, segs); if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN) __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS); - tp->total_retrans += segs; - tp->bytes_retrans += skb->len; + WRITE_ONCE(tp->total_retrans, tp->total_retrans + segs); + WRITE_ONCE(tp->bytes_retrans, tp->bytes_retrans + skb->len); /* make sure skb->data is aligned on arches that require it * and check if ack-trimming & collapsing extended the headroom @@ -4152,7 +4154,7 @@ static void tcp_connect_init(struct sock *sk) tp->snd_wnd = 0; tcp_init_wl(tp, 0); tcp_write_queue_purge(sk); - tp->snd_una = tp->write_seq; + WRITE_ONCE(tp->snd_una, tp->write_seq); tp->snd_sml = tp->write_seq; tp->snd_up = tp->write_seq; WRITE_ONCE(tp->snd_nxt, tp->write_seq); @@ -4646,7 +4648,8 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req) * However in this case, we are dealing with a passive fastopen * socket thus we can change total_retrans value. */ - tcp_sk_rw(sk)->total_retrans++; + WRITE_ONCE(tcp_sk_rw(sk)->total_retrans, + tcp_sk_rw(sk)->total_retrans + 1); } trace_tcp_retransmit_synack(sk, req); WRITE_ONCE(req->num_retrans, req->num_retrans + 1); diff --git a/net/ipv4/tcp_plb.c b/net/ipv4/tcp_plb.c index 68ccdb9a5412..c11a0cd3f8fe 100644 --- a/net/ipv4/tcp_plb.c +++ b/net/ipv4/tcp_plb.c @@ -80,7 +80,7 @@ void tcp_plb_check_rehash(struct sock *sk, struct tcp_plb_state *plb) sk_rethink_txhash(sk); plb->consec_cong_rounds = 0; - tcp_sk(sk)->plb_rehash++; + WRITE_ONCE(tcp_sk(sk)->plb_rehash, tcp_sk(sk)->plb_rehash + 1); NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPLBREHASH); } EXPORT_SYMBOL_GPL(tcp_plb_check_rehash); diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index ea99988795e7..8d791a954cd6 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -297,7 +297,7 @@ static int tcp_write_timeout(struct sock *sk) } if (sk_rethink_txhash(sk)) { - tp->timeout_rehash++; + WRITE_ONCE(tp->timeout_rehash, tp->timeout_rehash + 1); __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPTIMEOUTREHASH); } diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c index 950a66966059..574453af6bc0 100644 --- a/net/ipv4/tcp_vegas.c +++ b/net/ipv4/tcp_vegas.c @@ -245,7 +245,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked) */ tcp_snd_cwnd_set(tp, min(tcp_snd_cwnd(tp), (u32)target_cwnd + 1)); - tp->snd_ssthresh = tcp_vegas_ssthresh(tp); + WRITE_ONCE(tp->snd_ssthresh, + tcp_vegas_ssthresh(tp)); } else if (tcp_in_slow_start(tp)) { /* Slow start. */ @@ -261,8 +262,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked) * we slow down. */ tcp_snd_cwnd_set(tp, tcp_snd_cwnd(tp) - 1); - tp->snd_ssthresh - = tcp_vegas_ssthresh(tp); + WRITE_ONCE(tp->snd_ssthresh, + tcp_vegas_ssthresh(tp)); } else if (diff < alpha) { /* We don't have enough extra packets * in the network, so speed up. 
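The pattern running through these hunks, as a self-contained sketch: the writer side is serialized by the socket lock, but fields like total_retrans are also read locklessly (e.g. by the timestamping stats dump above), so both sides need single-copy-atomic accesses. The macros below are userspace approximations of the kernel's READ_ONCE()/WRITE_ONCE(), for illustration only.

#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile const __typeof__(x) *)&(x))

struct tx_stats {
        unsigned long total_retrans;
        unsigned long bytes_retrans;
};

/* Writer: runs under the socket lock, so the read half of the
 * read-modify-write is safe; only the store must not tear. */
static void note_retransmit(struct tx_stats *s, unsigned int segs,
                            unsigned int bytes)
{
        WRITE_ONCE(s->total_retrans, s->total_retrans + segs);
        WRITE_ONCE(s->bytes_retrans, s->bytes_retrans + bytes);
}

/* Reader: lockless snapshot, as in the stats dump paths. */
static unsigned long snapshot_retrans(const struct tx_stats *s)
{
        return READ_ONCE(s->total_retrans);
}

int main(void)
{
        struct tx_stats s = { 0, 0 };

        note_retransmit(&s, 2, 2800);
        return (int)snapshot_retrans(&s);       /* 2 */
}

Where a slightly stale or torn-looking value is acceptable by design, the series uses data_race() instead, as in the TCP_NLA_MIN_RTT annotation above.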
@@ -280,7 +281,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked) else if (tcp_snd_cwnd(tp) > tp->snd_cwnd_clamp) tcp_snd_cwnd_set(tp, tp->snd_cwnd_clamp); - tp->snd_ssthresh = tcp_current_ssthresh(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_current_ssthresh(sk)); } /* Wipe the slate clean for the next RTT. */ diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c index c6e97141eef2..b5a42adfd6ca 100644 --- a/net/ipv4/tcp_westwood.c +++ b/net/ipv4/tcp_westwood.c @@ -244,11 +244,11 @@ static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event) switch (event) { case CA_EVENT_COMPLETE_CWR: - tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_westwood_bw_rttmin(sk)); tcp_snd_cwnd_set(tp, tp->snd_ssthresh); break; case CA_EVENT_LOSS: - tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk); + WRITE_ONCE(tp->snd_ssthresh, tcp_westwood_bw_rttmin(sk)); /* Update RTT_min when next ack arrives */ w->reset_rtt_min = 1; break; diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c index b22b3dccd05e..9e581154f18f 100644 --- a/net/ipv4/tcp_yeah.c +++ b/net/ipv4/tcp_yeah.c @@ -147,7 +147,8 @@ do_vegas: tcp_snd_cwnd_set(tp, max(tcp_snd_cwnd(tp), yeah->reno_count)); - tp->snd_ssthresh = tcp_snd_cwnd(tp); + WRITE_ONCE(tp->snd_ssthresh, + tcp_snd_cwnd(tp)); } if (yeah->reno_count <= 2) diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c index 799d9e9ac45d..efb23807a026 100644 --- a/net/ipv6/icmp.c +++ b/net/ipv6/icmp.c @@ -1104,7 +1104,6 @@ static int icmpv6_rcv(struct sk_buff *skb) struct net *net = dev_net_rcu(skb->dev); struct net_device *dev = icmp6_dev(skb); struct inet6_dev *idev = __in6_dev_get(dev); - const struct in6_addr *saddr, *daddr; struct icmp6hdr *hdr; u8 type; @@ -1135,12 +1134,10 @@ static int icmpv6_rcv(struct sk_buff *skb) __ICMP6_INC_STATS(dev_net_rcu(dev), idev, ICMP6_MIB_INMSGS); - saddr = &ipv6_hdr(skb)->saddr; - daddr = &ipv6_hdr(skb)->daddr; - if (skb_checksum_validate(skb, IPPROTO_ICMPV6, ip6_compute_pseudo)) { net_dbg_ratelimited("ICMPv6 checksum failed [%pI6c > %pI6c]\n", - saddr, daddr); + &ipv6_hdr(skb)->saddr, + &ipv6_hdr(skb)->daddr); goto csum_error; } @@ -1220,7 +1217,8 @@ static int icmpv6_rcv(struct sk_buff *skb) break; net_dbg_ratelimited("icmpv6: msg of unknown type [%pI6c > %pI6c]\n", - saddr, daddr); + &ipv6_hdr(skb)->saddr, + &ipv6_hdr(skb)->daddr); /* * error of unknown type. 
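The icmpv6 change above is a use-after-free fix of a general shape worth spelling out: a pointer into packet data must not be cached across a call that can reallocate that data (here, checksum validation can pull and thus move skb headers). A plain-C sketch of the hazard, with realloc() standing in for the header pull; all names below are illustrative.

#include <stdlib.h>

struct pkt {
        unsigned char *data;
        size_t len;
};

/* May reallocate the buffer, the way pskb_may_pull() inside checksum
 * validation can move skb header data. */
static int validate(struct pkt *p)
{
        unsigned char *n = realloc(p->data, p->len + 64);

        if (!n)
                return -1;
        p->data = n;
        return 0;
}

static void rx(struct pkt *p)
{
        /* BAD:  unsigned char *saddr = p->data + 8;  taken before validate() */
        if (validate(p) < 0)
                return;
        /* GOOD: derive the pointer only after calls that can move the data. */
        unsigned char *saddr = p->data + 8;
        (void)saddr;
}

int main(void)
{
        struct pkt p = { malloc(64), 64 };

        if (p.data)
                rx(&p);
        free(p.data);
        return 0;
}

The actual fix follows the GOOD form: it drops the cached saddr/daddr locals and re-evaluates ipv6_hdr(skb) at each use site.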
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 46bc06506470..c468c83af0f2 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -62,6 +62,8 @@ MODULE_LICENSE("GPL"); MODULE_ALIAS_RTNL_LINK("ip6tnl"); MODULE_ALIAS_NETDEV("ip6tnl0"); +#define IP6_TUNNEL_MAX_DEST_TLVS 8 + #define IP6_TUNNEL_HASH_SIZE_SHIFT 5 #define IP6_TUNNEL_HASH_SIZE (1 << IP6_TUNNEL_HASH_SIZE_SHIFT) @@ -425,11 +427,15 @@ __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw) break; } if (nexthdr == NEXTHDR_DEST) { + int tlv_cnt = 0; u16 i = 2; while (1) { struct ipv6_tlv_tnl_enc_lim *tel; + if (unlikely(tlv_cnt++ >= IP6_TUNNEL_MAX_DEST_TLVS)) + break; + /* No more room for encapsulation limit */ if (i + sizeof(*tel) > optlen) break; diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c index e119d4f090cc..5be723232df8 100644 --- a/net/ipv6/netfilter/ip6table_nat.c +++ b/net/ipv6/netfilter/ip6table_nat.c @@ -81,7 +81,7 @@ static int ip6t_nat_register_lookups(struct net *net) while (i) nf_nat_ipv6_unregister_fn(net, &ops[--i]); - kfree(ops); + kfree_rcu(ops, rcu); return ret; } } @@ -102,7 +102,7 @@ static void ip6t_nat_unregister_lookups(struct net *net) for (i = 0; i < ARRAY_SIZE(nf_nat_ipv6_ops); i++) nf_nat_ipv6_unregister_fn(net, &ops[i]); - kfree(ops); + kfree_rcu(ops, rcu); } static int ip6table_nat_table_init(struct net *net) diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index 97b50d9b1365..9b64343ebad6 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -746,7 +746,8 @@ static int seg6_build_state(struct net *net, struct nlattr *nla, newts->type = LWTUNNEL_ENCAP_SEG6; newts->flags |= LWTUNNEL_STATE_INPUT_REDIRECT; - if (tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP) + if (tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP && + tuninfo->mode != SEG6_IPTUN_MODE_L2ENCAP_RED) newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT; newts->headroom = seg6_lwt_headroom(tuninfo); diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 59d593bb5d18..1b210db3119e 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -520,8 +520,10 @@ static int llc_ui_connect(struct socket *sock, struct sockaddr_unsized *uaddr, if (sk->sk_state == TCP_SYN_SENT) { const long timeo = sock_sndtimeo(sk, flags & O_NONBLOCK); - if (!timeo || !llc_ui_wait_for_conn(sk, timeo)) + if (!timeo || !llc_ui_wait_for_conn(sk, timeo)) { + rc = -EINPROGRESS; goto out; + } rc = sock_intr_errno(timeo); if (signal_pending(current)) diff --git a/net/mctp/route.c b/net/mctp/route.c index 26fb8c6bbad2..1f3dccbb7aed 100644 --- a/net/mctp/route.c +++ b/net/mctp/route.c @@ -441,6 +441,7 @@ static int mctp_dst_input(struct mctp_dst *dst, struct sk_buff *skb) unsigned long f; u8 tag, flags; int rc; + u8 ver; msk = NULL; rc = -EINVAL; @@ -467,7 +468,8 @@ static int mctp_dst_input(struct mctp_dst *dst, struct sk_buff *skb) netid = mctp_cb(skb)->net; skb_pull(skb, sizeof(struct mctp_hdr)); - if (mh->ver != 1) + ver = mh->ver & MCTP_HDR_VER_MASK; + if (ver < MCTP_VER_MIN || ver > MCTP_VER_MAX) goto out; flags = mh->flags_seq_tag & (MCTP_HDR_FLAG_SOM | MCTP_HDR_FLAG_EOM); @@ -1317,6 +1319,7 @@ static int mctp_pkttype_receive(struct sk_buff *skb, struct net_device *dev, struct mctp_dst dst; struct mctp_hdr *mh; int rc; + u8 ver; rcu_read_lock(); mdev = __mctp_dev_get(dev); @@ -1334,7 +1337,8 @@ static int mctp_pkttype_receive(struct sk_buff *skb, struct net_device *dev, /* We have enough for a header; decode and route */ mh = mctp_hdr(skb); - if (mh->ver < MCTP_VER_MIN || mh->ver > 
MCTP_VER_MAX) + ver = mh->ver & MCTP_HDR_VER_MASK; + if (ver < MCTP_VER_MIN || ver > MCTP_VER_MAX) goto err_drop; /* source must be valid unicast or null; drop reserved ranges and diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index fbffd3a43fe8..718e910ff23f 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3594,7 +3594,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk, * uses the correct data */ mptcp_copy_inaddrs(nsk, ssk); - __mptcp_propagate_sndbuf(nsk, ssk); mptcp_rcv_space_init(msk, ssk); msk->rcvq_space.time = mptcp_stamp(); @@ -4252,6 +4251,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, mptcp_graft_subflows(newsk); mptcp_rps_record_subflows(msk); + __mptcp_propagate_sndbuf(newsk, mptcp_subflow_tcp_sock(subflow)); /* Do late cleanup for the first subflow as necessary. Also * deal with bad peers not doing a complete shutdown. diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c index 0fb5162992e5..ce542ed4b013 100644 --- a/net/netfilter/ipvs/ip_vs_xmit.c +++ b/net/netfilter/ipvs/ip_vs_xmit.c @@ -102,6 +102,18 @@ __ip_vs_dst_check(struct ip_vs_dest *dest) return dest_dst; } +/* Based on ip_exceeds_mtu(). */ +static bool ip_vs_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) +{ + if (skb->len <= mtu) + return false; + + if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu)) + return false; + + return true; +} + static inline bool __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu) { @@ -111,10 +123,9 @@ __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu) */ if (IP6CB(skb)->frag_max_size > mtu) return true; /* largest fragment violate MTU */ - } - else if (skb->len > mtu && !skb_is_gso(skb)) { + } else if (ip_vs_exceeds_mtu(skb, mtu)) return true; /* Packet size violate MTU size */ - } + return false; } @@ -232,7 +243,7 @@ static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af, return true; if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) && - skb->len > mtu && !skb_is_gso(skb) && + ip_vs_exceeds_mtu(skb, mtu) && !ip_vs_iph_icmp(ipvsh))) { icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); diff --git a/net/netfilter/nf_nat_amanda.c b/net/netfilter/nf_nat_amanda.c index 98deef6cde69..8f1054920a85 100644 --- a/net/netfilter/nf_nat_amanda.c +++ b/net/netfilter/nf_nat_amanda.c @@ -50,7 +50,7 @@ static unsigned int help(struct sk_buff *skb, return NF_DROP; } - sprintf(buffer, "%u", port); + snprintf(buffer, sizeof(buffer), "%u", port); if (!nf_nat_mangle_udp_packet(skb, exp->master, ctinfo, protoff, matchoff, matchlen, buffer, strlen(buffer))) { diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 83b2b5e9759a..74ec224ce0d6 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -1222,9 +1222,11 @@ int nf_nat_register_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops, ret = nf_register_net_hooks(net, nat_ops, ops_count); if (ret < 0) { mutex_unlock(&nf_nat_proto_mutex); - for (i = 0; i < ops_count; i++) - kfree(nat_ops[i].priv); - kfree(nat_ops); + for (i = 0; i < ops_count; i++) { + priv = nat_ops[i].priv; + kfree_rcu(priv, rcu_head); + } + kfree_rcu(nat_ops, rcu); return ret; } @@ -1288,7 +1290,7 @@ void nf_nat_unregister_fn(struct net *net, u8 pf, const struct nf_hook_ops *ops, } nat_proto_net->nat_hook_ops = NULL; - kfree(nat_ops); + kfree_rcu(nat_ops, rcu); } unlock: mutex_unlock(&nf_nat_proto_mutex); diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c index 
cf4aeb299bde..c845b6d1a2bd 100644 --- a/net/netfilter/nf_nat_sip.c +++ b/net/netfilter/nf_nat_sip.c @@ -68,25 +68,27 @@ static unsigned int mangle_packet(struct sk_buff *skb, unsigned int protoff, } static int sip_sprintf_addr(const struct nf_conn *ct, char *buffer, + size_t size, const union nf_inet_addr *addr, bool delim) { if (nf_ct_l3num(ct) == NFPROTO_IPV4) - return sprintf(buffer, "%pI4", &addr->ip); + return scnprintf(buffer, size, "%pI4", &addr->ip); else { if (delim) - return sprintf(buffer, "[%pI6c]", &addr->ip6); + return scnprintf(buffer, size, "[%pI6c]", &addr->ip6); else - return sprintf(buffer, "%pI6c", &addr->ip6); + return scnprintf(buffer, size, "%pI6c", &addr->ip6); } } static int sip_sprintf_addr_port(const struct nf_conn *ct, char *buffer, + size_t size, const union nf_inet_addr *addr, u16 port) { if (nf_ct_l3num(ct) == NFPROTO_IPV4) - return sprintf(buffer, "%pI4:%u", &addr->ip, port); + return scnprintf(buffer, size, "%pI4:%u", &addr->ip, port); else - return sprintf(buffer, "[%pI6c]:%u", &addr->ip6, port); + return scnprintf(buffer, size, "[%pI6c]:%u", &addr->ip6, port); } static int map_addr(struct sk_buff *skb, unsigned int protoff, @@ -119,7 +121,7 @@ static int map_addr(struct sk_buff *skb, unsigned int protoff, if (nf_inet_addr_cmp(&newaddr, addr) && newport == port) return 1; - buflen = sip_sprintf_addr_port(ct, buffer, &newaddr, ntohs(newport)); + buflen = sip_sprintf_addr_port(ct, buffer, sizeof(buffer), &newaddr, ntohs(newport)); return mangle_packet(skb, protoff, dataoff, dptr, datalen, matchoff, matchlen, buffer, buflen); } @@ -212,7 +214,7 @@ static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff, &addr, true) > 0 && nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.src.u3) && !nf_inet_addr_cmp(&addr, &ct->tuplehash[!dir].tuple.dst.u3)) { - buflen = sip_sprintf_addr(ct, buffer, + buflen = sip_sprintf_addr(ct, buffer, sizeof(buffer), &ct->tuplehash[!dir].tuple.dst.u3, true); if (!mangle_packet(skb, protoff, dataoff, dptr, datalen, @@ -229,7 +231,7 @@ static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff, &addr, false) > 0 && nf_inet_addr_cmp(&addr, &ct->tuplehash[dir].tuple.dst.u3) && !nf_inet_addr_cmp(&addr, &ct->tuplehash[!dir].tuple.src.u3)) { - buflen = sip_sprintf_addr(ct, buffer, + buflen = sip_sprintf_addr(ct, buffer, sizeof(buffer), &ct->tuplehash[!dir].tuple.src.u3, false); if (!mangle_packet(skb, protoff, dataoff, dptr, datalen, @@ -247,7 +249,7 @@ static unsigned int nf_nat_sip(struct sk_buff *skb, unsigned int protoff, htons(n) == ct->tuplehash[dir].tuple.dst.u.udp.port && htons(n) != ct->tuplehash[!dir].tuple.src.u.udp.port) { __be16 p = ct->tuplehash[!dir].tuple.src.u.udp.port; - buflen = sprintf(buffer, "%u", ntohs(p)); + buflen = scnprintf(buffer, sizeof(buffer), "%u", ntohs(p)); if (!mangle_packet(skb, protoff, dataoff, dptr, datalen, poff, plen, buffer, buflen)) { nf_ct_helper_log(skb, ct, "cannot mangle rport"); @@ -418,7 +420,8 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff, if (!nf_inet_addr_cmp(&exp->tuple.dst.u3, &exp->saved_addr) || exp->tuple.dst.u.udp.port != exp->saved_proto.udp.port) { - buflen = sip_sprintf_addr_port(ct, buffer, &newaddr, port); + buflen = sip_sprintf_addr_port(ct, buffer, sizeof(buffer), + &newaddr, port); if (!mangle_packet(skb, protoff, dataoff, dptr, datalen, matchoff, matchlen, buffer, buflen)) { nf_ct_helper_log(skb, ct, "cannot mangle packet"); @@ -438,8 +441,8 @@ static int mangle_content_len(struct sk_buff *skb, unsigned int 
protoff, { enum ip_conntrack_info ctinfo; struct nf_conn *ct = nf_ct_get(skb, &ctinfo); + char buffer[sizeof("4294967295")]; unsigned int matchoff, matchlen; - char buffer[sizeof("65536")]; int buflen, c_len; /* Get actual SDP length */ @@ -454,7 +457,7 @@ static int mangle_content_len(struct sk_buff *skb, unsigned int protoff, &matchoff, &matchlen) <= 0) return 0; - buflen = sprintf(buffer, "%u", c_len); + buflen = scnprintf(buffer, sizeof(buffer), "%u", c_len); return mangle_packet(skb, protoff, dataoff, dptr, datalen, matchoff, matchlen, buffer, buflen); } @@ -491,7 +494,7 @@ static unsigned int nf_nat_sdp_addr(struct sk_buff *skb, unsigned int protoff, char buffer[INET6_ADDRSTRLEN]; unsigned int buflen; - buflen = sip_sprintf_addr(ct, buffer, addr, false); + buflen = sip_sprintf_addr(ct, buffer, sizeof(buffer), addr, false); if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff, type, term, buffer, buflen)) return 0; @@ -509,7 +512,7 @@ static unsigned int nf_nat_sdp_port(struct sk_buff *skb, unsigned int protoff, char buffer[sizeof("nnnnn")]; unsigned int buflen; - buflen = sprintf(buffer, "%u", port); + buflen = scnprintf(buffer, sizeof(buffer), "%u", port); if (!mangle_packet(skb, protoff, dataoff, dptr, datalen, matchoff, matchlen, buffer, buflen)) return 0; @@ -529,7 +532,7 @@ static unsigned int nf_nat_sdp_session(struct sk_buff *skb, unsigned int protoff unsigned int buflen; /* Mangle session description owner and contact addresses */ - buflen = sip_sprintf_addr(ct, buffer, addr, false); + buflen = sip_sprintf_addr(ct, buffer, sizeof(buffer), addr, false); if (mangle_sdp_packet(skb, protoff, dataoff, dptr, datalen, sdpoff, SDP_HDR_OWNER, SDP_HDR_MEDIA, buffer, buflen)) return 0; diff --git a/net/netfilter/nfnetlink_osf.c b/net/netfilter/nfnetlink_osf.c index d64ce21c7b55..acb753ec5697 100644 --- a/net/netfilter/nfnetlink_osf.c +++ b/net/netfilter/nfnetlink_osf.c @@ -31,26 +31,18 @@ EXPORT_SYMBOL_GPL(nf_osf_fingers); static inline int nf_osf_ttl(const struct sk_buff *skb, int ttl_check, unsigned char f_ttl) { - struct in_device *in_dev = __in_dev_get_rcu(skb->dev); const struct iphdr *ip = ip_hdr(skb); - const struct in_ifaddr *ifa; - int ret = 0; - if (ttl_check == NF_OSF_TTL_TRUE) + switch (ttl_check) { + case NF_OSF_TTL_TRUE: return ip->ttl == f_ttl; - if (ttl_check == NF_OSF_TTL_NOCHECK) - return 1; - else if (ip->ttl <= f_ttl) + break; + case NF_OSF_TTL_NOCHECK: return 1; - - in_dev_for_each_ifa_rcu(ifa, in_dev) { - if (inet_ifa_match(ip->saddr, ifa)) { - ret = (ip->ttl == f_ttl); - break; - } + case NF_OSF_TTL_LESS: + default: + return ip->ttl <= f_ttl; } - - return ret; } struct nf_osf_hdr_ctx { @@ -64,9 +56,9 @@ struct nf_osf_hdr_ctx { static bool nf_osf_match_one(const struct sk_buff *skb, const struct nf_osf_user_finger *f, int ttl_check, - struct nf_osf_hdr_ctx *ctx) + const struct nf_osf_hdr_ctx *ctx) { - const __u8 *optpinit = ctx->optp; + const __u8 *optp = ctx->optp; unsigned int check_WSS = 0; int fmatch = FMATCH_WRONG; int foptsize, optnum; @@ -95,17 +87,17 @@ static bool nf_osf_match_one(const struct sk_buff *skb, check_WSS = f->wss.wc; for (optnum = 0; optnum < f->opt_num; ++optnum) { - if (f->opt[optnum].kind == *ctx->optp) { + if (f->opt[optnum].kind == *optp) { __u32 len = f->opt[optnum].length; - const __u8 *optend = ctx->optp + len; + const __u8 *optend = optp + len; fmatch = FMATCH_OK; - switch (*ctx->optp) { + switch (*optp) { case OSFOPT_MSS: - mss = ctx->optp[3]; + mss = optp[3]; mss <<= 8; - mss |= ctx->optp[2]; + mss |= optp[2]; mss = 
ntohs((__force __be16)mss); break; @@ -113,7 +105,7 @@ static bool nf_osf_match_one(const struct sk_buff *skb, break; } - ctx->optp = optend; + optp = optend; } else fmatch = FMATCH_OPT_WRONG; @@ -156,9 +148,6 @@ static bool nf_osf_match_one(const struct sk_buff *skb, } } - if (fmatch != FMATCH_OK) - ctx->optp = optpinit; - return fmatch == FMATCH_OK; } @@ -320,6 +309,10 @@ static int nfnl_osf_add_callback(struct sk_buff *skb, if (f->opt_num > ARRAY_SIZE(f->opt)) return -EINVAL; + if (f->wss.wc >= OSF_WSS_MAX || + (f->wss.wc == OSF_WSS_MODULO && f->wss.val == 0)) + return -EINVAL; + for (i = 0; i < f->opt_num; i++) { if (!f->opt[i].length || f->opt[i].length > MAX_IPOPTLEN) return -EINVAL; diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c index 18003433476c..c02d5cb52143 100644 --- a/net/netfilter/nft_osf.c +++ b/net/netfilter/nft_osf.c @@ -28,6 +28,11 @@ static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs, struct nf_osf_data data; struct tcphdr _tcph; + if (nft_pf(pkt) != NFPROTO_IPV4) { + regs->verdict.code = NFT_BREAK; + return; + } + if (pkt->tprot != IPPROTO_TCP) { regs->verdict.code = NFT_BREAK; return; @@ -114,7 +119,6 @@ static int nft_osf_validate(const struct nft_ctx *ctx, switch (ctx->family) { case NFPROTO_IPV4: - case NFPROTO_IPV6: case NFPROTO_INET: hooks = (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_PRE_ROUTING) | diff --git a/net/netfilter/xt_mac.c b/net/netfilter/xt_mac.c index 4798cd2ca26e..7fc5156825e4 100644 --- a/net/netfilter/xt_mac.c +++ b/net/netfilter/xt_mac.c @@ -36,25 +36,37 @@ static bool mac_mt(const struct sk_buff *skb, struct xt_action_param *par) return ret; } -static struct xt_match mac_mt_reg __read_mostly = { - .name = "mac", - .revision = 0, - .family = NFPROTO_UNSPEC, - .match = mac_mt, - .matchsize = sizeof(struct xt_mac_info), - .hooks = (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_IN) | - (1 << NF_INET_FORWARD), - .me = THIS_MODULE, +static struct xt_match mac_mt_reg[] __read_mostly = { + { + .name = "mac", + .family = NFPROTO_IPV4, + .match = mac_mt, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD), + .me = THIS_MODULE, + }, + { + .name = "mac", + .family = NFPROTO_IPV6, + .match = mac_mt, + .matchsize = sizeof(struct xt_mac_info), + .hooks = (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_FORWARD), + .me = THIS_MODULE, + }, }; static int __init mac_mt_init(void) { - return xt_register_match(&mac_mt_reg); + return xt_register_matches(mac_mt_reg, ARRAY_SIZE(mac_mt_reg)); } static void __exit mac_mt_exit(void) { - xt_unregister_match(&mac_mt_reg); + xt_unregister_matches(mac_mt_reg, ARRAY_SIZE(mac_mt_reg)); } module_init(mac_mt_init); diff --git a/net/netfilter/xt_owner.c b/net/netfilter/xt_owner.c index 5bfb4843df66..8f2e57b2a586 100644 --- a/net/netfilter/xt_owner.c +++ b/net/netfilter/xt_owner.c @@ -127,26 +127,39 @@ owner_mt(const struct sk_buff *skb, struct xt_action_param *par) return true; } -static struct xt_match owner_mt_reg __read_mostly = { - .name = "owner", - .revision = 1, - .family = NFPROTO_UNSPEC, - .checkentry = owner_check, - .match = owner_mt, - .matchsize = sizeof(struct xt_owner_match_info), - .hooks = (1 << NF_INET_LOCAL_OUT) | - (1 << NF_INET_POST_ROUTING), - .me = THIS_MODULE, +static struct xt_match owner_mt_reg[] __read_mostly = { + { + .name = "owner", + .revision = 1, + .family = NFPROTO_IPV4, + .checkentry = owner_check, + .match = owner_mt, + .matchsize = sizeof(struct 
xt_owner_match_info), + .hooks = (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING), + .me = THIS_MODULE, + }, + { + .name = "owner", + .revision = 1, + .family = NFPROTO_IPV6, + .checkentry = owner_check, + .match = owner_mt, + .matchsize = sizeof(struct xt_owner_match_info), + .hooks = (1 << NF_INET_LOCAL_OUT) | + (1 << NF_INET_POST_ROUTING), + .me = THIS_MODULE, + } }; static int __init owner_mt_init(void) { - return xt_register_match(&owner_mt_reg); + return xt_register_matches(owner_mt_reg, ARRAY_SIZE(owner_mt_reg)); } static void __exit owner_mt_exit(void) { - xt_unregister_match(&owner_mt_reg); + xt_unregister_matches(owner_mt_reg, ARRAY_SIZE(owner_mt_reg)); } module_init(owner_mt_init); diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c index 53997771013f..d2b0b52434fa 100644 --- a/net/netfilter/xt_physdev.c +++ b/net/netfilter/xt_physdev.c @@ -137,24 +137,33 @@ static int physdev_mt_check(const struct xt_mtchk_param *par) return 0; } -static struct xt_match physdev_mt_reg __read_mostly = { - .name = "physdev", - .revision = 0, - .family = NFPROTO_UNSPEC, - .checkentry = physdev_mt_check, - .match = physdev_mt, - .matchsize = sizeof(struct xt_physdev_info), - .me = THIS_MODULE, +static struct xt_match physdev_mt_reg[] __read_mostly = { + { + .name = "physdev", + .family = NFPROTO_IPV4, + .checkentry = physdev_mt_check, + .match = physdev_mt, + .matchsize = sizeof(struct xt_physdev_info), + .me = THIS_MODULE, + }, + { + .name = "physdev", + .family = NFPROTO_IPV6, + .checkentry = physdev_mt_check, + .match = physdev_mt, + .matchsize = sizeof(struct xt_physdev_info), + .me = THIS_MODULE, + }, }; static int __init physdev_mt_init(void) { - return xt_register_match(&physdev_mt_reg); + return xt_register_matches(physdev_mt_reg, ARRAY_SIZE(physdev_mt_reg)); } static void __exit physdev_mt_exit(void) { - xt_unregister_match(&physdev_mt_reg); + xt_unregister_matches(physdev_mt_reg, ARRAY_SIZE(physdev_mt_reg)); } module_init(physdev_mt_init); diff --git a/net/netfilter/xt_realm.c b/net/netfilter/xt_realm.c index 6df485f4403d..61b2f1e58d15 100644 --- a/net/netfilter/xt_realm.c +++ b/net/netfilter/xt_realm.c @@ -33,7 +33,7 @@ static struct xt_match realm_mt_reg __read_mostly = { .matchsize = sizeof(struct xt_realm_info), .hooks = (1 << NF_INET_POST_ROUTING) | (1 << NF_INET_FORWARD) | (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_LOCAL_IN), - .family = NFPROTO_UNSPEC, + .family = NFPROTO_IPV4, .me = THIS_MODULE }; diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index e209099218b4..bbbde50fc649 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -2184,9 +2184,40 @@ error: return err; } +static size_t ovs_vport_cmd_msg_size(void) +{ + size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header)); + + msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_PORT_NO */ + msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_TYPE */ + msgsize += nla_total_size(IFNAMSIZ); /* OVS_VPORT_ATTR_NAME */ + msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_IFINDEX */ + msgsize += nla_total_size(sizeof(s32)); /* OVS_VPORT_ATTR_NETNSID */ + + /* OVS_VPORT_ATTR_STATS */ + msgsize += nla_total_size_64bit(sizeof(struct ovs_vport_stats)); + + /* OVS_VPORT_ATTR_UPCALL_STATS(OVS_VPORT_UPCALL_ATTR_SUCCESS + + * OVS_VPORT_UPCALL_ATTR_FAIL) + */ + msgsize += nla_total_size(nla_total_size_64bit(sizeof(u64)) + + nla_total_size_64bit(sizeof(u64))); + + /* OVS_VPORT_ATTR_UPCALL_PID */ + msgsize += nla_total_size(nr_cpu_ids * sizeof(u32)); + + /* 
OVS_VPORT_ATTR_OPTIONS(OVS_TUNNEL_ATTR_DST_PORT + + * OVS_TUNNEL_ATTR_EXTENSION(OVS_VXLAN_EXT_GBP)) + */ + msgsize += nla_total_size(nla_total_size(sizeof(u16)) + + nla_total_size(nla_total_size(0))); + + return msgsize; +} + static struct sk_buff *ovs_vport_cmd_alloc_info(void) { - return nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + return genlmsg_new(ovs_vport_cmd_msg_size(), GFP_KERNEL); } /* Called with ovs_mutex, only via ovs_dp_notify_wq(). */ @@ -2196,7 +2227,7 @@ struct sk_buff *ovs_vport_cmd_build_info(struct vport *vport, struct net *net, struct sk_buff *skb; int retval; - skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); + skb = ovs_vport_cmd_alloc_info(); if (!skb) return ERR_PTR(-ENOMEM); diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c index 23f629e94a36..56b2e2d1a749 100644 --- a/net/openvswitch/vport.c +++ b/net/openvswitch/vport.c @@ -406,6 +406,9 @@ int ovs_vport_set_upcall_portids(struct vport *vport, const struct nlattr *ids) if (!nla_len(ids) || nla_len(ids) % sizeof(u32)) return -EINVAL; + if (nla_len(ids) / sizeof(u32) > nr_cpu_ids) + return -EINVAL; + old = ovsl_dereference(vport->upcall_portids); vport_portids = kmalloc(sizeof(*vport_portids) + nla_len(ids), diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 4b043241fd56..8e6f3a734ba0 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2718,7 +2718,8 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) { struct sk_buff *skb = NULL; struct net_device *dev; - struct virtio_net_hdr *vnet_hdr = NULL; + struct virtio_net_hdr vnet_hdr; + bool has_vnet_hdr = false; struct sockcm_cookie sockc; __be16 proto; int err, reserve = 0; @@ -2819,16 +2820,20 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg) hlen = LL_RESERVED_SPACE(dev); tlen = dev->needed_tailroom; if (vnet_hdr_sz) { - vnet_hdr = data; data += vnet_hdr_sz; tp_len -= vnet_hdr_sz; - if (tp_len < 0 || - __packet_snd_vnet_parse(vnet_hdr, tp_len)) { + if (tp_len < 0) { + tp_len = -EINVAL; + goto tpacket_error; + } + memcpy(&vnet_hdr, data - vnet_hdr_sz, sizeof(vnet_hdr)); + if (__packet_snd_vnet_parse(&vnet_hdr, tp_len)) { tp_len = -EINVAL; goto tpacket_error; } copylen = __virtio16_to_cpu(vio_le(), - vnet_hdr->hdr_len); + vnet_hdr.hdr_len); + has_vnet_hdr = true; } copylen = max_t(int, copylen, dev->hard_header_len); skb = sock_alloc_send_skb(&po->sk, @@ -2865,12 +2870,12 @@ tpacket_error: } } - if (vnet_hdr_sz) { - if (virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) { + if (has_vnet_hdr) { + if (virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le())) { tp_len = -EINVAL; goto tpacket_error; } - virtio_net_hdr_set_proto(skb, vnet_hdr); + virtio_net_hdr_set_proto(skb, &vnet_hdr); } skb->destructor = tpacket_destruct_skb; diff --git a/net/rds/connection.c b/net/rds/connection.c index 412441aaa298..c10b7ed06c49 100644 --- a/net/rds/connection.c +++ b/net/rds/connection.c @@ -701,6 +701,13 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len, i++, head++) { hlist_for_each_entry_rcu(conn, head, c_hash_node) { + /* Zero the per-item buffer before handing it to the + * visitor so any field the visitor does not write - + * including implicit alignment padding - cannot leak + * stack contents to user space via rds_info_copy(). + */ + memset(buffer, 0, item_len); + /* XXX no c_lock usage.. 
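The rds memset() comments above describe a classic infoleak: a visitor callback fills only the fields it knows about, and compiler-inserted padding in the stack buffer would otherwise travel to user space byte-for-byte. A small sketch of the pattern; struct item and the helpers are illustrative, not the rds types.

#include <string.h>

struct item {
        unsigned char flag;     /* typically followed by 3 padding bytes */
        unsigned int val;
};

/* Visitor writes only the named fields, never the padding. */
static void fill_item(void *buf)
{
        struct item *it = buf;

        it->flag = 1;
        it->val = 42;
}

static void emit_one(char *buffer, size_t item_len)
{
        /* Without this, padding keeps stale stack bytes that the verbatim
         * copy to user space (rds_info_copy() in the real code) would leak. */
        memset(buffer, 0, item_len);
        fill_item(buffer);
}

int main(void)
{
        char buf[sizeof(struct item)];

        emit_one(buf, sizeof(buf));
        return 0;
}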
*/ if (!visitor(conn, buffer)) continue; @@ -750,6 +757,13 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len, */ cp = conn->c_path; + /* Zero the per-item buffer for the same reason as + * rds_for_each_conn_info(): any byte the visitor + * does not write (including alignment padding) must + * not leak stack contents via rds_info_copy(). + */ + memset(buffer, 0, item_len); + /* XXX no cp_lock usage.. */ if (!visitor(cp, buffer)) continue; diff --git a/net/rds/rdma.c b/net/rds/rdma.c index aa6465dc742c..61fb6e45281b 100644 --- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -326,10 +326,6 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, if (args->cookie_addr && put_user(cookie, (u64 __user *)(unsigned long)args->cookie_addr)) { - if (!need_odp) { - unpin_user_pages(pages, nr_pages); - kfree(sg); - } ret = -EFAULT; goto out; } diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h index 96ecb83c9071..27c2aa2dd023 100644 --- a/net/rxrpc/ar-internal.h +++ b/net/rxrpc/ar-internal.h @@ -1486,7 +1486,6 @@ int rxrpc_server_keyring(struct rxrpc_sock *, sockptr_t, int); void rxrpc_kernel_data_consumed(struct rxrpc_call *, struct sk_buff *); void rxrpc_new_skb(struct sk_buff *, enum rxrpc_skb_trace); void rxrpc_see_skb(struct sk_buff *, enum rxrpc_skb_trace); -void rxrpc_eaten_skb(struct sk_buff *, enum rxrpc_skb_trace); void rxrpc_get_skb(struct sk_buff *, enum rxrpc_skb_trace); void rxrpc_free_skb(struct sk_buff *, enum rxrpc_skb_trace); void rxrpc_purge_queue(struct sk_buff_head *); diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index fec59d9338b9..fdd683261226 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -332,7 +332,25 @@ bool rxrpc_input_call_event(struct rxrpc_call *call) saw_ack |= sp->hdr.type == RXRPC_PACKET_TYPE_ACK; - rxrpc_input_call_packet(call, skb); + if (sp->hdr.type == RXRPC_PACKET_TYPE_DATA && + sp->hdr.securityIndex != 0 && + skb_cloned(skb)) { + /* Unshare the packet so that it can be + * modified by in-place decryption. + */ + struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC); + + if (nskb) { + rxrpc_new_skb(nskb, rxrpc_skb_new_unshared); + rxrpc_input_call_packet(call, nskb); + rxrpc_free_skb(nskb, rxrpc_skb_put_call_rx); + } else { + /* OOM - Drop the packet. */ + rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem); + } + } else { + rxrpc_input_call_packet(call, skb); + } rxrpc_free_skb(skb, rxrpc_skb_put_call_rx); did_receive = true; } diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c index 9a41ec708aeb..a2130d25aaa9 100644 --- a/net/rxrpc/conn_event.c +++ b/net/rxrpc/conn_event.c @@ -240,6 +240,33 @@ static void rxrpc_call_is_secure(struct rxrpc_call *call) rxrpc_notify_socket(call); } +static int rxrpc_verify_response(struct rxrpc_connection *conn, + struct sk_buff *skb) +{ + int ret; + + if (skb_cloned(skb)) { + /* Copy the packet if shared so that we can do in-place + * decryption. + */ + struct sk_buff *nskb = skb_copy(skb, GFP_NOFS); + + if (nskb) { + rxrpc_new_skb(nskb, rxrpc_skb_new_unshared); + ret = conn->security->verify_response(conn, nskb); + rxrpc_free_skb(nskb, rxrpc_skb_put_response_copy); + } else { + /* OOM - Drop the packet. 
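The rxrpc hunks here all follow one rule: a buffer that may be shared (skb_cloned()) must be copied before any in-place transform such as decryption, and allocation failure means dropping the packet rather than mutating a copy someone else holds. A plain-C model of that rule; is_shared() and decrypt_in_place() are stand-ins, not kernel APIs.

#include <stdlib.h>
#include <string.h>

struct buf {
        unsigned char *data;
        size_t len;
        int refs;
};

static int is_shared(const struct buf *b)       /* like skb_cloned() */
{
        return b->refs > 1;
}

static void decrypt_in_place(unsigned char *p, size_t n)
{
        while (n--)
                p[n] ^= 0x5a;   /* toy cipher, mutates in place */
}

static int verify(struct buf *b)
{
        unsigned char *p = b->data;
        unsigned char *copy = NULL;

        if (is_shared(b)) {
                copy = malloc(b->len);
                if (!copy)
                        return -1;      /* OOM: drop, never touch shared data */
                memcpy(copy, b->data, b->len);
                p = copy;               /* operate on the private copy only */
        }
        decrypt_in_place(p, b->len);
        /* ...verification would examine p here... */
        free(copy);
        return 0;
}

int main(void)
{
        unsigned char d[4] = { 1, 2, 3, 4 };
        struct buf b = { d, sizeof(d), 2 };

        return verify(&b);
}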
*/ + rxrpc_see_skb(skb, rxrpc_skb_see_unshare_nomem); + ret = -ENOMEM; + } + } else { + ret = conn->security->verify_response(conn, skb); + } + + return ret; +} + /* * connection-level Rx packet processor */ @@ -270,7 +297,7 @@ static int rxrpc_process_event(struct rxrpc_connection *conn, } spin_unlock_irq(&conn->state_lock); - ret = conn->security->verify_response(conn, skb); + ret = rxrpc_verify_response(conn, skb); if (ret < 0) return ret; @@ -362,7 +389,6 @@ again: static void rxrpc_do_process_connection(struct rxrpc_connection *conn) { struct sk_buff *skb; - int ret; if (test_and_clear_bit(RXRPC_CONN_EV_CHALLENGE, &conn->events)) rxrpc_secure_connection(conn); @@ -371,17 +397,8 @@ static void rxrpc_do_process_connection(struct rxrpc_connection *conn) * connection that each one has when we've finished with it */ while ((skb = skb_dequeue(&conn->rx_queue))) { rxrpc_see_skb(skb, rxrpc_skb_see_conn_work); - ret = rxrpc_process_event(conn, skb); - switch (ret) { - case -ENOMEM: - case -EAGAIN: - skb_queue_head(&conn->rx_queue, skb); - rxrpc_queue_conn(conn, rxrpc_conn_queue_retry_work); - break; - default: - rxrpc_free_skb(skb, rxrpc_skb_put_conn_work); - break; - } + rxrpc_process_event(conn, skb); + rxrpc_free_skb(skb, rxrpc_skb_put_conn_work); } } diff --git a/net/rxrpc/io_thread.c b/net/rxrpc/io_thread.c index 697956931925..dc5184a2fa9d 100644 --- a/net/rxrpc/io_thread.c +++ b/net/rxrpc/io_thread.c @@ -192,13 +192,12 @@ static bool rxrpc_extract_abort(struct sk_buff *skb) /* * Process packets received on the local endpoint */ -static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff **_skb) +static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff *skb) { struct rxrpc_connection *conn; struct sockaddr_rxrpc peer_srx; struct rxrpc_skb_priv *sp; struct rxrpc_peer *peer = NULL; - struct sk_buff *skb = *_skb; bool ret = false; skb_pull(skb, sizeof(struct udphdr)); @@ -244,25 +243,6 @@ static bool rxrpc_input_packet(struct rxrpc_local *local, struct sk_buff **_skb) return rxrpc_bad_message(skb, rxrpc_badmsg_zero_call); if (sp->hdr.seq == 0) return rxrpc_bad_message(skb, rxrpc_badmsg_zero_seq); - - /* Unshare the packet so that it can be modified for in-place - * decryption. 
- */ - if (sp->hdr.securityIndex != 0) { - skb = skb_unshare(skb, GFP_ATOMIC); - if (!skb) { - rxrpc_eaten_skb(*_skb, rxrpc_skb_eaten_by_unshare_nomem); - *_skb = NULL; - return just_discard; - } - - if (skb != *_skb) { - rxrpc_eaten_skb(*_skb, rxrpc_skb_eaten_by_unshare); - *_skb = skb; - rxrpc_new_skb(skb, rxrpc_skb_new_unshared); - sp = rxrpc_skb(skb); - } - } break; case RXRPC_PACKET_TYPE_CHALLENGE: @@ -494,7 +474,7 @@ int rxrpc_io_thread(void *data) switch (skb->mark) { case RXRPC_SKB_MARK_PACKET: skb->priority = 0; - if (!rxrpc_input_packet(local, &skb)) + if (!rxrpc_input_packet(local, skb)) rxrpc_reject_packet(local, skb); trace_rxrpc_rx_done(skb->mark, skb->priority); rxrpc_free_skb(skb, rxrpc_skb_put_input); diff --git a/net/rxrpc/key.c b/net/rxrpc/key.c index 6301d79ee35a..3ec3d89fdf14 100644 --- a/net/rxrpc/key.c +++ b/net/rxrpc/key.c @@ -502,6 +502,10 @@ static int rxrpc_preparse(struct key_preparsed_payload *prep) if (v1->security_index != RXRPC_SECURITY_RXKAD) goto error; + ret = -EKEYREJECTED; + if (v1->ticket_length > AFSTOKEN_RK_TIX_MAX) + goto error; + plen = sizeof(*token->kad) + v1->ticket_length; prep->quotalen += plen + sizeof(*token); diff --git a/net/rxrpc/rxgk_app.c b/net/rxrpc/rxgk_app.c index 30275cb5ba3e..0ef2a29eb695 100644 --- a/net/rxrpc/rxgk_app.c +++ b/net/rxrpc/rxgk_app.c @@ -214,7 +214,7 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, ticket_len = ntohl(container.token_len); ticket_offset = token_offset + sizeof(container); - if (xdr_round_up(ticket_len) > token_len - sizeof(container)) + if (ticket_len > xdr_round_down(token_len - sizeof(container))) goto short_packet; _debug("KVNO %u", kvno); @@ -245,6 +245,7 @@ int rxgk_extract_token(struct rxrpc_connection *conn, struct sk_buff *skb, if (ret != -ENOMEM) return rxrpc_abort_conn(conn, skb, ec, ret, rxgk_abort_resp_tok_dec); + return ret; } ret = conn->security->default_decode_ticket(conn, skb, ticket_offset, diff --git a/net/rxrpc/rxgk_common.h b/net/rxrpc/rxgk_common.h index 80164d89e19c..1e257d7ab8ec 100644 --- a/net/rxrpc/rxgk_common.h +++ b/net/rxrpc/rxgk_common.h @@ -34,6 +34,7 @@ struct rxgk_context { }; #define xdr_round_up(x) (round_up((x), sizeof(__be32))) +#define xdr_round_down(x) (round_down((x), sizeof(__be32))) #define xdr_object_len(x) (4 + xdr_round_up(x)) /* diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c index eb7f2769d2b1..cba7935977f0 100644 --- a/net/rxrpc/rxkad.c +++ b/net/rxrpc/rxkad.c @@ -510,6 +510,9 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb, return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON, rxkad_abort_2_short_header); + /* Don't let the crypto algo see a misaligned length. */ + sp->len = round_down(sp->len, 8); + /* Decrypt the skbuff in-place. TODO: We really want to decrypt * directly into the target buffer. 
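The rxgk length-check fix above swaps which side of the comparison gets rounded, because rounding the attacker-controlled length up can wrap to a small value. A compilable sketch of both forms; the 4-byte alignment matches xdr_round_up()/xdr_round_down() in rxgk_common.h, while the concrete numbers are only for demonstration.

#include <stdint.h>
#include <stdio.h>

#define XDR_ALIGN 4u

static uint32_t xdr_round_up(uint32_t x)
{
        return (x + XDR_ALIGN - 1) & ~(XDR_ALIGN - 1);  /* can wrap to 0 */
}

static uint32_t xdr_round_down(uint32_t x)
{
        return x & ~(XDR_ALIGN - 1);                    /* never wraps */
}

int main(void)
{
        uint32_t ticket_len = UINT32_MAX;       /* attacker controlled */
        uint32_t remaining = 64;                /* trusted local value */

        /* Old check: UINT32_MAX rounds up to 0, so it wrongly passes. */
        printf("old rejects: %d\n", xdr_round_up(ticket_len) > remaining);
        /* New check: round the trusted side down instead; no overflow. */
        printf("new rejects: %d\n", ticket_len > xdr_round_down(remaining));
        return 0;
}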
*/ @@ -543,8 +546,10 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb, if (sg != _sg) kfree(sg); if (ret < 0) { - WARN_ON_ONCE(ret != -ENOMEM); - return ret; + if (ret == -ENOMEM) + return ret; + return rxrpc_abort_eproto(call, skb, RXKADSEALEDINCON, + rxkad_abort_2_crypto_unaligned); } /* Extract the decrypted packet length */ @@ -1136,7 +1141,7 @@ static int rxkad_verify_response(struct rxrpc_connection *conn, struct rxrpc_crypt session_key; struct key *server_key; time64_t expiry; - void *ticket; + void *ticket = NULL; u32 version, kvno, ticket_len, level; __be32 csum; int ret, i; @@ -1162,13 +1167,13 @@ static int rxkad_verify_response(struct rxrpc_connection *conn, ret = -ENOMEM; response = kzalloc_obj(struct rxkad_response, GFP_NOFS); if (!response) - goto temporary_error; + goto error; if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header), response, sizeof(*response)) < 0) { - rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO, - rxkad_abort_resp_short); - goto protocol_error; + ret = rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO, + rxkad_abort_resp_short); + goto error; } version = ntohl(response->version); @@ -1178,62 +1183,62 @@ static int rxkad_verify_response(struct rxrpc_connection *conn, trace_rxrpc_rx_response(conn, sp->hdr.serial, version, kvno, ticket_len); if (version != RXKAD_VERSION) { - rxrpc_abort_conn(conn, skb, RXKADINCONSISTENCY, -EPROTO, - rxkad_abort_resp_version); - goto protocol_error; + ret = rxrpc_abort_conn(conn, skb, RXKADINCONSISTENCY, -EPROTO, + rxkad_abort_resp_version); + goto error; } if (ticket_len < 4 || ticket_len > MAXKRB5TICKETLEN) { - rxrpc_abort_conn(conn, skb, RXKADTICKETLEN, -EPROTO, - rxkad_abort_resp_tkt_len); - goto protocol_error; + ret = rxrpc_abort_conn(conn, skb, RXKADTICKETLEN, -EPROTO, + rxkad_abort_resp_tkt_len); + goto error; } if (kvno >= RXKAD_TKT_TYPE_KERBEROS_V5) { - rxrpc_abort_conn(conn, skb, RXKADUNKNOWNKEY, -EPROTO, - rxkad_abort_resp_unknown_tkt); - goto protocol_error; + ret = rxrpc_abort_conn(conn, skb, RXKADUNKNOWNKEY, -EPROTO, + rxkad_abort_resp_unknown_tkt); + goto error; } /* extract the kerberos ticket and decrypt and decode it */ ret = -ENOMEM; ticket = kmalloc(ticket_len, GFP_NOFS); if (!ticket) - goto temporary_error_free_resp; + goto error; if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header) + sizeof(*response), ticket, ticket_len) < 0) { - rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO, - rxkad_abort_resp_short_tkt); - goto protocol_error; + ret = rxrpc_abort_conn(conn, skb, RXKADPACKETSHORT, -EPROTO, + rxkad_abort_resp_short_tkt); + goto error; } ret = rxkad_decrypt_ticket(conn, server_key, skb, ticket, ticket_len, &session_key, &expiry); if (ret < 0) - goto temporary_error_free_ticket; + goto error; /* use the session key from inside the ticket to decrypt the * response */ ret = rxkad_decrypt_response(conn, response, &session_key); if (ret < 0) - goto temporary_error_free_ticket; + goto error; if (ntohl(response->encrypted.epoch) != conn->proto.epoch || ntohl(response->encrypted.cid) != conn->proto.cid || ntohl(response->encrypted.securityIndex) != conn->security_ix) { - rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, - rxkad_abort_resp_bad_param); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, + rxkad_abort_resp_bad_param); + goto error; } csum = response->encrypted.checksum; response->encrypted.checksum = 0; rxkad_calc_response_checksum(response); if (response->encrypted.checksum != csum) { - 
rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, - rxkad_abort_resp_bad_checksum); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, + rxkad_abort_resp_bad_checksum); + goto error; } for (i = 0; i < RXRPC_MAXCALLS; i++) { @@ -1241,38 +1246,38 @@ static int rxkad_verify_response(struct rxrpc_connection *conn, u32 counter = READ_ONCE(conn->channels[i].call_counter); if (call_id > INT_MAX) { - rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, - rxkad_abort_resp_bad_callid); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, + rxkad_abort_resp_bad_callid); + goto error; } if (call_id < counter) { - rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, - rxkad_abort_resp_call_ctr); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, + rxkad_abort_resp_call_ctr); + goto error; } if (call_id > counter) { if (conn->channels[i].call) { - rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, + ret = rxrpc_abort_conn(conn, skb, RXKADSEALEDINCON, -EPROTO, rxkad_abort_resp_call_state); - goto protocol_error_free; + goto error; } conn->channels[i].call_counter = call_id; } } if (ntohl(response->encrypted.inc_nonce) != conn->rxkad.nonce + 1) { - rxrpc_abort_conn(conn, skb, RXKADOUTOFSEQUENCE, -EPROTO, - rxkad_abort_resp_ooseq); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADOUTOFSEQUENCE, -EPROTO, + rxkad_abort_resp_ooseq); + goto error; } level = ntohl(response->encrypted.level); if (level > RXRPC_SECURITY_ENCRYPT) { - rxrpc_abort_conn(conn, skb, RXKADLEVELFAIL, -EPROTO, - rxkad_abort_resp_level); - goto protocol_error_free; + ret = rxrpc_abort_conn(conn, skb, RXKADLEVELFAIL, -EPROTO, + rxkad_abort_resp_level); + goto error; } conn->security_level = level; @@ -1280,31 +1285,12 @@ static int rxkad_verify_response(struct rxrpc_connection *conn, * this the connection security can be handled in exactly the same way * as for a client connection */ ret = rxrpc_get_server_data_key(conn, &session_key, expiry, kvno); - if (ret < 0) - goto temporary_error_free_ticket; - - kfree(ticket); - kfree(response); - _leave(" = 0"); - return 0; - -protocol_error_free: - kfree(ticket); -protocol_error: - kfree(response); - key_put(server_key); - return -EPROTO; -temporary_error_free_ticket: +error: kfree(ticket); -temporary_error_free_resp: kfree(response); -temporary_error: - /* Ignore the response packet if we got a temporary error such as - * ENOMEM. We just want to send the challenge again. Note that we - * also come out this way if the ticket decryption fails. - */ key_put(server_key); + _leave(" = %d", ret); return ret; } diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c index 3bcd6ee80396..e2169d1a14b5 100644 --- a/net/rxrpc/skbuff.c +++ b/net/rxrpc/skbuff.c @@ -47,15 +47,6 @@ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) } /* - * Note the dropping of a ref on a socket buffer by the core. - */ -void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) -{ - int n = atomic_inc_return(&rxrpc_n_rx_skbs); - trace_rxrpc_skb(skb, 0, n, why); -} - -/* * Note the destruction of a socket buffer. 
*/ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace why) diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 05e0b14b5773..2c5a7a321a94 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -354,7 +354,7 @@ static int tcf_blockcast_redir(struct sk_buff *skb, struct tcf_mirred *m, goto assign_prev; tcf_mirred_to_dev(skb, m, dev_prev, - dev_is_mac_header_xmit(dev), + dev_is_mac_header_xmit(dev_prev), mirred_eaction, retval); assign_prev: dev_prev = dev; diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c index ffea9fbd522d..02e1fa4577ae 100644 --- a/net/sched/sch_cake.c +++ b/net/sched/sch_cake.c @@ -619,7 +619,7 @@ static bool cake_update_flowkeys(struct flow_keys *keys, } port = rev ? tuple.src.u.all : tuple.dst.u.all; if (port != keys->ports.dst) { - port = keys->ports.dst; + keys->ports.dst = port; upd = true; } } diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c index fe6f5e889625..241e6a46bd00 100644 --- a/net/sched/sch_dualpi2.c +++ b/net/sched/sch_dualpi2.c @@ -868,11 +868,35 @@ static int dualpi2_change(struct Qdisc *sch, struct nlattr *opt, old_backlog = sch->qstats.backlog; while (qdisc_qlen(sch) > sch->limit || q->memory_used > q->memory_limit) { - struct sk_buff *skb = qdisc_dequeue_internal(sch, true); + struct sk_buff *skb = NULL; - q->memory_used -= skb->truesize; - qdisc_qstats_backlog_dec(sch, skb); - rtnl_qdisc_drop(skb, sch); + if (qdisc_qlen(sch) > qdisc_qlen(q->l_queue)) { + skb = qdisc_dequeue_internal(sch, true); + if (unlikely(!skb)) { + WARN_ON_ONCE(1); + break; + } + q->memory_used -= skb->truesize; + rtnl_qdisc_drop(skb, sch); + } else if (qdisc_qlen(q->l_queue)) { + skb = qdisc_dequeue_internal(q->l_queue, true); + if (unlikely(!skb)) { + WARN_ON_ONCE(1); + break; + } + /* L-queue packets are counted in both sch and + * l_queue on enqueue; qdisc_dequeue_internal() + * handled l_queue, so we further account for sch. 
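That comment is the crux of the dualpi2 fix: an L-queue packet is accounted in both the child qdisc and the parent's totals at enqueue time, so dropping straight from the child must decrement the parent by hand as well. A toy model of the bookkeeping; the struct and helpers are illustrative.

struct dualpi2 {
        int sch_qlen;   /* parent qdisc length, like sch->q.qlen */
        int l_qlen;     /* L-queue length, like q->l_queue->q.qlen */
};

static void enqueue_l(struct dualpi2 *q)
{
        q->l_qlen++;
        q->sch_qlen++;  /* L-queue packets are counted twice by design */
}

static void drop_from_l(struct dualpi2 *q)
{
        q->l_qlen--;    /* what dequeuing from the child already does */
        q->sch_qlen--;  /* the extra --sch->q.qlen the fix adds */
}

int main(void)
{
        struct dualpi2 q = { 0, 0 };

        enqueue_l(&q);
        drop_from_l(&q);
        return q.sch_qlen;      /* 0: both views agree again */
}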
+ */ + --sch->q.qlen; + qdisc_qstats_backlog_dec(sch, skb); + q->memory_used -= skb->truesize; + rtnl_qdisc_drop(skb, q->l_queue); + qdisc_qstats_drop(sch); + } else { + WARN_ON_ONCE(1); + break; + } } qdisc_tree_reduce_backlog(sch, old_qlen - qdisc_qlen(sch), old_backlog - sch->qstats.backlog); diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 2a3d758f67ab..0664b2f2d6f2 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -585,6 +585,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d) }; struct list_head *pos; + sch_tree_lock(sch); + st.qdisc_stats.maxpacket = q->cstats.maxpacket; st.qdisc_stats.drop_overlimit = q->drop_overlimit; st.qdisc_stats.ecn_mark = q->cstats.ecn_mark; @@ -593,7 +595,6 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d) st.qdisc_stats.memory_usage = q->memory_usage; st.qdisc_stats.drop_overmemory = q->drop_overmemory; - sch_tree_lock(sch); list_for_each(pos, &q->new_flows) st.qdisc_stats.new_flows_len++; diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c index 95e5d9bfd9c8..96021f52d835 100644 --- a/net/sched/sch_hhf.c +++ b/net/sched/sch_hhf.c @@ -198,7 +198,8 @@ static struct hh_flow_state *seek_list(const u32 hash, return NULL; list_del(&flow->flowchain); kfree(flow); - q->hh_flows_current_cnt--; + WRITE_ONCE(q->hh_flows_current_cnt, + q->hh_flows_current_cnt - 1); } else if (flow->hash_id == hash) { return flow; } @@ -226,7 +227,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head, } if (q->hh_flows_current_cnt >= q->hh_flows_limit) { - q->hh_flows_overlimit++; + WRITE_ONCE(q->hh_flows_overlimit, q->hh_flows_overlimit + 1); return NULL; } /* Create new entry. */ @@ -234,7 +235,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head, if (!flow) return NULL; - q->hh_flows_current_cnt++; + WRITE_ONCE(q->hh_flows_current_cnt, q->hh_flows_current_cnt + 1); INIT_LIST_HEAD(&flow->flowchain); list_add_tail(&flow->flowchain, head); @@ -309,7 +310,7 @@ static enum wdrr_bucket_idx hhf_classify(struct sk_buff *skb, struct Qdisc *sch) return WDRR_BUCKET_FOR_NON_HH; flow->hash_id = hash; flow->hit_timestamp = now; - q->hh_flows_total_cnt++; + WRITE_ONCE(q->hh_flows_total_cnt, q->hh_flows_total_cnt + 1); /* By returning without updating counters in q->hhf_arrays, * we implicitly implement "shielding" (see Optimization O1). @@ -403,7 +404,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch, return NET_XMIT_SUCCESS; prev_backlog = sch->qstats.backlog; - q->drop_overlimit++; + WRITE_ONCE(q->drop_overlimit, q->drop_overlimit + 1); /* Return Congestion Notification only if we dropped a packet from this * bucket. 
*/ @@ -686,10 +687,10 @@ static int hhf_dump_stats(struct Qdisc *sch, struct gnet_dump *d) { struct hhf_sched_data *q = qdisc_priv(sch); struct tc_hhf_xstats st = { - .drop_overlimit = q->drop_overlimit, - .hh_overlimit = q->hh_flows_overlimit, - .hh_tot_count = q->hh_flows_total_cnt, - .hh_cur_count = q->hh_flows_current_cnt, + .drop_overlimit = READ_ONCE(q->drop_overlimit), + .hh_overlimit = READ_ONCE(q->hh_flows_overlimit), + .hh_tot_count = READ_ONCE(q->hh_flows_total_cnt), + .hh_cur_count = READ_ONCE(q->hh_flows_current_cnt), }; return gnet_stats_copy_app(d, &st, sizeof(st)); diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c index 16f3f629cb8e..fb53fbf0e328 100644 --- a/net/sched/sch_pie.c +++ b/net/sched/sch_pie.c @@ -90,7 +90,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, bool enqueue = false; if (unlikely(qdisc_qlen(sch) >= sch->limit)) { - q->stats.overlimit++; + WRITE_ONCE(q->stats.overlimit, q->stats.overlimit + 1); goto out; } @@ -104,7 +104,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, /* If packet is ecn capable, mark it if drop probability * is lower than 10%, else drop it. */ - q->stats.ecn_mark++; + WRITE_ONCE(q->stats.ecn_mark, q->stats.ecn_mark + 1); enqueue = true; } @@ -114,15 +114,15 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (!q->params.dq_rate_estimator) pie_set_enqueue_time(skb); - q->stats.packets_in++; + WRITE_ONCE(q->stats.packets_in, q->stats.packets_in + 1); if (qdisc_qlen(sch) > q->stats.maxq) - q->stats.maxq = qdisc_qlen(sch); + WRITE_ONCE(q->stats.maxq, qdisc_qlen(sch)); return qdisc_enqueue_tail(skb, sch); } out: - q->stats.dropped++; + WRITE_ONCE(q->stats.dropped, q->stats.dropped + 1); q->vars.accu_prob = 0; return qdisc_drop_reason(skb, sch, to_free, reason); } @@ -267,11 +267,11 @@ void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params, count = count / dtime; if (vars->avg_dq_rate == 0) - vars->avg_dq_rate = count; + WRITE_ONCE(vars->avg_dq_rate, count); else - vars->avg_dq_rate = + WRITE_ONCE(vars->avg_dq_rate, (vars->avg_dq_rate - - (vars->avg_dq_rate >> 3)) + (count >> 3); + (vars->avg_dq_rate >> 3)) + (count >> 3)); /* If the queue has receded below the threshold, we hold * on to the last drain rate calculated, else we reset @@ -381,7 +381,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars, if (delta > 0) { /* prevent overflow */ if (vars->prob < oldprob) { - vars->prob = MAX_PROB; + WRITE_ONCE(vars->prob, MAX_PROB); /* Prevent normalization error. 
If probability is at * maximum value already, we normalize it here, and * skip the check to do a non-linear drop in the next @@ -392,7 +392,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars, } else { /* prevent underflow */ if (vars->prob > oldprob) - vars->prob = 0; + WRITE_ONCE(vars->prob, 0); } /* Non-linear drop in probability: Reduce drop probability quickly if @@ -403,7 +403,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars, /* Reduce drop probability to 98.4% */ vars->prob -= vars->prob / 64; - vars->qdelay = qdelay; + WRITE_ONCE(vars->qdelay, qdelay); vars->backlog_old = backlog; /* We restart the measurement cycle if the following conditions are met @@ -502,21 +502,21 @@ static int pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d) struct pie_sched_data *q = qdisc_priv(sch); struct tc_pie_xstats st = { .prob = q->vars.prob << BITS_PER_BYTE, - .delay = ((u32)PSCHED_TICKS2NS(q->vars.qdelay)) / + .delay = ((u32)PSCHED_TICKS2NS(READ_ONCE(q->vars.qdelay))) / NSEC_PER_USEC, - .packets_in = q->stats.packets_in, - .overlimit = q->stats.overlimit, - .maxq = q->stats.maxq, - .dropped = q->stats.dropped, - .ecn_mark = q->stats.ecn_mark, + .packets_in = READ_ONCE(q->stats.packets_in), + .overlimit = READ_ONCE(q->stats.overlimit), + .maxq = READ_ONCE(q->stats.maxq), + .dropped = READ_ONCE(q->stats.dropped), + .ecn_mark = READ_ONCE(q->stats.ecn_mark), }; /* avg_dq_rate is only valid if dq_rate_estimator is enabled */ st.dq_rate_estimating = q->params.dq_rate_estimator; /* unscale and return dq_rate in bytes per sec */ - if (q->params.dq_rate_estimator) - st.avg_dq_rate = q->vars.avg_dq_rate * + if (st.dq_rate_estimating) + st.avg_dq_rate = READ_ONCE(q->vars.avg_dq_rate) * (PSCHED_TICKS_PER_SEC) >> PIE_SCALE; return gnet_stats_copy_app(d, &st, sizeof(st)); diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c index c8d3d09f15e3..432b8a3000a5 100644 --- a/net/sched/sch_red.c +++ b/net/sched/sch_red.c @@ -90,17 +90,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, case RED_PROB_MARK: qdisc_qstats_overlimit(sch); if (!red_use_ecn(q)) { - q->stats.prob_drop++; + WRITE_ONCE(q->stats.prob_drop, + q->stats.prob_drop + 1); goto congestion_drop; } if (INET_ECN_set_ce(skb)) { - q->stats.prob_mark++; + WRITE_ONCE(q->stats.prob_mark, + q->stats.prob_mark + 1); skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret); if (!skb) return NET_XMIT_CN | ret; } else if (!red_use_nodrop(q)) { - q->stats.prob_drop++; + WRITE_ONCE(q->stats.prob_drop, + q->stats.prob_drop + 1); goto congestion_drop; } @@ -111,17 +114,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, reason = QDISC_DROP_OVERLIMIT; qdisc_qstats_overlimit(sch); if (red_use_harddrop(q) || !red_use_ecn(q)) { - q->stats.forced_drop++; + WRITE_ONCE(q->stats.forced_drop, + q->stats.forced_drop + 1); goto congestion_drop; } if (INET_ECN_set_ce(skb)) { - q->stats.forced_mark++; + WRITE_ONCE(q->stats.forced_mark, + q->stats.forced_mark + 1); skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret); if (!skb) return NET_XMIT_CN | ret; } else if (!red_use_nodrop(q)) { - q->stats.forced_drop++; + WRITE_ONCE(q->stats.forced_drop, + q->stats.forced_drop + 1); goto congestion_drop; } @@ -135,7 +141,8 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch, sch->qstats.backlog += len; sch->q.qlen++; } else if (net_xmit_drop_count(ret)) { - q->stats.pdrop++; + WRITE_ONCE(q->stats.pdrop, + q->stats.pdrop + 1); qdisc_qstats_drop(sch); } 
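+ /* The increments above run with the qdisc lock held, while + * red_dump_stats() reads the counters locklessly; each WRITE_ONCE() + * store pairs with a READ_ONCE() load in the dump path to annotate + * that data race. + */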
return ret; @@ -463,9 +470,13 @@ static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d) dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED, &hw_stats_request); } - st.early = q->stats.prob_drop + q->stats.forced_drop; - st.pdrop = q->stats.pdrop; - st.marked = q->stats.prob_mark + q->stats.forced_mark; + st.early = READ_ONCE(q->stats.prob_drop) + + READ_ONCE(q->stats.forced_drop); + + st.pdrop = READ_ONCE(q->stats.pdrop); + + st.marked = READ_ONCE(q->stats.prob_mark) + + READ_ONCE(q->stats.forced_mark); return gnet_stats_copy_app(d, &st, sizeof(st)); } diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c index 013738662128..bd5ef561030f 100644 --- a/net/sched/sch_sfb.c +++ b/net/sched/sch_sfb.c @@ -130,7 +130,7 @@ static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q) sfbhash >>= SFB_BUCKET_SHIFT; if (b[hash].qlen < 0xFFFF) - b[hash].qlen++; + WRITE_ONCE(b[hash].qlen, b[hash].qlen + 1); b += SFB_NUMBUCKETS; /* next level */ } } @@ -159,7 +159,7 @@ static void decrement_one_qlen(u32 sfbhash, u32 slot, sfbhash >>= SFB_BUCKET_SHIFT; if (b[hash].qlen > 0) - b[hash].qlen--; + WRITE_ONCE(b[hash].qlen, b[hash].qlen - 1); b += SFB_NUMBUCKETS; /* next level */ } } @@ -179,12 +179,12 @@ static void decrement_qlen(const struct sk_buff *skb, struct sfb_sched_data *q) static void decrement_prob(struct sfb_bucket *b, struct sfb_sched_data *q) { - b->p_mark = prob_minus(b->p_mark, q->decrement); + WRITE_ONCE(b->p_mark, prob_minus(b->p_mark, q->decrement)); } static void increment_prob(struct sfb_bucket *b, struct sfb_sched_data *q) { - b->p_mark = prob_plus(b->p_mark, q->increment); + WRITE_ONCE(b->p_mark, prob_plus(b->p_mark, q->increment)); } static void sfb_zero_all_buckets(struct sfb_sched_data *q) @@ -202,11 +202,14 @@ static u32 sfb_compute_qlen(u32 *prob_r, u32 *avgpm_r, const struct sfb_sched_da const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0]; for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) { - if (qlen < b->qlen) - qlen = b->qlen; - totalpm += b->p_mark; - if (prob < b->p_mark) - prob = b->p_mark; + u32 b_qlen = READ_ONCE(b->qlen); + u32 b_mark = READ_ONCE(b->p_mark); + + if (qlen < b_qlen) + qlen = b_qlen; + totalpm += b_mark; + if (prob < b_mark) + prob = b_mark; b++; } *prob_r = prob; @@ -295,7 +298,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (unlikely(sch->q.qlen >= q->limit)) { qdisc_qstats_overlimit(sch); - q->stats.queuedrop++; + WRITE_ONCE(q->stats.queuedrop, + q->stats.queuedrop + 1); goto drop; } @@ -348,7 +352,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (unlikely(minqlen >= q->max)) { qdisc_qstats_overlimit(sch); - q->stats.bucketdrop++; + WRITE_ONCE(q->stats.bucketdrop, + q->stats.bucketdrop + 1); goto drop; } @@ -374,7 +379,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch, } if (sfb_rate_limit(skb, q)) { qdisc_qstats_overlimit(sch); - q->stats.penaltydrop++; + WRITE_ONCE(q->stats.penaltydrop, + q->stats.penaltydrop + 1); goto drop; } goto enqueue; @@ -390,14 +396,17 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch, * In either case, we want to start dropping packets. 
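* ((p_min - SFB_MAX_PROB / 2) * 2 rescales p_min from the upper half * of the probability range onto the full range; assuming r is drawn * uniformly over [0, SFB_MAX_PROB], the drop probability then grows * linearly as p_min approaches SFB_MAX_PROB.)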
*/ if (r < (p_min - SFB_MAX_PROB / 2) * 2) { - q->stats.earlydrop++; + WRITE_ONCE(q->stats.earlydrop, + q->stats.earlydrop + 1); goto drop; } } if (INET_ECN_set_ce(skb)) { - q->stats.marked++; + WRITE_ONCE(q->stats.marked, + q->stats.marked + 1); } else { - q->stats.earlydrop++; + WRITE_ONCE(q->stats.earlydrop, + q->stats.earlydrop + 1); goto drop; } } @@ -410,7 +419,8 @@ enqueue: sch->q.qlen++; increment_qlen(&cb, q); } else if (net_xmit_drop_count(ret)) { - q->stats.childdrop++; + WRITE_ONCE(q->stats.childdrop, + q->stats.childdrop + 1); qdisc_qstats_drop(sch); } return ret; @@ -599,12 +609,12 @@ static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d) { struct sfb_sched_data *q = qdisc_priv(sch); struct tc_sfb_xstats st = { - .earlydrop = q->stats.earlydrop, - .penaltydrop = q->stats.penaltydrop, - .bucketdrop = q->stats.bucketdrop, - .queuedrop = q->stats.queuedrop, - .childdrop = q->stats.childdrop, - .marked = q->stats.marked, + .earlydrop = READ_ONCE(q->stats.earlydrop), + .penaltydrop = READ_ONCE(q->stats.penaltydrop), + .bucketdrop = READ_ONCE(q->stats.bucketdrop), + .queuedrop = READ_ONCE(q->stats.queuedrop), + .childdrop = READ_ONCE(q->stats.childdrop), + .marked = READ_ONCE(q->stats.marked), }; st.maxqlen = sfb_compute_qlen(&st.maxprob, &st.avgprob, q); diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 8e3752811950..a47a09d76400 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -972,11 +972,12 @@ static enum hrtimer_restart advance_sched(struct hrtimer *timer) } if (should_change_schedules(admin, oper, end_time)) { - /* Set things so the next time this runs, the new - * schedule runs. - */ - end_time = sched_base_time(admin); switch_schedules(q, &admin, &oper); + /* After changing schedules, the next entry is the first one + * in the new schedule, with a pre-calculated end_time. 
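+ * (The old code derived end_time from sched_base_time(admin); + * reusing the entry's own pre-computed value keeps the first + * deadline consistent with the rest of the new schedule.)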
+ */ + next = list_first_entry(&oper->entries, struct sched_entry, list); + end_time = next->end_time; } next->end_time = end_time; diff --git a/net/sctp/socket.c b/net/sctp/socket.c index d2665bbd41a2..58d0d9747f0b 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4855,8 +4855,9 @@ static struct sock *sctp_clone_sock(struct sock *sk, if (!newsk) return ERR_PTR(err); - /* sk_clone() sets refcnt to 2 */ + /* sk_clone() sets refcnt to 2 and increments sockets_allocated */ sock_put(newsk); + sk_sockets_allocated_dec(newsk); newinet = inet_sk(newsk); newsp = sctp_sk(newsk); @@ -7033,7 +7034,7 @@ static int sctp_getsockopt_peer_auth_chunks(struct sock *sk, int len, /* See if the user provided enough room for all the data */ num_chunks = ntohs(ch->param_hdr.length) - sizeof(struct sctp_paramhdr); - if (len < num_chunks) + if (len < sizeof(struct sctp_authchunks) + num_chunks) return -EINVAL; if (copy_to_user(to, ch->chunks, num_chunks)) diff --git a/net/smc/smc_clc.c b/net/smc/smc_clc.c index c38fc7bf0a7e..014d527d5462 100644 --- a/net/smc/smc_clc.c +++ b/net/smc/smc_clc.c @@ -788,8 +788,8 @@ int smc_clc_wait_msg(struct smc_sock *smc, void *buf, int buflen, dclc = (struct smc_clc_msg_decline *)clcm; reason_code = SMC_CLC_DECL_PEERDECL; smc->peer_diagnosis = ntohl(dclc->peer_diagnosis); - if (((struct smc_clc_msg_decline *)buf)->hdr.typev2 & - SMC_FIRST_CONTACT_MASK) { + if ((dclc->hdr.typev2 & SMC_FIRST_CONTACT_MASK) && + smc->conn.lgr) { smc->conn.lgr->sync_err = 1; smc_lgr_terminate_sched(smc->conn.lgr); } diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 76284fc538eb..b0bba0feef56 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -177,8 +177,20 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf) if (fragid == LAST_FRAGMENT) { TIPC_SKB_CB(head)->validated = 0; - if (unlikely(!tipc_msg_validate(&head))) + + /* If the reassembled skb has been freed in + * tipc_msg_validate() because of an invalid truesize, + * then head will point to a newly allocated reassembled + * skb, while *headbuf points to freed reassembled skb. + * In such cases, correct *headbuf for freeing the newly + * allocated reassembled skb later. + */ + if (unlikely(!tipc_msg_validate(&head))) { + if (head != *headbuf) + *headbuf = head; goto err; + } + *buf = head; TIPC_SKB_CB(head)->tail = NULL; *headbuf = NULL; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f668ff107722..e2d787ca3e74 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1968,16 +1968,19 @@ static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) static void unix_destruct_scm(struct sk_buff *skb) { - struct scm_cookie scm; + struct scm_cookie scm = {}; + + swap(scm.pid, UNIXCB(skb).pid); - memset(&scm, 0, sizeof(scm)); - scm.pid = UNIXCB(skb).pid; if (UNIXCB(skb).fp) unix_detach_fds(&scm, skb); - /* Alas, it calls VFS */ - /* So fscking what? 
fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); +} + +static void unix_wfree(struct sk_buff *skb) +{ + unix_destruct_scm(skb); sock_wfree(skb); } @@ -1993,7 +1996,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen if (scm->fp && send_fds) err = unix_attach_fds(scm, skb); - skb->destructor = unix_destruct_scm; + skb->destructor = unix_wfree; return err; } @@ -2070,6 +2073,13 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb) } } +static void unix_orphan_scm(struct sock *sk, struct sk_buff *skb) +{ + scm_stat_del(sk, skb); + unix_destruct_scm(skb); + skb->destructor = sock_wfree; +} + /* * Send AF_UNIX data. */ @@ -2683,10 +2693,16 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor) int err; mutex_lock(&u->iolock); + skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err); - mutex_unlock(&u->iolock); - if (!skb) + if (!skb) { + mutex_unlock(&u->iolock); return err; + } + + unix_orphan_scm(sk, skb); + + mutex_unlock(&u->iolock); return recv_actor(sk, skb); } @@ -2886,6 +2902,9 @@ static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor) #endif spin_unlock(&queue->lock); + + unix_orphan_scm(sk, skb); + mutex_unlock(&u->iolock); return recv_actor(sk, skb); diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 2b7c0b5896ed..f862988c1e86 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -694,7 +694,6 @@ out: static s64 hvs_stream_has_data(struct vsock_sock *vsk) { struct hvsock *hvs = vsk->trans; - bool need_refill; s64 ret; if (hvs->recv_data_len > 0) @@ -702,9 +701,31 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk) switch (hvs_channel_readable_payload(hvs->chan)) { case 1: - need_refill = !hvs->recv_desc; - if (!need_refill) - return -EIO; + if (hvs->recv_desc) { + /* Here hvs->recv_data_len is 0, so hvs->recv_desc must + * be NULL unless it points to the 0-byte-payload FIN + * packet or a malformed/short packet: see + * hvs_update_recv_data(). + * + * If hvs->recv_desc points to the FIN packet, here all + * the payload has been dequeued and the peer_shutdown + * flag is set, but hvs_channel_readable_payload() still + * returns 1, because the VMBus ringbuffer's read_index + * is not updated for the FIN packet: + * hvs_stream_dequeue() -> hv_pkt_iter_next() updates + * the cached priv_read_index but has no opportunity to + * update the read_index in hv_pkt_iter_close() as + * hvs_stream_has_data() returns 0 for the FIN packet, + * so it won't get dequeued. + * + * In case hvs->recv_desc points to a malformed/short + * packet, return -EIO. 
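+ * Returning 0 for the FIN case tells the caller there is no + * more data to read; together with the peer_shutdown flag this + * surfaces EOF to userspace instead of an error.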
+ */ + if (!(vsk->peer_shutdown & SEND_SHUTDOWN)) + return -EIO; + + return 0; + } hvs->recv_desc = hv_pkt_iter_first(hvs->chan); if (!hvs->recv_desc) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index a152a9e208d0..416d533f493d 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -73,6 +73,7 @@ static bool virtio_transport_can_zcopy(const struct virtio_transport *t_ops, static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk, struct sk_buff *skb, struct msghdr *msg, + size_t pkt_len, bool zerocopy) { struct ubuf_info *uarg; @@ -81,12 +82,10 @@ static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk, uarg = msg->msg_ubuf; net_zcopy_get(uarg); } else { - struct iov_iter *iter = &msg->msg_iter; struct ubuf_info_msgzc *uarg_zc; uarg = msg_zerocopy_realloc(sk_vsock(vsk), - iter->count, - NULL, false); + pkt_len, NULL, false); if (!uarg) return -1; @@ -398,11 +397,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, * each iteration. If this is last skb for this buffer * and MSG_ZEROCOPY mode is in use - we must allocate * completion for the current syscall. + * + * Pass pkt_len because msg iter is already consumed + * by virtio_transport_fill_skb(), so iter->count + * cannot be used for RLIMIT_MEMLOCK pinned-pages + * accounting done by msg_zerocopy_realloc(). */ if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY && skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) { if (virtio_transport_init_zcopy_skb(vsk, skb, info->msg, + pkt_len, can_zcopy)) { kfree_skb(skb); ret = -ENOMEM; @@ -545,9 +550,8 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk, skb_queue_walk(&vvs->rx_queue, skb) { size_t bytes; - bytes = len - total; - if (bytes > skb->len) - bytes = skb->len; + bytes = min_t(size_t, len - total, + skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset); spin_unlock_bh(&vvs->rx_lock); @@ -1558,8 +1562,6 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, return -ENOMEM; } - sk_acceptq_added(sk); - lock_sock_nested(child, SINGLE_DEPTH_NESTING); child->sk_state = TCP_ESTABLISHED; @@ -1581,6 +1583,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, return ret; } + sk_acceptq_added(sk); if (virtio_transport_space_update(child, skb)) child->sk_write_space(child); diff --git a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c index 653b0a20fab9..c62907732c19 100644 --- a/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c +++ b/tools/testing/selftests/bpf/prog_tests/migrate_reuseport.c @@ -7,24 +7,29 @@ * 3. call listen() for 1 server socket. (migration target) * 4. update a map to migrate all child sockets * to the last server socket (migrate_map[cookie] = 4) - * 5. call shutdown() for first 4 server sockets + * 5. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll + * that the last server socket is not ready before migration. + * 6. call shutdown() for first 4 server sockets * and migrate the requests in the accept queue * to the last server socket. - * 6. call listen() for the second server socket. - * 7. call shutdown() for the last server + * 7. for TCP_ESTABLISHED and TCP_SYN_RECV cases, verify via epoll + * that the last server socket is ready after migration. + * 8. call listen() for the second server socket. + * 9. call shutdown() for the last server * and migrate the requests in the accept queue * to the second server socket.
- * 8. call listen() for the last server. - * 9. call shutdown() for the second server + * 10. call listen() for the last server. + * 11. call shutdown() for the second server * and migrate the requests in the accept queue * to the last server socket. - * 10. call accept() for the last server socket. + * 12. call accept() for the last server socket. * * Author: Kuniyuki Iwashima <kuniyu@amazon.co.jp> */ #include <bpf/bpf.h> #include <bpf/libbpf.h> +#include <sys/epoll.h> #include "test_progs.h" #include "test_migrate_reuseport.skel.h" @@ -350,21 +355,51 @@ static int update_maps(struct migrate_reuseport_test_case *test_case, static int migrate_dance(struct migrate_reuseport_test_case *test_case) { + struct epoll_event ev = { + .events = EPOLLIN, + }; + int epoll = -1, nfds; int i, err; + if (test_case->state != BPF_TCP_NEW_SYN_RECV) { + epoll = epoll_create1(0); + if (!ASSERT_NEQ(epoll, -1, "epoll_create1")) + return -1; + + ev.data.fd = test_case->servers[MIGRATED_TO]; + if (!ASSERT_OK(epoll_ctl(epoll, EPOLL_CTL_ADD, + test_case->servers[MIGRATED_TO], &ev), + "epoll_ctl")) + goto close_epoll; + + nfds = epoll_wait(epoll, &ev, 1, 0); + if (!ASSERT_EQ(nfds, 0, "epoll_wait 1")) + goto close_epoll; + } + /* Migrate TCP_ESTABLISHED and TCP_SYN_RECV requests * to the last listener based on eBPF. */ for (i = 0; i < MIGRATED_TO; i++) { err = shutdown(test_case->servers[i], SHUT_RDWR); if (!ASSERT_OK(err, "shutdown")) - return -1; + goto close_epoll; } /* No dance for TCP_NEW_SYN_RECV to migrate based on eBPF */ if (test_case->state == BPF_TCP_NEW_SYN_RECV) return 0; + nfds = epoll_wait(epoll, &ev, 1, 0); + if (!ASSERT_EQ(nfds, 1, "epoll_wait 2")) { +close_epoll: + if (epoll >= 0) + close(epoll); + return -1; + } + + close(epoll); + /* Note that we use the second listener instead of the * first one here. 
* diff --git a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh index bf9bcd1b5ec0..f2e43b6c4c81 100644 --- a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh +++ b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh @@ -23,20 +23,9 @@ test_LAG_cleanup() ip link set dev dummy2 master "$name" elif [ "$driver" = "team" ]; then name="team0" - teamd -d -c ' - { - "device": "'"$name"'", - "runner": { - "name": "'"$mode"'" - }, - "ports": { - "dummy1": - {}, - "dummy2": - {} - } - } - ' + ip link add "$name" type team + ip link set dev dummy1 master "$name" + ip link set dev dummy2 master "$name" ip link set dev "$name" up else check_err 1 diff --git a/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh b/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh index b1ec7755b783..26469f3be022 100755 --- a/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh +++ b/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh @@ -42,8 +42,6 @@ team_cleanup() } -require_command teamd - trap cleanup EXIT tests_run diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 2a390cae41bf..94d722770420 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -101,6 +101,7 @@ CONFIG_NET_SCH_HTB=m CONFIG_NET_SCH_INGRESS=m CONFIG_NET_SCH_NETEM=y CONFIG_NET_SCH_PRIO=m +CONFIG_NET_TEAM=y CONFIG_NET_VRF=y CONFIG_NF_CONNTRACK=m CONFIG_NF_CONNTRACK_OVS=y diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh index 6eb7f95e70e1..ac868a731694 100755 --- a/tools/testing/selftests/net/fib_nexthops.sh +++ b/tools/testing/selftests/net/fib_nexthops.sh @@ -1209,6 +1209,28 @@ ipv6_fcnal_runtime() run_cmd "$IP ro replace 2001:db8:101::1/128 nhid 124" log_test $? 0 "IPv6 route using a group after replacing v4 gateways" + # Replacing an IPv6 nexthop with an IPv4 nexthop should update has_v4 + # for all groups using it, preventing IPv6 routes from referencing the + # group after the replace. + run_cmd "$IP nexthop add id 89 via 2001:db8:91::2 dev veth1" + run_cmd "$IP nexthop add id 125 group 89" + run_cmd "$IP nexthop replace id 89 via 172.16.1.1 dev veth1" + run_cmd "$IP ro replace 2001:db8:101::1/128 nhid 125" + log_test $? 2 "IPv6 route cannot use group after v6 nexthop replaced by v4" + + # Same scenario but with a blackhole nexthop: the group has no IPv6 + # routes yet when the replace happens, so fib6_check_nh_list returns + # early without checking. has_v4 must still be updated to block + # subsequent IPv6 route additions. + run_cmd "$IP nexthop flush >/dev/null 2>&1" + run_cmd "$IP -6 nexthop add id 90 blackhole" + run_cmd "$IP nexthop add id 125 group 90" + run_cmd "$IP nexthop replace id 90 blackhole" + run_cmd "$IP -6 ro add 2001:db8:101::1/128 nhid 125" + log_test $? 2 "IPv6 route rejects v6 blackhole replaced by v4 blackhole" + run_cmd "ip netns exec $me ping -6 2001:db8:101::1 -c1 -w$PING_TIMEOUT" + log_test $?
2 "Ping unreachable after rejected route" + $IP nexthop flush >/dev/null 2>&1 # diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh index d847ff1737c3..27cbda68144e 100755 --- a/tools/testing/selftests/net/mptcp/diag.sh +++ b/tools/testing/selftests/net/mptcp/diag.sh @@ -322,6 +322,33 @@ wait_connected() done } +chk_sndbuf() +{ + local server_sndbuf client_sndbuf msg + local port=${1} + + msg="....chk sndbuf server/client" + server_sndbuf=$(ss -N "${ns}" -inmHM "sport" "${port}" | \ + sed -n 's/.*tb\([0-9]\+\).*/\1/p') + client_sndbuf=$(ss -N "${ns}" -inmHM "dport" "${port}" | \ + sed -n 's/.*tb\([0-9]\+\).*/\1/p') + + mptcp_lib_print_title "${msg}" + if [ -z "${server_sndbuf}" ] || [ -z "${client_sndbuf}" ]; then + mptcp_lib_pr_fail "sndbuf S=${server_sndbuf} C=${client_sndbuf}" + mptcp_lib_result_fail "${msg}" + ret=${KSFT_FAIL} + elif [ "${server_sndbuf}" != "${client_sndbuf}" ]; then + mptcp_lib_pr_fail "sndbuf S=${server_sndbuf} != C=${client_sndbuf}" + mptcp_lib_result_fail "${msg}" + ret=${KSFT_FAIL} + else + mptcp_lib_pr_ok + mptcp_lib_result_pass "${msg}" + fi +} + + trap cleanup EXIT mptcp_lib_ns_init ns @@ -341,6 +368,7 @@ echo "b" | \ 127.0.0.1 >/dev/null & wait_connected $ns 10000 chk_msk_nr 2 "after MPC handshake" +chk_sndbuf 10000 chk_last_time_info 10000 chk_msk_remote_key_nr 2 "....chk remote_key" chk_msk_fallback_nr 0 "....chk no fallback" diff --git a/tools/testing/selftests/net/ovpn/common.sh b/tools/testing/selftests/net/ovpn/common.sh index 4c08f756e63a..2d844eb3aa6e 100644 --- a/tools/testing/selftests/net/ovpn/common.sh +++ b/tools/testing/selftests/net/ovpn/common.sh @@ -4,62 +4,181 @@ # # Author: Antonio Quartulli <antonio@openvpn.net> -UDP_PEERS_FILE=${UDP_PEERS_FILE:-udp_peers.txt} -TCP_PEERS_FILE=${TCP_PEERS_FILE:-tcp_peers.txt} -OVPN_CLI=${OVPN_CLI:-./ovpn-cli} -YNL_CLI=${YNL_CLI:-../../../../net/ynl/pyynl/cli.py} -ALG=${ALG:-aes} -PROTO=${PROTO:-UDP} -FLOAT=${FLOAT:-0} -SYMMETRIC_ID=${SYMMETRIC_ID:-0} - -export ID_OFFSET=$(( 9 * (SYMMETRIC_ID == 0) )) - -JQ_FILTER='map(select(.msg.peer | has("remote-ipv6") | not)) | +OVPN_COMMON_DIR=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")") +source "$OVPN_COMMON_DIR/../../kselftest/ktap_helpers.sh" + +OVPN_UDP_PEERS_FILE=${OVPN_UDP_PEERS_FILE:-udp_peers.txt} +OVPN_TCP_PEERS_FILE=${OVPN_TCP_PEERS_FILE:-tcp_peers.txt} +OVPN_CLI=${OVPN_CLI:-${OVPN_COMMON_DIR}/ovpn-cli} +OVPN_YNL=${OVPN_YNL:-${OVPN_COMMON_DIR}/../../../../net/ynl/pyynl/cli.py} +OVPN_ALG=${OVPN_ALG:-aes} +OVPN_PROTO=${OVPN_PROTO:-UDP} +OVPN_FLOAT=${OVPN_FLOAT:-0} +OVPN_SYMMETRIC_ID=${OVPN_SYMMETRIC_ID:-0} +OVPN_VERBOSE=${OVPN_VERBOSE:-0} + +export OVPN_ID_OFFSET=$(( 9 * (OVPN_SYMMETRIC_ID == 0) )) + +OVPN_JQ_FILTER='map(if type == "array" then .[] else . end) | + map(select(.msg.peer | has("remote-ipv6") | not)) | map(del(.msg.ifindex)) | sort_by(.msg.peer.id)[]' -LAN_IP="11.11.11.11" +OVPN_LAN_IP="11.11.11.11" + +declare -A OVPN_TMP_JSONS=() +declare -A OVPN_LISTENER_PIDS=() +OVPN_CURRENT_STAGE="" + +ovpn_is_verbose() { + [[ "${OVPN_VERBOSE}" == "1" ]] +} + +ovpn_log() { + ovpn_is_verbose || return 0 + printf '%s\n' "$*" +} + +ovpn_print_cmd_output() { + local output_file="$1" + local line + + [[ -s "${output_file}" ]] || return 0 + + while IFS= read -r line; do + ovpn_log "${line}" + done < "${output_file}" +} + +ovpn_cmd_run() { + local mode="$1" + local label="$2" + local output_file + local rc + local ret=0 + + shift 2 + + output_file=$(mktemp) + if "$@" >"${output_file}" 2>&1; then + rc=0 + else + rc=$? 
+ fi + + case "${mode}" in + ok) + if [[ "${rc}" -ne 0 ]]; then + cat "${output_file}" + printf '%s\n' \ + "${label}: command failed with rc=${rc}: $*" + ret="${rc}" + fi + ;; + mayfail) + ;; + fail) + [[ "${rc}" -eq 0 ]] && ret=1 + ;; + esac + + if ovpn_is_verbose && [[ "${rc}" -eq 0 || "${mode}" != "ok" ]]; then + ovpn_print_cmd_output "${output_file}" + fi + + rm -f "${output_file}" + return "${ret}" +} + +ovpn_cmd_ok() { + ovpn_cmd_run ok "$@" +} + +ovpn_cmd_mayfail() { + ovpn_cmd_run mayfail "$@" +} + +ovpn_cmd_fail() { + ovpn_cmd_run fail "$@" +} -declare -A tmp_jsons=() -declare -A listener_pids=() +ovpn_run_bg() { + local pid_var="$1" -create_ns() { - ip netns add peer${1} + shift + if ovpn_is_verbose; then + "$@" & + else + "$@" >/dev/null 2>&1 & + fi + + printf -v "${pid_var}" '%s' "$!" +} + +ovpn_run_stage() { + local label="$1" + + shift + OVPN_CURRENT_STAGE="${label}" + "$@" + OVPN_CURRENT_STAGE="" + ktap_test_pass "${label}" } -setup_ns() { +ovpn_stage_err() { + # ERR trap is global under set -eE: only report failures that happen + # while ovpn_run_stage() is actively executing a stage body. + if [[ -n "${OVPN_CURRENT_STAGE}" ]]; then + ktap_test_fail "${OVPN_CURRENT_STAGE}" + OVPN_CURRENT_STAGE="" + fi +} + +ovpn_create_ns() { + ip netns add "ovpn_peer${1}" +} + +ovpn_setup_ns() { + local peer="ovpn_peer${1}" + local server_ns="ovpn_peer0" + local peer_ns MODE="P2P" if [ ${1} -eq 0 ]; then MODE="MP" - for p in $(seq 1 ${NUM_PEERS}); do - ip link add veth${p} netns peer0 type veth peer name veth${p} netns peer${p} + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ip link add veth${p} netns "${server_ns}" type veth \ + peer name veth${p} netns "${peer_ns}" - ip -n peer0 addr add 10.10.${p}.1/24 dev veth${p} - ip -n peer0 addr add fd00:0:0:${p}::1/64 dev veth${p} - ip -n peer0 link set veth${p} up + ip -n "${server_ns}" addr add 10.10.${p}.1/24 dev \ + veth${p} + ip -n "${server_ns}" addr add fd00:0:0:${p}::1/64 dev \ + veth${p} + ip -n "${server_ns}" link set veth${p} up - ip -n peer${p} addr add 10.10.${p}.2/24 dev veth${p} - ip -n peer${p} addr add fd00:0:0:${p}::2/64 dev veth${p} - ip -n peer${p} link set veth${p} up + ip -n "${peer_ns}" addr add 10.10.${p}.2/24 dev veth${p} + ip -n "${peer_ns}" addr add fd00:0:0:${p}::2/64 dev \ + veth${p} + ip -n "${peer_ns}" link set veth${p} up done fi - ip netns exec peer${1} ${OVPN_CLI} new_iface tun${1} $MODE - ip -n peer${1} addr add ${2} dev tun${1} + ip netns exec "${peer}" ${OVPN_CLI} new_iface tun${1} $MODE + ip -n "${peer}" addr add ${2} dev tun${1} # add a secondary IP to peer 1, to test a LAN behind a client - if [ ${1} -eq 1 -a -n "${LAN_IP}" ]; then - ip -n peer${1} addr add ${LAN_IP} dev tun${1} - ip -n peer0 route add ${LAN_IP} via $(echo ${2} |sed -e s'!/.*!!') dev tun0 + if [ ${1} -eq 1 -a -n "${OVPN_LAN_IP}" ]; then + ip -n "${peer}" addr add ${OVPN_LAN_IP} dev tun${1} + ip -n "${server_ns}" route add ${OVPN_LAN_IP} via \ + $(echo ${2} |sed -e s'!/.*!!') dev tun0 fi if [ -n "${3}" ]; then - ip -n peer${1} link set mtu ${3} dev tun${1} + ip -n "${peer}" link set mtu ${3} dev tun${1} fi - ip -n peer${1} link set tun${1} up + ip -n "${peer}" link set tun${1} up } -build_capture_filter() { +ovpn_build_capture_filter() { # match the first four bytes of the openvpn data payload - if [ "${PROTO}" == "UDP" ]; then + if [ "${OVPN_PROTO}" == "UDP" ]; then # For UDP, libpcap transport indexing only works for IPv4, so # use an explicit IPv4 or IPv6 expression based on the peer # address. 
The IPv6 branch assumes there are no extension @@ -76,108 +195,170 @@ build_capture_filter() { fi } -setup_listener() { +ovpn_setup_listener() { + local peer="$1" + local file + local peer_ns="ovpn_peer${peer}" + file=$(mktemp) - PYTHONUNBUFFERED=1 ip netns exec peer${p} ${YNL_CLI} --family ovpn \ - --subscribe peers --output-json --duration 40 > ${file} & - listener_pids[$1]=$! - tmp_jsons[$1]="${file}" + PYTHONUNBUFFERED=1 ip netns exec "${peer_ns}" "${OVPN_YNL}" --family \ + ovpn --subscribe peers --output-json > "${file}" \ + 2>/dev/null & + OVPN_LISTENER_PIDS["${peer}"]=$! + OVPN_TMP_JSONS["${peer}"]="${file}" } -add_peer() { +ovpn_add_peer() { labels=("ASYMM" "SYMM") - M_ID=${labels[SYMMETRIC_ID]} + local peer_ns + local server_ns="ovpn_peer0" + M_ID=${labels[OVPN_SYMMETRIC_ID]} - if [ "${PROTO}" == "UDP" ]; then + if [ "${OVPN_PROTO}" == "UDP" ]; then if [ ${1} -eq 0 ]; then - ip netns exec peer0 ${OVPN_CLI} new_multi_peer tun0 1 \ - ${M_ID} ${UDP_PEERS_FILE} + ip netns exec "${server_ns}" ${OVPN_CLI} \ + new_multi_peer tun0 1 ${M_ID} \ + ${OVPN_UDP_PEERS_FILE} - for p in $(seq 1 ${NUM_PEERS}); do - ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 ${ALG} 0 \ + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + ip netns exec "${server_ns}" ${OVPN_CLI} \ + new_key tun0 ${p} 1 0 ${OVPN_ALG} 0 \ data64.key done else - if [ "${SYMMETRIC_ID}" -eq 1 ]; then + peer_ns="ovpn_peer${1}" + if [ "${OVPN_SYMMETRIC_ID}" -eq 1 ]; then PEER_ID=${1} TX_ID="none" else PEER_ID=$(awk "NR == ${1} {print \$2}" \ - ${UDP_PEERS_FILE}) + ${OVPN_UDP_PEERS_FILE}) TX_ID=${1} fi - RADDR=$(awk "NR == ${1} {print \$3}" ${UDP_PEERS_FILE}) - RPORT=$(awk "NR == ${1} {print \$4}" ${UDP_PEERS_FILE}) - LPORT=$(awk "NR == ${1} {print \$6}" ${UDP_PEERS_FILE}) - ip netns exec peer${1} ${OVPN_CLI} new_peer tun${1} \ - ${PEER_ID} ${TX_ID} ${LPORT} ${RADDR} ${RPORT} - ip netns exec peer${1} ${OVPN_CLI} new_key tun${1} \ - ${PEER_ID} 1 0 ${ALG} 1 data64.key + RADDR=$(awk "NR == ${1} {print \$3}" \ + ${OVPN_UDP_PEERS_FILE}) + RPORT=$(awk "NR == ${1} {print \$4}" \ + ${OVPN_UDP_PEERS_FILE}) + LPORT=$(awk "NR == ${1} {print \$6}" \ + ${OVPN_UDP_PEERS_FILE}) + ip netns exec "${peer_ns}" ${OVPN_CLI} new_peer \ + tun${1} ${PEER_ID} ${TX_ID} ${LPORT} ${RADDR} \ + ${RPORT} + ip netns exec "${peer_ns}" ${OVPN_CLI} new_key tun${1} \ + ${PEER_ID} 1 0 ${OVPN_ALG} 1 data64.key fi else if [ ${1} -eq 0 ]; then - (ip netns exec peer0 ${OVPN_CLI} listen tun0 1 ${M_ID} \ - ${TCP_PEERS_FILE} && { - for p in $(seq 1 ${NUM_PEERS}); do - ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 1 0 \ - ${ALG} 0 data64.key + (ip netns exec "${server_ns}" ${OVPN_CLI} listen tun0 \ + 1 ${M_ID} ${OVPN_TCP_PEERS_FILE} && { + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + ip netns exec "${server_ns}" \ + ${OVPN_CLI} new_key tun0 ${p} \ + 1 0 ${OVPN_ALG} 0 data64.key done }) & sleep 5 else - if [ "${SYMMETRIC_ID}" -eq 1 ]; then + peer_ns="ovpn_peer${1}" + if [ "${OVPN_SYMMETRIC_ID}" -eq 1 ]; then PEER_ID=${1} TX_ID="none" else PEER_ID=$(awk "NR == ${1} {print \$2}" \ - ${TCP_PEERS_FILE}) + ${OVPN_TCP_PEERS_FILE}) TX_ID=${1} fi - ip netns exec peer${1} ${OVPN_CLI} connect tun${1} \ + ip netns exec "${peer_ns}" ${OVPN_CLI} connect tun${1} \ ${PEER_ID} ${TX_ID} 10.10.${1}.1 1 data64.key fi fi } -compare_ntfs() { - if [ ${#tmp_jsons[@]} -gt 0 ]; then +ovpn_compare_ntfs() { + local diff_rc=0 + local diff_file + + if [ ${#OVPN_TMP_JSONS[@]} -gt 0 ]; then suffix="" - [ "${SYMMETRIC_ID}" -eq 1 ] && suffix="${suffix}-symm" - [ "$FLOAT" == 1 ] && suffix="${suffix}-float" + [ 
"${OVPN_SYMMETRIC_ID}" -eq 1 ] && suffix="${suffix}-symm" + [ "$OVPN_FLOAT" == 1 ] && suffix="${suffix}-float" expected="json/peer${1}${suffix}.json" - received="${tmp_jsons[$1]}" + received="${OVPN_TMP_JSONS[$1]}" + diff_file=$(mktemp) - kill -TERM ${listener_pids[$1]} || true - wait ${listener_pids[$1]} || true + ovpn_stop_listener "${1}" 1 printf "Checking notifications for peer ${1}... " - if diff <(jq -s "${JQ_FILTER}" ${expected}) \ - <(jq -s "${JQ_FILTER}" ${received}); then + if diff <(jq -s "${OVPN_JQ_FILTER}" ${expected}) \ + <(jq -s "${OVPN_JQ_FILTER}" ${received}) \ + >"${diff_file}" 2>&1; then echo "OK" + else + diff_rc=$? + echo "failed" + cat "${diff_file}" fi - rm -f ${received} || true + rm -f "${diff_file}" || true + rm -f "${received}" || true + unset "OVPN_TMP_JSONS[$1]" fi + + return "${diff_rc}" } -cleanup() { +ovpn_stop_listener() { + local peer="$1" + local keep_json="${2:-0}" + local pid="${OVPN_LISTENER_PIDS[$peer]:-}" + local json="${OVPN_TMP_JSONS[$peer]:-}" + + if [[ -n "${pid}" ]]; then + kill -TERM "${pid}" 2>/dev/null || true + wait "${pid}" 2>/dev/null || true + unset "OVPN_LISTENER_PIDS[$peer]" + fi + + if [[ -n "${json}" && "${keep_json}" -eq 0 ]]; then + rm -f "${json}" || true + unset "OVPN_TMP_JSONS[$peer]" + fi +} + +ovpn_cleanup_peer_ns() { + local peer="$1" + local peer_id="${peer#ovpn_peer}" + + ip -n "${peer}" link set tun${peer_id} down 2>/dev/null || true + ip netns exec "${peer}" ${OVPN_CLI} del_iface tun${peer_id} \ + 1>/dev/null 2>&1 || true + ip netns del "${peer}" 2>/dev/null || true +} + +ovpn_cleanup() { + local peer + # some ovpn-cli processes sleep in background so they need manual poking - killall $(basename ${OVPN_CLI}) 2>/dev/null || true + killall "$(basename "${OVPN_CLI}")" 2>/dev/null || true - # netns peer0 is deleted without erasing ifaces first - for p in $(seq 1 10); do - ip -n peer${p} link set tun${p} down 2>/dev/null || true - ip netns exec peer${p} ${OVPN_CLI} del_iface tun${p} 2>/dev/null || true + for peer in "${!OVPN_LISTENER_PIDS[@]}"; do + ovpn_stop_listener "${peer}" 2>/dev/null done + for p in $(seq 1 10); do - ip -n peer0 link del veth${p} 2>/dev/null || true - done - for p in $(seq 0 10); do - ip netns del peer${p} 2>/dev/null || true + ip -n ovpn_peer0 link del veth${p} 2>/dev/null || true done + + # remove from ovpn's netns pool + while IFS= read -r peer; do + [[ -n "${peer}" ]] || continue + ovpn_cleanup_peer_ns "${peer}" 2>/dev/null + done < <(ip netns list 2>/dev/null | awk '/^ovpn_/ {print $1}') } -if [ "${PROTO}" == "UDP" ]; then - NUM_PEERS=${NUM_PEERS:-$(wc -l ${UDP_PEERS_FILE} | awk '{print $1}')} +if [ "${OVPN_PROTO}" == "UDP" ]; then + OVPN_NUM_PEERS=${OVPN_NUM_PEERS:-$(wc -l ${OVPN_UDP_PEERS_FILE} | \ + awk '{print $1}')} else - NUM_PEERS=${NUM_PEERS:-$(wc -l ${TCP_PEERS_FILE} | awk '{print $1}')} + OVPN_NUM_PEERS=${OVPN_NUM_PEERS:-$(wc -l ${OVPN_TCP_PEERS_FILE} | \ + awk '{print $1}')} fi diff --git a/tools/testing/selftests/net/ovpn/config b/tools/testing/selftests/net/ovpn/config index 42699740936d..d6cf033d555e 100644 --- a/tools/testing/selftests/net/ovpn/config +++ b/tools/testing/selftests/net/ovpn/config @@ -5,6 +5,9 @@ CONFIG_CRYPTO_GCM=y CONFIG_DST_CACHE=y CONFIG_INET=y CONFIG_NET=y +CONFIG_NETFILTER=y CONFIG_NET_UDP_TUNNEL=y +CONFIG_NF_TABLES=m +CONFIG_NF_TABLES_INET=y CONFIG_OVPN=m CONFIG_STREAM_PARSER=y diff --git a/tools/testing/selftests/net/ovpn/test-chachapoly.sh b/tools/testing/selftests/net/ovpn/test-chachapoly.sh index 32504079a2b8..cd3d94355d58 100755 --- 
a/tools/testing/selftests/net/ovpn/test-chachapoly.sh +++ b/tools/testing/selftests/net/ovpn/test-chachapoly.sh @@ -4,6 +4,6 @@ # # Author: Antonio Quartulli <antonio@openvpn.net> -ALG="chachapoly" +OVPN_ALG="chachapoly" source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-close-socket-tcp.sh b/tools/testing/selftests/net/ovpn/test-close-socket-tcp.sh index 093d44772ffd..392d269bada5 100755 --- a/tools/testing/selftests/net/ovpn/test-close-socket-tcp.sh +++ b/tools/testing/selftests/net/ovpn/test-close-socket-tcp.sh @@ -4,6 +4,6 @@ # # Author: Antonio Quartulli <antonio@openvpn.net> -PROTO="TCP" +OVPN_PROTO="TCP" source test-close-socket.sh diff --git a/tools/testing/selftests/net/ovpn/test-close-socket.sh b/tools/testing/selftests/net/ovpn/test-close-socket.sh index 0d09df14fe8e..af1532b4d2da 100755 --- a/tools/testing/selftests/net/ovpn/test-close-socket.sh +++ b/tools/testing/selftests/net/ovpn/test-close-socket.sh @@ -5,41 +5,81 @@ # Author: Antonio Quartulli <antonio@openvpn.net> #set -x -set -e +set -eE source ./common.sh -cleanup +ovpn_test_finished=0 -modprobe -q ovpn || true +ovpn_test_exit() { + ovpn_cleanup + modprobe -r ovpn || true + + if [ "${ovpn_test_finished}" -eq 0 ]; then + ktap_print_totals + fi +} + +ovpn_prepare_network() { + local p + local peer_ns + + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "create namespace peer${p}" ovpn_create_ns "${p}" + done -for p in $(seq 0 ${NUM_PEERS}); do - create_ns ${p} -done + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "configure peer${p} namespace" ovpn_setup_ns \ + "${p}" 5.5.5.$((p + 1))/24 + done -for p in $(seq 0 ${NUM_PEERS}); do - setup_ns ${p} 5.5.5.$((${p} + 1))/24 -done + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "register peer${p} in overlay" ovpn_add_peer "${p}" + done -for p in $(seq 0 ${NUM_PEERS}); do - add_peer ${p} -done + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "set peer0 timeout for peer ${p}" \ + ip netns exec ovpn_peer0 ${OVPN_CLI} set_peer tun0 \ + ${p} 60 120 + ovpn_cmd_ok "set peer${p} timeout for peer ${p}" \ + ip netns exec "${peer_ns}" ${OVPN_CLI} set_peer \ + tun${p} $((p + OVPN_ID_OFFSET)) 60 120 + done +} -for p in $(seq 1 ${NUM_PEERS}); do - ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 60 120 - ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} $((${p}+9)) 60 120 -done +ovpn_run_ping_traffic() { + local p -sleep 1 + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "send ping traffic to peer ${p}" \ + ip netns exec ovpn_peer0 ping -qfc 500 -w 3 \ + 5.5.5.$((p + 1)) + done +} -for p in $(seq 1 ${NUM_PEERS}); do - ip netns exec peer0 ping -qfc 500 -w 3 5.5.5.$((${p} + 1)) -done +ovpn_run_iperf() { + local iperf_pid -ip netns exec peer0 iperf3 -1 -s & -sleep 1 -ip netns exec peer1 iperf3 -Z -t 3 -c 5.5.5.1 + ovpn_run_bg iperf_pid ip netns exec ovpn_peer0 iperf3 -1 -s + sleep 1 + ovpn_cmd_ok "run iperf throughput flow" \ + ip netns exec ovpn_peer1 iperf3 -Z -t 3 -c 5.5.5.1 + wait "${iperf_pid}" || return 1 +} + +trap ovpn_test_exit EXIT +trap ovpn_stage_err ERR + +ktap_print_header +ktap_set_plan 3 + +ovpn_cleanup +modprobe -q ovpn || true -cleanup +ovpn_run_stage "setup network topology" ovpn_prepare_network +ovpn_run_stage "run ping traffic" ovpn_run_ping_traffic +ovpn_run_stage "run iperf throughput" ovpn_run_iperf -modprobe -r ovpn || true +ovpn_test_finished=1 +ktap_finished diff --git a/tools/testing/selftests/net/ovpn/test-float.sh b/tools/testing/selftests/net/ovpn/test-float.sh index 
ba5d725e18b0..91f8e113718e 100755 --- a/tools/testing/selftests/net/ovpn/test-float.sh +++ b/tools/testing/selftests/net/ovpn/test-float.sh @@ -4,6 +4,6 @@ # # Author: Antonio Quartulli <antonio@openvpn.net> -FLOAT="1" +OVPN_FLOAT="1" source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-mark.sh b/tools/testing/selftests/net/ovpn/test-mark.sh index 8534428ed3eb..5a8f47554286 100755 --- a/tools/testing/selftests/net/ovpn/test-mark.sh +++ b/tools/testing/selftests/net/ovpn/test-mark.sh @@ -6,91 +6,166 @@ # Antonio Quartulli <antonio@openvpn.net> #set -x -set -e +set -eE MARK=1056 +MARK_DROP_COUNTER=0 source ./common.sh -cleanup - +ovpn_test_finished=0 + +ovpn_test_exit() { + ovpn_cleanup + modprobe -r ovpn || true + + if [ "${ovpn_test_finished}" -eq 0 ]; then + ktap_print_totals + fi +} + +ovpn_mark_prepare_network() { + local p + local peer_ns + + for p in $(seq 0 "${OVPN_NUM_PEERS}"); do + ovpn_cmd_ok "create namespace peer${p}" ovpn_create_ns "${p}" + done + + for p in $(seq 0 3); do + ovpn_cmd_ok "configure peer${p} namespace" ovpn_setup_ns \ + "${p}" 5.5.5.$((p + 1))/24 + done + + ovpn_cmd_ok "create server-side multi-peer with fwmark" \ + ip netns exec ovpn_peer0 "${OVPN_CLI}" new_multi_peer tun0 1 \ + ASYMM "${OVPN_UDP_PEERS_FILE}" "${MARK}" + for p in $(seq 1 3); do + ovpn_cmd_ok "install server key for peer ${p}" \ + ip netns exec ovpn_peer0 "${OVPN_CLI}" new_key tun0 \ + "${p}" 1 0 "${OVPN_ALG}" 0 data64.key + done + + for p in $(seq 1 3); do + ovpn_cmd_ok "register peer${p} in overlay" ovpn_add_peer "${p}" + done + + for p in $(seq 1 3); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "set peer0 timeout for peer ${p}" \ + ip netns exec ovpn_peer0 "${OVPN_CLI}" set_peer tun0 \ + "${p}" 60 120 + ovpn_cmd_ok "set peer${p} timeout for peer ${p}" \ + ip netns exec "${peer_ns}" "${OVPN_CLI}" set_peer \ + tun"${p}" $((p + OVPN_ID_OFFSET)) 60 120 + done +} + +ovpn_mark_run_baseline_traffic() { + local p + + for p in $(seq 1 3); do + ovpn_cmd_ok "send baseline traffic to peer ${p}" \ + ip netns exec ovpn_peer0 ping -qfc 500 -w 3 \ + 5.5.5.$((p + 1)) + done +} + +ovpn_mark_add_drop_rule() { + ovpn_log "Adding an nftables drop rule based on mark value ${MARK}" + + ovpn_cmd_ok "flush nft ruleset" ip netns exec ovpn_peer0 nft flush \ + ruleset + ovpn_cmd_ok "create nft filter table" ip netns exec ovpn_peer0 nft \ + "add table inet filter" + ovpn_cmd_ok "create nft filter output chain" \ + ip netns exec ovpn_peer0 nft "add chain inet filter output { \ + type filter hook output priority 0; policy accept; }" + ovpn_cmd_ok "add nft drop rule for mark ${MARK}" \ + ip netns exec ovpn_peer0 nft add rule inet filter output \ + meta mark == "${MARK}" \ + counter drop + + MARK_DROP_COUNTER=$(ip netns exec ovpn_peer0 nft list chain inet \ + filter output | sed -n 's/.*packets \([0-9]*\).*/\1/p') + if [ -z "${MARK_DROP_COUNTER}" ]; then + printf '%s\n' "unable to read nft drop counter" + return 1 + fi +} + +ovpn_mark_verify_drop_traffic() { + local p + local ping_output + local lost_packets + local total_count + + for p in $(seq 1 3); do + if ping_output=$(ip netns exec ovpn_peer0 ping -qfc 500 -w 1 \ + 5.5.5.$((p + 1)) 2>&1); then + printf '%s\n' "expected ping to peer ${p} to fail \ + after nft drop rule" + return 1 + fi + ovpn_log "${ping_output}" + lost_packets=$(echo "${ping_output}" | \ + awk '/packets transmitted/ { print $1 }') + if [ -z "${lost_packets}" ]; then + printf '%s\n' "unable to parse lost packets for peer \ + ${p}" + return 1 + fi + MARK_DROP_COUNTER=$((MARK_DROP_COUNTER + 
lost_packets)) + done + + total_count=$(ip netns exec ovpn_peer0 nft list chain inet filter \ + output | sed -n 's/.*packets \([0-9]*\).*/\1/p') + if [ -z "${total_count}" ]; then + printf '%s\n' "unable to read final nft drop counter" + return 1 + fi + if [ "${MARK_DROP_COUNTER}" -ne "${total_count}" ]; then + printf '%s\n' "expected ${MARK_DROP_COUNTER} drops, got \ + ${total_count}" + return 1 + fi +} + +ovpn_mark_remove_drop_rule() { + ovpn_log "Removing the drop rule" + + ovpn_cmd_ok "flush nft ruleset" ip netns exec ovpn_peer0 nft flush \ + ruleset +} + +ovpn_mark_verify_traffic_recovery() { + local p + + sleep 1 + for p in $(seq 1 3); do + ovpn_cmd_ok "send recovery traffic to peer ${p}" \ + ip netns exec ovpn_peer0 ping -qfc 500 -w 3 \ + 5.5.5.$((p + 1)) + done +} + +trap ovpn_test_exit EXIT +trap ovpn_stage_err ERR + +ktap_print_header +ktap_set_plan 6 + +ovpn_cleanup modprobe -q ovpn || true -for p in $(seq 0 "${NUM_PEERS}"); do - create_ns "${p}" -done - -for p in $(seq 0 3); do - setup_ns "${p}" 5.5.5.$((p + 1))/24 -done - -# add peer0 with mark -ip netns exec peer0 "${OVPN_CLI}" new_multi_peer tun0 1 ASYMM \ - "${UDP_PEERS_FILE}" \ - ${MARK} -for p in $(seq 1 3); do - ip netns exec peer0 "${OVPN_CLI}" new_key tun0 "${p}" 1 0 "${ALG}" 0 \ - data64.key -done - -for p in $(seq 1 3); do - add_peer "${p}" -done - -for p in $(seq 1 3); do - ip netns exec peer0 "${OVPN_CLI}" set_peer tun0 "${p}" 60 120 - ip netns exec peer"${p}" "${OVPN_CLI}" set_peer tun"${p}" \ - $((p + 9)) 60 120 -done - -sleep 1 - -for p in $(seq 1 3); do - ip netns exec peer0 ping -qfc 500 -w 3 5.5.5.$((p + 1)) -done - -echo "Adding an nftables drop rule based on mark value ${MARK}" -ip netns exec peer0 nft flush ruleset -ip netns exec peer0 nft 'add table inet filter' -ip netns exec peer0 nft 'add chain inet filter output { - type filter hook output priority 0; - policy accept; -}' -ip netns exec peer0 nft add rule inet filter output \ - meta mark == ${MARK} \ - counter drop - -DROP_COUNTER=$(ip netns exec peer0 nft list chain inet filter output \ - | sed -n 's/.*packets \([0-9]*\).*/\1/p') -sleep 1 - -# ping should fail -for p in $(seq 1 3); do - PING_OUTPUT=$(ip netns exec peer0 ping \ - -qfc 500 -w 1 5.5.5.$((p + 1)) 2>&1) && exit 1 - echo "${PING_OUTPUT}" - LOST_PACKETS=$(echo "$PING_OUTPUT" \ - | awk '/packets transmitted/ { print $1 }') - # increment the drop counter by the amount of lost packets - DROP_COUNTER=$((DROP_COUNTER + LOST_PACKETS)) -done - -# check if the final nft counter matches our counter -TOTAL_COUNT=$(ip netns exec peer0 nft list chain inet filter output \ - | sed -n 's/.*packets \([0-9]*\).*/\1/p') -if [ "${DROP_COUNTER}" -ne "${TOTAL_COUNT}" ]; then - echo "Expected ${TOTAL_COUNT} drops, got ${DROP_COUNTER}" - exit 1 -fi - -echo "Removing the drop rule" -ip netns exec peer0 nft flush ruleset -sleep 1 - -for p in $(seq 1 3); do - ip netns exec peer0 ping -qfc 500 -w 3 5.5.5.$((p + 1)) -done - -cleanup - -modprobe -r ovpn || true +ovpn_run_stage "setup marked network topology" ovpn_mark_prepare_network +ovpn_run_stage "run baseline traffic" ovpn_mark_run_baseline_traffic +ovpn_run_stage "install nft mark drop rule" ovpn_mark_add_drop_rule +ovpn_run_stage "drop marked traffic and count packets" \ + ovpn_mark_verify_drop_traffic +ovpn_run_stage "remove nft drop rule" ovpn_mark_remove_drop_rule +ovpn_run_stage "traffic recovers after drop removal" \ + ovpn_mark_verify_traffic_recovery + +ovpn_test_finished=1 +ktap_finished diff --git 
a/tools/testing/selftests/net/ovpn/test-symmetric-id-float.sh b/tools/testing/selftests/net/ovpn/test-symmetric-id-float.sh index b3711a81b463..75296fe72c39 100755 --- a/tools/testing/selftests/net/ovpn/test-symmetric-id-float.sh +++ b/tools/testing/selftests/net/ovpn/test-symmetric-id-float.sh @@ -5,7 +5,7 @@ # Author: Ralf Lici <ralf@mandelbit.com> # Antonio Quartulli <antonio@openvpn.net> -SYMMETRIC_ID="1" -FLOAT="1" +OVPN_SYMMETRIC_ID="1" +OVPN_FLOAT="1" source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-symmetric-id-tcp.sh b/tools/testing/selftests/net/ovpn/test-symmetric-id-tcp.sh index 188cafb67b2f..680a465c49d2 100755 --- a/tools/testing/selftests/net/ovpn/test-symmetric-id-tcp.sh +++ b/tools/testing/selftests/net/ovpn/test-symmetric-id-tcp.sh @@ -5,7 +5,7 @@ # Author: Ralf Lici <ralf@mandelbit.com> # Antonio Quartulli <antonio@openvpn.net> -PROTO="TCP" -SYMMETRIC_ID=1 +OVPN_PROTO="TCP" +OVPN_SYMMETRIC_ID=1 source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-symmetric-id.sh b/tools/testing/selftests/net/ovpn/test-symmetric-id.sh index 35b119c72e4f..a2e2808959d9 100755 --- a/tools/testing/selftests/net/ovpn/test-symmetric-id.sh +++ b/tools/testing/selftests/net/ovpn/test-symmetric-id.sh @@ -5,6 +5,6 @@ # Author: Ralf Lici <ralf@mandelbit.com> # Antonio Quartulli <antonio@openvpn.net> -SYMMETRIC_ID="1" +OVPN_SYMMETRIC_ID="1" source test.sh diff --git a/tools/testing/selftests/net/ovpn/test-tcp.sh b/tools/testing/selftests/net/ovpn/test-tcp.sh index ba3f1f315a34..27cc6e7b98bc 100755 --- a/tools/testing/selftests/net/ovpn/test-tcp.sh +++ b/tools/testing/selftests/net/ovpn/test-tcp.sh @@ -4,6 +4,6 @@ # # Author: Antonio Quartulli <antonio@openvpn.net> -PROTO="TCP" +OVPN_PROTO="TCP" source test.sh diff --git a/tools/testing/selftests/net/ovpn/test.sh b/tools/testing/selftests/net/ovpn/test.sh index b60e94a4094e..b50dbe45a4d0 100755 --- a/tools/testing/selftests/net/ovpn/test.sh +++ b/tools/testing/selftests/net/ovpn/test.sh @@ -5,161 +5,316 @@ # Author: Antonio Quartulli <antonio@openvpn.net> #set -x -set -e +set -eE source ./common.sh -cleanup +ovpn_test_finished=0 -modprobe -q ovpn || true +ovpn_test_exit() { + ovpn_cleanup + modprobe -r ovpn || true + + if [ "${ovpn_test_finished}" -eq 0 ]; then + ktap_print_totals + fi +} + +ovpn_prepare_network() { + local p + local peer_ns + + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "create namespace peer${p}" ovpn_create_ns "${p}" + done + + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "start notification listener peer${p}" \ + ovpn_setup_listener "${p}" + # starting all YNL listeners back-to-back can intermittently + # stall their startup so serialize launches a bit + sleep 0.5 + done + + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "configure peer${p} namespace" ovpn_setup_ns \ + "${p}" 5.5.5.$((p + 1))/24 "${MTU}" + done + + for p in $(seq 0 ${OVPN_NUM_PEERS}); do + ovpn_cmd_ok "register peer${p} in overlay" ovpn_add_peer "${p}" + done + + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "set peer0 timeout for peer ${p}" \ + ip netns exec ovpn_peer0 ${OVPN_CLI} set_peer tun0 \ + ${p} 60 120 + ovpn_cmd_ok "set peer${p} timeout for peer ${p}" \ + ip netns exec "${peer_ns}" ${OVPN_CLI} set_peer \ + tun${p} $((p + OVPN_ID_OFFSET)) 60 120 + done +} + +ovpn_run_basic_traffic() { + local p + local header1 + local header2 + local peer_ns + local raddr + local tcpdump_pid1 + local tcpdump_pid2 + local tcpdump_timeout="1.5s" + + for p in $(seq 1 
${OVPN_NUM_PEERS}); do + # The first part of the data packet header consists of: + # - TCP only: 2 bytes for the packet length + # - 5 bits for opcode ("9" for DATA_V2) + # - 3 bits for key-id ("0" at this point) + # - 12 bytes for peer-id: + # - with asymmetric ID: "${p}" one way and "${p} + 9" the + # other way + # - with symmetric ID: "${p}" both ways + header1=$(printf "0x4800000%x" ${p}) + header2=$(printf "0x4800000%x" $((p + OVPN_ID_OFFSET))) + raddr="" + if [ "${OVPN_PROTO}" == "UDP" ]; then + raddr=$(awk "NR == ${p} {print \$3}" \ + "${OVPN_UDP_PEERS_FILE}") + fi + peer_ns="ovpn_peer${p}" + + timeout ${tcpdump_timeout} ip netns exec "${peer_ns}" \ + tcpdump --immediate-mode -p -ni veth${p} -c 1 \ + "$(ovpn_build_capture_filter "${header1}" "${raddr}")" \ + >/dev/null 2>&1 & + tcpdump_pid1=$! + timeout ${tcpdump_timeout} ip netns exec "${peer_ns}" \ + tcpdump --immediate-mode -p -ni veth${p} -c 1 \ + "$(ovpn_build_capture_filter "${header2}" "${raddr}")" \ + >/dev/null 2>&1 & + tcpdump_pid2=$! + + sleep 0.3 + ovpn_cmd_ok "send baseline traffic to peer ${p}" \ + ip netns exec ovpn_peer0 \ + ping -qfc 500 -w 3 5.5.5.$((p + 1)) + ovpn_cmd_ok "send large-payload traffic to peer ${p}" \ + ip netns exec ovpn_peer0 \ + ping -qfc 500 -s 3000 -w 3 5.5.5.$((p + 1)) + + wait "${tcpdump_pid1}" || return 1 + wait "${tcpdump_pid2}" || return 1 + done +} + +ovpn_run_lan_traffic() { + ovpn_cmd_ok "ping LAN behind peer1" \ + ip netns exec ovpn_peer0 ping -qfc 500 -w 3 "${OVPN_LAN_IP}" +} + +ovpn_run_float_mode() { + local p + local peer_ns + + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "float: remove old transport address on peer${p}" \ + ip -n "${peer_ns}" addr del 10.10.${p}.2/24 dev veth${p} + ovpn_cmd_ok "float: add new transport address on peer${p}" \ + ip -n "${peer_ns}" addr add 10.10.${p}.3/24 dev veth${p} + done + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "ping tunnel after float peer ${p}" \ + ip netns exec "${peer_ns}" ping -qfc 500 -w 3 5.5.5.1 + done +} + +ovpn_run_iperf() { + local iperf_pid + + ovpn_run_bg iperf_pid ip netns exec ovpn_peer0 iperf3 -1 -s + sleep 1 + + ovpn_cmd_ok "run iperf throughput flow" \ + ip netns exec ovpn_peer1 iperf3 -Z -t 3 -c 5.5.5.1 + wait "${iperf_pid}" || return 1 +} + +ovpn_run_key_rollover() { + local p + local peer_ns + + ovpn_log "Adding secondary key and then swap:" + + for p in $(seq 1 ${OVPN_NUM_PEERS}); do + peer_ns="ovpn_peer${p}" + ovpn_cmd_ok "add secondary key on peer0 for peer ${p}" \ + ip netns exec ovpn_peer0 ${OVPN_CLI} new_key tun0 \ + ${p} 2 1 ${OVPN_ALG} 0 data64.key + ovpn_cmd_ok "add secondary key on peer${p} for peer ${p}" \ + ip netns exec "${peer_ns}" ${OVPN_CLI} new_key tun${p} \ + $((p + OVPN_ID_OFFSET)) 2 1 ${OVPN_ALG} 1 \ + data64.key + ovpn_cmd_ok "swap keys on peer${p}" \ + ip netns exec "${peer_ns}" ${OVPN_CLI} swap_keys \ + tun${p} $((p + OVPN_ID_OFFSET)) + done +} -for p in $(seq 0 ${NUM_PEERS}); do - create_ns ${p} -done - -for p in $(seq 0 ${NUM_PEERS}); do - setup_listener ${p} -done - -for p in $(seq 0 ${NUM_PEERS}); do - setup_ns ${p} 5.5.5.$((${p} + 1))/24 ${MTU} -done - -for p in $(seq 0 ${NUM_PEERS}); do - add_peer ${p} -done - -for p in $(seq 1 ${NUM_PEERS}); do - ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 60 120 - ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} \ - $((${p}+ID_OFFSET)) 60 120 -done - -sleep 1 - -TCPDUMP_TIMEOUT="1.5s" -for p in $(seq 1 ${NUM_PEERS}); do - # The first part of the data packet header consists of: 
-	# - TCP only: 2 bytes for the packet length
-	# - 5 bits for opcode ("9" for DATA_V2)
-	# - 3 bits for key-id ("0" at this point)
-	# - 12 bytes for peer-id:
-	#   - with asymmetric ID: "${p}" one way and "${p} + 9" the other way
-	#   - with symmetric ID: "${p}" both ways
-	HEADER1=$(printf "0x4800000%x" ${p})
-	HEADER2=$(printf "0x4800000%x" $((${p} + ID_OFFSET)))
-	RADDR=""
-	if [ "${PROTO}" == "UDP" ]; then
-		RADDR=$(awk "NR == ${p} {print \$3}" ${UDP_PEERS_FILE})
+ovpn_run_queries() {
+	ovpn_log "Querying all peers:"
+
+	ovpn_cmd_ok "query all peers from peer0" \
+		ip netns exec ovpn_peer0 ${OVPN_CLI} get_peer tun0
+	ovpn_cmd_ok "query all peers from peer1" \
+		ip netns exec ovpn_peer1 ${OVPN_CLI} get_peer tun1
+
+	ovpn_log "Querying peer 1:"
+
+	ovpn_cmd_ok "query peer 1 from peer0" \
+		ip netns exec ovpn_peer0 ${OVPN_CLI} get_peer tun0 1
+}
+
+ovpn_query_peer_missing() {
+	ovpn_log "Querying non-existent peer 20:"
+
+	ovpn_cmd_fail "query missing peer 20 on peer0" \
+		ip netns exec ovpn_peer0 ${OVPN_CLI} get_peer tun0 20
+}
+
+ovpn_run_peer_cleanup() {
+	local p
+	local peer_ns
+
+	ovpn_log "Deleting peer 1:"
+
+	ovpn_cmd_ok "delete peer1 on peer0" \
+		ip netns exec ovpn_peer0 ${OVPN_CLI} del_peer tun0 1
+	ovpn_cmd_ok "delete peer1 on peer1" \
+		ip netns exec ovpn_peer1 ${OVPN_CLI} del_peer tun1 \
+		$((1 + OVPN_ID_OFFSET))
+
+	ovpn_log "Querying keys:"
+
+	for p in $(seq 2 ${OVPN_NUM_PEERS}); do
+		peer_ns="ovpn_peer${p}"
+		ovpn_cmd_ok "query peer${p} key 1" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} get_key tun${p} \
+			$((p + OVPN_ID_OFFSET)) 1
+		ovpn_cmd_ok "query peer${p} key 2" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} get_key tun${p} \
+			$((p + OVPN_ID_OFFSET)) 2
+	done
+}
+
+ovpn_run_traffic_delete_peer() {
+	local ping_pid
+
+	ovpn_log "Deleting peer while sending traffic:"
+
+	ovpn_run_bg ping_pid ip netns exec ovpn_peer2 ping -qf -w 4 5.5.5.1
+	sleep 2
+	ovpn_cmd_ok "delete peer 2 on peer0" \
+		ip netns exec ovpn_peer0 ${OVPN_CLI} del_peer tun0 2
+
+	if [ "${OVPN_PROTO}" == "TCP" ]; then
+		# In TCP mode this command may fail: both ends get a
+		# connection reset when one peer disconnects.
+		ovpn_cmd_mayfail "delete peer 2 on peer2 (TCP non-fatal)" \
+			ip netns exec ovpn_peer2 ${OVPN_CLI} del_peer tun2 \
+			$((2 + OVPN_ID_OFFSET))
+	else
+		ovpn_cmd_ok "delete peer 2 on peer2" ip netns exec ovpn_peer2 \
+			${OVPN_CLI} del_peer tun2 $((2 + OVPN_ID_OFFSET))
 	fi
-	timeout ${TCPDUMP_TIMEOUT} ip netns exec peer${p} \
-		tcpdump --immediate-mode -p -ni veth${p} -c 1 \
-		"$(build_capture_filter "${HEADER1}" "${RADDR}")" \
-		>/dev/null 2>&1 &
-	TCPDUMP_PID1=$!
-	timeout ${TCPDUMP_TIMEOUT} ip netns exec peer${p} \
-		tcpdump --immediate-mode -p -ni veth${p} -c 1 \
-		"$(build_capture_filter "${HEADER2}" "${RADDR}")" \
-		>/dev/null 2>&1 &
-	TCPDUMP_PID2=$!
-
-	sleep 0.3
-	ip netns exec peer0 ping -qfc 500 -w 3 5.5.5.$((${p} + 1))
-	ip netns exec peer0 ping -qfc 500 -s 3000 -w 3 5.5.5.$((${p} + 1))
-
-	wait ${TCPDUMP_PID1}
-	wait ${TCPDUMP_PID2}
-done
-
-# ping LAN behind client 1
-ip netns exec peer0 ping -qfc 500 -w 3 ${LAN_IP}
-
-if [ "$FLOAT" == "1" ]; then
-	# make clients float..
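+	# Reap the background ping regardless of outcome: the peer
+	# deletion above may have cut it short, so its exit status is
+	# deliberately ignored.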
-	for p in $(seq 1 ${NUM_PEERS}); do
-		ip -n peer${p} addr del 10.10.${p}.2/24 dev veth${p}
-		ip -n peer${p} addr add 10.10.${p}.3/24 dev veth${p}
+	wait "${ping_pid}" || true
+}
+
+ovpn_run_key_cleanup() {
+	local p
+	local peer_ns
+
+	ovpn_log "Deleting keys:"
+
+	for p in $(seq 3 ${OVPN_NUM_PEERS}); do
+		peer_ns="ovpn_peer${p}"
+		ovpn_cmd_ok "delete key 1 for peer${p}" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} del_key tun${p} \
+			$((p + OVPN_ID_OFFSET)) 1
+		ovpn_cmd_ok "delete key 2 for peer${p}" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} del_key tun${p} \
+			$((p + OVPN_ID_OFFSET)) 2
+	done
+}
+
+ovpn_run_timeouts() {
+	local p
+	local peer_ns
+
+	ovpn_log "Setting timeout to 3s MP:"
+
+	for p in $(seq 3 ${OVPN_NUM_PEERS}); do
+		# Non-fatal: this may fail in some protocol modes.
+		ovpn_cmd_mayfail "set peer0 timeout for peer ${p} (non-fatal)" \
+			ip netns exec ovpn_peer0 ${OVPN_CLI} set_peer tun0 \
+			${p} 3 3
+		peer_ns="ovpn_peer${p}"
+		ovpn_cmd_ok "disable timeout on peer${p} while peer0 adjusts state" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} set_peer \
+			tun${p} $((p + OVPN_ID_OFFSET)) 0 0
+	done
+	# wait for the peers to time out
+	sleep 5
+
+	ovpn_log "Setting timeout to 3s P2P:"
+
+	for p in $(seq 3 ${OVPN_NUM_PEERS}); do
+		peer_ns="ovpn_peer${p}"
+		ovpn_cmd_ok "set peer${p} P2P timeout" \
+			ip netns exec "${peer_ns}" ${OVPN_CLI} set_peer \
+			tun${p} $((p + OVPN_ID_OFFSET)) 3 3
 	done
-	for p in $(seq 1 ${NUM_PEERS}); do
-		ip netns exec peer${p} ping -qfc 500 -w 3 5.5.5.1
+	sleep 5
+}
+
+ovpn_run_notifications() {
+	local p
+
+	for p in $(seq 0 ${OVPN_NUM_PEERS}); do
+		ovpn_cmd_ok "validate listener output for peer ${p}" \
+			ovpn_compare_ntfs "${p}"
 	done
+}
+
+trap ovpn_test_exit EXIT
+trap ovpn_stage_err ERR
+
+ktap_print_header
+if [ "${OVPN_FLOAT}" == "1" ]; then
+	ktap_set_plan 13
+else
+	ktap_set_plan 12
 fi
-ip netns exec peer0 iperf3 -1 -s &
-sleep 1
-ip netns exec peer1 iperf3 -Z -t 3 -c 5.5.5.1
-
-echo "Adding secondary key and then swap:"
-for p in $(seq 1 ${NUM_PEERS}); do
-	ip netns exec peer0 ${OVPN_CLI} new_key tun0 ${p} 2 1 ${ALG} 0 \
-		data64.key
-	ip netns exec peer${p} ${OVPN_CLI} new_key tun${p} \
-		$((${p} + ID_OFFSET)) 2 1 ${ALG} 1 data64.key
-	ip netns exec peer${p} ${OVPN_CLI} swap_keys tun${p} \
-		$((${p} + ID_OFFSET))
-done
-
-sleep 1
-
-echo "Querying all peers:"
-ip netns exec peer0 ${OVPN_CLI} get_peer tun0
-ip netns exec peer1 ${OVPN_CLI} get_peer tun1
-
-echo "Querying peer 1:"
-ip netns exec peer0 ${OVPN_CLI} get_peer tun0 1
-
-echo "Querying non-existent peer 20:"
-ip netns exec peer0 ${OVPN_CLI} get_peer tun0 20 || true
-
-echo "Deleting peer 1:"
-ip netns exec peer0 ${OVPN_CLI} del_peer tun0 1
-ip netns exec peer1 ${OVPN_CLI} del_peer tun1 $((1 + ID_OFFSET))
-
-echo "Querying keys:"
-for p in $(seq 2 ${NUM_PEERS}); do
-	ip netns exec peer${p} ${OVPN_CLI} get_key tun${p} \
-		$((${p} + ID_OFFSET)) 1
-	ip netns exec peer${p} ${OVPN_CLI} get_key tun${p} \
-		$((${p} + ID_OFFSET)) 2
-done
-
-echo "Deleting peer while sending traffic:"
-(ip netns exec peer2 ping -qf -w 4 5.5.5.1)&
-sleep 2
-ip netns exec peer0 ${OVPN_CLI} del_peer tun0 2
-# following command fails in TCP mode
-# (both ends get conn reset when one peer disconnects)
-ip netns exec peer2 ${OVPN_CLI} del_peer tun2 $((2 + ID_OFFSET)) || true
-
-echo "Deleting keys:"
-for p in $(seq 3 ${NUM_PEERS}); do
-	ip netns exec peer${p} ${OVPN_CLI} del_key tun${p} \
-		$((${p} + ID_OFFSET)) 1
-	ip netns exec peer${p} ${OVPN_CLI} del_key tun${p} \
-		$((${p} + ID_OFFSET)) 2
-done
-
-echo "Setting timeout to 3s MP:"
MP:" -for p in $(seq 3 ${NUM_PEERS}); do - ip netns exec peer0 ${OVPN_CLI} set_peer tun0 ${p} 3 3 || true - ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} \ - $((${p} + ID_OFFSET)) 0 0 -done -# wait for peers to timeout -sleep 5 - -echo "Setting timeout to 3s P2P:" -for p in $(seq 3 ${NUM_PEERS}); do - ip netns exec peer${p} ${OVPN_CLI} set_peer tun${p} \ - $((${p} + ID_OFFSET)) 3 3 -done -sleep 5 - -for p in $(seq 0 ${NUM_PEERS}); do - compare_ntfs ${p} -done - -cleanup - -modprobe -r ovpn || true +ovpn_cleanup +modprobe -q ovpn || true + +ovpn_run_stage "setup network topology" ovpn_prepare_network +ovpn_run_stage "run baseline data traffic" ovpn_run_basic_traffic +ovpn_run_stage "run LAN traffic behind peer1" ovpn_run_lan_traffic +[ "${OVPN_FLOAT}" == "1" ] && ovpn_run_stage "run floating peer checks" \ + ovpn_run_float_mode +ovpn_run_stage "run iperf throughput" ovpn_run_iperf +ovpn_run_stage "run key rollout" ovpn_run_key_rollover +ovpn_run_stage "query peers" ovpn_run_queries +ovpn_run_stage "query missing peer fails" ovpn_query_peer_missing +ovpn_run_stage "peer lifecycle and key queries" ovpn_run_peer_cleanup +ovpn_run_stage "delete peer while traffic" ovpn_run_traffic_delete_peer +ovpn_run_stage "delete stale keys" ovpn_run_key_cleanup +ovpn_run_stage "check timeout behavior" ovpn_run_timeouts +ovpn_run_stage "validate notification output" ovpn_run_notifications + +ovpn_test_finished=1 +ktap_finished diff --git a/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-out-of-window.pkt b/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-out-of-window.pkt new file mode 100644 index 000000000000..2776b8728085 --- /dev/null +++ b/tools/testing/selftests/net/packetdrill/tcp_rfc5961_ack-out-of-window.pkt @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// RFC 5961 Section 5.2 / RFC 793 Section 3.9: an incoming segment's +// ACK value must lie in [SND.UNA - MAX.SND.WND, SND.NXT]; otherwise +// the receiver MUST discard the segment and send a challenge ACK +// back. Exercise both edges of that window in a single connection. + +`./defaults.sh +sysctl -q net.ipv4.tcp_invalid_ratelimit=0 +` + + 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 + +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 + +0 bind(3, ..., ...) = 0 + +0 listen(3, 1) = 0 + +// Three-way handshake. Peer advertises rwnd = 1000 (no wscale), so +// MAX.SND.WND is tracked as 1000. + +0 < S 0:0(0) win 1000 <mss 1000,sackOK,nop,nop,nop,wscale 0> + +0 > S. 0:0(0) ack 1 <...> ++.1 < . 1:1(0) ack 1 win 1000 + +0 accept(3, ..., ...) = 4 + +// ---- Upper edge: SEG.ACK > SND.NXT -------------------------------- +// Server has sent nothing yet, so SND.UNA = SND.NXT = 1. +// Peer sends a pure ACK with SEG.ACK = 2, beyond SND.NXT. + +0 < . 1:1(0) ack 2 win 1000 +// Expect a challenge ACK: <SEQ = SND.NXT = 1, ACK = RCV.NXT = 1>. + +0 > . 1:1(0) ack 1 + +// Advance SND.UNA past MAX.SND.WND so that the lower edge becomes +// reachable. Issue two 1-MSS writes so each skb is exactly one MSS +// and PSH is set by tcp_push() at the end of each sendmsg, keeping +// the setup independent of the TSO / tcp_fragment split path. + +0 write(4, ..., 1000) = 1000 + +0 > P. 1:1001(1000) ack 1 ++.01 < . 1:1(0) ack 1001 win 1000 + +0 write(4, ..., 1000) = 1000 + +0 > P. 1001:2001(1000) ack 1 ++.01 < . 1:1(0) ack 2001 win 1000 +// Now SND.UNA = SND.NXT = 2001, MAX.SND.WND = 1000, bytes_acked = 2000. 
+
+// ---- Lower edge: SEG.ACK < SND.UNA - MAX.SND.WND ------------------
+// SND.UNA - MAX.SND.WND = 2001 - 1000 = 1001, so SEG.ACK = 1000 falls
+// below the acceptable range.
+  +0 < . 1:1(0) ack 1000 win 1000
+// Expect a challenge ACK: <SEQ = SND.NXT = 2001, ACK = RCV.NXT = 1>.
+  +0 > . 2001:2001(0) ack 1
diff --git a/tools/testing/selftests/net/packetdrill/tcp_ts_recent_invalid_ack.pkt b/tools/testing/selftests/net/packetdrill/tcp_ts_recent_invalid_ack.pkt
index 174ce9a1bfc0..ee6baf7c36cf 100644
--- a/tools/testing/selftests/net/packetdrill/tcp_ts_recent_invalid_ack.pkt
+++ b/tools/testing/selftests/net/packetdrill/tcp_ts_recent_invalid_ack.pkt
@@ -19,7 +19,9 @@
 // bad packet with high tsval (its ACK sequence is above our sndnxt)
 +0 < F. 1:1(0) ack 9999 win 20000 <nop,nop,TS val 200000 ecr 100>
-
+// Challenge ACK for SEG.ACK > SND.NXT (RFC 5961 5.2 / RFC 793 3.9).
+// ecr=200 (not 200000) proves ts_recent was not updated by the bad packet.
+
 +0 > . 1:1(0) ack 1 <nop,nop,TS val 200 ecr 200>
 +0 < . 1:1001(1000) ack 1 win 20000 <nop,nop,TS val 201 ecr 100>
 +0 > . 1:1(0) ack 1001 <nop,nop,TS val 200 ecr 201>
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index 5a5ff88321d5..c499953d4885 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -23,6 +23,7 @@ ALL_TESTS="
 	kci_test_encap
 	kci_test_macsec
 	kci_test_macsec_vlan
+	kci_test_team_bridge_macvlan
 	kci_test_ipsec
 	kci_test_ipsec_offload
 	kci_test_fdb_get
@@ -636,6 +637,49 @@ kci_test_macsec_vlan()
 	end_test "PASS: macsec_vlan"
 }
 
+# Test an ndo_change_rx_flags call triggered from dev_uc_add under the
+# addr_list_lock spinlock: when promiscuous mode is toggled there, make
+# sure the handler runs from the work queue.
+#
+# https://lore.kernel.org/netdev/20260214033859.43857-1-jiayuan.chen@linux.dev/
+# Uses the (more conventional) macvlan instead of macsec.
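+#
+# Rough mechanism (inferred from the change referenced above): adding
+# the macvlan's MAC as a unicast address propagates down the stack
+# while addr_list_lock is held and can end up toggling promiscuous
+# mode on the team port, so a sleeping ndo_change_rx_flags handler
+# must be deferred to the work queue rather than called in that atomic
+# context. The stack exercised is: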
+# macvlan -> bridge -> team -> dummy
+kci_test_team_bridge_macvlan()
+{
+	local vlan="test_macv1"
+	local bridge="test_br1"
+	local team="test_team1"
+	local dummy="test_dummy1"
+	local ret=0
+
+	run_cmd ip link add $team type team
+	if [ $ret -ne 0 ]; then
+		end_test "SKIP: team_bridge_macvlan: can't add team interface"
+		return $ksft_skip
+	fi
+
+	run_cmd ip link add $dummy type dummy
+	run_cmd ip link set $dummy master $team
+	run_cmd ip link set $team up
+	run_cmd ip link add $bridge type bridge vlan_filtering 1
+	run_cmd ip link set $bridge up
+	run_cmd ip link set $team master $bridge
+	run_cmd ip link add link $bridge name $vlan \
+		address 00:aa:bb:cc:dd:ee type macvlan mode bridge
+	run_cmd ip link set $vlan up
+
+	run_cmd ip link del $vlan
+	run_cmd ip link del $bridge
+	run_cmd ip link del $team
+	run_cmd ip link del $dummy
+
+	if [ $ret -ne 0 ]; then
+		end_test "FAIL: team_bridge_macvlan"
+		return 1
+	fi
+
+	end_test "PASS: team_bridge_macvlan"
+}
+
 #-------------------------------------------------------------------
 # Example commands
 #	ip x s add proto esp src 14.0.0.52 dst 14.0.0.70 \
diff --git a/tools/testing/selftests/net/tcp_ao/config b/tools/testing/selftests/net/tcp_ao/config
index 971cb6fa2d63..f22148512365 100644
--- a/tools/testing/selftests/net/tcp_ao/config
+++ b/tools/testing/selftests/net/tcp_ao/config
@@ -1,3 +1,4 @@
+CONFIG_CRYPTO_CMAC=y
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_RMD160=y
 CONFIG_CRYPTO_SHA1=y
diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 1fe1338c79cd..fe316b02a590 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -381,8 +381,14 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
 	}
 }
 
+#define RECV_PEEK_RETRY_USEC (10 * 1000)
+
 /* Receive bytes in a buffer and check the return value.
  *
+ * When MSG_PEEK is set, recv() is retried until it returns at least
+ * expected_ret bytes. The function returns on error, EOF, or timeout
+ * as usual.
+ *
  * expected_ret:
  *	<0	Negative errno (for testing errors)
  *	 0	End-of-file
@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
 		if (ret <= 0)
 			break;
 
+		if (flags & MSG_PEEK) {
+			if (ret >= expected_ret) {
+				nread = ret;
+				break;
+			}
+			timeout_usleep(RECV_PEEK_RETRY_USEC);
+			continue;
+		}
+
 		nread += ret;
 	} while (nread < len);
 	timeout_end();
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5bd20ccd9335..76be0e4a7f0e 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -346,6 +346,38 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
 	return test_msg_peek_server(opts, false);
 }
 
+static void test_stream_peek_after_recv_server(const struct test_opts *opts)
+{
+	unsigned char buf_normal[MSG_PEEK_BUF_LEN];
+	unsigned char buf_peek[MSG_PEEK_BUF_LEN];
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("SRVREADY");
+
+	/* Partial recv to advance offset within the skb */
+	recv_buf(fd, buf_normal, 1, 0, 1);
+
+	/* Peek with a buffer larger than the remaining data */
+	recv_buf(fd, buf_peek, sizeof(buf_peek), MSG_PEEK, sizeof(buf_peek) - 1);
+
+	/* Consume the remaining data */
+	recv_buf(fd, buf_normal, sizeof(buf_normal) - 1, 0, sizeof(buf_normal) - 1);
+
+	/* Compare full peek and normal read. */
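+	/*
+	 * Both buffers now hold bytes 1..MSG_PEEK_BUF_LEN-1 of the
+	 * original message (byte 0 was consumed by the partial recv
+	 * above), so any mismatch means MSG_PEEK returned stale or
+	 * shifted data.
+	 */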
+	if (memcmp(buf_peek, buf_normal, sizeof(buf_peek) - 1)) {
+		fprintf(stderr, "Full peek data mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	close(fd);
+}
+
 #define SOCK_BUF_SIZE (2 * 1024 * 1024)
 #define SOCK_BUF_SIZE_SMALL (64 * 1024)
 #define MAX_MSG_PAGES 4
@@ -1500,18 +1532,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
 	}
 
 	/* Wait until there will be 128KB of data in rx queue. */
-	while (1) {
-		ssize_t res;
-
-		res = recv(fd, buf, buf_size, MSG_PEEK);
-		if (res == buf_size)
-			break;
-
-		if (res <= 0) {
-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
-			exit(EXIT_FAILURE);
-		}
-	}
+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);
 
 	/* There is 128KB of data in the socket's rx queue, dequeue first
 	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.
@@ -2520,6 +2541,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_tx_credit_bounds_client,
 		.run_server = test_stream_tx_credit_bounds_server,
 	},
+	{
+		.name = "SOCK_STREAM MSG_PEEK after partial recv",
+		.run_client = test_stream_msg_peek_client,
+		.run_server = test_stream_peek_after_recv_server,
+	},
 	{},
 };
