| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: sti: use managed regmap_field allocations
The regmap_field objects allocated at player init are never freed and
may leak resources if the driver is removed.
Switch to devm_regmap_field_alloc() to automatically limit the lifetime
of the allocations the lifetime of the device. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/komeda: fix integer overflow in AFBC framebuffer size check
The AFBC framebuffer size validation calculates the minimum required
buffer size by adding the AFBC payload size to the framebuffer offset.
This addition is performed without checking for integer overflow.
If the addition oveflows, the size check may incorrectly succed and
allow userspace to provide an undersized drm_gem_object, potentially
leading to out-of-bounds memory access.
Add usage of check_add_overflow() to safely compute the minimum
required size and reject the framebuffer if an overflow is detected.
This makes the AFBC size validation more robust against malformed.
Found by Linux Verification Center (linuxtesting.org) with SVACE. |
| In the Linux kernel, the following vulnerability has been resolved:
net/sched: act_mirred: fix wrong device for mac_header_xmit check in tcf_blockcast_redir
In tcf_blockcast_redir(), when iterating block ports to redirect
packets to multiple devices, the mac_header_xmit flag is queried
from the wrong device. The loop sends to dev_prev but queries
dev_is_mac_header_xmit(dev) — which is the NEXT device in the
iteration, not the one being sent to.
This causes tcf_mirred_to_dev() to make incorrect decisions about
whether to push or pull the MAC header. When the block contains
mixed device types (e.g., an ethernet veth and a tunnel device),
intermediate devices get the wrong mac_header_xmit flag, leading to
skb header corruption. In the worst case, skb_push_rcsum with an
incorrect mac_len can exhaust headroom and panic.
The last device in the loop is handled correctly (line 365-366 uses
dev_is_mac_header_xmit(dev_prev)), confirming this is a copy-paste
oversight for the intermediate devices.
Fix by using dev_prev instead of dev for the mac_header_xmit query,
consistent with the device actually being sent to. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv6: fix possible UAF in icmpv6_rcv()
Caching saddr and daddr before pskb_pull() is problematic
since skb->head can change.
Remove these temporary variables:
- We only access &ipv6_hdr(skb)->saddr and &ipv6_hdr(skb)->daddr
when net_dbg_ratelimited() is called in the slow path.
- Avoid potential future misuse after pskb_pull() call. |
| In the Linux kernel, the following vulnerability has been resolved:
dm log: fix out-of-bounds write due to region_count overflow
The local variable region_count in create_log_context() is declared as
unsigned int (32-bit), but dm_sector_div_up() returns sector_t (64-bit).
When a device-mapper target has a sufficiently large ti->len with a small
region_size, the division result can exceed UINT_MAX. The truncated
value is then used to calculate bitset_size, causing clean_bits,
sync_bits, and recovering_bits to be allocated far smaller than needed
for the actual number of regions.
Subsequent log operations (log_set_bit, log_clear_bit, log_test_bit) use
region indices derived from the full untruncated region space, causing
out-of-bounds writes to kernel heap memory allocated by vmalloc.
This can be reproduced by creating a mirror target whose region_count
overflows 32 bits:
dmsetup create bigzero --table '0 8589934594 zero'
dmsetup create mymirror --table '0 8589934594 mirror \
core 2 2 nosync 2 /dev/mapper/bigzero 0 \
/dev/mapper/bigzero 0'
The status output confirms the truncation (sync_count=1 instead of
4294967297, because 0x100000001 was truncated to 1):
$ dmsetup status mymirror
0 8589934594 mirror 2 254:1 254:1 1/4294967297 ...
This leads to a kernel crash in core_in_sync:
BUG: scheduling while atomic: (udev-worker)/9150/0x00000000
RIP: 0010:core_in_sync+0x14/0x30 [dm_log]
CR2: 0000000000000008
Fixing recursive fault but reboot is needed!
Fix by widening the local region_count to sector_t and adding an
explicit overflow check before the value is assigned to lc->region_count. |
| In the Linux kernel, the following vulnerability has been resolved:
efi/capsule-loader: fix incorrect sizeof in phys array reallocation
The krealloc() call for cap_info->phys in __efi_capsule_setup_info() uses
sizeof(phys_addr_t *) instead of sizeof(phys_addr_t), which might be
causing an undersized allocation.
The allocation is also inconsistent with the initial array allocation in
efi_capsule_open() that allocates one entry with sizeof(phys_addr_t),
and the efi_capsule_write() function that stores phys_addr_t values (not
pointers) via page_to_phys().
On 64-bit systems where sizeof(phys_addr_t) == sizeof(phys_addr_t *), this
goes unnoticed. On 32-bit systems with PAE where phys_addr_t is 64-bit but
pointers are 32-bit, this allocates half the required space, which might
lead to a heap buffer overflow when storing physical addresses.
This is similar to the bug fixed in commit fccfa646ef36 ("efi/capsule-loader:
fix incorrect allocation size") which fixed the same issue at the initial
allocation site. |
| In the Linux kernel, the following vulnerability has been resolved:
gfs2: prevent NULL pointer dereference during unmount
When flushing out outstanding glock work during an unmount, gfs2_log_flush()
can be called when sdp->sd_jdesc has already been deallocated and sdp->sd_jdesc
is NULL. Commit 35264909e9d1 ("gfs2: Fix NULL pointer dereference in
gfs2_log_flush") added a check for that to gfs2_log_flush() itself, but it
missed the sdp->sd_jdesc dereference in gfs2_log_release(). Fix that. |
| In the Linux kernel, the following vulnerability has been resolved:
quota: Fix race of dquot_scan_active() with quota deactivation
dquot_scan_active() can race with quota deactivation in
quota_release_workfn() like:
CPU0 (quota_release_workfn) CPU1 (dquot_scan_active)
============================== ==============================
spin_lock(&dq_list_lock);
list_replace_init(
&releasing_dquots, &rls_head);
/* dquot X on rls_head,
dq_count == 0,
DQ_ACTIVE_B still set */
spin_unlock(&dq_list_lock);
synchronize_srcu(&dquot_srcu);
spin_lock(&dq_list_lock);
list_for_each_entry(dquot,
&inuse_list, dq_inuse) {
/* finds dquot X */
dquot_active(X) -> true
atomic_inc(&X->dq_count);
}
spin_unlock(&dq_list_lock);
spin_lock(&dq_list_lock);
dquot = list_first_entry(&rls_head);
WARN_ON_ONCE(atomic_read(&dquot->dq_count));
The problem is not only a cosmetic one as under memory pressure the
caller of dquot_scan_active() can end up working on freed dquot.
Fix the problem by making sure the dquot is removed from releasing list
when we acquire a reference to it. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nfnetlink_osf: fix out-of-bounds read on option matching
In nf_osf_match(), the nf_osf_hdr_ctx structure is initialized once
and passed by reference to nf_osf_match_one() for each fingerprint
checked. During TCP option parsing, nf_osf_match_one() advances the
shared ctx->optp pointer.
If a fingerprint perfectly matches, the function returns early without
restoring ctx->optp to its initial state. If the user has configured
NF_OSF_LOGLEVEL_ALL, the loop continues to the next fingerprint.
However, because ctx->optp was not restored, the next call to
nf_osf_match_one() starts parsing from the end of the options buffer.
This causes subsequent matches to read garbage data and fail
immediately, making it impossible to log more than one match or logging
incorrect matches.
Instead of using a shared ctx->optp pointer, pass the context as a
constant pointer and use a local pointer (optp) for TCP option
traversal. This makes nf_osf_match_one() strictly stateless from the
caller's perspective, ensuring every fingerprint check starts at the
correct option offset. |
| In the Linux kernel, the following vulnerability has been resolved:
erofs: unify lcn as u64 for 32-bit platforms
As sashiko reported [1], `lcn` was typed as `unsigned long` (or
`unsigned int` sometimes), which is only 32 bits wide on 32-bit
platforms, which causes `(lcn << lclusterbits)` to be truncated
at 4 GiB.
In order to consolidate the logic, just use `u64` consistently
around the codebase.
[1] https://sashiko.dev/r/20260420034612.1899973-1-hsiangkao%40linux.alibaba.com |
| In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: fix missing run load for vcn0 in attr_data_get_block_locked()
When a compressed or sparse attribute has its clusters frame-aligned,
vcn is rounded down to the frame start using cmask, which can result
in vcn != vcn0. In this case, vcn and vcn0 may reside in different
attribute segments.
The code already handles the case where vcn is in a different segment
by loading its runs before allocation. However, it fails to load runs
for vcn0 when vcn0 resides in a different segment than vcn. This causes
run_lookup_entry() to return SPARSE_LCN for vcn0 since its segment was
never loaded into the in-memory run list, triggering the WARN_ON(1).
Fix this by adding a missing check for vcn0 after the existing vcn
segment check. If vcn0 falls outside the current segment range
[svcn, evcn1), find and load the attribute segment containing vcn0
before performing the run lookup.
The following scenario triggers the bug:
attr_data_get_block_locked()
vcn = vcn0 & cmask <- vcn != vcn0 after frame alignment
load runs for vcn segment <- vcn0 segment not loaded!
attr_allocate_clusters() <- allocation succeeds
run_lookup_entry(vcn0) <- vcn0 not in run -> SPARSE_LCN
WARN_ON(1) <- bug fires here! |
| In the Linux kernel, the following vulnerability has been resolved:
bpf, sockmap: Take state lock for af_unix iter
When a BPF iterator program updates a sockmap, there is a race condition in
unix_stream_bpf_update_proto() where the `peer` pointer can become stale[1]
during a state transition TCP_ESTABLISHED -> TCP_CLOSE.
CPU0 bpf CPU1 close
-------- ----------
// unix_stream_bpf_update_proto()
sk_pair = unix_peer(sk)
if (unlikely(!sk_pair))
return -EINVAL;
// unix_release_sock()
skpair = unix_peer(sk);
unix_peer(sk) = NULL;
sock_put(skpair)
sock_hold(sk_pair) // UaF
More practically, this fix guarantees that the iterator program is
consistently provided with a unix socket that remains stable during
iterator execution.
[1]:
BUG: KASAN: slab-use-after-free in unix_stream_bpf_update_proto+0x155/0x490
Write of size 4 at addr ffff8881178c9a00 by task test_progs/2231
Call Trace:
dump_stack_lvl+0x5d/0x80
print_report+0x170/0x4f3
kasan_report+0xe4/0x1c0
kasan_check_range+0x125/0x200
unix_stream_bpf_update_proto+0x155/0x490
sock_map_link+0x71c/0xec0
sock_map_update_common+0xbc/0x600
sock_map_update_elem+0x19a/0x1f0
bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217
bpf_iter_run_prog+0x21e/0xae0
bpf_iter_unix_seq_show+0x1e0/0x2a0
bpf_seq_read+0x42c/0x10d0
vfs_read+0x171/0xb20
ksys_read+0xff/0x200
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 2236:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x63/0x80
kmem_cache_alloc_noprof+0x1d5/0x680
sk_prot_alloc+0x59/0x210
sk_alloc+0x34/0x470
unix_create1+0x86/0x8a0
unix_stream_connect+0x318/0x15b0
__sys_connect+0xfd/0x130
__x64_sys_connect+0x72/0xd0
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2236:
kasan_save_stack+0x30/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x70
__kasan_slab_free+0x47/0x70
kmem_cache_free+0x11c/0x590
__sk_destruct+0x432/0x6e0
unix_release_sock+0x9b3/0xf60
unix_release+0x8a/0xf0
__sock_release+0xb0/0x270
sock_close+0x18/0x20
__fput+0x36e/0xac0
fput_close_sync+0xe5/0x1a0
__x64_sys_close+0x7d/0xd0
do_syscall_64+0xf7/0x5e0
entry_SYSCALL_64_after_hwframe+0x76/0x7e |
| In the Linux kernel, the following vulnerability has been resolved:
soc/tegra: cbb: Fix incorrect ARRAY_SIZE in fabric lookup tables
Fix incorrect ARRAY_SIZE usage in fabric lookup tables which could
cause out-of-bounds access during target timeout lookup. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix use-after-free from async crypto on Qualcomm crypto engine
ksmbd_crypt_message() sets a NULL completion callback on AEAD requests
and does not handle the -EINPROGRESS return code from async hardware
crypto engines like the Qualcomm Crypto Engine (QCE). When QCE returns
-EINPROGRESS, ksmbd treats it as an error and immediately frees the
request while the hardware DMA operation is still in flight. The DMA
completion callback then dereferences freed memory, causing a NULL
pointer crash:
pc : qce_skcipher_done+0x24/0x174
lr : vchan_complete+0x230/0x27c
...
el1h_64_irq+0x68/0x6c
ksmbd_free_work_struct+0x20/0x118 [ksmbd]
ksmbd_exit_file_cache+0x694/0xa4c [ksmbd]
Use the standard crypto_wait_req() pattern with crypto_req_done() as
the completion callback, matching the approach used by the SMB client
in fs/smb/client/smb2ops.c. This properly handles both synchronous
engines (immediate return) and async engines (-EINPROGRESS followed
by callback notification). |
| In the Linux kernel, the following vulnerability has been resolved:
nexthop: fix IPv6 route referencing IPv4 nexthop
syzbot reported a panic [1] [2].
When an IPv6 nexthop is replaced with an IPv4 nexthop, the has_v4 flag
of all groups containing this nexthop is not updated. This is because
nh_group_v4_update is only called when replacing AF_INET to AF_INET6,
but the reverse direction (AF_INET6 to AF_INET) is missed.
This allows a stale has_v4=false to bypass fib6_check_nexthop, causing
IPv6 routes to be attached to groups that effectively contain only AF_INET
members. Subsequent route lookups then call nexthop_fib6_nh() which
returns NULL for the AF_INET member, leading to a NULL pointer
dereference.
Fix by calling nh_group_v4_update whenever the family changes, not just
AF_INET to AF_INET6.
Reproducer:
# AF_INET6 blackhole
ip -6 nexthop add id 1 blackhole
# group with has_v4=false
ip nexthop add id 100 group 1
# replace with AF_INET (no -6), has_v4 stays false
ip nexthop replace id 1 blackhole
# pass stale has_v4 check
ip -6 route add 2001:db8::/64 nhid 100
# panic
ping -6 2001:db8::1
[1] https://syzkaller.appspot.com/bug?id=e17283eb2f8dcf3dd9b47fe6f67a95f71faadad0
[2] https://syzkaller.appspot.com/bug?id=8699b6ae54c9f35837d925686208402949e12ef3 |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2/dlm: validate qr_numregions in dlm_match_regions()
Patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()".
In dlm_match_regions(), the qr_numregions field from a DLM_QUERY_REGION
network message is used to drive loops over the qr_regions buffer without
sufficient validation. This series fixes two issues:
- Patch 1 adds a bounds check to reject messages where qr_numregions
exceeds O2NM_MAX_REGIONS. The o2net layer only validates message
byte length; it does not constrain field values, so a crafted message
can set qr_numregions up to 255 and trigger out-of-bounds reads past
the 1024-byte qr_regions buffer.
- Patch 2 fixes an off-by-one in the local-vs-remote comparison loop,
which uses '<=' instead of '<', reading one entry past the valid range
even when qr_numregions is within bounds.
This patch (of 2):
The qr_numregions field from a DLM_QUERY_REGION network message is used
directly as loop bounds in dlm_match_regions() without checking against
O2NM_MAX_REGIONS. Since qr_regions is sized for at most O2NM_MAX_REGIONS
(32) entries, a crafted message with qr_numregions > 32 causes
out-of-bounds reads past the qr_regions buffer.
Add a bounds check for qr_numregions before entering the loops. |
| In the Linux kernel, the following vulnerability has been resolved:
bonding: 3ad: implement proper RCU rules for port->aggregator
syzbot found a data-race in bond_3ad_get_active_agg_info /
bond_3ad_state_machine_handler [1] which hints at lack of proper
RCU implementation.
Add __rcu qualifier to port->aggregator, and add proper RCU API.
[1]
BUG: KCSAN: data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler
write to 0xffff88813cf5c4b0 of 8 bytes by task 36 on cpu 0:
ad_port_selection_logic drivers/net/bonding/bond_3ad.c:1659 [inline]
bond_3ad_state_machine_handler+0x9d5/0x2d60 drivers/net/bonding/bond_3ad.c:2569
process_one_work kernel/workqueue.c:3302 [inline]
process_scheduled_works+0x4f0/0x9c0 kernel/workqueue.c:3385
worker_thread+0x58a/0x780 kernel/workqueue.c:3466
kthread+0x22a/0x280 kernel/kthread.c:436
ret_from_fork+0x146/0x330 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
read to 0xffff88813cf5c4b0 of 8 bytes by task 22063 on cpu 1:
__bond_3ad_get_active_agg_info drivers/net/bonding/bond_3ad.c:2858 [inline]
bond_3ad_get_active_agg_info+0x8c/0x230 drivers/net/bonding/bond_3ad.c:2881
bond_fill_info+0xe0f/0x10f0 drivers/net/bonding/bond_netlink.c:853
rtnl_link_info_fill net/core/rtnetlink.c:906 [inline]
rtnl_link_fill+0x1d7/0x4e0 net/core/rtnetlink.c:927
rtnl_fill_ifinfo+0xf8e/0x1380 net/core/rtnetlink.c:2168
rtmsg_ifinfo_build_skb+0x11c/0x1b0 net/core/rtnetlink.c:4453
rtmsg_ifinfo_event net/core/rtnetlink.c:4486 [inline]
rtmsg_ifinfo+0x6d/0x110 net/core/rtnetlink.c:4495
__dev_notify_flags+0x76/0x390 net/core/dev.c:9790
netif_change_flags+0xac/0xd0 net/core/dev.c:9823
do_setlink+0x905/0x2950 net/core/rtnetlink.c:3180
rtnl_group_changelink net/core/rtnetlink.c:3813 [inline]
__rtnl_newlink net/core/rtnetlink.c:3981 [inline]
rtnl_newlink+0xf55/0x1400 net/core/rtnetlink.c:4109
rtnetlink_rcv_msg+0x64b/0x720 net/core/rtnetlink.c:6995
netlink_rcv_skb+0x123/0x220 net/netlink/af_netlink.c:2550
rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:7022
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x5a8/0x680 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x5c8/0x6f0 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:787 [inline]
__sock_sendmsg net/socket.c:802 [inline]
____sys_sendmsg+0x563/0x5b0 net/socket.c:2698
___sys_sendmsg+0x195/0x1e0 net/socket.c:2752
__sys_sendmsg net/socket.c:2784 [inline]
__do_sys_sendmsg net/socket.c:2789 [inline]
__se_sys_sendmsg net/socket.c:2787 [inline]
__x64_sys_sendmsg+0xd4/0x160 net/socket.c:2787
x64_sys_call+0x194c/0x3020 arch/x86/include/generated/asm/syscalls_64.h:47
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x12c/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x0000000000000000 -> 0xffff88813cf5c400
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G W syzkaller #0 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026 |
| In the Linux kernel, the following vulnerability has been resolved:
net: psp: require admin permission for dev-set and key-rotate
The dev-set and key-rotate netlink operations modify shared device
state (PSP version configuration and cryptographic key material,
respectively) but do not require CAP_NET_ADMIN. The only access
control is psp_dev_check_access() which merely verifies netns
membership. |
| In the Linux kernel, the following vulnerability has been resolved:
nvmet-tcp: propagate nvmet_tcp_build_pdu_iovec() errors to its callers
Currently, when nvmet_tcp_build_pdu_iovec() detects an out-of-bounds
PDU length or offset, it triggers nvmet_tcp_fatal_error(cmd->queue)
and returns early. However, because the function returns void, the
callers are entirely unaware that a fatal error has occurred and
that the cmd->recv_msg.msg_iter was left uninitialized.
Callers such as nvmet_tcp_handle_h2c_data_pdu() proceed to blindly
overwrite the queue state with queue->rcv_state = NVMET_TCP_RECV_DATA
Consequently, the socket receiving loop may attempt to read incoming
network data into the uninitialized iterator.
Fix this by shifting the error handling responsibility to the callers. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf, sockmap: Fix af_unix null-ptr-deref in proto update
unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state,
TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`).
sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that
socket is properly set up, which would include having a defined peer. IOW,
there's a window when unix_stream_bpf_update_proto() can be called on
socket which still has unix_peer(sk) == NULL.
CPU0 bpf CPU1 connect
-------- ------------
WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
...
sk_pair = unix_peer(sk)
sock_hold(sk_pair)
sock_hold(newsk)
smp_mb__after_atomic()
unix_peer(sk) = newsk
BUG: kernel NULL pointer dereference, address: 0000000000000080
RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
Call Trace:
sock_map_link+0x564/0x8b0
sock_map_update_common+0x6e/0x340
sock_map_update_elem_sys+0x17d/0x240
__sys_bpf+0x26db/0x3250
__x64_sys_bpf+0x21/0x30
do_syscall_64+0x6b/0x3a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Initial idea was to move peer assignment _before_ the sk_state update[1],
but that involved an additional memory barrier, and changing the hot path
was rejected.
Then a NULL check during proto update in unix_stream_bpf_update_proto() was
considered[2], but the follow-up discussion[3] focused on the root cause,
i.e. sockmap update taking a wrong lock. Or, more specifically, missing
unix_state_lock()[4].
In the end it was concluded that teaching sockmap about the af_unix locking
would be unnecessarily complex[5].
Complexity aside, since BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT
are allowed to update sockmaps, sock_map_update_elem() taking the unix
lock, as it is currently implemented in unix_state_lock():
spin_lock(&unix_sk(s)->lock), would be problematic. unix_state_lock() taken
in a process context, followed by a softirq-context TC BPF program
attempting to take the same spinlock -- deadlock[6].
This way we circled back to the peer check idea[2].
[1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/
[2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
[3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/
[4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/
[5]: https://lore.kernel.org/bpf/CAAVpQUAHijOMext28Gi10dSLuMzGYh+jK61Ujn+fZ-wvcODR2A@mail.gmail.com/
[6]: https://lore.kernel.org/bpf/dd043c69-4d03-46fe-8325-8f97101435cf@linux.dev/
Summary of scenarios where af_unix/stream connect() may race a sockmap
update:
1. connect() vs. bpf(BPF_MAP_UPDATE_ELEM), i.e. sock_map_update_elem_sys()
Implemented NULL check is sufficient. Once assigned, socket peer won't
be released until socket fd is released. And that's not an issue because
sock_map_update_elem_sys() bumps fd refcnf.
2. connect() vs BPF program doing update
Update restricted per verifier.c:may_update_sockmap() to
BPF_PROG_TYPE_TRACING/BPF_TRACE_ITER
BPF_PROG_TYPE_SOCK_OPS (bpf_sock_map_update() only)
BPF_PROG_TYPE_SOCKET_FILTER
BPF_PROG_TYPE_SCHED_CLS
BPF_PROG_TYPE_SCHED_ACT
BPF_PROG_TYPE_XDP
BPF_PROG_TYPE_SK_REUSEPORT
BPF_PROG_TYPE_FLOW_DISSECTOR
BPF_PROG_TYPE_SK_LOOKUP
Plus one more race to consider:
CPU0 bpf CPU1 connect
-------- ------------
WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
sock_map_sk_state_allowed(sk)
sock_hold(newsk)
smp_mb__after_atomic()
---truncated--- |