1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 /* SPDX-License-Identifier: GPL-2.0 */ /* rwsem.h: R/W semaphores, public interface * * Written by David Howells (dhowells@redhat.com). * Derived from asm-i386/semaphore.h */ #ifndef _LINUX_RWSEM_H #define _LINUX_RWSEM_H #include <linux/linkage.h> #include <linux/types.h> #include <linux/kernel.h> #include <linux/list.h> #include <linux/spinlock.h> #include <linux/atomic.h> #include <linux/err.h> #ifdef CONFIG_RWSEM_SPIN_ON_OWNER #include <linux/osq_lock.h> #endif /* * For an uncontended rwsem, count and owner are the only fields a task * needs to touch when acquiring the rwsem. So they are put next to each * other to increase the chance that they will share the same cacheline. * * In a contended rwsem, the owner is likely the most frequently accessed * field in the structure as the optimistic waiter that holds the osq lock * will spin on owner. For an embedded rwsem, other hot fields in the * containing structure should be moved further away from the rwsem to * reduce the chance that they will share the same cacheline causing * cacheline bouncing problem. */ struct rw_semaphore { atomic_long_t count; /* * Write owner or one of the read owners as well flags regarding * the current state of the rwsem. Can be used as a speculative * check to see if the write owner is running on the cpu. */ atomic_long_t owner; #ifdef CONFIG_RWSEM_SPIN_ON_OWNER struct optimistic_spin_queue osq; /* spinner MCS lock */ #endif raw_spinlock_t wait_lock; struct list_head wait_list; #ifdef CONFIG_DEBUG_RWSEMS void *magic; #endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif }; /* In all implementations count != 0 means locked */ static inline int rwsem_is_locked(struct rw_semaphore *sem) { return atomic_long_read(&sem->count) != 0; } #define RWSEM_UNLOCKED_VALUE 0L #define __RWSEM_COUNT_INIT(name) .count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE) /* Common initializer macros and functions */ #ifdef CONFIG_DEBUG_LOCK_ALLOC # define __RWSEM_DEP_MAP_INIT(lockname) \ .dep_map = { \ .name = #lockname, \ .wait_type_inner = LD_WAIT_SLEEP, \ }, #else # define __RWSEM_DEP_MAP_INIT(lockname) #endif #ifdef CONFIG_DEBUG_RWSEMS # define __RWSEM_DEBUG_INIT(lockname) .magic = &lockname, #else # define __RWSEM_DEBUG_INIT(lockname) #endif #ifdef CONFIG_RWSEM_SPIN_ON_OWNER #define __RWSEM_OPT_INIT(lockname) .osq = OSQ_LOCK_UNLOCKED, #else #define __RWSEM_OPT_INIT(lockname) #endif #define __RWSEM_INITIALIZER(name) \ { __RWSEM_COUNT_INIT(name), \ .owner = ATOMIC_LONG_INIT(0), \ __RWSEM_OPT_INIT(name) \ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(name.wait_lock),\ .wait_list = LIST_HEAD_INIT((name).wait_list), \ __RWSEM_DEBUG_INIT(name) \ __RWSEM_DEP_MAP_INIT(name) } #define DECLARE_RWSEM(name) \ struct rw_semaphore name = __RWSEM_INITIALIZER(name) extern void __init_rwsem(struct rw_semaphore *sem, const char *name, struct lock_class_key *key); #define init_rwsem(sem) \ do { \ static struct lock_class_key __key; \ \ __init_rwsem((sem), #sem, &__key); \ } while (0) /* * This is the same regardless of which rwsem implementation that is being used. * It is just a heuristic meant to be called by somebody alreadying holding the * rwsem to see if somebody from an incompatible type is wanting access to the * lock. */ static inline int rwsem_is_contended(struct rw_semaphore *sem) { return !list_empty(&sem->wait_list); } /* * lock for reading */ extern void down_read(struct rw_semaphore *sem); extern int __must_check down_read_interruptible(struct rw_semaphore *sem); extern int __must_check down_read_killable(struct rw_semaphore *sem); /* * trylock for reading -- returns 1 if successful, 0 if contention */ extern int down_read_trylock(struct rw_semaphore *sem); /* * lock for writing */ extern void down_write(struct rw_semaphore *sem); extern int __must_check down_write_killable(struct rw_semaphore *sem); /* * trylock for writing -- returns 1 if successful, 0 if contention */ extern int down_write_trylock(struct rw_semaphore *sem); /* * release a read lock */ extern void up_read(struct rw_semaphore *sem); /* * release a write lock */ extern void up_write(struct rw_semaphore *sem); /* * downgrade write lock to read lock */ extern void downgrade_write(struct rw_semaphore *sem); #ifdef CONFIG_DEBUG_LOCK_ALLOC /* * nested locking. NOTE: rwsems are not allowed to recurse * (which occurs if the same task tries to acquire the same * lock instance multiple times), but multiple locks of the * same lock class might be taken, if the order of the locks * is always the same. This ordering rule can be expressed * to lockdep via the _nested() APIs, but enumerating the * subclasses that are used. (If the nesting relationship is * static then another method for expressing nested locking is * the explicit definition of lock class keys and the use of * lockdep_set_class() at lock initialization time. * See Documentation/locking/lockdep-design.rst for more details.) */ extern void down_read_nested(struct rw_semaphore *sem, int subclass); extern int __must_check down_read_killable_nested(struct rw_semaphore *sem, int subclass); extern void down_write_nested(struct rw_semaphore *sem, int subclass); extern int down_write_killable_nested(struct rw_semaphore *sem, int subclass); extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest_lock); # define down_write_nest_lock(sem, nest_lock) \ do { \ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \ _down_write_nest_lock(sem, &(nest_lock)->dep_map); \ } while (0); /* * Take/release a lock when not the owner will release it. * * [ This API should be avoided as much as possible - the * proper abstraction for this case is completions. ] */ extern void down_read_non_owner(struct rw_semaphore *sem); extern void up_read_non_owner(struct rw_semaphore *sem); #else # define down_read_nested(sem, subclass) down_read(sem) # define down_read_killable_nested(sem, subclass) down_read_killable(sem) # define down_write_nest_lock(sem, nest_lock) down_write(sem) # define down_write_nested(sem, subclass) down_write(sem) # define down_write_killable_nested(sem, subclass) down_write_killable(sem) # define down_read_non_owner(sem) down_read(sem) # define up_read_non_owner(sem) up_read(sem) #endif #endif /* _LINUX_RWSEM_H */
1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_BITOPS_H #define _ASM_X86_BITOPS_H /* * Copyright 1992, Linus Torvalds. * * Note: inlines with more than a single statement should be marked * __always_inline to avoid problems with older gcc's inlining heuristics. */ #ifndef _LINUX_BITOPS_H #error only <linux/bitops.h> can be included directly #endif #include <linux/compiler.h> #include <asm/alternative.h> #include <asm/rmwcc.h> #include <asm/barrier.h> #if BITS_PER_LONG == 32 # define _BITOPS_LONG_SHIFT 5 #elif BITS_PER_LONG == 64 # define _BITOPS_LONG_SHIFT 6 #else # error "Unexpected BITS_PER_LONG" #endif #define BIT_64(n) (U64_C(1) << (n)) /* * These have to be done with inline assembly: that way the bit-setting * is guaranteed to be atomic. All bit operations return 0 if the bit * was cleared before the operation and != 0 if it was not. * * bit 0 is the LSB of addr; bit 32 is the LSB of (addr+1). */ #define RLONG_ADDR(x) "m" (*(volatile long *) (x)) #define WBYTE_ADDR(x) "+m" (*(volatile char *) (x)) #define ADDR RLONG_ADDR(addr) /* * We do the locked ops that don't return the old value as * a mask operation on a byte. */ #define CONST_MASK_ADDR(nr, addr) WBYTE_ADDR((void *)(addr) + ((nr)>>3)) #define CONST_MASK(nr) (1 << ((nr) & 7)) static __always_inline void arch_set_bit(long nr, volatile unsigned long *addr) { if (__builtin_constant_p(nr)) { asm volatile(LOCK_PREFIX "orb %b1,%0" : CONST_MASK_ADDR(nr, addr) : "iq" (CONST_MASK(nr)) : "memory"); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0" : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } } static __always_inline void arch___set_bit(long nr, volatile unsigned long *addr) { asm volatile(__ASM_SIZE(bts) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); } static __always_inline void arch_clear_bit(long nr, volatile unsigned long *addr) { if (__builtin_constant_p(nr)) { asm volatile(LOCK_PREFIX "andb %b1,%0" : CONST_MASK_ADDR(nr, addr) : "iq" (~CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0" : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } } static __always_inline void arch_clear_bit_unlock(long nr, volatile unsigned long *addr) { barrier(); arch_clear_bit(nr, addr); } static __always_inline void arch___clear_bit(long nr, volatile unsigned long *addr) { asm volatile(__ASM_SIZE(btr) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); } static __always_inline bool arch_clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) { bool negative; asm volatile(LOCK_PREFIX "andb %2,%1" CC_SET(s) : CC_OUT(s) (negative), WBYTE_ADDR(addr) : "ir" ((char) ~(1 << nr)) : "memory"); return negative; } #define arch_clear_bit_unlock_is_negative_byte \ arch_clear_bit_unlock_is_negative_byte static __always_inline void arch___clear_bit_unlock(long nr, volatile unsigned long *addr) { arch___clear_bit(nr, addr); } static __always_inline void arch___change_bit(long nr, volatile unsigned long *addr) { asm volatile(__ASM_SIZE(btc) " %1,%0" : : ADDR, "Ir" (nr) : "memory"); } static __always_inline void arch_change_bit(long nr, volatile unsigned long *addr) { if (__builtin_constant_p(nr)) { asm volatile(LOCK_PREFIX "xorb %b1,%0" : CONST_MASK_ADDR(nr, addr) : "iq" (CONST_MASK(nr))); } else { asm volatile(LOCK_PREFIX __ASM_SIZE(btc) " %1,%0" : : RLONG_ADDR(addr), "Ir" (nr) : "memory"); } } static __always_inline bool arch_test_and_set_bit(long nr, volatile unsigned long *addr) { return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(bts), *addr, c, "Ir", nr); } static __always_inline bool arch_test_and_set_bit_lock(long nr, volatile unsigned long *addr) { return arch_test_and_set_bit(nr, addr); } static __always_inline bool arch___test_and_set_bit(long nr, volatile unsigned long *addr) { bool oldbit; asm(__ASM_SIZE(bts) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) : ADDR, "Ir" (nr) : "memory"); return oldbit; } static __always_inline bool arch_test_and_clear_bit(long nr, volatile unsigned long *addr) { return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btr), *addr, c, "Ir", nr); } /* * Note: the operation is performed atomically with respect to * the local CPU, but not other CPUs. Portable code should not * rely on this behaviour. * KVM relies on this behaviour on x86 for modifying memory that is also * accessed from a hypervisor on the same CPU if running in a VM: don't change * this without also updating arch/x86/kernel/kvm.c */ static __always_inline bool arch___test_and_clear_bit(long nr, volatile unsigned long *addr) { bool oldbit; asm volatile(__ASM_SIZE(btr) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) : ADDR, "Ir" (nr) : "memory"); return oldbit; } static __always_inline bool arch___test_and_change_bit(long nr, volatile unsigned long *addr) { bool oldbit; asm volatile(__ASM_SIZE(btc) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) : ADDR, "Ir" (nr) : "memory"); return oldbit; } static __always_inline bool arch_test_and_change_bit(long nr, volatile unsigned long *addr) { return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btc), *addr, c, "Ir", nr); } static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr) { return ((1UL << (nr & (BITS_PER_LONG-1))) & (addr[nr >> _BITOPS_LONG_SHIFT])) != 0; } static __always_inline bool variable_test_bit(long nr, volatile const unsigned long *addr) { bool oldbit; asm volatile(__ASM_SIZE(bt) " %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) : "m" (*(unsigned long *)addr), "Ir" (nr) : "memory"); return oldbit; } #define arch_test_bit(nr, addr) \ (__builtin_constant_p((nr)) \ ? constant_test_bit((nr), (addr)) \ : variable_test_bit((nr), (addr))) /** * __ffs - find first set bit in word * @word: The word to search * * Undefined if no bit exists, so code should check against 0 first. */ static __always_inline unsigned long __ffs(unsigned long word) { asm("rep; bsf %1,%0" : "=r" (word) : "rm" (word)); return word; } /** * ffz - find first zero bit in word * @word: The word to search * * Undefined if no zero exists, so code should check against ~0UL first. */ static __always_inline unsigned long ffz(unsigned long word) { asm("rep; bsf %1,%0" : "=r" (word) : "r" (~word)); return word; } /* * __fls: find last set bit in word * @word: The word to search * * Undefined if no set bit exists, so code should check against 0 first. */ static __always_inline unsigned long __fls(unsigned long word) { asm("bsr %1,%0" : "=r" (word) : "rm" (word)); return word; } #undef ADDR #ifdef __KERNEL__ /** * ffs - find first set bit in word * @x: the word to search * * This is defined the same way as the libc and compiler builtin ffs * routines, therefore differs in spirit from the other bitops. * * ffs(value) returns 0 if value is 0 or the position of the first * set bit if value is nonzero. The first (least significant) bit * is at position 1. */ static __always_inline int ffs(int x) { int r; #ifdef CONFIG_X86_64 /* * AMD64 says BSFL won't clobber the dest reg if x==0; Intel64 says the * dest reg is undefined if x==0, but their CPU architect says its * value is written to set it to the same as before, except that the * top 32 bits will be cleared. * * We cannot do this on 32 bits because at the very least some * 486 CPUs did not behave this way. */ asm("bsfl %1,%0" : "=r" (r) : "rm" (x), "0" (-1)); #elif defined(CONFIG_X86_CMOV) asm("bsfl %1,%0\n\t" "cmovzl %2,%0" : "=&r" (r) : "rm" (x), "r" (-1)); #else asm("bsfl %1,%0\n\t" "jnz 1f\n\t" "movl $-1,%0\n" "1:" : "=r" (r) : "rm" (x)); #endif return r + 1; } /** * fls - find last set bit in word * @x: the word to search * * This is defined in a similar way as the libc and compiler builtin * ffs, but returns the position of the most significant set bit. * * fls(value) returns 0 if value is 0 or the position of the last * set bit if value is nonzero. The last (most significant) bit is * at position 32. */ static __always_inline int fls(unsigned int x) { int r; #ifdef CONFIG_X86_64 /* * AMD64 says BSRL won't clobber the dest reg if x==0; Intel64 says the * dest reg is undefined if x==0, but their CPU architect says its * value is written to set it to the same as before, except that the * top 32 bits will be cleared. * * We cannot do this on 32 bits because at the very least some * 486 CPUs did not behave this way. */ asm("bsrl %1,%0" : "=r" (r) : "rm" (x), "0" (-1)); #elif defined(CONFIG_X86_CMOV) asm("bsrl %1,%0\n\t" "cmovzl %2,%0" : "=&r" (r) : "rm" (x), "rm" (-1)); #else asm("bsrl %1,%0\n\t" "jnz 1f\n\t" "movl $-1,%0\n" "1:" : "=r" (r) : "rm" (x)); #endif return r + 1; } /** * fls64 - find last set bit in a 64-bit word * @x: the word to search * * This is defined in a similar way as the libc and compiler builtin * ffsll, but returns the position of the most significant set bit. * * fls64(value) returns 0 if value is 0 or the position of the last * set bit if value is nonzero. The last (most significant) bit is * at position 64. */ #ifdef CONFIG_X86_64 static __always_inline int fls64(__u64 x) { int bitpos = -1; /* * AMD64 says BSRQ won't clobber the dest reg if x==0; Intel64 says the * dest reg is undefined if x==0, but their CPU architect says its * value is written to set it to the same as before. */ asm("bsrq %1,%q0" : "+r" (bitpos) : "rm" (x)); return bitpos + 1; } #else #include <asm-generic/bitops/fls64.h> #endif #include <asm-generic/bitops/find.h> #include <asm-generic/bitops/sched.h> #include <asm/arch_hweight.h> #include <asm-generic/bitops/const_hweight.h> #include <asm-generic/bitops/instrumented-atomic.h> #include <asm-generic/bitops/instrumented-non-atomic.h> #include <asm-generic/bitops/instrumented-lock.h> #include <asm-generic/bitops/le.h> #include <asm-generic/bitops/ext2-atomic-setbit.h> #endif /* __KERNEL__ */ #endif /* _ASM_X86_BITOPS_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Copyright 1997-1998 Transmeta Corporation - All Rights Reserved * Copyright 2005-2006 Ian Kent <raven@themaw.net> */ /* Internal header file for autofs */ #include <linux/auto_fs.h> #include <linux/auto_dev-ioctl.h> #include <linux/kernel.h> #include <linux/slab.h> #include <linux/time.h> #include <linux/string.h> #include <linux/wait.h> #include <linux/sched.h> #include <linux/sched/signal.h> #include <linux/mount.h> #include <linux/namei.h> #include <linux/uaccess.h> #include <linux/mutex.h> #include <linux/spinlock.h> #include <linux/list.h> #include <linux/completion.h> #include <linux/file.h> #include <linux/magic.h> /* This is the range of ioctl() numbers we claim as ours */ #define AUTOFS_IOC_FIRST AUTOFS_IOC_READY #define AUTOFS_IOC_COUNT 32 #define AUTOFS_DEV_IOCTL_IOC_FIRST (AUTOFS_DEV_IOCTL_VERSION) #define AUTOFS_DEV_IOCTL_IOC_COUNT \ (AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD - AUTOFS_DEV_IOCTL_VERSION_CMD) #ifdef pr_fmt #undef pr_fmt #endif #define pr_fmt(fmt) KBUILD_MODNAME ":pid:%d:%s: " fmt, current->pid, __func__ extern struct file_system_type autofs_fs_type; /* * Unified info structure. This is pointed to by both the dentry and * inode structures. Each file in the filesystem has an instance of this * structure. It holds a reference to the dentry, so dentries are never * flushed while the file exists. All name lookups are dealt with at the * dentry level, although the filesystem can interfere in the validation * process. Readdir is implemented by traversing the dentry lists. */ struct autofs_info { struct dentry *dentry; struct inode *inode; int flags; struct completion expire_complete; struct list_head active; struct list_head expiring; struct autofs_sb_info *sbi; unsigned long last_used; int count; kuid_t uid; kgid_t gid; struct rcu_head rcu; }; #define AUTOFS_INF_EXPIRING (1<<0) /* dentry in the process of expiring */ #define AUTOFS_INF_WANT_EXPIRE (1<<1) /* the dentry is being considered * for expiry, so RCU_walk is * not permitted. If it progresses to * actual expiry attempt, the flag is * not cleared when EXPIRING is set - * in that case it gets cleared only * when it comes to clearing EXPIRING. */ #define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */ struct autofs_wait_queue { wait_queue_head_t queue; struct autofs_wait_queue *next; autofs_wqt_t wait_queue_token; /* We use the following to see what we are waiting for */ struct qstr name; u32 dev; u64 ino; kuid_t uid; kgid_t gid; pid_t pid; pid_t tgid; /* This is for status reporting upon return */ int status; unsigned int wait_ctr; }; #define AUTOFS_SBI_MAGIC 0x6d4a556d #define AUTOFS_SBI_CATATONIC 0x0001 #define AUTOFS_SBI_STRICTEXPIRE 0x0002 #define AUTOFS_SBI_IGNORE 0x0004 struct autofs_sb_info { u32 magic; int pipefd; struct file *pipe; struct pid *oz_pgrp; int version; int sub_version; int min_proto; int max_proto; unsigned int flags; unsigned long exp_timeout; unsigned int type; struct super_block *sb; struct mutex wq_mutex; struct mutex pipe_mutex; spinlock_t fs_lock; struct autofs_wait_queue *queues; /* Wait queue pointer */ spinlock_t lookup_lock; struct list_head active_list; struct list_head expiring_list; struct rcu_head rcu; }; static inline struct autofs_sb_info *autofs_sbi(struct super_block *sb) { return (struct autofs_sb_info *)(sb->s_fs_info); } static inline struct autofs_info *autofs_dentry_ino(struct dentry *dentry) { return (struct autofs_info *)(dentry->d_fsdata); } /* autofs_oz_mode(): do we see the man behind the curtain? (The * processes which do manipulations for us in user space sees the raw * filesystem without "magic".) */ static inline int autofs_oz_mode(struct autofs_sb_info *sbi) { return ((sbi->flags & AUTOFS_SBI_CATATONIC) || task_pgrp(current) == sbi->oz_pgrp); } struct inode *autofs_get_inode(struct super_block *, umode_t); void autofs_free_ino(struct autofs_info *); /* Expiration */ int is_autofs_dentry(struct dentry *); int autofs_expire_wait(const struct path *path, int rcu_walk); int autofs_expire_run(struct super_block *, struct vfsmount *, struct autofs_sb_info *, struct autofs_packet_expire __user *); int autofs_do_expire_multi(struct super_block *sb, struct vfsmount *mnt, struct autofs_sb_info *sbi, unsigned int how); int autofs_expire_multi(struct super_block *, struct vfsmount *, struct autofs_sb_info *, int __user *); /* Device node initialization */ int autofs_dev_ioctl_init(void); void autofs_dev_ioctl_exit(void); /* Operations structures */ extern const struct inode_operations autofs_symlink_inode_operations; extern const struct inode_operations autofs_dir_inode_operations; extern const struct file_operations autofs_dir_operations; extern const struct file_operations autofs_root_operations; extern const struct dentry_operations autofs_dentry_operations; /* VFS automount flags management functions */ static inline void __managed_dentry_set_managed(struct dentry *dentry) { dentry->d_flags |= (DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT); } static inline void managed_dentry_set_managed(struct dentry *dentry) { spin_lock(&dentry->d_lock); __managed_dentry_set_managed(dentry); spin_unlock(&dentry->d_lock); } static inline void __managed_dentry_clear_managed(struct dentry *dentry) { dentry->d_flags &= ~(DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT); } static inline void managed_dentry_clear_managed(struct dentry *dentry) { spin_lock(&dentry->d_lock); __managed_dentry_clear_managed(dentry); spin_unlock(&dentry->d_lock); } /* Initializing function */ int autofs_fill_super(struct super_block *, void *, int); struct autofs_info *autofs_new_ino(struct autofs_sb_info *); void autofs_clean_ino(struct autofs_info *); static inline int autofs_prepare_pipe(struct file *pipe) { if (!(pipe->f_mode & FMODE_CAN_WRITE)) return -EINVAL; if (!S_ISFIFO(file_inode(pipe)->i_mode)) return -EINVAL; /* We want a packet pipe */ pipe->f_flags |= O_DIRECT; /* We don't expect -EAGAIN */ pipe->f_flags &= ~O_NONBLOCK; return 0; } /* Queue management functions */ int autofs_wait(struct autofs_sb_info *, const struct path *, enum autofs_notify); int autofs_wait_release(struct autofs_sb_info *, autofs_wqt_t, int); void autofs_catatonic_mode(struct autofs_sb_info *); static inline u32 autofs_get_dev(struct autofs_sb_info *sbi) { return new_encode_dev(sbi->sb->s_dev); } static inline u64 autofs_get_ino(struct autofs_sb_info *sbi) { return d_inode(sbi->sb->s_root)->i_ino; } static inline void __autofs_add_expiring(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); if (ino) { if (list_empty(&ino->expiring)) list_add(&ino->expiring, &sbi->expiring_list); } } static inline void autofs_add_expiring(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); if (ino) { spin_lock(&sbi->lookup_lock); if (list_empty(&ino->expiring)) list_add(&ino->expiring, &sbi->expiring_list); spin_unlock(&sbi->lookup_lock); } } static inline void autofs_del_expiring(struct dentry *dentry) { struct autofs_sb_info *sbi = autofs_sbi(dentry->d_sb); struct autofs_info *ino = autofs_dentry_ino(dentry); if (ino) { spin_lock(&sbi->lookup_lock); if (!list_empty(&ino->expiring)) list_del_init(&ino->expiring); spin_unlock(&sbi->lookup_lock); } } void autofs_kill_sb(struct super_block *);
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _XFRM_HASH_H #define _XFRM_HASH_H #include <linux/xfrm.h> #include <linux/socket.h> #include <linux/jhash.h> static inline unsigned int __xfrm4_addr_hash(const xfrm_address_t *addr) { return ntohl(addr->a4); } static inline unsigned int __xfrm6_addr_hash(const xfrm_address_t *addr) { return jhash2((__force u32 *)addr->a6, 4, 0); } static inline unsigned int __xfrm4_daddr_saddr_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr) { u32 sum = (__force u32)daddr->a4 + (__force u32)saddr->a4; return ntohl((__force __be32)sum); } static inline unsigned int __xfrm6_daddr_saddr_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr) { return __xfrm6_addr_hash(daddr) ^ __xfrm6_addr_hash(saddr); } static inline u32 __bits2mask32(__u8 bits) { u32 mask32 = 0xffffffff; if (bits == 0) mask32 = 0; else if (bits < 32) mask32 <<= (32 - bits); return mask32; } static inline unsigned int __xfrm4_dpref_spref_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr, __u8 dbits, __u8 sbits) { return jhash_2words(ntohl(daddr->a4) & __bits2mask32(dbits), ntohl(saddr->a4) & __bits2mask32(sbits), 0); } static inline unsigned int __xfrm6_pref_hash(const xfrm_address_t *addr, __u8 prefixlen) { unsigned int pdw; unsigned int pbi; u32 initval = 0; pdw = prefixlen >> 5; /* num of whole u32 in prefix */ pbi = prefixlen & 0x1f; /* num of bits in incomplete u32 in prefix */ if (pbi) { __be32 mask; mask = htonl((0xffffffff) << (32 - pbi)); initval = (__force u32)(addr->a6[pdw] & mask); } return jhash2((__force u32 *)addr->a6, pdw, initval); } static inline unsigned int __xfrm6_dpref_spref_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr, __u8 dbits, __u8 sbits) { return __xfrm6_pref_hash(daddr, dbits) ^ __xfrm6_pref_hash(saddr, sbits); } static inline unsigned int __xfrm_dst_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr, u32 reqid, unsigned short family, unsigned int hmask) { unsigned int h = family ^ reqid; switch (family) { case AF_INET: h ^= __xfrm4_daddr_saddr_hash(daddr, saddr); break; case AF_INET6: h ^= __xfrm6_daddr_saddr_hash(daddr, saddr); break; } return (h ^ (h >> 16)) & hmask; } static inline unsigned int __xfrm_src_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned short family, unsigned int hmask) { unsigned int h = family; switch (family) { case AF_INET: h ^= __xfrm4_daddr_saddr_hash(daddr, saddr); break; case AF_INET6: h ^= __xfrm6_daddr_saddr_hash(daddr, saddr); break; } return (h ^ (h >> 16)) & hmask; } static inline unsigned int __xfrm_spi_hash(const xfrm_address_t *daddr, __be32 spi, u8 proto, unsigned short family, unsigned int hmask) { unsigned int h = (__force u32)spi ^ proto; switch (family) { case AF_INET: h ^= __xfrm4_addr_hash(daddr); break; case AF_INET6: h ^= __xfrm6_addr_hash(daddr); break; } return (h ^ (h >> 10) ^ (h >> 20)) & hmask; } static inline unsigned int __idx_hash(u32 index, unsigned int hmask) { return (index ^ (index >> 8)) & hmask; } static inline unsigned int __sel_hash(const struct xfrm_selector *sel, unsigned short family, unsigned int hmask, u8 dbits, u8 sbits) { const xfrm_address_t *daddr = &sel->daddr; const xfrm_address_t *saddr = &sel->saddr; unsigned int h = 0; switch (family) { case AF_INET: if (sel->prefixlen_d < dbits || sel->prefixlen_s < sbits) return hmask + 1; h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits); break; case AF_INET6: if (sel->prefixlen_d < dbits || sel->prefixlen_s < sbits) return hmask + 1; h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits); break; } h ^= (h >> 16); return h & hmask; } static inline unsigned int __addr_hash(const xfrm_address_t *daddr, const xfrm_address_t *saddr, unsigned short family, unsigned int hmask, u8 dbits, u8 sbits) { unsigned int h = 0; switch (family) { case AF_INET: h = __xfrm4_dpref_spref_hash(daddr, saddr, dbits, sbits); break; case AF_INET6: h = __xfrm6_dpref_spref_hash(daddr, saddr, dbits, sbits); break; } h ^= (h >> 16); return h & hmask; } struct hlist_head *xfrm_hash_alloc(unsigned int sz); void xfrm_hash_free(struct hlist_head *n, unsigned int sz); #endif /* _XFRM_HASH_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 /* SPDX-License-Identifier: GPL-2.0 */ /* * ioport.h Definitions of routines for detecting, reserving and * allocating system resources. * * Authors: Linus Torvalds */ #ifndef _LINUX_IOPORT_H #define _LINUX_IOPORT_H #ifndef __ASSEMBLY__ #include <linux/compiler.h> #include <linux/types.h> #include <linux/bits.h> /* * Resources are tree-like, allowing * nesting etc.. */ struct resource { resource_size_t start; resource_size_t end; const char *name; unsigned long flags; unsigned long desc; struct resource *parent, *sibling, *child; }; /* * IO resources have these defined flags. * * PCI devices expose these flags to userspace in the "resource" sysfs file, * so don't move them. */ #define IORESOURCE_BITS 0x000000ff /* Bus-specific bits */ #define IORESOURCE_TYPE_BITS 0x00001f00 /* Resource type */ #define IORESOURCE_IO 0x00000100 /* PCI/ISA I/O ports */ #define IORESOURCE_MEM 0x00000200 #define IORESOURCE_REG 0x00000300 /* Register offsets */ #define IORESOURCE_IRQ 0x00000400 #define IORESOURCE_DMA 0x00000800 #define IORESOURCE_BUS 0x00001000 #define IORESOURCE_PREFETCH 0x00002000 /* No side effects */ #define IORESOURCE_READONLY 0x00004000 #define IORESOURCE_CACHEABLE 0x00008000 #define IORESOURCE_RANGELENGTH 0x00010000 #define IORESOURCE_SHADOWABLE 0x00020000 #define IORESOURCE_SIZEALIGN 0x00040000 /* size indicates alignment */ #define IORESOURCE_STARTALIGN 0x00080000 /* start field is alignment */ #define IORESOURCE_MEM_64 0x00100000 #define IORESOURCE_WINDOW 0x00200000 /* forwarded by bridge */ #define IORESOURCE_MUXED 0x00400000 /* Resource is software muxed */ #define IORESOURCE_EXT_TYPE_BITS 0x01000000 /* Resource extended types */ #define IORESOURCE_SYSRAM 0x01000000 /* System RAM (modifier) */ /* IORESOURCE_SYSRAM specific bits. */ #define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x02000000 /* Always detected via a driver. */ #define IORESOURCE_SYSRAM_MERGEABLE 0x04000000 /* Resource can be merged. */ #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */ #define IORESOURCE_DISABLED 0x10000000 #define IORESOURCE_UNSET 0x20000000 /* No address assigned yet */ #define IORESOURCE_AUTO 0x40000000 #define IORESOURCE_BUSY 0x80000000 /* Driver has marked this resource busy */ /* I/O resource extended types */ #define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM|IORESOURCE_SYSRAM) /* PnP IRQ specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IRQ_HIGHEDGE (1<<0) #define IORESOURCE_IRQ_LOWEDGE (1<<1) #define IORESOURCE_IRQ_HIGHLEVEL (1<<2) #define IORESOURCE_IRQ_LOWLEVEL (1<<3) #define IORESOURCE_IRQ_SHAREABLE (1<<4) #define IORESOURCE_IRQ_OPTIONAL (1<<5) /* PnP DMA specific bits (IORESOURCE_BITS) */ #define IORESOURCE_DMA_TYPE_MASK (3<<0) #define IORESOURCE_DMA_8BIT (0<<0) #define IORESOURCE_DMA_8AND16BIT (1<<0) #define IORESOURCE_DMA_16BIT (2<<0) #define IORESOURCE_DMA_MASTER (1<<2) #define IORESOURCE_DMA_BYTE (1<<3) #define IORESOURCE_DMA_WORD (1<<4) #define IORESOURCE_DMA_SPEED_MASK (3<<6) #define IORESOURCE_DMA_COMPATIBLE (0<<6) #define IORESOURCE_DMA_TYPEA (1<<6) #define IORESOURCE_DMA_TYPEB (2<<6) #define IORESOURCE_DMA_TYPEF (3<<6) /* PnP memory I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_MEM_WRITEABLE (1<<0) /* dup: IORESOURCE_READONLY */ #define IORESOURCE_MEM_CACHEABLE (1<<1) /* dup: IORESOURCE_CACHEABLE */ #define IORESOURCE_MEM_RANGELENGTH (1<<2) /* dup: IORESOURCE_RANGELENGTH */ #define IORESOURCE_MEM_TYPE_MASK (3<<3) #define IORESOURCE_MEM_8BIT (0<<3) #define IORESOURCE_MEM_16BIT (1<<3) #define IORESOURCE_MEM_8AND16BIT (2<<3) #define IORESOURCE_MEM_32BIT (3<<3) #define IORESOURCE_MEM_SHADOWABLE (1<<5) /* dup: IORESOURCE_SHADOWABLE */ #define IORESOURCE_MEM_EXPANSIONROM (1<<6) /* PnP I/O specific bits (IORESOURCE_BITS) */ #define IORESOURCE_IO_16BIT_ADDR (1<<0) #define IORESOURCE_IO_FIXED (1<<1) #define IORESOURCE_IO_SPARSE (1<<2) /* PCI ROM control bits (IORESOURCE_BITS) */ #define IORESOURCE_ROM_ENABLE (1<<0) /* ROM is enabled, same as PCI_ROM_ADDRESS_ENABLE */ #define IORESOURCE_ROM_SHADOW (1<<1) /* Use RAM image, not ROM BAR */ /* PCI control bits. Shares IORESOURCE_BITS with above PCI ROM. */ #define IORESOURCE_PCI_FIXED (1<<4) /* Do not move resource */ #define IORESOURCE_PCI_EA_BEI (1<<5) /* BAR Equivalent Indicator */ /* * I/O Resource Descriptors * * Descriptors are used by walk_iomem_res_desc() and region_intersects() * for searching a specific resource range in the iomem table. Assign * a new descriptor when a resource range supports the search interfaces. * Otherwise, resource.desc must be set to IORES_DESC_NONE (0). */ enum { IORES_DESC_NONE = 0, IORES_DESC_CRASH_KERNEL = 1, IORES_DESC_ACPI_TABLES = 2, IORES_DESC_ACPI_NV_STORAGE = 3, IORES_DESC_PERSISTENT_MEMORY = 4, IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5, IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_RESERVED = 7, IORES_DESC_SOFT_RESERVED = 8, }; /* * Flags controlling ioremap() behavior. */ enum { IORES_MAP_SYSTEM_RAM = BIT(0), IORES_MAP_ENCRYPTED = BIT(1), }; /* helpers to define resources */ #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \ { \ .start = (_start), \ .end = (_start) + (_size) - 1, \ .name = (_name), \ .flags = (_flags), \ .desc = IORES_DESC_NONE, \ } #define DEFINE_RES_IO_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_IO) #define DEFINE_RES_IO(_start, _size) \ DEFINE_RES_IO_NAMED((_start), (_size), NULL) #define DEFINE_RES_MEM_NAMED(_start, _size, _name) \ DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_MEM) #define DEFINE_RES_MEM(_start, _size) \ DEFINE_RES_MEM_NAMED((_start), (_size), NULL) #define DEFINE_RES_IRQ_NAMED(_irq, _name) \ DEFINE_RES_NAMED((_irq), 1, (_name), IORESOURCE_IRQ) #define DEFINE_RES_IRQ(_irq) \ DEFINE_RES_IRQ_NAMED((_irq), NULL) #define DEFINE_RES_DMA_NAMED(_dma, _name) \ DEFINE_RES_NAMED((_dma), 1, (_name), IORESOURCE_DMA) #define DEFINE_RES_DMA(_dma) \ DEFINE_RES_DMA_NAMED((_dma), NULL) /* PC/ISA/whatever - the normal PC address spaces: IO and memory */ extern struct resource ioport_resource; extern struct resource iomem_resource; extern struct resource *request_resource_conflict(struct resource *root, struct resource *new); extern int request_resource(struct resource *root, struct resource *new); extern int release_resource(struct resource *new); void release_child_resources(struct resource *new); extern void reserve_region_with_split(struct resource *root, resource_size_t start, resource_size_t end, const char *name); extern struct resource *insert_resource_conflict(struct resource *parent, struct resource *new); extern int insert_resource(struct resource *parent, struct resource *new); extern void insert_resource_expand_to_fit(struct resource *root, struct resource *new); extern int remove_resource(struct resource *old); extern void arch_remove_reservations(struct resource *avail); extern int allocate_resource(struct resource *root, struct resource *new, resource_size_t size, resource_size_t min, resource_size_t max, resource_size_t align, resource_size_t (*alignf)(void *, const struct resource *, resource_size_t, resource_size_t), void *alignf_data); struct resource *lookup_resource(struct resource *root, resource_size_t start); int adjust_resource(struct resource *res, resource_size_t start, resource_size_t size); resource_size_t resource_alignment(struct resource *res); static inline resource_size_t resource_size(const struct resource *res) { return res->end - res->start + 1; } static inline unsigned long resource_type(const struct resource *res) { return res->flags & IORESOURCE_TYPE_BITS; } static inline unsigned long resource_ext_type(const struct resource *res) { return res->flags & IORESOURCE_EXT_TYPE_BITS; } /* True iff r1 completely contains r2 */ static inline bool resource_contains(struct resource *r1, struct resource *r2) { if (resource_type(r1) != resource_type(r2)) return false; if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET) return false; return r1->start <= r2->start && r1->end >= r2->end; } /* Convenience shorthand with allocation */ #define request_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), 0) #define request_muxed_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED) #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl) #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0) #define request_mem_region_exclusive(start,n,name) \ __request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE) #define rename_region(region, newname) do { (region)->name = (newname); } while (0) extern struct resource * __request_region(struct resource *, resource_size_t start, resource_size_t n, const char *name, int flags); /* Compatibility cruft */ #define release_region(start,n) __release_region(&ioport_resource, (start), (n)) #define release_mem_region(start,n) __release_region(&iomem_resource, (start), (n)) extern void __release_region(struct resource *, resource_size_t, resource_size_t); #ifdef CONFIG_MEMORY_HOTREMOVE extern void release_mem_region_adjustable(resource_size_t, resource_size_t); #endif #ifdef CONFIG_MEMORY_HOTPLUG extern void merge_system_ram_resource(struct resource *res); #endif /* Wrappers for managed devices */ struct device; extern int devm_request_resource(struct device *dev, struct resource *root, struct resource *new); extern void devm_release_resource(struct device *dev, struct resource *new); #define devm_request_region(dev,start,n,name) \ __devm_request_region(dev, &ioport_resource, (start), (n), (name)) #define devm_request_mem_region(dev,start,n,name) \ __devm_request_region(dev, &iomem_resource, (start), (n), (name)) extern struct resource * __devm_request_region(struct device *dev, struct resource *parent, resource_size_t start, resource_size_t n, const char *name); #define devm_release_region(dev, start, n) \ __devm_release_region(dev, &ioport_resource, (start), (n)) #define devm_release_mem_region(dev, start, n) \ __devm_release_region(dev, &iomem_resource, (start), (n)) extern void __devm_release_region(struct device *dev, struct resource *parent, resource_size_t start, resource_size_t n); extern int iomem_map_sanity_check(resource_size_t addr, unsigned long size); extern bool iomem_is_exclusive(u64 addr); extern int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages, void *arg, int (*func)(unsigned long, unsigned long, void *)); extern int walk_mem_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_system_ram_res(u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); extern int walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 end, void *arg, int (*func)(struct resource *, void *)); /* True if any part of r1 overlaps r2 */ static inline bool resource_overlaps(struct resource *r1, struct resource *r2) { return (r1->start <= r2->end && r1->end >= r2->start); } struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size); struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name); #ifdef CONFIG_IO_STRICT_DEVMEM void revoke_devmem(struct resource *res); #else static inline void revoke_devmem(struct resource *res) { }; #endif #endif /* __ASSEMBLY__ */ #endif /* _LINUX_IOPORT_H */
1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 // SPDX-License-Identifier: GPL-2.0-only /* * sr.c Copyright (C) 1992 David Giller * Copyright (C) 1993, 1994, 1995, 1999 Eric Youngdale * * adapted from: * sd.c Copyright (C) 1992 Drew Eckhardt * Linux scsi disk driver by * Drew Eckhardt <drew@colorado.edu> * * Modified by Eric Youngdale ericy@andante.org to * add scatter-gather, multiple outstanding request, and other * enhancements. * * Modified by Eric Youngdale eric@andante.org to support loadable * low-level scsi drivers. * * Modified by Thomas Quinot thomas@melchior.cuivre.fdn.fr to * provide auto-eject. * * Modified by Gerd Knorr <kraxel@cs.tu-berlin.de> to support the * generic cdrom interface * * Modified by Jens Axboe <axboe@suse.de> - Uniform sr_packet() * interface, capabilities probe additions, ioctl cleanups, etc. * * Modified by Richard Gooch <rgooch@atnf.csiro.au> to support devfs * * Modified by Jens Axboe <axboe@suse.de> - support DVD-RAM * transparently and lose the GHOST hack * * Modified by Arnaldo Carvalho de Melo <acme@conectiva.com.br> * check resource allocation in sr_init and some cleanups */ #include <linux/module.h> #include <linux/fs.h> #include <linux/kernel.h> #include <linux/mm.h> #include <linux/bio.h> #include <linux/compat.h> #include <linux/string.h> #include <linux/errno.h> #include <linux/cdrom.h> #include <linux/interrupt.h> #include <linux/init.h> #include <linux/blkdev.h> #include <linux/blk-pm.h> #include <linux/mutex.h> #include <linux/slab.h> #include <linux/pm_runtime.h> #include <linux/uaccess.h> #include <asm/unaligned.h> #include <scsi/scsi.h> #include <scsi/scsi_dbg.h> #include <scsi/scsi_device.h> #include <scsi/scsi_driver.h> #include <scsi/scsi_cmnd.h> #include <scsi/scsi_eh.h> #include <scsi/scsi_host.h> #include <scsi/scsi_ioctl.h> /* For the door lock/unlock commands */ #include "scsi_logging.h" #include "sr.h" MODULE_DESCRIPTION("SCSI cdrom (sr) driver"); MODULE_LICENSE("GPL"); MODULE_ALIAS_BLOCKDEV_MAJOR(SCSI_CDROM_MAJOR); MODULE_ALIAS_SCSI_DEVICE(TYPE_ROM); MODULE_ALIAS_SCSI_DEVICE(TYPE_WORM); #define SR_DISKS 256 #define SR_CAPABILITIES \ (CDC_CLOSE_TRAY|CDC_OPEN_TRAY|CDC_LOCK|CDC_SELECT_SPEED| \ CDC_SELECT_DISC|CDC_MULTI_SESSION|CDC_MCN|CDC_MEDIA_CHANGED| \ CDC_PLAY_AUDIO|CDC_RESET|CDC_DRIVE_STATUS| \ CDC_CD_R|CDC_CD_RW|CDC_DVD|CDC_DVD_R|CDC_DVD_RAM|CDC_GENERIC_PACKET| \ CDC_MRW|CDC_MRW_W|CDC_RAM) static int sr_probe(struct device *); static int sr_remove(struct device *); static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt); static int sr_done(struct scsi_cmnd *); static int sr_runtime_suspend(struct device *dev); static const struct dev_pm_ops sr_pm_ops = { .runtime_suspend = sr_runtime_suspend, }; static struct scsi_driver sr_template = { .gendrv = { .name = "sr", .owner = THIS_MODULE, .probe = sr_probe, .remove = sr_remove, .pm = &sr_pm_ops, }, .init_command = sr_init_command, .done = sr_done, }; static unsigned long sr_index_bits[SR_DISKS / BITS_PER_LONG]; static DEFINE_SPINLOCK(sr_index_lock); /* This semaphore is used to mediate the 0->1 reference get in the * face of object destruction (i.e. we can't allow a get on an * object after last put) */ static DEFINE_MUTEX(sr_ref_mutex); static int sr_open(struct cdrom_device_info *, int); static void sr_release(struct cdrom_device_info *); static void get_sectorsize(struct scsi_cd *); static void get_capabilities(struct scsi_cd *); static unsigned int sr_check_events(struct cdrom_device_info *cdi, unsigned int clearing, int slot); static int sr_packet(struct cdrom_device_info *, struct packet_command *); static const struct cdrom_device_ops sr_dops = { .open = sr_open, .release = sr_release, .drive_status = sr_drive_status, .check_events = sr_check_events, .tray_move = sr_tray_move, .lock_door = sr_lock_door, .select_speed = sr_select_speed, .get_last_session = sr_get_last_session, .get_mcn = sr_get_mcn, .reset = sr_reset, .audio_ioctl = sr_audio_ioctl, .capability = SR_CAPABILITIES, .generic_packet = sr_packet, }; static void sr_kref_release(struct kref *kref); static inline struct scsi_cd *scsi_cd(struct gendisk *disk) { return container_of(disk->private_data, struct scsi_cd, driver); } static int sr_runtime_suspend(struct device *dev) { struct scsi_cd *cd = dev_get_drvdata(dev); if (!cd) /* E.g.: runtime suspend following sr_remove() */ return 0; if (cd->media_present) return -EBUSY; else return 0; } /* * The get and put routines for the struct scsi_cd. Note this entity * has a scsi_device pointer and owns a reference to this. */ static inline struct scsi_cd *scsi_cd_get(struct gendisk *disk) { struct scsi_cd *cd = NULL; mutex_lock(&sr_ref_mutex); if (disk->private_data == NULL) goto out; cd = scsi_cd(disk); kref_get(&cd->kref); if (scsi_device_get(cd->device)) { kref_put(&cd->kref, sr_kref_release); cd = NULL; } out: mutex_unlock(&sr_ref_mutex); return cd; } static void scsi_cd_put(struct scsi_cd *cd) { struct scsi_device *sdev = cd->device; mutex_lock(&sr_ref_mutex); kref_put(&cd->kref, sr_kref_release); scsi_device_put(sdev); mutex_unlock(&sr_ref_mutex); } static unsigned int sr_get_events(struct scsi_device *sdev) { u8 buf[8]; u8 cmd[] = { GET_EVENT_STATUS_NOTIFICATION, 1, /* polled */ 0, 0, /* reserved */ 1 << 4, /* notification class: media */ 0, 0, /* reserved */ 0, sizeof(buf), /* allocation length */ 0, /* control */ }; struct event_header *eh = (void *)buf; struct media_event_desc *med = (void *)(buf + 4); struct scsi_sense_hdr sshdr; int result; result = scsi_execute_req(sdev, cmd, DMA_FROM_DEVICE, buf, sizeof(buf), &sshdr, SR_TIMEOUT, MAX_RETRIES, NULL); if (scsi_sense_valid(&sshdr) && sshdr.sense_key == UNIT_ATTENTION) return DISK_EVENT_MEDIA_CHANGE; if (result || be16_to_cpu(eh->data_len) < sizeof(*med)) return 0; if (eh->nea || eh->notification_class != 0x4) return 0; if (med->media_event_code == 1) return DISK_EVENT_EJECT_REQUEST; else if (med->media_event_code == 2) return DISK_EVENT_MEDIA_CHANGE; else if (med->media_event_code == 3) return DISK_EVENT_MEDIA_CHANGE; return 0; } /* * This function checks to see if the media has been changed or eject * button has been pressed. It is possible that we have already * sensed a change, or the drive may have sensed one and not yet * reported it. The past events are accumulated in sdev->changed and * returned together with the current state. */ static unsigned int sr_check_events(struct cdrom_device_info *cdi, unsigned int clearing, int slot) { struct scsi_cd *cd = cdi->handle; bool last_present; struct scsi_sense_hdr sshdr; unsigned int events; int ret; /* no changer support */ if (CDSL_CURRENT != slot) return 0; events = sr_get_events(cd->device); cd->get_event_changed |= events & DISK_EVENT_MEDIA_CHANGE; /* * If earlier GET_EVENT_STATUS_NOTIFICATION and TUR did not agree * for several times in a row. We rely on TUR only for this likely * broken device, to prevent generating incorrect media changed * events for every open(). */ if (cd->ignore_get_event) { events &= ~DISK_EVENT_MEDIA_CHANGE; goto do_tur; } /* * GET_EVENT_STATUS_NOTIFICATION is enough unless MEDIA_CHANGE * is being cleared. Note that there are devices which hang * if asked to execute TUR repeatedly. */ if (cd->device->changed) { events |= DISK_EVENT_MEDIA_CHANGE; cd->device->changed = 0; cd->tur_changed = true; } if (!(clearing & DISK_EVENT_MEDIA_CHANGE)) return events; do_tur: /* let's see whether the media is there with TUR */ last_present = cd->media_present; ret = scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); /* * Media is considered to be present if TUR succeeds or fails with * sense data indicating something other than media-not-present * (ASC 0x3a). */ cd->media_present = scsi_status_is_good(ret) || (scsi_sense_valid(&sshdr) && sshdr.asc != 0x3a); if (last_present != cd->media_present) cd->device->changed = 1; if (cd->device->changed) { events |= DISK_EVENT_MEDIA_CHANGE; cd->device->changed = 0; cd->tur_changed = true; } if (cd->ignore_get_event) return events; /* check whether GET_EVENT is reporting spurious MEDIA_CHANGE */ if (!cd->tur_changed) { if (cd->get_event_changed) { if (cd->tur_mismatch++ > 8) { sr_printk(KERN_WARNING, cd, "GET_EVENT and TUR disagree continuously, suppress GET_EVENT events\n"); cd->ignore_get_event = true; } } else { cd->tur_mismatch = 0; } } cd->tur_changed = false; cd->get_event_changed = false; return events; } /* * sr_done is the interrupt routine for the device driver. * * It will be notified on the end of a SCSI read / write, and will take one * of several actions based on success or failure. */ static int sr_done(struct scsi_cmnd *SCpnt) { int result = SCpnt->result; int this_count = scsi_bufflen(SCpnt); int good_bytes = (result == 0 ? this_count : 0); int block_sectors = 0; long error_sector; struct scsi_cd *cd = scsi_cd(SCpnt->request->rq_disk); #ifdef DEBUG scmd_printk(KERN_INFO, SCpnt, "done: %x\n", result); #endif /* * Handle MEDIUM ERRORs or VOLUME OVERFLOWs that indicate partial * success. Since this is a relatively rare error condition, no * care is taken to avoid unnecessary additional work such as * memcpy's that could be avoided. */ if (driver_byte(result) != 0 && /* An error occurred */ (SCpnt->sense_buffer[0] & 0x7f) == 0x70) { /* Sense current */ switch (SCpnt->sense_buffer[2]) { case MEDIUM_ERROR: case VOLUME_OVERFLOW: case ILLEGAL_REQUEST: if (!(SCpnt->sense_buffer[0] & 0x90)) break; error_sector = get_unaligned_be32(&SCpnt->sense_buffer[3]); if (SCpnt->request->bio != NULL) block_sectors = bio_sectors(SCpnt->request->bio); if (block_sectors < 4) block_sectors = 4; if (cd->device->sector_size == 2048) error_sector <<= 2; error_sector &= ~(block_sectors - 1); good_bytes = (error_sector - blk_rq_pos(SCpnt->request)) << 9; if (good_bytes < 0 || good_bytes >= this_count) good_bytes = 0; /* * The SCSI specification allows for the value * returned by READ CAPACITY to be up to 75 2K * sectors past the last readable block. * Therefore, if we hit a medium error within the * last 75 2K sectors, we decrease the saved size * value. */ if (error_sector < get_capacity(cd->disk) && cd->capacity - error_sector < 4 * 75) set_capacity(cd->disk, error_sector); break; case RECOVERED_ERROR: good_bytes = this_count; break; default: break; } } return good_bytes; } static blk_status_t sr_init_command(struct scsi_cmnd *SCpnt) { int block = 0, this_count, s_size; struct scsi_cd *cd; struct request *rq = SCpnt->request; blk_status_t ret; ret = scsi_alloc_sgtables(SCpnt); if (ret != BLK_STS_OK) return ret; cd = scsi_cd(rq->rq_disk); SCSI_LOG_HLQUEUE(1, scmd_printk(KERN_INFO, SCpnt, "Doing sr request, block = %d\n", block)); if (!cd->device || !scsi_device_online(cd->device)) { SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "Finishing %u sectors\n", blk_rq_sectors(rq))); SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "Retry with 0x%p\n", SCpnt)); goto out; } if (cd->device->changed) { /* * quietly refuse to do anything to a changed disc until the * changed bit has been reset */ goto out; } /* * we do lazy blocksize switching (when reading XA sectors, * see CDROMREADMODE2 ioctl) */ s_size = cd->device->sector_size; if (s_size > 2048) { if (!in_interrupt()) sr_set_blocklength(cd, 2048); else scmd_printk(KERN_INFO, SCpnt, "can't switch blocksize: in interrupt\n"); } if (s_size != 512 && s_size != 1024 && s_size != 2048) { scmd_printk(KERN_ERR, SCpnt, "bad sector size %d\n", s_size); goto out; } switch (req_op(rq)) { case REQ_OP_WRITE: if (!cd->writeable) goto out; SCpnt->cmnd[0] = WRITE_10; cd->cdi.media_written = 1; break; case REQ_OP_READ: SCpnt->cmnd[0] = READ_10; break; default: blk_dump_rq_flags(rq, "Unknown sr command"); goto out; } { struct scatterlist *sg; int i, size = 0, sg_count = scsi_sg_count(SCpnt); scsi_for_each_sg(SCpnt, sg, sg_count, i) size += sg->length; if (size != scsi_bufflen(SCpnt)) { scmd_printk(KERN_ERR, SCpnt, "mismatch count %d, bytes %d\n", size, scsi_bufflen(SCpnt)); if (scsi_bufflen(SCpnt) > size) SCpnt->sdb.length = size; } } /* * request doesn't start on hw block boundary, add scatter pads */ if (((unsigned int)blk_rq_pos(rq) % (s_size >> 9)) || (scsi_bufflen(SCpnt) % s_size)) { scmd_printk(KERN_NOTICE, SCpnt, "unaligned transfer\n"); goto out; } this_count = (scsi_bufflen(SCpnt) >> 9) / (s_size >> 9); SCSI_LOG_HLQUEUE(2, scmd_printk(KERN_INFO, SCpnt, "%s %d/%u 512 byte blocks.\n", (rq_data_dir(rq) == WRITE) ? "writing" : "reading", this_count, blk_rq_sectors(rq))); SCpnt->cmnd[1] = 0; block = (unsigned int)blk_rq_pos(rq) / (s_size >> 9); if (this_count > 0xffff) { this_count = 0xffff; SCpnt->sdb.length = this_count * s_size; } put_unaligned_be32(block, &SCpnt->cmnd[2]); SCpnt->cmnd[6] = SCpnt->cmnd[9] = 0; put_unaligned_be16(this_count, &SCpnt->cmnd[7]); /* * We shouldn't disconnect in the middle of a sector, so with a dumb * host adapter, it's safe to assume that we can at least transfer * this many bytes between each connect / disconnect. */ SCpnt->transfersize = cd->device->sector_size; SCpnt->underflow = this_count << 9; SCpnt->allowed = MAX_RETRIES; SCpnt->cmd_len = 10; /* * This indicates that the command is ready from our end to be queued. */ return BLK_STS_OK; out: scsi_free_sgtables(SCpnt); return BLK_STS_IOERR; } static void sr_revalidate_disk(struct scsi_cd *cd) { struct scsi_sense_hdr sshdr; /* if the unit is not ready, nothing more to do */ if (scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr)) return; sr_cd_check(&cd->cdi); get_sectorsize(cd); } static int sr_block_open(struct block_device *bdev, fmode_t mode) { struct scsi_cd *cd; struct scsi_device *sdev; int ret = -ENXIO; cd = scsi_cd_get(bdev->bd_disk); if (!cd) goto out; sdev = cd->device; scsi_autopm_get_device(sdev); if (bdev_check_media_change(bdev)) sr_revalidate_disk(cd); mutex_lock(&cd->lock); ret = cdrom_open(&cd->cdi, bdev, mode); mutex_unlock(&cd->lock); scsi_autopm_put_device(sdev); if (ret) scsi_cd_put(cd); out: return ret; } static void sr_block_release(struct gendisk *disk, fmode_t mode) { struct scsi_cd *cd = scsi_cd(disk); mutex_lock(&cd->lock); cdrom_release(&cd->cdi, mode); mutex_unlock(&cd->lock); scsi_cd_put(cd); } static int sr_block_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long arg) { struct scsi_cd *cd = scsi_cd(bdev->bd_disk); struct scsi_device *sdev = cd->device; void __user *argp = (void __user *)arg; int ret; mutex_lock(&cd->lock); ret = scsi_ioctl_block_when_processing_errors(sdev, cmd, (mode & FMODE_NDELAY) != 0); if (ret) goto out; scsi_autopm_get_device(sdev); /* * Send SCSI addressing ioctls directly to mid level, send other * ioctls to cdrom/block level. */ switch (cmd) { case SCSI_IOCTL_GET_IDLUN: case SCSI_IOCTL_GET_BUS_NUMBER: ret = scsi_ioctl(sdev, cmd, argp); goto put; } ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, arg); if (ret != -ENOSYS) goto put; ret = scsi_ioctl(sdev, cmd, argp); put: scsi_autopm_put_device(sdev); out: mutex_unlock(&cd->lock); return ret; } #ifdef CONFIG_COMPAT static int sr_block_compat_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long arg) { struct scsi_cd *cd = scsi_cd(bdev->bd_disk); struct scsi_device *sdev = cd->device; void __user *argp = compat_ptr(arg); int ret; mutex_lock(&cd->lock); ret = scsi_ioctl_block_when_processing_errors(sdev, cmd, (mode & FMODE_NDELAY) != 0); if (ret) goto out; scsi_autopm_get_device(sdev); /* * Send SCSI addressing ioctls directly to mid level, send other * ioctls to cdrom/block level. */ switch (cmd) { case SCSI_IOCTL_GET_IDLUN: case SCSI_IOCTL_GET_BUS_NUMBER: ret = scsi_compat_ioctl(sdev, cmd, argp); goto put; } ret = cdrom_ioctl(&cd->cdi, bdev, mode, cmd, (unsigned long)argp); if (ret != -ENOSYS) goto put; ret = scsi_compat_ioctl(sdev, cmd, argp); put: scsi_autopm_put_device(sdev); out: mutex_unlock(&cd->lock); return ret; } #endif static unsigned int sr_block_check_events(struct gendisk *disk, unsigned int clearing) { unsigned int ret = 0; struct scsi_cd *cd; cd = scsi_cd_get(disk); if (!cd) return 0; if (!atomic_read(&cd->device->disk_events_disable_depth)) ret = cdrom_check_events(&cd->cdi, clearing); scsi_cd_put(cd); return ret; } static const struct block_device_operations sr_bdops = { .owner = THIS_MODULE, .open = sr_block_open, .release = sr_block_release, .ioctl = sr_block_ioctl, #ifdef CONFIG_COMPAT .compat_ioctl = sr_block_compat_ioctl, #endif .check_events = sr_block_check_events, }; static int sr_open(struct cdrom_device_info *cdi, int purpose) { struct scsi_cd *cd = cdi->handle; struct scsi_device *sdev = cd->device; int retval; /* * If the device is in error recovery, wait until it is done. * If the device is offline, then disallow any access to it. */ retval = -ENXIO; if (!scsi_block_when_processing_errors(sdev)) goto error_out; return 0; error_out: return retval; } static void sr_release(struct cdrom_device_info *cdi) { struct scsi_cd *cd = cdi->handle; if (cd->device->sector_size > 2048) sr_set_blocklength(cd, 2048); } static int sr_probe(struct device *dev) { struct scsi_device *sdev = to_scsi_device(dev); struct gendisk *disk; struct scsi_cd *cd; int minor, error; scsi_autopm_get_device(sdev); error = -ENODEV; if (sdev->type != TYPE_ROM && sdev->type != TYPE_WORM) goto fail; error = -ENOMEM; cd = kzalloc(sizeof(*cd), GFP_KERNEL); if (!cd) goto fail; kref_init(&cd->kref); disk = alloc_disk(1); if (!disk) goto fail_free; mutex_init(&cd->lock); spin_lock(&sr_index_lock); minor = find_first_zero_bit(sr_index_bits, SR_DISKS); if (minor == SR_DISKS) { spin_unlock(&sr_index_lock); error = -EBUSY; goto fail_put; } __set_bit(minor, sr_index_bits); spin_unlock(&sr_index_lock); disk->major = SCSI_CDROM_MAJOR; disk->first_minor = minor; sprintf(disk->disk_name, "sr%d", minor); disk->fops = &sr_bdops; disk->flags = GENHD_FL_CD | GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE; disk->events = DISK_EVENT_MEDIA_CHANGE | DISK_EVENT_EJECT_REQUEST; disk->event_flags = DISK_EVENT_FLAG_POLL | DISK_EVENT_FLAG_UEVENT; blk_queue_rq_timeout(sdev->request_queue, SR_TIMEOUT); cd->device = sdev; cd->disk = disk; cd->driver = &sr_template; cd->disk = disk; cd->capacity = 0x1fffff; cd->device->changed = 1; /* force recheck CD type */ cd->media_present = 1; cd->use = 1; cd->readcd_known = 0; cd->readcd_cdda = 0; cd->cdi.ops = &sr_dops; cd->cdi.handle = cd; cd->cdi.mask = 0; cd->cdi.capacity = 1; sprintf(cd->cdi.name, "sr%d", minor); sdev->sector_size = 2048; /* A guess, just in case */ /* FIXME: need to handle a get_capabilities failure properly ?? */ get_capabilities(cd); sr_vendor_init(cd); set_capacity(disk, cd->capacity); disk->private_data = &cd->driver; disk->queue = sdev->request_queue; if (register_cdrom(disk, &cd->cdi)) goto fail_minor; /* * Initialize block layer runtime PM stuffs before the * periodic event checking request gets started in add_disk. */ blk_pm_runtime_init(sdev->request_queue, dev); dev_set_drvdata(dev, cd); disk->flags |= GENHD_FL_REMOVABLE; sr_revalidate_disk(cd); device_add_disk(&sdev->sdev_gendev, disk, NULL); sdev_printk(KERN_DEBUG, sdev, "Attached scsi CD-ROM %s\n", cd->cdi.name); scsi_autopm_put_device(cd->device); return 0; fail_minor: spin_lock(&sr_index_lock); clear_bit(minor, sr_index_bits); spin_unlock(&sr_index_lock); fail_put: put_disk(disk); mutex_destroy(&cd->lock); fail_free: kfree(cd); fail: scsi_autopm_put_device(sdev); return error; } static void get_sectorsize(struct scsi_cd *cd) { unsigned char cmd[10]; unsigned char buffer[8]; int the_result, retries = 3; int sector_size; struct request_queue *queue; do { cmd[0] = READ_CAPACITY; memset((void *) &cmd[1], 0, 9); memset(buffer, 0, sizeof(buffer)); /* Do the command and wait.. */ the_result = scsi_execute_req(cd->device, cmd, DMA_FROM_DEVICE, buffer, sizeof(buffer), NULL, SR_TIMEOUT, MAX_RETRIES, NULL); retries--; } while (the_result && retries); if (the_result) { cd->capacity = 0x1fffff; sector_size = 2048; /* A guess, just in case */ } else { long last_written; cd->capacity = 1 + get_unaligned_be32(&buffer[0]); /* * READ_CAPACITY doesn't return the correct size on * certain UDF media. If last_written is larger, use * it instead. * * http://bugzilla.kernel.org/show_bug.cgi?id=9668 */ if (!cdrom_get_last_written(&cd->cdi, &last_written)) cd->capacity = max_t(long, cd->capacity, last_written); sector_size = get_unaligned_be32(&buffer[4]); switch (sector_size) { /* * HP 4020i CD-Recorder reports 2340 byte sectors * Philips CD-Writers report 2352 byte sectors * * Use 2k sectors for them.. */ case 0: case 2340: case 2352: sector_size = 2048; fallthrough; case 2048: cd->capacity *= 4; fallthrough; case 512: break; default: sr_printk(KERN_INFO, cd, "unsupported sector size %d.", sector_size); cd->capacity = 0; } cd->device->sector_size = sector_size; /* * Add this so that we have the ability to correctly gauge * what the device is capable of. */ set_capacity(cd->disk, cd->capacity); } queue = cd->device->request_queue; blk_queue_logical_block_size(queue, sector_size); return; } static void get_capabilities(struct scsi_cd *cd) { unsigned char *buffer; struct scsi_mode_data data; struct scsi_sense_hdr sshdr; unsigned int ms_len = 128; int rc, n; static const char *loadmech[] = { "caddy", "tray", "pop-up", "", "changer", "cartridge changer", "", "" }; /* allocate transfer buffer */ buffer = kmalloc(512, GFP_KERNEL | GFP_DMA); if (!buffer) { sr_printk(KERN_ERR, cd, "out of memory.\n"); return; } /* eat unit attentions */ scsi_test_unit_ready(cd->device, SR_TIMEOUT, MAX_RETRIES, &sshdr); /* ask for mode page 0x2a */ rc = scsi_mode_sense(cd->device, 0, 0x2a, buffer, ms_len, SR_TIMEOUT, 3, &data, NULL); if (rc < 0 || data.length > ms_len || data.header_length + data.block_descriptor_length > data.length) { /* failed, drive doesn't have capabilities mode page */ cd->cdi.speed = 1; cd->cdi.mask |= (CDC_CD_R | CDC_CD_RW | CDC_DVD_R | CDC_DVD | CDC_DVD_RAM | CDC_SELECT_DISC | CDC_SELECT_SPEED | CDC_MRW | CDC_MRW_W | CDC_RAM); kfree(buffer); sr_printk(KERN_INFO, cd, "scsi-1 drive"); return; } n = data.header_length + data.block_descriptor_length; cd->cdi.speed = get_unaligned_be16(&buffer[n + 8]) / 176; cd->readcd_known = 1; cd->readcd_cdda = buffer[n + 5] & 0x01; /* print some capability bits */ sr_printk(KERN_INFO, cd, "scsi3-mmc drive: %dx/%dx %s%s%s%s%s%s\n", get_unaligned_be16(&buffer[n + 14]) / 176, cd->cdi.speed, buffer[n + 3] & 0x01 ? "writer " : "", /* CD Writer */ buffer[n + 3] & 0x20 ? "dvd-ram " : "", buffer[n + 2] & 0x02 ? "cd/rw " : "", /* can read rewriteable */ buffer[n + 4] & 0x20 ? "xa/form2 " : "", /* can read xa/from2 */ buffer[n + 5] & 0x01 ? "cdda " : "", /* can read audio data */ loadmech[buffer[n + 6] >> 5]); if ((buffer[n + 6] >> 5) == 0) /* caddy drives can't close tray... */ cd->cdi.mask |= CDC_CLOSE_TRAY; if ((buffer[n + 2] & 0x8) == 0) /* not a DVD drive */ cd->cdi.mask |= CDC_DVD; if ((buffer[n + 3] & 0x20) == 0) /* can't write DVD-RAM media */ cd->cdi.mask |= CDC_DVD_RAM; if ((buffer[n + 3] & 0x10) == 0) /* can't write DVD-R media */ cd->cdi.mask |= CDC_DVD_R; if ((buffer[n + 3] & 0x2) == 0) /* can't write CD-RW media */ cd->cdi.mask |= CDC_CD_RW; if ((buffer[n + 3] & 0x1) == 0) /* can't write CD-R media */ cd->cdi.mask |= CDC_CD_R; if ((buffer[n + 6] & 0x8) == 0) /* can't eject */ cd->cdi.mask |= CDC_OPEN_TRAY; if ((buffer[n + 6] >> 5) == mechtype_individual_changer || (buffer[n + 6] >> 5) == mechtype_cartridge_changer) cd->cdi.capacity = cdrom_number_of_slots(&cd->cdi); if (cd->cdi.capacity <= 1) /* not a changer */ cd->cdi.mask |= CDC_SELECT_DISC; /*else I don't think it can close its tray cd->cdi.mask |= CDC_CLOSE_TRAY; */ /* * if DVD-RAM, MRW-W or CD-RW, we are randomly writable */ if ((cd->cdi.mask & (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW)) != (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW)) { cd->writeable = 1; } kfree(buffer); } /* * sr_packet() is the entry point for the generic commands generated * by the Uniform CD-ROM layer. */ static int sr_packet(struct cdrom_device_info *cdi, struct packet_command *cgc) { struct scsi_cd *cd = cdi->handle; struct scsi_device *sdev = cd->device; if (cgc->cmd[0] == GPCMD_READ_DISC_INFO && sdev->no_read_disc_info) return -EDRIVE_CANT_DO_THIS; if (cgc->timeout <= 0) cgc->timeout = IOCTL_TIMEOUT; sr_do_ioctl(cd, cgc); return cgc->stat; } /** * sr_kref_release - Called to free the scsi_cd structure * @kref: pointer to embedded kref * * sr_ref_mutex must be held entering this routine. Because it is * called on last put, you should always use the scsi_cd_get() * scsi_cd_put() helpers which manipulate the semaphore directly * and never do a direct kref_put(). **/ static void sr_kref_release(struct kref *kref) { struct scsi_cd *cd = container_of(kref, struct scsi_cd, kref); struct gendisk *disk = cd->disk; spin_lock(&sr_index_lock); clear_bit(MINOR(disk_devt(disk)), sr_index_bits); spin_unlock(&sr_index_lock); unregister_cdrom(&cd->cdi); disk->private_data = NULL; put_disk(disk); mutex_destroy(&cd->lock); kfree(cd); } static int sr_remove(struct device *dev) { struct scsi_cd *cd = dev_get_drvdata(dev); scsi_autopm_get_device(cd->device); del_gendisk(cd->disk); dev_set_drvdata(dev, NULL); mutex_lock(&sr_ref_mutex); kref_put(&cd->kref, sr_kref_release); mutex_unlock(&sr_ref_mutex); return 0; } static int __init init_sr(void) { int rc; rc = register_blkdev(SCSI_CDROM_MAJOR, "sr"); if (rc) return rc; rc = scsi_register_driver(&sr_template.gendrv); if (rc) unregister_blkdev(SCSI_CDROM_MAJOR, "sr"); return rc; } static void __exit exit_sr(void) { scsi_unregister_driver(&sr_template.gendrv); unregister_blkdev(SCSI_CDROM_MAJOR, "sr"); } module_init(init_sr); module_exit(exit_sr); MODULE_LICENSE("GPL");
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 /* SPDX-License-Identifier: GPL-2.0 */ /* * Linux Socket Filter Data Structures */ #ifndef __LINUX_FILTER_H__ #define __LINUX_FILTER_H__ #include <stdarg.h> #include <linux/atomic.h> #include <linux/refcount.h> #include <linux/compat.h> #include <linux/skbuff.h> #include <linux/linkage.h> #include <linux/printk.h> #include <linux/workqueue.h> #include <linux/sched.h> #include <linux/capability.h> #include <linux/set_memory.h> #include <linux/kallsyms.h> #include <linux/if_vlan.h> #include <linux/vmalloc.h> #include <linux/sockptr.h> #include <crypto/sha.h> #include <net/sch_generic.h> #include <asm/byteorder.h> #include <uapi/linux/filter.h> #include <uapi/linux/bpf.h> struct sk_buff; struct sock; struct seccomp_data; struct bpf_prog_aux; struct xdp_rxq_info; struct xdp_buff; struct sock_reuseport; struct ctl_table; struct ctl_table_header; /* ArgX, context and stack frame pointer register positions. Note, * Arg1, Arg2, Arg3, etc are used as argument mappings of function * calls in BPF_CALL instruction. */ #define BPF_REG_ARG1 BPF_REG_1 #define BPF_REG_ARG2 BPF_REG_2 #define BPF_REG_ARG3 BPF_REG_3 #define BPF_REG_ARG4 BPF_REG_4 #define BPF_REG_ARG5 BPF_REG_5 #define BPF_REG_CTX BPF_REG_6 #define BPF_REG_FP BPF_REG_10 /* Additional register mappings for converted user programs. */ #define BPF_REG_A BPF_REG_0 #define BPF_REG_X BPF_REG_7 #define BPF_REG_TMP BPF_REG_2 /* scratch reg */ #define BPF_REG_D BPF_REG_8 /* data, callee-saved */ #define BPF_REG_H BPF_REG_9 /* hlen, callee-saved */ /* Kernel hidden auxiliary/helper register. */ #define BPF_REG_AX MAX_BPF_REG #define MAX_BPF_EXT_REG (MAX_BPF_REG + 1) #define MAX_BPF_JIT_REG MAX_BPF_EXT_REG /* unused opcode to mark special call to bpf_tail_call() helper */ #define BPF_TAIL_CALL 0xf0 /* unused opcode to mark special load instruction. Same as BPF_ABS */ #define BPF_PROBE_MEM 0x20 /* unused opcode to mark call to interpreter with arguments */ #define BPF_CALL_ARGS 0xe0 /* unused opcode to mark speculation barrier for mitigating * Speculative Store Bypass */ #define BPF_NOSPEC 0xc0 /* As per nm, we expose JITed images as text (code) section for * kallsyms. That way, tools like perf can find it to match * addresses. */ #define BPF_SYM_ELF_TYPE 't' /* BPF program can access up to 512 bytes of stack space. */ #define MAX_BPF_STACK 512 /* Helper macros for filter block array initializers. */ /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */ #define BPF_ALU64_REG(OP, DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = 0 }) #define BPF_ALU32_REG(OP, DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = 0 }) /* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */ #define BPF_ALU64_IMM(OP, DST, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = 0, \ .imm = IMM }) #define BPF_ALU32_IMM(OP, DST, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = 0, \ .imm = IMM }) /* Endianess conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */ #define BPF_ENDIAN(TYPE, DST, LEN) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \ .dst_reg = DST, \ .src_reg = 0, \ .off = 0, \ .imm = LEN }) /* Short form of mov, dst_reg = src_reg */ #define BPF_MOV64_REG(DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_MOV | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = 0 }) #define BPF_MOV32_REG(DST, SRC) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_MOV | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = 0 }) /* Short form of mov, dst_reg = imm32 */ #define BPF_MOV64_IMM(DST, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_MOV | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = 0, \ .imm = IMM }) #define BPF_MOV32_IMM(DST, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_MOV | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = 0, \ .imm = IMM }) /* Special form of mov32, used for doing explicit zero extension on dst. */ #define BPF_ZEXT_REG(DST) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_MOV | BPF_X, \ .dst_reg = DST, \ .src_reg = DST, \ .off = 0, \ .imm = 1 }) static inline bool insn_is_zext(const struct bpf_insn *insn) { return insn->code == (BPF_ALU | BPF_MOV | BPF_X) && insn->imm == 1; } /* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */ #define BPF_LD_IMM64(DST, IMM) \ BPF_LD_IMM64_RAW(DST, 0, IMM) #define BPF_LD_IMM64_RAW(DST, SRC, IMM) \ ((struct bpf_insn) { \ .code = BPF_LD | BPF_DW | BPF_IMM, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = (__u32) (IMM) }), \ ((struct bpf_insn) { \ .code = 0, /* zero is reserved opcode */ \ .dst_reg = 0, \ .src_reg = 0, \ .off = 0, \ .imm = ((__u64) (IMM)) >> 32 }) /* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */ #define BPF_LD_MAP_FD(DST, MAP_FD) \ BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD) /* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */ #define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = IMM }) #define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \ ((struct bpf_insn) { \ .code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \ .dst_reg = DST, \ .src_reg = SRC, \ .off = 0, \ .imm = IMM }) /* Direct packet access, R0 = *(uint *) (skb->data + imm32) */ #define BPF_LD_ABS(SIZE, IMM) \ ((struct bpf_insn) { \ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \ .dst_reg = 0, \ .src_reg = 0, \ .off = 0, \ .imm = IMM }) /* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */ #define BPF_LD_IND(SIZE, SRC, IMM) \ ((struct bpf_insn) { \ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \ .dst_reg = 0, \ .src_reg = SRC, \ .off = 0, \ .imm = IMM }) /* Memory load, dst_reg = *(uint *) (src_reg + off16) */ #define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = 0 }) /* Memory store, *(uint *) (dst_reg + off16) = src_reg */ #define BPF_STX_MEM(SIZE, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = 0 }) /* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */ #define BPF_STX_XADD(SIZE, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = 0 }) /* Memory store, *(uint *) (dst_reg + off16) = imm32 */ #define BPF_ST_MEM(SIZE, DST, OFF, IMM) \ ((struct bpf_insn) { \ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \ .dst_reg = DST, \ .src_reg = 0, \ .off = OFF, \ .imm = IMM }) /* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */ #define BPF_JMP_REG(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = 0 }) /* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */ #define BPF_JMP_IMM(OP, DST, IMM, OFF) \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = OFF, \ .imm = IMM }) /* Like BPF_JMP_REG, but with 32-bit wide operands for comparison. */ #define BPF_JMP32_REG(OP, DST, SRC, OFF) \ ((struct bpf_insn) { \ .code = BPF_JMP32 | BPF_OP(OP) | BPF_X, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = 0 }) /* Like BPF_JMP_IMM, but with 32-bit wide operands for comparison. */ #define BPF_JMP32_IMM(OP, DST, IMM, OFF) \ ((struct bpf_insn) { \ .code = BPF_JMP32 | BPF_OP(OP) | BPF_K, \ .dst_reg = DST, \ .src_reg = 0, \ .off = OFF, \ .imm = IMM }) /* Unconditional jumps, goto pc + off16 */ #define BPF_JMP_A(OFF) \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_JA, \ .dst_reg = 0, \ .src_reg = 0, \ .off = OFF, \ .imm = 0 }) /* Relative call */ #define BPF_CALL_REL(TGT) \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_CALL, \ .dst_reg = 0, \ .src_reg = BPF_PSEUDO_CALL, \ .off = 0, \ .imm = TGT }) /* Function call */ #define BPF_CAST_CALL(x) \ ((u64 (*)(u64, u64, u64, u64, u64))(x)) #define BPF_EMIT_CALL(FUNC) \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_CALL, \ .dst_reg = 0, \ .src_reg = 0, \ .off = 0, \ .imm = ((FUNC) - __bpf_call_base) }) /* Raw code statement block */ #define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \ ((struct bpf_insn) { \ .code = CODE, \ .dst_reg = DST, \ .src_reg = SRC, \ .off = OFF, \ .imm = IMM }) /* Program exit */ #define BPF_EXIT_INSN() \ ((struct bpf_insn) { \ .code = BPF_JMP | BPF_EXIT, \ .dst_reg = 0, \ .src_reg = 0, \ .off = 0, \ .imm = 0 }) /* Speculation barrier */ #define BPF_ST_NOSPEC() \ ((struct bpf_insn) { \ .code = BPF_ST | BPF_NOSPEC, \ .dst_reg = 0, \ .src_reg = 0, \ .off = 0, \ .imm = 0 }) /* Internal classic blocks for direct assignment */ #define __BPF_STMT(CODE, K) \ ((struct sock_filter) BPF_STMT(CODE, K)) #define __BPF_JUMP(CODE, K, JT, JF) \ ((struct sock_filter) BPF_JUMP(CODE, K, JT, JF)) #define bytes_to_bpf_size(bytes) \ ({ \ int bpf_size = -EINVAL; \ \ if (bytes == sizeof(u8)) \ bpf_size = BPF_B; \ else if (bytes == sizeof(u16)) \ bpf_size = BPF_H; \ else if (bytes == sizeof(u32)) \ bpf_size = BPF_W; \ else if (bytes == sizeof(u64)) \ bpf_size = BPF_DW; \ \ bpf_size; \ }) #define bpf_size_to_bytes(bpf_size) \ ({ \ int bytes = -EINVAL; \ \ if (bpf_size == BPF_B) \ bytes = sizeof(u8); \ else if (bpf_size == BPF_H) \ bytes = sizeof(u16); \ else if (bpf_size == BPF_W) \ bytes = sizeof(u32); \ else if (bpf_size == BPF_DW) \ bytes = sizeof(u64); \ \ bytes; \ }) #define BPF_SIZEOF(type) \ ({ \ const int __size = bytes_to_bpf_size(sizeof(type)); \ BUILD_BUG_ON(__size < 0); \ __size; \ }) #define BPF_FIELD_SIZEOF(type, field) \ ({ \ const int __size = bytes_to_bpf_size(sizeof_field(type, field)); \ BUILD_BUG_ON(__size < 0); \ __size; \ }) #define BPF_LDST_BYTES(insn) \ ({ \ const int __size = bpf_size_to_bytes(BPF_SIZE((insn)->code)); \ WARN_ON(__size < 0); \ __size; \ }) #define __BPF_MAP_0(m, v, ...) v #define __BPF_MAP_1(m, v, t, a, ...) m(t, a) #define __BPF_MAP_2(m, v, t, a, ...) m(t, a), __BPF_MAP_1(m, v, __VA_ARGS__) #define __BPF_MAP_3(m, v, t, a, ...) m(t, a), __BPF_MAP_2(m, v, __VA_ARGS__) #define __BPF_MAP_4(m, v, t, a, ...) m(t, a), __BPF_MAP_3(m, v, __VA_ARGS__) #define __BPF_MAP_5(m, v, t, a, ...) m(t, a), __BPF_MAP_4(m, v, __VA_ARGS__) #define __BPF_REG_0(...) __BPF_PAD(5) #define __BPF_REG_1(...) __BPF_MAP(1, __VA_ARGS__), __BPF_PAD(4) #define __BPF_REG_2(...) __BPF_MAP(2, __VA_ARGS__), __BPF_PAD(3) #define __BPF_REG_3(...) __BPF_MAP(3, __VA_ARGS__), __BPF_PAD(2) #define __BPF_REG_4(...) __BPF_MAP(4, __VA_ARGS__), __BPF_PAD(1) #define __BPF_REG_5(...) __BPF_MAP(5, __VA_ARGS__) #define __BPF_MAP(n, ...) __BPF_MAP_##n(__VA_ARGS__) #define __BPF_REG(n, ...) __BPF_REG_##n(__VA_ARGS__) #define __BPF_CAST(t, a) \ (__force t) \ (__force \ typeof(__builtin_choose_expr(sizeof(t) == sizeof(unsigned long), \ (unsigned long)0, (t)0))) a #define __BPF_V void #define __BPF_N #define __BPF_DECL_ARGS(t, a) t a #define __BPF_DECL_REGS(t, a) u64 a #define __BPF_PAD(n) \ __BPF_MAP(n, __BPF_DECL_ARGS, __BPF_N, u64, __ur_1, u64, __ur_2, \ u64, __ur_3, u64, __ur_4, u64, __ur_5) #define BPF_CALL_x(x, name, ...) \ static __always_inline \ u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \ typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \ u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)); \ u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)) \ { \ return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\ } \ static __always_inline \ u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)) #define BPF_CALL_0(name, ...) BPF_CALL_x(0, name, __VA_ARGS__) #define BPF_CALL_1(name, ...) BPF_CALL_x(1, name, __VA_ARGS__) #define BPF_CALL_2(name, ...) BPF_CALL_x(2, name, __VA_ARGS__) #define BPF_CALL_3(name, ...) BPF_CALL_x(3, name, __VA_ARGS__) #define BPF_CALL_4(name, ...) BPF_CALL_x(4, name, __VA_ARGS__) #define BPF_CALL_5(name, ...) BPF_CALL_x(5, name, __VA_ARGS__) #define bpf_ctx_range(TYPE, MEMBER) \ offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1 #define bpf_ctx_range_till(TYPE, MEMBER1, MEMBER2) \ offsetof(TYPE, MEMBER1) ... offsetofend(TYPE, MEMBER2) - 1 #if BITS_PER_LONG == 64 # define bpf_ctx_range_ptr(TYPE, MEMBER) \ offsetof(TYPE, MEMBER) ... offsetofend(TYPE, MEMBER) - 1 #else # define bpf_ctx_range_ptr(TYPE, MEMBER) \ offsetof(TYPE, MEMBER) ... offsetof(TYPE, MEMBER) + 8 - 1 #endif /* BITS_PER_LONG == 64 */ #define bpf_target_off(TYPE, MEMBER, SIZE, PTR_SIZE) \ ({ \ BUILD_BUG_ON(sizeof_field(TYPE, MEMBER) != (SIZE)); \ *(PTR_SIZE) = (SIZE); \ offsetof(TYPE, MEMBER); \ }) /* A struct sock_filter is architecture independent. */ struct compat_sock_fprog { u16 len; compat_uptr_t filter; /* struct sock_filter * */ }; struct sock_fprog_kern { u16 len; struct sock_filter *filter; }; /* Some arches need doubleword alignment for their instructions and/or data */ #define BPF_IMAGE_ALIGNMENT 8 struct bpf_binary_header { u32 pages; u8 image[] __aligned(BPF_IMAGE_ALIGNMENT); }; struct bpf_prog { u16 pages; /* Number of allocated pages */ u16 jited:1, /* Is our filter JIT'ed? */ jit_requested:1,/* archs need to JIT the prog */ gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1, /* Is control block accessed? */ dst_needed:1, /* Do we need dst entry? */ blinded:1, /* Was blinded */ is_func:1, /* program is a bpf function */ kprobe_override:1, /* Do we override a kprobe? */ has_callchain_buf:1, /* callchain buffer allocated? */ enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1; /* Do we call bpf_get_stack() or bpf_get_stackid() */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ u32 jited_len; /* Size of jited insns in bytes */ u8 tag[BPF_TAG_SIZE]; struct bpf_prog_aux *aux; /* Auxiliary fields */ struct sock_fprog_kern *orig_prog; /* Original BPF program */ unsigned int (*bpf_func)(const void *ctx, const struct bpf_insn *insn); /* Instructions for interpreter */ struct sock_filter insns[0]; struct bpf_insn insnsi[]; }; struct sk_filter { refcount_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); #define __BPF_PROG_RUN(prog, ctx, dfunc) ({ \ u32 __ret; \ cant_migrate(); \ if (static_branch_unlikely(&bpf_stats_enabled_key)) { \ struct bpf_prog_stats *__stats; \ u64 __start = sched_clock(); \ __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ __stats = this_cpu_ptr(prog->aux->stats); \ u64_stats_update_begin(&__stats->syncp); \ __stats->cnt++; \ __stats->nsecs += sched_clock() - __start; \ u64_stats_update_end(&__stats->syncp); \ } else { \ __ret = dfunc(ctx, (prog)->insnsi, (prog)->bpf_func); \ } \ __ret; }) #define BPF_PROG_RUN(prog, ctx) \ __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func) /* * Use in preemptible and therefore migratable context to make sure that * the execution of the BPF program runs on one CPU. * * This uses migrate_disable/enable() explicitly to document that the * invocation of a BPF program does not require reentrancy protection * against a BPF program which is invoked from a preempting task. * * For non RT enabled kernels migrate_disable/enable() maps to * preempt_disable/enable(), i.e. it disables also preemption. */ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog, const void *ctx) { u32 ret; migrate_disable(); ret = __BPF_PROG_RUN(prog, ctx, bpf_dispatcher_nop_func); migrate_enable(); return ret; } #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN struct bpf_skb_data_end { struct qdisc_skb_cb qdisc_cb; void *data_meta; void *data_end; }; struct bpf_nh_params { u32 nh_family; union { u32 ipv4_nh; struct in6_addr ipv6_nh; }; }; struct bpf_redirect_info { u32 flags; u32 tgt_index; void *tgt_value; struct bpf_map *map; u32 kern_flags; struct bpf_nh_params nh; }; DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info); /* flags for bpf_redirect_info kern_flags */ #define BPF_RI_F_RF_NO_DIRECT BIT(0) /* no napi_direct on return_frame */ /* Compute the linear packet data range [data, data_end) which * will be accessed by various program types (cls_bpf, act_bpf, * lwt, ...). Subsystems allowing direct data access must (!) * ensure that cb[] area can be written to when BPF program is * invoked (otherwise cb[] save/restore is necessary). */ static inline void bpf_compute_data_pointers(struct sk_buff *skb) { struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb; BUILD_BUG_ON(sizeof(*cb) > sizeof_field(struct sk_buff, cb)); cb->data_meta = skb->data - skb_metadata_len(skb); cb->data_end = skb->data + skb_headlen(skb); } /* Similar to bpf_compute_data_pointers(), except that save orginal * data in cb->data and cb->meta_data for restore. */ static inline void bpf_compute_and_save_data_end( struct sk_buff *skb, void **saved_data_end) { struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb; *saved_data_end = cb->data_end; cb->data_end = skb->data + skb_headlen(skb); } /* Restore data saved by bpf_compute_data_pointers(). */ static inline void bpf_restore_data_end( struct sk_buff *skb, void *saved_data_end) { struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb; cb->data_end = saved_data_end; } static inline u8 *bpf_skb_cb(struct sk_buff *skb) { /* eBPF programs may read/write skb->cb[] area to transfer meta * data between tail calls. Since this also needs to work with * tc, that scratch memory is mapped to qdisc_skb_cb's data area. * * In some socket filter cases, the cb unfortunately needs to be * saved/restored so that protocol specific skb->cb[] data won't * be lost. In any case, due to unpriviledged eBPF programs * attached to sockets, we need to clear the bpf_skb_cb() area * to not leak previous contents to user space. */ BUILD_BUG_ON(sizeof_field(struct __sk_buff, cb) != BPF_SKB_CB_LEN); BUILD_BUG_ON(sizeof_field(struct __sk_buff, cb) != sizeof_field(struct qdisc_skb_cb, data)); return qdisc_skb_cb(skb)->data; } /* Must be invoked with migration disabled */ static inline u32 __bpf_prog_run_save_cb(const struct bpf_prog *prog, struct sk_buff *skb) { u8 *cb_data = bpf_skb_cb(skb); u8 cb_saved[BPF_SKB_CB_LEN]; u32 res; if (unlikely(prog->cb_access)) { memcpy(cb_saved, cb_data, sizeof(cb_saved)); memset(cb_data, 0, sizeof(cb_saved)); } res = BPF_PROG_RUN(prog, skb); if (unlikely(prog->cb_access)) memcpy(cb_data, cb_saved, sizeof(cb_saved)); return res; } static inline u32 bpf_prog_run_save_cb(const struct bpf_prog *prog, struct sk_buff *skb) { u32 res; migrate_disable(); res = __bpf_prog_run_save_cb(prog, skb); migrate_enable(); return res; } static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, struct sk_buff *skb) { u8 *cb_data = bpf_skb_cb(skb); u32 res; if (unlikely(prog->cb_access)) memset(cb_data, 0, BPF_SKB_CB_LEN); res = bpf_prog_run_pin_on_cpu(prog, skb); return res; } DECLARE_BPF_DISPATCHER(xdp) static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog, struct xdp_buff *xdp) { /* Caller needs to hold rcu_read_lock() (!), otherwise program * can be released while still running, or map elements could be * freed early while still having concurrent users. XDP fastpath * already takes rcu_read_lock() when fetching the program, so * it's not necessary here anymore. */ return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp)); } void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog); static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog) { return prog->len * sizeof(struct bpf_insn); } static inline u32 bpf_prog_tag_scratch_size(const struct bpf_prog *prog) { return round_up(bpf_prog_insn_size(prog) + sizeof(__be64) + 1, SHA1_BLOCK_SIZE); } static inline unsigned int bpf_prog_size(unsigned int proglen) { return max(sizeof(struct bpf_prog), offsetof(struct bpf_prog, insns[proglen])); } static inline bool bpf_prog_was_classic(const struct bpf_prog *prog) { /* When classic BPF programs have been loaded and the arch * does not have a classic BPF JIT (anymore), they have been * converted via bpf_migrate_filter() to eBPF and thus always * have an unspec program type. */ return prog->type == BPF_PROG_TYPE_UNSPEC; } static inline u32 bpf_ctx_off_adjust_machine(u32 size) { const u32 size_machine = sizeof(unsigned long); if (size > size_machine && size % size_machine == 0) size = size_machine; return size; } static inline bool bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default) { return size <= size_default && (size & (size - 1)) == 0; } static inline u8 bpf_ctx_narrow_access_offset(u32 off, u32 size, u32 size_default) { u8 access_off = off & (size_default - 1); #ifdef __LITTLE_ENDIAN return access_off; #else return size_default - (access_off + size); #endif } #define bpf_ctx_wide_access_ok(off, size, type, field) \ (size == sizeof(__u64) && \ off >= offsetof(type, field) && \ off + sizeof(__u64) <= offsetofend(type, field) && \ off % sizeof(__u64) == 0) #define bpf_classic_proglen(fprog) (fprog->len * sizeof(fprog->filter[0])) static inline void bpf_prog_lock_ro(struct bpf_prog *fp) { #ifndef CONFIG_BPF_JIT_ALWAYS_ON if (!fp->jited) { set_vm_flush_reset_perms(fp); set_memory_ro((unsigned long)fp, fp->pages); } #endif } static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) { set_vm_flush_reset_perms(hdr); set_memory_ro((unsigned long)hdr, hdr->pages); set_memory_x((unsigned long)hdr, hdr->pages); } static inline struct bpf_binary_header * bpf_jit_binary_hdr(const struct bpf_prog *fp) { unsigned long real_start = (unsigned long)fp->bpf_func; unsigned long addr = real_start & PAGE_MASK; return (void *)addr; } int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap); static inline int sk_filter(struct sock *sk, struct sk_buff *skb) { return sk_filter_trim_cap(sk, skb, 1); } struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err); void bpf_prog_free(struct bpf_prog *fp); bool bpf_opcode_in_insntable(u8 code); void bpf_prog_free_linfo(struct bpf_prog *prog); void bpf_prog_fill_jited_linfo(struct bpf_prog *prog, const u32 *insn_to_jit_off); int bpf_prog_alloc_jited_linfo(struct bpf_prog *prog); void bpf_prog_free_jited_linfo(struct bpf_prog *prog); void bpf_prog_free_unused_jited_linfo(struct bpf_prog *prog); struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags); struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flags); struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size, gfp_t gfp_extra_flags); void __bpf_prog_free(struct bpf_prog *fp); static inline void bpf_prog_unlock_free(struct bpf_prog *fp) { __bpf_prog_free(fp); } typedef int (*bpf_aux_classic_check_t)(struct sock_filter *filter, unsigned int flen); int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog); int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog, bpf_aux_classic_check_t trans, bool save_orig); void bpf_prog_destroy(struct bpf_prog *fp); int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk); int sk_attach_bpf(u32 ufd, struct sock *sk); int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk); int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk); void sk_reuseport_prog_free(struct bpf_prog *prog); int sk_detach_filter(struct sock *sk); int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, unsigned int len); bool sk_filter_charge(struct sock *sk, struct sk_filter *fp); void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp); u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); #define __bpf_call_base_args \ ((u64 (*)(u64, u64, u64, u64, u64, const struct bpf_insn *)) \ (void *)__bpf_call_base) struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog); void bpf_jit_compile(struct bpf_prog *prog); bool bpf_jit_needs_zext(void); bool bpf_helper_changes_pkt_data(void *func); static inline bool bpf_dump_raw_ok(const struct cred *cred) { /* Reconstruction of call-sites is dependent on kallsyms, * thus make dump the same restriction. */ return kallsyms_show_value(cred); } struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off, const struct bpf_insn *patch, u32 len); int bpf_remove_insns(struct bpf_prog *prog, u32 off, u32 cnt); void bpf_clear_redirect_map(struct bpf_map *map); static inline bool xdp_return_frame_no_direct(void) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); return ri->kern_flags & BPF_RI_F_RF_NO_DIRECT; } static inline void xdp_set_return_frame_no_direct(void) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); ri->kern_flags |= BPF_RI_F_RF_NO_DIRECT; } static inline void xdp_clear_return_frame_no_direct(void) { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); ri->kern_flags &= ~BPF_RI_F_RF_NO_DIRECT; } static inline int xdp_ok_fwd_dev(const struct net_device *fwd, unsigned int pktlen) { unsigned int len; if (unlikely(!(fwd->flags & IFF_UP))) return -ENETDOWN; len = fwd->mtu + fwd->hard_header_len + VLAN_HLEN; if (pktlen > len) return -EMSGSIZE; return 0; } /* The pair of xdp_do_redirect and xdp_do_flush MUST be called in the * same cpu context. Further for best results no more than a single map * for the do_redirect/do_flush pair should be used. This limitation is * because we only track one map and force a flush when the map changes. * This does not appear to be a real limitation for existing software. */ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb, struct xdp_buff *xdp, struct bpf_prog *prog); int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp, struct bpf_prog *prog); void xdp_do_flush(void); /* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as * it is no longer only flushing maps. Keep this define for compatibility * until all drivers are updated - do not use xdp_do_flush_map() in new code! */ #define xdp_do_flush_map xdp_do_flush void bpf_warn_invalid_xdp_action(u32 act); #ifdef CONFIG_INET struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, struct bpf_prog *prog, struct sk_buff *skb, u32 hash); #else static inline struct sock * bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk, struct bpf_prog *prog, struct sk_buff *skb, u32 hash) { return NULL; } #endif #ifdef CONFIG_BPF_JIT extern int bpf_jit_enable; extern int bpf_jit_harden; extern int bpf_jit_kallsyms; extern long bpf_jit_limit; extern long bpf_jit_limit_max; typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size); struct bpf_binary_header * bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, unsigned int alignment, bpf_jit_fill_hole_t bpf_fill_ill_insns); void bpf_jit_binary_free(struct bpf_binary_header *hdr); u64 bpf_jit_alloc_exec_limit(void); void *bpf_jit_alloc_exec(unsigned long size); void bpf_jit_free_exec(void *addr); void bpf_jit_free(struct bpf_prog *fp); int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, struct bpf_jit_poke_descriptor *poke); int bpf_jit_get_func_addr(const struct bpf_prog *prog, const struct bpf_insn *insn, bool extra_pass, u64 *func_addr, bool *func_addr_fixed); struct bpf_prog *bpf_jit_blind_constants(struct bpf_prog *fp); void bpf_jit_prog_release_other(struct bpf_prog *fp, struct bpf_prog *fp_other); static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen, u32 pass, void *image) { pr_err("flen=%u proglen=%u pass=%u image=%pK from=%s pid=%d\n", flen, proglen, pass, image, current->comm, task_pid_nr(current)); if (image) print_hex_dump(KERN_ERR, "JIT code: ", DUMP_PREFIX_OFFSET, 16, 1, image, proglen, false); } static inline bool bpf_jit_is_ebpf(void) { # ifdef CONFIG_HAVE_EBPF_JIT return true; # else return false; # endif } static inline bool ebpf_jit_enabled(void) { return bpf_jit_enable && bpf_jit_is_ebpf(); } static inline bool bpf_prog_ebpf_jited(const struct bpf_prog *fp) { return fp->jited && bpf_jit_is_ebpf(); } static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog) { /* These are the prerequisites, should someone ever have the * idea to call blinding outside of them, we make sure to * bail out. */ if (!bpf_jit_is_ebpf()) return false; if (!prog->jit_requested) return false; if (!bpf_jit_harden) return false; if (bpf_jit_harden == 1 && capable(CAP_SYS_ADMIN)) return false; return true; } static inline bool bpf_jit_kallsyms_enabled(void) { /* There are a couple of corner cases where kallsyms should * not be enabled f.e. on hardening. */ if (bpf_jit_harden) return false; if (!bpf_jit_kallsyms) return false; if (bpf_jit_kallsyms == 1) return true; return false; } const char *__bpf_address_lookup(unsigned long addr, unsigned long *size, unsigned long *off, char *sym); bool is_bpf_text_address(unsigned long addr); int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, char *sym); static inline const char * bpf_address_lookup(unsigned long addr, unsigned long *size, unsigned long *off, char **modname, char *sym) { const char *ret = __bpf_address_lookup(addr, size, off, sym); if (ret && modname) *modname = NULL; return ret; } void bpf_prog_kallsyms_add(struct bpf_prog *fp); void bpf_prog_kallsyms_del(struct bpf_prog *fp); #else /* CONFIG_BPF_JIT */ static inline bool ebpf_jit_enabled(void) { return false; } static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog) { return false; } static inline bool bpf_prog_ebpf_jited(const struct bpf_prog *fp) { return false; } static inline int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, struct bpf_jit_poke_descriptor *poke) { return -ENOTSUPP; } static inline void bpf_jit_free(struct bpf_prog *fp) { bpf_prog_unlock_free(fp); } static inline bool bpf_jit_kallsyms_enabled(void) { return false; } static inline const char * __bpf_address_lookup(unsigned long addr, unsigned long *size, unsigned long *off, char *sym) { return NULL; } static inline bool is_bpf_text_address(unsigned long addr) { return false; } static inline int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type, char *sym) { return -ERANGE; } static inline const char * bpf_address_lookup(unsigned long addr, unsigned long *size, unsigned long *off, char **modname, char *sym) { return NULL; } static inline void bpf_prog_kallsyms_add(struct bpf_prog *fp) { } static inline void bpf_prog_kallsyms_del(struct bpf_prog *fp) { } #endif /* CONFIG_BPF_JIT */ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp); #define BPF_ANC BIT(15) static inline bool bpf_needs_clear_a(const struct sock_filter *first) { switch (first->code) { case BPF_RET | BPF_K: case BPF_LD | BPF_W | BPF_LEN: return false; case BPF_LD | BPF_W | BPF_ABS: case BPF_LD | BPF_H | BPF_ABS: case BPF_LD | BPF_B | BPF_ABS: if (first->k == SKF_AD_OFF + SKF_AD_ALU_XOR_X) return true; return false; default: return true; } } static inline u16 bpf_anc_helper(const struct sock_filter *ftest) { BUG_ON(ftest->code & BPF_ANC); switch (ftest->code) { case BPF_LD | BPF_W | BPF_ABS: case BPF_LD | BPF_H | BPF_ABS: case BPF_LD | BPF_B | BPF_ABS: #define BPF_ANCILLARY(CODE) case SKF_AD_OFF + SKF_AD_##CODE: \ return BPF_ANC | SKF_AD_##CODE switch (ftest->k) { BPF_ANCILLARY(PROTOCOL); BPF_ANCILLARY(PKTTYPE); BPF_ANCILLARY(IFINDEX); BPF_ANCILLARY(NLATTR); BPF_ANCILLARY(NLATTR_NEST); BPF_ANCILLARY(MARK); BPF_ANCILLARY(QUEUE); BPF_ANCILLARY(HATYPE); BPF_ANCILLARY(RXHASH); BPF_ANCILLARY(CPU); BPF_ANCILLARY(ALU_XOR_X); BPF_ANCILLARY(VLAN_TAG); BPF_ANCILLARY(VLAN_TAG_PRESENT); BPF_ANCILLARY(PAY_OFFSET); BPF_ANCILLARY(RANDOM); BPF_ANCILLARY(VLAN_TPID); } fallthrough; default: return ftest->code; } } void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb, int k, unsigned int size); static inline void *bpf_load_pointer(const struct sk_buff *skb, int k, unsigned int size, void *buffer) { if (k >= 0) return skb_header_pointer(skb, k, size, buffer); return bpf_internal_load_pointer_neg_helper(skb, k, size); } static inline int bpf_tell_extensions(void) { return SKF_AD_MAX; } struct bpf_sock_addr_kern { struct sock *sk; struct sockaddr *uaddr; /* Temporary "register" to make indirect stores to nested structures * defined above. We need three registers to make such a store, but * only two (src and dst) are available at convert_ctx_access time */ u64 tmp_reg; void *t_ctx; /* Attach type specific context. */ }; struct bpf_sock_ops_kern { struct sock *sk; union { u32 args[4]; u32 reply; u32 replylong[4]; }; struct sk_buff *syn_skb; struct sk_buff *skb; void *skb_data_end; u8 op; u8 is_fullsock; u8 remaining_opt_len; u64 temp; /* temp and everything after is not * initialized to 0 before calling * the BPF program. New fields that * should be initialized to 0 should * be inserted before temp. * temp is scratch storage used by * sock_ops_convert_ctx_access * as temporary storage of a register. */ }; struct bpf_sysctl_kern { struct ctl_table_header *head; struct ctl_table *table; void *cur_val; size_t cur_len; void *new_val; size_t new_len; int new_updated; int write; loff_t *ppos; /* Temporary "register" for indirect stores to ppos. */ u64 tmp_reg; }; struct bpf_sockopt_kern { struct sock *sk; u8 *optval; u8 *optval_end; s32 level; s32 optname; s32 optlen; s32 retval; }; int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len); struct bpf_sk_lookup_kern { u16 family; u16 protocol; __be16 sport; u16 dport; struct { __be32 saddr; __be32 daddr; } v4; struct { const struct in6_addr *saddr; const struct in6_addr *daddr; } v6; struct sock *selected_sk; bool no_reuseport; }; extern struct static_key_false bpf_sk_lookup_enabled; /* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup. * * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and * SK_DROP. Their meaning is as follows: * * SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result * SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup * SK_DROP : terminate lookup with -ECONNREFUSED * * This macro aggregates return values and selected sockets from * multiple BPF programs according to following rules in order: * * 1. If any program returned SK_PASS and a non-NULL ctx.selected_sk, * macro result is SK_PASS and last ctx.selected_sk is used. * 2. If any program returned SK_DROP return value, * macro result is SK_DROP. * 3. Otherwise result is SK_PASS and ctx.selected_sk is NULL. * * Caller must ensure that the prog array is non-NULL, and that the * array as well as the programs it contains remain valid. */ #define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func) \ ({ \ struct bpf_sk_lookup_kern *_ctx = &(ctx); \ struct bpf_prog_array_item *_item; \ struct sock *_selected_sk = NULL; \ bool _no_reuseport = false; \ struct bpf_prog *_prog; \ bool _all_pass = true; \ u32 _ret; \ \ migrate_disable(); \ _item = &(array)->items[0]; \ while ((_prog = READ_ONCE(_item->prog))) { \ /* restore most recent selection */ \ _ctx->selected_sk = _selected_sk; \ _ctx->no_reuseport = _no_reuseport; \ \ _ret = func(_prog, _ctx); \ if (_ret == SK_PASS && _ctx->selected_sk) { \ /* remember last non-NULL socket */ \ _selected_sk = _ctx->selected_sk; \ _no_reuseport = _ctx->no_reuseport; \ } else if (_ret == SK_DROP && _all_pass) { \ _all_pass = false; \ } \ _item++; \ } \ _ctx->selected_sk = _selected_sk; \ _ctx->no_reuseport = _no_reuseport; \ migrate_enable(); \ _all_pass || _selected_sk ? SK_PASS : SK_DROP; \ }) static inline bool bpf_sk_lookup_run_v4(struct net *net, int protocol, const __be32 saddr, const __be16 sport, const __be32 daddr, const u16 dport, struct sock **psk) { struct bpf_prog_array *run_array; struct sock *selected_sk = NULL; bool no_reuseport = false; rcu_read_lock(); run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]); if (run_array) { struct bpf_sk_lookup_kern ctx = { .family = AF_INET, .protocol = protocol, .v4.saddr = saddr, .v4.daddr = daddr, .sport = sport, .dport = dport, }; u32 act; act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN); if (act == SK_PASS) { selected_sk = ctx.selected_sk; no_reuseport = ctx.no_reuseport; } else { selected_sk = ERR_PTR(-ECONNREFUSED); } } rcu_read_unlock(); *psk = selected_sk; return no_reuseport; } #if IS_ENABLED(CONFIG_IPV6) static inline bool bpf_sk_lookup_run_v6(struct net *net, int protocol, const struct in6_addr *saddr, const __be16 sport, const struct in6_addr *daddr, const u16 dport, struct sock **psk) { struct bpf_prog_array *run_array; struct sock *selected_sk = NULL; bool no_reuseport = false; rcu_read_lock(); run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]); if (run_array) { struct bpf_sk_lookup_kern ctx = { .family = AF_INET6, .protocol = protocol, .v6.saddr = saddr, .v6.daddr = daddr, .sport = sport, .dport = dport, }; u32 act; act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN); if (act == SK_PASS) { selected_sk = ctx.selected_sk; no_reuseport = ctx.no_reuseport; } else { selected_sk = ERR_PTR(-ECONNREFUSED); } } rcu_read_unlock(); *psk = selected_sk; return no_reuseport; } #endif /* IS_ENABLED(CONFIG_IPV6) */ #endif /* __LINUX_FILTER_H__ */
1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef INT_BLK_MQ_H #define INT_BLK_MQ_H #include "blk-stat.h" #include "blk-mq-tag.h" struct blk_mq_tag_set; struct blk_mq_ctxs { struct kobject kobj; struct blk_mq_ctx __percpu *queue_ctx; }; /** * struct blk_mq_ctx - State for a software queue facing the submitting CPUs */ struct blk_mq_ctx { struct { spinlock_t lock; struct list_head rq_lists[HCTX_MAX_TYPES]; } ____cacheline_aligned_in_smp; unsigned int cpu; unsigned short index_hw[HCTX_MAX_TYPES]; struct blk_mq_hw_ctx *hctxs[HCTX_MAX_TYPES]; /* incremented at dispatch time */ unsigned long rq_dispatched[2]; unsigned long rq_merged; /* incremented at completion time */ unsigned long ____cacheline_aligned_in_smp rq_completed[2]; struct request_queue *queue; struct blk_mq_ctxs *ctxs; struct kobject kobj; } ____cacheline_aligned_in_smp; void blk_mq_exit_queue(struct request_queue *q); int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr); void blk_mq_wake_waiters(struct request_queue *q); bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *, unsigned int); void blk_mq_add_to_requeue_list(struct request *rq, bool at_head, bool kick_requeue_list); void blk_mq_flush_busy_ctxs(struct blk_mq_hw_ctx *hctx, struct list_head *list); struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *start); void blk_mq_put_rq_ref(struct request *rq); /* * Internal helpers for allocating/freeing the request map */ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, unsigned int hctx_idx); void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags); struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, unsigned int hctx_idx, unsigned int nr_tags, unsigned int reserved_tags, unsigned int flags); int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, unsigned int hctx_idx, unsigned int depth); /* * Internal helpers for request insertion into sw queues */ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, bool at_head); void blk_mq_request_bypass_insert(struct request *rq, bool at_head, bool run_queue); void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx, struct list_head *list); /* Used by blk_insert_cloned_request() to issue request directly */ blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last); void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx, struct list_head *list); /* * CPU -> queue mappings */ extern int blk_mq_hw_queue_to_node(struct blk_mq_queue_map *qmap, unsigned int); /* * blk_mq_map_queue_type() - map (hctx_type,cpu) to hardware queue * @q: request queue * @type: the hctx type index * @cpu: CPU */ static inline struct blk_mq_hw_ctx *blk_mq_map_queue_type(struct request_queue *q, enum hctx_type type, unsigned int cpu) { return q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]; } /* * blk_mq_map_queue() - map (cmd_flags,type) to hardware queue * @q: request queue * @flags: request command flags * @cpu: cpu ctx */ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q, unsigned int flags, struct blk_mq_ctx *ctx) { enum hctx_type type = HCTX_TYPE_DEFAULT; /* * The caller ensure that if REQ_HIPRI, poll must be enabled. */ if (flags & REQ_HIPRI) type = HCTX_TYPE_POLL; else if ((flags & REQ_OP_MASK) == REQ_OP_READ) type = HCTX_TYPE_READ; return ctx->hctxs[type]; } /* * sysfs helpers */ extern void blk_mq_sysfs_init(struct request_queue *q); extern void blk_mq_sysfs_deinit(struct request_queue *q); extern int __blk_mq_register_dev(struct device *dev, struct request_queue *q); extern int blk_mq_sysfs_register(struct request_queue *q); extern void blk_mq_sysfs_unregister(struct request_queue *q); extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx); void blk_mq_release(struct request_queue *q); static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q, unsigned int cpu) { return per_cpu_ptr(q->queue_ctx, cpu); } /* * This assumes per-cpu software queueing queues. They could be per-node * as well, for instance. For now this is hardcoded as-is. Note that we don't * care about preemption, since we know the ctx's are persistent. This does * mean that we can't rely on ctx always matching the currently running CPU. */ static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q) { return __blk_mq_get_ctx(q, raw_smp_processor_id()); } struct blk_mq_alloc_data { /* input parameter */ struct request_queue *q; blk_mq_req_flags_t flags; unsigned int shallow_depth; unsigned int cmd_flags; /* input & output parameter */ struct blk_mq_ctx *ctx; struct blk_mq_hw_ctx *hctx; }; static inline bool blk_mq_is_sbitmap_shared(unsigned int flags) { return flags & BLK_MQ_F_TAG_HCTX_SHARED; } static inline struct blk_mq_tags *blk_mq_tags_from_data(struct blk_mq_alloc_data *data) { if (data->q->elevator) return data->hctx->sched_tags; return data->hctx->tags; } static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx) { return test_bit(BLK_MQ_S_STOPPED, &hctx->state); } static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx) { return hctx->nr_ctx && hctx->tags; } unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part); void blk_mq_in_flight_rw(struct request_queue *q, struct hd_struct *part, unsigned int inflight[2]); static inline void blk_mq_put_dispatch_budget(struct request_queue *q) { if (q->mq_ops->put_budget) q->mq_ops->put_budget(q); } static inline bool blk_mq_get_dispatch_budget(struct request_queue *q) { if (q->mq_ops->get_budget) return q->mq_ops->get_budget(q); return true; } static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx) { if (blk_mq_is_sbitmap_shared(hctx->flags)) atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap); else atomic_inc(&hctx->nr_active); } static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx) { if (blk_mq_is_sbitmap_shared(hctx->flags)) atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap); else atomic_dec(&hctx->nr_active); } static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx) { if (blk_mq_is_sbitmap_shared(hctx->flags)) return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap); return atomic_read(&hctx->nr_active); } static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx, struct request *rq) { blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag); rq->tag = BLK_MQ_NO_TAG; if (rq->rq_flags & RQF_MQ_INFLIGHT) { rq->rq_flags &= ~RQF_MQ_INFLIGHT; __blk_mq_dec_active_requests(hctx); } } static inline void blk_mq_put_driver_tag(struct request *rq) { if (rq->tag == BLK_MQ_NO_TAG || rq->internal_tag == BLK_MQ_NO_TAG) return; __blk_mq_put_driver_tag(rq->mq_hctx, rq); } static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap) { int cpu; for_each_possible_cpu(cpu) qmap->mq_map[cpu] = 0; } /* * blk_mq_plug() - Get caller context plug * @q: request queue * @bio : the bio being submitted by the caller context * * Plugging, by design, may delay the insertion of BIOs into the elevator in * order to increase BIO merging opportunities. This however can cause BIO * insertion order to change from the order in which submit_bio() is being * executed in the case of multiple contexts concurrently issuing BIOs to a * device, even if these context are synchronized to tightly control BIO issuing * order. While this is not a problem with regular block devices, this ordering * change can cause write BIO failures with zoned block devices as these * require sequential write patterns to zones. Prevent this from happening by * ignoring the plug state of a BIO issuing context if the target request queue * is for a zoned block device and the BIO to plug is a write operation. * * Return current->plug if the bio can be plugged and NULL otherwise */ static inline struct blk_plug *blk_mq_plug(struct request_queue *q, struct bio *bio) { /* * For regular block devices or read operations, use the context plug * which may be NULL if blk_start_plug() was not executed. */ if (!blk_queue_is_zoned(q) || !op_is_write(bio_op(bio))) return current->plug; /* Zoned block device write operation case: do not plug the BIO */ return NULL; } /* * For shared tag users, we track the number of currently active users * and attempt to provide a fair share of the tag depth for each of them. */ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt) { unsigned int depth, users; if (!hctx || !(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) return true; /* * Don't try dividing an ant */ if (bt->sb.depth == 1) return true; if (blk_mq_is_sbitmap_shared(hctx->flags)) { struct request_queue *q = hctx->queue; struct blk_mq_tag_set *set = q->tag_set; if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) return true; users = atomic_read(&set->active_queues_shared_sbitmap); } else { if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) return true; users = atomic_read(&hctx->tags->active_queues); } if (!users) return true; /* * Allow at least some tags */ depth = max((bt->sb.depth + users - 1) / users, 4U); return __blk_mq_active_requests(hctx) < depth; } #endif
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 // SPDX-License-Identifier: GPL-2.0-or-later /* * libata-scsi.c - helper library for ATA * * Copyright 2003-2004 Red Hat, Inc. All rights reserved. * Copyright 2003-2004 Jeff Garzik * * libata documentation is available via 'make {ps|pdf}docs', * as Documentation/driver-api/libata.rst * * Hardware documentation available from * - http://www.t10.org/ * - http://www.t13.org/ */ #include <linux/compat.h> #include <linux/slab.h> #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/spinlock.h> #include <linux/export.h> #include <scsi/scsi.h> #include <scsi/scsi_host.h> #include <scsi/scsi_cmnd.h> #include <scsi/scsi_eh.h> #include <scsi/scsi_device.h> #include <scsi/scsi_tcq.h> #include <scsi/scsi_transport.h> #include <linux/libata.h> #include <linux/hdreg.h> #include <linux/uaccess.h> #include <linux/suspend.h> #include <asm/unaligned.h> #include <linux/ioprio.h> #include <linux/of.h> #include "libata.h" #include "libata-transport.h" #define ATA_SCSI_RBUF_SIZE 576 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock); static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE]; typedef unsigned int (*ata_xlat_func_t)(struct ata_queued_cmd *qc); static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev); #define RW_RECOVERY_MPAGE 0x1 #define RW_RECOVERY_MPAGE_LEN 12 #define CACHE_MPAGE 0x8 #define CACHE_MPAGE_LEN 20 #define CONTROL_MPAGE 0xa #define CONTROL_MPAGE_LEN 12 #define ALL_MPAGES 0x3f #define ALL_SUB_MPAGES 0xff static const u8 def_rw_recovery_mpage[RW_RECOVERY_MPAGE_LEN] = { RW_RECOVERY_MPAGE, RW_RECOVERY_MPAGE_LEN - 2, (1 << 7), /* AWRE */ 0, /* read retry count */ 0, 0, 0, 0, 0, /* write retry count */ 0, 0, 0 }; static const u8 def_cache_mpage[CACHE_MPAGE_LEN] = { CACHE_MPAGE, CACHE_MPAGE_LEN - 2, 0, /* contains WCE, needs to be 0 for logic */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* contains DRA, needs to be 0 for logic */ 0, 0, 0, 0, 0, 0, 0 }; static const u8 def_control_mpage[CONTROL_MPAGE_LEN] = { CONTROL_MPAGE, CONTROL_MPAGE_LEN - 2, 2, /* DSENSE=0, GLTSD=1 */ 0, /* [QAM+QERR may be 1, see 05-359r1] */ 0, 0, 0, 0, 0xff, 0xff, 0, 30 /* extended self test time, see 05-359r1 */ }; static ssize_t ata_scsi_park_show(struct device *device, struct device_attribute *attr, char *buf) { struct scsi_device *sdev = to_scsi_device(device); struct ata_port *ap; struct ata_link *link; struct ata_device *dev; unsigned long now; unsigned int msecs; int rc = 0; ap = ata_shost_to_port(sdev->host); spin_lock_irq(ap->lock); dev = ata_scsi_find_dev(ap, sdev); if (!dev) { rc = -ENODEV; goto unlock; } if (dev->flags & ATA_DFLAG_NO_UNLOAD) { rc = -EOPNOTSUPP; goto unlock; } link = dev->link; now = jiffies; if (ap->pflags & ATA_PFLAG_EH_IN_PROGRESS && link->eh_context.unloaded_mask & (1 << dev->devno) && time_after(dev->unpark_deadline, now)) msecs = jiffies_to_msecs(dev->unpark_deadline - now); else msecs = 0; unlock: spin_unlock_irq(ap->lock); return rc ? rc : snprintf(buf, 20, "%u\n", msecs); } static ssize_t ata_scsi_park_store(struct device *device, struct device_attribute *attr, const char *buf, size_t len) { struct scsi_device *sdev = to_scsi_device(device); struct ata_port *ap; struct ata_device *dev; long int input; unsigned long flags; int rc; rc = kstrtol(buf, 10, &input); if (rc) return rc; if (input < -2) return -EINVAL; if (input > ATA_TMOUT_MAX_PARK) { rc = -EOVERFLOW; input = ATA_TMOUT_MAX_PARK; } ap = ata_shost_to_port(sdev->host); spin_lock_irqsave(ap->lock, flags); dev = ata_scsi_find_dev(ap, sdev); if (unlikely(!dev)) { rc = -ENODEV; goto unlock; } if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC) { rc = -EOPNOTSUPP; goto unlock; } if (input >= 0) { if (dev->flags & ATA_DFLAG_NO_UNLOAD) { rc = -EOPNOTSUPP; goto unlock; } dev->unpark_deadline = ata_deadline(jiffies, input); dev->link->eh_info.dev_action[dev->devno] |= ATA_EH_PARK; ata_port_schedule_eh(ap); complete(&ap->park_req_pending); } else { switch (input) { case -1: dev->flags &= ~ATA_DFLAG_NO_UNLOAD; break; case -2: dev->flags |= ATA_DFLAG_NO_UNLOAD; break; } } unlock: spin_unlock_irqrestore(ap->lock, flags); return rc ? rc : len; } DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR, ata_scsi_park_show, ata_scsi_park_store); EXPORT_SYMBOL_GPL(dev_attr_unload_heads); void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq) { bool d_sense = (dev->flags & ATA_DFLAG_D_SENSE); if (!cmd) return; cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; scsi_build_sense_buffer(d_sense, cmd->sense_buffer, sk, asc, ascq); } void ata_scsi_set_sense_information(struct ata_device *dev, struct scsi_cmnd *cmd, const struct ata_taskfile *tf) { u64 information; if (!cmd) return; information = ata_tf_read_block(tf, dev); if (information == U64_MAX) return; scsi_set_sense_information(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, information); } static void ata_scsi_set_invalid_field(struct ata_device *dev, struct scsi_cmnd *cmd, u16 field, u8 bit) { ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x24, 0x0); /* "Invalid field in CDB" */ scsi_set_sense_field_pointer(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, field, bit, 1); } static void ata_scsi_set_invalid_parameter(struct ata_device *dev, struct scsi_cmnd *cmd, u16 field) { /* "Invalid field in parameter list" */ ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x26, 0x0); scsi_set_sense_field_pointer(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, field, 0xff, 0); } struct device_attribute *ata_common_sdev_attrs[] = { &dev_attr_unload_heads, NULL }; EXPORT_SYMBOL_GPL(ata_common_sdev_attrs); /** * ata_std_bios_param - generic bios head/sector/cylinder calculator used by sd. * @sdev: SCSI device for which BIOS geometry is to be determined * @bdev: block device associated with @sdev * @capacity: capacity of SCSI device * @geom: location to which geometry will be output * * Generic bios head/sector/cylinder calculator * used by sd. Most BIOSes nowadays expect a XXX/255/16 (CHS) * mapping. Some situations may arise where the disk is not * bootable if this is not used. * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero. */ int ata_std_bios_param(struct scsi_device *sdev, struct block_device *bdev, sector_t capacity, int geom[]) { geom[0] = 255; geom[1] = 63; sector_div(capacity, 255*63); geom[2] = capacity; return 0; } EXPORT_SYMBOL_GPL(ata_std_bios_param); /** * ata_scsi_unlock_native_capacity - unlock native capacity * @sdev: SCSI device to adjust device capacity for * * This function is called if a partition on @sdev extends beyond * the end of the device. It requests EH to unlock HPA. * * LOCKING: * Defined by the SCSI layer. Might sleep. */ void ata_scsi_unlock_native_capacity(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); struct ata_device *dev; unsigned long flags; spin_lock_irqsave(ap->lock, flags); dev = ata_scsi_find_dev(ap, sdev); if (dev && dev->n_sectors < dev->n_native_sectors) { dev->flags |= ATA_DFLAG_UNLOCK_HPA; dev->link->eh_info.action |= ATA_EH_RESET; ata_port_schedule_eh(ap); } spin_unlock_irqrestore(ap->lock, flags); ata_port_wait_eh(ap); } EXPORT_SYMBOL_GPL(ata_scsi_unlock_native_capacity); /** * ata_get_identity - Handler for HDIO_GET_IDENTITY ioctl * @ap: target port * @sdev: SCSI device to get identify data for * @arg: User buffer area for identify data * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ static int ata_get_identity(struct ata_port *ap, struct scsi_device *sdev, void __user *arg) { struct ata_device *dev = ata_scsi_find_dev(ap, sdev); u16 __user *dst = arg; char buf[40]; if (!dev) return -ENOMSG; if (copy_to_user(dst, dev->id, ATA_ID_WORDS * sizeof(u16))) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_PROD, ATA_ID_PROD_LEN); if (copy_to_user(dst + ATA_ID_PROD, buf, ATA_ID_PROD_LEN)) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_FW_REV, ATA_ID_FW_REV_LEN); if (copy_to_user(dst + ATA_ID_FW_REV, buf, ATA_ID_FW_REV_LEN)) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_SERNO, ATA_ID_SERNO_LEN); if (copy_to_user(dst + ATA_ID_SERNO, buf, ATA_ID_SERNO_LEN)) return -EFAULT; return 0; } /** * ata_cmd_ioctl - Handler for HDIO_DRIVE_CMD ioctl * @scsidev: Device to which we are issuing command * @arg: User provided data for issuing command * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg) { int rc = 0; u8 sensebuf[SCSI_SENSE_BUFFERSIZE]; u8 scsi_cmd[MAX_COMMAND_SIZE]; u8 args[4], *argbuf = NULL; int argsize = 0; enum dma_data_direction data_dir; struct scsi_sense_hdr sshdr; int cmd_result; if (arg == NULL) return -EINVAL; if (copy_from_user(args, arg, sizeof(args))) return -EFAULT; memset(sensebuf, 0, sizeof(sensebuf)); memset(scsi_cmd, 0, sizeof(scsi_cmd)); if (args[3]) { argsize = ATA_SECT_SIZE * args[3]; argbuf = kmalloc(argsize, GFP_KERNEL); if (argbuf == NULL) { rc = -ENOMEM; goto error; } scsi_cmd[1] = (4 << 1); /* PIO Data-in */ scsi_cmd[2] = 0x0e; /* no off.line or cc, read from dev, block count in sector count field */ data_dir = DMA_FROM_DEVICE; } else { scsi_cmd[1] = (3 << 1); /* Non-data */ scsi_cmd[2] = 0x20; /* cc but no off.line or data xfer */ data_dir = DMA_NONE; } scsi_cmd[0] = ATA_16; scsi_cmd[4] = args[2]; if (args[0] == ATA_CMD_SMART) { /* hack -- ide driver does this too */ scsi_cmd[6] = args[3]; scsi_cmd[8] = args[1]; scsi_cmd[10] = ATA_SMART_LBAM_PASS; scsi_cmd[12] = ATA_SMART_LBAH_PASS; } else { scsi_cmd[6] = args[1]; } scsi_cmd[14] = args[0]; /* Good values for timeout and retries? Values below from scsi_ioctl_send_command() for default case... */ cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL); if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ /* If we set cc then ATA pass-through will cause a * check condition even if no error. Filter that. */ if (cmd_result & SAM_STAT_CHECK_CONDITION) { if (sshdr.sense_key == RECOVERED_ERROR && sshdr.asc == 0 && sshdr.ascq == 0x1d) cmd_result &= ~SAM_STAT_CHECK_CONDITION; } /* Send userspace a few ATA registers (same as drivers/ide) */ if (sensebuf[0] == 0x72 && /* format is "descriptor" */ desc[0] == 0x09) { /* code is "ATA Descriptor" */ args[0] = desc[13]; /* status */ args[1] = desc[3]; /* error */ args[2] = desc[5]; /* sector count (0:7) */ if (copy_to_user(arg, args, sizeof(args))) rc = -EFAULT; } } if (cmd_result) { rc = -EIO; goto error; } if ((argbuf) && copy_to_user(arg + sizeof(args), argbuf, argsize)) rc = -EFAULT; error: kfree(argbuf); return rc; } /** * ata_task_ioctl - Handler for HDIO_DRIVE_TASK ioctl * @scsidev: Device to which we are issuing command * @arg: User provided data for issuing command * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg) { int rc = 0; u8 sensebuf[SCSI_SENSE_BUFFERSIZE]; u8 scsi_cmd[MAX_COMMAND_SIZE]; u8 args[7]; struct scsi_sense_hdr sshdr; int cmd_result; if (arg == NULL) return -EINVAL; if (copy_from_user(args, arg, sizeof(args))) return -EFAULT; memset(sensebuf, 0, sizeof(sensebuf)); memset(scsi_cmd, 0, sizeof(scsi_cmd)); scsi_cmd[0] = ATA_16; scsi_cmd[1] = (3 << 1); /* Non-data */ scsi_cmd[2] = 0x20; /* cc but no off.line or data xfer */ scsi_cmd[4] = args[1]; scsi_cmd[6] = args[2]; scsi_cmd[8] = args[3]; scsi_cmd[10] = args[4]; scsi_cmd[12] = args[5]; scsi_cmd[13] = args[6] & 0x4f; scsi_cmd[14] = args[0]; /* Good values for timeout and retries? Values below from scsi_ioctl_send_command() for default case... */ cmd_result = scsi_execute(scsidev, scsi_cmd, DMA_NONE, NULL, 0, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL); if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ /* If we set cc then ATA pass-through will cause a * check condition even if no error. Filter that. */ if (cmd_result & SAM_STAT_CHECK_CONDITION) { if (sshdr.sense_key == RECOVERED_ERROR && sshdr.asc == 0 && sshdr.ascq == 0x1d) cmd_result &= ~SAM_STAT_CHECK_CONDITION; } /* Send userspace ATA registers */ if (sensebuf[0] == 0x72 && /* format is "descriptor" */ desc[0] == 0x09) {/* code is "ATA Descriptor" */ args[0] = desc[13]; /* status */ args[1] = desc[3]; /* error */ args[2] = desc[5]; /* sector count (0:7) */ args[3] = desc[7]; /* lbal */ args[4] = desc[9]; /* lbam */ args[5] = desc[11]; /* lbah */ args[6] = desc[12]; /* select */ if (copy_to_user(arg, args, sizeof(args))) rc = -EFAULT; } } if (cmd_result) { rc = -EIO; goto error; } error: return rc; } static int ata_ioc32(struct ata_port *ap) { if (ap->flags & ATA_FLAG_PIO_DMA) return 1; if (ap->pflags & ATA_PFLAG_PIO32) return 1; return 0; } /* * This handles both native and compat commands, so anything added * here must have a compatible argument, or check in_compat_syscall() */ int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *scsidev, unsigned int cmd, void __user *arg) { unsigned long val; int rc = -EINVAL; unsigned long flags; switch (cmd) { case HDIO_GET_32BIT: spin_lock_irqsave(ap->lock, flags); val = ata_ioc32(ap); spin_unlock_irqrestore(ap->lock, flags); #ifdef CONFIG_COMPAT if (in_compat_syscall()) return put_user(val, (compat_ulong_t __user *)arg); #endif return put_user(val, (unsigned long __user *)arg); case HDIO_SET_32BIT: val = (unsigned long) arg; rc = 0; spin_lock_irqsave(ap->lock, flags); if (ap->pflags & ATA_PFLAG_PIO32CHANGE) { if (val) ap->pflags |= ATA_PFLAG_PIO32; else ap->pflags &= ~ATA_PFLAG_PIO32; } else { if (val != ata_ioc32(ap)) rc = -EINVAL; } spin_unlock_irqrestore(ap->lock, flags); return rc; case HDIO_GET_IDENTITY: return ata_get_identity(ap, scsidev, arg); case HDIO_DRIVE_CMD: if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) return -EACCES; return ata_cmd_ioctl(scsidev, arg); case HDIO_DRIVE_TASK: if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) return -EACCES; return ata_task_ioctl(scsidev, arg); default: rc = -ENOTTY; break; } return rc; } EXPORT_SYMBOL_GPL(ata_sas_scsi_ioctl); int ata_scsi_ioctl(struct scsi_device *scsidev, unsigned int cmd, void __user *arg) { return ata_sas_scsi_ioctl(ata_shost_to_port(scsidev->host), scsidev, cmd, arg); } EXPORT_SYMBOL_GPL(ata_scsi_ioctl); /** * ata_scsi_qc_new - acquire new ata_queued_cmd reference * @dev: ATA device to which the new command is attached * @cmd: SCSI command that originated this ATA command * * Obtain a reference to an unused ata_queued_cmd structure, * which is the basic libata structure representing a single * ATA command sent to the hardware. * * If a command was available, fill in the SCSI-specific * portions of the structure with information on the * current command. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Command allocated, or %NULL if none available. */ static struct ata_queued_cmd *ata_scsi_qc_new(struct ata_device *dev, struct scsi_cmnd *cmd) { struct ata_queued_cmd *qc; qc = ata_qc_new_init(dev, cmd->request->tag); if (qc) { qc->scsicmd = cmd; qc->scsidone = cmd->scsi_done; qc->sg = scsi_sglist(cmd); qc->n_elem = scsi_sg_count(cmd); if (cmd->request->rq_flags & RQF_QUIET) qc->flags |= ATA_QCFLAG_QUIET; } else { cmd->result = (DID_OK << 16) | (QUEUE_FULL << 1); cmd->scsi_done(cmd); } return qc; } static void ata_qc_set_pc_nbytes(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; qc->extrabytes = scmd->extra_len; qc->nbytes = scsi_bufflen(scmd) + qc->extrabytes; } /** * ata_dump_status - user friendly display of error info * @id: id of the port in question * @tf: ptr to filled out taskfile * * Decode and dump the ATA error/status registers for the user so * that they have some idea what really happened at the non * make-believe layer. * * LOCKING: * inherited from caller */ static void ata_dump_status(unsigned id, struct ata_taskfile *tf) { u8 stat = tf->command, err = tf->feature; pr_warn("ata%u: status=0x%02x { ", id, stat); if (stat & ATA_BUSY) { pr_cont("Busy }\n"); /* Data is not valid in this case */ } else { if (stat & ATA_DRDY) pr_cont("DriveReady "); if (stat & ATA_DF) pr_cont("DeviceFault "); if (stat & ATA_DSC) pr_cont("SeekComplete "); if (stat & ATA_DRQ) pr_cont("DataRequest "); if (stat & ATA_CORR) pr_cont("CorrectedError "); if (stat & ATA_SENSE) pr_cont("Sense "); if (stat & ATA_ERR) pr_cont("Error "); pr_cont("}\n"); if (err) { pr_warn("ata%u: error=0x%02x { ", id, err); if (err & ATA_ABORTED) pr_cont("DriveStatusError "); if (err & ATA_ICRC) { if (err & ATA_ABORTED) pr_cont("BadCRC "); else pr_cont("Sector "); } if (err & ATA_UNC) pr_cont("UncorrectableError "); if (err & ATA_IDNF) pr_cont("SectorIdNotFound "); if (err & ATA_TRK0NF) pr_cont("TrackZeroNotFound "); if (err & ATA_AMNF) pr_cont("AddrMarkNotFound "); pr_cont("}\n"); } } } /** * ata_to_sense_error - convert ATA error to SCSI error * @id: ATA device number * @drv_stat: value contained in ATA status register * @drv_err: value contained in ATA error register * @sk: the sense key we'll fill out * @asc: the additional sense code we'll fill out * @ascq: the additional sense code qualifier we'll fill out * @verbose: be verbose * * Converts an ATA error into a SCSI error. Fill out pointers to * SK, ASC, and ASCQ bytes for later use in fixed or descriptor * format sense blocks. * * LOCKING: * spin_lock_irqsave(host lock) */ static void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, u8 *ascq, int verbose) { int i; /* Based on the 3ware driver translation table */ static const unsigned char sense_table[][4] = { /* BBD|ECC|ID|MAR */ {0xd1, ABORTED_COMMAND, 0x00, 0x00}, // Device busy Aborted command /* BBD|ECC|ID */ {0xd0, ABORTED_COMMAND, 0x00, 0x00}, // Device busy Aborted command /* ECC|MC|MARK */ {0x61, HARDWARE_ERROR, 0x00, 0x00}, // Device fault Hardware error /* ICRC|ABRT */ /* NB: ICRC & !ABRT is BBD */ {0x84, ABORTED_COMMAND, 0x47, 0x00}, // Data CRC error SCSI parity error /* MC|ID|ABRT|TRK0|MARK */ {0x37, NOT_READY, 0x04, 0x00}, // Unit offline Not ready /* MCR|MARK */ {0x09, NOT_READY, 0x04, 0x00}, // Unrecovered disk error Not ready /* Bad address mark */ {0x01, MEDIUM_ERROR, 0x13, 0x00}, // Address mark not found for data field /* TRK0 - Track 0 not found */ {0x02, HARDWARE_ERROR, 0x00, 0x00}, // Hardware error /* Abort: 0x04 is not translated here, see below */ /* Media change request */ {0x08, NOT_READY, 0x04, 0x00}, // FIXME: faking offline /* SRV/IDNF - ID not found */ {0x10, ILLEGAL_REQUEST, 0x21, 0x00}, // Logical address out of range /* MC - Media Changed */ {0x20, UNIT_ATTENTION, 0x28, 0x00}, // Not ready to ready change, medium may have changed /* ECC - Uncorrectable ECC error */ {0x40, MEDIUM_ERROR, 0x11, 0x04}, // Unrecovered read error /* BBD - block marked bad */ {0x80, MEDIUM_ERROR, 0x11, 0x04}, // Block marked bad Medium error, unrecovered read error {0xFF, 0xFF, 0xFF, 0xFF}, // END mark }; static const unsigned char stat_table[][4] = { /* Must be first because BUSY means no other bits valid */ {0x80, ABORTED_COMMAND, 0x47, 0x00}, // Busy, fake parity for now {0x40, ILLEGAL_REQUEST, 0x21, 0x04}, // Device ready, unaligned write command {0x20, HARDWARE_ERROR, 0x44, 0x00}, // Device fault, internal target failure {0x08, ABORTED_COMMAND, 0x47, 0x00}, // Timed out in xfer, fake parity for now {0x04, RECOVERED_ERROR, 0x11, 0x00}, // Recovered ECC error Medium error, recovered {0xFF, 0xFF, 0xFF, 0xFF}, // END mark }; /* * Is this an error we can process/parse */ if (drv_stat & ATA_BUSY) { drv_err = 0; /* Ignore the err bits, they're invalid */ } if (drv_err) { /* Look for drv_err */ for (i = 0; sense_table[i][0] != 0xFF; i++) { /* Look for best matches first */ if ((sense_table[i][0] & drv_err) == sense_table[i][0]) { *sk = sense_table[i][1]; *asc = sense_table[i][2]; *ascq = sense_table[i][3]; goto translate_done; } } } /* * Fall back to interpreting status bits. Note that if the drv_err * has only the ABRT bit set, we decode drv_stat. ABRT by itself * is not descriptive enough. */ for (i = 0; stat_table[i][0] != 0xFF; i++) { if (stat_table[i][0] & drv_stat) { *sk = stat_table[i][1]; *asc = stat_table[i][2]; *ascq = stat_table[i][3]; goto translate_done; } } /* * We need a sensible error return here, which is tricky, and one * that won't cause people to do things like return a disk wrongly. */ *sk = ABORTED_COMMAND; *asc = 0x00; *ascq = 0x00; translate_done: if (verbose) pr_err("ata%u: translated ATA stat/err 0x%02x/%02x to SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, *sk, *asc, *ascq); return; } /* * ata_gen_passthru_sense - Generate check condition sense block. * @qc: Command that completed. * * This function is specific to the ATA descriptor format sense * block specified for the ATA pass through commands. Regardless * of whether the command errored or not, return a sense * block. Copy all controller registers into the sense * block. If there was no error, we get the request from an ATA * passthrough command, so we use the following sense data: * sk = RECOVERED ERROR * asc,ascq = ATA PASS-THROUGH INFORMATION AVAILABLE * * * LOCKING: * None. */ static void ata_gen_passthru_sense(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; struct ata_taskfile *tf = &qc->result_tf; unsigned char *sb = cmd->sense_buffer; unsigned char *desc = sb + 8; int verbose = qc->ap->ops->error_handler == NULL; u8 sense_key, asc, ascq; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; /* * Use ata_to_sense_error() to map status register bits * onto sense key, asc & ascq. */ if (qc->err_mask || tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { ata_to_sense_error(qc->ap->print_id, tf->command, tf->feature, &sense_key, &asc, &ascq, verbose); ata_scsi_set_sense(qc->dev, cmd, sense_key, asc, ascq); } else { /* * ATA PASS-THROUGH INFORMATION AVAILABLE * Always in descriptor format sense. */ scsi_build_sense_buffer(1, cmd->sense_buffer, RECOVERED_ERROR, 0, 0x1D); } if ((cmd->sense_buffer[0] & 0x7f) >= 0x72) { u8 len; /* descriptor format */ len = sb[7]; desc = (char *)scsi_sense_desc_find(sb, len + 8, 9); if (!desc) { if (SCSI_SENSE_BUFFERSIZE < len + 14) return; sb[7] = len + 14; desc = sb + 8 + len; } desc[0] = 9; desc[1] = 12; /* * Copy registers into sense buffer. */ desc[2] = 0x00; desc[3] = tf->feature; /* == error reg */ desc[5] = tf->nsect; desc[7] = tf->lbal; desc[9] = tf->lbam; desc[11] = tf->lbah; desc[12] = tf->device; desc[13] = tf->command; /* == status reg */ /* * Fill in Extend bit, and the high order bytes * if applicable. */ if (tf->flags & ATA_TFLAG_LBA48) { desc[2] |= 0x01; desc[4] = tf->hob_nsect; desc[6] = tf->hob_lbal; desc[8] = tf->hob_lbam; desc[10] = tf->hob_lbah; } } else { /* Fixed sense format */ desc[0] = tf->feature; desc[1] = tf->command; /* status */ desc[2] = tf->device; desc[3] = tf->nsect; desc[7] = 0; if (tf->flags & ATA_TFLAG_LBA48) { desc[8] |= 0x80; if (tf->hob_nsect) desc[8] |= 0x40; if (tf->hob_lbal || tf->hob_lbam || tf->hob_lbah) desc[8] |= 0x20; } desc[9] = tf->lbal; desc[10] = tf->lbam; desc[11] = tf->lbah; } } /** * ata_gen_ata_sense - generate a SCSI fixed sense block * @qc: Command that we are erroring out * * Generate sense block for a failed ATA command @qc. Descriptor * format is used to accommodate LBA48 block address. * * LOCKING: * None. */ static void ata_gen_ata_sense(struct ata_queued_cmd *qc) { struct ata_device *dev = qc->dev; struct scsi_cmnd *cmd = qc->scsicmd; struct ata_taskfile *tf = &qc->result_tf; unsigned char *sb = cmd->sense_buffer; int verbose = qc->ap->ops->error_handler == NULL; u64 block; u8 sense_key, asc, ascq; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; if (ata_dev_disabled(dev)) { /* Device disabled after error recovery */ /* LOGICAL UNIT NOT READY, HARD RESET REQUIRED */ ata_scsi_set_sense(dev, cmd, NOT_READY, 0x04, 0x21); return; } /* Use ata_to_sense_error() to map status register bits * onto sense key, asc & ascq. */ if (qc->err_mask || tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { ata_to_sense_error(qc->ap->print_id, tf->command, tf->feature, &sense_key, &asc, &ascq, verbose); ata_scsi_set_sense(dev, cmd, sense_key, asc, ascq); } else { /* Could not decode error */ ata_dev_warn(dev, "could not decode error status 0x%x err_mask 0x%x\n", tf->command, qc->err_mask); ata_scsi_set_sense(dev, cmd, ABORTED_COMMAND, 0, 0); return; } block = ata_tf_read_block(&qc->result_tf, dev); if (block == U64_MAX) return; scsi_set_sense_information(sb, SCSI_SENSE_BUFFERSIZE, block); } void ata_scsi_sdev_config(struct scsi_device *sdev) { sdev->use_10_for_rw = 1; sdev->use_10_for_ms = 1; sdev->no_write_same = 1; /* Schedule policy is determined by ->qc_defer() callback and * it needs to see every deferred qc. Set dev_blocked to 1 to * prevent SCSI midlayer from automatically deferring * requests. */ sdev->max_device_blocked = 1; } /** * ata_scsi_dma_need_drain - Check whether data transfer may overflow * @rq: request to be checked * * ATAPI commands which transfer variable length data to host * might overflow due to application error or hardware bug. This * function checks whether overflow should be drained and ignored * for @request. * * LOCKING: * None. * * RETURNS: * 1 if ; otherwise, 0. */ bool ata_scsi_dma_need_drain(struct request *rq) { return atapi_cmd_type(scsi_req(rq)->cmd[0]) == ATAPI_MISC; } EXPORT_SYMBOL_GPL(ata_scsi_dma_need_drain); int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) { struct request_queue *q = sdev->request_queue; if (!ata_id_has_unload(dev->id)) dev->flags |= ATA_DFLAG_NO_UNLOAD; /* configure max sectors */ blk_queue_max_hw_sectors(q, dev->max_sectors); if (dev->class == ATA_DEV_ATAPI) { sdev->sector_size = ATA_SECT_SIZE; /* set DMA padding */ blk_queue_update_dma_pad(q, ATA_DMA_PAD_SZ - 1); /* make room for appending the drain */ blk_queue_max_segments(q, queue_max_segments(q) - 1); sdev->dma_drain_len = ATAPI_MAX_DRAIN; sdev->dma_drain_buf = kmalloc(sdev->dma_drain_len, q->bounce_gfp | GFP_KERNEL); if (!sdev->dma_drain_buf) { ata_dev_err(dev, "drain buffer allocation failed\n"); return -ENOMEM; } } else { sdev->sector_size = ata_id_logical_sector_size(dev->id); sdev->manage_start_stop = 1; } /* * ata_pio_sectors() expects buffer for each sector to not cross * page boundary. Enforce it by requiring buffers to be sector * aligned, which works iff sector_size is not larger than * PAGE_SIZE. ATAPI devices also need the alignment as * IDENTIFY_PACKET is executed as ATA_PROT_PIO. */ if (sdev->sector_size > PAGE_SIZE) ata_dev_warn(dev, "sector_size=%u > PAGE_SIZE, PIO may malfunction\n", sdev->sector_size); blk_queue_update_dma_alignment(q, sdev->sector_size - 1); if (dev->flags & ATA_DFLAG_AN) set_bit(SDEV_EVT_MEDIA_CHANGE, sdev->supported_events); if (dev->flags & ATA_DFLAG_NCQ) { int depth; depth = min(sdev->host->can_queue, ata_id_queue_depth(dev->id)); depth = min(ATA_MAX_QUEUE, depth); scsi_change_queue_depth(sdev, depth); } if (dev->flags & ATA_DFLAG_TRUSTED) sdev->security_supported = 1; dev->sdev = sdev; return 0; } /** * ata_scsi_slave_config - Set SCSI device attributes * @sdev: SCSI device to examine * * This is called before we actually start reading * and writing to the device, to configure certain * SCSI mid-layer behaviors. * * LOCKING: * Defined by SCSI layer. We don't really care. */ int ata_scsi_slave_config(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); struct ata_device *dev = __ata_scsi_find_dev(ap, sdev); int rc = 0; ata_scsi_sdev_config(sdev); if (dev) rc = ata_scsi_dev_config(sdev, dev); return rc; } EXPORT_SYMBOL_GPL(ata_scsi_slave_config); /** * ata_scsi_slave_destroy - SCSI device is about to be destroyed * @sdev: SCSI device to be destroyed * * @sdev is about to be destroyed for hot/warm unplugging. If * this unplugging was initiated by libata as indicated by NULL * dev->sdev, this function doesn't have to do anything. * Otherwise, SCSI layer initiated warm-unplug is in progress. * Clear dev->sdev, schedule the device for ATA detach and invoke * EH. * * LOCKING: * Defined by SCSI layer. We don't really care. */ void ata_scsi_slave_destroy(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); unsigned long flags; struct ata_device *dev; if (!ap->ops->error_handler) return; spin_lock_irqsave(ap->lock, flags); dev = __ata_scsi_find_dev(ap, sdev); if (dev && dev->sdev) { /* SCSI device already in CANCEL state, no need to offline it */ dev->sdev = NULL; dev->flags |= ATA_DFLAG_DETACH; ata_port_schedule_eh(ap); } spin_unlock_irqrestore(ap->lock, flags); kfree(sdev->dma_drain_buf); } EXPORT_SYMBOL_GPL(ata_scsi_slave_destroy); /** * ata_scsi_start_stop_xlat - Translate SCSI START STOP UNIT command * @qc: Storage for translated ATA taskfile * * Sets up an ATA taskfile to issue STANDBY (to stop) or READ VERIFY * (to start). Perhaps these commands should be preceded by * CHECK POWER MODE to see what power mode the device is already in. * [See SAT revision 5 at www.t10.org] * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_taskfile *tf = &qc->tf; const u8 *cdb = scmd->cmnd; u16 fp; u8 bp = 0xff; if (scmd->cmd_len < 5) { fp = 4; goto invalid_fld; } tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; tf->protocol = ATA_PROT_NODATA; if (cdb[1] & 0x1) { ; /* ignore IMMED bit, violates sat-r05 */ } if (cdb[4] & 0x2) { fp = 4; bp = 1; goto invalid_fld; /* LOEJ bit set not supported */ } if (((cdb[4] >> 4) & 0xf) != 0) { fp = 4; bp = 3; goto invalid_fld; /* power conditions not supported */ } if (cdb[4] & 0x1) { tf->nsect = 1; /* 1 sector, lba=0 */ if (qc->dev->flags & ATA_DFLAG_LBA) { tf->flags |= ATA_TFLAG_LBA; tf->lbah = 0x0; tf->lbam = 0x0; tf->lbal = 0x0; tf->device |= ATA_LBA; } else { /* CHS */ tf->lbal = 0x1; /* sect */ tf->lbam = 0x0; /* cyl low */ tf->lbah = 0x0; /* cyl high */ } tf->command = ATA_CMD_VERIFY; /* READ VERIFY */ } else { /* Some odd clown BIOSen issue spindown on power off (ACPI S4 * or S5) causing some drives to spin up and down again. */ if ((qc->ap->flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) && system_state == SYSTEM_POWER_OFF) goto skip; if ((qc->ap->flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) && system_entering_hibernation()) goto skip; /* Issue ATA STANDBY IMMEDIATE command */ tf->command = ATA_CMD_STANDBYNOW1; } /* * Standby and Idle condition timers could be implemented but that * would require libata to implement the Power condition mode page * and allow the user to change it. Changing mode pages requires * MODE SELECT to be implemented. */ return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; skip: scmd->result = SAM_STAT_GOOD; return 1; } /** * ata_scsi_flush_xlat - Translate SCSI SYNCHRONIZE CACHE command * @qc: Storage for translated ATA taskfile * * Sets up an ATA taskfile to issue FLUSH CACHE or * FLUSH CACHE EXT. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_flush_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; tf->flags |= ATA_TFLAG_DEVICE; tf->protocol = ATA_PROT_NODATA; if (qc->dev->flags & ATA_DFLAG_FLUSH_EXT) tf->command = ATA_CMD_FLUSH_EXT; else tf->command = ATA_CMD_FLUSH; /* flush is critical for IO integrity, consider it an IO command */ qc->flags |= ATA_QCFLAG_IO; return 0; } /** * scsi_6_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 6-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_6_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len; VPRINTK("six-byte command\n"); lba |= ((u64)(cdb[1] & 0x1f)) << 16; lba |= ((u64)cdb[2]) << 8; lba |= ((u64)cdb[3]); len = cdb[4]; *plba = lba; *plen = len; } /** * scsi_10_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 10-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_10_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len = 0; VPRINTK("ten-byte command\n"); lba |= ((u64)cdb[2]) << 24; lba |= ((u64)cdb[3]) << 16; lba |= ((u64)cdb[4]) << 8; lba |= ((u64)cdb[5]); len |= ((u32)cdb[7]) << 8; len |= ((u32)cdb[8]); *plba = lba; *plen = len; } /** * scsi_16_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 16-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_16_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len = 0; VPRINTK("sixteen-byte command\n"); lba |= ((u64)cdb[2]) << 56; lba |= ((u64)cdb[3]) << 48; lba |= ((u64)cdb[4]) << 40; lba |= ((u64)cdb[5]) << 32; lba |= ((u64)cdb[6]) << 24; lba |= ((u64)cdb[7]) << 16; lba |= ((u64)cdb[8]) << 8; lba |= ((u64)cdb[9]); len |= ((u32)cdb[10]) << 24; len |= ((u32)cdb[11]) << 16; len |= ((u32)cdb[12]) << 8; len |= ((u32)cdb[13]); *plba = lba; *plen = len; } /** * ata_scsi_verify_xlat - Translate SCSI VERIFY command into an ATA one * @qc: Storage for translated ATA taskfile * * Converts SCSI VERIFY command to an ATA READ VERIFY command. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_verify_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_taskfile *tf = &qc->tf; struct ata_device *dev = qc->dev; u64 dev_sectors = qc->dev->n_sectors; const u8 *cdb = scmd->cmnd; u64 block; u32 n_block; u16 fp; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; tf->protocol = ATA_PROT_NODATA; if (cdb[0] == VERIFY) { if (scmd->cmd_len < 10) { fp = 9; goto invalid_fld; } scsi_10_lba_len(cdb, &block, &n_block); } else if (cdb[0] == VERIFY_16) { if (scmd->cmd_len < 16) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); } else { fp = 0; goto invalid_fld; } if (!n_block) goto nothing_to_do; if (block >= dev_sectors) goto out_of_range; if ((block + n_block) > dev_sectors) goto out_of_range; if (dev->flags & ATA_DFLAG_LBA) { tf->flags |= ATA_TFLAG_LBA; if (lba_28_ok(block, n_block)) { /* use LBA28 */ tf->command = ATA_CMD_VERIFY; tf->device |= (block >> 24) & 0xf; } else if (lba_48_ok(block, n_block)) { if (!(dev->flags & ATA_DFLAG_LBA48)) goto out_of_range; /* use LBA48 */ tf->flags |= ATA_TFLAG_LBA48; tf->command = ATA_CMD_VERIFY_EXT; tf->hob_nsect = (n_block >> 8) & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; } else /* request too large even for LBA48 */ goto out_of_range; tf->nsect = n_block & 0xff; tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->device |= ATA_LBA; } else { /* CHS */ u32 sect, head, cyl, track; if (!lba_28_ok(block, n_block)) goto out_of_range; /* Convert LBA to CHS */ track = (u32)block / dev->sectors; cyl = track / dev->heads; head = track % dev->heads; sect = (u32)block % dev->sectors + 1; DPRINTK("block %u track %u cyl %u head %u sect %u\n", (u32)block, track, cyl, head, sect); /* Check whether the converted CHS can fit. Cylinder: 0-65535 Head: 0-15 Sector: 1-255*/ if ((cyl >> 16) || (head >> 4) || (sect >> 8) || (!sect)) goto out_of_range; tf->command = ATA_CMD_VERIFY; tf->nsect = n_block & 0xff; /* Sector count 0 means 256 sectors */ tf->lbal = sect; tf->lbam = cyl; tf->lbah = cyl >> 8; tf->device |= head; } return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; out_of_range: ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x21, 0x0); /* "Logical Block Address out of range" */ return 1; nothing_to_do: scmd->result = SAM_STAT_GOOD; return 1; } static bool ata_check_nblocks(struct scsi_cmnd *scmd, u32 n_blocks) { struct request *rq = scmd->request; u32 req_blocks; if (!blk_rq_is_passthrough(rq)) return true; req_blocks = blk_rq_bytes(rq) / scmd->device->sector_size; if (n_blocks > req_blocks) return false; return true; } /** * ata_scsi_rw_xlat - Translate SCSI r/w command into an ATA one * @qc: Storage for translated ATA taskfile * * Converts any of six SCSI read/write commands into the * ATA counterpart, including starting sector (LBA), * sector count, and taking into account the device's LBA48 * support. * * Commands %READ_6, %READ_10, %READ_16, %WRITE_6, %WRITE_10, and * %WRITE_16 are currently supported. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; struct request *rq = scmd->request; int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq)); unsigned int tf_flags = 0; u64 block; u32 n_block; int rc; u16 fp = 0; if (cdb[0] == WRITE_10 || cdb[0] == WRITE_6 || cdb[0] == WRITE_16) tf_flags |= ATA_TFLAG_WRITE; /* Calculate the SCSI LBA, transfer length and FUA. */ switch (cdb[0]) { case READ_10: case WRITE_10: if (unlikely(scmd->cmd_len < 10)) { fp = 9; goto invalid_fld; } scsi_10_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; case READ_6: case WRITE_6: if (unlikely(scmd->cmd_len < 6)) { fp = 5; goto invalid_fld; } scsi_6_lba_len(cdb, &block, &n_block); /* for 6-byte r/w commands, transfer length 0 * means 256 blocks of data, not 0 block. */ if (!n_block) n_block = 256; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; case READ_16: case WRITE_16: if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; default: DPRINTK("no-byte command\n"); fp = 0; goto invalid_fld; } /* Check and compose ATA command */ if (!n_block) /* For 10-byte and 16-byte SCSI R/W commands, transfer * length 0 means transfer 0 block of data. * However, for ATA R/W commands, sector count 0 means * 256 or 65536 sectors, not 0 sectors as in SCSI. * * WARNING: one or two older ATA drives treat 0 as 0... */ goto nothing_to_do; qc->flags |= ATA_QCFLAG_IO; qc->nbytes = n_block * scmd->device->sector_size; rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags, qc->hw_tag, class); if (likely(rc == 0)) return 0; if (rc == -ERANGE) goto out_of_range; /* treat all other errors as -EINVAL, fall through */ invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; out_of_range: ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x21, 0x0); /* "Logical Block Address out of range" */ return 1; nothing_to_do: scmd->result = SAM_STAT_GOOD; return 1; } static void ata_qc_done(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; void (*done)(struct scsi_cmnd *) = qc->scsidone; ata_qc_free(qc); done(cmd); } static void ata_scsi_qc_complete(struct ata_queued_cmd *qc) { struct ata_port *ap = qc->ap; struct scsi_cmnd *cmd = qc->scsicmd; u8 *cdb = cmd->cmnd; int need_sense = (qc->err_mask != 0); /* For ATA pass thru (SAT) commands, generate a sense block if * user mandated it or if there's an error. Note that if we * generate because the user forced us to [CK_COND =1], a check * condition is generated and the ATA register values are returned * whether the command completed successfully or not. If there * was no error, we use the following sense data: * sk = RECOVERED ERROR * asc,ascq = ATA PASS-THROUGH INFORMATION AVAILABLE */ if (((cdb[0] == ATA_16) || (cdb[0] == ATA_12)) && ((cdb[2] & 0x20) || need_sense)) ata_gen_passthru_sense(qc); else if (qc->flags & ATA_QCFLAG_SENSE_VALID) cmd->result = SAM_STAT_CHECK_CONDITION; else if (need_sense) ata_gen_ata_sense(qc); else cmd->result = SAM_STAT_GOOD; if (need_sense && !ap->ops->error_handler) ata_dump_status(ap->print_id, &qc->result_tf); ata_qc_done(qc); } /** * ata_scsi_translate - Translate then issue SCSI command to ATA device * @dev: ATA device to which the command is addressed * @cmd: SCSI command to execute * @xlat_func: Actor which translates @cmd to an ATA taskfile * * Our ->queuecommand() function has decided that the SCSI * command issued can be directly translated into an ATA * command, rather than handled internally. * * This function sets up an ata_queued_cmd structure for the * SCSI command, and sends that ata_queued_cmd to the hardware. * * The xlat_func argument (actor) returns 0 if ready to execute * ATA command, else 1 to finish translation. If 1 is returned * then cmd->result (and possibly cmd->sense_buffer) are assumed * to be set reflecting an error condition or clean (early) * termination. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * 0 on success, SCSI_ML_QUEUE_DEVICE_BUSY if the command * needs to be deferred. */ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd, ata_xlat_func_t xlat_func) { struct ata_port *ap = dev->link->ap; struct ata_queued_cmd *qc; int rc; VPRINTK("ENTER\n"); qc = ata_scsi_qc_new(dev, cmd); if (!qc) goto err_mem; /* data is present; dma-map it */ if (cmd->sc_data_direction == DMA_FROM_DEVICE || cmd->sc_data_direction == DMA_TO_DEVICE) { if (unlikely(scsi_bufflen(cmd) < 1)) { ata_dev_warn(dev, "WARNING: zero len r/w req\n"); goto err_did; } ata_sg_init(qc, scsi_sglist(cmd), scsi_sg_count(cmd)); qc->dma_dir = cmd->sc_data_direction; } qc->complete_fn = ata_scsi_qc_complete; if (xlat_func(qc)) goto early_finish; if (ap->ops->qc_defer) { if ((rc = ap->ops->qc_defer(qc))) goto defer; } /* select device, send command to hardware */ ata_qc_issue(qc); VPRINTK("EXIT\n"); return 0; early_finish: ata_qc_free(qc); cmd->scsi_done(cmd); DPRINTK("EXIT - early finish (good or error)\n"); return 0; err_did: ata_qc_free(qc); cmd->result = (DID_ERROR << 16); cmd->scsi_done(cmd); err_mem: DPRINTK("EXIT - internal\n"); return 0; defer: ata_qc_free(qc); DPRINTK("EXIT - defer\n"); if (rc == ATA_DEFER_LINK) return SCSI_MLQUEUE_DEVICE_BUSY; else return SCSI_MLQUEUE_HOST_BUSY; } struct ata_scsi_args { struct ata_device *dev; u16 *id; struct scsi_cmnd *cmd; }; /** * ata_scsi_rbuf_get - Map response buffer. * @cmd: SCSI command containing buffer to be mapped. * @flags: unsigned long variable to store irq enable status * @copy_in: copy in from user buffer * * Prepare buffer for simulated SCSI commands. * * LOCKING: * spin_lock_irqsave(ata_scsi_rbuf_lock) on success * * RETURNS: * Pointer to response buffer. */ static void *ata_scsi_rbuf_get(struct scsi_cmnd *cmd, bool copy_in, unsigned long *flags) { spin_lock_irqsave(&ata_scsi_rbuf_lock, *flags); memset(ata_scsi_rbuf, 0, ATA_SCSI_RBUF_SIZE); if (copy_in) sg_copy_to_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), ata_scsi_rbuf, ATA_SCSI_RBUF_SIZE); return ata_scsi_rbuf; } /** * ata_scsi_rbuf_put - Unmap response buffer. * @cmd: SCSI command containing buffer to be unmapped. * @copy_out: copy out result * @flags: @flags passed to ata_scsi_rbuf_get() * * Returns rbuf buffer. The result is copied to @cmd's buffer if * @copy_back is true. * * LOCKING: * Unlocks ata_scsi_rbuf_lock. */ static inline void ata_scsi_rbuf_put(struct scsi_cmnd *cmd, bool copy_out, unsigned long *flags) { if (copy_out) sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), ata_scsi_rbuf, ATA_SCSI_RBUF_SIZE); spin_unlock_irqrestore(&ata_scsi_rbuf_lock, *flags); } /** * ata_scsi_rbuf_fill - wrapper for SCSI command simulators * @args: device IDENTIFY data / SCSI command of interest. * @actor: Callback hook for desired SCSI command simulator * * Takes care of the hard work of simulating a SCSI command... * Mapping the response buffer, calling the command's handler, * and handling the handler's return value. This return value * indicates whether the handler wishes the SCSI command to be * completed successfully (0), or not (in which case cmd->result * and sense buffer are assumed to be set). * * LOCKING: * spin_lock_irqsave(host lock) */ static void ata_scsi_rbuf_fill(struct ata_scsi_args *args, unsigned int (*actor)(struct ata_scsi_args *args, u8 *rbuf)) { u8 *rbuf; unsigned int rc; struct scsi_cmnd *cmd = args->cmd; unsigned long flags; rbuf = ata_scsi_rbuf_get(cmd, false, &flags); rc = actor(args, rbuf); ata_scsi_rbuf_put(cmd, rc == 0, &flags); if (rc == 0) cmd->result = SAM_STAT_GOOD; } /** * ata_scsiop_inq_std - Simulate INQUIRY command * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns standard device identification data associated * with non-VPD INQUIRY command output. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_std(struct ata_scsi_args *args, u8 *rbuf) { static const u8 versions[] = { 0x00, 0x60, /* SAM-3 (no version claimed) */ 0x03, 0x20, /* SBC-2 (no version claimed) */ 0x03, 0x00 /* SPC-3 (no version claimed) */ }; static const u8 versions_zbc[] = { 0x00, 0xA0, /* SAM-5 (no version claimed) */ 0x06, 0x00, /* SBC-4 (no version claimed) */ 0x05, 0xC0, /* SPC-5 (no version claimed) */ 0x60, 0x24, /* ZBC r05 */ }; u8 hdr[] = { TYPE_DISK, 0, 0x5, /* claim SPC-3 version compatibility */ 2, 95 - 4, 0, 0, 2 }; VPRINTK("ENTER\n"); /* set scsi removable (RMB) bit per ata bit, or if the * AHCI port says it's external (Hotplug-capable, eSATA). */ if (ata_id_removable(args->id) || (args->dev->link->ap->pflags & ATA_PFLAG_EXTERNAL)) hdr[1] |= (1 << 7); if (args->dev->class == ATA_DEV_ZAC) { hdr[0] = TYPE_ZBC; hdr[2] = 0x7; /* claim SPC-5 version compatibility */ } memcpy(rbuf, hdr, sizeof(hdr)); memcpy(&rbuf[8], "ATA ", 8); ata_id_string(args->id, &rbuf[16], ATA_ID_PROD, 16); /* From SAT, use last 2 words from fw rev unless they are spaces */ ata_id_string(args->id, &rbuf[32], ATA_ID_FW_REV + 2, 4); if (strncmp(&rbuf[32], " ", 4) == 0) ata_id_string(args->id, &rbuf[32], ATA_ID_FW_REV, 4); if (rbuf[32] == 0 || rbuf[32] == ' ') memcpy(&rbuf[32], "n/a ", 4); if (ata_id_zoned_cap(args->id) || args->dev->class == ATA_DEV_ZAC) memcpy(rbuf + 58, versions_zbc, sizeof(versions_zbc)); else memcpy(rbuf + 58, versions, sizeof(versions)); return 0; } /** * ata_scsiop_inq_00 - Simulate INQUIRY VPD page 0, list of pages * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns list of inquiry VPD pages available. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf) { int num_pages; static const u8 pages[] = { 0x00, /* page 0x00, this page */ 0x80, /* page 0x80, unit serial no page */ 0x83, /* page 0x83, device ident page */ 0x89, /* page 0x89, ata info page */ 0xb0, /* page 0xb0, block limits page */ 0xb1, /* page 0xb1, block device characteristics page */ 0xb2, /* page 0xb2, thin provisioning page */ 0xb6, /* page 0xb6, zoned block device characteristics */ }; num_pages = sizeof(pages); if (!(args->dev->flags & ATA_DFLAG_ZAC)) num_pages--; rbuf[3] = num_pages; /* number of supported VPD pages */ memcpy(rbuf + 4, pages, num_pages); return 0; } /** * ata_scsiop_inq_80 - Simulate INQUIRY VPD page 80, device serial number * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns ATA device serial number. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_80(struct ata_scsi_args *args, u8 *rbuf) { static const u8 hdr[] = { 0, 0x80, /* this page code */ 0, ATA_ID_SERNO_LEN, /* page len */ }; memcpy(rbuf, hdr, sizeof(hdr)); ata_id_string(args->id, (unsigned char *) &rbuf[4], ATA_ID_SERNO, ATA_ID_SERNO_LEN); return 0; } /** * ata_scsiop_inq_83 - Simulate INQUIRY VPD page 83, device identity * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields two logical unit device identification designators: * - vendor specific ASCII containing the ATA serial number * - SAT defined "t10 vendor id based" containing ASCII vendor * name ("ATA "), model and serial numbers. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_83(struct ata_scsi_args *args, u8 *rbuf) { const int sat_model_serial_desc_len = 68; int num; rbuf[1] = 0x83; /* this page code */ num = 4; /* piv=0, assoc=lu, code_set=ACSII, designator=vendor */ rbuf[num + 0] = 2; rbuf[num + 3] = ATA_ID_SERNO_LEN; num += 4; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_SERNO, ATA_ID_SERNO_LEN); num += ATA_ID_SERNO_LEN; /* SAT defined lu model and serial numbers descriptor */ /* piv=0, assoc=lu, code_set=ACSII, designator=t10 vendor id */ rbuf[num + 0] = 2; rbuf[num + 1] = 1; rbuf[num + 3] = sat_model_serial_desc_len; num += 4; memcpy(rbuf + num, "ATA ", 8); num += 8; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_PROD, ATA_ID_PROD_LEN); num += ATA_ID_PROD_LEN; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_SERNO, ATA_ID_SERNO_LEN); num += ATA_ID_SERNO_LEN; if (ata_id_has_wwn(args->id)) { /* SAT defined lu world wide name */ /* piv=0, assoc=lu, code_set=binary, designator=NAA */ rbuf[num + 0] = 1; rbuf[num + 1] = 3; rbuf[num + 3] = ATA_ID_WWN_LEN; num += 4; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_WWN, ATA_ID_WWN_LEN); num += ATA_ID_WWN_LEN; } rbuf[3] = num - 4; /* page len (assume less than 256 bytes) */ return 0; } /** * ata_scsiop_inq_89 - Simulate INQUIRY VPD page 89, ATA info * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields SAT-specified ATA VPD page. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_89(struct ata_scsi_args *args, u8 *rbuf) { rbuf[1] = 0x89; /* our page code */ rbuf[2] = (0x238 >> 8); /* page size fixed at 238h */ rbuf[3] = (0x238 & 0xff); memcpy(&rbuf[8], "linux ", 8); memcpy(&rbuf[16], "libata ", 16); memcpy(&rbuf[32], DRV_VERSION, 4); rbuf[36] = 0x34; /* force D2H Reg FIS (34h) */ rbuf[37] = (1 << 7); /* bit 7 indicates Command FIS */ /* TODO: PMP? */ /* we don't store the ATA device signature, so we fake it */ rbuf[38] = ATA_DRDY; /* really, this is Status reg */ rbuf[40] = 0x1; rbuf[48] = 0x1; rbuf[56] = ATA_CMD_ID_ATA; memcpy(&rbuf[60], &args->id[0], 512); return 0; } static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u16 min_io_sectors; rbuf[1] = 0xb0; rbuf[3] = 0x3c; /* required VPD size with unmap support */ /* * Optimal transfer length granularity. * * This is always one physical block, but for disks with a smaller * logical than physical sector size we need to figure out what the * latter is. */ min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id); put_unaligned_be16(min_io_sectors, &rbuf[6]); /* * Optimal unmap granularity. * * The ATA spec doesn't even know about a granularity or alignment * for the TRIM command. We can leave away most of the unmap related * VPD page entries, but we have specifify a granularity to signal * that we support some form of unmap - in thise case via WRITE SAME * with the unmap bit set. */ if (ata_id_has_trim(args->id)) { u64 max_blocks = 65535 * ATA_MAX_TRIM_RNUM; if (dev->horkage & ATA_HORKAGE_MAX_TRIM_128M) max_blocks = 128 << (20 - SECTOR_SHIFT); put_unaligned_be64(max_blocks, &rbuf[36]); put_unaligned_be32(1, &rbuf[28]); } return 0; } static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf) { int form_factor = ata_id_form_factor(args->id); int media_rotation_rate = ata_id_rotation_rate(args->id); u8 zoned = ata_id_zoned_cap(args->id); rbuf[1] = 0xb1; rbuf[3] = 0x3c; rbuf[4] = media_rotation_rate >> 8; rbuf[5] = media_rotation_rate; rbuf[7] = form_factor; if (zoned) rbuf[8] = (zoned << 4); return 0; } static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8 *rbuf) { /* SCSI Thin Provisioning VPD page: SBC-3 rev 22 or later */ rbuf[1] = 0xb2; rbuf[3] = 0x4; rbuf[5] = 1 << 6; /* TPWS */ return 0; } static unsigned int ata_scsiop_inq_b6(struct ata_scsi_args *args, u8 *rbuf) { /* * zbc-r05 SCSI Zoned Block device characteristics VPD page */ rbuf[1] = 0xb6; rbuf[3] = 0x3C; /* * URSWRZ bit is only meaningful for host-managed ZAC drives */ if (args->dev->zac_zoned_cap & 1) rbuf[4] |= 1; put_unaligned_be32(args->dev->zac_zones_optimal_open, &rbuf[8]); put_unaligned_be32(args->dev->zac_zones_optimal_nonseq, &rbuf[12]); put_unaligned_be32(args->dev->zac_zones_max_open, &rbuf[16]); return 0; } /** * modecpy - Prepare response for MODE SENSE * @dest: output buffer * @src: data being copied * @n: length of mode page * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE page for either current or changeable * parameters. * * LOCKING: * None. */ static void modecpy(u8 *dest, const u8 *src, int n, bool changeable) { if (changeable) { memcpy(dest, src, 2); memset(dest + 2, 0, n - 2); } else { memcpy(dest, src, n); } } /** * ata_msense_caching - Simulate MODE SENSE caching info page * @id: device IDENTIFY data * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a caching info page, which conditionally indicates * write caching to the SCSI layer, depending on device * capabilities. * * LOCKING: * None. */ static unsigned int ata_msense_caching(u16 *id, u8 *buf, bool changeable) { modecpy(buf, def_cache_mpage, sizeof(def_cache_mpage), changeable); if (changeable) { buf[2] |= (1 << 2); /* ata_mselect_caching() */ } else { buf[2] |= (ata_id_wcache_enabled(id) << 2); /* write cache enable */ buf[12] |= (!ata_id_rahead_enabled(id) << 5); /* disable read ahead */ } return sizeof(def_cache_mpage); } /** * ata_msense_control - Simulate MODE SENSE control mode page * @dev: ATA device of interest * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE control mode page. * * LOCKING: * None. */ static unsigned int ata_msense_control(struct ata_device *dev, u8 *buf, bool changeable) { modecpy(buf, def_control_mpage, sizeof(def_control_mpage), changeable); if (changeable) { buf[2] |= (1 << 2); /* ata_mselect_control() */ } else { bool d_sense = (dev->flags & ATA_DFLAG_D_SENSE); buf[2] |= (d_sense << 2); /* descriptor format sense data */ } return sizeof(def_control_mpage); } /** * ata_msense_rw_recovery - Simulate MODE SENSE r/w error recovery page * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE r/w error recovery page. * * LOCKING: * None. */ static unsigned int ata_msense_rw_recovery(u8 *buf, bool changeable) { modecpy(buf, def_rw_recovery_mpage, sizeof(def_rw_recovery_mpage), changeable); return sizeof(def_rw_recovery_mpage); } /* * We can turn this into a real blacklist if it's needed, for now just * blacklist any Maxtor BANC1G10 revision firmware */ static int ata_dev_supports_fua(u16 *id) { unsigned char model[ATA_ID_PROD_LEN + 1], fw[ATA_ID_FW_REV_LEN + 1]; if (!libata_fua) return 0; if (!ata_id_has_fua(id)) return 0; ata_id_c_string(id, model, ATA_ID_PROD, sizeof(model)); ata_id_c_string(id, fw, ATA_ID_FW_REV, sizeof(fw)); if (strcmp(model, "Maxtor")) return 1; if (strcmp(fw, "BANC1G10")) return 1; return 0; /* blacklisted */ } /** * ata_scsiop_mode_sense - Simulate MODE SENSE 6, 10 commands * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate MODE SENSE commands. Assume this is invoked for direct * access devices (e.g. disks) only. There should be no block * descriptor for other device types. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_mode_sense(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u8 *scsicmd = args->cmd->cmnd, *p = rbuf; static const u8 sat_blk_desc[] = { 0, 0, 0, 0, /* number of blocks: sat unspecified */ 0, 0, 0x2, 0x0 /* block length: 512 bytes */ }; u8 pg, spg; unsigned int ebd, page_control, six_byte; u8 dpofua, bp = 0xff; u16 fp; VPRINTK("ENTER\n"); six_byte = (scsicmd[0] == MODE_SENSE); ebd = !(scsicmd[1] & 0x8); /* dbd bit inverted == edb */ /* * LLBA bit in msense(10) ignored (compliant) */ page_control = scsicmd[2] >> 6; switch (page_control) { case 0: /* current */ case 1: /* changeable */ case 2: /* defaults */ break; /* supported */ case 3: /* saved */ goto saving_not_supp; default: fp = 2; bp = 6; goto invalid_fld; } if (six_byte) p += 4 + (ebd ? 8 : 0); else p += 8 + (ebd ? 8 : 0); pg = scsicmd[2] & 0x3f; spg = scsicmd[3]; /* * No mode subpages supported (yet) but asking for _all_ * subpages may be valid */ if (spg && (spg != ALL_SUB_MPAGES)) { fp = 3; goto invalid_fld; } switch(pg) { case RW_RECOVERY_MPAGE: p += ata_msense_rw_recovery(p, page_control == 1); break; case CACHE_MPAGE: p += ata_msense_caching(args->id, p, page_control == 1); break; case CONTROL_MPAGE: p += ata_msense_control(args->dev, p, page_control == 1); break; case ALL_MPAGES: p += ata_msense_rw_recovery(p, page_control == 1); p += ata_msense_caching(args->id, p, page_control == 1); p += ata_msense_control(args->dev, p, page_control == 1); break; default: /* invalid page code */ fp = 2; goto invalid_fld; } dpofua = 0; if (ata_dev_supports_fua(args->id) && (dev->flags & ATA_DFLAG_LBA48) && (!(dev->flags & ATA_DFLAG_PIO) || dev->multi_count)) dpofua = 1 << 4; if (six_byte) { rbuf[0] = p - rbuf - 1; rbuf[2] |= dpofua; if (ebd) { rbuf[3] = sizeof(sat_blk_desc); memcpy(rbuf + 4, sat_blk_desc, sizeof(sat_blk_desc)); } } else { unsigned int output_len = p - rbuf - 2; rbuf[0] = output_len >> 8; rbuf[1] = output_len; rbuf[3] |= dpofua; if (ebd) { rbuf[7] = sizeof(sat_blk_desc); memcpy(rbuf + 8, sat_blk_desc, sizeof(sat_blk_desc)); } } return 0; invalid_fld: ata_scsi_set_invalid_field(dev, args->cmd, fp, bp); return 1; saving_not_supp: ata_scsi_set_sense(dev, args->cmd, ILLEGAL_REQUEST, 0x39, 0x0); /* "Saving parameters not supported" */ return 1; } /** * ata_scsiop_read_cap - Simulate READ CAPACITY[ 16] commands * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate READ CAPACITY commands. * * LOCKING: * None. */ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u64 last_lba = dev->n_sectors - 1; /* LBA of the last block */ u32 sector_size; /* physical sector size in bytes */ u8 log2_per_phys; u16 lowest_aligned; sector_size = ata_id_logical_sector_size(dev->id); log2_per_phys = ata_id_log2_per_physical_sector(dev->id); lowest_aligned = ata_id_logical_sector_offset(dev->id, log2_per_phys); VPRINTK("ENTER\n"); if (args->cmd->cmnd[0] == READ_CAPACITY) { if (last_lba >= 0xffffffffULL) last_lba = 0xffffffff; /* sector count, 32-bit */ rbuf[0] = last_lba >> (8 * 3); rbuf[1] = last_lba >> (8 * 2); rbuf[2] = last_lba >> (8 * 1); rbuf[3] = last_lba; /* sector size */ rbuf[4] = sector_size >> (8 * 3); rbuf[5] = sector_size >> (8 * 2); rbuf[6] = sector_size >> (8 * 1); rbuf[7] = sector_size; } else { /* sector count, 64-bit */ rbuf[0] = last_lba >> (8 * 7); rbuf[1] = last_lba >> (8 * 6); rbuf[2] = last_lba >> (8 * 5); rbuf[3] = last_lba >> (8 * 4); rbuf[4] = last_lba >> (8 * 3); rbuf[5] = last_lba >> (8 * 2); rbuf[6] = last_lba >> (8 * 1); rbuf[7] = last_lba; /* sector size */ rbuf[ 8] = sector_size >> (8 * 3); rbuf[ 9] = sector_size >> (8 * 2); rbuf[10] = sector_size >> (8 * 1); rbuf[11] = sector_size; rbuf[12] = 0; rbuf[13] = log2_per_phys; rbuf[14] = (lowest_aligned >> 8) & 0x3f; rbuf[15] = lowest_aligned; if (ata_id_has_trim(args->id) && !(dev->horkage & ATA_HORKAGE_NOTRIM)) { rbuf[14] |= 0x80; /* LBPME */ if (ata_id_has_zero_after_trim(args->id) && dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { ata_dev_info(dev, "Enabling discard_zeroes_data\n"); rbuf[14] |= 0x40; /* LBPRZ */ } } if (ata_id_zoned_cap(args->id) || args->dev->class == ATA_DEV_ZAC) rbuf[12] = (1 << 4); /* RC_BASIS */ } return 0; } /** * ata_scsiop_report_luns - Simulate REPORT LUNS command * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate REPORT LUNS command. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_report_luns(struct ata_scsi_args *args, u8 *rbuf) { VPRINTK("ENTER\n"); rbuf[3] = 8; /* just one lun, LUN 0, size 8 bytes */ return 0; } static void atapi_sense_complete(struct ata_queued_cmd *qc) { if (qc->err_mask && ((qc->err_mask & AC_ERR_DEV) == 0)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into * a sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } ata_qc_done(qc); } /* is it pointless to prefer PIO for "safety reasons"? */ static inline int ata_pio_use_silly(struct ata_port *ap) { return (ap->flags & ATA_FLAG_PIO_DMA); } static void atapi_request_sense(struct ata_queued_cmd *qc) { struct ata_port *ap = qc->ap; struct scsi_cmnd *cmd = qc->scsicmd; DPRINTK("ATAPI request sense\n"); memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); #ifdef CONFIG_ATA_SFF if (ap->ops->sff_tf_read) ap->ops->sff_tf_read(ap, &qc->tf); #endif /* fill these in, for the case where they are -not- overwritten */ cmd->sense_buffer[0] = 0x70; cmd->sense_buffer[2] = qc->tf.feature >> 4; ata_qc_reinit(qc); /* setup sg table and init transfer direction */ sg_init_one(&qc->sgent, cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE); ata_sg_init(qc, &qc->sgent, 1); qc->dma_dir = DMA_FROM_DEVICE; memset(&qc->cdb, 0, qc->dev->cdb_len); qc->cdb[0] = REQUEST_SENSE; qc->cdb[4] = SCSI_SENSE_BUFFERSIZE; qc->tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; qc->tf.command = ATA_CMD_PACKET; if (ata_pio_use_silly(ap)) { qc->tf.protocol = ATAPI_PROT_DMA; qc->tf.feature |= ATAPI_PKT_DMA; } else { qc->tf.protocol = ATAPI_PROT_PIO; qc->tf.lbam = SCSI_SENSE_BUFFERSIZE; qc->tf.lbah = 0; } qc->nbytes = SCSI_SENSE_BUFFERSIZE; qc->complete_fn = atapi_sense_complete; ata_qc_issue(qc); DPRINTK("EXIT\n"); } /* * ATAPI devices typically report zero for their SCSI version, and sometimes * deviate from the spec WRT response data format. If SCSI version is * reported as zero like normal, then we make the following fixups: * 1) Fake MMC-5 version, to indicate to the Linux scsi midlayer this is a * modern device. * 2) Ensure response data format / ATAPI information are always correct. */ static void atapi_fixup_inquiry(struct scsi_cmnd *cmd) { u8 buf[4]; sg_copy_to_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, 4); if (buf[2] == 0) { buf[2] = 0x5; buf[3] = 0x32; } sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, 4); } static void atapi_qc_complete(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; unsigned int err_mask = qc->err_mask; VPRINTK("ENTER, err_mask 0x%X\n", err_mask); /* handle completion from new EH */ if (unlikely(qc->ap->ops->error_handler && (err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID))) { if (!(qc->flags & ATA_QCFLAG_SENSE_VALID)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into a * sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } /* SCSI EH automatically locks door if sdev->locked is * set. Sometimes door lock request continues to * fail, for example, when no media is present. This * creates a loop - SCSI EH issues door lock which * fails and gets invoked again to acquire sense data * for the failed command. * * If door lock fails, always clear sdev->locked to * avoid this infinite loop. * * This may happen before SCSI scan is complete. Make * sure qc->dev->sdev isn't NULL before dereferencing. */ if (qc->cdb[0] == ALLOW_MEDIUM_REMOVAL && qc->dev->sdev) qc->dev->sdev->locked = 0; qc->scsicmd->result = SAM_STAT_CHECK_CONDITION; ata_qc_done(qc); return; } /* successful completion or old EH failure path */ if (unlikely(err_mask & AC_ERR_DEV)) { cmd->result = SAM_STAT_CHECK_CONDITION; atapi_request_sense(qc); return; } else if (unlikely(err_mask)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into * a sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } else { if (cmd->cmnd[0] == INQUIRY && (cmd->cmnd[1] & 0x03) == 0) atapi_fixup_inquiry(cmd); cmd->result = SAM_STAT_GOOD; } ata_qc_done(qc); } /** * atapi_xlat - Initialize PACKET taskfile * @qc: command structure to be initialized * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on failure. */ static unsigned int atapi_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; int nodata = (scmd->sc_data_direction == DMA_NONE); int using_pio = !nodata && (dev->flags & ATA_DFLAG_PIO); unsigned int nbytes; memset(qc->cdb, 0, dev->cdb_len); memcpy(qc->cdb, scmd->cmnd, scmd->cmd_len); qc->complete_fn = atapi_qc_complete; qc->tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; if (scmd->sc_data_direction == DMA_TO_DEVICE) { qc->tf.flags |= ATA_TFLAG_WRITE; DPRINTK("direction: write\n"); } qc->tf.command = ATA_CMD_PACKET; ata_qc_set_pc_nbytes(qc); /* check whether ATAPI DMA is safe */ if (!nodata && !using_pio && atapi_check_dma(qc)) using_pio = 1; /* Some controller variants snoop this value for Packet * transfers to do state machine and FIFO management. Thus we * want to set it properly, and for DMA where it is * effectively meaningless. */ nbytes = min(ata_qc_raw_nbytes(qc), (unsigned int)63 * 1024); /* Most ATAPI devices which honor transfer chunk size don't * behave according to the spec when odd chunk size which * matches the transfer length is specified. If the number of * bytes to transfer is 2n+1. According to the spec, what * should happen is to indicate that 2n+1 is going to be * transferred and transfer 2n+2 bytes where the last byte is * padding. * * In practice, this doesn't happen. ATAPI devices first * indicate and transfer 2n bytes and then indicate and * transfer 2 bytes where the last byte is padding. * * This inconsistency confuses several controllers which * perform PIO using DMA such as Intel AHCIs and sil3124/32. * These controllers use actual number of transferred bytes to * update DMA pointer and transfer of 4n+2 bytes make those * controller push DMA pointer by 4n+4 bytes because SATA data * FISes are aligned to 4 bytes. This causes data corruption * and buffer overrun. * * Always setting nbytes to even number solves this problem * because then ATAPI devices don't have to split data at 2n * boundaries. */ if (nbytes & 0x1) nbytes++; qc->tf.lbam = (nbytes & 0xFF); qc->tf.lbah = (nbytes >> 8); if (nodata) qc->tf.protocol = ATAPI_PROT_NODATA; else if (using_pio) qc->tf.protocol = ATAPI_PROT_PIO; else { /* DMA data xfer */ qc->tf.protocol = ATAPI_PROT_DMA; qc->tf.feature |= ATAPI_PKT_DMA; if ((dev->flags & ATA_DFLAG_DMADIR) && (scmd->sc_data_direction != DMA_TO_DEVICE)) /* some SATA bridges need us to indicate data xfer direction */ qc->tf.feature |= ATAPI_DMADIR; } /* FIXME: We need to translate 0x05 READ_BLOCK_LIMITS to a MODE_SENSE as ATAPI tape drives don't get this right otherwise */ return 0; } static struct ata_device *ata_find_dev(struct ata_port *ap, int devno) { if (!sata_pmp_attached(ap)) { if (likely(devno >= 0 && devno < ata_link_max_devices(&ap->link))) return &ap->link.device[devno]; } else { if (likely(devno >= 0 && devno < ap->nr_pmp_links)) return &ap->pmp_link[devno].device[0]; } return NULL; } static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev) { int devno; /* skip commands not addressed to targets we simulate */ if (!sata_pmp_attached(ap)) { if (unlikely(scsidev->channel || scsidev->lun)) return NULL; devno = scsidev->id; } else { if (unlikely(scsidev->id || scsidev->lun)) return NULL; devno = scsidev->channel; } return ata_find_dev(ap, devno); } /** * ata_scsi_find_dev - lookup ata_device from scsi_cmnd * @ap: ATA port to which the device is attached * @scsidev: SCSI device from which we derive the ATA device * * Given various information provided in struct scsi_cmnd, * map that onto an ATA bus, and using that mapping * determine which ata_device is associated with the * SCSI command to be sent. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Associated ATA device, or %NULL if not found. */ struct ata_device * ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev) { struct ata_device *dev = __ata_scsi_find_dev(ap, scsidev); if (unlikely(!dev || !ata_dev_enabled(dev))) return NULL; return dev; } /* * ata_scsi_map_proto - Map pass-thru protocol value to taskfile value. * @byte1: Byte 1 from pass-thru CDB. * * RETURNS: * ATA_PROT_UNKNOWN if mapping failed/unimplemented, protocol otherwise. */ static u8 ata_scsi_map_proto(u8 byte1) { switch((byte1 & 0x1e) >> 1) { case 3: /* Non-data */ return ATA_PROT_NODATA; case 6: /* DMA */ case 10: /* UDMA Data-in */ case 11: /* UDMA Data-Out */ return ATA_PROT_DMA; case 4: /* PIO Data-in */ case 5: /* PIO Data-out */ return ATA_PROT_PIO; case 12: /* FPDMA */ return ATA_PROT_NCQ; case 0: /* Hard Reset */ case 1: /* SRST */ case 8: /* Device Diagnostic */ case 9: /* Device Reset */ case 7: /* DMA Queued */ case 15: /* Return Response Info */ default: /* Reserved */ break; } return ATA_PROT_UNKNOWN; } /** * ata_scsi_pass_thru - convert ATA pass-thru CDB to taskfile * @qc: command structure to be initialized * * Handles either 12, 16, or 32-byte versions of the CDB. * * RETURNS: * Zero on success, non-zero on failure. */ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &(qc->tf); struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u16 fp; u16 cdb_offset = 0; /* 7Fh variable length cmd means a ata pass-thru(32) */ if (cdb[0] == VARIABLE_LENGTH_CMD) cdb_offset = 9; tf->protocol = ata_scsi_map_proto(cdb[1 + cdb_offset]); if (tf->protocol == ATA_PROT_UNKNOWN) { fp = 1; goto invalid_fld; } if ((cdb[2 + cdb_offset] & 0x3) == 0) { /* * When T_LENGTH is zero (No data is transferred), dir should * be DMA_NONE. */ if (scmd->sc_data_direction != DMA_NONE) { fp = 2 + cdb_offset; goto invalid_fld; } if (ata_is_ncq(tf->protocol)) tf->protocol = ATA_PROT_NCQ_NODATA; } /* enable LBA */ tf->flags |= ATA_TFLAG_LBA; /* * 12 and 16 byte CDBs use different offsets to * provide the various register values. */ if (cdb[0] == ATA_16) { /* * 16-byte CDB - may contain extended commands. * * If that is the case, copy the upper byte register values. */ if (cdb[1] & 0x01) { tf->hob_feature = cdb[3]; tf->hob_nsect = cdb[5]; tf->hob_lbal = cdb[7]; tf->hob_lbam = cdb[9]; tf->hob_lbah = cdb[11]; tf->flags |= ATA_TFLAG_LBA48; } else tf->flags &= ~ATA_TFLAG_LBA48; /* * Always copy low byte, device and command registers. */ tf->feature = cdb[4]; tf->nsect = cdb[6]; tf->lbal = cdb[8]; tf->lbam = cdb[10]; tf->lbah = cdb[12]; tf->device = cdb[13]; tf->command = cdb[14]; } else if (cdb[0] == ATA_12) { /* * 12-byte CDB - incapable of extended commands. */ tf->flags &= ~ATA_TFLAG_LBA48; tf->feature = cdb[3]; tf->nsect = cdb[4]; tf->lbal = cdb[5]; tf->lbam = cdb[6]; tf->lbah = cdb[7]; tf->device = cdb[8]; tf->command = cdb[9]; } else { /* * 32-byte CDB - may contain extended command fields. * * If that is the case, copy the upper byte register values. */ if (cdb[10] & 0x01) { tf->hob_feature = cdb[20]; tf->hob_nsect = cdb[22]; tf->hob_lbal = cdb[16]; tf->hob_lbam = cdb[15]; tf->hob_lbah = cdb[14]; tf->flags |= ATA_TFLAG_LBA48; } else tf->flags &= ~ATA_TFLAG_LBA48; tf->feature = cdb[21]; tf->nsect = cdb[23]; tf->lbal = cdb[19]; tf->lbam = cdb[18]; tf->lbah = cdb[17]; tf->device = cdb[24]; tf->command = cdb[25]; tf->auxiliary = get_unaligned_be32(&cdb[28]); } /* For NCQ commands copy the tag value */ if (ata_is_ncq(tf->protocol)) tf->nsect = qc->hw_tag << 3; /* enforce correct master/slave bit */ tf->device = dev->devno ? tf->device | ATA_DEV1 : tf->device & ~ATA_DEV1; switch (tf->command) { /* READ/WRITE LONG use a non-standard sect_size */ case ATA_CMD_READ_LONG: case ATA_CMD_READ_LONG_ONCE: case ATA_CMD_WRITE_LONG: case ATA_CMD_WRITE_LONG_ONCE: if (tf->protocol != ATA_PROT_PIO || tf->nsect != 1) { fp = 1; goto invalid_fld; } qc->sect_size = scsi_bufflen(scmd); break; /* commands using reported Logical Block size (e.g. 512 or 4K) */ case ATA_CMD_CFA_WRITE_NE: case ATA_CMD_CFA_TRANS_SECT: case ATA_CMD_CFA_WRITE_MULT_NE: /* XXX: case ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE: */ case ATA_CMD_READ: case ATA_CMD_READ_EXT: case ATA_CMD_READ_QUEUED: /* XXX: case ATA_CMD_READ_QUEUED_EXT: */ case ATA_CMD_FPDMA_READ: case ATA_CMD_READ_MULTI: case ATA_CMD_READ_MULTI_EXT: case ATA_CMD_PIO_READ: case ATA_CMD_PIO_READ_EXT: case ATA_CMD_READ_STREAM_DMA_EXT: case ATA_CMD_READ_STREAM_EXT: case ATA_CMD_VERIFY: case ATA_CMD_VERIFY_EXT: case ATA_CMD_WRITE: case ATA_CMD_WRITE_EXT: case ATA_CMD_WRITE_FUA_EXT: case ATA_CMD_WRITE_QUEUED: case ATA_CMD_WRITE_QUEUED_FUA_EXT: case ATA_CMD_FPDMA_WRITE: case ATA_CMD_WRITE_MULTI: case ATA_CMD_WRITE_MULTI_EXT: case ATA_CMD_WRITE_MULTI_FUA_EXT: case ATA_CMD_PIO_WRITE: case ATA_CMD_PIO_WRITE_EXT: case ATA_CMD_WRITE_STREAM_DMA_EXT: case ATA_CMD_WRITE_STREAM_EXT: qc->sect_size = scmd->device->sector_size; break; /* Everything else uses 512 byte "sectors" */ default: qc->sect_size = ATA_SECT_SIZE; } /* * Set flags so that all registers will be written, pass on * write indication (used for PIO/DMA setup), result TF is * copied back and we don't whine too much about its failure. */ tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; if (scmd->sc_data_direction == DMA_TO_DEVICE) tf->flags |= ATA_TFLAG_WRITE; qc->flags |= ATA_QCFLAG_RESULT_TF | ATA_QCFLAG_QUIET; /* * Set transfer length. * * TODO: find out if we need to do more here to * cover scatter/gather case. */ ata_qc_set_pc_nbytes(qc); /* We may not issue DMA commands if no DMA mode is set */ if (tf->protocol == ATA_PROT_DMA && dev->dma_mode == 0) { fp = 1; goto invalid_fld; } /* We may not issue NCQ commands to devices not supporting NCQ */ if (ata_is_ncq(tf->protocol) && !ata_ncq_enabled(dev)) { fp = 1; goto invalid_fld; } /* sanity check for pio multi commands */ if ((cdb[1] & 0xe0) && !is_multi_taskfile(tf)) { fp = 1; goto invalid_fld; } if (is_multi_taskfile(tf)) { unsigned int multi_count = 1 << (cdb[1] >> 5); /* compare the passed through multi_count * with the cached multi_count of libata */ if (multi_count != dev->multi_count) ata_dev_warn(dev, "invalid multi_count %u ignored\n", multi_count); } /* * Filter SET_FEATURES - XFER MODE command -- otherwise, * SET_FEATURES - XFER MODE must be preceded/succeeded * by an update to hardware-specific registers for each * controller (i.e. the reason for ->set_piomode(), * ->set_dmamode(), and ->post_set_mode() hooks). */ if (tf->command == ATA_CMD_SET_FEATURES && tf->feature == SETFEATURES_XFER) { fp = (cdb[0] == ATA_16) ? 4 : 3; goto invalid_fld; } /* * Filter TPM commands by default. These provide an * essentially uncontrolled encrypted "back door" between * applications and the disk. Set libata.allow_tpm=1 if you * have a real reason for wanting to use them. This ensures * that installed software cannot easily mess stuff up without * user intent. DVR type users will probably ship with this enabled * for movie content management. * * Note that for ATA8 we can issue a DCS change and DCS freeze lock * for this and should do in future but that it is not sufficient as * DCS is an optional feature set. Thus we also do the software filter * so that we comply with the TC consortium stated goal that the user * can turn off TC features of their system. */ if (tf->command >= 0x5C && tf->command <= 0x5F && !libata_allow_tpm) { fp = (cdb[0] == ATA_16) ? 14 : 9; goto invalid_fld; } return 0; invalid_fld: ata_scsi_set_invalid_field(dev, scmd, fp, 0xff); return 1; } /** * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim * @cmd: SCSI command being translated * @trmax: Maximum number of entries that will fit in sector_size bytes. * @sector: Starting sector * @count: Total Range of request in logical sectors * * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted * descriptor. * * Upto 64 entries of the format: * 63:48 Range Length * 47:0 LBA * * Range Length of 0 is ignored. * LBA's should be sorted order and not overlap. * * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET * * Return: Number of bytes copied into sglist. */ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax, u64 sector, u32 count) { struct scsi_device *sdp = cmd->device; size_t len = sdp->sector_size; size_t r; __le64 *buf; u32 i = 0; unsigned long flags; WARN_ON(len > ATA_SCSI_RBUF_SIZE); if (len > ATA_SCSI_RBUF_SIZE) len = ATA_SCSI_RBUF_SIZE; spin_lock_irqsave(&ata_scsi_rbuf_lock, flags); buf = ((void *)ata_scsi_rbuf); memset(buf, 0, len); while (i < trmax) { u64 entry = sector | ((u64)(count > 0xffff ? 0xffff : count) << 48); buf[i++] = __cpu_to_le64(entry); if (count <= 0xffff) break; count -= 0xffff; sector += 0xffff; } r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len); spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags); return r; } /** * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same * @qc: Command to be translated * * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or * an SCT Write Same command. * Based on WRITE SAME has the UNMAP flag: * * - When set translate to DSM TRIM * - When clear translate to SCT Write Same */ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; struct scsi_device *sdp = scmd->device; size_t len = sdp->sector_size; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u64 block; u32 n_block; const u32 trmax = len >> 3; u32 size; u16 fp; u8 bp = 0xff; u8 unmap = cdb[1] & 0x8; /* we may not issue DMA commands if no DMA mode is set */ if (unlikely(!dev->dma_mode)) goto invalid_opcode; /* * We only allow sending this command through the block layer, * as it modifies the DATA OUT buffer, which would corrupt user * memory for SG_IO commands. */ if (unlikely(blk_rq_is_passthrough(scmd->request))) goto invalid_opcode; if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (!unmap || (dev->horkage & ATA_HORKAGE_NOTRIM) || !ata_id_has_trim(dev->id)) { fp = 1; bp = 3; goto invalid_fld; } /* If the request is too large the cmd is invalid */ if (n_block > 0xffff * trmax) { fp = 2; goto invalid_fld; } /* * WRITE SAME always has a sector sized buffer as payload, this * should never be a multiple entry S/G list. */ if (!scsi_sg_count(scmd)) goto invalid_param_len; /* * size must match sector size in bytes * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count) * is defined as number of 512 byte blocks to be transferred. */ size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block); if (size != len) goto invalid_param_len; if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) { /* Newer devices support queued TRIM commands */ tf->protocol = ATA_PROT_NCQ; tf->command = ATA_CMD_FPDMA_SEND; tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f; tf->nsect = qc->hw_tag << 3; tf->hob_feature = (size / 512) >> 8; tf->feature = size / 512; tf->auxiliary = 1; } else { tf->protocol = ATA_PROT_DMA; tf->hob_feature = 0; tf->feature = ATA_DSM_TRIM; tf->hob_nsect = (size / 512) >> 8; tf->nsect = size / 512; tf->command = ATA_CMD_DSM; } tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 | ATA_TFLAG_WRITE; ata_qc_set_pc_nbytes(qc); return 0; invalid_fld: ata_scsi_set_invalid_field(dev, scmd, fp, bp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; invalid_opcode: /* "Invalid command operation code" */ ata_scsi_set_sense(dev, scmd, ILLEGAL_REQUEST, 0x20, 0x0); return 1; } /** * ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN * @args: device MAINTENANCE_IN data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields a subset to satisfy scsi_report_opcode() * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u8 *cdb = args->cmd->cmnd; u8 supported = 0; unsigned int err = 0; if (cdb[2] != 1) { ata_dev_warn(dev, "invalid command format %d\n", cdb[2]); err = 2; goto out; } switch (cdb[3]) { case INQUIRY: case MODE_SENSE: case MODE_SENSE_10: case READ_CAPACITY: case SERVICE_ACTION_IN_16: case REPORT_LUNS: case REQUEST_SENSE: case SYNCHRONIZE_CACHE: case REZERO_UNIT: case SEEK_6: case SEEK_10: case TEST_UNIT_READY: case SEND_DIAGNOSTIC: case MAINTENANCE_IN: case READ_6: case READ_10: case READ_16: case WRITE_6: case WRITE_10: case WRITE_16: case ATA_12: case ATA_16: case VERIFY: case VERIFY_16: case MODE_SELECT: case MODE_SELECT_10: case START_STOP: supported = 3; break; case ZBC_IN: case ZBC_OUT: if (ata_id_zoned_cap(dev->id) || dev->class == ATA_DEV_ZAC) supported = 3; break; case SECURITY_PROTOCOL_IN: case SECURITY_PROTOCOL_OUT: if (dev->flags & ATA_DFLAG_TRUSTED) supported = 3; break; default: break; } out: rbuf[1] = supported; /* supported */ return err; } /** * ata_scsi_report_zones_complete - convert ATA output * @qc: command structure returning the data * * Convert T-13 little-endian field representation into * T-10 big-endian field representation. * What a mess. */ static void ata_scsi_report_zones_complete(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct sg_mapping_iter miter; unsigned long flags; unsigned int bytes = 0; sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd), SG_MITER_TO_SG | SG_MITER_ATOMIC); local_irq_save(flags); while (sg_miter_next(&miter)) { unsigned int offset = 0; if (bytes == 0) { char *hdr; u32 list_length; u64 max_lba, opt_lba; u16 same; /* Swizzle header */ hdr = miter.addr; list_length = get_unaligned_le32(&hdr[0]); same = get_unaligned_le16(&hdr[4]); max_lba = get_unaligned_le64(&hdr[8]); opt_lba = get_unaligned_le64(&hdr[16]); put_unaligned_be32(list_length, &hdr[0]); hdr[4] = same & 0xf; put_unaligned_be64(max_lba, &hdr[8]); put_unaligned_be64(opt_lba, &hdr[16]); offset += 64; bytes += 64; } while (offset < miter.length) { char *rec; u8 cond, type, non_seq, reset; u64 size, start, wp; /* Swizzle zone descriptor */ rec = miter.addr + offset; type = rec[0] & 0xf; cond = (rec[1] >> 4) & 0xf; non_seq = (rec[1] & 2); reset = (rec[1] & 1); size = get_unaligned_le64(&rec[8]); start = get_unaligned_le64(&rec[16]); wp = get_unaligned_le64(&rec[24]); rec[0] = type; rec[1] = (cond << 4) | non_seq | reset; put_unaligned_be64(size, &rec[8]); put_unaligned_be64(start, &rec[16]); put_unaligned_be64(wp, &rec[24]); WARN_ON(offset + 64 > miter.length); offset += 64; bytes += 64; } } sg_miter_stop(&miter); local_irq_restore(flags); ata_scsi_qc_complete(qc); } static unsigned int ata_scsi_zbc_in_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; u16 sect, fp = (u16)-1; u8 sa, options, bp = 0xff; u64 block; u32 n_block; if (unlikely(scmd->cmd_len < 16)) { ata_dev_warn(qc->dev, "invalid cdb length %d\n", scmd->cmd_len); fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (n_block != scsi_bufflen(scmd)) { ata_dev_warn(qc->dev, "non-matching transfer count (%d/%d)\n", n_block, scsi_bufflen(scmd)); goto invalid_param_len; } sa = cdb[1] & 0x1f; if (sa != ZI_REPORT_ZONES) { ata_dev_warn(qc->dev, "invalid service action %d\n", sa); fp = 1; goto invalid_fld; } /* * ZAC allows only for transfers in 512 byte blocks, * and uses a 16 bit value for the transfer count. */ if ((n_block / 512) > 0xffff || n_block < 512 || (n_block % 512)) { ata_dev_warn(qc->dev, "invalid transfer count %d\n", n_block); goto invalid_param_len; } sect = n_block / 512; options = cdb[14] & 0xbf; if (ata_ncq_enabled(qc->dev) && ata_fpdma_zac_mgmt_in_supported(qc->dev)) { tf->protocol = ATA_PROT_NCQ; tf->command = ATA_CMD_FPDMA_RECV; tf->hob_nsect = ATA_SUBCMD_FPDMA_RECV_ZAC_MGMT_IN & 0x1f; tf->nsect = qc->hw_tag << 3; tf->feature = sect & 0xff; tf->hob_feature = (sect >> 8) & 0xff; tf->auxiliary = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES | (options << 8); } else { tf->command = ATA_CMD_ZAC_MGMT_IN; tf->feature = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES; tf->protocol = ATA_PROT_DMA; tf->hob_feature = options; tf->hob_nsect = (sect >> 8) & 0xff; tf->nsect = sect & 0xff; } tf->device = ATA_LBA; tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48; qc->flags |= ATA_QCFLAG_RESULT_TF; ata_qc_set_pc_nbytes(qc); qc->complete_fn = ata_scsi_report_zones_complete; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; } static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u8 all, sa; u64 block; u32 n_block; u16 fp = (u16)-1; if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } sa = cdb[1] & 0x1f; if ((sa != ZO_CLOSE_ZONE) && (sa != ZO_FINISH_ZONE) && (sa != ZO_OPEN_ZONE) && (sa != ZO_RESET_WRITE_POINTER)) { fp = 1; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (n_block) { /* * ZAC MANAGEMENT OUT doesn't define any length */ goto invalid_param_len; } all = cdb[14] & 0x1; if (all) { /* * Ignore the block address (zone ID) as defined by ZBC. */ block = 0; } else if (block >= dev->n_sectors) { /* * Block must be a valid zone ID (a zone start LBA). */ fp = 2; goto invalid_fld; } if (ata_ncq_enabled(qc->dev) && ata_fpdma_zac_mgmt_out_supported(qc->dev)) { tf->protocol = ATA_PROT_NCQ_NODATA; tf->command = ATA_CMD_NCQ_NON_DATA; tf->feature = ATA_SUBCMD_NCQ_NON_DATA_ZAC_MGMT_OUT; tf->nsect = qc->hw_tag << 3; tf->auxiliary = sa | ((u16)all << 8); } else { tf->protocol = ATA_PROT_NODATA; tf->command = ATA_CMD_ZAC_MGMT_OUT; tf->feature = sa; tf->hob_feature = all; } tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; tf->device = ATA_LBA; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; } /** * ata_mselect_caching - Simulate MODE SELECT for caching info page * @qc: Storage for translated ATA taskfile * @buf: input buffer * @len: number of valid bytes in the input buffer * @fp: out parameter for the failed field on error * * Prepare a taskfile to modify caching information for the device. * * LOCKING: * None. */ static int ata_mselect_caching(struct ata_queued_cmd *qc, const u8 *buf, int len, u16 *fp) { struct ata_taskfile *tf = &qc->tf; struct ata_device *dev = qc->dev; u8 mpage[CACHE_MPAGE_LEN]; u8 wce; int i; /* * The first two bytes of def_cache_mpage are a header, so offsets * in mpage are off by 2 compared to buf. Same for len. */ if (len != CACHE_MPAGE_LEN - 2) { if (len < CACHE_MPAGE_LEN - 2) *fp = len; else *fp = CACHE_MPAGE_LEN - 2; return -EINVAL; } wce = buf[0] & (1 << 2); /* * Check that read-only bits are not modified. */ ata_msense_caching(dev->id, mpage, false); for (i = 0; i < CACHE_MPAGE_LEN - 2; i++) { if (i == 0) continue; if (mpage[i + 2] != buf[i]) { *fp = i; return -EINVAL; } } tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; tf->protocol = ATA_PROT_NODATA; tf->nsect = 0; tf->command = ATA_CMD_SET_FEATURES; tf->feature = wce ? SETFEATURES_WC_ON : SETFEATURES_WC_OFF; return 0; } /** * ata_mselect_control - Simulate MODE SELECT for control page * @qc: Storage for translated ATA taskfile * @buf: input buffer * @len: number of valid bytes in the input buffer * @fp: out parameter for the failed field on error * * Prepare a taskfile to modify caching information for the device. * * LOCKING: * None. */ static int ata_mselect_control(struct ata_queued_cmd *qc, const u8 *buf, int len, u16 *fp) { struct ata_device *dev = qc->dev; u8 mpage[CONTROL_MPAGE_LEN]; u8 d_sense; int i; /* * The first two bytes of def_control_mpage are a header, so offsets * in mpage are off by 2 compared to buf. Same for len. */ if (len != CONTROL_MPAGE_LEN - 2) { if (len < CONTROL_MPAGE_LEN - 2) *fp = len; else *fp = CONTROL_MPAGE_LEN - 2; return -EINVAL; } d_sense = buf[0] & (1 << 2); /* * Check that read-only bits are not modified. */ ata_msense_control(dev, mpage, false); for (i = 0; i < CONTROL_MPAGE_LEN - 2; i++) { if (i == 0) continue; if (mpage[2 + i] != buf[i]) { *fp = i; return -EINVAL; } } if (d_sense & (1 << 2)) dev->flags |= ATA_DFLAG_D_SENSE; else dev->flags &= ~ATA_DFLAG_D_SENSE; return 0; } /** * ata_scsi_mode_select_xlat - Simulate MODE SELECT 6, 10 commands * @qc: Storage for translated ATA taskfile * * Converts a MODE SELECT command to an ATA SET FEATURES taskfile. * Assume this is invoked for direct access devices (e.g. disks) only. * There should be no block descriptor for other device types. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsi_mode_select_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; u8 pg, spg; unsigned six_byte, pg_len, hdr_len, bd_len; int len; u16 fp = (u16)-1; u8 bp = 0xff; u8 buffer[64]; const u8 *p = buffer; VPRINTK("ENTER\n"); six_byte = (cdb[0] == MODE_SELECT); if (six_byte) { if (scmd->cmd_len < 5) { fp = 4; goto invalid_fld; } len = cdb[4]; hdr_len = 4; } else { if (scmd->cmd_len < 9) { fp = 8; goto invalid_fld; } len = (cdb[7] << 8) + cdb[8]; hdr_len = 8; } /* We only support PF=1, SP=0. */ if ((cdb[1] & 0x11) != 0x10) { fp = 1; bp = (cdb[1] & 0x01) ? 1 : 5; goto invalid_fld; } /* Test early for possible overrun. */ if (!scsi_sg_count(scmd) || scsi_sglist(scmd)->length < len) goto invalid_param_len; /* Move past header and block descriptors. */ if (len < hdr_len) goto invalid_param_len; if (!sg_copy_to_buffer(scsi_sglist(scmd), scsi_sg_count(scmd), buffer, sizeof(buffer))) goto invalid_param_len; if (six_byte) bd_len = p[3]; else bd_len = (p[6] << 8) + p[7]; len -= hdr_len; p += hdr_len; if (len < bd_len) goto invalid_param_len; if (bd_len != 0 && bd_len != 8) { fp = (six_byte) ? 3 : 6; fp += bd_len + hdr_len; goto invalid_param; } len -= bd_len; p += bd_len; if (len == 0) goto skip; /* Parse both possible formats for the mode page headers. */ pg = p[0] & 0x3f; if (p[0] & 0x40) { if (len < 4) goto invalid_param_len; spg = p[1]; pg_len = (p[2] << 8) | p[3]; p += 4; len -= 4; } else { if (len < 2) goto invalid_param_len; spg = 0; pg_len = p[1]; p += 2; len -= 2; } /* * No mode subpages supported (yet) but asking for _all_ * subpages may be valid */ if (spg && (spg != ALL_SUB_MPAGES)) { fp = (p[0] & 0x40) ? 1 : 0; fp += hdr_len + bd_len; goto invalid_param; } if (pg_len > len) goto invalid_param_len; switch (pg) { case CACHE_MPAGE: if (ata_mselect_caching(qc, p, pg_len, &fp) < 0) { fp += hdr_len + bd_len; goto invalid_param; } break; case CONTROL_MPAGE: if (ata_mselect_control(qc, p, pg_len, &fp) < 0) { fp += hdr_len + bd_len; goto invalid_param; } else { goto skip; /* No ATA command to send */ } break; default: /* invalid page code */ fp = bd_len + hdr_len; goto invalid_param; } /* * Only one page has changeable data, so we only support setting one * page at a time. */ if (len > pg_len) goto invalid_param; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; invalid_param: ata_scsi_set_invalid_parameter(qc->dev, scmd, fp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; skip: scmd->result = SAM_STAT_GOOD; return 1; } static u8 ata_scsi_trusted_op(u32 len, bool send, bool dma) { if (len == 0) return ATA_CMD_TRUSTED_NONDATA; else if (send) return dma ? ATA_CMD_TRUSTED_SND_DMA : ATA_CMD_TRUSTED_SND; else return dma ? ATA_CMD_TRUSTED_RCV_DMA : ATA_CMD_TRUSTED_RCV; } static unsigned int ata_scsi_security_inout_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; struct ata_taskfile *tf = &qc->tf; u8 secp = cdb[1]; bool send = (cdb[0] == SECURITY_PROTOCOL_OUT); u16 spsp = get_unaligned_be16(&cdb[2]); u32 len = get_unaligned_be32(&cdb[6]); bool dma = !(qc->dev->flags & ATA_DFLAG_PIO); /* * We don't support the ATA "security" protocol. */ if (secp == 0xef) { ata_scsi_set_invalid_field(qc->dev, scmd, 1, 0); return 1; } if (cdb[4] & 7) { /* INC_512 */ if (len > 0xffff) { ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0); return 1; } } else { if (len > 0x01fffe00) { ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0); return 1; } /* convert to the sector-based ATA addressing */ len = (len + 511) / 512; } tf->protocol = dma ? ATA_PROT_DMA : ATA_PROT_PIO; tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_LBA; if (send) tf->flags |= ATA_TFLAG_WRITE; tf->command = ata_scsi_trusted_op(len, send, dma); tf->feature = secp; tf->lbam = spsp & 0xff; tf->lbah = spsp >> 8; if (len) { tf->nsect = len & 0xff; tf->lbal = len >> 8; } else { if (!send) tf->lbah = (1 << 7); } ata_qc_set_pc_nbytes(qc); return 0; } /** * ata_scsi_var_len_cdb_xlat - SATL variable length CDB to Handler * @qc: Command to be translated * * Translate a SCSI variable length CDB to specified commands. * It checks a service action value in CDB to call corresponding handler. * * RETURNS: * Zero on success, non-zero on failure * */ static unsigned int ata_scsi_var_len_cdb_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; const u16 sa = get_unaligned_be16(&cdb[8]); /* * if service action represents a ata pass-thru(32) command, * then pass it to ata_scsi_pass_thru handler. */ if (sa == ATA_32) return ata_scsi_pass_thru(qc); /* unsupported service action */ return 1; } /** * ata_get_xlat_func - check if SCSI to ATA translation is possible * @dev: ATA device * @cmd: SCSI command opcode to consider * * Look up the SCSI command given, and determine whether the * SCSI command is to be translated or simulated. * * RETURNS: * Pointer to translation function if possible, %NULL if not. */ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd) { switch (cmd) { case READ_6: case READ_10: case READ_16: case WRITE_6: case WRITE_10: case WRITE_16: return ata_scsi_rw_xlat; case WRITE_SAME_16: return ata_scsi_write_same_xlat; case SYNCHRONIZE_CACHE: if (ata_try_flush_cache(dev)) return ata_scsi_flush_xlat; break; case VERIFY: case VERIFY_16: return ata_scsi_verify_xlat; case ATA_12: case ATA_16: return ata_scsi_pass_thru; case VARIABLE_LENGTH_CMD: return ata_scsi_var_len_cdb_xlat; case MODE_SELECT: case MODE_SELECT_10: return ata_scsi_mode_select_xlat; break; case ZBC_IN: return ata_scsi_zbc_in_xlat; case ZBC_OUT: return ata_scsi_zbc_out_xlat; case SECURITY_PROTOCOL_IN: case SECURITY_PROTOCOL_OUT: if (!(dev->flags & ATA_DFLAG_TRUSTED)) break; return ata_scsi_security_inout_xlat; case START_STOP: return ata_scsi_start_stop_xlat; } return NULL; } /** * ata_scsi_dump_cdb - dump SCSI command contents to dmesg * @ap: ATA port to which the command was being sent * @cmd: SCSI command to dump * * Prints the contents of a SCSI command via printk(). */ void ata_scsi_dump_cdb(struct ata_port *ap, struct scsi_cmnd *cmd) { #ifdef ATA_VERBOSE_DEBUG struct scsi_device *scsidev = cmd->device; VPRINTK("CDB (%u:%d,%d,%lld) %9ph\n", ap->print_id, scsidev->channel, scsidev->id, scsidev->lun, cmd->cmnd); #endif } int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev) { u8 scsi_op = scmd->cmnd[0]; ata_xlat_func_t xlat_func; int rc = 0; if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) { if (unlikely(!scmd->cmd_len || scmd->cmd_len > dev->cdb_len)) goto bad_cdb_len; xlat_func = ata_get_xlat_func(dev, scsi_op); } else { if (unlikely(!scmd->cmd_len)) goto bad_cdb_len; xlat_func = NULL; if (likely((scsi_op != ATA_16) || !atapi_passthru16)) { /* relay SCSI command to ATAPI device */ int len = COMMAND_SIZE(scsi_op); if (unlikely(len > scmd->cmd_len || len > dev->cdb_len || scmd->cmd_len > ATAPI_CDB_LEN)) goto bad_cdb_len; xlat_func = atapi_xlat; } else { /* ATA_16 passthru, treat as an ATA command */ if (unlikely(scmd->cmd_len > 16)) goto bad_cdb_len; xlat_func = ata_get_xlat_func(dev, scsi_op); } } if (xlat_func) rc = ata_scsi_translate(dev, scmd, xlat_func); else ata_scsi_simulate(dev, scmd); return rc; bad_cdb_len: DPRINTK("bad CDB len=%u, scsi_op=0x%02x, max=%u\n", scmd->cmd_len, scsi_op, dev->cdb_len); scmd->result = DID_ERROR << 16; scmd->scsi_done(scmd); return 0; } /** * ata_scsi_queuecmd - Issue SCSI cdb to libata-managed device * @shost: SCSI host of command to be sent * @cmd: SCSI command to be sent * * In some cases, this function translates SCSI commands into * ATA taskfiles, and queues the taskfiles to be sent to * hardware. In other cases, this function simulates a * SCSI device by evaluating and responding to certain * SCSI commands. This creates the overall effect of * ATA and ATAPI devices appearing as SCSI devices. * * LOCKING: * ATA host lock * * RETURNS: * Return value from __ata_scsi_queuecmd() if @cmd can be queued, * 0 otherwise. */ int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd) { struct ata_port *ap; struct ata_device *dev; struct scsi_device *scsidev = cmd->device; int rc = 0; unsigned long irq_flags; ap = ata_shost_to_port(shost); spin_lock_irqsave(ap->lock, irq_flags); ata_scsi_dump_cdb(ap, cmd); dev = ata_scsi_find_dev(ap, scsidev); if (likely(dev)) rc = __ata_scsi_queuecmd(cmd, dev); else { cmd->result = (DID_BAD_TARGET << 16); cmd->scsi_done(cmd); } spin_unlock_irqrestore(ap->lock, irq_flags); return rc; } EXPORT_SYMBOL_GPL(ata_scsi_queuecmd); /** * ata_scsi_simulate - simulate SCSI command on ATA device * @dev: the target device * @cmd: SCSI command being sent to device. * * Interprets and directly executes a select list of SCSI commands * that can be handled internally. * * LOCKING: * spin_lock_irqsave(host lock) */ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd) { struct ata_scsi_args args; const u8 *scsicmd = cmd->cmnd; u8 tmp8; args.dev = dev; args.id = dev->id; args.cmd = cmd; switch(scsicmd[0]) { case INQUIRY: if (scsicmd[1] & 2) /* is CmdDt set? */ ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); else if ((scsicmd[1] & 1) == 0) /* is EVPD clear? */ ata_scsi_rbuf_fill(&args, ata_scsiop_inq_std); else switch (scsicmd[2]) { case 0x00: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_00); break; case 0x80: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_80); break; case 0x83: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_83); break; case 0x89: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_89); break; case 0xb0: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b0); break; case 0xb1: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b1); break; case 0xb2: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2); break; case 0xb6: if (dev->flags & ATA_DFLAG_ZAC) { ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b6); break; } fallthrough; default: ata_scsi_set_invalid_field(dev, cmd, 2, 0xff); break; } break; case MODE_SENSE: case MODE_SENSE_10: ata_scsi_rbuf_fill(&args, ata_scsiop_mode_sense); break; case READ_CAPACITY: ata_scsi_rbuf_fill(&args, ata_scsiop_read_cap); break; case SERVICE_ACTION_IN_16: if ((scsicmd[1] & 0x1f) == SAI_READ_CAPACITY_16) ata_scsi_rbuf_fill(&args, ata_scsiop_read_cap); else ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; case REPORT_LUNS: ata_scsi_rbuf_fill(&args, ata_scsiop_report_luns); break; case REQUEST_SENSE: ata_scsi_set_sense(dev, cmd, 0, 0, 0); cmd->result = (DRIVER_SENSE << 24); break; /* if we reach this, then writeback caching is disabled, * turning this into a no-op. */ case SYNCHRONIZE_CACHE: fallthrough; /* no-op's, complete with success */ case REZERO_UNIT: case SEEK_6: case SEEK_10: case TEST_UNIT_READY: break; case SEND_DIAGNOSTIC: tmp8 = scsicmd[1] & ~(1 << 3); if (tmp8 != 0x4 || scsicmd[3] || scsicmd[4]) ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; case MAINTENANCE_IN: if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES) ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in); else ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; /* all other commands */ default: ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0); /* "Invalid command operation code" */ break; } cmd->scsi_done(cmd); } int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht) { int i, rc; for (i = 0; i < host->n_ports; i++) { struct ata_port *ap = host->ports[i]; struct Scsi_Host *shost; rc = -ENOMEM; shost = scsi_host_alloc(sht, sizeof(struct ata_port *)); if (!shost) goto err_alloc; shost->eh_noresume = 1; *(struct ata_port **)&shost->hostdata[0] = ap; ap->scsi_host = shost; shost->transportt = ata_scsi_transport_template; shost->unique_id = ap->print_id; shost->max_id = 16; shost->max_lun = 1; shost->max_channel = 1; shost->max_cmd_len = 32; /* Schedule policy is determined by ->qc_defer() * callback and it needs to see every deferred qc. * Set host_blocked to 1 to prevent SCSI midlayer from * automatically deferring requests. */ shost->max_host_blocked = 1; rc = scsi_add_host_with_dma(shost, &ap->tdev, ap->host->dev); if (rc) goto err_alloc; } return 0; err_alloc: while (--i >= 0) { struct Scsi_Host *shost = host->ports[i]->scsi_host; /* scsi_host_put() is in ata_devres_release() */ scsi_remove_host(shost); } return rc; } #ifdef CONFIG_OF static void ata_scsi_assign_ofnode(struct ata_device *dev, struct ata_port *ap) { struct scsi_device *sdev = dev->sdev; struct device *d = ap->host->dev; struct device_node *np = d->of_node; struct device_node *child; for_each_available_child_of_node(np, child) { int ret; u32 val; ret = of_property_read_u32(child, "reg", &val); if (ret) continue; if (val == dev->devno) { dev_dbg(d, "found matching device node\n"); sdev->sdev_gendev.of_node = child; return; } } } #else static void ata_scsi_assign_ofnode(struct ata_device *dev, struct ata_port *ap) { } #endif void ata_scsi_scan_host(struct ata_port *ap, int sync) { int tries = 5; struct ata_device *last_failed_dev = NULL; struct ata_link *link; struct ata_device *dev; repeat: ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { struct scsi_device *sdev; int channel = 0, id = 0; if (dev->sdev) continue; if (ata_is_host_link(link)) id = dev->devno; else channel = link->pmp; sdev = __scsi_add_device(ap->scsi_host, channel, id, 0, NULL); if (!IS_ERR(sdev)) { dev->sdev = sdev; ata_scsi_assign_ofnode(dev, ap); scsi_device_put(sdev); } else { dev->sdev = NULL; } } } /* If we scanned while EH was in progress or allocation * failure occurred, scan would have failed silently. Check * whether all devices are attached. */ ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { if (!dev->sdev) goto exit_loop; } } exit_loop: if (!link) return; /* we're missing some SCSI devices */ if (sync) { /* If caller requested synchrnous scan && we've made * any progress, sleep briefly and repeat. */ if (dev != last_failed_dev) { msleep(100); last_failed_dev = dev; goto repeat; } /* We might be failing to detect boot device, give it * a few more chances. */ if (--tries) { msleep(100); goto repeat; } ata_port_err(ap, "WARNING: synchronous SCSI scan failed without making any progress, switching to async\n"); } queue_delayed_work(system_long_wq, &ap->hotplug_task, round_jiffies_relative(HZ)); } /** * ata_scsi_offline_dev - offline attached SCSI device * @dev: ATA device to offline attached SCSI device for * * This function is called from ata_eh_hotplug() and responsible * for taking the SCSI device attached to @dev offline. This * function is called with host lock which protects dev->sdev * against clearing. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * 1 if attached SCSI device exists, 0 otherwise. */ int ata_scsi_offline_dev(struct ata_device *dev) { if (dev->sdev) { scsi_device_set_state(dev->sdev, SDEV_OFFLINE); return 1; } return 0; } /** * ata_scsi_remove_dev - remove attached SCSI device * @dev: ATA device to remove attached SCSI device for * * This function is called from ata_eh_scsi_hotplug() and * responsible for removing the SCSI device attached to @dev. * * LOCKING: * Kernel thread context (may sleep). */ static void ata_scsi_remove_dev(struct ata_device *dev) { struct ata_port *ap = dev->link->ap; struct scsi_device *sdev; unsigned long flags; /* Alas, we need to grab scan_mutex to ensure SCSI device * state doesn't change underneath us and thus * scsi_device_get() always succeeds. The mutex locking can * be removed if there is __scsi_device_get() interface which * increments reference counts regardless of device state. */ mutex_lock(&ap->scsi_host->scan_mutex); spin_lock_irqsave(ap->lock, flags); /* clearing dev->sdev is protected by host lock */ sdev = dev->sdev; dev->sdev = NULL; if (sdev) { /* If user initiated unplug races with us, sdev can go * away underneath us after the host lock and * scan_mutex are released. Hold onto it. */ if (scsi_device_get(sdev) == 0) { /* The following ensures the attached sdev is * offline on return from ata_scsi_offline_dev() * regardless it wins or loses the race * against this function. */ scsi_device_set_state(sdev, SDEV_OFFLINE); } else { WARN_ON(1); sdev = NULL; } } spin_unlock_irqrestore(ap->lock, flags); mutex_unlock(&ap->scsi_host->scan_mutex); if (sdev) { ata_dev_info(dev, "detaching (SCSI %s)\n", dev_name(&sdev->sdev_gendev)); scsi_remove_device(sdev); scsi_device_put(sdev); } } static void ata_scsi_handle_link_detach(struct ata_link *link) { struct ata_port *ap = link->ap; struct ata_device *dev; ata_for_each_dev(dev, link, ALL) { unsigned long flags; if (!(dev->flags & ATA_DFLAG_DETACHED)) continue; spin_lock_irqsave(ap->lock, flags); dev->flags &= ~ATA_DFLAG_DETACHED; spin_unlock_irqrestore(ap->lock, flags); if (zpodd_dev_enabled(dev)) zpodd_exit(dev); ata_scsi_remove_dev(dev); } } /** * ata_scsi_media_change_notify - send media change event * @dev: Pointer to the disk device with media change event * * Tell the block layer to send a media change notification * event. * * LOCKING: * spin_lock_irqsave(host lock) */ void ata_scsi_media_change_notify(struct ata_device *dev) { if (dev->sdev) sdev_evt_send_simple(dev->sdev, SDEV_EVT_MEDIA_CHANGE, GFP_ATOMIC); } /** * ata_scsi_hotplug - SCSI part of hotplug * @work: Pointer to ATA port to perform SCSI hotplug on * * Perform SCSI part of hotplug. It's executed from a separate * workqueue after EH completes. This is necessary because SCSI * hot plugging requires working EH and hot unplugging is * synchronized with hot plugging with a mutex. * * LOCKING: * Kernel thread context (may sleep). */ void ata_scsi_hotplug(struct work_struct *work) { struct ata_port *ap = container_of(work, struct ata_port, hotplug_task.work); int i; if (ap->pflags & ATA_PFLAG_UNLOADING) { DPRINTK("ENTER/EXIT - unloading\n"); return; } DPRINTK("ENTER\n"); mutex_lock(&ap->scsi_scan_mutex); /* Unplug detached devices. We cannot use link iterator here * because PMP links have to be scanned even if PMP is * currently not attached. Iterate manually. */ ata_scsi_handle_link_detach(&ap->link); if (ap->pmp_link) for (i = 0; i < SATA_PMP_MAX_PORTS; i++) ata_scsi_handle_link_detach(&ap->pmp_link[i]); /* scan for new ones */ ata_scsi_scan_host(ap, 0); mutex_unlock(&ap->scsi_scan_mutex); DPRINTK("EXIT\n"); } /** * ata_scsi_user_scan - indication for user-initiated bus scan * @shost: SCSI host to scan * @channel: Channel to scan * @id: ID to scan * @lun: LUN to scan * * This function is called when user explicitly requests bus * scan. Set probe pending flag and invoke EH. * * LOCKING: * SCSI layer (we don't care) * * RETURNS: * Zero. */ int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel, unsigned int id, u64 lun) { struct ata_port *ap = ata_shost_to_port(shost); unsigned long flags; int devno, rc = 0; if (!ap->ops->error_handler) return -EOPNOTSUPP; if (lun != SCAN_WILD_CARD && lun) return -EINVAL; if (!sata_pmp_attached(ap)) { if (channel != SCAN_WILD_CARD && channel) return -EINVAL; devno = id; } else { if (id != SCAN_WILD_CARD && id) return -EINVAL; devno = channel; } spin_lock_irqsave(ap->lock, flags); if (devno == SCAN_WILD_CARD) { struct ata_link *link; ata_for_each_link(link, ap, EDGE) { struct ata_eh_info *ehi = &link->eh_info; ehi->probe_mask |= ATA_ALL_DEVICES; ehi->action |= ATA_EH_RESET; } } else { struct ata_device *dev = ata_find_dev(ap, devno); if (dev) { struct ata_eh_info *ehi = &dev->link->eh_info; ehi->probe_mask |= 1 << dev->devno; ehi->action |= ATA_EH_RESET; } else rc = -EINVAL; } if (rc == 0) { ata_port_schedule_eh(ap); spin_unlock_irqrestore(ap->lock, flags); ata_port_wait_eh(ap); } else spin_unlock_irqrestore(ap->lock, flags); return rc; } /** * ata_scsi_dev_rescan - initiate scsi_rescan_device() * @work: Pointer to ATA port to perform scsi_rescan_device() * * After ATA pass thru (SAT) commands are executed successfully, * libata need to propagate the changes to SCSI layer. * * LOCKING: * Kernel thread context (may sleep). */ void ata_scsi_dev_rescan(struct work_struct *work) { struct ata_port *ap = container_of(work, struct ata_port, scsi_rescan_task); struct ata_link *link; struct ata_device *dev; unsigned long flags; mutex_lock(&ap->scsi_scan_mutex); spin_lock_irqsave(ap->lock, flags); ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { struct scsi_device *sdev = dev->sdev; if (!sdev) continue; if (scsi_device_get(sdev)) continue; spin_unlock_irqrestore(ap->lock, flags); scsi_rescan_device(&(sdev->sdev_gendev)); scsi_device_put(sdev); spin_lock_irqsave(ap->lock, flags); } } spin_unlock_irqrestore(ap->lock, flags); mutex_unlock(&ap->scsi_scan_mutex); }
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 /* SPDX-License-Identifier: GPL-2.0 */ #include <linux/fs.h> #include <linux/buffer_head.h> #include <linux/exportfs.h> #include <linux/iso_fs.h> #include <asm/unaligned.h> enum isofs_file_format { isofs_file_normal = 0, isofs_file_sparse = 1, isofs_file_compressed = 2, }; /* * iso fs inode data in memory */ struct iso_inode_info { unsigned long i_iget5_block; unsigned long i_iget5_offset; unsigned int i_first_extent; unsigned char i_file_format; unsigned char i_format_parm[3]; unsigned long i_next_section_block; unsigned long i_next_section_offset; off_t i_section_size; struct inode vfs_inode; }; /* * iso9660 super-block data in memory */ struct isofs_sb_info { unsigned long s_ninodes; unsigned long s_nzones; unsigned long s_firstdatazone; unsigned long s_log_zone_size; unsigned long s_max_size; int s_rock_offset; /* offset of SUSP fields within SU area */ s32 s_sbsector; unsigned char s_joliet_level; unsigned char s_mapping; unsigned char s_check; unsigned char s_session; unsigned int s_high_sierra:1; unsigned int s_rock:2; unsigned int s_cruft:1; /* Broken disks with high byte of length * containing junk */ unsigned int s_nocompress:1; unsigned int s_hide:1; unsigned int s_showassoc:1; unsigned int s_overriderockperm:1; unsigned int s_uid_set:1; unsigned int s_gid_set:1; umode_t s_fmode; umode_t s_dmode; kgid_t s_gid; kuid_t s_uid; struct nls_table *s_nls_iocharset; /* Native language support table */ }; #define ISOFS_INVALID_MODE ((umode_t) -1) static inline struct isofs_sb_info *ISOFS_SB(struct super_block *sb) { return sb->s_fs_info; } static inline struct iso_inode_info *ISOFS_I(struct inode *inode) { return container_of(inode, struct iso_inode_info, vfs_inode); } static inline int isonum_711(u8 *p) { return *p; } static inline int isonum_712(s8 *p) { return *p; } static inline unsigned int isonum_721(u8 *p) { return get_unaligned_le16(p); } static inline unsigned int isonum_722(u8 *p) { return get_unaligned_be16(p); } static inline unsigned int isonum_723(u8 *p) { /* Ignore bigendian datum due to broken mastering programs */ return get_unaligned_le16(p); } static inline unsigned int isonum_731(u8 *p) { return get_unaligned_le32(p); } static inline unsigned int isonum_732(u8 *p) { return get_unaligned_be32(p); } static inline unsigned int isonum_733(u8 *p) { /* Ignore bigendian datum due to broken mastering programs */ return get_unaligned_le32(p); } extern int iso_date(u8 *, int); struct inode; /* To make gcc happy */ extern int parse_rock_ridge_inode(struct iso_directory_record *, struct inode *, int relocated); extern int get_rock_ridge_filename(struct iso_directory_record *, char *, struct inode *); extern int isofs_name_translate(struct iso_directory_record *, char *, struct inode *); int get_joliet_filename(struct iso_directory_record *, unsigned char *, struct inode *); int get_acorn_filename(struct iso_directory_record *, char *, struct inode *); extern struct dentry *isofs_lookup(struct inode *, struct dentry *, unsigned int flags); extern struct buffer_head *isofs_bread(struct inode *, sector_t); extern int isofs_get_blocks(struct inode *, sector_t, struct buffer_head **, unsigned long); struct inode *__isofs_iget(struct super_block *sb, unsigned long block, unsigned long offset, int relocated); static inline struct inode *isofs_iget(struct super_block *sb, unsigned long block, unsigned long offset) { return __isofs_iget(sb, block, offset, 0); } static inline struct inode *isofs_iget_reloc(struct super_block *sb, unsigned long block, unsigned long offset) { return __isofs_iget(sb, block, offset, 1); } /* Because the inode number is no longer relevant to finding the * underlying meta-data for an inode, we are free to choose a more * convenient 32-bit number as the inode number. The inode numbering * scheme was recommended by Sergey Vlasov and Eric Lammerts. */ static inline unsigned long isofs_get_ino(unsigned long block, unsigned long offset, unsigned long bufbits) { return (block << (bufbits - 5)) | (offset >> 5); } /* Every directory can have many redundant directory entries scattered * throughout the directory tree. First there is the directory entry * with the name of the directory stored in the parent directory. * Then, there is the "." directory entry stored in the directory * itself. Finally, there are possibly many ".." directory entries * stored in all the subdirectories. * * In order for the NFS get_parent() method to work and for the * general consistency of the dcache, we need to make sure the * "i_iget5_block" and "i_iget5_offset" all point to exactly one of * the many redundant entries for each directory. We normalize the * block and offset by always making them point to the "." directory. * * Notice that we do not use the entry for the directory with the name * that is located in the parent directory. Even though choosing this * first directory is more natural, it is much easier to find the "." * entry in the NFS get_parent() method because it is implicitly * encoded in the "extent + ext_attr_length" fields of _all_ the * redundant entries for the directory. Thus, it can always be * reached regardless of which directory entry you have in hand. * * This works because the "." entry is simply the first directory * record when you start reading the file that holds all the directory * records, and this file starts at "extent + ext_attr_length" blocks. * Because the "." entry is always the first entry listed in the * directories file, the normalized "offset" value is always 0. * * You should pass the directory entry in "de". On return, "block" * and "offset" will hold normalized values. Only directories are * affected making it safe to call even for non-directory file * types. */ static inline void isofs_normalize_block_and_offset(struct iso_directory_record* de, unsigned long *block, unsigned long *offset) { /* Only directories are normalized. */ if (de->flags[0] & 2) { *offset = 0; *block = (unsigned long)isonum_733(de->extent) + (unsigned long)isonum_711(de->ext_attr_length); } } extern const struct inode_operations isofs_dir_inode_operations; extern const struct file_operations isofs_dir_operations; extern const struct address_space_operations isofs_symlink_aops; extern const struct export_operations isofs_export_ops;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 /* SPDX-License-Identifier: GPL-2.0-or-later */ #ifndef _LINUX_IO_URING_H #define _LINUX_IO_URING_H #include <linux/sched.h> #include <linux/xarray.h> struct io_identity { struct files_struct *files; struct mm_struct *mm; #ifdef CONFIG_BLK_CGROUP struct cgroup_subsys_state *blkcg_css; #endif const struct cred *creds; struct nsproxy *nsproxy; struct fs_struct *fs; unsigned long fsize; #ifdef CONFIG_AUDIT kuid_t loginuid; unsigned int sessionid; #endif refcount_t count; }; struct io_uring_task { /* submission side */ struct xarray xa; struct wait_queue_head wait; struct file *last; struct percpu_counter inflight; struct io_identity __identity; struct io_identity *identity; atomic_t in_idle; bool sqpoll; }; #if defined(CONFIG_IO_URING) struct sock *io_uring_get_socket(struct file *file); void __io_uring_task_cancel(void); void __io_uring_files_cancel(struct files_struct *files); void __io_uring_free(struct task_struct *tsk); static inline void io_uring_task_cancel(void) { if (current->io_uring && !xa_empty(&current->io_uring->xa)) __io_uring_task_cancel(); } static inline void io_uring_files_cancel(struct files_struct *files) { if (current->io_uring && !xa_empty(&current->io_uring->xa)) __io_uring_files_cancel(files); } static inline void io_uring_free(struct task_struct *tsk) { if (tsk->io_uring) __io_uring_free(tsk); } #else static inline struct sock *io_uring_get_socket(struct file *file) { return NULL; } static inline void io_uring_task_cancel(void) { } static inline void io_uring_files_cancel(struct files_struct *files) { } static inline void io_uring_free(struct task_struct *tsk) { } #endif #endif
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __MAC802154_DRIVER_OPS #define __MAC802154_DRIVER_OPS #include <linux/types.h> #include <linux/rtnetlink.h> #include <net/mac802154.h> #include "ieee802154_i.h" #include "trace.h" static inline int drv_xmit_async(struct ieee802154_local *local, struct sk_buff *skb) { return local->ops->xmit_async(&local->hw, skb); } static inline int drv_xmit_sync(struct ieee802154_local *local, struct sk_buff *skb) { might_sleep(); return local->ops->xmit_sync(&local->hw, skb); } static inline int drv_start(struct ieee802154_local *local) { int ret; might_sleep(); trace_802154_drv_start(local); local->started = true; smp_mb(); ret = local->ops->start(&local->hw); trace_802154_drv_return_int(local, ret); return ret; } static inline void drv_stop(struct ieee802154_local *local) { might_sleep(); trace_802154_drv_stop(local); local->ops->stop(&local->hw); trace_802154_drv_return_void(local); /* sync away all work on the tasklet before clearing started */ tasklet_disable(&local->tasklet); tasklet_enable(&local->tasklet); barrier(); local->started = false; } static inline int drv_set_channel(struct ieee802154_local *local, u8 page, u8 channel) { int ret; might_sleep(); trace_802154_drv_set_channel(local, page, channel); ret = local->ops->set_channel(&local->hw, page, channel); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_tx_power(struct ieee802154_local *local, s32 mbm) { int ret; might_sleep(); if (!local->ops->set_txpower) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_tx_power(local, mbm); ret = local->ops->set_txpower(&local->hw, mbm); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_cca_mode(struct ieee802154_local *local, const struct wpan_phy_cca *cca) { int ret; might_sleep(); if (!local->ops->set_cca_mode) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_cca_mode(local, cca); ret = local->ops->set_cca_mode(&local->hw, cca); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_lbt_mode(struct ieee802154_local *local, bool mode) { int ret; might_sleep(); if (!local->ops->set_lbt) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_lbt_mode(local, mode); ret = local->ops->set_lbt(&local->hw, mode); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_cca_ed_level(struct ieee802154_local *local, s32 mbm) { int ret; might_sleep(); if (!local->ops->set_cca_ed_level) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_cca_ed_level(local, mbm); ret = local->ops->set_cca_ed_level(&local->hw, mbm); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_pan_id(struct ieee802154_local *local, __le16 pan_id) { struct ieee802154_hw_addr_filt filt; int ret; might_sleep(); if (!local->ops->set_hw_addr_filt) { WARN_ON(1); return -EOPNOTSUPP; } filt.pan_id = pan_id; trace_802154_drv_set_pan_id(local, pan_id); ret = local->ops->set_hw_addr_filt(&local->hw, &filt, IEEE802154_AFILT_PANID_CHANGED); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_extended_addr(struct ieee802154_local *local, __le64 extended_addr) { struct ieee802154_hw_addr_filt filt; int ret; might_sleep(); if (!local->ops->set_hw_addr_filt) { WARN_ON(1); return -EOPNOTSUPP; } filt.ieee_addr = extended_addr; trace_802154_drv_set_extended_addr(local, extended_addr); ret = local->ops->set_hw_addr_filt(&local->hw, &filt, IEEE802154_AFILT_IEEEADDR_CHANGED); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_short_addr(struct ieee802154_local *local, __le16 short_addr) { struct ieee802154_hw_addr_filt filt; int ret; might_sleep(); if (!local->ops->set_hw_addr_filt) { WARN_ON(1); return -EOPNOTSUPP; } filt.short_addr = short_addr; trace_802154_drv_set_short_addr(local, short_addr); ret = local->ops->set_hw_addr_filt(&local->hw, &filt, IEEE802154_AFILT_SADDR_CHANGED); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_pan_coord(struct ieee802154_local *local, bool is_coord) { struct ieee802154_hw_addr_filt filt; int ret; might_sleep(); if (!local->ops->set_hw_addr_filt) { WARN_ON(1); return -EOPNOTSUPP; } filt.pan_coord = is_coord; trace_802154_drv_set_pan_coord(local, is_coord); ret = local->ops->set_hw_addr_filt(&local->hw, &filt, IEEE802154_AFILT_PANC_CHANGED); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_csma_params(struct ieee802154_local *local, u8 min_be, u8 max_be, u8 max_csma_backoffs) { int ret; might_sleep(); if (!local->ops->set_csma_params) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_csma_params(local, min_be, max_be, max_csma_backoffs); ret = local->ops->set_csma_params(&local->hw, min_be, max_be, max_csma_backoffs); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_max_frame_retries(struct ieee802154_local *local, s8 max_frame_retries) { int ret; might_sleep(); if (!local->ops->set_frame_retries) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_max_frame_retries(local, max_frame_retries); ret = local->ops->set_frame_retries(&local->hw, max_frame_retries); trace_802154_drv_return_int(local, ret); return ret; } static inline int drv_set_promiscuous_mode(struct ieee802154_local *local, bool on) { int ret; might_sleep(); if (!local->ops->set_promiscuous_mode) { WARN_ON(1); return -EOPNOTSUPP; } trace_802154_drv_set_promiscuous_mode(local, on); ret = local->ops->set_promiscuous_mode(&local->hw, on); trace_802154_drv_return_int(local, ret); return ret; } #endif /* __MAC802154_DRIVER_OPS */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2020 ARM Ltd. */ #ifndef __ASM_VDSO_PROCESSOR_H #define __ASM_VDSO_PROCESSOR_H #ifndef __ASSEMBLY__ /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */ static __always_inline void rep_nop(void) { asm volatile("rep; nop" ::: "memory"); } static __always_inline void cpu_relax(void) { rep_nop(); } #endif /* __ASSEMBLY__ */ #endif /* __ASM_VDSO_PROCESSOR_H */
1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _SCSI_SCSI_DEVICE_H #define _SCSI_SCSI_DEVICE_H #include <linux/list.h> #include <linux/spinlock.h> #include <linux/workqueue.h> #include <linux/blkdev.h> #include <scsi/scsi.h> #include <linux/atomic.h> struct device; struct request_queue; struct scsi_cmnd; struct scsi_lun; struct scsi_sense_hdr; typedef __u64 __bitwise blist_flags_t; #define SCSI_SENSE_BUFFERSIZE 96 struct scsi_mode_data { __u32 length; __u16 block_descriptor_length; __u8 medium_type; __u8 device_specific; __u8 header_length; __u8 longlba:1; }; /* * sdev state: If you alter this, you also need to alter scsi_sysfs.c * (for the ascii descriptions) and the state model enforcer: * scsi_lib:scsi_device_set_state(). */ enum scsi_device_state { SDEV_CREATED = 1, /* device created but not added to sysfs * Only internal commands allowed (for inq) */ SDEV_RUNNING, /* device properly configured * All commands allowed */ SDEV_CANCEL, /* beginning to delete device * Only error handler commands allowed */ SDEV_DEL, /* device deleted * no commands allowed */ SDEV_QUIESCE, /* Device quiescent. No block commands * will be accepted, only specials (which * originate in the mid-layer) */ SDEV_OFFLINE, /* Device offlined (by error handling or * user request */ SDEV_TRANSPORT_OFFLINE, /* Offlined by transport class error handler */ SDEV_BLOCK, /* Device blocked by scsi lld. No * scsi commands from user or midlayer * should be issued to the scsi * lld. */ SDEV_CREATED_BLOCK, /* same as above but for created devices */ }; enum scsi_scan_mode { SCSI_SCAN_INITIAL = 0, SCSI_SCAN_RESCAN, SCSI_SCAN_MANUAL, }; enum scsi_device_event { SDEV_EVT_MEDIA_CHANGE = 1, /* media has changed */ SDEV_EVT_INQUIRY_CHANGE_REPORTED, /* 3F 03 UA reported */ SDEV_EVT_CAPACITY_CHANGE_REPORTED, /* 2A 09 UA reported */ SDEV_EVT_SOFT_THRESHOLD_REACHED_REPORTED, /* 38 07 UA reported */ SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED, /* 2A 01 UA reported */ SDEV_EVT_LUN_CHANGE_REPORTED, /* 3F 0E UA reported */ SDEV_EVT_ALUA_STATE_CHANGE_REPORTED, /* 2A 06 UA reported */ SDEV_EVT_POWER_ON_RESET_OCCURRED, /* 29 00 UA reported */ SDEV_EVT_FIRST = SDEV_EVT_MEDIA_CHANGE, SDEV_EVT_LAST = SDEV_EVT_POWER_ON_RESET_OCCURRED, SDEV_EVT_MAXBITS = SDEV_EVT_LAST + 1 }; struct scsi_event { enum scsi_device_event evt_type; struct list_head node; /* put union of data structures, for non-simple event types, * here */ }; /** * struct scsi_vpd - SCSI Vital Product Data * @rcu: For kfree_rcu(). * @len: Length in bytes of @data. * @data: VPD data as defined in various T10 SCSI standard documents. */ struct scsi_vpd { struct rcu_head rcu; int len; unsigned char data[]; }; struct scsi_device { struct Scsi_Host *host; struct request_queue *request_queue; /* the next two are protected by the host->host_lock */ struct list_head siblings; /* list of all devices on this host */ struct list_head same_target_siblings; /* just the devices sharing same target id */ atomic_t device_busy; /* commands actually active on LLDD */ atomic_t device_blocked; /* Device returned QUEUE_FULL. */ atomic_t restarts; spinlock_t list_lock; struct list_head starved_entry; unsigned short queue_depth; /* How deep of a queue we want */ unsigned short max_queue_depth; /* max queue depth */ unsigned short last_queue_full_depth; /* These two are used by */ unsigned short last_queue_full_count; /* scsi_track_queue_full() */ unsigned long last_queue_full_time; /* last queue full time */ unsigned long queue_ramp_up_period; /* ramp up period in jiffies */ #define SCSI_DEFAULT_RAMP_UP_PERIOD (120 * HZ) unsigned long last_queue_ramp_up; /* last queue ramp up time */ unsigned int id, channel; u64 lun; unsigned int manufacturer; /* Manufacturer of device, for using * vendor-specific cmd's */ unsigned sector_size; /* size in bytes */ void *hostdata; /* available to low-level driver */ unsigned char type; char scsi_level; char inq_periph_qual; /* PQ from INQUIRY data */ struct mutex inquiry_mutex; unsigned char inquiry_len; /* valid bytes in 'inquiry' */ unsigned char * inquiry; /* INQUIRY response data */ const char * vendor; /* [back_compat] point into 'inquiry' ... */ const char * model; /* ... after scan; point to static string */ const char * rev; /* ... "nullnullnullnull" before scan */ #define SCSI_VPD_PG_LEN 255 struct scsi_vpd __rcu *vpd_pg0; struct scsi_vpd __rcu *vpd_pg83; struct scsi_vpd __rcu *vpd_pg80; struct scsi_vpd __rcu *vpd_pg89; unsigned char current_tag; /* current tag */ struct scsi_target *sdev_target; /* used only for single_lun */ blist_flags_t sdev_bflags; /* black/white flags as also found in * scsi_devinfo.[hc]. For now used only to * pass settings from slave_alloc to scsi * core. */ unsigned int eh_timeout; /* Error handling timeout */ unsigned removable:1; unsigned changed:1; /* Data invalid due to media change */ unsigned busy:1; /* Used to prevent races */ unsigned lockable:1; /* Able to prevent media removal */ unsigned locked:1; /* Media removal disabled */ unsigned borken:1; /* Tell the Seagate driver to be * painfully slow on this device */ unsigned disconnect:1; /* can disconnect */ unsigned soft_reset:1; /* Uses soft reset option */ unsigned sdtr:1; /* Device supports SDTR messages */ unsigned wdtr:1; /* Device supports WDTR messages */ unsigned ppr:1; /* Device supports PPR messages */ unsigned tagged_supported:1; /* Supports SCSI-II tagged queuing */ unsigned simple_tags:1; /* simple queue tag messages are enabled */ unsigned was_reset:1; /* There was a bus reset on the bus for * this device */ unsigned expecting_cc_ua:1; /* Expecting a CHECK_CONDITION/UNIT_ATTN * because we did a bus reset. */ unsigned use_10_for_rw:1; /* first try 10-byte read / write */ unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */ unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */ unsigned no_report_opcodes:1; /* no REPORT SUPPORTED OPERATION CODES */ unsigned no_write_same:1; /* no WRITE SAME command */ unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */ unsigned skip_ms_page_8:1; /* do not use MODE SENSE page 0x08 */ unsigned skip_ms_page_3f:1; /* do not use MODE SENSE page 0x3f */ unsigned skip_vpd_pages:1; /* do not read VPD pages */ unsigned try_vpd_pages:1; /* attempt to read VPD pages */ unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page 0x3f */ unsigned no_start_on_add:1; /* do not issue start on add */ unsigned allow_restart:1; /* issue START_UNIT in error handler */ unsigned manage_start_stop:1; /* Let HLD (sd) manage start/stop */ unsigned start_stop_pwr_cond:1; /* Set power cond. in START_STOP_UNIT */ unsigned no_uld_attach:1; /* disable connecting to upper level drivers */ unsigned select_no_atn:1; unsigned fix_capacity:1; /* READ_CAPACITY is too high by 1 */ unsigned guess_capacity:1; /* READ_CAPACITY might be too high by 1 */ unsigned retry_hwerror:1; /* Retry HARDWARE_ERROR */ unsigned last_sector_bug:1; /* do not use multisector accesses on SD_LAST_BUGGY_SECTORS */ unsigned no_read_disc_info:1; /* Avoid READ_DISC_INFO cmds */ unsigned no_read_capacity_16:1; /* Avoid READ_CAPACITY_16 cmds */ unsigned try_rc_10_first:1; /* Try READ_CAPACACITY_10 first */ unsigned security_supported:1; /* Supports Security Protocols */ unsigned is_visible:1; /* is the device visible in sysfs */ unsigned wce_default_on:1; /* Cache is ON by default */ unsigned no_dif:1; /* T10 PI (DIF) should be disabled */ unsigned broken_fua:1; /* Don't set FUA bit */ unsigned lun_in_cdb:1; /* Store LUN bits in CDB[1] */ unsigned unmap_limit_for_ws:1; /* Use the UNMAP limit for WRITE SAME */ unsigned rpm_autosuspend:1; /* Enable runtime autosuspend at device * creation time */ bool offline_already; /* Device offline message logged */ atomic_t disk_events_disable_depth; /* disable depth for disk events */ DECLARE_BITMAP(supported_events, SDEV_EVT_MAXBITS); /* supported events */ DECLARE_BITMAP(pending_events, SDEV_EVT_MAXBITS); /* pending events */ struct list_head event_list; /* asserted events */ struct work_struct event_work; unsigned int max_device_blocked; /* what device_blocked counts down from */ #define SCSI_DEFAULT_DEVICE_BLOCKED 3 atomic_t iorequest_cnt; atomic_t iodone_cnt; atomic_t ioerr_cnt; struct device sdev_gendev, sdev_dev; struct execute_work ew; /* used to get process context on put */ struct work_struct requeue_work; struct scsi_device_handler *handler; void *handler_data; size_t dma_drain_len; void *dma_drain_buf; unsigned char access_state; struct mutex state_mutex; enum scsi_device_state sdev_state; struct task_struct *quiesced_by; unsigned long sdev_data[]; } __attribute__((aligned(sizeof(unsigned long)))); #define to_scsi_device(d) \ container_of(d, struct scsi_device, sdev_gendev) #define class_to_sdev(d) \ container_of(d, struct scsi_device, sdev_dev) #define transport_class_to_sdev(class_dev) \ to_scsi_device(class_dev->parent) #define sdev_dbg(sdev, fmt, a...) \ dev_dbg(&(sdev)->sdev_gendev, fmt, ##a) /* * like scmd_printk, but the device name is passed in * as a string pointer */ __printf(4, 5) void sdev_prefix_printk(const char *, const struct scsi_device *, const char *, const char *, ...); #define sdev_printk(l, sdev, fmt, a...) \ sdev_prefix_printk(l, sdev, NULL, fmt, ##a) __printf(3, 4) void scmd_printk(const char *, const struct scsi_cmnd *, const char *, ...); #define scmd_dbg(scmd, fmt, a...) \ do { \ if ((scmd)->request->rq_disk) \ sdev_dbg((scmd)->device, "[%s] " fmt, \ (scmd)->request->rq_disk->disk_name, ##a);\ else \ sdev_dbg((scmd)->device, fmt, ##a); \ } while (0) enum scsi_target_state { STARGET_CREATED = 1, STARGET_RUNNING, STARGET_REMOVE, STARGET_CREATED_REMOVE, STARGET_DEL, }; /* * scsi_target: representation of a scsi target, for now, this is only * used for single_lun devices. If no one has active IO to the target, * starget_sdev_user is NULL, else it points to the active sdev. */ struct scsi_target { struct scsi_device *starget_sdev_user; struct list_head siblings; struct list_head devices; struct device dev; struct kref reap_ref; /* last put renders target invisible */ unsigned int channel; unsigned int id; /* target id ... replace * scsi_device.id eventually */ unsigned int create:1; /* signal that it needs to be added */ unsigned int single_lun:1; /* Indicates we should only * allow I/O to one of the luns * for the device at a time. */ unsigned int pdt_1f_for_no_lun:1; /* PDT = 0x1f * means no lun present. */ unsigned int no_report_luns:1; /* Don't use * REPORT LUNS for scanning. */ unsigned int expecting_lun_change:1; /* A device has reported * a 3F/0E UA, other devices on * the same target will also. */ /* commands actually active on LLD. */ atomic_t target_busy; atomic_t target_blocked; /* * LLDs should set this in the slave_alloc host template callout. * If set to zero then there is not limit. */ unsigned int can_queue; unsigned int max_target_blocked; #define SCSI_DEFAULT_TARGET_BLOCKED 3 char scsi_level; enum scsi_target_state state; void *hostdata; /* available to low-level driver */ unsigned long starget_data[]; /* for the transport */ /* starget_data must be the last element!!!! */ } __attribute__((aligned(sizeof(unsigned long)))); #define to_scsi_target(d) container_of(d, struct scsi_target, dev) static inline struct scsi_target *scsi_target(struct scsi_device *sdev) { return to_scsi_target(sdev->sdev_gendev.parent); } #define transport_class_to_starget(class_dev) \ to_scsi_target(class_dev->parent) #define starget_printk(prefix, starget, fmt, a...) \ dev_printk(prefix, &(starget)->dev, fmt, ##a) extern struct scsi_device *__scsi_add_device(struct Scsi_Host *, uint, uint, u64, void *hostdata); extern int scsi_add_device(struct Scsi_Host *host, uint channel, uint target, u64 lun); extern int scsi_register_device_handler(struct scsi_device_handler *scsi_dh); extern void scsi_remove_device(struct scsi_device *); extern int scsi_unregister_device_handler(struct scsi_device_handler *scsi_dh); void scsi_attach_vpd(struct scsi_device *sdev); extern struct scsi_device *scsi_device_from_queue(struct request_queue *q); extern int __must_check scsi_device_get(struct scsi_device *); extern void scsi_device_put(struct scsi_device *); extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *, uint, uint, u64); extern struct scsi_device *__scsi_device_lookup(struct Scsi_Host *, uint, uint, u64); extern struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *, u64); extern struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *, u64); extern void starget_for_each_device(struct scsi_target *, void *, void (*fn)(struct scsi_device *, void *)); extern void __starget_for_each_device(struct scsi_target *, void *, void (*fn)(struct scsi_device *, void *)); /* only exposed to implement shost_for_each_device */ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *, struct scsi_device *); /** * shost_for_each_device - iterate over all devices of a host * @sdev: the &struct scsi_device to use as a cursor * @shost: the &struct scsi_host to iterate over * * Iterator that returns each device attached to @shost. This loop * takes a reference on each device and releases it at the end. If * you break out of the loop, you must call scsi_device_put(sdev). */ #define shost_for_each_device(sdev, shost) \ for ((sdev) = __scsi_iterate_devices((shost), NULL); \ (sdev); \ (sdev) = __scsi_iterate_devices((shost), (sdev))) /** * __shost_for_each_device - iterate over all devices of a host (UNLOCKED) * @sdev: the &struct scsi_device to use as a cursor * @shost: the &struct scsi_host to iterate over * * Iterator that returns each device attached to @shost. It does _not_ * take a reference on the scsi_device, so the whole loop must be * protected by shost->host_lock. * * Note: The only reason to use this is because you need to access the * device list in interrupt context. Otherwise you really want to use * shost_for_each_device instead. */ #define __shost_for_each_device(sdev, shost) \ list_for_each_entry((sdev), &((shost)->__devices), siblings) extern int scsi_change_queue_depth(struct scsi_device *, int); extern int scsi_track_queue_full(struct scsi_device *, int); extern int scsi_set_medium_removal(struct scsi_device *, char); extern int scsi_mode_sense(struct scsi_device *sdev, int dbd, int modepage, unsigned char *buffer, int len, int timeout, int retries, struct scsi_mode_data *data, struct scsi_sense_hdr *); extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp, int modepage, unsigned char *buffer, int len, int timeout, int retries, struct scsi_mode_data *data, struct scsi_sense_hdr *); extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout, int retries, struct scsi_sense_hdr *sshdr); extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf, int buf_len); extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer, unsigned int len, unsigned char opcode); extern int scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state); extern struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type, gfp_t gfpflags); extern void sdev_evt_send(struct scsi_device *sdev, struct scsi_event *evt); extern void sdev_evt_send_simple(struct scsi_device *sdev, enum scsi_device_event evt_type, gfp_t gfpflags); extern int scsi_device_quiesce(struct scsi_device *sdev); extern void scsi_device_resume(struct scsi_device *sdev); extern void scsi_target_quiesce(struct scsi_target *); extern void scsi_target_resume(struct scsi_target *); extern void scsi_scan_target(struct device *parent, unsigned int channel, unsigned int id, u64 lun, enum scsi_scan_mode rescan); extern void scsi_target_reap(struct scsi_target *); extern void scsi_target_block(struct device *); extern void scsi_target_unblock(struct device *, enum scsi_device_state); extern void scsi_remove_target(struct device *); extern const char *scsi_device_state_name(enum scsi_device_state); extern int scsi_is_sdev_device(const struct device *); extern int scsi_is_target_device(const struct device *); extern void scsi_sanitize_inquiry_string(unsigned char *s, int len); extern int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, int data_direction, void *buffer, unsigned bufflen, unsigned char *sense, struct scsi_sense_hdr *sshdr, int timeout, int retries, u64 flags, req_flags_t rq_flags, int *resid); /* Make sure any sense buffer is the correct size. */ #define scsi_execute(sdev, cmd, data_direction, buffer, bufflen, sense, \ sshdr, timeout, retries, flags, rq_flags, resid) \ ({ \ BUILD_BUG_ON((sense) != NULL && \ sizeof(sense) != SCSI_SENSE_BUFFERSIZE); \ __scsi_execute(sdev, cmd, data_direction, buffer, bufflen, \ sense, sshdr, timeout, retries, flags, rq_flags, \ resid); \ }) static inline int scsi_execute_req(struct scsi_device *sdev, const unsigned char *cmd, int data_direction, void *buffer, unsigned bufflen, struct scsi_sense_hdr *sshdr, int timeout, int retries, int *resid) { return scsi_execute(sdev, cmd, data_direction, buffer, bufflen, NULL, sshdr, timeout, retries, 0, 0, resid); } extern void sdev_disable_disk_events(struct scsi_device *sdev); extern void sdev_enable_disk_events(struct scsi_device *sdev); extern int scsi_vpd_lun_id(struct scsi_device *, char *, size_t); extern int scsi_vpd_tpg_id(struct scsi_device *, int *); #ifdef CONFIG_PM extern int scsi_autopm_get_device(struct scsi_device *); extern void scsi_autopm_put_device(struct scsi_device *); #else static inline int scsi_autopm_get_device(struct scsi_device *d) { return 0; } static inline void scsi_autopm_put_device(struct scsi_device *d) {} #endif /* CONFIG_PM */ static inline int __must_check scsi_device_reprobe(struct scsi_device *sdev) { return device_reprobe(&sdev->sdev_gendev); } static inline unsigned int sdev_channel(struct scsi_device *sdev) { return sdev->channel; } static inline unsigned int sdev_id(struct scsi_device *sdev) { return sdev->id; } #define scmd_id(scmd) sdev_id((scmd)->device) #define scmd_channel(scmd) sdev_channel((scmd)->device) /* * checks for positions of the SCSI state machine */ static inline int scsi_device_online(struct scsi_device *sdev) { return (sdev->sdev_state != SDEV_OFFLINE && sdev->sdev_state != SDEV_TRANSPORT_OFFLINE && sdev->sdev_state != SDEV_DEL); } static inline int scsi_device_blocked(struct scsi_device *sdev) { return sdev->sdev_state == SDEV_BLOCK || sdev->sdev_state == SDEV_CREATED_BLOCK; } static inline int scsi_device_created(struct scsi_device *sdev) { return sdev->sdev_state == SDEV_CREATED || sdev->sdev_state == SDEV_CREATED_BLOCK; } int scsi_internal_device_block_nowait(struct scsi_device *sdev); int scsi_internal_device_unblock_nowait(struct scsi_device *sdev, enum scsi_device_state new_state); /* accessor functions for the SCSI parameters */ static inline int scsi_device_sync(struct scsi_device *sdev) { return sdev->sdtr; } static inline int scsi_device_wide(struct scsi_device *sdev) { return sdev->wdtr; } static inline int scsi_device_dt(struct scsi_device *sdev) { return sdev->ppr; } static inline int scsi_device_dt_only(struct scsi_device *sdev) { if (sdev->inquiry_len < 57) return 0; return (sdev->inquiry[56] & 0x0c) == 0x04; } static inline int scsi_device_ius(struct scsi_device *sdev) { if (sdev->inquiry_len < 57) return 0; return sdev->inquiry[56] & 0x01; } static inline int scsi_device_qas(struct scsi_device *sdev) { if (sdev->inquiry_len < 57) return 0; return sdev->inquiry[56] & 0x02; } static inline int scsi_device_enclosure(struct scsi_device *sdev) { return sdev->inquiry ? (sdev->inquiry[6] & (1<<6)) : 1; } static inline int scsi_device_protection(struct scsi_device *sdev) { if (sdev->no_dif) return 0; return sdev->scsi_level > SCSI_2 && sdev->inquiry[5] & (1<<0); } static inline int scsi_device_tpgs(struct scsi_device *sdev) { return sdev->inquiry ? (sdev->inquiry[5] >> 4) & 0x3 : 0; } /** * scsi_device_supports_vpd - test if a device supports VPD pages * @sdev: the &struct scsi_device to test * * If the 'try_vpd_pages' flag is set it takes precedence. * Otherwise we will assume VPD pages are supported if the * SCSI level is at least SPC-3 and 'skip_vpd_pages' is not set. */ static inline int scsi_device_supports_vpd(struct scsi_device *sdev) { /* Attempt VPD inquiry if the device blacklist explicitly calls * for it. */ if (sdev->try_vpd_pages) return 1; /* * Although VPD inquiries can go to SCSI-2 type devices, * some USB ones crash on receiving them, and the pages * we currently ask for are mandatory for SPC-2 and beyond */ if (sdev->scsi_level >= SCSI_SPC_2 && !sdev->skip_vpd_pages) return 1; return 0; } #define MODULE_ALIAS_SCSI_DEVICE(type) \ MODULE_ALIAS("scsi:t-" __stringify(type) "*") #define SCSI_DEVICE_MODALIAS_FMT "scsi:t-0x%02x" #endif /* _SCSI_SCSI_DEVICE_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_PROCESSOR_H #define _ASM_X86_PROCESSOR_H #include <asm/processor-flags.h> /* Forward declaration, a strange C thing */ struct task_struct; struct mm_struct; struct io_bitmap; struct vm86; #include <asm/math_emu.h> #include <asm/segment.h> #include <asm/types.h> #include <uapi/asm/sigcontext.h> #include <asm/current.h> #include <asm/cpufeatures.h> #include <asm/page.h> #include <asm/pgtable_types.h> #include <asm/percpu.h> #include <asm/msr.h> #include <asm/desc_defs.h> #include <asm/nops.h> #include <asm/special_insns.h> #include <asm/fpu/types.h> #include <asm/unwind_hints.h> #include <asm/vmxfeatures.h> #include <asm/vdso/processor.h> #include <linux/personality.h> #include <linux/cache.h> #include <linux/threads.h> #include <linux/math64.h> #include <linux/err.h> #include <linux/irqflags.h> #include <linux/mem_encrypt.h> /* * We handle most unaligned accesses in hardware. On the other hand * unaligned DMA can be quite expensive on some Nehalem processors. * * Based on this we disable the IP header alignment in network drivers. */ #define NET_IP_ALIGN 0 #define HBP_NUM 4 /* * These alignment constraints are for performance in the vSMP case, * but in the task_struct case we must also meet hardware imposed * alignment requirements of the FPU state: */ #ifdef CONFIG_X86_VSMP # define ARCH_MIN_TASKALIGN (1 << INTERNODE_CACHE_SHIFT) # define ARCH_MIN_MMSTRUCT_ALIGN (1 << INTERNODE_CACHE_SHIFT) #else # define ARCH_MIN_TASKALIGN __alignof__(union fpregs_state) # define ARCH_MIN_MMSTRUCT_ALIGN 0 #endif enum tlb_infos { ENTRIES, NR_INFO }; extern u16 __read_mostly tlb_lli_4k[NR_INFO]; extern u16 __read_mostly tlb_lli_2m[NR_INFO]; extern u16 __read_mostly tlb_lli_4m[NR_INFO]; extern u16 __read_mostly tlb_lld_4k[NR_INFO]; extern u16 __read_mostly tlb_lld_2m[NR_INFO]; extern u16 __read_mostly tlb_lld_4m[NR_INFO]; extern u16 __read_mostly tlb_lld_1g[NR_INFO]; /* * CPU type and hardware bug flags. Kept separately for each CPU. * Members of this structure are referenced in head_32.S, so think twice * before touching them. [mj] */ struct cpuinfo_x86 { __u8 x86; /* CPU family */ __u8 x86_vendor; /* CPU vendor */ __u8 x86_model; __u8 x86_stepping; #ifdef CONFIG_X86_64 /* Number of 4K pages in DTLB/ITLB combined(in pages): */ int x86_tlbsize; #endif #ifdef CONFIG_X86_VMX_FEATURE_NAMES __u32 vmx_capability[NVMXINTS]; #endif __u8 x86_virt_bits; __u8 x86_phys_bits; /* CPUID returned core id bits: */ __u8 x86_coreid_bits; __u8 cu_id; /* Max extended CPUID function supported: */ __u32 extended_cpuid_level; /* Maximum supported CPUID level, -1=no CPUID: */ int cpuid_level; /* * Align to size of unsigned long because the x86_capability array * is passed to bitops which require the alignment. Use unnamed * union to enforce the array is aligned to size of unsigned long. */ union { __u32 x86_capability[NCAPINTS + NBUGINTS]; unsigned long x86_capability_alignment; }; char x86_vendor_id[16]; char x86_model_id[64]; /* in KB - valid for CPUS which support this call: */ unsigned int x86_cache_size; int x86_cache_alignment; /* In bytes */ /* Cache QoS architectural values, valid only on the BSP: */ int x86_cache_max_rmid; /* max index */ int x86_cache_occ_scale; /* scale to bytes */ int x86_cache_mbm_width_offset; int x86_power; unsigned long loops_per_jiffy; /* cpuid returned max cores value: */ u16 x86_max_cores; u16 apicid; u16 initial_apicid; u16 x86_clflush_size; /* number of cores as seen by the OS: */ u16 booted_cores; /* Physical processor id: */ u16 phys_proc_id; /* Logical processor id: */ u16 logical_proc_id; /* Core id: */ u16 cpu_core_id; u16 cpu_die_id; u16 logical_die_id; /* Index into per_cpu list: */ u16 cpu_index; u32 microcode; /* Address space bits used by the cache internally */ u8 x86_cache_bits; unsigned initialized : 1; } __randomize_layout; struct cpuid_regs { u32 eax, ebx, ecx, edx; }; enum cpuid_regs_idx { CPUID_EAX = 0, CPUID_EBX, CPUID_ECX, CPUID_EDX, }; #define X86_VENDOR_INTEL 0 #define X86_VENDOR_CYRIX 1 #define X86_VENDOR_AMD 2 #define X86_VENDOR_UMC 3 #define X86_VENDOR_CENTAUR 5 #define X86_VENDOR_TRANSMETA 7 #define X86_VENDOR_NSC 8 #define X86_VENDOR_HYGON 9 #define X86_VENDOR_ZHAOXIN 10 #define X86_VENDOR_NUM 11 #define X86_VENDOR_UNKNOWN 0xff /* * capabilities of CPUs */ extern struct cpuinfo_x86 boot_cpu_data; extern struct cpuinfo_x86 new_cpu_data; extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS]; #ifdef CONFIG_SMP DECLARE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); #define cpu_data(cpu) per_cpu(cpu_info, cpu) #else #define cpu_info boot_cpu_data #define cpu_data(cpu) boot_cpu_data #endif extern const struct seq_operations cpuinfo_op; #define cache_line_size() (boot_cpu_data.x86_cache_alignment) extern void cpu_detect(struct cpuinfo_x86 *c); static inline unsigned long long l1tf_pfn_limit(void) { return BIT_ULL(boot_cpu_data.x86_cache_bits - 1 - PAGE_SHIFT); } extern void early_cpu_init(void); extern void identify_boot_cpu(void); extern void identify_secondary_cpu(struct cpuinfo_x86 *); extern void print_cpu_info(struct cpuinfo_x86 *); void print_cpu_msr(struct cpuinfo_x86 *); #ifdef CONFIG_X86_32 extern int have_cpuid_p(void); #else static inline int have_cpuid_p(void) { return 1; } #endif static inline void native_cpuid(unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { /* ecx is often an input as well as an output. */ asm volatile("cpuid" : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx) : "0" (*eax), "2" (*ecx) : "memory"); } #define native_cpuid_reg(reg) \ static inline unsigned int native_cpuid_##reg(unsigned int op) \ { \ unsigned int eax = op, ebx, ecx = 0, edx; \ \ native_cpuid(&eax, &ebx, &ecx, &edx); \ \ return reg; \ } /* * Native CPUID functions returning a single datum. */ native_cpuid_reg(eax) native_cpuid_reg(ebx) native_cpuid_reg(ecx) native_cpuid_reg(edx) /* * Friendlier CR3 helpers. */ static inline unsigned long read_cr3_pa(void) { return __read_cr3() & CR3_ADDR_MASK; } static inline unsigned long native_read_cr3_pa(void) { return __native_read_cr3() & CR3_ADDR_MASK; } static inline void load_cr3(pgd_t *pgdir) { write_cr3(__sme_pa(pgdir)); } /* * Note that while the legacy 'TSS' name comes from 'Task State Segment', * on modern x86 CPUs the TSS also holds information important to 64-bit mode, * unrelated to the task-switch mechanism: */ #ifdef CONFIG_X86_32 /* This is the TSS defined by the hardware. */ struct x86_hw_tss { unsigned short back_link, __blh; unsigned long sp0; unsigned short ss0, __ss0h; unsigned long sp1; /* * We don't use ring 1, so ss1 is a convenient scratch space in * the same cacheline as sp0. We use ss1 to cache the value in * MSR_IA32_SYSENTER_CS. When we context switch * MSR_IA32_SYSENTER_CS, we first check if the new value being * written matches ss1, and, if it's not, then we wrmsr the new * value and update ss1. * * The only reason we context switch MSR_IA32_SYSENTER_CS is * that we set it to zero in vm86 tasks to avoid corrupting the * stack if we were to go through the sysenter path from vm86 * mode. */ unsigned short ss1; /* MSR_IA32_SYSENTER_CS */ unsigned short __ss1h; unsigned long sp2; unsigned short ss2, __ss2h; unsigned long __cr3; unsigned long ip; unsigned long flags; unsigned long ax; unsigned long cx; unsigned long dx; unsigned long bx; unsigned long sp; unsigned long bp; unsigned long si; unsigned long di; unsigned short es, __esh; unsigned short cs, __csh; unsigned short ss, __ssh; unsigned short ds, __dsh; unsigned short fs, __fsh; unsigned short gs, __gsh; unsigned short ldt, __ldth; unsigned short trace; unsigned short io_bitmap_base; } __attribute__((packed)); #else struct x86_hw_tss { u32 reserved1; u64 sp0; /* * We store cpu_current_top_of_stack in sp1 so it's always accessible. * Linux does not use ring 1, so sp1 is not otherwise needed. */ u64 sp1; /* * Since Linux does not use ring 2, the 'sp2' slot is unused by * hardware. entry_SYSCALL_64 uses it as scratch space to stash * the user RSP value. */ u64 sp2; u64 reserved2; u64 ist[7]; u32 reserved3; u32 reserved4; u16 reserved5; u16 io_bitmap_base; } __attribute__((packed)); #endif /* * IO-bitmap sizes: */ #define IO_BITMAP_BITS 65536 #define IO_BITMAP_BYTES (IO_BITMAP_BITS / BITS_PER_BYTE) #define IO_BITMAP_LONGS (IO_BITMAP_BYTES / sizeof(long)) #define IO_BITMAP_OFFSET_VALID_MAP \ (offsetof(struct tss_struct, io_bitmap.bitmap) - \ offsetof(struct tss_struct, x86_tss)) #define IO_BITMAP_OFFSET_VALID_ALL \ (offsetof(struct tss_struct, io_bitmap.mapall) - \ offsetof(struct tss_struct, x86_tss)) #ifdef CONFIG_X86_IOPL_IOPERM /* * sizeof(unsigned long) coming from an extra "long" at the end of the * iobitmap. The limit is inclusive, i.e. the last valid byte. */ # define __KERNEL_TSS_LIMIT \ (IO_BITMAP_OFFSET_VALID_ALL + IO_BITMAP_BYTES + \ sizeof(unsigned long) - 1) #else # define __KERNEL_TSS_LIMIT \ (offsetof(struct tss_struct, x86_tss) + sizeof(struct x86_hw_tss) - 1) #endif /* Base offset outside of TSS_LIMIT so unpriviledged IO causes #GP */ #define IO_BITMAP_OFFSET_INVALID (__KERNEL_TSS_LIMIT + 1) struct entry_stack { char stack[PAGE_SIZE]; }; struct entry_stack_page { struct entry_stack stack; } __aligned(PAGE_SIZE); /* * All IO bitmap related data stored in the TSS: */ struct x86_io_bitmap { /* The sequence number of the last active bitmap. */ u64 prev_sequence; /* * Store the dirty size of the last io bitmap offender. The next * one will have to do the cleanup as the switch out to a non io * bitmap user will just set x86_tss.io_bitmap_base to a value * outside of the TSS limit. So for sane tasks there is no need to * actually touch the io_bitmap at all. */ unsigned int prev_max; /* * The extra 1 is there because the CPU will access an * additional byte beyond the end of the IO permission * bitmap. The extra byte must be all 1 bits, and must * be within the limit. */ unsigned long bitmap[IO_BITMAP_LONGS + 1]; /* * Special I/O bitmap to emulate IOPL(3). All bytes zero, * except the additional byte at the end. */ unsigned long mapall[IO_BITMAP_LONGS + 1]; }; struct tss_struct { /* * The fixed hardware portion. This must not cross a page boundary * at risk of violating the SDM's advice and potentially triggering * errata. */ struct x86_hw_tss x86_tss; struct x86_io_bitmap io_bitmap; } __aligned(PAGE_SIZE); DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); /* Per CPU interrupt stacks */ struct irq_stack { char stack[IRQ_STACK_SIZE]; } __aligned(IRQ_STACK_SIZE); DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr); #ifdef CONFIG_X86_32 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); #else /* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */ #define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1 #endif #ifdef CONFIG_X86_64 struct fixed_percpu_data { /* * GCC hardcodes the stack canary as %gs:40. Since the * irq_stack is the object at %gs:0, we reserve the bottom * 48 bytes of the irq stack for the canary. */ char gs_base[40]; unsigned long stack_canary; }; DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible; DECLARE_INIT_PER_CPU(fixed_percpu_data); static inline unsigned long cpu_kernelmode_gs_base(int cpu) { return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu); } DECLARE_PER_CPU(unsigned int, irq_count); extern asmlinkage void ignore_sysret(void); /* Save actual FS/GS selectors and bases to current->thread */ void current_save_fsgs(void); #else /* X86_64 */ #ifdef CONFIG_STACKPROTECTOR /* * Make sure stack canary segment base is cached-aligned: * "For Intel Atom processors, avoid non zero segment base address * that is not aligned to cache line boundary at all cost." * (Optim Ref Manual Assembly/Compiler Coding Rule 15.) */ struct stack_canary { char __pad[20]; /* canary at %gs:20 */ unsigned long canary; }; DECLARE_PER_CPU_ALIGNED(struct stack_canary, stack_canary); #endif /* Per CPU softirq stack pointer */ DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); #endif /* X86_64 */ extern unsigned int fpu_kernel_xstate_size; extern unsigned int fpu_user_xstate_size; struct perf_event; struct thread_struct { /* Cached TLS descriptors: */ struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; #ifdef CONFIG_X86_32 unsigned long sp0; #endif unsigned long sp; #ifdef CONFIG_X86_32 unsigned long sysenter_cs; #else unsigned short es; unsigned short ds; unsigned short fsindex; unsigned short gsindex; #endif #ifdef CONFIG_X86_64 unsigned long fsbase; unsigned long gsbase; #else /* * XXX: this could presumably be unsigned short. Alternatively, * 32-bit kernels could be taught to use fsindex instead. */ unsigned long fs; unsigned long gs; #endif /* Save middle states of ptrace breakpoints */ struct perf_event *ptrace_bps[HBP_NUM]; /* Debug status used for traps, single steps, etc... */ unsigned long virtual_dr6; /* Keep track of the exact dr7 value set by the user */ unsigned long ptrace_dr7; /* Fault info: */ unsigned long cr2; unsigned long trap_nr; unsigned long error_code; #ifdef CONFIG_VM86 /* Virtual 86 mode info */ struct vm86 *vm86; #endif /* IO permissions: */ struct io_bitmap *io_bitmap; /* * IOPL. Priviledge level dependent I/O permission which is * emulated via the I/O bitmap to prevent user space from disabling * interrupts. */ unsigned long iopl_emul; unsigned int iopl_warn:1; unsigned int sig_on_uaccess_err:1; /* Floating point and extended processor state */ struct fpu fpu; /* * WARNING: 'fpu' is dynamically-sized. It *MUST* be at * the end. */ }; /* Whitelist the FPU state from the task_struct for hardened usercopy. */ static inline void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size) { *offset = offsetof(struct thread_struct, fpu.state); *size = fpu_kernel_xstate_size; } static inline void native_load_sp0(unsigned long sp0) { this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); } static __always_inline void native_swapgs(void) { #ifdef CONFIG_X86_64 asm volatile("swapgs" ::: "memory"); #endif } static inline unsigned long current_top_of_stack(void) { /* * We can't read directly from tss.sp0: sp0 on x86_32 is special in * and around vm86 mode and sp0 on x86_64 is special because of the * entry trampoline. */ return this_cpu_read_stable(cpu_current_top_of_stack); } static inline bool on_thread_stack(void) { return (unsigned long)(current_top_of_stack() - current_stack_pointer) < THREAD_SIZE; } #ifdef CONFIG_PARAVIRT_XXL #include <asm/paravirt.h> #else #define __cpuid native_cpuid static inline void load_sp0(unsigned long sp0) { native_load_sp0(sp0); } #endif /* CONFIG_PARAVIRT_XXL */ /* Free all resources held by a thread. */ extern void release_thread(struct task_struct *); unsigned long get_wchan(struct task_struct *p); /* * Generic CPUID function * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx * resulting in stale register contents being returned. */ static inline void cpuid(unsigned int op, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { *eax = op; *ecx = 0; __cpuid(eax, ebx, ecx, edx); } /* Some CPUID calls want 'count' to be placed in ecx */ static inline void cpuid_count(unsigned int op, int count, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx) { *eax = op; *ecx = count; __cpuid(eax, ebx, ecx, edx); } /* * CPUID functions returning a single datum */ static inline unsigned int cpuid_eax(unsigned int op) { unsigned int eax, ebx, ecx, edx; cpuid(op, &eax, &ebx, &ecx, &edx); return eax; } static inline unsigned int cpuid_ebx(unsigned int op) { unsigned int eax, ebx, ecx, edx; cpuid(op, &eax, &ebx, &ecx, &edx); return ebx; } static inline unsigned int cpuid_ecx(unsigned int op) { unsigned int eax, ebx, ecx, edx; cpuid(op, &eax, &ebx, &ecx, &edx); return ecx; } static inline unsigned int cpuid_edx(unsigned int op) { unsigned int eax, ebx, ecx, edx; cpuid(op, &eax, &ebx, &ecx, &edx); return edx; } extern void select_idle_routine(const struct cpuinfo_x86 *c); extern void amd_e400_c1e_apic_setup(void); extern unsigned long boot_option_idle_override; enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT, IDLE_POLL}; extern void enable_sep_cpu(void); extern int sysenter_setup(void); /* Defined in head.S */ extern struct desc_ptr early_gdt_descr; extern void switch_to_new_gdt(int); extern void load_direct_gdt(int); extern void load_fixmap_gdt(int); extern void load_percpu_segment(int); extern void cpu_init(void); extern void cpu_init_exception_handling(void); extern void cr4_init(void); static inline unsigned long get_debugctlmsr(void) { unsigned long debugctlmsr = 0; #ifndef CONFIG_X86_DEBUGCTLMSR if (boot_cpu_data.x86 < 6) return 0; #endif rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr); return debugctlmsr; } static inline void update_debugctlmsr(unsigned long debugctlmsr) { #ifndef CONFIG_X86_DEBUGCTLMSR if (boot_cpu_data.x86 < 6) return; #endif wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr); } extern void set_task_blockstep(struct task_struct *task, bool on); /* Boot loader type from the setup header: */ extern int bootloader_type; extern int bootloader_version; extern char ignore_fpu_irq; #define HAVE_ARCH_PICK_MMAP_LAYOUT 1 #define ARCH_HAS_PREFETCHW #define ARCH_HAS_SPINLOCK_PREFETCH #ifdef CONFIG_X86_32 # define BASE_PREFETCH "" # define ARCH_HAS_PREFETCH #else # define BASE_PREFETCH "prefetcht0 %P1" #endif /* * Prefetch instructions for Pentium III (+) and AMD Athlon (+) * * It's not worth to care about 3dnow prefetches for the K6 * because they are microcoded there and very slow. */ static inline void prefetch(const void *x) { alternative_input(BASE_PREFETCH, "prefetchnta %P1", X86_FEATURE_XMM, "m" (*(const char *)x)); } /* * 3dnow prefetch to get an exclusive cache line. * Useful for spinlocks to avoid one state transition in the * cache coherency protocol: */ static __always_inline void prefetchw(const void *x) { alternative_input(BASE_PREFETCH, "prefetchw %P1", X86_FEATURE_3DNOWPREFETCH, "m" (*(const char *)x)); } static inline void spin_lock_prefetch(const void *x) { prefetchw(x); } #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \ TOP_OF_KERNEL_STACK_PADDING) #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1)) #define task_pt_regs(task) \ ({ \ unsigned long __ptr = (unsigned long)task_stack_page(task); \ __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ ((struct pt_regs *)__ptr) - 1; \ }) #ifdef CONFIG_X86_32 #define INIT_THREAD { \ .sp0 = TOP_OF_INIT_STACK, \ .sysenter_cs = __KERNEL_CS, \ } #define KSTK_ESP(task) (task_pt_regs(task)->sp) #else #define INIT_THREAD { } extern unsigned long KSTK_ESP(struct task_struct *task); #endif /* CONFIG_X86_64 */ extern void start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp); /* * This decides where the kernel will search for a free chunk of vm * space during mmap's. */ #define __TASK_UNMAPPED_BASE(task_size) (PAGE_ALIGN(task_size / 3)) #define TASK_UNMAPPED_BASE __TASK_UNMAPPED_BASE(TASK_SIZE_LOW) #define KSTK_EIP(task) (task_pt_regs(task)->ip) /* Get/set a process' ability to use the timestamp counter instruction */ #define GET_TSC_CTL(adr) get_tsc_mode((adr)) #define SET_TSC_CTL(val) set_tsc_mode((val)) extern int get_tsc_mode(unsigned long adr); extern int set_tsc_mode(unsigned int val); DECLARE_PER_CPU(u64, msr_misc_features_shadow); #ifdef CONFIG_CPU_SUP_AMD extern u16 amd_get_nb_id(int cpu); extern u32 amd_get_nodes_per_socket(void); #else static inline u16 amd_get_nb_id(int cpu) { return 0; } static inline u32 amd_get_nodes_per_socket(void) { return 0; } #endif static inline uint32_t hypervisor_cpuid_base(const char *sig, uint32_t leaves) { uint32_t base, eax, signature[3]; for (base = 0x40000000; base < 0x40010000; base += 0x100) { cpuid(base, &eax, &signature[0], &signature[1], &signature[2]); if (!memcmp(sig, signature, 12) && (leaves == 0 || ((eax - base) >= leaves))) return base; } return 0; } extern unsigned long arch_align_stack(unsigned long sp); void free_init_pages(const char *what, unsigned long begin, unsigned long end); extern void free_kernel_image_pages(const char *what, void *begin, void *end); void default_idle(void); #ifdef CONFIG_XEN bool xen_set_default_idle(void); #else #define xen_set_default_idle 0 #endif void stop_this_cpu(void *dummy); void microcode_check(void); enum l1tf_mitigations { L1TF_MITIGATION_OFF, L1TF_MITIGATION_FLUSH_NOWARN, L1TF_MITIGATION_FLUSH, L1TF_MITIGATION_FLUSH_NOSMT, L1TF_MITIGATION_FULL, L1TF_MITIGATION_FULL_FORCE }; extern enum l1tf_mitigations l1tf_mitigation; enum mds_mitigations { MDS_MITIGATION_OFF, MDS_MITIGATION_FULL, MDS_MITIGATION_VMWERV, }; #endif /* _ASM_X86_PROCESSOR_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 /* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (c) 2008 Intel Corporation * Author: Matthew Wilcox <willy@linux.intel.com> * * Please see kernel/locking/semaphore.c for documentation of these functions */ #ifndef __LINUX_SEMAPHORE_H #define __LINUX_SEMAPHORE_H #include <linux/list.h> #include <linux/spinlock.h> /* Please don't access any members of this structure directly */ struct semaphore { raw_spinlock_t lock; unsigned int count; struct list_head wait_list; }; #define __SEMAPHORE_INITIALIZER(name, n) \ { \ .lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \ .count = n, \ .wait_list = LIST_HEAD_INIT((name).wait_list), \ } #define DEFINE_SEMAPHORE(name) \ struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1) static inline void sema_init(struct semaphore *sem, int val) { static struct lock_class_key __key; *sem = (struct semaphore) __SEMAPHORE_INITIALIZER(*sem, val); lockdep_init_map(&sem->lock.dep_map, "semaphore->lock", &__key, 0); } extern void down(struct semaphore *sem); extern int __must_check down_interruptible(struct semaphore *sem); extern int __must_check down_killable(struct semaphore *sem); extern int __must_check down_trylock(struct semaphore *sem); extern int __must_check down_timeout(struct semaphore *sem, long jiffies); extern void up(struct semaphore *sem); #endif /* __LINUX_SEMAPHORE_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_PERCPU_RWSEM_H #define _LINUX_PERCPU_RWSEM_H #include <linux/atomic.h> #include <linux/percpu.h> #include <linux/rcuwait.h> #include <linux/wait.h> #include <linux/rcu_sync.h> #include <linux/lockdep.h> struct percpu_rw_semaphore { struct rcu_sync rss; unsigned int __percpu *read_count; struct rcuwait writer; wait_queue_head_t waiters; atomic_t block; #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif }; #ifdef CONFIG_DEBUG_LOCK_ALLOC #define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname }, #else #define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) #endif #define __DEFINE_PERCPU_RWSEM(name, is_static) \ static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name); \ is_static struct percpu_rw_semaphore name = { \ .rss = __RCU_SYNC_INITIALIZER(name.rss), \ .read_count = &__percpu_rwsem_rc_##name, \ .writer = __RCUWAIT_INITIALIZER(name.writer), \ .waiters = __WAIT_QUEUE_HEAD_INITIALIZER(name.waiters), \ .block = ATOMIC_INIT(0), \ __PERCPU_RWSEM_DEP_MAP_INIT(name) \ } #define DEFINE_PERCPU_RWSEM(name) \ __DEFINE_PERCPU_RWSEM(name, /* not static */) #define DEFINE_STATIC_PERCPU_RWSEM(name) \ __DEFINE_PERCPU_RWSEM(name, static) extern bool __percpu_down_read(struct percpu_rw_semaphore *, bool); static inline void percpu_down_read(struct percpu_rw_semaphore *sem) { might_sleep(); rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_); preempt_disable(); /* * We are in an RCU-sched read-side critical section, so the writer * cannot both change sem->state from readers_fast and start checking * counters while we are here. So if we see !sem->state, we know that * the writer won't be checking until we're past the preempt_enable() * and that once the synchronize_rcu() is done, the writer will see * anything we did within this RCU-sched read-size critical section. */ if (likely(rcu_sync_is_idle(&sem->rss))) this_cpu_inc(*sem->read_count); else __percpu_down_read(sem, false); /* Unconditional memory barrier */ /* * The preempt_enable() prevents the compiler from * bleeding the critical section out. */ preempt_enable(); } static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem) { bool ret = true; preempt_disable(); /* * Same as in percpu_down_read(). */ if (likely(rcu_sync_is_idle(&sem->rss))) this_cpu_inc(*sem->read_count); else ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */ preempt_enable(); /* * The barrier() from preempt_enable() prevents the compiler from * bleeding the critical section out. */ if (ret) rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_); return ret; } static inline void percpu_up_read(struct percpu_rw_semaphore *sem) { rwsem_release(&sem->dep_map, _RET_IP_); preempt_disable(); /* * Same as in percpu_down_read(). */ if (likely(rcu_sync_is_idle(&sem->rss))) { this_cpu_dec(*sem->read_count); } else { /* * slowpath; reader will only ever wake a single blocked * writer. */ smp_mb(); /* B matches C */ /* * In other words, if they see our decrement (presumably to * aggregate zero, as that is the only time it matters) they * will also see our critical section. */ this_cpu_dec(*sem->read_count); rcuwait_wake_up(&sem->writer); } preempt_enable(); } extern void percpu_down_write(struct percpu_rw_semaphore *); extern void percpu_up_write(struct percpu_rw_semaphore *); extern int __percpu_init_rwsem(struct percpu_rw_semaphore *, const char *, struct lock_class_key *); extern void percpu_free_rwsem(struct percpu_rw_semaphore *); #define percpu_init_rwsem(sem) \ ({ \ static struct lock_class_key rwsem_key; \ __percpu_init_rwsem(sem, #sem, &rwsem_key); \ }) #define percpu_rwsem_is_held(sem) lockdep_is_held(sem) #define percpu_rwsem_assert_held(sem) lockdep_assert_held(sem) static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem, bool read, unsigned long ip) { lock_release(&sem->dep_map, ip); } static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem, bool read, unsigned long ip) { lock_acquire(&sem->dep_map, 0, 1, read, 1, NULL, ip); } #endif
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SWAP_H #define _LINUX_SWAP_H #include <linux/spinlock.h> #include <linux/linkage.h> #include <linux/mmzone.h> #include <linux/list.h> #include <linux/memcontrol.h> #include <linux/sched.h> #include <linux/node.h> #include <linux/fs.h> #include <linux/atomic.h> #include <linux/page-flags.h> #include <asm/page.h> struct notifier_block; struct bio; struct pagevec; #define SWAP_FLAG_PREFER 0x8000 /* set if swap priority specified */ #define SWAP_FLAG_PRIO_MASK 0x7fff #define SWAP_FLAG_PRIO_SHIFT 0 #define SWAP_FLAG_DISCARD 0x10000 /* enable discard for swap */ #define SWAP_FLAG_DISCARD_ONCE 0x20000 /* discard swap area at swapon-time */ #define SWAP_FLAG_DISCARD_PAGES 0x40000 /* discard page-clusters after use */ #define SWAP_FLAGS_VALID (SWAP_FLAG_PRIO_MASK | SWAP_FLAG_PREFER | \ SWAP_FLAG_DISCARD | SWAP_FLAG_DISCARD_ONCE | \ SWAP_FLAG_DISCARD_PAGES) #define SWAP_BATCH 64 static inline int current_is_kswapd(void) { return current->flags & PF_KSWAPD; } /* * MAX_SWAPFILES defines the maximum number of swaptypes: things which can * be swapped to. The swap type and the offset into that swap type are * encoded into pte's and into pgoff_t's in the swapcache. Using five bits * for the type means that the maximum number of swapcache pages is 27 bits * on 32-bit-pgoff_t architectures. And that assumes that the architecture packs * the type/offset into the pte as 5/27 as well. */ #define MAX_SWAPFILES_SHIFT 5 /* * Use some of the swap files numbers for other purposes. This * is a convenient way to hook into the VM to trigger special * actions on faults. */ /* * Unaddressable device memory support. See include/linux/hmm.h and * Documentation/vm/hmm.rst. Short description is we need struct pages for * device memory that is unaddressable (inaccessible) by CPU, so that we can * migrate part of a process memory to device memory. * * When a page is migrated from CPU to device, we set the CPU page table entry * to a special SWP_DEVICE_* entry. */ #ifdef CONFIG_DEVICE_PRIVATE #define SWP_DEVICE_NUM 2 #define SWP_DEVICE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM) #define SWP_DEVICE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+1) #else #define SWP_DEVICE_NUM 0 #endif /* * NUMA node memory migration support */ #ifdef CONFIG_MIGRATION #define SWP_MIGRATION_NUM 2 #define SWP_MIGRATION_READ (MAX_SWAPFILES + SWP_HWPOISON_NUM) #define SWP_MIGRATION_WRITE (MAX_SWAPFILES + SWP_HWPOISON_NUM + 1) #else #define SWP_MIGRATION_NUM 0 #endif /* * Handling of hardware poisoned pages with memory corruption. */ #ifdef CONFIG_MEMORY_FAILURE #define SWP_HWPOISON_NUM 1 #define SWP_HWPOISON MAX_SWAPFILES #else #define SWP_HWPOISON_NUM 0 #endif #define MAX_SWAPFILES \ ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \ SWP_MIGRATION_NUM - SWP_HWPOISON_NUM) /* * Magic header for a swap area. The first part of the union is * what the swap magic looks like for the old (limited to 128MB) * swap area format, the second part of the union adds - in the * old reserved area - some extra information. Note that the first * kilobyte is reserved for boot loader or disk label stuff... * * Having the magic at the end of the PAGE_SIZE makes detecting swap * areas somewhat tricky on machines that support multiple page sizes. * For 2.5 we'll probably want to move the magic to just beyond the * bootbits... */ union swap_header { struct { char reserved[PAGE_SIZE - 10]; char magic[10]; /* SWAP-SPACE or SWAPSPACE2 */ } magic; struct { char bootbits[1024]; /* Space for disklabel etc. */ __u32 version; __u32 last_page; __u32 nr_badpages; unsigned char sws_uuid[16]; unsigned char sws_volume[16]; __u32 padding[117]; __u32 badpages[1]; } info; }; /* * current->reclaim_state points to one of these when a task is running * memory reclaim */ struct reclaim_state { unsigned long reclaimed_slab; }; #ifdef __KERNEL__ struct address_space; struct sysinfo; struct writeback_control; struct zone; /* * A swap extent maps a range of a swapfile's PAGE_SIZE pages onto a range of * disk blocks. A list of swap extents maps the entire swapfile. (Where the * term `swapfile' refers to either a blockdevice or an IS_REG file. Apart * from setup, they're handled identically. * * We always assume that blocks are of size PAGE_SIZE. */ struct swap_extent { struct rb_node rb_node; pgoff_t start_page; pgoff_t nr_pages; sector_t start_block; }; /* * Max bad pages in the new format.. */ #define MAX_SWAP_BADPAGES \ ((offsetof(union swap_header, magic.magic) - \ offsetof(union swap_header, info.badpages)) / sizeof(int)) enum { SWP_USED = (1 << 0), /* is slot in swap_info[] used? */ SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */ SWP_DISCARDABLE = (1 << 2), /* blkdev support discard */ SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */ SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */ SWP_CONTINUED = (1 << 5), /* swap_map has count continuation */ SWP_BLKDEV = (1 << 6), /* its a block device */ SWP_ACTIVATED = (1 << 7), /* set after swap_activate success */ SWP_FS_OPS = (1 << 8), /* swapfile operations go through fs */ SWP_AREA_DISCARD = (1 << 9), /* single-time swap area discards */ SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ SWP_VALID = (1 << 13), /* swap is valid to be operated on? */ /* add others here before... */ SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; #define SWAP_CLUSTER_MAX 32UL #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX /* Bit flag in swap_map */ #define SWAP_HAS_CACHE 0x40 /* Flag page is cached, in first swap_map */ #define COUNT_CONTINUED 0x80 /* Flag swap_map continuation for full count */ /* Special value in first swap_map */ #define SWAP_MAP_MAX 0x3e /* Max count */ #define SWAP_MAP_BAD 0x3f /* Note page is bad */ #define SWAP_MAP_SHMEM 0xbf /* Owned by shmem/tmpfs */ /* Special value in each swap_map continuation */ #define SWAP_CONT_MAX 0x7f /* Max count */ /* * We use this to track usage of a cluster. A cluster is a block of swap disk * space with SWAPFILE_CLUSTER pages long and naturally aligns in disk. All * free clusters are organized into a list. We fetch an entry from the list to * get a free cluster. * * The data field stores next cluster if the cluster is free or cluster usage * counter otherwise. The flags field determines if a cluster is free. This is * protected by swap_info_struct.lock. */ struct swap_cluster_info { spinlock_t lock; /* * Protect swap_cluster_info fields * and swap_info_struct->swap_map * elements correspond to the swap * cluster */ unsigned int data:24; unsigned int flags:8; }; #define CLUSTER_FLAG_FREE 1 /* This cluster is free */ #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */ #define CLUSTER_FLAG_HUGE 4 /* This cluster is backing a transparent huge page */ /* * We assign a cluster to each CPU, so each CPU can allocate swap entry from * its own cluster and swapout sequentially. The purpose is to optimize swapout * throughput. */ struct percpu_cluster { struct swap_cluster_info index; /* Current cluster index */ unsigned int next; /* Likely next allocation offset */ }; struct swap_cluster_list { struct swap_cluster_info head; struct swap_cluster_info tail; }; /* * The in-memory structure used to track swap areas. */ struct swap_info_struct { unsigned long flags; /* SWP_USED etc: see above */ signed short prio; /* swap priority of this type */ struct plist_node list; /* entry in swap_active_head */ signed char type; /* strange name for an index */ unsigned int max; /* extent of the swap_map */ unsigned char *swap_map; /* vmalloc'ed array of usage counts */ struct swap_cluster_info *cluster_info; /* cluster info. Only for SSD */ struct swap_cluster_list free_clusters; /* free clusters list */ unsigned int lowest_bit; /* index of first free in swap_map */ unsigned int highest_bit; /* index of last free in swap_map */ unsigned int pages; /* total of usable pages of swap */ unsigned int inuse_pages; /* number of those currently in use */ unsigned int cluster_next; /* likely index for next allocation */ unsigned int cluster_nr; /* countdown to next cluster search */ unsigned int __percpu *cluster_next_cpu; /*percpu index for next allocation */ struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */ struct rb_root swap_extent_root;/* root of the swap extent rbtree */ struct block_device *bdev; /* swap device or bdev of swap file */ struct file *swap_file; /* seldom referenced */ unsigned int old_block_size; /* seldom referenced */ #ifdef CONFIG_FRONTSWAP unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ atomic_t frontswap_pages; /* frontswap pages in-use counter */ #endif spinlock_t lock; /* * protect map scan related fields like * swap_map, lowest_bit, highest_bit, * inuse_pages, cluster_next, * cluster_nr, lowest_alloc, * highest_alloc, free/discard cluster * list. other fields are only changed * at swapon/swapoff, so are protected * by swap_lock. changing flags need * hold this lock and swap_lock. If * both locks need hold, hold swap_lock * first. */ spinlock_t cont_lock; /* * protect swap count continuation page * list. */ struct work_struct discard_work; /* discard worker */ struct swap_cluster_list discard_clusters; /* discard clusters list */ struct plist_node avail_lists[]; /* * entries in swap_avail_heads, one * entry per node. * Must be last as the number of the * array is nr_node_ids, which is not * a fixed value so have to allocate * dynamically. * And it has to be an array so that * plist_for_each_* can work. */ }; #ifdef CONFIG_64BIT #define SWAP_RA_ORDER_CEILING 5 #else /* Avoid stack overflow, because we need to save part of page table */ #define SWAP_RA_ORDER_CEILING 3 #define SWAP_RA_PTE_CACHE_SIZE (1 << SWAP_RA_ORDER_CEILING) #endif struct vma_swap_readahead { unsigned short win; unsigned short offset; unsigned short nr_pte; #ifdef CONFIG_64BIT pte_t *ptes; #else pte_t ptes[SWAP_RA_PTE_CACHE_SIZE]; #endif }; /* linux/mm/workingset.c */ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages); void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg); void workingset_refault(struct page *page, void *shadow); void workingset_activation(struct page *page); /* Only track the nodes of mappings with shadow entries */ void workingset_update_node(struct xa_node *node); #define mapping_set_update(xas, mapping) do { \ if (!dax_mapping(mapping) && !shmem_mapping(mapping)) \ xas_set_update(xas, workingset_update_node); \ } while (0) /* linux/mm/page_alloc.c */ extern unsigned long totalreserve_pages; extern unsigned long nr_free_buffer_pages(void); /* Definition of global_zone_page_state not available yet */ #define nr_free_pages() global_zone_page_state(NR_FREE_PAGES) /* linux/mm/swap.c */ extern void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages); extern void lru_note_cost_page(struct page *); extern void lru_cache_add(struct page *); extern void lru_add_page_tail(struct page *page, struct page *page_tail, struct lruvec *lruvec, struct list_head *head); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); extern void lru_add_drain_cpu(int cpu); extern void lru_add_drain_cpu_zone(struct zone *zone); extern void lru_add_drain_all(void); extern void rotate_reclaimable_page(struct page *page); extern void deactivate_file_page(struct page *page); extern void deactivate_page(struct page *page); extern void mark_page_lazyfree(struct page *page); extern void swap_setup(void); extern void lru_cache_add_inactive_or_unevictable(struct page *page, struct vm_area_struct *vma); /* linux/mm/vmscan.c */ extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); extern int __isolate_lru_page(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, bool may_swap); extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, gfp_t gfp_mask, bool noswap, pg_data_t *pgdat, unsigned long *nr_scanned); extern unsigned long shrink_all_memory(unsigned long nr_pages); extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; extern int sysctl_min_slab_ratio; #else #define node_reclaim_mode 0 #endif extern void check_move_unevictable_pages(struct pagevec *pvec); extern int kswapd_run(int nid); extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP #include <linux/blk_types.h> /* for bio_end_io_t */ /* linux/mm/page_io.c */ extern int swap_readpage(struct page *page, bool do_poll); extern int swap_writepage(struct page *page, struct writeback_control *wbc); extern void end_swap_bio_write(struct bio *bio); extern int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func); extern int swap_set_page_dirty(struct page *page); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, sector_t *); /* linux/mm/swap_state.c */ /* One swap address space for each 64M swap space */ #define SWAP_ADDRESS_SPACE_SHIFT 14 #define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) extern struct address_space *swapper_spaces[]; #define swap_address_space(entry) \ (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ >> SWAP_ADDRESS_SPACE_SHIFT]) extern unsigned long total_swapcache_pages(void); extern void show_swap_cache_info(void); extern int add_to_swap(struct page *page); extern void *get_shadow_from_swap_cache(swp_entry_t entry); extern int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp, void **shadowp); extern void __delete_from_swap_cache(struct page *page, swp_entry_t entry, void *shadow); extern void delete_from_swap_cache(struct page *); extern void clear_shadow_from_swap_cache(int type, unsigned long begin, unsigned long end); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); extern struct page *lookup_swap_cache(swp_entry_t entry, struct vm_area_struct *vma, unsigned long addr); struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, bool do_poll); extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, struct vm_area_struct *vma, unsigned long addr, bool *new_page_allocated); extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf); /* linux/mm/swapfile.c */ extern atomic_long_t nr_swap_pages; extern long total_swap_pages; extern atomic_t nr_rotate_swap; extern bool has_usable_swap(void); /* Swap 50% full? Release swapcache more aggressively.. */ static inline bool vm_swap_full(void) { return atomic_long_read(&nr_swap_pages) * 2 < total_swap_pages; } static inline long get_nr_swap_pages(void) { return atomic_long_read(&nr_swap_pages); } extern void si_swapinfo(struct sysinfo *); extern swp_entry_t get_swap_page(struct page *page); extern void put_swap_page(struct page *page, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); extern void swap_shmem_alloc(swp_entry_t); extern int swap_duplicate(swp_entry_t); extern int swapcache_prepare(swp_entry_t); extern void swap_free(swp_entry_t); extern void swapcache_free_entries(swp_entry_t *entries, int n); extern int free_swap_and_cache(swp_entry_t); int swap_type_of(dev_t device, sector_t offset); int find_first_swap(dev_t *device); extern unsigned int count_swap_pages(int, int); extern sector_t map_swap_page(struct page *, struct block_device **); extern sector_t swapdev_block(int, pgoff_t); extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); extern int __swp_swapcount(swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); extern bool reuse_swap_page(struct page *, int *); extern int try_to_free_swap(struct page *); struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); extern void exit_swap_address_space(unsigned int type); extern struct swap_info_struct *get_swap_device(swp_entry_t entry); sector_t swap_page_sector(struct page *page); static inline void put_swap_device(struct swap_info_struct *si) { rcu_read_unlock(); } #else /* CONFIG_SWAP */ static inline int swap_readpage(struct page *page, bool do_poll) { return 0; } static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) { return NULL; } #define swap_address_space(entry) (NULL) #define get_nr_swap_pages() 0L #define total_swap_pages 0L #define total_swapcache_pages() 0UL #define vm_swap_full() 0 #define si_swapinfo(val) \ do { (val)->freeswap = (val)->totalswap = 0; } while (0) /* only sparc can not include linux/pagemap.h in this file * so leave put_page and release_pages undeclared... */ #define free_page_and_swap_cache(page) \ put_page(page) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr)); static inline void show_swap_cache_info(void) { } #define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) #define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e));}) static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { return 0; } static inline void swap_shmem_alloc(swp_entry_t swp) { } static inline int swap_duplicate(swp_entry_t swp) { return 0; } static inline void swap_free(swp_entry_t swp) { } static inline void put_swap_page(struct page *page, swp_entry_t swp) { } static inline struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask, struct vm_fault *vmf) { return NULL; } static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, struct vm_fault *vmf) { return NULL; } static inline int swap_writepage(struct page *p, struct writeback_control *wbc) { return 0; } static inline struct page *lookup_swap_cache(swp_entry_t swp, struct vm_area_struct *vma, unsigned long addr) { return NULL; } static inline struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) { return find_get_page(mapping, index); } static inline int add_to_swap(struct page *page) { return 0; } static inline void *get_shadow_from_swap_cache(swp_entry_t entry) { return NULL; } static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp_mask, void **shadowp) { return -1; } static inline void __delete_from_swap_cache(struct page *page, swp_entry_t entry, void *shadow) { } static inline void delete_from_swap_cache(struct page *page) { } static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, unsigned long end) { } static inline int page_swapcount(struct page *page) { return 0; } static inline int __swap_count(swp_entry_t entry) { return 0; } static inline int __swp_swapcount(swp_entry_t entry) { return 0; } static inline int swp_swapcount(swp_entry_t entry) { return 0; } #define reuse_swap_page(page, total_map_swapcount) \ (page_trans_huge_mapcount(page, total_map_swapcount) == 1) static inline int try_to_free_swap(struct page *page) { return 0; } static inline swp_entry_t get_swap_page(struct page *page) { swp_entry_t entry; entry.val = 0; return entry; } #endif /* CONFIG_SWAP */ #ifdef CONFIG_THP_SWAP extern int split_swap_cluster(swp_entry_t entry); #else static inline int split_swap_cluster(swp_entry_t entry) { return 0; } #endif #ifdef CONFIG_MEMCG static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg) { /* Cgroup2 doesn't have per-cgroup swappiness */ if (cgroup_subsys_on_dfl(memory_cgrp_subsys)) return vm_swappiness; /* root ? */ if (mem_cgroup_disabled() || mem_cgroup_is_root(memcg)) return vm_swappiness; return memcg->swappiness; } #else static inline int mem_cgroup_swappiness(struct mem_cgroup *mem) { return vm_swappiness; } #endif #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP) extern void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask); #else static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask) { } #endif #ifdef CONFIG_MEMCG_SWAP extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry); extern int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry); extern void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages); extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg); extern bool mem_cgroup_swap_full(struct page *page); #else static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry) { } static inline int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) { return 0; } static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages) { } static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg) { return get_nr_swap_pages(); } static inline bool mem_cgroup_swap_full(struct page *page) { return vm_swap_full(); } #endif #endif /* __KERNEL__*/ #endif /* _LINUX_SWAP_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef LINUX_CRASH_DUMP_H #define LINUX_CRASH_DUMP_H #include <linux/kexec.h> #include <linux/proc_fs.h> #include <linux/elf.h> #include <linux/pgtable.h> #include <uapi/linux/vmcore.h> #include <linux/pgtable.h> /* for pgprot_t */ #ifdef CONFIG_CRASH_DUMP #define ELFCORE_ADDR_MAX (-1ULL) #define ELFCORE_ADDR_ERR (-2ULL) extern unsigned long long elfcorehdr_addr; extern unsigned long long elfcorehdr_size; extern int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size); extern void elfcorehdr_free(unsigned long long addr); extern ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos); extern ssize_t elfcorehdr_read_notes(char *buf, size_t count, u64 *ppos); extern int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from, unsigned long pfn, unsigned long size, pgprot_t prot); extern ssize_t copy_oldmem_page(unsigned long, char *, size_t, unsigned long, int); extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, size_t csize, unsigned long offset, int userbuf); void vmcore_cleanup(void); /* Architecture code defines this if there are other possible ELF * machine types, e.g. on bi-arch capable hardware. */ #ifndef vmcore_elf_check_arch_cross #define vmcore_elf_check_arch_cross(x) 0 #endif /* * Architecture code can redefine this if there are any special checks * needed for 32-bit ELF or 64-bit ELF vmcores. In case of 32-bit * only architecture, vmcore_elf64_check_arch can be set to zero. */ #ifndef vmcore_elf32_check_arch #define vmcore_elf32_check_arch(x) elf_check_arch(x) #endif #ifndef vmcore_elf64_check_arch #define vmcore_elf64_check_arch(x) (elf_check_arch(x) || vmcore_elf_check_arch_cross(x)) #endif /* * is_kdump_kernel() checks whether this kernel is booting after a panic of * previous kernel or not. This is determined by checking if previous kernel * has passed the elf core header address on command line. * * This is not just a test if CONFIG_CRASH_DUMP is enabled or not. It will * return true if CONFIG_CRASH_DUMP=y and if kernel is booting after a panic * of previous kernel. */ static inline bool is_kdump_kernel(void) { return elfcorehdr_addr != ELFCORE_ADDR_MAX; } /* is_vmcore_usable() checks if the kernel is booting after a panic and * the vmcore region is usable. * * This makes use of the fact that due to alignment -2ULL is not * a valid pointer, much in the vain of IS_ERR(), except * dealing directly with an unsigned long long rather than a pointer. */ static inline int is_vmcore_usable(void) { return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR ? 1 : 0; } /* vmcore_unusable() marks the vmcore as unusable, * without disturbing the logic of is_kdump_kernel() */ static inline void vmcore_unusable(void) { if (is_kdump_kernel()) elfcorehdr_addr = ELFCORE_ADDR_ERR; } #define HAVE_OLDMEM_PFN_IS_RAM 1 extern int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn)); extern void unregister_oldmem_pfn_is_ram(void); #else /* !CONFIG_CRASH_DUMP */ static inline bool is_kdump_kernel(void) { return 0; } #endif /* CONFIG_CRASH_DUMP */ /* Device Dump information to be filled by drivers */ struct vmcoredd_data { char dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Unique name of the dump */ unsigned int size; /* Size of the dump */ /* Driver's registered callback to be invoked to collect dump */ int (*vmcoredd_callback)(struct vmcoredd_data *data, void *buf); }; #ifdef CONFIG_PROC_VMCORE_DEVICE_DUMP int vmcore_add_device_dump(struct vmcoredd_data *data); #else static inline int vmcore_add_device_dump(struct vmcoredd_data *data) { return -EOPNOTSUPP; } #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */ #ifdef CONFIG_PROC_VMCORE ssize_t read_from_oldmem(char *buf, size_t count, u64 *ppos, int userbuf, bool encrypted); #else static inline ssize_t read_from_oldmem(char *buf, size_t count, u64 *ppos, int userbuf, bool encrypted) { return -EOPNOTSUPP; } #endif /* CONFIG_PROC_VMCORE */ #endif /* LINUX_CRASHDUMP_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 /* * DRBG based on NIST SP800-90A * * Copyright Stephan Mueller <smueller@chronox.de>, 2014 * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, and the entire permission notice in its entirety, * including the disclaimer of warranties. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * ALTERNATIVELY, this product may be distributed under the terms of * the GNU General Public License, in which case the provisions of the GPL are * required INSTEAD OF the above restrictions. (This clause is * necessary due to a potential bad interaction between the GPL and * the restrictions contained in a BSD-style copyright.) * * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, ALL OF * WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE * USE OF THIS SOFTWARE, EVEN IF NOT ADVISED OF THE POSSIBILITY OF SUCH * DAMAGE. */ #ifndef _DRBG_H #define _DRBG_H #include <linux/random.h> #include <linux/scatterlist.h> #include <crypto/hash.h> #include <crypto/skcipher.h> #include <linux/module.h> #include <linux/crypto.h> #include <linux/slab.h> #include <crypto/internal/rng.h> #include <crypto/rng.h> #include <linux/fips.h> #include <linux/mutex.h> #include <linux/list.h> #include <linux/workqueue.h> /* * Concatenation Helper and string operation helper * * SP800-90A requires the concatenation of different data. To avoid copying * buffers around or allocate additional memory, the following data structure * is used to point to the original memory with its size. In addition, it * is used to build a linked list. The linked list defines the concatenation * of individual buffers. The order of memory block referenced in that * linked list determines the order of concatenation. */ struct drbg_string { const unsigned char *buf; size_t len; struct list_head list; }; static inline void drbg_string_fill(struct drbg_string *string, const unsigned char *buf, size_t len) { string->buf = buf; string->len = len; INIT_LIST_HEAD(&string->list); } struct drbg_state; typedef uint32_t drbg_flag_t; struct drbg_core { drbg_flag_t flags; /* flags for the cipher */ __u8 statelen; /* maximum state length */ __u8 blocklen_bytes; /* block size of output in bytes */ char cra_name[CRYPTO_MAX_ALG_NAME]; /* mapping to kernel crypto API */ /* kernel crypto API backend cipher name */ char backend_cra_name[CRYPTO_MAX_ALG_NAME]; }; struct drbg_state_ops { int (*update)(struct drbg_state *drbg, struct list_head *seed, int reseed); int (*generate)(struct drbg_state *drbg, unsigned char *buf, unsigned int buflen, struct list_head *addtl); int (*crypto_init)(struct drbg_state *drbg); int (*crypto_fini)(struct drbg_state *drbg); }; struct drbg_test_data { struct drbg_string *testentropy; /* TEST PARAMETER: test entropy */ }; struct drbg_state { struct mutex drbg_mutex; /* lock around DRBG */ unsigned char *V; /* internal state 10.1.1.1 1a) */ unsigned char *Vbuf; /* hash: static value 10.1.1.1 1b) hmac / ctr: key */ unsigned char *C; unsigned char *Cbuf; /* Number of RNG requests since last reseed -- 10.1.1.1 1c) */ size_t reseed_ctr; size_t reseed_threshold; /* some memory the DRBG can use for its operation */ unsigned char *scratchpad; unsigned char *scratchpadbuf; void *priv_data; /* Cipher handle */ struct crypto_skcipher *ctr_handle; /* CTR mode cipher handle */ struct skcipher_request *ctr_req; /* CTR mode request handle */ __u8 *outscratchpadbuf; /* CTR mode output scratchpad */ __u8 *outscratchpad; /* CTR mode aligned outbuf */ struct crypto_wait ctr_wait; /* CTR mode async wait obj */ struct scatterlist sg_in, sg_out; /* CTR mode SGLs */ bool seeded; /* DRBG fully seeded? */ bool pr; /* Prediction resistance enabled? */ bool fips_primed; /* Continuous test primed? */ unsigned char *prev; /* FIPS 140-2 continuous test value */ struct work_struct seed_work; /* asynchronous seeding support */ struct crypto_rng *jent; const struct drbg_state_ops *d_ops; const struct drbg_core *core; struct drbg_string test_data; struct random_ready_callback random_ready; }; static inline __u8 drbg_statelen(struct drbg_state *drbg) { if (drbg && drbg->core) return drbg->core->statelen; return 0; } static inline __u8 drbg_blocklen(struct drbg_state *drbg) { if (drbg && drbg->core) return drbg->core->blocklen_bytes; return 0; } static inline __u8 drbg_keylen(struct drbg_state *drbg) { if (drbg && drbg->core) return (drbg->core->statelen - drbg->core->blocklen_bytes); return 0; } static inline size_t drbg_max_request_bytes(struct drbg_state *drbg) { /* SP800-90A requires the limit 2**19 bits, but we return bytes */ return (1 << 16); } static inline size_t drbg_max_addtl(struct drbg_state *drbg) { /* SP800-90A requires 2**35 bytes additional info str / pers str */ #if (__BITS_PER_LONG == 32) /* * SP800-90A allows smaller maximum numbers to be returned -- we * return SIZE_MAX - 1 to allow the verification of the enforcement * of this value in drbg_healthcheck_sanity. */ return (SIZE_MAX - 1); #else return (1UL<<35); #endif } static inline size_t drbg_max_requests(struct drbg_state *drbg) { /* SP800-90A requires 2**48 maximum requests before reseeding */ return (1<<20); } /* * This is a wrapper to the kernel crypto API function of * crypto_rng_generate() to allow the caller to provide additional data. * * @drng DRBG handle -- see crypto_rng_get_bytes * @outbuf output buffer -- see crypto_rng_get_bytes * @outlen length of output buffer -- see crypto_rng_get_bytes * @addtl_input additional information string input buffer * @addtllen length of additional information string buffer * * return * see crypto_rng_get_bytes */ static inline int crypto_drbg_get_bytes_addtl(struct crypto_rng *drng, unsigned char *outbuf, unsigned int outlen, struct drbg_string *addtl) { return crypto_rng_generate(drng, addtl->buf, addtl->len, outbuf, outlen); } /* * TEST code * * This is a wrapper to the kernel crypto API function of * crypto_rng_generate() to allow the caller to provide additional data and * allow furnishing of test_data * * @drng DRBG handle -- see crypto_rng_get_bytes * @outbuf output buffer -- see crypto_rng_get_bytes * @outlen length of output buffer -- see crypto_rng_get_bytes * @addtl_input additional information string input buffer * @addtllen length of additional information string buffer * @test_data filled test data * * return * see crypto_rng_get_bytes */ static inline int crypto_drbg_get_bytes_addtl_test(struct crypto_rng *drng, unsigned char *outbuf, unsigned int outlen, struct drbg_string *addtl, struct drbg_test_data *test_data) { crypto_rng_set_entropy(drng, test_data->testentropy->buf, test_data->testentropy->len); return crypto_rng_generate(drng, addtl->buf, addtl->len, outbuf, outlen); } /* * TEST code * * This is a wrapper to the kernel crypto API function of * crypto_rng_reset() to allow the caller to provide test_data * * @drng DRBG handle -- see crypto_rng_reset * @pers personalization string input buffer * @perslen length of additional information string buffer * @test_data filled test data * * return * see crypto_rng_reset */ static inline int crypto_drbg_reset_test(struct crypto_rng *drng, struct drbg_string *pers, struct drbg_test_data *test_data) { crypto_rng_set_entropy(drng, test_data->testentropy->buf, test_data->testentropy->len); return crypto_rng_reset(drng, pers->buf, pers->len); } /* DRBG type flags */ #define DRBG_CTR ((drbg_flag_t)1<<0) #define DRBG_HMAC ((drbg_flag_t)1<<1) #define DRBG_HASH ((drbg_flag_t)1<<2) #define DRBG_TYPE_MASK (DRBG_CTR | DRBG_HMAC | DRBG_HASH) /* DRBG strength flags */ #define DRBG_STRENGTH128 ((drbg_flag_t)1<<3) #define DRBG_STRENGTH192 ((drbg_flag_t)1<<4) #define DRBG_STRENGTH256 ((drbg_flag_t)1<<5) #define DRBG_STRENGTH_MASK (DRBG_STRENGTH128 | DRBG_STRENGTH192 | \ DRBG_STRENGTH256) enum drbg_prefixes { DRBG_PREFIX0 = 0x00, DRBG_PREFIX1, DRBG_PREFIX2, DRBG_PREFIX3 }; #endif /* _DRBG_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __CGROUP_INTERNAL_H #define __CGROUP_INTERNAL_H #include <linux/cgroup.h> #include <linux/kernfs.h> #include <linux/workqueue.h> #include <linux/list.h> #include <linux/refcount.h> #include <linux/fs_parser.h> #define TRACE_CGROUP_PATH_LEN 1024 extern spinlock_t trace_cgroup_path_lock; extern char trace_cgroup_path[TRACE_CGROUP_PATH_LEN]; extern bool cgroup_debug; extern void __init enable_debug_cgroup(void); /* * cgroup_path() takes a spin lock. It is good practice not to take * spin locks within trace point handlers, as they are mostly hidden * from normal view. As cgroup_path() can take the kernfs_rename_lock * spin lock, it is best to not call that function from the trace event * handler. * * Note: trace_cgroup_##type##_enabled() is a static branch that will only * be set when the trace event is enabled. */ #define TRACE_CGROUP_PATH(type, cgrp, ...) \ do { \ if (trace_cgroup_##type##_enabled()) { \ unsigned long flags; \ spin_lock_irqsave(&trace_cgroup_path_lock, \ flags); \ cgroup_path(cgrp, trace_cgroup_path, \ TRACE_CGROUP_PATH_LEN); \ trace_cgroup_##type(cgrp, trace_cgroup_path, \ ##__VA_ARGS__); \ spin_unlock_irqrestore(&trace_cgroup_path_lock, \ flags); \ } \ } while (0) /* * The cgroup filesystem superblock creation/mount context. */ struct cgroup_fs_context { struct kernfs_fs_context kfc; struct cgroup_root *root; struct cgroup_namespace *ns; unsigned int flags; /* CGRP_ROOT_* flags */ /* cgroup1 bits */ bool cpuset_clone_children; bool none; /* User explicitly requested empty subsystem */ bool all_ss; /* Seen 'all' option */ u16 subsys_mask; /* Selected subsystems */ char *name; /* Hierarchy name */ char *release_agent; /* Path for release notifications */ }; static inline struct cgroup_fs_context *cgroup_fc2context(struct fs_context *fc) { struct kernfs_fs_context *kfc = fc->fs_private; return container_of(kfc, struct cgroup_fs_context, kfc); } /* * A cgroup can be associated with multiple css_sets as different tasks may * belong to different cgroups on different hierarchies. In the other * direction, a css_set is naturally associated with multiple cgroups. * This M:N relationship is represented by the following link structure * which exists for each association and allows traversing the associations * from both sides. */ struct cgrp_cset_link { /* the cgroup and css_set this link associates */ struct cgroup *cgrp; struct css_set *cset; /* list of cgrp_cset_links anchored at cgrp->cset_links */ struct list_head cset_link; /* list of cgrp_cset_links anchored at css_set->cgrp_links */ struct list_head cgrp_link; }; /* used to track tasks and csets during migration */ struct cgroup_taskset { /* the src and dst cset list running through cset->mg_node */ struct list_head src_csets; struct list_head dst_csets; /* the number of tasks in the set */ int nr_tasks; /* the subsys currently being processed */ int ssid; /* * Fields for cgroup_taskset_*() iteration. * * Before migration is committed, the target migration tasks are on * ->mg_tasks of the csets on ->src_csets. After, on ->mg_tasks of * the csets on ->dst_csets. ->csets point to either ->src_csets * or ->dst_csets depending on whether migration is committed. * * ->cur_csets and ->cur_task point to the current task position * during iteration. */ struct list_head *csets; struct css_set *cur_cset; struct task_struct *cur_task; }; /* migration context also tracks preloading */ struct cgroup_mgctx { /* * Preloaded source and destination csets. Used to guarantee * atomic success or failure on actual migration. */ struct list_head preloaded_src_csets; struct list_head preloaded_dst_csets; /* tasks and csets to migrate */ struct cgroup_taskset tset; /* subsystems affected by migration */ u16 ss_mask; }; #define CGROUP_TASKSET_INIT(tset) \ { \ .src_csets = LIST_HEAD_INIT(tset.src_csets), \ .dst_csets = LIST_HEAD_INIT(tset.dst_csets), \ .csets = &tset.src_csets, \ } #define CGROUP_MGCTX_INIT(name) \ { \ LIST_HEAD_INIT(name.preloaded_src_csets), \ LIST_HEAD_INIT(name.preloaded_dst_csets), \ CGROUP_TASKSET_INIT(name.tset), \ } #define DEFINE_CGROUP_MGCTX(name) \ struct cgroup_mgctx name = CGROUP_MGCTX_INIT(name) extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; extern struct cgroup_subsys *cgroup_subsys[]; extern struct list_head cgroup_roots; extern struct file_system_type cgroup_fs_type; /* iterate across the hierarchies */ #define for_each_root(root) \ list_for_each_entry((root), &cgroup_roots, root_list) /** * for_each_subsys - iterate all enabled cgroup subsystems * @ss: the iteration cursor * @ssid: the index of @ss, CGROUP_SUBSYS_COUNT after reaching the end */ #define for_each_subsys(ss, ssid) \ for ((ssid) = 0; (ssid) < CGROUP_SUBSYS_COUNT && \ (((ss) = cgroup_subsys[ssid]) || true); (ssid)++) static inline bool cgroup_is_dead(const struct cgroup *cgrp) { return !(cgrp->self.flags & CSS_ONLINE); } static inline bool notify_on_release(const struct cgroup *cgrp) { return test_bit(CGRP_NOTIFY_ON_RELEASE, &cgrp->flags); } void put_css_set_locked(struct css_set *cset); static inline void put_css_set(struct css_set *cset) { unsigned long flags; /* * Ensure that the refcount doesn't hit zero while any readers * can see it. Similar to atomic_dec_and_lock(), but for an * rwlock */ if (refcount_dec_not_one(&cset->refcount)) return; spin_lock_irqsave(&css_set_lock, flags); put_css_set_locked(cset); spin_unlock_irqrestore(&css_set_lock, flags); } /* * refcounted get/put for css_set objects */ static inline void get_css_set(struct css_set *cset) { refcount_inc(&cset->refcount); } bool cgroup_ssid_enabled(int ssid); bool cgroup_on_dfl(const struct cgroup *cgrp); bool cgroup_is_thread_root(struct cgroup *cgrp); bool cgroup_is_threaded(struct cgroup *cgrp); struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root); struct cgroup *task_cgroup_from_root(struct task_struct *task, struct cgroup_root *root); struct cgroup *cgroup_kn_lock_live(struct kernfs_node *kn, bool drain_offline); void cgroup_kn_unlock(struct kernfs_node *kn); int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen, struct cgroup_namespace *ns); void cgroup_free_root(struct cgroup_root *root); void init_cgroup_root(struct cgroup_fs_context *ctx); int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask); int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask); int cgroup_do_get_tree(struct fs_context *fc); int cgroup_migrate_vet_dst(struct cgroup *dst_cgrp); void cgroup_migrate_finish(struct cgroup_mgctx *mgctx); void cgroup_migrate_add_src(struct css_set *src_cset, struct cgroup *dst_cgrp, struct cgroup_mgctx *mgctx); int cgroup_migrate_prepare_dst(struct cgroup_mgctx *mgctx); int cgroup_migrate(struct task_struct *leader, bool threadgroup, struct cgroup_mgctx *mgctx); int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, bool threadgroup); struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, bool *locked) __acquires(&cgroup_threadgroup_rwsem); void cgroup_procs_write_finish(struct task_struct *task, bool locked) __releases(&cgroup_threadgroup_rwsem); void cgroup_lock_and_drain_offline(struct cgroup *cgrp); int cgroup_mkdir(struct kernfs_node *parent_kn, const char *name, umode_t mode); int cgroup_rmdir(struct kernfs_node *kn); int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node, struct kernfs_root *kf_root); int __cgroup_task_count(const struct cgroup *cgrp); int cgroup_task_count(const struct cgroup *cgrp); /* * rstat.c */ int cgroup_rstat_init(struct cgroup *cgrp); void cgroup_rstat_exit(struct cgroup *cgrp); void cgroup_rstat_boot(void); void cgroup_base_stat_cputime_show(struct seq_file *seq); /* * namespace.c */ extern const struct proc_ns_operations cgroupns_operations; /* * cgroup-v1.c */ extern struct cftype cgroup1_base_files[]; extern struct kernfs_syscall_ops cgroup1_kf_syscall_ops; extern const struct fs_parameter_spec cgroup1_fs_parameters[]; int proc_cgroupstats_show(struct seq_file *m, void *v); bool cgroup1_ssid_disabled(int ssid); void cgroup1_pidlist_destroy_all(struct cgroup *cgrp); void cgroup1_release_agent(struct work_struct *work); void cgroup1_check_for_release(struct cgroup *cgrp); int cgroup1_parse_param(struct fs_context *fc, struct fs_parameter *param); int cgroup1_get_tree(struct fs_context *fc); int cgroup1_reconfigure(struct fs_context *ctx); #endif /* __CGROUP_INTERNAL_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 /* SPDX-License-Identifier: GPL-2.0-or-later */ /* * Queued spinlock * * (C) Copyright 2013-2015 Hewlett-Packard Development Company, L.P. * (C) Copyright 2015 Hewlett-Packard Enterprise Development LP * * Authors: Waiman Long <waiman.long@hpe.com> */ #ifndef __ASM_GENERIC_QSPINLOCK_H #define __ASM_GENERIC_QSPINLOCK_H #include <asm-generic/qspinlock_types.h> #include <linux/atomic.h> #ifndef queued_spin_is_locked /** * queued_spin_is_locked - is the spinlock locked? * @lock: Pointer to queued spinlock structure * Return: 1 if it is locked, 0 otherwise */ static __always_inline int queued_spin_is_locked(struct qspinlock *lock) { /* * Any !0 state indicates it is locked, even if _Q_LOCKED_VAL * isn't immediately observable. */ return atomic_read(&lock->val); } #endif /** * queued_spin_value_unlocked - is the spinlock structure unlocked? * @lock: queued spinlock structure * Return: 1 if it is unlocked, 0 otherwise * * N.B. Whenever there are tasks waiting for the lock, it is considered * locked wrt the lockref code to avoid lock stealing by the lockref * code and change things underneath the lock. This also allows some * optimizations to be applied without conflict with lockref. */ static __always_inline int queued_spin_value_unlocked(struct qspinlock lock) { return !atomic_read(&lock.val); } /** * queued_spin_is_contended - check if the lock is contended * @lock : Pointer to queued spinlock structure * Return: 1 if lock contended, 0 otherwise */ static __always_inline int queued_spin_is_contended(struct qspinlock *lock) { return atomic_read(&lock->val) & ~_Q_LOCKED_MASK; } /** * queued_spin_trylock - try to acquire the queued spinlock * @lock : Pointer to queued spinlock structure * Return: 1 if lock acquired, 0 if failed */ static __always_inline int queued_spin_trylock(struct qspinlock *lock) { u32 val = atomic_read(&lock->val); if (unlikely(val)) return 0; return likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL)); } extern void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); #ifndef queued_spin_lock /** * queued_spin_lock - acquire a queued spinlock * @lock: Pointer to queued spinlock structure */ static __always_inline void queued_spin_lock(struct qspinlock *lock) { u32 val = 0; if (likely(atomic_try_cmpxchg_acquire(&lock->val, &val, _Q_LOCKED_VAL))) return; queued_spin_lock_slowpath(lock, val); } #endif #ifndef queued_spin_unlock /** * queued_spin_unlock - release a queued spinlock * @lock : Pointer to queued spinlock structure */ static __always_inline void queued_spin_unlock(struct qspinlock *lock) { /* * unlock() needs release semantics: */ smp_store_release(&lock->locked, 0); } #endif #ifndef virt_spin_lock static __always_inline bool virt_spin_lock(struct qspinlock *lock) { return false; } #endif /* * Remapping spinlock architecture specific functions to the corresponding * queued spinlock functions. */ #define arch_spin_is_locked(l) queued_spin_is_locked(l) #define arch_spin_is_contended(l) queued_spin_is_contended(l) #define arch_spin_value_unlocked(l) queued_spin_value_unlocked(l) #define arch_spin_lock(l) queued_spin_lock(l) #define arch_spin_trylock(l) queued_spin_trylock(l) #define arch_spin_unlock(l) queued_spin_unlock(l) #endif /* __ASM_GENERIC_QSPINLOCK_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_SWAPOPS_H #define _LINUX_SWAPOPS_H #include <linux/radix-tree.h> #include <linux/bug.h> #include <linux/mm_types.h> #ifdef CONFIG_MMU /* * swapcache pages are stored in the swapper_space radix tree. We want to * get good packing density in that tree, so the index should be dense in * the low-order bits. * * We arrange the `type' and `offset' fields so that `type' is at the seven * high-order bits of the swp_entry_t and `offset' is right-aligned in the * remaining bits. Although `type' itself needs only five bits, we allow for * shmem/tmpfs to shift it all up a further two bits: see swp_to_radix_entry(). * * swp_entry_t's are *never* stored anywhere in their arch-dependent format. */ #define SWP_TYPE_SHIFT (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT) #define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1) /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { if (pte_swp_soft_dirty(pte)) pte = pte_swp_clear_soft_dirty(pte); if (pte_swp_uffd_wp(pte)) pte = pte_swp_clear_uffd_wp(pte); return pte; } /* * Store a type+offset into a swp_entry_t in an arch-independent format */ static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset) { swp_entry_t ret; ret.val = (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK); return ret; } /* * Extract the `type' field from a swp_entry_t. The swp_entry_t is in * arch-independent format */ static inline unsigned swp_type(swp_entry_t entry) { return (entry.val >> SWP_TYPE_SHIFT); } /* * Extract the `offset' field from a swp_entry_t. The swp_entry_t is in * arch-independent format */ static inline pgoff_t swp_offset(swp_entry_t entry) { return entry.val & SWP_OFFSET_MASK; } /* check whether a pte points to a swap entry */ static inline int is_swap_pte(pte_t pte) { return !pte_none(pte) && !pte_present(pte); } /* * Convert the arch-dependent pte representation of a swp_entry_t into an * arch-independent swp_entry_t. */ static inline swp_entry_t pte_to_swp_entry(pte_t pte) { swp_entry_t arch_entry; pte = pte_swp_clear_flags(pte); arch_entry = __pte_to_swp_entry(pte); return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); } /* * Convert the arch-independent representation of a swp_entry_t into the * arch-dependent pte representation. */ static inline pte_t swp_entry_to_pte(swp_entry_t entry) { swp_entry_t arch_entry; arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); return __swp_entry_to_pte(arch_entry); } static inline swp_entry_t radix_to_swp_entry(void *arg) { swp_entry_t entry; entry.val = xa_to_value(arg); return entry; } static inline void *swp_to_radix_entry(swp_entry_t entry) { return xa_mk_value(entry.val); } #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) static inline swp_entry_t make_device_private_entry(struct page *page, bool write) { return swp_entry(write ? SWP_DEVICE_WRITE : SWP_DEVICE_READ, page_to_pfn(page)); } static inline bool is_device_private_entry(swp_entry_t entry) { int type = swp_type(entry); return type == SWP_DEVICE_READ || type == SWP_DEVICE_WRITE; } static inline void make_device_private_entry_read(swp_entry_t *entry) { *entry = swp_entry(SWP_DEVICE_READ, swp_offset(*entry)); } static inline bool is_write_device_private_entry(swp_entry_t entry) { return unlikely(swp_type(entry) == SWP_DEVICE_WRITE); } static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry) { return swp_offset(entry); } static inline struct page *device_private_entry_to_page(swp_entry_t entry) { return pfn_to_page(swp_offset(entry)); } #else /* CONFIG_DEVICE_PRIVATE */ static inline swp_entry_t make_device_private_entry(struct page *page, bool write) { return swp_entry(0, 0); } static inline void make_device_private_entry_read(swp_entry_t *entry) { } static inline bool is_device_private_entry(swp_entry_t entry) { return false; } static inline bool is_write_device_private_entry(swp_entry_t entry) { return false; } static inline unsigned long device_private_entry_to_pfn(swp_entry_t entry) { return 0; } static inline struct page *device_private_entry_to_page(swp_entry_t entry) { return NULL; } #endif /* CONFIG_DEVICE_PRIVATE */ #ifdef CONFIG_MIGRATION static inline swp_entry_t make_migration_entry(struct page *page, int write) { BUG_ON(!PageLocked(compound_head(page))); return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, page_to_pfn(page)); } static inline int is_migration_entry(swp_entry_t entry) { return unlikely(swp_type(entry) == SWP_MIGRATION_READ || swp_type(entry) == SWP_MIGRATION_WRITE); } static inline int is_write_migration_entry(swp_entry_t entry) { return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE); } static inline unsigned long migration_entry_to_pfn(swp_entry_t entry) { return swp_offset(entry); } static inline struct page *migration_entry_to_page(swp_entry_t entry) { struct page *p = pfn_to_page(swp_offset(entry)); /* * Any use of migration entries may only occur while the * corresponding page is locked */ BUG_ON(!PageLocked(compound_head(p))); return p; } static inline void make_migration_entry_read(swp_entry_t *entry) { *entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry)); } extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl); extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); extern void migration_entry_wait_huge(struct vm_area_struct *vma, struct mm_struct *mm, pte_t *pte); #else #define make_migration_entry(page, write) swp_entry(0, 0) static inline int is_migration_entry(swp_entry_t swp) { return 0; } static inline unsigned long migration_entry_to_pfn(swp_entry_t entry) { return 0; } static inline struct page *migration_entry_to_page(swp_entry_t entry) { return NULL; } static inline void make_migration_entry_read(swp_entry_t *entryp) { } static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl) { } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } static inline void migration_entry_wait_huge(struct vm_area_struct *vma, struct mm_struct *mm, pte_t *pte) { } static inline int is_write_migration_entry(swp_entry_t entry) { return 0; } #endif struct page_vma_mapped_walk; #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page); extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new); extern void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd); static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) { swp_entry_t arch_entry; if (pmd_swp_soft_dirty(pmd)) pmd = pmd_swp_clear_soft_dirty(pmd); if (pmd_swp_uffd_wp(pmd)) pmd = pmd_swp_clear_uffd_wp(pmd); arch_entry = __pmd_to_swp_entry(pmd); return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry)); } static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) { swp_entry_t arch_entry; arch_entry = __swp_entry(swp_type(entry), swp_offset(entry)); return __swp_entry_to_pmd(arch_entry); } static inline int is_pmd_migration_entry(pmd_t pmd) { return !pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd)); } #else static inline void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page) { BUILD_BUG(); } static inline void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new) { BUILD_BUG(); } static inline void pmd_migration_entry_wait(struct mm_struct *m, pmd_t *p) { } static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd) { return swp_entry(0, 0); } static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) { return __pmd(0); } static inline int is_pmd_migration_entry(pmd_t pmd) { return 0; } #endif #ifdef CONFIG_MEMORY_FAILURE extern atomic_long_t num_poisoned_pages __read_mostly; /* * Support for hardware poisoned pages */ static inline swp_entry_t make_hwpoison_entry(struct page *page) { BUG_ON(!PageLocked(page)); return swp_entry(SWP_HWPOISON, page_to_pfn(page)); } static inline int is_hwpoison_entry(swp_entry_t entry) { return swp_type(entry) == SWP_HWPOISON; } static inline void num_poisoned_pages_inc(void) { atomic_long_inc(&num_poisoned_pages); } static inline void num_poisoned_pages_dec(void) { atomic_long_dec(&num_poisoned_pages); } #else static inline swp_entry_t make_hwpoison_entry(struct page *page) { return swp_entry(0, 0); } static inline int is_hwpoison_entry(swp_entry_t swp) { return 0; } static inline void num_poisoned_pages_inc(void) { } #endif #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \ defined(CONFIG_DEVICE_PRIVATE) static inline int non_swap_entry(swp_entry_t entry) { return swp_type(entry) >= MAX_SWAPFILES; } #else static inline int non_swap_entry(swp_entry_t entry) { return 0; } #endif #endif /* CONFIG_MMU */ #endif /* _LINUX_SWAPOPS_H */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 // SPDX-License-Identifier: GPL-2.0 /* * kobject.h - generic kernel object infrastructure. * * Copyright (c) 2002-2003 Patrick Mochel * Copyright (c) 2002-2003 Open Source Development Labs * Copyright (c) 2006-2008 Greg Kroah-Hartman <greg@kroah.com> * Copyright (c) 2006-2008 Novell Inc. * * Please read Documentation/core-api/kobject.rst before using the kobject * interface, ESPECIALLY the parts about reference counts and object * destructors. */ #ifndef _KOBJECT_H_ #define _KOBJECT_H_ #include <linux/types.h> #include <linux/list.h> #include <linux/sysfs.h> #include <linux/compiler.h> #include <linux/spinlock.h> #include <linux/kref.h> #include <linux/kobject_ns.h> #include <linux/kernel.h> #include <linux/wait.h> #include <linux/atomic.h> #include <linux/workqueue.h> #include <linux/uidgid.h> #define UEVENT_HELPER_PATH_LEN 256 #define UEVENT_NUM_ENVP 64 /* number of env pointers */ #define UEVENT_BUFFER_SIZE 2048 /* buffer for the variables */ #ifdef CONFIG_UEVENT_HELPER /* path to the userspace helper executed on an event */ extern char uevent_helper[]; #endif /* counter to tag the uevent, read only except for the kobject core */ extern u64 uevent_seqnum; /* * The actions here must match the index to the string array * in lib/kobject_uevent.c * * Do not add new actions here without checking with the driver-core * maintainers. Action strings are not meant to express subsystem * or device specific properties. In most cases you want to send a * kobject_uevent_env(kobj, KOBJ_CHANGE, env) with additional event * specific variables added to the event environment. */ enum kobject_action { KOBJ_ADD, KOBJ_REMOVE, KOBJ_CHANGE, KOBJ_MOVE, KOBJ_ONLINE, KOBJ_OFFLINE, KOBJ_BIND, KOBJ_UNBIND, }; struct kobject { const char *name; struct list_head entry; struct kobject *parent; struct kset *kset; struct kobj_type *ktype; struct kernfs_node *sd; /* sysfs directory entry */ struct kref kref; #ifdef CONFIG_DEBUG_KOBJECT_RELEASE struct delayed_work release; #endif unsigned int state_initialized:1; unsigned int state_in_sysfs:1; unsigned int state_add_uevent_sent:1; unsigned int state_remove_uevent_sent:1; unsigned int uevent_suppress:1; }; extern __printf(2, 3) int kobject_set_name(struct kobject *kobj, const char *name, ...); extern __printf(2, 0) int kobject_set_name_vargs(struct kobject *kobj, const char *fmt, va_list vargs); static inline const char *kobject_name(const struct kobject *kobj) { return kobj->name; } extern void kobject_init(struct kobject *kobj, struct kobj_type *ktype); extern __printf(3, 4) __must_check int kobject_add(struct kobject *kobj, struct kobject *parent, const char *fmt, ...); extern __printf(4, 5) __must_check int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, struct kobject *parent, const char *fmt, ...); extern void kobject_del(struct kobject *kobj); extern struct kobject * __must_check kobject_create(void); extern struct kobject * __must_check kobject_create_and_add(const char *name, struct kobject *parent); extern int __must_check kobject_rename(struct kobject *, const char *new_name); extern int __must_check kobject_move(struct kobject *, struct kobject *); extern struct kobject *kobject_get(struct kobject *kobj); extern struct kobject * __must_check kobject_get_unless_zero( struct kobject *kobj); extern void kobject_put(struct kobject *kobj); extern const void *kobject_namespace(struct kobject *kobj); extern void kobject_get_ownership(struct kobject *kobj, kuid_t *uid, kgid_t *gid); extern char *kobject_get_path(struct kobject *kobj, gfp_t flag); /** * kobject_has_children - Returns whether a kobject has children. * @kobj: the object to test * * This will return whether a kobject has other kobjects as children. * * It does NOT account for the presence of attribute files, only sub * directories. It also assumes there is no concurrent addition or * removal of such children, and thus relies on external locking. */ static inline bool kobject_has_children(struct kobject *kobj) { WARN_ON_ONCE(kref_read(&kobj->kref) == 0); return kobj->sd && kobj->sd->dir.subdirs; } struct kobj_type { void (*release)(struct kobject *kobj); const struct sysfs_ops *sysfs_ops; struct attribute **default_attrs; /* use default_groups instead */ const struct attribute_group **default_groups; const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj); const void *(*namespace)(struct kobject *kobj); void (*get_ownership)(struct kobject *kobj, kuid_t *uid, kgid_t *gid); }; struct kobj_uevent_env { char *argv[3]; char *envp[UEVENT_NUM_ENVP]; int envp_idx; char buf[UEVENT_BUFFER_SIZE]; int buflen; }; struct kset_uevent_ops { int (* const filter)(struct kset *kset, struct kobject *kobj); const char *(* const name)(struct kset *kset, struct kobject *kobj); int (* const uevent)(struct kset *kset, struct kobject *kobj, struct kobj_uevent_env *env); }; struct kobj_attribute { struct attribute attr; ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr, char *buf); ssize_t (*store)(struct kobject *kobj, struct kobj_attribute *attr, const char *buf, size_t count); }; extern const struct sysfs_ops kobj_sysfs_ops; struct sock; /** * struct kset - a set of kobjects of a specific type, belonging to a specific subsystem. * * A kset defines a group of kobjects. They can be individually * different "types" but overall these kobjects all want to be grouped * together and operated on in the same manner. ksets are used to * define the attribute callbacks and other common events that happen to * a kobject. * * @list: the list of all kobjects for this kset * @list_lock: a lock for iterating over the kobjects * @kobj: the embedded kobject for this kset (recursion, isn't it fun...) * @uevent_ops: the set of uevent operations for this kset. These are * called whenever a kobject has something happen to it so that the kset * can add new environment variables, or filter out the uevents if so * desired. */ struct kset { struct list_head list; spinlock_t list_lock; struct kobject kobj; const struct kset_uevent_ops *uevent_ops; } __randomize_layout; extern void kset_init(struct kset *kset); extern int __must_check kset_register(struct kset *kset); extern void kset_unregister(struct kset *kset); extern struct kset * __must_check kset_create_and_add(const char *name, const struct kset_uevent_ops *u, struct kobject *parent_kobj); static inline struct kset *to_kset(struct kobject *kobj) { return kobj ? container_of(kobj, struct kset, kobj) : NULL; } static inline struct kset *kset_get(struct kset *k) { return k ? to_kset(kobject_get(&k->kobj)) : NULL; } static inline void kset_put(struct kset *k) { kobject_put(&k->kobj); } static inline struct kobj_type *get_ktype(struct kobject *kobj) { return kobj->ktype; } extern struct kobject *kset_find_obj(struct kset *, const char *); /* The global /sys/kernel/ kobject for people to chain off of */ extern struct kobject *kernel_kobj; /* The global /sys/kernel/mm/ kobject for people to chain off of */ extern struct kobject *mm_kobj; /* The global /sys/hypervisor/ kobject for people to chain off of */ extern struct kobject *hypervisor_kobj; /* The global /sys/power/ kobject for people to chain off of */ extern struct kobject *power_kobj; /* The global /sys/firmware/ kobject for people to chain off of */ extern struct kobject *firmware_kobj; int kobject_uevent(struct kobject *kobj, enum kobject_action action); int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, char *envp[]); int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count); __printf(2, 3) int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...); #endif /* _KOBJECT_H_ */
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_VIRTIO_NET_H #define _LINUX_VIRTIO_NET_H #include <linux/if_vlan.h> #include <uapi/linux/tcp.h> #include <uapi/linux/udp.h> #include <uapi/linux/virtio_net.h> static inline bool virtio_net_hdr_match_proto(__be16 protocol, __u8 gso_type) { switch (gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { case VIRTIO_NET_HDR_GSO_TCPV4: return protocol == cpu_to_be16(ETH_P_IP); case VIRTIO_NET_HDR_GSO_TCPV6: return protocol == cpu_to_be16(ETH_P_IPV6); case VIRTIO_NET_HDR_GSO_UDP: return protocol == cpu_to_be16(ETH_P_IP) ||