1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 3515 3516 3517 3518 3519 3520 3521 3522 3523 3524 3525 3526 3527 3528 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 3748 3749 3750 3751 3752 3753 3754 3755 3756 3757 3758 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 3824 3825 3826 3827 3828 3829 3830 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 3846 3847 3848 3849 3850 3851 3852 3853 3854 3855 3856 3857 3858 3859 3860 3861 3862 3863 3864 3865 3866 3867 3868 3869 3870 3871 3872 3873 3874 3875 3876 3877 3878 3879 3880 3881 3882 3883 3884 3885 3886 3887 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 3989 3990 3991 3992 3993 3994 3995 3996 3997 3998 3999 4000 4001 4002 4003 4004 4005 4006 4007 4008 4009 4010 4011 4012 4013 4014 4015 4016 4017 4018 4019 4020 4021 4022 4023 4024 4025 4026 4027 4028 4029 4030 4031 4032 4033 4034 4035 4036 4037 4038 4039 4040 4041 4042 4043 4044 4045 4046 4047 4048 4049 4050 4051 4052 4053 4054 4055 4056 4057 4058 4059 4060 4061 4062 4063 4064 4065 4066 4067 4068 4069 4070 4071 4072 4073 4074 4075 4076 4077 4078 4079 4080 4081 4082 4083 4084 4085 4086 4087 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 4103 4104 4105 4106 4107 4108 4109 4110 4111 4112 4113 4114 4115 4116 4117 4118 4119 4120 4121 4122 4123 4124 4125 4126 4127 4128 4129 4130 4131 4132 4133 4134 4135 4136 4137 4138 4139 4140 4141 4142 4143 4144 4145 4146 4147 4148 4149 4150 4151 4152 4153 4154 4155 4156 4157 4158 4159 4160 4161 4162 4163 4164 4165 4166 4167 4168 4169 4170 4171 4172 4173 4174 4175 4176 4177 4178 4179 4180 4181 4182 4183 4184 4185 4186 4187 4188 4189 4190 4191 4192 4193 4194 4195 4196 4197 4198 4199 4200 4201 4202 4203 4204 4205 4206 4207 4208 4209 4210 4211 4212 4213 4214 4215 4216 4217 4218 4219 4220 4221 4222 4223 4224 4225 4226 4227 4228 4229 4230 4231 4232 4233 4234 4235 4236 4237 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 4252 4253 4254 4255 4256 4257 4258 4259 4260 4261 4262 4263 4264 4265 4266 4267 4268 4269 4270 4271 4272 4273 4274 4275 4276 4277 4278 4279 4280 4281 4282 4283 4284 4285 4286 4287 4288 4289 4290 4291 4292 4293 4294 4295 4296 4297 4298 4299 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 4353 4354 4355 4356 4357 4358 4359 4360 4361 4362 4363 4364 4365 4366 4367 4368 4369 4370 4371 4372 4373 4374 4375 4376 4377 4378 4379 4380 4381 4382 4383 4384 4385 4386 4387 4388 4389 4390 4391 4392 4393 4394 4395 4396 4397 4398 4399 4400 4401 4402 4403 4404 4405 4406 4407 4408 4409 4410 4411 4412 4413 4414 4415 4416 4417 4418 4419 4420 4421 4422 4423 4424 4425 4426 4427 4428 4429 4430 4431 4432 4433 4434 4435 4436 4437 4438 4439 4440 4441 4442 4443 4444 4445 4446 4447 4448 4449 4450 4451 4452 4453 4454 4455 4456 4457 4458 4459 4460 4461 4462 4463 4464 4465 4466 4467 4468 4469 4470 4471 4472 4473 4474 4475 4476 4477 4478 4479 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659 4660 4661 4662 // SPDX-License-Identifier: GPL-2.0-or-later /* * libata-scsi.c - helper library for ATA * * Copyright 2003-2004 Red Hat, Inc. All rights reserved. * Copyright 2003-2004 Jeff Garzik * * libata documentation is available via 'make {ps|pdf}docs', * as Documentation/driver-api/libata.rst * * Hardware documentation available from * - http://www.t10.org/ * - http://www.t13.org/ */ #include <linux/compat.h> #include <linux/slab.h> #include <linux/kernel.h> #include <linux/blkdev.h> #include <linux/spinlock.h> #include <linux/export.h> #include <scsi/scsi.h> #include <scsi/scsi_host.h> #include <scsi/scsi_cmnd.h> #include <scsi/scsi_eh.h> #include <scsi/scsi_device.h> #include <scsi/scsi_tcq.h> #include <scsi/scsi_transport.h> #include <linux/libata.h> #include <linux/hdreg.h> #include <linux/uaccess.h> #include <linux/suspend.h> #include <asm/unaligned.h> #include <linux/ioprio.h> #include <linux/of.h> #include "libata.h" #include "libata-transport.h" #define ATA_SCSI_RBUF_SIZE 576 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock); static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE]; typedef unsigned int (*ata_xlat_func_t)(struct ata_queued_cmd *qc); static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev); #define RW_RECOVERY_MPAGE 0x1 #define RW_RECOVERY_MPAGE_LEN 12 #define CACHE_MPAGE 0x8 #define CACHE_MPAGE_LEN 20 #define CONTROL_MPAGE 0xa #define CONTROL_MPAGE_LEN 12 #define ALL_MPAGES 0x3f #define ALL_SUB_MPAGES 0xff static const u8 def_rw_recovery_mpage[RW_RECOVERY_MPAGE_LEN] = { RW_RECOVERY_MPAGE, RW_RECOVERY_MPAGE_LEN - 2, (1 << 7), /* AWRE */ 0, /* read retry count */ 0, 0, 0, 0, 0, /* write retry count */ 0, 0, 0 }; static const u8 def_cache_mpage[CACHE_MPAGE_LEN] = { CACHE_MPAGE, CACHE_MPAGE_LEN - 2, 0, /* contains WCE, needs to be 0 for logic */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* contains DRA, needs to be 0 for logic */ 0, 0, 0, 0, 0, 0, 0 }; static const u8 def_control_mpage[CONTROL_MPAGE_LEN] = { CONTROL_MPAGE, CONTROL_MPAGE_LEN - 2, 2, /* DSENSE=0, GLTSD=1 */ 0, /* [QAM+QERR may be 1, see 05-359r1] */ 0, 0, 0, 0, 0xff, 0xff, 0, 30 /* extended self test time, see 05-359r1 */ }; static ssize_t ata_scsi_park_show(struct device *device, struct device_attribute *attr, char *buf) { struct scsi_device *sdev = to_scsi_device(device); struct ata_port *ap; struct ata_link *link; struct ata_device *dev; unsigned long now; unsigned int msecs; int rc = 0; ap = ata_shost_to_port(sdev->host); spin_lock_irq(ap->lock); dev = ata_scsi_find_dev(ap, sdev); if (!dev) { rc = -ENODEV; goto unlock; } if (dev->flags & ATA_DFLAG_NO_UNLOAD) { rc = -EOPNOTSUPP; goto unlock; } link = dev->link; now = jiffies; if (ap->pflags & ATA_PFLAG_EH_IN_PROGRESS && link->eh_context.unloaded_mask & (1 << dev->devno) && time_after(dev->unpark_deadline, now)) msecs = jiffies_to_msecs(dev->unpark_deadline - now); else msecs = 0; unlock: spin_unlock_irq(ap->lock); return rc ? rc : snprintf(buf, 20, "%u\n", msecs); } static ssize_t ata_scsi_park_store(struct device *device, struct device_attribute *attr, const char *buf, size_t len) { struct scsi_device *sdev = to_scsi_device(device); struct ata_port *ap; struct ata_device *dev; long int input; unsigned long flags; int rc; rc = kstrtol(buf, 10, &input); if (rc) return rc; if (input < -2) return -EINVAL; if (input > ATA_TMOUT_MAX_PARK) { rc = -EOVERFLOW; input = ATA_TMOUT_MAX_PARK; } ap = ata_shost_to_port(sdev->host); spin_lock_irqsave(ap->lock, flags); dev = ata_scsi_find_dev(ap, sdev); if (unlikely(!dev)) { rc = -ENODEV; goto unlock; } if (dev->class != ATA_DEV_ATA && dev->class != ATA_DEV_ZAC) { rc = -EOPNOTSUPP; goto unlock; } if (input >= 0) { if (dev->flags & ATA_DFLAG_NO_UNLOAD) { rc = -EOPNOTSUPP; goto unlock; } dev->unpark_deadline = ata_deadline(jiffies, input); dev->link->eh_info.dev_action[dev->devno] |= ATA_EH_PARK; ata_port_schedule_eh(ap); complete(&ap->park_req_pending); } else { switch (input) { case -1: dev->flags &= ~ATA_DFLAG_NO_UNLOAD; break; case -2: dev->flags |= ATA_DFLAG_NO_UNLOAD; break; } } unlock: spin_unlock_irqrestore(ap->lock, flags); return rc ? rc : len; } DEVICE_ATTR(unload_heads, S_IRUGO | S_IWUSR, ata_scsi_park_show, ata_scsi_park_store); EXPORT_SYMBOL_GPL(dev_attr_unload_heads); void ata_scsi_set_sense(struct ata_device *dev, struct scsi_cmnd *cmd, u8 sk, u8 asc, u8 ascq) { bool d_sense = (dev->flags & ATA_DFLAG_D_SENSE); if (!cmd) return; cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; scsi_build_sense_buffer(d_sense, cmd->sense_buffer, sk, asc, ascq); } void ata_scsi_set_sense_information(struct ata_device *dev, struct scsi_cmnd *cmd, const struct ata_taskfile *tf) { u64 information; if (!cmd) return; information = ata_tf_read_block(tf, dev); if (information == U64_MAX) return; scsi_set_sense_information(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, information); } static void ata_scsi_set_invalid_field(struct ata_device *dev, struct scsi_cmnd *cmd, u16 field, u8 bit) { ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x24, 0x0); /* "Invalid field in CDB" */ scsi_set_sense_field_pointer(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, field, bit, 1); } static void ata_scsi_set_invalid_parameter(struct ata_device *dev, struct scsi_cmnd *cmd, u16 field) { /* "Invalid field in parameter list" */ ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x26, 0x0); scsi_set_sense_field_pointer(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE, field, 0xff, 0); } struct device_attribute *ata_common_sdev_attrs[] = { &dev_attr_unload_heads, NULL }; EXPORT_SYMBOL_GPL(ata_common_sdev_attrs); /** * ata_std_bios_param - generic bios head/sector/cylinder calculator used by sd. * @sdev: SCSI device for which BIOS geometry is to be determined * @bdev: block device associated with @sdev * @capacity: capacity of SCSI device * @geom: location to which geometry will be output * * Generic bios head/sector/cylinder calculator * used by sd. Most BIOSes nowadays expect a XXX/255/16 (CHS) * mapping. Some situations may arise where the disk is not * bootable if this is not used. * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero. */ int ata_std_bios_param(struct scsi_device *sdev, struct block_device *bdev, sector_t capacity, int geom[]) { geom[0] = 255; geom[1] = 63; sector_div(capacity, 255*63); geom[2] = capacity; return 0; } EXPORT_SYMBOL_GPL(ata_std_bios_param); /** * ata_scsi_unlock_native_capacity - unlock native capacity * @sdev: SCSI device to adjust device capacity for * * This function is called if a partition on @sdev extends beyond * the end of the device. It requests EH to unlock HPA. * * LOCKING: * Defined by the SCSI layer. Might sleep. */ void ata_scsi_unlock_native_capacity(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); struct ata_device *dev; unsigned long flags; spin_lock_irqsave(ap->lock, flags); dev = ata_scsi_find_dev(ap, sdev); if (dev && dev->n_sectors < dev->n_native_sectors) { dev->flags |= ATA_DFLAG_UNLOCK_HPA; dev->link->eh_info.action |= ATA_EH_RESET; ata_port_schedule_eh(ap); } spin_unlock_irqrestore(ap->lock, flags); ata_port_wait_eh(ap); } EXPORT_SYMBOL_GPL(ata_scsi_unlock_native_capacity); /** * ata_get_identity - Handler for HDIO_GET_IDENTITY ioctl * @ap: target port * @sdev: SCSI device to get identify data for * @arg: User buffer area for identify data * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ static int ata_get_identity(struct ata_port *ap, struct scsi_device *sdev, void __user *arg) { struct ata_device *dev = ata_scsi_find_dev(ap, sdev); u16 __user *dst = arg; char buf[40]; if (!dev) return -ENOMSG; if (copy_to_user(dst, dev->id, ATA_ID_WORDS * sizeof(u16))) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_PROD, ATA_ID_PROD_LEN); if (copy_to_user(dst + ATA_ID_PROD, buf, ATA_ID_PROD_LEN)) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_FW_REV, ATA_ID_FW_REV_LEN); if (copy_to_user(dst + ATA_ID_FW_REV, buf, ATA_ID_FW_REV_LEN)) return -EFAULT; ata_id_string(dev->id, buf, ATA_ID_SERNO, ATA_ID_SERNO_LEN); if (copy_to_user(dst + ATA_ID_SERNO, buf, ATA_ID_SERNO_LEN)) return -EFAULT; return 0; } /** * ata_cmd_ioctl - Handler for HDIO_DRIVE_CMD ioctl * @scsidev: Device to which we are issuing command * @arg: User provided data for issuing command * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg) { int rc = 0; u8 sensebuf[SCSI_SENSE_BUFFERSIZE]; u8 scsi_cmd[MAX_COMMAND_SIZE]; u8 args[4], *argbuf = NULL; int argsize = 0; enum dma_data_direction data_dir; struct scsi_sense_hdr sshdr; int cmd_result; if (arg == NULL) return -EINVAL; if (copy_from_user(args, arg, sizeof(args))) return -EFAULT; memset(sensebuf, 0, sizeof(sensebuf)); memset(scsi_cmd, 0, sizeof(scsi_cmd)); if (args[3]) { argsize = ATA_SECT_SIZE * args[3]; argbuf = kmalloc(argsize, GFP_KERNEL); if (argbuf == NULL) { rc = -ENOMEM; goto error; } scsi_cmd[1] = (4 << 1); /* PIO Data-in */ scsi_cmd[2] = 0x0e; /* no off.line or cc, read from dev, block count in sector count field */ data_dir = DMA_FROM_DEVICE; } else { scsi_cmd[1] = (3 << 1); /* Non-data */ scsi_cmd[2] = 0x20; /* cc but no off.line or data xfer */ data_dir = DMA_NONE; } scsi_cmd[0] = ATA_16; scsi_cmd[4] = args[2]; if (args[0] == ATA_CMD_SMART) { /* hack -- ide driver does this too */ scsi_cmd[6] = args[3]; scsi_cmd[8] = args[1]; scsi_cmd[10] = ATA_SMART_LBAM_PASS; scsi_cmd[12] = ATA_SMART_LBAH_PASS; } else { scsi_cmd[6] = args[1]; } scsi_cmd[14] = args[0]; /* Good values for timeout and retries? Values below from scsi_ioctl_send_command() for default case... */ cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL); if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ /* If we set cc then ATA pass-through will cause a * check condition even if no error. Filter that. */ if (cmd_result & SAM_STAT_CHECK_CONDITION) { if (sshdr.sense_key == RECOVERED_ERROR && sshdr.asc == 0 && sshdr.ascq == 0x1d) cmd_result &= ~SAM_STAT_CHECK_CONDITION; } /* Send userspace a few ATA registers (same as drivers/ide) */ if (sensebuf[0] == 0x72 && /* format is "descriptor" */ desc[0] == 0x09) { /* code is "ATA Descriptor" */ args[0] = desc[13]; /* status */ args[1] = desc[3]; /* error */ args[2] = desc[5]; /* sector count (0:7) */ if (copy_to_user(arg, args, sizeof(args))) rc = -EFAULT; } } if (cmd_result) { rc = -EIO; goto error; } if ((argbuf) && copy_to_user(arg + sizeof(args), argbuf, argsize)) rc = -EFAULT; error: kfree(argbuf); return rc; } /** * ata_task_ioctl - Handler for HDIO_DRIVE_TASK ioctl * @scsidev: Device to which we are issuing command * @arg: User provided data for issuing command * * LOCKING: * Defined by the SCSI layer. We don't really care. * * RETURNS: * Zero on success, negative errno on error. */ int ata_task_ioctl(struct scsi_device *scsidev, void __user *arg) { int rc = 0; u8 sensebuf[SCSI_SENSE_BUFFERSIZE]; u8 scsi_cmd[MAX_COMMAND_SIZE]; u8 args[7]; struct scsi_sense_hdr sshdr; int cmd_result; if (arg == NULL) return -EINVAL; if (copy_from_user(args, arg, sizeof(args))) return -EFAULT; memset(sensebuf, 0, sizeof(sensebuf)); memset(scsi_cmd, 0, sizeof(scsi_cmd)); scsi_cmd[0] = ATA_16; scsi_cmd[1] = (3 << 1); /* Non-data */ scsi_cmd[2] = 0x20; /* cc but no off.line or data xfer */ scsi_cmd[4] = args[1]; scsi_cmd[6] = args[2]; scsi_cmd[8] = args[3]; scsi_cmd[10] = args[4]; scsi_cmd[12] = args[5]; scsi_cmd[13] = args[6] & 0x4f; scsi_cmd[14] = args[0]; /* Good values for timeout and retries? Values below from scsi_ioctl_send_command() for default case... */ cmd_result = scsi_execute(scsidev, scsi_cmd, DMA_NONE, NULL, 0, sensebuf, &sshdr, (10*HZ), 5, 0, 0, NULL); if (driver_byte(cmd_result) == DRIVER_SENSE) {/* sense data available */ u8 *desc = sensebuf + 8; cmd_result &= ~(0xFF<<24); /* DRIVER_SENSE is not an error */ /* If we set cc then ATA pass-through will cause a * check condition even if no error. Filter that. */ if (cmd_result & SAM_STAT_CHECK_CONDITION) { if (sshdr.sense_key == RECOVERED_ERROR && sshdr.asc == 0 && sshdr.ascq == 0x1d) cmd_result &= ~SAM_STAT_CHECK_CONDITION; } /* Send userspace ATA registers */ if (sensebuf[0] == 0x72 && /* format is "descriptor" */ desc[0] == 0x09) {/* code is "ATA Descriptor" */ args[0] = desc[13]; /* status */ args[1] = desc[3]; /* error */ args[2] = desc[5]; /* sector count (0:7) */ args[3] = desc[7]; /* lbal */ args[4] = desc[9]; /* lbam */ args[5] = desc[11]; /* lbah */ args[6] = desc[12]; /* select */ if (copy_to_user(arg, args, sizeof(args))) rc = -EFAULT; } } if (cmd_result) { rc = -EIO; goto error; } error: return rc; } static int ata_ioc32(struct ata_port *ap) { if (ap->flags & ATA_FLAG_PIO_DMA) return 1; if (ap->pflags & ATA_PFLAG_PIO32) return 1; return 0; } /* * This handles both native and compat commands, so anything added * here must have a compatible argument, or check in_compat_syscall() */ int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *scsidev, unsigned int cmd, void __user *arg) { unsigned long val; int rc = -EINVAL; unsigned long flags; switch (cmd) { case HDIO_GET_32BIT: spin_lock_irqsave(ap->lock, flags); val = ata_ioc32(ap); spin_unlock_irqrestore(ap->lock, flags); #ifdef CONFIG_COMPAT if (in_compat_syscall()) return put_user(val, (compat_ulong_t __user *)arg); #endif return put_user(val, (unsigned long __user *)arg); case HDIO_SET_32BIT: val = (unsigned long) arg; rc = 0; spin_lock_irqsave(ap->lock, flags); if (ap->pflags & ATA_PFLAG_PIO32CHANGE) { if (val) ap->pflags |= ATA_PFLAG_PIO32; else ap->pflags &= ~ATA_PFLAG_PIO32; } else { if (val != ata_ioc32(ap)) rc = -EINVAL; } spin_unlock_irqrestore(ap->lock, flags); return rc; case HDIO_GET_IDENTITY: return ata_get_identity(ap, scsidev, arg); case HDIO_DRIVE_CMD: if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) return -EACCES; return ata_cmd_ioctl(scsidev, arg); case HDIO_DRIVE_TASK: if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)) return -EACCES; return ata_task_ioctl(scsidev, arg); default: rc = -ENOTTY; break; } return rc; } EXPORT_SYMBOL_GPL(ata_sas_scsi_ioctl); int ata_scsi_ioctl(struct scsi_device *scsidev, unsigned int cmd, void __user *arg) { return ata_sas_scsi_ioctl(ata_shost_to_port(scsidev->host), scsidev, cmd, arg); } EXPORT_SYMBOL_GPL(ata_scsi_ioctl); /** * ata_scsi_qc_new - acquire new ata_queued_cmd reference * @dev: ATA device to which the new command is attached * @cmd: SCSI command that originated this ATA command * * Obtain a reference to an unused ata_queued_cmd structure, * which is the basic libata structure representing a single * ATA command sent to the hardware. * * If a command was available, fill in the SCSI-specific * portions of the structure with information on the * current command. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Command allocated, or %NULL if none available. */ static struct ata_queued_cmd *ata_scsi_qc_new(struct ata_device *dev, struct scsi_cmnd *cmd) { struct ata_queued_cmd *qc; qc = ata_qc_new_init(dev, cmd->request->tag); if (qc) { qc->scsicmd = cmd; qc->scsidone = cmd->scsi_done; qc->sg = scsi_sglist(cmd); qc->n_elem = scsi_sg_count(cmd); if (cmd->request->rq_flags & RQF_QUIET) qc->flags |= ATA_QCFLAG_QUIET; } else { cmd->result = (DID_OK << 16) | (QUEUE_FULL << 1); cmd->scsi_done(cmd); } return qc; } static void ata_qc_set_pc_nbytes(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; qc->extrabytes = scmd->extra_len; qc->nbytes = scsi_bufflen(scmd) + qc->extrabytes; } /** * ata_dump_status - user friendly display of error info * @id: id of the port in question * @tf: ptr to filled out taskfile * * Decode and dump the ATA error/status registers for the user so * that they have some idea what really happened at the non * make-believe layer. * * LOCKING: * inherited from caller */ static void ata_dump_status(unsigned id, struct ata_taskfile *tf) { u8 stat = tf->command, err = tf->feature; pr_warn("ata%u: status=0x%02x { ", id, stat); if (stat & ATA_BUSY) { pr_cont("Busy }\n"); /* Data is not valid in this case */ } else { if (stat & ATA_DRDY) pr_cont("DriveReady "); if (stat & ATA_DF) pr_cont("DeviceFault "); if (stat & ATA_DSC) pr_cont("SeekComplete "); if (stat & ATA_DRQ) pr_cont("DataRequest "); if (stat & ATA_CORR) pr_cont("CorrectedError "); if (stat & ATA_SENSE) pr_cont("Sense "); if (stat & ATA_ERR) pr_cont("Error "); pr_cont("}\n"); if (err) { pr_warn("ata%u: error=0x%02x { ", id, err); if (err & ATA_ABORTED) pr_cont("DriveStatusError "); if (err & ATA_ICRC) { if (err & ATA_ABORTED) pr_cont("BadCRC "); else pr_cont("Sector "); } if (err & ATA_UNC) pr_cont("UncorrectableError "); if (err & ATA_IDNF) pr_cont("SectorIdNotFound "); if (err & ATA_TRK0NF) pr_cont("TrackZeroNotFound "); if (err & ATA_AMNF) pr_cont("AddrMarkNotFound "); pr_cont("}\n"); } } } /** * ata_to_sense_error - convert ATA error to SCSI error * @id: ATA device number * @drv_stat: value contained in ATA status register * @drv_err: value contained in ATA error register * @sk: the sense key we'll fill out * @asc: the additional sense code we'll fill out * @ascq: the additional sense code qualifier we'll fill out * @verbose: be verbose * * Converts an ATA error into a SCSI error. Fill out pointers to * SK, ASC, and ASCQ bytes for later use in fixed or descriptor * format sense blocks. * * LOCKING: * spin_lock_irqsave(host lock) */ static void ata_to_sense_error(unsigned id, u8 drv_stat, u8 drv_err, u8 *sk, u8 *asc, u8 *ascq, int verbose) { int i; /* Based on the 3ware driver translation table */ static const unsigned char sense_table[][4] = { /* BBD|ECC|ID|MAR */ {0xd1, ABORTED_COMMAND, 0x00, 0x00}, // Device busy Aborted command /* BBD|ECC|ID */ {0xd0, ABORTED_COMMAND, 0x00, 0x00}, // Device busy Aborted command /* ECC|MC|MARK */ {0x61, HARDWARE_ERROR, 0x00, 0x00}, // Device fault Hardware error /* ICRC|ABRT */ /* NB: ICRC & !ABRT is BBD */ {0x84, ABORTED_COMMAND, 0x47, 0x00}, // Data CRC error SCSI parity error /* MC|ID|ABRT|TRK0|MARK */ {0x37, NOT_READY, 0x04, 0x00}, // Unit offline Not ready /* MCR|MARK */ {0x09, NOT_READY, 0x04, 0x00}, // Unrecovered disk error Not ready /* Bad address mark */ {0x01, MEDIUM_ERROR, 0x13, 0x00}, // Address mark not found for data field /* TRK0 - Track 0 not found */ {0x02, HARDWARE_ERROR, 0x00, 0x00}, // Hardware error /* Abort: 0x04 is not translated here, see below */ /* Media change request */ {0x08, NOT_READY, 0x04, 0x00}, // FIXME: faking offline /* SRV/IDNF - ID not found */ {0x10, ILLEGAL_REQUEST, 0x21, 0x00}, // Logical address out of range /* MC - Media Changed */ {0x20, UNIT_ATTENTION, 0x28, 0x00}, // Not ready to ready change, medium may have changed /* ECC - Uncorrectable ECC error */ {0x40, MEDIUM_ERROR, 0x11, 0x04}, // Unrecovered read error /* BBD - block marked bad */ {0x80, MEDIUM_ERROR, 0x11, 0x04}, // Block marked bad Medium error, unrecovered read error {0xFF, 0xFF, 0xFF, 0xFF}, // END mark }; static const unsigned char stat_table[][4] = { /* Must be first because BUSY means no other bits valid */ {0x80, ABORTED_COMMAND, 0x47, 0x00}, // Busy, fake parity for now {0x40, ILLEGAL_REQUEST, 0x21, 0x04}, // Device ready, unaligned write command {0x20, HARDWARE_ERROR, 0x44, 0x00}, // Device fault, internal target failure {0x08, ABORTED_COMMAND, 0x47, 0x00}, // Timed out in xfer, fake parity for now {0x04, RECOVERED_ERROR, 0x11, 0x00}, // Recovered ECC error Medium error, recovered {0xFF, 0xFF, 0xFF, 0xFF}, // END mark }; /* * Is this an error we can process/parse */ if (drv_stat & ATA_BUSY) { drv_err = 0; /* Ignore the err bits, they're invalid */ } if (drv_err) { /* Look for drv_err */ for (i = 0; sense_table[i][0] != 0xFF; i++) { /* Look for best matches first */ if ((sense_table[i][0] & drv_err) == sense_table[i][0]) { *sk = sense_table[i][1]; *asc = sense_table[i][2]; *ascq = sense_table[i][3]; goto translate_done; } } } /* * Fall back to interpreting status bits. Note that if the drv_err * has only the ABRT bit set, we decode drv_stat. ABRT by itself * is not descriptive enough. */ for (i = 0; stat_table[i][0] != 0xFF; i++) { if (stat_table[i][0] & drv_stat) { *sk = stat_table[i][1]; *asc = stat_table[i][2]; *ascq = stat_table[i][3]; goto translate_done; } } /* * We need a sensible error return here, which is tricky, and one * that won't cause people to do things like return a disk wrongly. */ *sk = ABORTED_COMMAND; *asc = 0x00; *ascq = 0x00; translate_done: if (verbose) pr_err("ata%u: translated ATA stat/err 0x%02x/%02x to SCSI SK/ASC/ASCQ 0x%x/%02x/%02x\n", id, drv_stat, drv_err, *sk, *asc, *ascq); return; } /* * ata_gen_passthru_sense - Generate check condition sense block. * @qc: Command that completed. * * This function is specific to the ATA descriptor format sense * block specified for the ATA pass through commands. Regardless * of whether the command errored or not, return a sense * block. Copy all controller registers into the sense * block. If there was no error, we get the request from an ATA * passthrough command, so we use the following sense data: * sk = RECOVERED ERROR * asc,ascq = ATA PASS-THROUGH INFORMATION AVAILABLE * * * LOCKING: * None. */ static void ata_gen_passthru_sense(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; struct ata_taskfile *tf = &qc->result_tf; unsigned char *sb = cmd->sense_buffer; unsigned char *desc = sb + 8; int verbose = qc->ap->ops->error_handler == NULL; u8 sense_key, asc, ascq; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; /* * Use ata_to_sense_error() to map status register bits * onto sense key, asc & ascq. */ if (qc->err_mask || tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { ata_to_sense_error(qc->ap->print_id, tf->command, tf->feature, &sense_key, &asc, &ascq, verbose); ata_scsi_set_sense(qc->dev, cmd, sense_key, asc, ascq); } else { /* * ATA PASS-THROUGH INFORMATION AVAILABLE * Always in descriptor format sense. */ scsi_build_sense_buffer(1, cmd->sense_buffer, RECOVERED_ERROR, 0, 0x1D); } if ((cmd->sense_buffer[0] & 0x7f) >= 0x72) { u8 len; /* descriptor format */ len = sb[7]; desc = (char *)scsi_sense_desc_find(sb, len + 8, 9); if (!desc) { if (SCSI_SENSE_BUFFERSIZE < len + 14) return; sb[7] = len + 14; desc = sb + 8 + len; } desc[0] = 9; desc[1] = 12; /* * Copy registers into sense buffer. */ desc[2] = 0x00; desc[3] = tf->feature; /* == error reg */ desc[5] = tf->nsect; desc[7] = tf->lbal; desc[9] = tf->lbam; desc[11] = tf->lbah; desc[12] = tf->device; desc[13] = tf->command; /* == status reg */ /* * Fill in Extend bit, and the high order bytes * if applicable. */ if (tf->flags & ATA_TFLAG_LBA48) { desc[2] |= 0x01; desc[4] = tf->hob_nsect; desc[6] = tf->hob_lbal; desc[8] = tf->hob_lbam; desc[10] = tf->hob_lbah; } } else { /* Fixed sense format */ desc[0] = tf->feature; desc[1] = tf->command; /* status */ desc[2] = tf->device; desc[3] = tf->nsect; desc[7] = 0; if (tf->flags & ATA_TFLAG_LBA48) { desc[8] |= 0x80; if (tf->hob_nsect) desc[8] |= 0x40; if (tf->hob_lbal || tf->hob_lbam || tf->hob_lbah) desc[8] |= 0x20; } desc[9] = tf->lbal; desc[10] = tf->lbam; desc[11] = tf->lbah; } } /** * ata_gen_ata_sense - generate a SCSI fixed sense block * @qc: Command that we are erroring out * * Generate sense block for a failed ATA command @qc. Descriptor * format is used to accommodate LBA48 block address. * * LOCKING: * None. */ static void ata_gen_ata_sense(struct ata_queued_cmd *qc) { struct ata_device *dev = qc->dev; struct scsi_cmnd *cmd = qc->scsicmd; struct ata_taskfile *tf = &qc->result_tf; unsigned char *sb = cmd->sense_buffer; int verbose = qc->ap->ops->error_handler == NULL; u64 block; u8 sense_key, asc, ascq; memset(sb, 0, SCSI_SENSE_BUFFERSIZE); cmd->result = (DRIVER_SENSE << 24) | SAM_STAT_CHECK_CONDITION; if (ata_dev_disabled(dev)) { /* Device disabled after error recovery */ /* LOGICAL UNIT NOT READY, HARD RESET REQUIRED */ ata_scsi_set_sense(dev, cmd, NOT_READY, 0x04, 0x21); return; } /* Use ata_to_sense_error() to map status register bits * onto sense key, asc & ascq. */ if (qc->err_mask || tf->command & (ATA_BUSY | ATA_DF | ATA_ERR | ATA_DRQ)) { ata_to_sense_error(qc->ap->print_id, tf->command, tf->feature, &sense_key, &asc, &ascq, verbose); ata_scsi_set_sense(dev, cmd, sense_key, asc, ascq); } else { /* Could not decode error */ ata_dev_warn(dev, "could not decode error status 0x%x err_mask 0x%x\n", tf->command, qc->err_mask); ata_scsi_set_sense(dev, cmd, ABORTED_COMMAND, 0, 0); return; } block = ata_tf_read_block(&qc->result_tf, dev); if (block == U64_MAX) return; scsi_set_sense_information(sb, SCSI_SENSE_BUFFERSIZE, block); } void ata_scsi_sdev_config(struct scsi_device *sdev) { sdev->use_10_for_rw = 1; sdev->use_10_for_ms = 1; sdev->no_write_same = 1; /* Schedule policy is determined by ->qc_defer() callback and * it needs to see every deferred qc. Set dev_blocked to 1 to * prevent SCSI midlayer from automatically deferring * requests. */ sdev->max_device_blocked = 1; } /** * ata_scsi_dma_need_drain - Check whether data transfer may overflow * @rq: request to be checked * * ATAPI commands which transfer variable length data to host * might overflow due to application error or hardware bug. This * function checks whether overflow should be drained and ignored * for @request. * * LOCKING: * None. * * RETURNS: * 1 if ; otherwise, 0. */ bool ata_scsi_dma_need_drain(struct request *rq) { return atapi_cmd_type(scsi_req(rq)->cmd[0]) == ATAPI_MISC; } EXPORT_SYMBOL_GPL(ata_scsi_dma_need_drain); int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev) { struct request_queue *q = sdev->request_queue; if (!ata_id_has_unload(dev->id)) dev->flags |= ATA_DFLAG_NO_UNLOAD; /* configure max sectors */ blk_queue_max_hw_sectors(q, dev->max_sectors); if (dev->class == ATA_DEV_ATAPI) { sdev->sector_size = ATA_SECT_SIZE; /* set DMA padding */ blk_queue_update_dma_pad(q, ATA_DMA_PAD_SZ - 1); /* make room for appending the drain */ blk_queue_max_segments(q, queue_max_segments(q) - 1); sdev->dma_drain_len = ATAPI_MAX_DRAIN; sdev->dma_drain_buf = kmalloc(sdev->dma_drain_len, q->bounce_gfp | GFP_KERNEL); if (!sdev->dma_drain_buf) { ata_dev_err(dev, "drain buffer allocation failed\n"); return -ENOMEM; } } else { sdev->sector_size = ata_id_logical_sector_size(dev->id); sdev->manage_start_stop = 1; } /* * ata_pio_sectors() expects buffer for each sector to not cross * page boundary. Enforce it by requiring buffers to be sector * aligned, which works iff sector_size is not larger than * PAGE_SIZE. ATAPI devices also need the alignment as * IDENTIFY_PACKET is executed as ATA_PROT_PIO. */ if (sdev->sector_size > PAGE_SIZE) ata_dev_warn(dev, "sector_size=%u > PAGE_SIZE, PIO may malfunction\n", sdev->sector_size); blk_queue_update_dma_alignment(q, sdev->sector_size - 1); if (dev->flags & ATA_DFLAG_AN) set_bit(SDEV_EVT_MEDIA_CHANGE, sdev->supported_events); if (dev->flags & ATA_DFLAG_NCQ) { int depth; depth = min(sdev->host->can_queue, ata_id_queue_depth(dev->id)); depth = min(ATA_MAX_QUEUE, depth); scsi_change_queue_depth(sdev, depth); } if (dev->flags & ATA_DFLAG_TRUSTED) sdev->security_supported = 1; dev->sdev = sdev; return 0; } /** * ata_scsi_slave_config - Set SCSI device attributes * @sdev: SCSI device to examine * * This is called before we actually start reading * and writing to the device, to configure certain * SCSI mid-layer behaviors. * * LOCKING: * Defined by SCSI layer. We don't really care. */ int ata_scsi_slave_config(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); struct ata_device *dev = __ata_scsi_find_dev(ap, sdev); int rc = 0; ata_scsi_sdev_config(sdev); if (dev) rc = ata_scsi_dev_config(sdev, dev); return rc; } EXPORT_SYMBOL_GPL(ata_scsi_slave_config); /** * ata_scsi_slave_destroy - SCSI device is about to be destroyed * @sdev: SCSI device to be destroyed * * @sdev is about to be destroyed for hot/warm unplugging. If * this unplugging was initiated by libata as indicated by NULL * dev->sdev, this function doesn't have to do anything. * Otherwise, SCSI layer initiated warm-unplug is in progress. * Clear dev->sdev, schedule the device for ATA detach and invoke * EH. * * LOCKING: * Defined by SCSI layer. We don't really care. */ void ata_scsi_slave_destroy(struct scsi_device *sdev) { struct ata_port *ap = ata_shost_to_port(sdev->host); unsigned long flags; struct ata_device *dev; if (!ap->ops->error_handler) return; spin_lock_irqsave(ap->lock, flags); dev = __ata_scsi_find_dev(ap, sdev); if (dev && dev->sdev) { /* SCSI device already in CANCEL state, no need to offline it */ dev->sdev = NULL; dev->flags |= ATA_DFLAG_DETACH; ata_port_schedule_eh(ap); } spin_unlock_irqrestore(ap->lock, flags); kfree(sdev->dma_drain_buf); } EXPORT_SYMBOL_GPL(ata_scsi_slave_destroy); /** * ata_scsi_start_stop_xlat - Translate SCSI START STOP UNIT command * @qc: Storage for translated ATA taskfile * * Sets up an ATA taskfile to issue STANDBY (to stop) or READ VERIFY * (to start). Perhaps these commands should be preceded by * CHECK POWER MODE to see what power mode the device is already in. * [See SAT revision 5 at www.t10.org] * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_start_stop_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_taskfile *tf = &qc->tf; const u8 *cdb = scmd->cmnd; u16 fp; u8 bp = 0xff; if (scmd->cmd_len < 5) { fp = 4; goto invalid_fld; } tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; tf->protocol = ATA_PROT_NODATA; if (cdb[1] & 0x1) { ; /* ignore IMMED bit, violates sat-r05 */ } if (cdb[4] & 0x2) { fp = 4; bp = 1; goto invalid_fld; /* LOEJ bit set not supported */ } if (((cdb[4] >> 4) & 0xf) != 0) { fp = 4; bp = 3; goto invalid_fld; /* power conditions not supported */ } if (cdb[4] & 0x1) { tf->nsect = 1; /* 1 sector, lba=0 */ if (qc->dev->flags & ATA_DFLAG_LBA) { tf->flags |= ATA_TFLAG_LBA; tf->lbah = 0x0; tf->lbam = 0x0; tf->lbal = 0x0; tf->device |= ATA_LBA; } else { /* CHS */ tf->lbal = 0x1; /* sect */ tf->lbam = 0x0; /* cyl low */ tf->lbah = 0x0; /* cyl high */ } tf->command = ATA_CMD_VERIFY; /* READ VERIFY */ } else { /* Some odd clown BIOSen issue spindown on power off (ACPI S4 * or S5) causing some drives to spin up and down again. */ if ((qc->ap->flags & ATA_FLAG_NO_POWEROFF_SPINDOWN) && system_state == SYSTEM_POWER_OFF) goto skip; if ((qc->ap->flags & ATA_FLAG_NO_HIBERNATE_SPINDOWN) && system_entering_hibernation()) goto skip; /* Issue ATA STANDBY IMMEDIATE command */ tf->command = ATA_CMD_STANDBYNOW1; } /* * Standby and Idle condition timers could be implemented but that * would require libata to implement the Power condition mode page * and allow the user to change it. Changing mode pages requires * MODE SELECT to be implemented. */ return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; skip: scmd->result = SAM_STAT_GOOD; return 1; } /** * ata_scsi_flush_xlat - Translate SCSI SYNCHRONIZE CACHE command * @qc: Storage for translated ATA taskfile * * Sets up an ATA taskfile to issue FLUSH CACHE or * FLUSH CACHE EXT. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_flush_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; tf->flags |= ATA_TFLAG_DEVICE; tf->protocol = ATA_PROT_NODATA; if (qc->dev->flags & ATA_DFLAG_FLUSH_EXT) tf->command = ATA_CMD_FLUSH_EXT; else tf->command = ATA_CMD_FLUSH; /* flush is critical for IO integrity, consider it an IO command */ qc->flags |= ATA_QCFLAG_IO; return 0; } /** * scsi_6_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 6-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_6_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len; VPRINTK("six-byte command\n"); lba |= ((u64)(cdb[1] & 0x1f)) << 16; lba |= ((u64)cdb[2]) << 8; lba |= ((u64)cdb[3]); len = cdb[4]; *plba = lba; *plen = len; } /** * scsi_10_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 10-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_10_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len = 0; VPRINTK("ten-byte command\n"); lba |= ((u64)cdb[2]) << 24; lba |= ((u64)cdb[3]) << 16; lba |= ((u64)cdb[4]) << 8; lba |= ((u64)cdb[5]); len |= ((u32)cdb[7]) << 8; len |= ((u32)cdb[8]); *plba = lba; *plen = len; } /** * scsi_16_lba_len - Get LBA and transfer length * @cdb: SCSI command to translate * * Calculate LBA and transfer length for 16-byte commands. * * RETURNS: * @plba: the LBA * @plen: the transfer length */ static void scsi_16_lba_len(const u8 *cdb, u64 *plba, u32 *plen) { u64 lba = 0; u32 len = 0; VPRINTK("sixteen-byte command\n"); lba |= ((u64)cdb[2]) << 56; lba |= ((u64)cdb[3]) << 48; lba |= ((u64)cdb[4]) << 40; lba |= ((u64)cdb[5]) << 32; lba |= ((u64)cdb[6]) << 24; lba |= ((u64)cdb[7]) << 16; lba |= ((u64)cdb[8]) << 8; lba |= ((u64)cdb[9]); len |= ((u32)cdb[10]) << 24; len |= ((u32)cdb[11]) << 16; len |= ((u32)cdb[12]) << 8; len |= ((u32)cdb[13]); *plba = lba; *plen = len; } /** * ata_scsi_verify_xlat - Translate SCSI VERIFY command into an ATA one * @qc: Storage for translated ATA taskfile * * Converts SCSI VERIFY command to an ATA READ VERIFY command. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_verify_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_taskfile *tf = &qc->tf; struct ata_device *dev = qc->dev; u64 dev_sectors = qc->dev->n_sectors; const u8 *cdb = scmd->cmnd; u64 block; u32 n_block; u16 fp; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; tf->protocol = ATA_PROT_NODATA; if (cdb[0] == VERIFY) { if (scmd->cmd_len < 10) { fp = 9; goto invalid_fld; } scsi_10_lba_len(cdb, &block, &n_block); } else if (cdb[0] == VERIFY_16) { if (scmd->cmd_len < 16) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); } else { fp = 0; goto invalid_fld; } if (!n_block) goto nothing_to_do; if (block >= dev_sectors) goto out_of_range; if ((block + n_block) > dev_sectors) goto out_of_range; if (dev->flags & ATA_DFLAG_LBA) { tf->flags |= ATA_TFLAG_LBA; if (lba_28_ok(block, n_block)) { /* use LBA28 */ tf->command = ATA_CMD_VERIFY; tf->device |= (block >> 24) & 0xf; } else if (lba_48_ok(block, n_block)) { if (!(dev->flags & ATA_DFLAG_LBA48)) goto out_of_range; /* use LBA48 */ tf->flags |= ATA_TFLAG_LBA48; tf->command = ATA_CMD_VERIFY_EXT; tf->hob_nsect = (n_block >> 8) & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; } else /* request too large even for LBA48 */ goto out_of_range; tf->nsect = n_block & 0xff; tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->device |= ATA_LBA; } else { /* CHS */ u32 sect, head, cyl, track; if (!lba_28_ok(block, n_block)) goto out_of_range; /* Convert LBA to CHS */ track = (u32)block / dev->sectors; cyl = track / dev->heads; head = track % dev->heads; sect = (u32)block % dev->sectors + 1; DPRINTK("block %u track %u cyl %u head %u sect %u\n", (u32)block, track, cyl, head, sect); /* Check whether the converted CHS can fit. Cylinder: 0-65535 Head: 0-15 Sector: 1-255*/ if ((cyl >> 16) || (head >> 4) || (sect >> 8) || (!sect)) goto out_of_range; tf->command = ATA_CMD_VERIFY; tf->nsect = n_block & 0xff; /* Sector count 0 means 256 sectors */ tf->lbal = sect; tf->lbam = cyl; tf->lbah = cyl >> 8; tf->device |= head; } return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; out_of_range: ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x21, 0x0); /* "Logical Block Address out of range" */ return 1; nothing_to_do: scmd->result = SAM_STAT_GOOD; return 1; } static bool ata_check_nblocks(struct scsi_cmnd *scmd, u32 n_blocks) { struct request *rq = scmd->request; u32 req_blocks; if (!blk_rq_is_passthrough(rq)) return true; req_blocks = blk_rq_bytes(rq) / scmd->device->sector_size; if (n_blocks > req_blocks) return false; return true; } /** * ata_scsi_rw_xlat - Translate SCSI r/w command into an ATA one * @qc: Storage for translated ATA taskfile * * Converts any of six SCSI read/write commands into the * ATA counterpart, including starting sector (LBA), * sector count, and taking into account the device's LBA48 * support. * * Commands %READ_6, %READ_10, %READ_16, %WRITE_6, %WRITE_10, and * %WRITE_16 are currently supported. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on error. */ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; struct request *rq = scmd->request; int class = IOPRIO_PRIO_CLASS(req_get_ioprio(rq)); unsigned int tf_flags = 0; u64 block; u32 n_block; int rc; u16 fp = 0; if (cdb[0] == WRITE_10 || cdb[0] == WRITE_6 || cdb[0] == WRITE_16) tf_flags |= ATA_TFLAG_WRITE; /* Calculate the SCSI LBA, transfer length and FUA. */ switch (cdb[0]) { case READ_10: case WRITE_10: if (unlikely(scmd->cmd_len < 10)) { fp = 9; goto invalid_fld; } scsi_10_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; case READ_6: case WRITE_6: if (unlikely(scmd->cmd_len < 6)) { fp = 5; goto invalid_fld; } scsi_6_lba_len(cdb, &block, &n_block); /* for 6-byte r/w commands, transfer length 0 * means 256 blocks of data, not 0 block. */ if (!n_block) n_block = 256; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; case READ_16: case WRITE_16: if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (cdb[1] & (1 << 3)) tf_flags |= ATA_TFLAG_FUA; if (!ata_check_nblocks(scmd, n_block)) goto invalid_fld; break; default: DPRINTK("no-byte command\n"); fp = 0; goto invalid_fld; } /* Check and compose ATA command */ if (!n_block) /* For 10-byte and 16-byte SCSI R/W commands, transfer * length 0 means transfer 0 block of data. * However, for ATA R/W commands, sector count 0 means * 256 or 65536 sectors, not 0 sectors as in SCSI. * * WARNING: one or two older ATA drives treat 0 as 0... */ goto nothing_to_do; qc->flags |= ATA_QCFLAG_IO; qc->nbytes = n_block * scmd->device->sector_size; rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags, qc->hw_tag, class); if (likely(rc == 0)) return 0; if (rc == -ERANGE) goto out_of_range; /* treat all other errors as -EINVAL, fall through */ invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; out_of_range: ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x21, 0x0); /* "Logical Block Address out of range" */ return 1; nothing_to_do: scmd->result = SAM_STAT_GOOD; return 1; } static void ata_qc_done(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; void (*done)(struct scsi_cmnd *) = qc->scsidone; ata_qc_free(qc); done(cmd); } static void ata_scsi_qc_complete(struct ata_queued_cmd *qc) { struct ata_port *ap = qc->ap; struct scsi_cmnd *cmd = qc->scsicmd; u8 *cdb = cmd->cmnd; int need_sense = (qc->err_mask != 0); /* For ATA pass thru (SAT) commands, generate a sense block if * user mandated it or if there's an error. Note that if we * generate because the user forced us to [CK_COND =1], a check * condition is generated and the ATA register values are returned * whether the command completed successfully or not. If there * was no error, we use the following sense data: * sk = RECOVERED ERROR * asc,ascq = ATA PASS-THROUGH INFORMATION AVAILABLE */ if (((cdb[0] == ATA_16) || (cdb[0] == ATA_12)) && ((cdb[2] & 0x20) || need_sense)) ata_gen_passthru_sense(qc); else if (qc->flags & ATA_QCFLAG_SENSE_VALID) cmd->result = SAM_STAT_CHECK_CONDITION; else if (need_sense) ata_gen_ata_sense(qc); else cmd->result = SAM_STAT_GOOD; if (need_sense && !ap->ops->error_handler) ata_dump_status(ap->print_id, &qc->result_tf); ata_qc_done(qc); } /** * ata_scsi_translate - Translate then issue SCSI command to ATA device * @dev: ATA device to which the command is addressed * @cmd: SCSI command to execute * @xlat_func: Actor which translates @cmd to an ATA taskfile * * Our ->queuecommand() function has decided that the SCSI * command issued can be directly translated into an ATA * command, rather than handled internally. * * This function sets up an ata_queued_cmd structure for the * SCSI command, and sends that ata_queued_cmd to the hardware. * * The xlat_func argument (actor) returns 0 if ready to execute * ATA command, else 1 to finish translation. If 1 is returned * then cmd->result (and possibly cmd->sense_buffer) are assumed * to be set reflecting an error condition or clean (early) * termination. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * 0 on success, SCSI_ML_QUEUE_DEVICE_BUSY if the command * needs to be deferred. */ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd, ata_xlat_func_t xlat_func) { struct ata_port *ap = dev->link->ap; struct ata_queued_cmd *qc; int rc; VPRINTK("ENTER\n"); qc = ata_scsi_qc_new(dev, cmd); if (!qc) goto err_mem; /* data is present; dma-map it */ if (cmd->sc_data_direction == DMA_FROM_DEVICE || cmd->sc_data_direction == DMA_TO_DEVICE) { if (unlikely(scsi_bufflen(cmd) < 1)) { ata_dev_warn(dev, "WARNING: zero len r/w req\n"); goto err_did; } ata_sg_init(qc, scsi_sglist(cmd), scsi_sg_count(cmd)); qc->dma_dir = cmd->sc_data_direction; } qc->complete_fn = ata_scsi_qc_complete; if (xlat_func(qc)) goto early_finish; if (ap->ops->qc_defer) { if ((rc = ap->ops->qc_defer(qc))) goto defer; } /* select device, send command to hardware */ ata_qc_issue(qc); VPRINTK("EXIT\n"); return 0; early_finish: ata_qc_free(qc); cmd->scsi_done(cmd); DPRINTK("EXIT - early finish (good or error)\n"); return 0; err_did: ata_qc_free(qc); cmd->result = (DID_ERROR << 16); cmd->scsi_done(cmd); err_mem: DPRINTK("EXIT - internal\n"); return 0; defer: ata_qc_free(qc); DPRINTK("EXIT - defer\n"); if (rc == ATA_DEFER_LINK) return SCSI_MLQUEUE_DEVICE_BUSY; else return SCSI_MLQUEUE_HOST_BUSY; } struct ata_scsi_args { struct ata_device *dev; u16 *id; struct scsi_cmnd *cmd; }; /** * ata_scsi_rbuf_get - Map response buffer. * @cmd: SCSI command containing buffer to be mapped. * @flags: unsigned long variable to store irq enable status * @copy_in: copy in from user buffer * * Prepare buffer for simulated SCSI commands. * * LOCKING: * spin_lock_irqsave(ata_scsi_rbuf_lock) on success * * RETURNS: * Pointer to response buffer. */ static void *ata_scsi_rbuf_get(struct scsi_cmnd *cmd, bool copy_in, unsigned long *flags) { spin_lock_irqsave(&ata_scsi_rbuf_lock, *flags); memset(ata_scsi_rbuf, 0, ATA_SCSI_RBUF_SIZE); if (copy_in) sg_copy_to_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), ata_scsi_rbuf, ATA_SCSI_RBUF_SIZE); return ata_scsi_rbuf; } /** * ata_scsi_rbuf_put - Unmap response buffer. * @cmd: SCSI command containing buffer to be unmapped. * @copy_out: copy out result * @flags: @flags passed to ata_scsi_rbuf_get() * * Returns rbuf buffer. The result is copied to @cmd's buffer if * @copy_back is true. * * LOCKING: * Unlocks ata_scsi_rbuf_lock. */ static inline void ata_scsi_rbuf_put(struct scsi_cmnd *cmd, bool copy_out, unsigned long *flags) { if (copy_out) sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), ata_scsi_rbuf, ATA_SCSI_RBUF_SIZE); spin_unlock_irqrestore(&ata_scsi_rbuf_lock, *flags); } /** * ata_scsi_rbuf_fill - wrapper for SCSI command simulators * @args: device IDENTIFY data / SCSI command of interest. * @actor: Callback hook for desired SCSI command simulator * * Takes care of the hard work of simulating a SCSI command... * Mapping the response buffer, calling the command's handler, * and handling the handler's return value. This return value * indicates whether the handler wishes the SCSI command to be * completed successfully (0), or not (in which case cmd->result * and sense buffer are assumed to be set). * * LOCKING: * spin_lock_irqsave(host lock) */ static void ata_scsi_rbuf_fill(struct ata_scsi_args *args, unsigned int (*actor)(struct ata_scsi_args *args, u8 *rbuf)) { u8 *rbuf; unsigned int rc; struct scsi_cmnd *cmd = args->cmd; unsigned long flags; rbuf = ata_scsi_rbuf_get(cmd, false, &flags); rc = actor(args, rbuf); ata_scsi_rbuf_put(cmd, rc == 0, &flags); if (rc == 0) cmd->result = SAM_STAT_GOOD; } /** * ata_scsiop_inq_std - Simulate INQUIRY command * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns standard device identification data associated * with non-VPD INQUIRY command output. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_std(struct ata_scsi_args *args, u8 *rbuf) { static const u8 versions[] = { 0x00, 0x60, /* SAM-3 (no version claimed) */ 0x03, 0x20, /* SBC-2 (no version claimed) */ 0x03, 0x00 /* SPC-3 (no version claimed) */ }; static const u8 versions_zbc[] = { 0x00, 0xA0, /* SAM-5 (no version claimed) */ 0x06, 0x00, /* SBC-4 (no version claimed) */ 0x05, 0xC0, /* SPC-5 (no version claimed) */ 0x60, 0x24, /* ZBC r05 */ }; u8 hdr[] = { TYPE_DISK, 0, 0x5, /* claim SPC-3 version compatibility */ 2, 95 - 4, 0, 0, 2 }; VPRINTK("ENTER\n"); /* set scsi removable (RMB) bit per ata bit, or if the * AHCI port says it's external (Hotplug-capable, eSATA). */ if (ata_id_removable(args->id) || (args->dev->link->ap->pflags & ATA_PFLAG_EXTERNAL)) hdr[1] |= (1 << 7); if (args->dev->class == ATA_DEV_ZAC) { hdr[0] = TYPE_ZBC; hdr[2] = 0x7; /* claim SPC-5 version compatibility */ } memcpy(rbuf, hdr, sizeof(hdr)); memcpy(&rbuf[8], "ATA ", 8); ata_id_string(args->id, &rbuf[16], ATA_ID_PROD, 16); /* From SAT, use last 2 words from fw rev unless they are spaces */ ata_id_string(args->id, &rbuf[32], ATA_ID_FW_REV + 2, 4); if (strncmp(&rbuf[32], " ", 4) == 0) ata_id_string(args->id, &rbuf[32], ATA_ID_FW_REV, 4); if (rbuf[32] == 0 || rbuf[32] == ' ') memcpy(&rbuf[32], "n/a ", 4); if (ata_id_zoned_cap(args->id) || args->dev->class == ATA_DEV_ZAC) memcpy(rbuf + 58, versions_zbc, sizeof(versions_zbc)); else memcpy(rbuf + 58, versions, sizeof(versions)); return 0; } /** * ata_scsiop_inq_00 - Simulate INQUIRY VPD page 0, list of pages * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns list of inquiry VPD pages available. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf) { int num_pages; static const u8 pages[] = { 0x00, /* page 0x00, this page */ 0x80, /* page 0x80, unit serial no page */ 0x83, /* page 0x83, device ident page */ 0x89, /* page 0x89, ata info page */ 0xb0, /* page 0xb0, block limits page */ 0xb1, /* page 0xb1, block device characteristics page */ 0xb2, /* page 0xb2, thin provisioning page */ 0xb6, /* page 0xb6, zoned block device characteristics */ }; num_pages = sizeof(pages); if (!(args->dev->flags & ATA_DFLAG_ZAC)) num_pages--; rbuf[3] = num_pages; /* number of supported VPD pages */ memcpy(rbuf + 4, pages, num_pages); return 0; } /** * ata_scsiop_inq_80 - Simulate INQUIRY VPD page 80, device serial number * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Returns ATA device serial number. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_80(struct ata_scsi_args *args, u8 *rbuf) { static const u8 hdr[] = { 0, 0x80, /* this page code */ 0, ATA_ID_SERNO_LEN, /* page len */ }; memcpy(rbuf, hdr, sizeof(hdr)); ata_id_string(args->id, (unsigned char *) &rbuf[4], ATA_ID_SERNO, ATA_ID_SERNO_LEN); return 0; } /** * ata_scsiop_inq_83 - Simulate INQUIRY VPD page 83, device identity * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields two logical unit device identification designators: * - vendor specific ASCII containing the ATA serial number * - SAT defined "t10 vendor id based" containing ASCII vendor * name ("ATA "), model and serial numbers. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_83(struct ata_scsi_args *args, u8 *rbuf) { const int sat_model_serial_desc_len = 68; int num; rbuf[1] = 0x83; /* this page code */ num = 4; /* piv=0, assoc=lu, code_set=ACSII, designator=vendor */ rbuf[num + 0] = 2; rbuf[num + 3] = ATA_ID_SERNO_LEN; num += 4; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_SERNO, ATA_ID_SERNO_LEN); num += ATA_ID_SERNO_LEN; /* SAT defined lu model and serial numbers descriptor */ /* piv=0, assoc=lu, code_set=ACSII, designator=t10 vendor id */ rbuf[num + 0] = 2; rbuf[num + 1] = 1; rbuf[num + 3] = sat_model_serial_desc_len; num += 4; memcpy(rbuf + num, "ATA ", 8); num += 8; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_PROD, ATA_ID_PROD_LEN); num += ATA_ID_PROD_LEN; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_SERNO, ATA_ID_SERNO_LEN); num += ATA_ID_SERNO_LEN; if (ata_id_has_wwn(args->id)) { /* SAT defined lu world wide name */ /* piv=0, assoc=lu, code_set=binary, designator=NAA */ rbuf[num + 0] = 1; rbuf[num + 1] = 3; rbuf[num + 3] = ATA_ID_WWN_LEN; num += 4; ata_id_string(args->id, (unsigned char *) rbuf + num, ATA_ID_WWN, ATA_ID_WWN_LEN); num += ATA_ID_WWN_LEN; } rbuf[3] = num - 4; /* page len (assume less than 256 bytes) */ return 0; } /** * ata_scsiop_inq_89 - Simulate INQUIRY VPD page 89, ATA info * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields SAT-specified ATA VPD page. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_inq_89(struct ata_scsi_args *args, u8 *rbuf) { rbuf[1] = 0x89; /* our page code */ rbuf[2] = (0x238 >> 8); /* page size fixed at 238h */ rbuf[3] = (0x238 & 0xff); memcpy(&rbuf[8], "linux ", 8); memcpy(&rbuf[16], "libata ", 16); memcpy(&rbuf[32], DRV_VERSION, 4); rbuf[36] = 0x34; /* force D2H Reg FIS (34h) */ rbuf[37] = (1 << 7); /* bit 7 indicates Command FIS */ /* TODO: PMP? */ /* we don't store the ATA device signature, so we fake it */ rbuf[38] = ATA_DRDY; /* really, this is Status reg */ rbuf[40] = 0x1; rbuf[48] = 0x1; rbuf[56] = ATA_CMD_ID_ATA; memcpy(&rbuf[60], &args->id[0], 512); return 0; } static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u16 min_io_sectors; rbuf[1] = 0xb0; rbuf[3] = 0x3c; /* required VPD size with unmap support */ /* * Optimal transfer length granularity. * * This is always one physical block, but for disks with a smaller * logical than physical sector size we need to figure out what the * latter is. */ min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id); put_unaligned_be16(min_io_sectors, &rbuf[6]); /* * Optimal unmap granularity. * * The ATA spec doesn't even know about a granularity or alignment * for the TRIM command. We can leave away most of the unmap related * VPD page entries, but we have specifify a granularity to signal * that we support some form of unmap - in thise case via WRITE SAME * with the unmap bit set. */ if (ata_id_has_trim(args->id)) { u64 max_blocks = 65535 * ATA_MAX_TRIM_RNUM; if (dev->horkage & ATA_HORKAGE_MAX_TRIM_128M) max_blocks = 128 << (20 - SECTOR_SHIFT); put_unaligned_be64(max_blocks, &rbuf[36]); put_unaligned_be32(1, &rbuf[28]); } return 0; } static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf) { int form_factor = ata_id_form_factor(args->id); int media_rotation_rate = ata_id_rotation_rate(args->id); u8 zoned = ata_id_zoned_cap(args->id); rbuf[1] = 0xb1; rbuf[3] = 0x3c; rbuf[4] = media_rotation_rate >> 8; rbuf[5] = media_rotation_rate; rbuf[7] = form_factor; if (zoned) rbuf[8] = (zoned << 4); return 0; } static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8 *rbuf) { /* SCSI Thin Provisioning VPD page: SBC-3 rev 22 or later */ rbuf[1] = 0xb2; rbuf[3] = 0x4; rbuf[5] = 1 << 6; /* TPWS */ return 0; } static unsigned int ata_scsiop_inq_b6(struct ata_scsi_args *args, u8 *rbuf) { /* * zbc-r05 SCSI Zoned Block device characteristics VPD page */ rbuf[1] = 0xb6; rbuf[3] = 0x3C; /* * URSWRZ bit is only meaningful for host-managed ZAC drives */ if (args->dev->zac_zoned_cap & 1) rbuf[4] |= 1; put_unaligned_be32(args->dev->zac_zones_optimal_open, &rbuf[8]); put_unaligned_be32(args->dev->zac_zones_optimal_nonseq, &rbuf[12]); put_unaligned_be32(args->dev->zac_zones_max_open, &rbuf[16]); return 0; } /** * modecpy - Prepare response for MODE SENSE * @dest: output buffer * @src: data being copied * @n: length of mode page * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE page for either current or changeable * parameters. * * LOCKING: * None. */ static void modecpy(u8 *dest, const u8 *src, int n, bool changeable) { if (changeable) { memcpy(dest, src, 2); memset(dest + 2, 0, n - 2); } else { memcpy(dest, src, n); } } /** * ata_msense_caching - Simulate MODE SENSE caching info page * @id: device IDENTIFY data * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a caching info page, which conditionally indicates * write caching to the SCSI layer, depending on device * capabilities. * * LOCKING: * None. */ static unsigned int ata_msense_caching(u16 *id, u8 *buf, bool changeable) { modecpy(buf, def_cache_mpage, sizeof(def_cache_mpage), changeable); if (changeable) { buf[2] |= (1 << 2); /* ata_mselect_caching() */ } else { buf[2] |= (ata_id_wcache_enabled(id) << 2); /* write cache enable */ buf[12] |= (!ata_id_rahead_enabled(id) << 5); /* disable read ahead */ } return sizeof(def_cache_mpage); } /** * ata_msense_control - Simulate MODE SENSE control mode page * @dev: ATA device of interest * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE control mode page. * * LOCKING: * None. */ static unsigned int ata_msense_control(struct ata_device *dev, u8 *buf, bool changeable) { modecpy(buf, def_control_mpage, sizeof(def_control_mpage), changeable); if (changeable) { buf[2] |= (1 << 2); /* ata_mselect_control() */ } else { bool d_sense = (dev->flags & ATA_DFLAG_D_SENSE); buf[2] |= (d_sense << 2); /* descriptor format sense data */ } return sizeof(def_control_mpage); } /** * ata_msense_rw_recovery - Simulate MODE SENSE r/w error recovery page * @buf: output buffer * @changeable: whether changeable parameters are requested * * Generate a generic MODE SENSE r/w error recovery page. * * LOCKING: * None. */ static unsigned int ata_msense_rw_recovery(u8 *buf, bool changeable) { modecpy(buf, def_rw_recovery_mpage, sizeof(def_rw_recovery_mpage), changeable); return sizeof(def_rw_recovery_mpage); } /* * We can turn this into a real blacklist if it's needed, for now just * blacklist any Maxtor BANC1G10 revision firmware */ static int ata_dev_supports_fua(u16 *id) { unsigned char model[ATA_ID_PROD_LEN + 1], fw[ATA_ID_FW_REV_LEN + 1]; if (!libata_fua) return 0; if (!ata_id_has_fua(id)) return 0; ata_id_c_string(id, model, ATA_ID_PROD, sizeof(model)); ata_id_c_string(id, fw, ATA_ID_FW_REV, sizeof(fw)); if (strcmp(model, "Maxtor")) return 1; if (strcmp(fw, "BANC1G10")) return 1; return 0; /* blacklisted */ } /** * ata_scsiop_mode_sense - Simulate MODE SENSE 6, 10 commands * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate MODE SENSE commands. Assume this is invoked for direct * access devices (e.g. disks) only. There should be no block * descriptor for other device types. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_mode_sense(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u8 *scsicmd = args->cmd->cmnd, *p = rbuf; static const u8 sat_blk_desc[] = { 0, 0, 0, 0, /* number of blocks: sat unspecified */ 0, 0, 0x2, 0x0 /* block length: 512 bytes */ }; u8 pg, spg; unsigned int ebd, page_control, six_byte; u8 dpofua, bp = 0xff; u16 fp; VPRINTK("ENTER\n"); six_byte = (scsicmd[0] == MODE_SENSE); ebd = !(scsicmd[1] & 0x8); /* dbd bit inverted == edb */ /* * LLBA bit in msense(10) ignored (compliant) */ page_control = scsicmd[2] >> 6; switch (page_control) { case 0: /* current */ case 1: /* changeable */ case 2: /* defaults */ break; /* supported */ case 3: /* saved */ goto saving_not_supp; default: fp = 2; bp = 6; goto invalid_fld; } if (six_byte) p += 4 + (ebd ? 8 : 0); else p += 8 + (ebd ? 8 : 0); pg = scsicmd[2] & 0x3f; spg = scsicmd[3]; /* * No mode subpages supported (yet) but asking for _all_ * subpages may be valid */ if (spg && (spg != ALL_SUB_MPAGES)) { fp = 3; goto invalid_fld; } switch(pg) { case RW_RECOVERY_MPAGE: p += ata_msense_rw_recovery(p, page_control == 1); break; case CACHE_MPAGE: p += ata_msense_caching(args->id, p, page_control == 1); break; case CONTROL_MPAGE: p += ata_msense_control(args->dev, p, page_control == 1); break; case ALL_MPAGES: p += ata_msense_rw_recovery(p, page_control == 1); p += ata_msense_caching(args->id, p, page_control == 1); p += ata_msense_control(args->dev, p, page_control == 1); break; default: /* invalid page code */ fp = 2; goto invalid_fld; } dpofua = 0; if (ata_dev_supports_fua(args->id) && (dev->flags & ATA_DFLAG_LBA48) && (!(dev->flags & ATA_DFLAG_PIO) || dev->multi_count)) dpofua = 1 << 4; if (six_byte) { rbuf[0] = p - rbuf - 1; rbuf[2] |= dpofua; if (ebd) { rbuf[3] = sizeof(sat_blk_desc); memcpy(rbuf + 4, sat_blk_desc, sizeof(sat_blk_desc)); } } else { unsigned int output_len = p - rbuf - 2; rbuf[0] = output_len >> 8; rbuf[1] = output_len; rbuf[3] |= dpofua; if (ebd) { rbuf[7] = sizeof(sat_blk_desc); memcpy(rbuf + 8, sat_blk_desc, sizeof(sat_blk_desc)); } } return 0; invalid_fld: ata_scsi_set_invalid_field(dev, args->cmd, fp, bp); return 1; saving_not_supp: ata_scsi_set_sense(dev, args->cmd, ILLEGAL_REQUEST, 0x39, 0x0); /* "Saving parameters not supported" */ return 1; } /** * ata_scsiop_read_cap - Simulate READ CAPACITY[ 16] commands * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate READ CAPACITY commands. * * LOCKING: * None. */ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u64 last_lba = dev->n_sectors - 1; /* LBA of the last block */ u32 sector_size; /* physical sector size in bytes */ u8 log2_per_phys; u16 lowest_aligned; sector_size = ata_id_logical_sector_size(dev->id); log2_per_phys = ata_id_log2_per_physical_sector(dev->id); lowest_aligned = ata_id_logical_sector_offset(dev->id, log2_per_phys); VPRINTK("ENTER\n"); if (args->cmd->cmnd[0] == READ_CAPACITY) { if (last_lba >= 0xffffffffULL) last_lba = 0xffffffff; /* sector count, 32-bit */ rbuf[0] = last_lba >> (8 * 3); rbuf[1] = last_lba >> (8 * 2); rbuf[2] = last_lba >> (8 * 1); rbuf[3] = last_lba; /* sector size */ rbuf[4] = sector_size >> (8 * 3); rbuf[5] = sector_size >> (8 * 2); rbuf[6] = sector_size >> (8 * 1); rbuf[7] = sector_size; } else { /* sector count, 64-bit */ rbuf[0] = last_lba >> (8 * 7); rbuf[1] = last_lba >> (8 * 6); rbuf[2] = last_lba >> (8 * 5); rbuf[3] = last_lba >> (8 * 4); rbuf[4] = last_lba >> (8 * 3); rbuf[5] = last_lba >> (8 * 2); rbuf[6] = last_lba >> (8 * 1); rbuf[7] = last_lba; /* sector size */ rbuf[ 8] = sector_size >> (8 * 3); rbuf[ 9] = sector_size >> (8 * 2); rbuf[10] = sector_size >> (8 * 1); rbuf[11] = sector_size; rbuf[12] = 0; rbuf[13] = log2_per_phys; rbuf[14] = (lowest_aligned >> 8) & 0x3f; rbuf[15] = lowest_aligned; if (ata_id_has_trim(args->id) && !(dev->horkage & ATA_HORKAGE_NOTRIM)) { rbuf[14] |= 0x80; /* LBPME */ if (ata_id_has_zero_after_trim(args->id) && dev->horkage & ATA_HORKAGE_ZERO_AFTER_TRIM) { ata_dev_info(dev, "Enabling discard_zeroes_data\n"); rbuf[14] |= 0x40; /* LBPRZ */ } } if (ata_id_zoned_cap(args->id) || args->dev->class == ATA_DEV_ZAC) rbuf[12] = (1 << 4); /* RC_BASIS */ } return 0; } /** * ata_scsiop_report_luns - Simulate REPORT LUNS command * @args: device IDENTIFY data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Simulate REPORT LUNS command. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_report_luns(struct ata_scsi_args *args, u8 *rbuf) { VPRINTK("ENTER\n"); rbuf[3] = 8; /* just one lun, LUN 0, size 8 bytes */ return 0; } static void atapi_sense_complete(struct ata_queued_cmd *qc) { if (qc->err_mask && ((qc->err_mask & AC_ERR_DEV) == 0)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into * a sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } ata_qc_done(qc); } /* is it pointless to prefer PIO for "safety reasons"? */ static inline int ata_pio_use_silly(struct ata_port *ap) { return (ap->flags & ATA_FLAG_PIO_DMA); } static void atapi_request_sense(struct ata_queued_cmd *qc) { struct ata_port *ap = qc->ap; struct scsi_cmnd *cmd = qc->scsicmd; DPRINTK("ATAPI request sense\n"); memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE); #ifdef CONFIG_ATA_SFF if (ap->ops->sff_tf_read) ap->ops->sff_tf_read(ap, &qc->tf); #endif /* fill these in, for the case where they are -not- overwritten */ cmd->sense_buffer[0] = 0x70; cmd->sense_buffer[2] = qc->tf.feature >> 4; ata_qc_reinit(qc); /* setup sg table and init transfer direction */ sg_init_one(&qc->sgent, cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE); ata_sg_init(qc, &qc->sgent, 1); qc->dma_dir = DMA_FROM_DEVICE; memset(&qc->cdb, 0, qc->dev->cdb_len); qc->cdb[0] = REQUEST_SENSE; qc->cdb[4] = SCSI_SENSE_BUFFERSIZE; qc->tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; qc->tf.command = ATA_CMD_PACKET; if (ata_pio_use_silly(ap)) { qc->tf.protocol = ATAPI_PROT_DMA; qc->tf.feature |= ATAPI_PKT_DMA; } else { qc->tf.protocol = ATAPI_PROT_PIO; qc->tf.lbam = SCSI_SENSE_BUFFERSIZE; qc->tf.lbah = 0; } qc->nbytes = SCSI_SENSE_BUFFERSIZE; qc->complete_fn = atapi_sense_complete; ata_qc_issue(qc); DPRINTK("EXIT\n"); } /* * ATAPI devices typically report zero for their SCSI version, and sometimes * deviate from the spec WRT response data format. If SCSI version is * reported as zero like normal, then we make the following fixups: * 1) Fake MMC-5 version, to indicate to the Linux scsi midlayer this is a * modern device. * 2) Ensure response data format / ATAPI information are always correct. */ static void atapi_fixup_inquiry(struct scsi_cmnd *cmd) { u8 buf[4]; sg_copy_to_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, 4); if (buf[2] == 0) { buf[2] = 0x5; buf[3] = 0x32; } sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, 4); } static void atapi_qc_complete(struct ata_queued_cmd *qc) { struct scsi_cmnd *cmd = qc->scsicmd; unsigned int err_mask = qc->err_mask; VPRINTK("ENTER, err_mask 0x%X\n", err_mask); /* handle completion from new EH */ if (unlikely(qc->ap->ops->error_handler && (err_mask || qc->flags & ATA_QCFLAG_SENSE_VALID))) { if (!(qc->flags & ATA_QCFLAG_SENSE_VALID)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into a * sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } /* SCSI EH automatically locks door if sdev->locked is * set. Sometimes door lock request continues to * fail, for example, when no media is present. This * creates a loop - SCSI EH issues door lock which * fails and gets invoked again to acquire sense data * for the failed command. * * If door lock fails, always clear sdev->locked to * avoid this infinite loop. * * This may happen before SCSI scan is complete. Make * sure qc->dev->sdev isn't NULL before dereferencing. */ if (qc->cdb[0] == ALLOW_MEDIUM_REMOVAL && qc->dev->sdev) qc->dev->sdev->locked = 0; qc->scsicmd->result = SAM_STAT_CHECK_CONDITION; ata_qc_done(qc); return; } /* successful completion or old EH failure path */ if (unlikely(err_mask & AC_ERR_DEV)) { cmd->result = SAM_STAT_CHECK_CONDITION; atapi_request_sense(qc); return; } else if (unlikely(err_mask)) { /* FIXME: not quite right; we don't want the * translation of taskfile registers into * a sense descriptors, since that's only * correct for ATA, not ATAPI */ ata_gen_passthru_sense(qc); } else { if (cmd->cmnd[0] == INQUIRY && (cmd->cmnd[1] & 0x03) == 0) atapi_fixup_inquiry(cmd); cmd->result = SAM_STAT_GOOD; } ata_qc_done(qc); } /** * atapi_xlat - Initialize PACKET taskfile * @qc: command structure to be initialized * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Zero on success, non-zero on failure. */ static unsigned int atapi_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; int nodata = (scmd->sc_data_direction == DMA_NONE); int using_pio = !nodata && (dev->flags & ATA_DFLAG_PIO); unsigned int nbytes; memset(qc->cdb, 0, dev->cdb_len); memcpy(qc->cdb, scmd->cmnd, scmd->cmd_len); qc->complete_fn = atapi_qc_complete; qc->tf.flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; if (scmd->sc_data_direction == DMA_TO_DEVICE) { qc->tf.flags |= ATA_TFLAG_WRITE; DPRINTK("direction: write\n"); } qc->tf.command = ATA_CMD_PACKET; ata_qc_set_pc_nbytes(qc); /* check whether ATAPI DMA is safe */ if (!nodata && !using_pio && atapi_check_dma(qc)) using_pio = 1; /* Some controller variants snoop this value for Packet * transfers to do state machine and FIFO management. Thus we * want to set it properly, and for DMA where it is * effectively meaningless. */ nbytes = min(ata_qc_raw_nbytes(qc), (unsigned int)63 * 1024); /* Most ATAPI devices which honor transfer chunk size don't * behave according to the spec when odd chunk size which * matches the transfer length is specified. If the number of * bytes to transfer is 2n+1. According to the spec, what * should happen is to indicate that 2n+1 is going to be * transferred and transfer 2n+2 bytes where the last byte is * padding. * * In practice, this doesn't happen. ATAPI devices first * indicate and transfer 2n bytes and then indicate and * transfer 2 bytes where the last byte is padding. * * This inconsistency confuses several controllers which * perform PIO using DMA such as Intel AHCIs and sil3124/32. * These controllers use actual number of transferred bytes to * update DMA pointer and transfer of 4n+2 bytes make those * controller push DMA pointer by 4n+4 bytes because SATA data * FISes are aligned to 4 bytes. This causes data corruption * and buffer overrun. * * Always setting nbytes to even number solves this problem * because then ATAPI devices don't have to split data at 2n * boundaries. */ if (nbytes & 0x1) nbytes++; qc->tf.lbam = (nbytes & 0xFF); qc->tf.lbah = (nbytes >> 8); if (nodata) qc->tf.protocol = ATAPI_PROT_NODATA; else if (using_pio) qc->tf.protocol = ATAPI_PROT_PIO; else { /* DMA data xfer */ qc->tf.protocol = ATAPI_PROT_DMA; qc->tf.feature |= ATAPI_PKT_DMA; if ((dev->flags & ATA_DFLAG_DMADIR) && (scmd->sc_data_direction != DMA_TO_DEVICE)) /* some SATA bridges need us to indicate data xfer direction */ qc->tf.feature |= ATAPI_DMADIR; } /* FIXME: We need to translate 0x05 READ_BLOCK_LIMITS to a MODE_SENSE as ATAPI tape drives don't get this right otherwise */ return 0; } static struct ata_device *ata_find_dev(struct ata_port *ap, int devno) { if (!sata_pmp_attached(ap)) { if (likely(devno >= 0 && devno < ata_link_max_devices(&ap->link))) return &ap->link.device[devno]; } else { if (likely(devno >= 0 && devno < ap->nr_pmp_links)) return &ap->pmp_link[devno].device[0]; } return NULL; } static struct ata_device *__ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev) { int devno; /* skip commands not addressed to targets we simulate */ if (!sata_pmp_attached(ap)) { if (unlikely(scsidev->channel || scsidev->lun)) return NULL; devno = scsidev->id; } else { if (unlikely(scsidev->id || scsidev->lun)) return NULL; devno = scsidev->channel; } return ata_find_dev(ap, devno); } /** * ata_scsi_find_dev - lookup ata_device from scsi_cmnd * @ap: ATA port to which the device is attached * @scsidev: SCSI device from which we derive the ATA device * * Given various information provided in struct scsi_cmnd, * map that onto an ATA bus, and using that mapping * determine which ata_device is associated with the * SCSI command to be sent. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * Associated ATA device, or %NULL if not found. */ struct ata_device * ata_scsi_find_dev(struct ata_port *ap, const struct scsi_device *scsidev) { struct ata_device *dev = __ata_scsi_find_dev(ap, scsidev); if (unlikely(!dev || !ata_dev_enabled(dev))) return NULL; return dev; } /* * ata_scsi_map_proto - Map pass-thru protocol value to taskfile value. * @byte1: Byte 1 from pass-thru CDB. * * RETURNS: * ATA_PROT_UNKNOWN if mapping failed/unimplemented, protocol otherwise. */ static u8 ata_scsi_map_proto(u8 byte1) { switch((byte1 & 0x1e) >> 1) { case 3: /* Non-data */ return ATA_PROT_NODATA; case 6: /* DMA */ case 10: /* UDMA Data-in */ case 11: /* UDMA Data-Out */ return ATA_PROT_DMA; case 4: /* PIO Data-in */ case 5: /* PIO Data-out */ return ATA_PROT_PIO; case 12: /* FPDMA */ return ATA_PROT_NCQ; case 0: /* Hard Reset */ case 1: /* SRST */ case 8: /* Device Diagnostic */ case 9: /* Device Reset */ case 7: /* DMA Queued */ case 15: /* Return Response Info */ default: /* Reserved */ break; } return ATA_PROT_UNKNOWN; } /** * ata_scsi_pass_thru - convert ATA pass-thru CDB to taskfile * @qc: command structure to be initialized * * Handles either 12, 16, or 32-byte versions of the CDB. * * RETURNS: * Zero on success, non-zero on failure. */ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &(qc->tf); struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u16 fp; u16 cdb_offset = 0; /* 7Fh variable length cmd means a ata pass-thru(32) */ if (cdb[0] == VARIABLE_LENGTH_CMD) cdb_offset = 9; tf->protocol = ata_scsi_map_proto(cdb[1 + cdb_offset]); if (tf->protocol == ATA_PROT_UNKNOWN) { fp = 1; goto invalid_fld; } if (ata_is_ncq(tf->protocol) && (cdb[2 + cdb_offset] & 0x3) == 0) tf->protocol = ATA_PROT_NCQ_NODATA; /* enable LBA */ tf->flags |= ATA_TFLAG_LBA; /* * 12 and 16 byte CDBs use different offsets to * provide the various register values. */ if (cdb[0] == ATA_16) { /* * 16-byte CDB - may contain extended commands. * * If that is the case, copy the upper byte register values. */ if (cdb[1] & 0x01) { tf->hob_feature = cdb[3]; tf->hob_nsect = cdb[5]; tf->hob_lbal = cdb[7]; tf->hob_lbam = cdb[9]; tf->hob_lbah = cdb[11]; tf->flags |= ATA_TFLAG_LBA48; } else tf->flags &= ~ATA_TFLAG_LBA48; /* * Always copy low byte, device and command registers. */ tf->feature = cdb[4]; tf->nsect = cdb[6]; tf->lbal = cdb[8]; tf->lbam = cdb[10]; tf->lbah = cdb[12]; tf->device = cdb[13]; tf->command = cdb[14]; } else if (cdb[0] == ATA_12) { /* * 12-byte CDB - incapable of extended commands. */ tf->flags &= ~ATA_TFLAG_LBA48; tf->feature = cdb[3]; tf->nsect = cdb[4]; tf->lbal = cdb[5]; tf->lbam = cdb[6]; tf->lbah = cdb[7]; tf->device = cdb[8]; tf->command = cdb[9]; } else { /* * 32-byte CDB - may contain extended command fields. * * If that is the case, copy the upper byte register values. */ if (cdb[10] & 0x01) { tf->hob_feature = cdb[20]; tf->hob_nsect = cdb[22]; tf->hob_lbal = cdb[16]; tf->hob_lbam = cdb[15]; tf->hob_lbah = cdb[14]; tf->flags |= ATA_TFLAG_LBA48; } else tf->flags &= ~ATA_TFLAG_LBA48; tf->feature = cdb[21]; tf->nsect = cdb[23]; tf->lbal = cdb[19]; tf->lbam = cdb[18]; tf->lbah = cdb[17]; tf->device = cdb[24]; tf->command = cdb[25]; tf->auxiliary = get_unaligned_be32(&cdb[28]); } /* For NCQ commands copy the tag value */ if (ata_is_ncq(tf->protocol)) tf->nsect = qc->hw_tag << 3; /* enforce correct master/slave bit */ tf->device = dev->devno ? tf->device | ATA_DEV1 : tf->device & ~ATA_DEV1; switch (tf->command) { /* READ/WRITE LONG use a non-standard sect_size */ case ATA_CMD_READ_LONG: case ATA_CMD_READ_LONG_ONCE: case ATA_CMD_WRITE_LONG: case ATA_CMD_WRITE_LONG_ONCE: if (tf->protocol != ATA_PROT_PIO || tf->nsect != 1) { fp = 1; goto invalid_fld; } qc->sect_size = scsi_bufflen(scmd); break; /* commands using reported Logical Block size (e.g. 512 or 4K) */ case ATA_CMD_CFA_WRITE_NE: case ATA_CMD_CFA_TRANS_SECT: case ATA_CMD_CFA_WRITE_MULT_NE: /* XXX: case ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE: */ case ATA_CMD_READ: case ATA_CMD_READ_EXT: case ATA_CMD_READ_QUEUED: /* XXX: case ATA_CMD_READ_QUEUED_EXT: */ case ATA_CMD_FPDMA_READ: case ATA_CMD_READ_MULTI: case ATA_CMD_READ_MULTI_EXT: case ATA_CMD_PIO_READ: case ATA_CMD_PIO_READ_EXT: case ATA_CMD_READ_STREAM_DMA_EXT: case ATA_CMD_READ_STREAM_EXT: case ATA_CMD_VERIFY: case ATA_CMD_VERIFY_EXT: case ATA_CMD_WRITE: case ATA_CMD_WRITE_EXT: case ATA_CMD_WRITE_FUA_EXT: case ATA_CMD_WRITE_QUEUED: case ATA_CMD_WRITE_QUEUED_FUA_EXT: case ATA_CMD_FPDMA_WRITE: case ATA_CMD_WRITE_MULTI: case ATA_CMD_WRITE_MULTI_EXT: case ATA_CMD_WRITE_MULTI_FUA_EXT: case ATA_CMD_PIO_WRITE: case ATA_CMD_PIO_WRITE_EXT: case ATA_CMD_WRITE_STREAM_DMA_EXT: case ATA_CMD_WRITE_STREAM_EXT: qc->sect_size = scmd->device->sector_size; break; /* Everything else uses 512 byte "sectors" */ default: qc->sect_size = ATA_SECT_SIZE; } /* * Set flags so that all registers will be written, pass on * write indication (used for PIO/DMA setup), result TF is * copied back and we don't whine too much about its failure. */ tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE; if (scmd->sc_data_direction == DMA_TO_DEVICE) tf->flags |= ATA_TFLAG_WRITE; qc->flags |= ATA_QCFLAG_RESULT_TF | ATA_QCFLAG_QUIET; /* * Set transfer length. * * TODO: find out if we need to do more here to * cover scatter/gather case. */ ata_qc_set_pc_nbytes(qc); /* We may not issue DMA commands if no DMA mode is set */ if (tf->protocol == ATA_PROT_DMA && dev->dma_mode == 0) { fp = 1; goto invalid_fld; } /* We may not issue NCQ commands to devices not supporting NCQ */ if (ata_is_ncq(tf->protocol) && !ata_ncq_enabled(dev)) { fp = 1; goto invalid_fld; } /* sanity check for pio multi commands */ if ((cdb[1] & 0xe0) && !is_multi_taskfile(tf)) { fp = 1; goto invalid_fld; } if (is_multi_taskfile(tf)) { unsigned int multi_count = 1 << (cdb[1] >> 5); /* compare the passed through multi_count * with the cached multi_count of libata */ if (multi_count != dev->multi_count) ata_dev_warn(dev, "invalid multi_count %u ignored\n", multi_count); } /* * Filter SET_FEATURES - XFER MODE command -- otherwise, * SET_FEATURES - XFER MODE must be preceded/succeeded * by an update to hardware-specific registers for each * controller (i.e. the reason for ->set_piomode(), * ->set_dmamode(), and ->post_set_mode() hooks). */ if (tf->command == ATA_CMD_SET_FEATURES && tf->feature == SETFEATURES_XFER) { fp = (cdb[0] == ATA_16) ? 4 : 3; goto invalid_fld; } /* * Filter TPM commands by default. These provide an * essentially uncontrolled encrypted "back door" between * applications and the disk. Set libata.allow_tpm=1 if you * have a real reason for wanting to use them. This ensures * that installed software cannot easily mess stuff up without * user intent. DVR type users will probably ship with this enabled * for movie content management. * * Note that for ATA8 we can issue a DCS change and DCS freeze lock * for this and should do in future but that it is not sufficient as * DCS is an optional feature set. Thus we also do the software filter * so that we comply with the TC consortium stated goal that the user * can turn off TC features of their system. */ if (tf->command >= 0x5C && tf->command <= 0x5F && !libata_allow_tpm) { fp = (cdb[0] == ATA_16) ? 14 : 9; goto invalid_fld; } return 0; invalid_fld: ata_scsi_set_invalid_field(dev, scmd, fp, 0xff); return 1; } /** * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim * @cmd: SCSI command being translated * @trmax: Maximum number of entries that will fit in sector_size bytes. * @sector: Starting sector * @count: Total Range of request in logical sectors * * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted * descriptor. * * Upto 64 entries of the format: * 63:48 Range Length * 47:0 LBA * * Range Length of 0 is ignored. * LBA's should be sorted order and not overlap. * * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET * * Return: Number of bytes copied into sglist. */ static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax, u64 sector, u32 count) { struct scsi_device *sdp = cmd->device; size_t len = sdp->sector_size; size_t r; __le64 *buf; u32 i = 0; unsigned long flags; WARN_ON(len > ATA_SCSI_RBUF_SIZE); if (len > ATA_SCSI_RBUF_SIZE) len = ATA_SCSI_RBUF_SIZE; spin_lock_irqsave(&ata_scsi_rbuf_lock, flags); buf = ((void *)ata_scsi_rbuf); memset(buf, 0, len); while (i < trmax) { u64 entry = sector | ((u64)(count > 0xffff ? 0xffff : count) << 48); buf[i++] = __cpu_to_le64(entry); if (count <= 0xffff) break; count -= 0xffff; sector += 0xffff; } r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len); spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags); return r; } /** * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same * @qc: Command to be translated * * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or * an SCT Write Same command. * Based on WRITE SAME has the UNMAP flag: * * - When set translate to DSM TRIM * - When clear translate to SCT Write Same */ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; struct scsi_device *sdp = scmd->device; size_t len = sdp->sector_size; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u64 block; u32 n_block; const u32 trmax = len >> 3; u32 size; u16 fp; u8 bp = 0xff; u8 unmap = cdb[1] & 0x8; /* we may not issue DMA commands if no DMA mode is set */ if (unlikely(!dev->dma_mode)) goto invalid_opcode; /* * We only allow sending this command through the block layer, * as it modifies the DATA OUT buffer, which would corrupt user * memory for SG_IO commands. */ if (unlikely(blk_rq_is_passthrough(scmd->request))) goto invalid_opcode; if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (!unmap || (dev->horkage & ATA_HORKAGE_NOTRIM) || !ata_id_has_trim(dev->id)) { fp = 1; bp = 3; goto invalid_fld; } /* If the request is too large the cmd is invalid */ if (n_block > 0xffff * trmax) { fp = 2; goto invalid_fld; } /* * WRITE SAME always has a sector sized buffer as payload, this * should never be a multiple entry S/G list. */ if (!scsi_sg_count(scmd)) goto invalid_param_len; /* * size must match sector size in bytes * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count) * is defined as number of 512 byte blocks to be transferred. */ size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block); if (size != len) goto invalid_param_len; if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) { /* Newer devices support queued TRIM commands */ tf->protocol = ATA_PROT_NCQ; tf->command = ATA_CMD_FPDMA_SEND; tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f; tf->nsect = qc->hw_tag << 3; tf->hob_feature = (size / 512) >> 8; tf->feature = size / 512; tf->auxiliary = 1; } else { tf->protocol = ATA_PROT_DMA; tf->hob_feature = 0; tf->feature = ATA_DSM_TRIM; tf->hob_nsect = (size / 512) >> 8; tf->nsect = size / 512; tf->command = ATA_CMD_DSM; } tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 | ATA_TFLAG_WRITE; ata_qc_set_pc_nbytes(qc); return 0; invalid_fld: ata_scsi_set_invalid_field(dev, scmd, fp, bp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; invalid_opcode: /* "Invalid command operation code" */ ata_scsi_set_sense(dev, scmd, ILLEGAL_REQUEST, 0x20, 0x0); return 1; } /** * ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN * @args: device MAINTENANCE_IN data / SCSI command of interest. * @rbuf: Response buffer, to which simulated SCSI cmd output is sent. * * Yields a subset to satisfy scsi_report_opcode() * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf) { struct ata_device *dev = args->dev; u8 *cdb = args->cmd->cmnd; u8 supported = 0; unsigned int err = 0; if (cdb[2] != 1) { ata_dev_warn(dev, "invalid command format %d\n", cdb[2]); err = 2; goto out; } switch (cdb[3]) { case INQUIRY: case MODE_SENSE: case MODE_SENSE_10: case READ_CAPACITY: case SERVICE_ACTION_IN_16: case REPORT_LUNS: case REQUEST_SENSE: case SYNCHRONIZE_CACHE: case REZERO_UNIT: case SEEK_6: case SEEK_10: case TEST_UNIT_READY: case SEND_DIAGNOSTIC: case MAINTENANCE_IN: case READ_6: case READ_10: case READ_16: case WRITE_6: case WRITE_10: case WRITE_16: case ATA_12: case ATA_16: case VERIFY: case VERIFY_16: case MODE_SELECT: case MODE_SELECT_10: case START_STOP: supported = 3; break; case ZBC_IN: case ZBC_OUT: if (ata_id_zoned_cap(dev->id) || dev->class == ATA_DEV_ZAC) supported = 3; break; case SECURITY_PROTOCOL_IN: case SECURITY_PROTOCOL_OUT: if (dev->flags & ATA_DFLAG_TRUSTED) supported = 3; break; default: break; } out: rbuf[1] = supported; /* supported */ return err; } /** * ata_scsi_report_zones_complete - convert ATA output * @qc: command structure returning the data * * Convert T-13 little-endian field representation into * T-10 big-endian field representation. * What a mess. */ static void ata_scsi_report_zones_complete(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; struct sg_mapping_iter miter; unsigned long flags; unsigned int bytes = 0; sg_miter_start(&miter, scsi_sglist(scmd), scsi_sg_count(scmd), SG_MITER_TO_SG | SG_MITER_ATOMIC); local_irq_save(flags); while (sg_miter_next(&miter)) { unsigned int offset = 0; if (bytes == 0) { char *hdr; u32 list_length; u64 max_lba, opt_lba; u16 same; /* Swizzle header */ hdr = miter.addr; list_length = get_unaligned_le32(&hdr[0]); same = get_unaligned_le16(&hdr[4]); max_lba = get_unaligned_le64(&hdr[8]); opt_lba = get_unaligned_le64(&hdr[16]); put_unaligned_be32(list_length, &hdr[0]); hdr[4] = same & 0xf; put_unaligned_be64(max_lba, &hdr[8]); put_unaligned_be64(opt_lba, &hdr[16]); offset += 64; bytes += 64; } while (offset < miter.length) { char *rec; u8 cond, type, non_seq, reset; u64 size, start, wp; /* Swizzle zone descriptor */ rec = miter.addr + offset; type = rec[0] & 0xf; cond = (rec[1] >> 4) & 0xf; non_seq = (rec[1] & 2); reset = (rec[1] & 1); size = get_unaligned_le64(&rec[8]); start = get_unaligned_le64(&rec[16]); wp = get_unaligned_le64(&rec[24]); rec[0] = type; rec[1] = (cond << 4) | non_seq | reset; put_unaligned_be64(size, &rec[8]); put_unaligned_be64(start, &rec[16]); put_unaligned_be64(wp, &rec[24]); WARN_ON(offset + 64 > miter.length); offset += 64; bytes += 64; } } sg_miter_stop(&miter); local_irq_restore(flags); ata_scsi_qc_complete(qc); } static unsigned int ata_scsi_zbc_in_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; u16 sect, fp = (u16)-1; u8 sa, options, bp = 0xff; u64 block; u32 n_block; if (unlikely(scmd->cmd_len < 16)) { ata_dev_warn(qc->dev, "invalid cdb length %d\n", scmd->cmd_len); fp = 15; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (n_block != scsi_bufflen(scmd)) { ata_dev_warn(qc->dev, "non-matching transfer count (%d/%d)\n", n_block, scsi_bufflen(scmd)); goto invalid_param_len; } sa = cdb[1] & 0x1f; if (sa != ZI_REPORT_ZONES) { ata_dev_warn(qc->dev, "invalid service action %d\n", sa); fp = 1; goto invalid_fld; } /* * ZAC allows only for transfers in 512 byte blocks, * and uses a 16 bit value for the transfer count. */ if ((n_block / 512) > 0xffff || n_block < 512 || (n_block % 512)) { ata_dev_warn(qc->dev, "invalid transfer count %d\n", n_block); goto invalid_param_len; } sect = n_block / 512; options = cdb[14] & 0xbf; if (ata_ncq_enabled(qc->dev) && ata_fpdma_zac_mgmt_in_supported(qc->dev)) { tf->protocol = ATA_PROT_NCQ; tf->command = ATA_CMD_FPDMA_RECV; tf->hob_nsect = ATA_SUBCMD_FPDMA_RECV_ZAC_MGMT_IN & 0x1f; tf->nsect = qc->hw_tag << 3; tf->feature = sect & 0xff; tf->hob_feature = (sect >> 8) & 0xff; tf->auxiliary = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES | (options << 8); } else { tf->command = ATA_CMD_ZAC_MGMT_IN; tf->feature = ATA_SUBCMD_ZAC_MGMT_IN_REPORT_ZONES; tf->protocol = ATA_PROT_DMA; tf->hob_feature = options; tf->hob_nsect = (sect >> 8) & 0xff; tf->nsect = sect & 0xff; } tf->device = ATA_LBA; tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48; qc->flags |= ATA_QCFLAG_RESULT_TF; ata_qc_set_pc_nbytes(qc); qc->complete_fn = ata_scsi_report_zones_complete; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; } static unsigned int ata_scsi_zbc_out_xlat(struct ata_queued_cmd *qc) { struct ata_taskfile *tf = &qc->tf; struct scsi_cmnd *scmd = qc->scsicmd; struct ata_device *dev = qc->dev; const u8 *cdb = scmd->cmnd; u8 all, sa; u64 block; u32 n_block; u16 fp = (u16)-1; if (unlikely(scmd->cmd_len < 16)) { fp = 15; goto invalid_fld; } sa = cdb[1] & 0x1f; if ((sa != ZO_CLOSE_ZONE) && (sa != ZO_FINISH_ZONE) && (sa != ZO_OPEN_ZONE) && (sa != ZO_RESET_WRITE_POINTER)) { fp = 1; goto invalid_fld; } scsi_16_lba_len(cdb, &block, &n_block); if (n_block) { /* * ZAC MANAGEMENT OUT doesn't define any length */ goto invalid_param_len; } all = cdb[14] & 0x1; if (all) { /* * Ignore the block address (zone ID) as defined by ZBC. */ block = 0; } else if (block >= dev->n_sectors) { /* * Block must be a valid zone ID (a zone start LBA). */ fp = 2; goto invalid_fld; } if (ata_ncq_enabled(qc->dev) && ata_fpdma_zac_mgmt_out_supported(qc->dev)) { tf->protocol = ATA_PROT_NCQ_NODATA; tf->command = ATA_CMD_NCQ_NON_DATA; tf->feature = ATA_SUBCMD_NCQ_NON_DATA_ZAC_MGMT_OUT; tf->nsect = qc->hw_tag << 3; tf->auxiliary = sa | ((u16)all << 8); } else { tf->protocol = ATA_PROT_NODATA; tf->command = ATA_CMD_ZAC_MGMT_OUT; tf->feature = sa; tf->hob_feature = all; } tf->lbah = (block >> 16) & 0xff; tf->lbam = (block >> 8) & 0xff; tf->lbal = block & 0xff; tf->hob_lbah = (block >> 40) & 0xff; tf->hob_lbam = (block >> 32) & 0xff; tf->hob_lbal = (block >> 24) & 0xff; tf->device = ATA_LBA; tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, 0xff); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; } /** * ata_mselect_caching - Simulate MODE SELECT for caching info page * @qc: Storage for translated ATA taskfile * @buf: input buffer * @len: number of valid bytes in the input buffer * @fp: out parameter for the failed field on error * * Prepare a taskfile to modify caching information for the device. * * LOCKING: * None. */ static int ata_mselect_caching(struct ata_queued_cmd *qc, const u8 *buf, int len, u16 *fp) { struct ata_taskfile *tf = &qc->tf; struct ata_device *dev = qc->dev; u8 mpage[CACHE_MPAGE_LEN]; u8 wce; int i; /* * The first two bytes of def_cache_mpage are a header, so offsets * in mpage are off by 2 compared to buf. Same for len. */ if (len != CACHE_MPAGE_LEN - 2) { if (len < CACHE_MPAGE_LEN - 2) *fp = len; else *fp = CACHE_MPAGE_LEN - 2; return -EINVAL; } wce = buf[0] & (1 << 2); /* * Check that read-only bits are not modified. */ ata_msense_caching(dev->id, mpage, false); for (i = 0; i < CACHE_MPAGE_LEN - 2; i++) { if (i == 0) continue; if (mpage[i + 2] != buf[i]) { *fp = i; return -EINVAL; } } tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR; tf->protocol = ATA_PROT_NODATA; tf->nsect = 0; tf->command = ATA_CMD_SET_FEATURES; tf->feature = wce ? SETFEATURES_WC_ON : SETFEATURES_WC_OFF; return 0; } /** * ata_mselect_control - Simulate MODE SELECT for control page * @qc: Storage for translated ATA taskfile * @buf: input buffer * @len: number of valid bytes in the input buffer * @fp: out parameter for the failed field on error * * Prepare a taskfile to modify caching information for the device. * * LOCKING: * None. */ static int ata_mselect_control(struct ata_queued_cmd *qc, const u8 *buf, int len, u16 *fp) { struct ata_device *dev = qc->dev; u8 mpage[CONTROL_MPAGE_LEN]; u8 d_sense; int i; /* * The first two bytes of def_control_mpage are a header, so offsets * in mpage are off by 2 compared to buf. Same for len. */ if (len != CONTROL_MPAGE_LEN - 2) { if (len < CONTROL_MPAGE_LEN - 2) *fp = len; else *fp = CONTROL_MPAGE_LEN - 2; return -EINVAL; } d_sense = buf[0] & (1 << 2); /* * Check that read-only bits are not modified. */ ata_msense_control(dev, mpage, false); for (i = 0; i < CONTROL_MPAGE_LEN - 2; i++) { if (i == 0) continue; if (mpage[2 + i] != buf[i]) { *fp = i; return -EINVAL; } } if (d_sense & (1 << 2)) dev->flags |= ATA_DFLAG_D_SENSE; else dev->flags &= ~ATA_DFLAG_D_SENSE; return 0; } /** * ata_scsi_mode_select_xlat - Simulate MODE SELECT 6, 10 commands * @qc: Storage for translated ATA taskfile * * Converts a MODE SELECT command to an ATA SET FEATURES taskfile. * Assume this is invoked for direct access devices (e.g. disks) only. * There should be no block descriptor for other device types. * * LOCKING: * spin_lock_irqsave(host lock) */ static unsigned int ata_scsi_mode_select_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; u8 pg, spg; unsigned six_byte, pg_len, hdr_len, bd_len; int len; u16 fp = (u16)-1; u8 bp = 0xff; u8 buffer[64]; const u8 *p = buffer; VPRINTK("ENTER\n"); six_byte = (cdb[0] == MODE_SELECT); if (six_byte) { if (scmd->cmd_len < 5) { fp = 4; goto invalid_fld; } len = cdb[4]; hdr_len = 4; } else { if (scmd->cmd_len < 9) { fp = 8; goto invalid_fld; } len = (cdb[7] << 8) + cdb[8]; hdr_len = 8; } /* We only support PF=1, SP=0. */ if ((cdb[1] & 0x11) != 0x10) { fp = 1; bp = (cdb[1] & 0x01) ? 1 : 5; goto invalid_fld; } /* Test early for possible overrun. */ if (!scsi_sg_count(scmd) || scsi_sglist(scmd)->length < len) goto invalid_param_len; /* Move past header and block descriptors. */ if (len < hdr_len) goto invalid_param_len; if (!sg_copy_to_buffer(scsi_sglist(scmd), scsi_sg_count(scmd), buffer, sizeof(buffer))) goto invalid_param_len; if (six_byte) bd_len = p[3]; else bd_len = (p[6] << 8) + p[7]; len -= hdr_len; p += hdr_len; if (len < bd_len) goto invalid_param_len; if (bd_len != 0 && bd_len != 8) { fp = (six_byte) ? 3 : 6; fp += bd_len + hdr_len; goto invalid_param; } len -= bd_len; p += bd_len; if (len == 0) goto skip; /* Parse both possible formats for the mode page headers. */ pg = p[0] & 0x3f; if (p[0] & 0x40) { if (len < 4) goto invalid_param_len; spg = p[1]; pg_len = (p[2] << 8) | p[3]; p += 4; len -= 4; } else { if (len < 2) goto invalid_param_len; spg = 0; pg_len = p[1]; p += 2; len -= 2; } /* * No mode subpages supported (yet) but asking for _all_ * subpages may be valid */ if (spg && (spg != ALL_SUB_MPAGES)) { fp = (p[0] & 0x40) ? 1 : 0; fp += hdr_len + bd_len; goto invalid_param; } if (pg_len > len) goto invalid_param_len; switch (pg) { case CACHE_MPAGE: if (ata_mselect_caching(qc, p, pg_len, &fp) < 0) { fp += hdr_len + bd_len; goto invalid_param; } break; case CONTROL_MPAGE: if (ata_mselect_control(qc, p, pg_len, &fp) < 0) { fp += hdr_len + bd_len; goto invalid_param; } else { goto skip; /* No ATA command to send */ } break; default: /* invalid page code */ fp = bd_len + hdr_len; goto invalid_param; } /* * Only one page has changeable data, so we only support setting one * page at a time. */ if (len > pg_len) goto invalid_param; return 0; invalid_fld: ata_scsi_set_invalid_field(qc->dev, scmd, fp, bp); return 1; invalid_param: ata_scsi_set_invalid_parameter(qc->dev, scmd, fp); return 1; invalid_param_len: /* "Parameter list length error" */ ata_scsi_set_sense(qc->dev, scmd, ILLEGAL_REQUEST, 0x1a, 0x0); return 1; skip: scmd->result = SAM_STAT_GOOD; return 1; } static u8 ata_scsi_trusted_op(u32 len, bool send, bool dma) { if (len == 0) return ATA_CMD_TRUSTED_NONDATA; else if (send) return dma ? ATA_CMD_TRUSTED_SND_DMA : ATA_CMD_TRUSTED_SND; else return dma ? ATA_CMD_TRUSTED_RCV_DMA : ATA_CMD_TRUSTED_RCV; } static unsigned int ata_scsi_security_inout_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; struct ata_taskfile *tf = &qc->tf; u8 secp = cdb[1]; bool send = (cdb[0] == SECURITY_PROTOCOL_OUT); u16 spsp = get_unaligned_be16(&cdb[2]); u32 len = get_unaligned_be32(&cdb[6]); bool dma = !(qc->dev->flags & ATA_DFLAG_PIO); /* * We don't support the ATA "security" protocol. */ if (secp == 0xef) { ata_scsi_set_invalid_field(qc->dev, scmd, 1, 0); return 1; } if (cdb[4] & 7) { /* INC_512 */ if (len > 0xffff) { ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0); return 1; } } else { if (len > 0x01fffe00) { ata_scsi_set_invalid_field(qc->dev, scmd, 6, 0); return 1; } /* convert to the sector-based ATA addressing */ len = (len + 511) / 512; } tf->protocol = dma ? ATA_PROT_DMA : ATA_PROT_PIO; tf->flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR | ATA_TFLAG_LBA; if (send) tf->flags |= ATA_TFLAG_WRITE; tf->command = ata_scsi_trusted_op(len, send, dma); tf->feature = secp; tf->lbam = spsp & 0xff; tf->lbah = spsp >> 8; if (len) { tf->nsect = len & 0xff; tf->lbal = len >> 8; } else { if (!send) tf->lbah = (1 << 7); } ata_qc_set_pc_nbytes(qc); return 0; } /** * ata_scsi_var_len_cdb_xlat - SATL variable length CDB to Handler * @qc: Command to be translated * * Translate a SCSI variable length CDB to specified commands. * It checks a service action value in CDB to call corresponding handler. * * RETURNS: * Zero on success, non-zero on failure * */ static unsigned int ata_scsi_var_len_cdb_xlat(struct ata_queued_cmd *qc) { struct scsi_cmnd *scmd = qc->scsicmd; const u8 *cdb = scmd->cmnd; const u16 sa = get_unaligned_be16(&cdb[8]); /* * if service action represents a ata pass-thru(32) command, * then pass it to ata_scsi_pass_thru handler. */ if (sa == ATA_32) return ata_scsi_pass_thru(qc); /* unsupported service action */ return 1; } /** * ata_get_xlat_func - check if SCSI to ATA translation is possible * @dev: ATA device * @cmd: SCSI command opcode to consider * * Look up the SCSI command given, and determine whether the * SCSI command is to be translated or simulated. * * RETURNS: * Pointer to translation function if possible, %NULL if not. */ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd) { switch (cmd) { case READ_6: case READ_10: case READ_16: case WRITE_6: case WRITE_10: case WRITE_16: return ata_scsi_rw_xlat; case WRITE_SAME_16: return ata_scsi_write_same_xlat; case SYNCHRONIZE_CACHE: if (ata_try_flush_cache(dev)) return ata_scsi_flush_xlat; break; case VERIFY: case VERIFY_16: return ata_scsi_verify_xlat; case ATA_12: case ATA_16: return ata_scsi_pass_thru; case VARIABLE_LENGTH_CMD: return ata_scsi_var_len_cdb_xlat; case MODE_SELECT: case MODE_SELECT_10: return ata_scsi_mode_select_xlat; break; case ZBC_IN: return ata_scsi_zbc_in_xlat; case ZBC_OUT: return ata_scsi_zbc_out_xlat; case SECURITY_PROTOCOL_IN: case SECURITY_PROTOCOL_OUT: if (!(dev->flags & ATA_DFLAG_TRUSTED)) break; return ata_scsi_security_inout_xlat; case START_STOP: return ata_scsi_start_stop_xlat; } return NULL; } /** * ata_scsi_dump_cdb - dump SCSI command contents to dmesg * @ap: ATA port to which the command was being sent * @cmd: SCSI command to dump * * Prints the contents of a SCSI command via printk(). */ void ata_scsi_dump_cdb(struct ata_port *ap, struct scsi_cmnd *cmd) { #ifdef ATA_VERBOSE_DEBUG struct scsi_device *scsidev = cmd->device; VPRINTK("CDB (%u:%d,%d,%lld) %9ph\n", ap->print_id, scsidev->channel, scsidev->id, scsidev->lun, cmd->cmnd); #endif } int __ata_scsi_queuecmd(struct scsi_cmnd *scmd, struct ata_device *dev) { u8 scsi_op = scmd->cmnd[0]; ata_xlat_func_t xlat_func; int rc = 0; if (dev->class == ATA_DEV_ATA || dev->class == ATA_DEV_ZAC) { if (unlikely(!scmd->cmd_len || scmd->cmd_len > dev->cdb_len)) goto bad_cdb_len; xlat_func = ata_get_xlat_func(dev, scsi_op); } else { if (unlikely(!scmd->cmd_len)) goto bad_cdb_len; xlat_func = NULL; if (likely((scsi_op != ATA_16) || !atapi_passthru16)) { /* relay SCSI command to ATAPI device */ int len = COMMAND_SIZE(scsi_op); if (unlikely(len > scmd->cmd_len || len > dev->cdb_len || scmd->cmd_len > ATAPI_CDB_LEN)) goto bad_cdb_len; xlat_func = atapi_xlat; } else { /* ATA_16 passthru, treat as an ATA command */ if (unlikely(scmd->cmd_len > 16)) goto bad_cdb_len; xlat_func = ata_get_xlat_func(dev, scsi_op); } } if (xlat_func) rc = ata_scsi_translate(dev, scmd, xlat_func); else ata_scsi_simulate(dev, scmd); return rc; bad_cdb_len: DPRINTK("bad CDB len=%u, scsi_op=0x%02x, max=%u\n", scmd->cmd_len, scsi_op, dev->cdb_len); scmd->result = DID_ERROR << 16; scmd->scsi_done(scmd); return 0; } /** * ata_scsi_queuecmd - Issue SCSI cdb to libata-managed device * @shost: SCSI host of command to be sent * @cmd: SCSI command to be sent * * In some cases, this function translates SCSI commands into * ATA taskfiles, and queues the taskfiles to be sent to * hardware. In other cases, this function simulates a * SCSI device by evaluating and responding to certain * SCSI commands. This creates the overall effect of * ATA and ATAPI devices appearing as SCSI devices. * * LOCKING: * ATA host lock * * RETURNS: * Return value from __ata_scsi_queuecmd() if @cmd can be queued, * 0 otherwise. */ int ata_scsi_queuecmd(struct Scsi_Host *shost, struct scsi_cmnd *cmd) { struct ata_port *ap; struct ata_device *dev; struct scsi_device *scsidev = cmd->device; int rc = 0; unsigned long irq_flags; ap = ata_shost_to_port(shost); spin_lock_irqsave(ap->lock, irq_flags); ata_scsi_dump_cdb(ap, cmd); dev = ata_scsi_find_dev(ap, scsidev); if (likely(dev)) rc = __ata_scsi_queuecmd(cmd, dev); else { cmd->result = (DID_BAD_TARGET << 16); cmd->scsi_done(cmd); } spin_unlock_irqrestore(ap->lock, irq_flags); return rc; } EXPORT_SYMBOL_GPL(ata_scsi_queuecmd); /** * ata_scsi_simulate - simulate SCSI command on ATA device * @dev: the target device * @cmd: SCSI command being sent to device. * * Interprets and directly executes a select list of SCSI commands * that can be handled internally. * * LOCKING: * spin_lock_irqsave(host lock) */ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd) { struct ata_scsi_args args; const u8 *scsicmd = cmd->cmnd; u8 tmp8; args.dev = dev; args.id = dev->id; args.cmd = cmd; switch(scsicmd[0]) { case INQUIRY: if (scsicmd[1] & 2) /* is CmdDt set? */ ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); else if ((scsicmd[1] & 1) == 0) /* is EVPD clear? */ ata_scsi_rbuf_fill(&args, ata_scsiop_inq_std); else switch (scsicmd[2]) { case 0x00: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_00); break; case 0x80: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_80); break; case 0x83: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_83); break; case 0x89: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_89); break; case 0xb0: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b0); break; case 0xb1: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b1); break; case 0xb2: ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2); break; case 0xb6: if (dev->flags & ATA_DFLAG_ZAC) { ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b6); break; } fallthrough; default: ata_scsi_set_invalid_field(dev, cmd, 2, 0xff); break; } break; case MODE_SENSE: case MODE_SENSE_10: ata_scsi_rbuf_fill(&args, ata_scsiop_mode_sense); break; case READ_CAPACITY: ata_scsi_rbuf_fill(&args, ata_scsiop_read_cap); break; case SERVICE_ACTION_IN_16: if ((scsicmd[1] & 0x1f) == SAI_READ_CAPACITY_16) ata_scsi_rbuf_fill(&args, ata_scsiop_read_cap); else ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; case REPORT_LUNS: ata_scsi_rbuf_fill(&args, ata_scsiop_report_luns); break; case REQUEST_SENSE: ata_scsi_set_sense(dev, cmd, 0, 0, 0); cmd->result = (DRIVER_SENSE << 24); break; /* if we reach this, then writeback caching is disabled, * turning this into a no-op. */ case SYNCHRONIZE_CACHE: fallthrough; /* no-op's, complete with success */ case REZERO_UNIT: case SEEK_6: case SEEK_10: case TEST_UNIT_READY: break; case SEND_DIAGNOSTIC: tmp8 = scsicmd[1] & ~(1 << 3); if (tmp8 != 0x4 || scsicmd[3] || scsicmd[4]) ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; case MAINTENANCE_IN: if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES) ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in); else ata_scsi_set_invalid_field(dev, cmd, 1, 0xff); break; /* all other commands */ default: ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0); /* "Invalid command operation code" */ break; } cmd->scsi_done(cmd); } int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht) { int i, rc; for (i = 0; i < host->n_ports; i++) { struct ata_port *ap = host->ports[i]; struct Scsi_Host *shost; rc = -ENOMEM; shost = scsi_host_alloc(sht, sizeof(struct ata_port *)); if (!shost) goto err_alloc; shost->eh_noresume = 1; *(struct ata_port **)&shost->hostdata[0] = ap; ap->scsi_host = shost; shost->transportt = ata_scsi_transport_template; shost->unique_id = ap->print_id; shost->max_id = 16; shost->max_lun = 1; shost->max_channel = 1; shost->max_cmd_len = 32; /* Schedule policy is determined by ->qc_defer() * callback and it needs to see every deferred qc. * Set host_blocked to 1 to prevent SCSI midlayer from * automatically deferring requests. */ shost->max_host_blocked = 1; rc = scsi_add_host_with_dma(shost, &ap->tdev, ap->host->dev); if (rc) goto err_alloc; } return 0; err_alloc: while (--i >= 0) { struct Scsi_Host *shost = host->ports[i]->scsi_host; /* scsi_host_put() is in ata_devres_release() */ scsi_remove_host(shost); } return rc; } #ifdef CONFIG_OF static void ata_scsi_assign_ofnode(struct ata_device *dev, struct ata_port *ap) { struct scsi_device *sdev = dev->sdev; struct device *d = ap->host->dev; struct device_node *np = d->of_node; struct device_node *child; for_each_available_child_of_node(np, child) { int ret; u32 val; ret = of_property_read_u32(child, "reg", &val); if (ret) continue; if (val == dev->devno) { dev_dbg(d, "found matching device node\n"); sdev->sdev_gendev.of_node = child; return; } } } #else static void ata_scsi_assign_ofnode(struct ata_device *dev, struct ata_port *ap) { } #endif void ata_scsi_scan_host(struct ata_port *ap, int sync) { int tries = 5; struct ata_device *last_failed_dev = NULL; struct ata_link *link; struct ata_device *dev; repeat: ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { struct scsi_device *sdev; int channel = 0, id = 0; if (dev->sdev) continue; if (ata_is_host_link(link)) id = dev->devno; else channel = link->pmp; sdev = __scsi_add_device(ap->scsi_host, channel, id, 0, NULL); if (!IS_ERR(sdev)) { dev->sdev = sdev; ata_scsi_assign_ofnode(dev, ap); scsi_device_put(sdev); } else { dev->sdev = NULL; } } } /* If we scanned while EH was in progress or allocation * failure occurred, scan would have failed silently. Check * whether all devices are attached. */ ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { if (!dev->sdev) goto exit_loop; } } exit_loop: if (!link) return; /* we're missing some SCSI devices */ if (sync) { /* If caller requested synchrnous scan && we've made * any progress, sleep briefly and repeat. */ if (dev != last_failed_dev) { msleep(100); last_failed_dev = dev; goto repeat; } /* We might be failing to detect boot device, give it * a few more chances. */ if (--tries) { msleep(100); goto repeat; } ata_port_err(ap, "WARNING: synchronous SCSI scan failed without making any progress, switching to async\n"); } queue_delayed_work(system_long_wq, &ap->hotplug_task, round_jiffies_relative(HZ)); } /** * ata_scsi_offline_dev - offline attached SCSI device * @dev: ATA device to offline attached SCSI device for * * This function is called from ata_eh_hotplug() and responsible * for taking the SCSI device attached to @dev offline. This * function is called with host lock which protects dev->sdev * against clearing. * * LOCKING: * spin_lock_irqsave(host lock) * * RETURNS: * 1 if attached SCSI device exists, 0 otherwise. */ int ata_scsi_offline_dev(struct ata_device *dev) { if (dev->sdev) { scsi_device_set_state(dev->sdev, SDEV_OFFLINE); return 1; } return 0; } /** * ata_scsi_remove_dev - remove attached SCSI device * @dev: ATA device to remove attached SCSI device for * * This function is called from ata_eh_scsi_hotplug() and * responsible for removing the SCSI device attached to @dev. * * LOCKING: * Kernel thread context (may sleep). */ static void ata_scsi_remove_dev(struct ata_device *dev) { struct ata_port *ap = dev->link->ap; struct scsi_device *sdev; unsigned long flags; /* Alas, we need to grab scan_mutex to ensure SCSI device * state doesn't change underneath us and thus * scsi_device_get() always succeeds. The mutex locking can * be removed if there is __scsi_device_get() interface which * increments reference counts regardless of device state. */ mutex_lock(&ap->scsi_host->scan_mutex); spin_lock_irqsave(ap->lock, flags); /* clearing dev->sdev is protected by host lock */ sdev = dev->sdev; dev->sdev = NULL; if (sdev) { /* If user initiated unplug races with us, sdev can go * away underneath us after the host lock and * scan_mutex are released. Hold onto it. */ if (scsi_device_get(sdev) == 0) { /* The following ensures the attached sdev is * offline on return from ata_scsi_offline_dev() * regardless it wins or loses the race * against this function. */ scsi_device_set_state(sdev, SDEV_OFFLINE); } else { WARN_ON(1); sdev = NULL; } } spin_unlock_irqrestore(ap->lock, flags); mutex_unlock(&ap->scsi_host->scan_mutex); if (sdev) { ata_dev_info(dev, "detaching (SCSI %s)\n", dev_name(&sdev->sdev_gendev)); scsi_remove_device(sdev); scsi_device_put(sdev); } } static void ata_scsi_handle_link_detach(struct ata_link *link) { struct ata_port *ap = link->ap; struct ata_device *dev; ata_for_each_dev(dev, link, ALL) { unsigned long flags; if (!(dev->flags & ATA_DFLAG_DETACHED)) continue; spin_lock_irqsave(ap->lock, flags); dev->flags &= ~ATA_DFLAG_DETACHED; spin_unlock_irqrestore(ap->lock, flags); if (zpodd_dev_enabled(dev)) zpodd_exit(dev); ata_scsi_remove_dev(dev); } } /** * ata_scsi_media_change_notify - send media change event * @dev: Pointer to the disk device with media change event * * Tell the block layer to send a media change notification * event. * * LOCKING: * spin_lock_irqsave(host lock) */ void ata_scsi_media_change_notify(struct ata_device *dev) { if (dev->sdev) sdev_evt_send_simple(dev->sdev, SDEV_EVT_MEDIA_CHANGE, GFP_ATOMIC); } /** * ata_scsi_hotplug - SCSI part of hotplug * @work: Pointer to ATA port to perform SCSI hotplug on * * Perform SCSI part of hotplug. It's executed from a separate * workqueue after EH completes. This is necessary because SCSI * hot plugging requires working EH and hot unplugging is * synchronized with hot plugging with a mutex. * * LOCKING: * Kernel thread context (may sleep). */ void ata_scsi_hotplug(struct work_struct *work) { struct ata_port *ap = container_of(work, struct ata_port, hotplug_task.work); int i; if (ap->pflags & ATA_PFLAG_UNLOADING) { DPRINTK("ENTER/EXIT - unloading\n"); return; } DPRINTK("ENTER\n"); mutex_lock(&ap->scsi_scan_mutex); /* Unplug detached devices. We cannot use link iterator here * because PMP links have to be scanned even if PMP is * currently not attached. Iterate manually. */ ata_scsi_handle_link_detach(&ap->link); if (ap->pmp_link) for (i = 0; i < SATA_PMP_MAX_PORTS; i++) ata_scsi_handle_link_detach(&ap->pmp_link[i]); /* scan for new ones */ ata_scsi_scan_host(ap, 0); mutex_unlock(&ap->scsi_scan_mutex); DPRINTK("EXIT\n"); } /** * ata_scsi_user_scan - indication for user-initiated bus scan * @shost: SCSI host to scan * @channel: Channel to scan * @id: ID to scan * @lun: LUN to scan * * This function is called when user explicitly requests bus * scan. Set probe pending flag and invoke EH. * * LOCKING: * SCSI layer (we don't care) * * RETURNS: * Zero. */ int ata_scsi_user_scan(struct Scsi_Host *shost, unsigned int channel, unsigned int id, u64 lun) { struct ata_port *ap = ata_shost_to_port(shost); unsigned long flags; int devno, rc = 0; if (!ap->ops->error_handler) return -EOPNOTSUPP; if (lun != SCAN_WILD_CARD && lun) return -EINVAL; if (!sata_pmp_attached(ap)) { if (channel != SCAN_WILD_CARD && channel) return -EINVAL; devno = id; } else { if (id != SCAN_WILD_CARD && id) return -EINVAL; devno = channel; } spin_lock_irqsave(ap->lock, flags); if (devno == SCAN_WILD_CARD) { struct ata_link *link; ata_for_each_link(link, ap, EDGE) { struct ata_eh_info *ehi = &link->eh_info; ehi->probe_mask |= ATA_ALL_DEVICES; ehi->action |= ATA_EH_RESET; } } else { struct ata_device *dev = ata_find_dev(ap, devno); if (dev) { struct ata_eh_info *ehi = &dev->link->eh_info; ehi->probe_mask |= 1 << dev->devno; ehi->action |= ATA_EH_RESET; } else rc = -EINVAL; } if (rc == 0) { ata_port_schedule_eh(ap); spin_unlock_irqrestore(ap->lock, flags); ata_port_wait_eh(ap); } else spin_unlock_irqrestore(ap->lock, flags); return rc; } /** * ata_scsi_dev_rescan - initiate scsi_rescan_device() * @work: Pointer to ATA port to perform scsi_rescan_device() * * After ATA pass thru (SAT) commands are executed successfully, * libata need to propagate the changes to SCSI layer. * * LOCKING: * Kernel thread context (may sleep). */ void ata_scsi_dev_rescan(struct work_struct *work) { struct ata_port *ap = container_of(work, struct ata_port, scsi_rescan_task); struct ata_link *link; struct ata_device *dev; unsigned long flags; mutex_lock(&ap->scsi_scan_mutex); spin_lock_irqsave(ap->lock, flags); ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { struct scsi_device *sdev = dev->sdev; if (!sdev) continue; if (scsi_device_get(sdev)) continue; spin_unlock_irqrestore(ap->lock, flags); scsi_rescan_device(&(sdev->sdev_gendev)); scsi_device_put(sdev); spin_lock_irqsave(ap->lock, flags); } } spin_unlock_irqrestore(ap->lock, flags); mutex_unlock(&ap->scsi_scan_mutex); }
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_BLKDEV_H #define _LINUX_BLKDEV_H #include <linux/sched.h> #include <linux/sched/clock.h> #include <linux/major.h> #include <linux/genhd.h> #include <linux/list.h> #include <linux/llist.h> #include <linux/minmax.h> #include <linux/timer.h> #include <linux/workqueue.h> #include <linux/pagemap.h> #include <linux/backing-dev-defs.h> #include <linux/wait.h> #include <linux/mempool.h> #include <linux/pfn.h> #include <linux/bio.h> #include <linux/stringify.h> #include <linux/gfp.h> #include <linux/bsg.h> #include <linux/smp.h> #include <linux/rcupdate.h> #include <linux/percpu-refcount.h> #include <linux/scatterlist.h> #include <linux/blkzoned.h> #include <linux/pm.h> struct module; struct scsi_ioctl_command; struct request_queue; struct elevator_queue; struct blk_trace; struct request; struct sg_io_hdr; struct bsg_job; struct blkcg_gq; struct blk_flush_queue; struct pr_ops; struct rq_qos; struct blk_queue_stats; struct blk_stat_callback; struct blk_keyslot_manager; #define BLKDEV_MIN_RQ 4 #define BLKDEV_MAX_RQ 128 /* Default maximum */ /* Must be consistent with blk_mq_poll_stats_bkt() */ #define BLK_MQ_POLL_STATS_BKTS 16 /* Doing classic polling */ #define BLK_MQ_POLL_CLASSIC -1 /* * Maximum number of blkcg policies allowed to be registered concurrently. * Defined here to simplify include dependency. */ #define BLKCG_MAX_POLS 5 typedef void (rq_end_io_fn)(struct request *, blk_status_t); /* * request flags */ typedef __u32 __bitwise req_flags_t; /* elevator knows about this request */ #define RQF_SORTED ((__force req_flags_t)(1 << 0)) /* drive already may have started this one */ #define RQF_STARTED ((__force req_flags_t)(1 << 1)) /* may not be passed by ioscheduler */ #define RQF_SOFTBARRIER ((__force req_flags_t)(1 << 3)) /* request for flush sequence */ #define RQF_FLUSH_SEQ ((__force req_flags_t)(1 << 4)) /* merge of different types, fail separately */ #define RQF_MIXED_MERGE ((__force req_flags_t)(1 << 5)) /* track inflight for MQ */ #define RQF_MQ_INFLIGHT ((__force req_flags_t)(1 << 6)) /* don't call prep for this one */ #define RQF_DONTPREP ((__force req_flags_t)(1 << 7)) /* vaguely specified driver internal error. Ignored by the block layer */ #define RQF_FAILED ((__force req_flags_t)(1 << 10)) /* don't warn about errors */ #define RQF_QUIET ((__force req_flags_t)(1 << 11)) /* elevator private data attached */ #define RQF_ELVPRIV ((__force req_flags_t)(1 << 12)) /* account into disk and partition IO statistics */ #define RQF_IO_STAT ((__force req_flags_t)(1 << 13)) /* request came from our alloc pool */ #define RQF_ALLOCED ((__force req_flags_t)(1 << 14)) /* runtime pm request */ #define RQF_PM ((__force req_flags_t)(1 << 15)) /* on IO scheduler merge hash */ #define RQF_HASHED ((__force req_flags_t)(1 << 16)) /* track IO completion time */ #define RQF_STATS ((__force req_flags_t)(1 << 17)) /* Look at ->special_vec for the actual data payload instead of the bio chain. */ #define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18)) /* The per-zone write lock is held for this request */ #define RQF_ZONE_WRITE_LOCKED ((__force req_flags_t)(1 << 19)) /* already slept for hybrid poll */ #define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 20)) /* ->timeout has been called, don't expire again */ #define RQF_TIMED_OUT ((__force req_flags_t)(1 << 21)) /* flags that prevent us from merging requests: */ #define RQF_NOMERGE_FLAGS \ (RQF_STARTED | RQF_SOFTBARRIER | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD) /* * Request state for blk-mq. */ enum mq_rq_state { MQ_RQ_IDLE = 0, MQ_RQ_IN_FLIGHT = 1, MQ_RQ_COMPLETE = 2, }; /* * Try to put the fields that are referenced together in the same cacheline. * * If you modify this structure, make sure to update blk_rq_init() and * especially blk_mq_rq_ctx_init() to take care of the added fields. */ struct request { struct request_queue *q; struct blk_mq_ctx *mq_ctx; struct blk_mq_hw_ctx *mq_hctx; unsigned int cmd_flags; /* op and common flags */ req_flags_t rq_flags; int tag; int internal_tag; /* the following two fields are internal, NEVER access directly */ unsigned int __data_len; /* total data len */ sector_t __sector; /* sector cursor */ struct bio *bio; struct bio *biotail; struct list_head queuelist; /* * The hash is used inside the scheduler, and killed once the * request reaches the dispatch list. The ipi_list is only used * to queue the request for softirq completion, which is long * after the request has been unhashed (and even removed from * the dispatch list). */ union { struct hlist_node hash; /* merge hash */ struct list_head ipi_list; }; /* * The rb_node is only used inside the io scheduler, requests * are pruned when moved to the dispatch queue. So let the * completion_data share space with the rb_node. */ union { struct rb_node rb_node; /* sort/lookup */ struct bio_vec special_vec; void *completion_data; int error_count; /* for legacy drivers, don't use */ }; /* * Three pointers are available for the IO schedulers, if they need * more they have to dynamically allocate it. Flush requests are * never put on the IO scheduler. So let the flush fields share * space with the elevator data. */ union { struct { struct io_cq *icq; void *priv[2]; } elv; struct { unsigned int seq; struct list_head list; rq_end_io_fn *saved_end_io; } flush; }; struct gendisk *rq_disk; struct hd_struct *part; #ifdef CONFIG_BLK_RQ_ALLOC_TIME /* Time that the first bio started allocating this request. */ u64 alloc_time_ns; #endif /* Time that this request was allocated for this IO. */ u64 start_time_ns; /* Time that I/O was submitted to the device. */ u64 io_start_time_ns; #ifdef CONFIG_BLK_WBT unsigned short wbt_flags; #endif /* * rq sectors used for blk stats. It has the same value * with blk_rq_sectors(rq), except that it never be zeroed * by completion. */ unsigned short stats_sectors; /* * Number of scatter-gather DMA addr+len pairs after * physical address coalescing is performed. */ unsigned short nr_phys_segments; #if defined(CONFIG_BLK_DEV_INTEGRITY) unsigned short nr_integrity_segments; #endif #ifdef CONFIG_BLK_INLINE_ENCRYPTION struct bio_crypt_ctx *crypt_ctx; struct blk_ksm_keyslot *crypt_keyslot; #endif unsigned short write_hint; unsigned short ioprio; enum mq_rq_state state; refcount_t ref; unsigned int timeout; unsigned long deadline; union { struct __call_single_data csd; u64 fifo_time; }; /* * completion callback. */ rq_end_io_fn *end_io; void *end_io_data; }; static inline bool blk_op_is_scsi(unsigned int op) { return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT; } static inline bool blk_op_is_private(unsigned int op) { return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT; } static inline bool blk_rq_is_scsi(struct request *rq) { return blk_op_is_scsi(req_op(rq)); } static inline bool blk_rq_is_private(struct request *rq) { return blk_op_is_private(req_op(rq)); } static inline bool blk_rq_is_passthrough(struct request *rq) { return blk_rq_is_scsi(rq) || blk_rq_is_private(rq); } static inline bool bio_is_passthrough(struct bio *bio) { unsigned op = bio_op(bio); return blk_op_is_scsi(op) || blk_op_is_private(op); } static inline unsigned short req_get_ioprio(struct request *req) { return req->ioprio; } #include <linux/elevator.h> struct blk_queue_ctx; struct bio_vec; enum blk_eh_timer_return { BLK_EH_DONE, /* drivers has completed the command */ BLK_EH_RESET_TIMER, /* reset timer and try again */ }; enum blk_queue_state { Queue_down, Queue_up, }; #define BLK_TAG_ALLOC_FIFO 0 /* allocate starting from 0 */ #define BLK_TAG_ALLOC_RR 1 /* allocate starting from last allocated tag */ #define BLK_SCSI_MAX_CMDS (256) #define BLK_SCSI_CMD_PER_LONG (BLK_SCSI_MAX_CMDS / (sizeof(long) * 8)) /* * Zoned block device models (zoned limit). * * Note: This needs to be ordered from the least to the most severe * restrictions for the inheritance in blk_stack_limits() to work. */ enum blk_zoned_model { BLK_ZONED_NONE = 0, /* Regular block device */ BLK_ZONED_HA, /* Host-aware zoned block device */ BLK_ZONED_HM, /* Host-managed zoned block device */ }; struct queue_limits { unsigned long bounce_pfn; unsigned long seg_boundary_mask; unsigned long virt_boundary_mask; unsigned int max_hw_sectors; unsigned int max_dev_sectors; unsigned int chunk_sectors; unsigned int max_sectors; unsigned int max_segment_size; unsigned int physical_block_size; unsigned int logical_block_size; unsigned int alignment_offset; unsigned int io_min; unsigned int io_opt; unsigned int max_discard_sectors; unsigned int max_hw_discard_sectors; unsigned int max_write_same_sectors; unsigned int max_write_zeroes_sectors; unsigned int max_zone_append_sectors; unsigned int discard_granularity; unsigned int discard_alignment; unsigned short max_segments; unsigned short max_integrity_segments; unsigned short max_discard_segments; unsigned char misaligned; unsigned char discard_misaligned; unsigned char raid_partial_stripes_expensive; enum blk_zoned_model zoned; }; typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx, void *data); void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model); #ifdef CONFIG_BLK_DEV_ZONED #define BLK_ALL_ZONES ((unsigned int)-1) int blkdev_report_zones(struct block_device *bdev, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); unsigned int blkdev_nr_zones(struct gendisk *disk); extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op, sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask); int blk_revalidate_disk_zones(struct gendisk *disk, void (*update_driver_data)(struct gendisk *disk)); extern int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg); extern int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg); #else /* CONFIG_BLK_DEV_ZONED */ static inline unsigned int blkdev_nr_zones(struct gendisk *disk) { return 0; } static inline int blkdev_report_zones_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { return -ENOTTY; } static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode, unsigned int cmd, unsigned long arg) { return -ENOTTY; } #endif /* CONFIG_BLK_DEV_ZONED */ struct request_queue { struct request *last_merge; struct elevator_queue *elevator; struct percpu_ref q_usage_counter; struct blk_queue_stats *stats; struct rq_qos *rq_qos; const struct blk_mq_ops *mq_ops; /* sw queues */ struct blk_mq_ctx __percpu *queue_ctx; unsigned int queue_depth; /* hw dispatch queues */ struct blk_mq_hw_ctx **queue_hw_ctx; unsigned int nr_hw_queues; struct backing_dev_info *backing_dev_info; /* * The queue owner gets to use this for whatever they like. * ll_rw_blk doesn't touch it. */ void *queuedata; /* * various queue flags, see QUEUE_* below */ unsigned long queue_flags; /* * Number of contexts that have called blk_set_pm_only(). If this * counter is above zero then only RQF_PM requests are processed. */ atomic_t pm_only; /* * ida allocated id for this queue. Used to index queues from * ioctx. */ int id; /* * queue needs bounce pages for pages above this limit */ gfp_t bounce_gfp; spinlock_t queue_lock; /* * queue kobject */ struct kobject kobj; /* * mq queue kobject */ struct kobject *mq_kobj; #ifdef CONFIG_BLK_DEV_INTEGRITY struct blk_integrity integrity; #endif /* CONFIG_BLK_DEV_INTEGRITY */ #ifdef CONFIG_PM struct device *dev; enum rpm_status rpm_status; unsigned int nr_pending; #endif /* * queue settings */ unsigned long nr_requests; /* Max # of requests */ unsigned int dma_pad_mask; unsigned int dma_alignment; #ifdef CONFIG_BLK_INLINE_ENCRYPTION /* Inline crypto capabilities */ struct blk_keyslot_manager *ksm; #endif unsigned int rq_timeout; int poll_nsec; struct blk_stat_callback *poll_cb; struct blk_rq_stat poll_stat[BLK_MQ_POLL_STATS_BKTS]; struct timer_list timeout; struct work_struct timeout_work; atomic_t nr_active_requests_shared_sbitmap; struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS); struct blkcg_gq *root_blkg; struct list_head blkg_list; #endif struct queue_limits limits; unsigned int required_elevator_features; #ifdef CONFIG_BLK_DEV_ZONED /* * Zoned block device information for request dispatch control. * nr_zones is the total number of zones of the device. This is always * 0 for regular block devices. conv_zones_bitmap is a bitmap of nr_zones * bits which indicates if a zone is conventional (bit set) or * sequential (bit clear). seq_zones_wlock is a bitmap of nr_zones * bits which indicates if a zone is write locked, that is, if a write * request targeting the zone was dispatched. All three fields are * initialized by the low level device driver (e.g. scsi/sd.c). * Stacking drivers (device mappers) may or may not initialize * these fields. * * Reads of this information must be protected with blk_queue_enter() / * blk_queue_exit(). Modifying this information is only allowed while * no requests are being processed. See also blk_mq_freeze_queue() and * blk_mq_unfreeze_queue(). */ unsigned int nr_zones; unsigned long *conv_zones_bitmap; unsigned long *seq_zones_wlock; unsigned int max_open_zones; unsigned int max_active_zones; #endif /* CONFIG_BLK_DEV_ZONED */ /* * sg stuff */ unsigned int sg_timeout; unsigned int sg_reserved_size; int node; struct mutex debugfs_mutex; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; #endif /* * for flush operations */ struct blk_flush_queue *fq; struct list_head requeue_list; spinlock_t requeue_lock; struct delayed_work requeue_work; struct mutex sysfs_lock; struct mutex sysfs_dir_lock; /* * for reusing dead hctx instance in case of updating * nr_hw_queues */ struct list_head unused_hctx_list; spinlock_t unused_hctx_lock; int mq_freeze_depth; #if defined(CONFIG_BLK_DEV_BSG) struct bsg_class_device bsg_dev; #endif #ifdef CONFIG_BLK_DEV_THROTTLING /* Throttle data */ struct throtl_data *td; #endif struct rcu_head rcu_head; wait_queue_head_t mq_freeze_wq; /* * Protect concurrent access to q_usage_counter by * percpu_ref_kill() and percpu_ref_reinit(). */ struct mutex mq_freeze_lock; struct blk_mq_tag_set *tag_set; struct list_head tag_set_list; struct bio_set bio_split; struct dentry *debugfs_dir; #ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; #endif bool mq_sysfs_init_done; size_t cmd_size; #define BLK_MAX_WRITE_HINTS 5 u64 write_hints[BLK_MAX_WRITE_HINTS]; }; /* Keep blk_queue_flag_name[] in sync with the definitions below */ #define QUEUE_FLAG_STOPPED 0 /* queue is stopped */ #define QUEUE_FLAG_DYING 1 /* queue being torn down */ #define QUEUE_FLAG_NOMERGES 3 /* disable merge attempts */ #define QUEUE_FLAG_SAME_COMP 4 /* complete on same CPU-group */ #define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */ #define QUEUE_FLAG_NONROT 6 /* non-rotational device (SSD) */ #define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */ #define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */ #define QUEUE_FLAG_DISCARD 8 /* supports DISCARD */ #define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */ #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */ #define QUEUE_FLAG_SECERASE 11 /* supports secure erase */ #define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */ #define QUEUE_FLAG_DEAD 13 /* queue tear-down finished */ #define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */ #define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */ #define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */ #define QUEUE_FLAG_WC 17 /* Write back caching */ #define QUEUE_FLAG_FUA 18 /* device supports FUA writes */ #define QUEUE_FLAG_DAX 19 /* device supports DAX */ #define QUEUE_FLAG_STATS 20 /* track IO start and completion times */ #define QUEUE_FLAG_POLL_STATS 21 /* collecting stats for hybrid polling */ #define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */ #define QUEUE_FLAG_SCSI_PASSTHROUGH 23 /* queue supports SCSI commands */ #define QUEUE_FLAG_QUIESCED 24 /* queue has been quiesced */ #define QUEUE_FLAG_PCI_P2PDMA 25 /* device supports PCI p2p requests */ #define QUEUE_FLAG_ZONE_RESETALL 26 /* supports Zone Reset All */ #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ #define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */ #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \ (1 << QUEUE_FLAG_NOWAIT)) void blk_queue_flag_set(unsigned int flag, struct request_queue *q); void blk_queue_flag_clear(unsigned int flag, struct request_queue *q); bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) #define blk_queue_dying(q) test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags) #define blk_queue_dead(q) test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags) #define blk_queue_init_done(q) test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) #define blk_queue_noxmerges(q) \ test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags) #define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags) #define blk_queue_stable_writes(q) \ test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags) #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags) #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags) #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags) #define blk_queue_zone_resetall(q) \ test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags) #define blk_queue_secure_erase(q) \ (test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags)) #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags) #define blk_queue_scsi_passthrough(q) \ test_bit(QUEUE_FLAG_SCSI_PASSTHROUGH, &(q)->queue_flags) #define blk_queue_pci_p2pdma(q) \ test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags) #ifdef CONFIG_BLK_RQ_ALLOC_TIME #define blk_queue_rq_alloc_time(q) \ test_bit(QUEUE_FLAG_RQ_ALLOC_TIME, &(q)->queue_flags) #else #define blk_queue_rq_alloc_time(q) false #endif #define blk_noretry_request(rq) \ ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \ REQ_FAILFAST_DRIVER)) #define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags) #define blk_queue_pm_only(q) atomic_read(&(q)->pm_only) #define blk_queue_fua(q) test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags) #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags) #define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags) extern void blk_set_pm_only(struct request_queue *q); extern void blk_clear_pm_only(struct request_queue *q); static inline bool blk_account_rq(struct request *rq) { return (rq->rq_flags & RQF_STARTED) && !blk_rq_is_passthrough(rq); } #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist) #define rq_data_dir(rq) (op_is_write(req_op(rq)) ? WRITE : READ) #define rq_dma_dir(rq) \ (op_is_write(req_op(rq)) ? DMA_TO_DEVICE : DMA_FROM_DEVICE) #define dma_map_bvec(dev, bv, dir, attrs) \ dma_map_page_attrs(dev, (bv)->bv_page, (bv)->bv_offset, (bv)->bv_len, \ (dir), (attrs)) static inline bool queue_is_mq(struct request_queue *q) { return q->mq_ops; } #ifdef CONFIG_PM static inline enum rpm_status queue_rpm_status(struct request_queue *q) { return q->rpm_status; } #else static inline enum rpm_status queue_rpm_status(struct request_queue *q) { return RPM_ACTIVE; } #endif static inline enum blk_zoned_model blk_queue_zoned_model(struct request_queue *q) { if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) return q->limits.zoned; return BLK_ZONED_NONE; } static inline bool blk_queue_is_zoned(struct request_queue *q) { switch (blk_queue_zoned_model(q)) { case BLK_ZONED_HA: case BLK_ZONED_HM: return true; default: return false; } } static inline sector_t blk_queue_zone_sectors(struct request_queue *q) { return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0; } #ifdef CONFIG_BLK_DEV_ZONED static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { return blk_queue_is_zoned(q) ? q->nr_zones : 0; } static inline unsigned int blk_queue_zone_no(struct request_queue *q, sector_t sector) { if (!blk_queue_is_zoned(q)) return 0; return sector >> ilog2(q->limits.chunk_sectors); } static inline bool blk_queue_zone_is_seq(struct request_queue *q, sector_t sector) { if (!blk_queue_is_zoned(q)) return false; if (!q->conv_zones_bitmap) return true; return !test_bit(blk_queue_zone_no(q, sector), q->conv_zones_bitmap); } static inline void blk_queue_max_open_zones(struct request_queue *q, unsigned int max_open_zones) { q->max_open_zones = max_open_zones; } static inline unsigned int queue_max_open_zones(const struct request_queue *q) { return q->max_open_zones; } static inline void blk_queue_max_active_zones(struct request_queue *q, unsigned int max_active_zones) { q->max_active_zones = max_active_zones; } static inline unsigned int queue_max_active_zones(const struct request_queue *q) { return q->max_active_zones; } #else /* CONFIG_BLK_DEV_ZONED */ static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { return 0; } static inline bool blk_queue_zone_is_seq(struct request_queue *q, sector_t sector) { return false; } static inline unsigned int blk_queue_zone_no(struct request_queue *q, sector_t sector) { return 0; } static inline unsigned int queue_max_open_zones(const struct request_queue *q) { return 0; } static inline unsigned int queue_max_active_zones(const struct request_queue *q) { return 0; } #endif /* CONFIG_BLK_DEV_ZONED */ static inline bool rq_is_sync(struct request *rq) { return op_is_sync(rq->cmd_flags); } static inline bool rq_mergeable(struct request *rq) { if (blk_rq_is_passthrough(rq)) return false; if (req_op(rq) == REQ_OP_FLUSH) return false; if (req_op(rq) == REQ_OP_WRITE_ZEROES) return false; if (req_op(rq) == REQ_OP_ZONE_APPEND) return false; if (rq->cmd_flags & REQ_NOMERGE_FLAGS) return false; if (rq->rq_flags & RQF_NOMERGE_FLAGS) return false; return true; } static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b) { if (bio_page(a) == bio_page(b) && bio_offset(a) == bio_offset(b)) return true; return false; } static inline unsigned int blk_queue_depth(struct request_queue *q) { if (q->queue_depth) return q->queue_depth; return q->nr_requests; } extern unsigned long blk_max_low_pfn, blk_max_pfn; /* * standard bounce addresses: * * BLK_BOUNCE_HIGH : bounce all highmem pages * BLK_BOUNCE_ANY : don't bounce anything * BLK_BOUNCE_ISA : bounce pages above ISA DMA boundary */ #if BITS_PER_LONG == 32 #define BLK_BOUNCE_HIGH ((u64)blk_max_low_pfn << PAGE_SHIFT) #else #define BLK_BOUNCE_HIGH -1ULL #endif #define BLK_BOUNCE_ANY (-1ULL) #define BLK_BOUNCE_ISA (DMA_BIT_MASK(24)) /* * default timeout for SG_IO if none specified */ #define BLK_DEFAULT_SG_TIMEOUT (60 * HZ) #define BLK_MIN_SG_TIMEOUT (7 * HZ) struct rq_map_data { struct page **pages; int page_order; int nr_entries; unsigned long offset; int null_mapped; int from_user; }; struct req_iterator { struct bvec_iter iter; struct bio *bio; }; /* This should not be used directly - use rq_for_each_segment */ #define for_each_bio(_bio) \ for (; _bio; _bio = _bio->bi_next) #define __rq_for_each_bio(_bio, rq) \ if ((rq->bio)) \ for (_bio = (rq)->bio; _bio; _bio = _bio->bi_next) #define rq_for_each_segment(bvl, _rq, _iter) \ __rq_for_each_bio(_iter.bio, _rq) \ bio_for_each_segment(bvl, _iter.bio, _iter.iter) #define rq_for_each_bvec(bvl, _rq, _iter) \ __rq_for_each_bio(_iter.bio, _rq) \ bio_for_each_bvec(bvl, _iter.bio, _iter.iter) #define rq_iter_last(bvec, _iter) \ (_iter.bio->bi_next == NULL && \ bio_iter_last(bvec, _iter.iter)) #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE # error "You should define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE for your platform" #endif #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE extern void rq_flush_dcache_pages(struct request *rq); #else static inline void rq_flush_dcache_pages(struct request *rq) { } #endif extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); blk_qc_t submit_bio_noacct(struct bio *bio); extern void blk_rq_init(struct request_queue *q, struct request *rq); extern void blk_put_request(struct request *); extern struct request *blk_get_request(struct request_queue *, unsigned int op, blk_mq_req_flags_t flags); extern int blk_lld_busy(struct request_queue *q); extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio *, void *), void *data); extern void blk_rq_unprep_clone(struct request *rq); extern blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq); extern int blk_rq_append_bio(struct request *rq, struct bio **bio); extern void blk_queue_split(struct bio **); extern int scsi_verify_blk_ioctl(struct block_device *, unsigned int); extern int scsi_cmd_blk_ioctl(struct block_device *, fmode_t, unsigned int, void __user *); extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t, unsigned int, void __user *); extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t, struct scsi_ioctl_command __user *); extern int get_sg_io_hdr(struct sg_io_hdr *hdr, const void __user *argp); extern int put_sg_io_hdr(const struct sg_io_hdr *hdr, void __user *argp); extern int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags); extern void blk_queue_exit(struct request_queue *q); extern void blk_sync_queue(struct request_queue *q); extern int blk_rq_map_user(struct request_queue *, struct request *, struct rq_map_data *, void __user *, unsigned long, gfp_t); extern int blk_rq_unmap_user(struct bio *); extern int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t); extern int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, gfp_t); extern void blk_execute_rq(struct request_queue *, struct gendisk *, struct request *, int); extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *, struct request *, int, rq_end_io_fn *); /* Helper to convert REQ_OP_XXX to its string format XXX */ extern const char *blk_op_str(unsigned int op); int blk_status_to_errno(blk_status_t status); blk_status_t errno_to_blk_status(int errno); int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin); static inline struct request_queue *bdev_get_queue(struct block_device *bdev) { return bdev->bd_disk->queue; /* this is never NULL */ } /* * The basic unit of block I/O is a sector. It is used in a number of contexts * in Linux (blk, bio, genhd). The size of one sector is 512 = 2**9 * bytes. Variables of type sector_t represent an offset or size that is a * multiple of 512 bytes. Hence these two constants. */ #ifndef SECTOR_SHIFT #define SECTOR_SHIFT 9 #endif #ifndef SECTOR_SIZE #define SECTOR_SIZE (1 << SECTOR_SHIFT) #endif /* * blk_rq_pos() : the current sector * blk_rq_bytes() : bytes left in the entire request * blk_rq_cur_bytes() : bytes left in the current segment * blk_rq_err_bytes() : bytes left till the next error boundary * blk_rq_sectors() : sectors left in the entire request * blk_rq_cur_sectors() : sectors left in the current segment * blk_rq_stats_sectors() : sectors of the entire request used for stats */ static inline sector_t blk_rq_pos(const struct request *rq) { return rq->__sector; } static inline unsigned int blk_rq_bytes(const struct request *rq) { return rq->__data_len; } static inline int blk_rq_cur_bytes(const struct request *rq) { return rq->bio ? bio_cur_bytes(rq->bio) : 0; } extern unsigned int blk_rq_err_bytes(const struct request *rq); static inline unsigned int blk_rq_sectors(const struct request *rq) { return blk_rq_bytes(rq) >> SECTOR_SHIFT; } static inline unsigned int blk_rq_cur_sectors(const struct request *rq) { return blk_rq_cur_bytes(rq) >> SECTOR_SHIFT; } static inline unsigned int blk_rq_stats_sectors(const struct request *rq) { return rq->stats_sectors; } #ifdef CONFIG_BLK_DEV_ZONED /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond); static inline unsigned int blk_rq_zone_no(struct request *rq) { return blk_queue_zone_no(rq->q, blk_rq_pos(rq)); } static inline unsigned int blk_rq_zone_is_seq(struct request *rq) { return blk_queue_zone_is_seq(rq->q, blk_rq_pos(rq)); } #endif /* CONFIG_BLK_DEV_ZONED */ /* * Some commands like WRITE SAME have a payload or data transfer size which * is different from the size of the request. Any driver that supports such * commands using the RQF_SPECIAL_PAYLOAD flag needs to use this helper to * calculate the data transfer size. */ static inline unsigned int blk_rq_payload_bytes(struct request *rq) { if (rq->rq_flags & RQF_SPECIAL_PAYLOAD) return rq->special_vec.bv_len; return blk_rq_bytes(rq); } /* * Return the first full biovec in the request. The caller needs to check that * there are any bvecs before calling this helper. */ static inline struct bio_vec req_bvec(struct request *rq) { if (rq->rq_flags & RQF_SPECIAL_PAYLOAD) return rq->special_vec; return mp_bvec_iter_bvec(rq->bio->bi_io_vec, rq->bio->bi_iter); } static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, int op) { if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE)) return min(q->limits.max_discard_sectors, UINT_MAX >> SECTOR_SHIFT); if (unlikely(op == REQ_OP_WRITE_SAME)) return q->limits.max_write_same_sectors; if (unlikely(op == REQ_OP_WRITE_ZEROES)) return q->limits.max_write_zeroes_sectors; return q->limits.max_sectors; } /* * Return maximum size of a request at given offset. Only valid for * file system requests. */ static inline unsigned int blk_max_size_offset(struct request_queue *q, sector_t offset, unsigned int chunk_sectors) { if (!chunk_sectors) { if (q->limits.chunk_sectors) chunk_sectors = q->limits.chunk_sectors; else return q->limits.max_sectors; } if (likely(is_power_of_2(chunk_sectors))) chunk_sectors -= offset & (chunk_sectors - 1); else chunk_sectors -= sector_div(offset, chunk_sectors); return min(q->limits.max_sectors, chunk_sectors); } static inline unsigned int blk_rq_get_max_sectors(struct request *rq, sector_t offset) { struct request_queue *q = rq->q; if (blk_rq_is_passthrough(rq)) return q->limits.max_hw_sectors; if (!q->limits.chunk_sectors || req_op(rq) == REQ_OP_DISCARD || req_op(rq) == REQ_OP_SECURE_ERASE) return blk_queue_get_max_sectors(q, req_op(rq)); return min(blk_max_size_offset(q, offset, 0), blk_queue_get_max_sectors(q, req_op(rq))); } static inline unsigned int blk_rq_count_bios(struct request *rq) { unsigned int nr_bios = 0; struct bio *bio; __rq_for_each_bio(bio, rq) nr_bios++; return nr_bios; } void blk_steal_bios(struct bio_list *list, struct request *rq); /* * Request completion related functions. * * blk_update_request() completes given number of bytes and updates * the request without completing it. */ extern bool blk_update_request(struct request *rq, blk_status_t error, unsigned int nr_bytes); extern void blk_abort_request(struct request *); /* * Access functions for manipulating queue properties */ extern void blk_cleanup_queue(struct request_queue *); extern void blk_queue_bounce_limit(struct request_queue *, u64); extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int); extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int); extern void blk_queue_max_segments(struct request_queue *, unsigned short); extern void blk_queue_max_discard_segments(struct request_queue *, unsigned short); extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); extern void blk_queue_max_discard_sectors(struct request_queue *q, unsigned int max_discard_sectors); extern void blk_queue_max_write_same_sectors(struct request_queue *q, unsigned int max_write_same_sectors); extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q, unsigned int max_write_same_sectors); extern void blk_queue_logical_block_size(struct request_queue *, unsigned int); extern void blk_queue_max_zone_append_sectors(struct request_queue *q, unsigned int max_zone_append_sectors); extern void blk_queue_physical_block_size(struct request_queue *, unsigned int); extern void blk_queue_alignment_offset(struct request_queue *q, unsigned int alignment); void blk_queue_update_readahead(struct request_queue *q); extern void blk_limits_io_min(struct queue_limits *limits, unsigned int min); extern void blk_queue_io_min(struct request_queue *q, unsigned int min); extern void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt); extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt); extern void blk_set_queue_depth(struct request_queue *q, unsigned int depth); extern void blk_set_default_limits(struct queue_limits *lim); extern void blk_set_stacking_limits(struct queue_limits *lim); extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t offset); extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev, sector_t offset); extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int); extern void blk_queue_segment_boundary(struct request_queue *, unsigned long); extern void blk_queue_virt_boundary(struct request_queue *, unsigned long); extern void blk_queue_dma_alignment(struct request_queue *, int); extern void blk_queue_update_dma_alignment(struct request_queue *, int); extern void blk_queue_rq_timeout(struct request_queue *, unsigned int); extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua); extern void blk_queue_required_elevator_features(struct request_queue *q, unsigned int features); extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q, struct device *dev); /* * Number of physical segments as sent to the device. * * Normally this is the number of discontiguous data segments sent by the * submitter. But for data-less command like discard we might have no * actual data segments submitted, but the driver might have to add it's * own special payload. In that case we still return 1 here so that this * special payload will be mapped. */ static inline unsigned short blk_rq_nr_phys_segments(struct request *rq) { if (rq->rq_flags & RQF_SPECIAL_PAYLOAD) return 1; return rq->nr_phys_segments; } /* * Number of discard segments (or ranges) the driver needs to fill in. * Each discard bio merged into a request is counted as one segment. */ static inline unsigned short blk_rq_nr_discard_segments(struct request *rq) { return max_t(unsigned short, rq->nr_phys_segments, 1); } int __blk_rq_map_sg(struct request_queue *q, struct request *rq, struct scatterlist *sglist, struct scatterlist **last_sg); static inline int blk_rq_map_sg(struct request_queue *q, struct request *rq, struct scatterlist *sglist) { struct scatterlist *last_sg = NULL; return __blk_rq_map_sg(q, rq, sglist, &last_sg); } extern void blk_dump_rq_flags(struct request *, char *); bool __must_check blk_get_queue(struct request_queue *); struct request_queue *blk_alloc_queue(int node_id); extern void blk_put_queue(struct request_queue *); extern void blk_set_queue_dying(struct request_queue *); #ifdef CONFIG_BLOCK /* * blk_plug permits building a queue of related requests by holding the I/O * fragments for a short period. This allows merging of sequential requests * into single larger request. As the requests are moved from a per-task list to * the device's request_queue in a batch, this results in improved scalability * as the lock contention for request_queue lock is reduced. * * It is ok not to disable preemption when adding the request to the plug list * or when attempting a merge, because blk_schedule_flush_list() will only flush * the plug list when the task sleeps by itself. For details, please see * schedule() where blk_schedule_flush_plug() is called. */ struct blk_plug { struct list_head mq_list; /* blk-mq requests */ struct list_head cb_list; /* md requires an unplug callback */ unsigned short rq_count; bool multiple_queues; bool nowait; }; #define BLK_MAX_REQUEST_COUNT 16 #define BLK_PLUG_FLUSH_SIZE (128 * 1024) struct blk_plug_cb; typedef void (*blk_plug_cb_fn)(struct blk_plug_cb *, bool); struct blk_plug_cb { struct list_head list; blk_plug_cb_fn callback; void *data; }; extern struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug, void *data, int size); extern void blk_start_plug(struct blk_plug *); extern void blk_finish_plug(struct blk_plug *); extern void blk_flush_plug_list(struct blk_plug *, bool); static inline void blk_flush_plug(struct task_struct *tsk) { struct blk_plug *plug = tsk->plug; if (plug) blk_flush_plug_list(plug, false); } static inline void blk_schedule_flush_plug(struct task_struct *tsk) { struct blk_plug *plug = tsk->plug; if (plug) blk_flush_plug_list(plug, true); } static inline bool blk_needs_flush_plug(struct task_struct *tsk) { struct blk_plug *plug = tsk->plug; return plug && (!list_empty(&plug->mq_list) || !list_empty(&plug->cb_list)); } int blkdev_issue_flush(struct block_device *, gfp_t); long nr_blockdev_pages(void); #else /* CONFIG_BLOCK */ struct blk_plug { }; static inline void blk_start_plug(struct blk_plug *plug) { } static inline void blk_finish_plug(struct blk_plug *plug) { } static inline void blk_flush_plug(struct task_struct *task) { } static inline void blk_schedule_flush_plug(struct task_struct *task) { } static inline bool blk_needs_flush_plug(struct task_struct *tsk) { return false; } static inline int blkdev_issue_flush(struct block_device *bdev, gfp_t gfp_mask) { return 0; } static inline long nr_blockdev_pages(void) { return 0; } #endif /* CONFIG_BLOCK */ extern void blk_io_schedule(void); extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct page *page); #define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned long flags); extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, int flags, struct bio **biop); #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */ #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */ extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, unsigned flags); extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, unsigned flags); static inline int sb_issue_discard(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask, unsigned long flags) { return blkdev_issue_discard(sb->s_bdev, block << (sb->s_blocksize_bits - SECTOR_SHIFT), nr_blocks << (sb->s_blocksize_bits - SECTOR_SHIFT), gfp_mask, flags); } static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask) { return blkdev_issue_zeroout(sb->s_bdev, block << (sb->s_blocksize_bits - SECTOR_SHIFT), nr_blocks << (sb->s_blocksize_bits - SECTOR_SHIFT), gfp_mask, 0); } extern int blk_verify_command(unsigned char *cmd, fmode_t mode); static inline bool bdev_is_partition(struct block_device *bdev) { return bdev->bd_partno; } enum blk_default_limits { BLK_MAX_SEGMENTS = 128, BLK_SAFE_MAX_SECTORS = 255, BLK_DEF_MAX_SECTORS = 2560, BLK_MAX_SEGMENT_SIZE = 65536, BLK_SEG_BOUNDARY_MASK = 0xFFFFFFFFUL, }; static inline unsigned long queue_segment_boundary(const struct request_queue *q) { return q->limits.seg_boundary_mask; } static inline unsigned long queue_virt_boundary(const struct request_queue *q) { return q->limits.virt_boundary_mask; } static inline unsigned int queue_max_sectors(const struct request_queue *q) { return q->limits.max_sectors; } static inline unsigned int queue_max_hw_sectors(const struct request_queue *q) { return q->limits.max_hw_sectors; } static inline unsigned short queue_max_segments(const struct request_queue *q) { return q->limits.max_segments; } static inline unsigned short queue_max_discard_segments(const struct request_queue *q) { return q->limits.max_discard_segments; } static inline unsigned int queue_max_segment_size(const struct request_queue *q) { return q->limits.max_segment_size; } static inline unsigned int queue_max_zone_append_sectors(const struct request_queue *q) { const struct queue_limits *l = &q->limits; return min(l->max_zone_append_sectors, l->max_sectors); } static inline unsigned queue_logical_block_size(const struct request_queue *q) { int retval = 512; if (q && q->limits.logical_block_size) retval = q->limits.logical_block_size; return retval; } static inline unsigned int bdev_logical_block_size(struct block_device *bdev) { return queue_logical_block_size(bdev_get_queue(bdev)); } static inline unsigned int queue_physical_block_size(const struct request_queue *q) { return q->limits.physical_block_size; } static inline unsigned int bdev_physical_block_size(struct block_device *bdev) { return queue_physical_block_size(bdev_get_queue(bdev)); } static inline unsigned int queue_io_min(const struct request_queue *q) { return q->limits.io_min; } static inline int bdev_io_min(struct block_device *bdev) { return queue_io_min(bdev_get_queue(bdev)); } static inline unsigned int queue_io_opt(const struct request_queue *q) { return q->limits.io_opt; } static inline int bdev_io_opt(struct block_device *bdev) { return queue_io_opt(bdev_get_queue(bdev)); } static inline int queue_alignment_offset(const struct request_queue *q) { if (q->limits.misaligned) return -1; return q->limits.alignment_offset; } static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector) { unsigned int granularity = max(lim->physical_block_size, lim->io_min); unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT) << SECTOR_SHIFT; return (granularity + lim->alignment_offset - alignment) % granularity; } static inline int bdev_alignment_offset(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q->limits.misaligned) return -1; if (bdev_is_partition(bdev)) return queue_limit_alignment_offset(&q->limits, bdev->bd_part->start_sect); return q->limits.alignment_offset; } static inline int queue_discard_alignment(const struct request_queue *q) { if (q->limits.discard_misaligned) return -1; return q->limits.discard_alignment; } static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector) { unsigned int alignment, granularity, offset; if (!lim->max_discard_sectors) return 0; /* Why are these in bytes, not sectors? */ alignment = lim->discard_alignment >> SECTOR_SHIFT; granularity = lim->discard_granularity >> SECTOR_SHIFT; if (!granularity) return 0; /* Offset of the partition start in 'granularity' sectors */ offset = sector_div(sector, granularity); /* And why do we do this modulus *again* in blkdev_issue_discard()? */ offset = (granularity + alignment - offset) % granularity; /* Turn it back into bytes, gaah */ return offset << SECTOR_SHIFT; } /* * Two cases of handling DISCARD merge: * If max_discard_segments > 1, the driver takes every bio * as a range and send them to controller together. The ranges * needn't to be contiguous. * Otherwise, the bios/requests will be handled as same as * others which should be contiguous. */ static inline bool blk_discard_mergable(struct request *req) { if (req_op(req) == REQ_OP_DISCARD && queue_max_discard_segments(req->q) > 1) return true; return false; } static inline int bdev_discard_alignment(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (bdev_is_partition(bdev)) return queue_limit_discard_alignment(&q->limits, bdev->bd_part->start_sect); return q->limits.discard_alignment; } static inline unsigned int bdev_write_same(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return q->limits.max_write_same_sectors; return 0; } static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return q->limits.max_write_zeroes_sectors; return 0; } static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return blk_queue_zoned_model(q); return BLK_ZONED_NONE; } static inline bool bdev_is_zoned(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return blk_queue_is_zoned(q); return false; } static inline sector_t bdev_zone_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return blk_queue_zone_sectors(q); return 0; } static inline unsigned int bdev_max_open_zones(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return queue_max_open_zones(q); return 0; } static inline unsigned int bdev_max_active_zones(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); if (q) return queue_max_active_zones(q); return 0; } static inline int queue_dma_alignment(const struct request_queue *q) { return q ? q->dma_alignment : 511; } static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr, unsigned int len) { unsigned int alignment = queue_dma_alignment(q) | q->dma_pad_mask; return !(addr & alignment) && !(len & alignment); } /* assumes size > 256 */ static inline unsigned int blksize_bits(unsigned int size) { unsigned int bits = 8; do { bits++; size >>= 1; } while (size > 256); return bits; } static inline unsigned int block_size(struct block_device *bdev) { return 1 << bdev->bd_inode->i_blkbits; } int kblockd_schedule_work(struct work_struct *work); int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, unsigned long delay); #define MODULE_ALIAS_BLOCKDEV(major,minor) \ MODULE_ALIAS("block-major-" __stringify(major) "-" __stringify(minor)) #define MODULE_ALIAS_BLOCKDEV_MAJOR(major) \ MODULE_ALIAS("block-major-" __stringify(major) "-*") #if defined(CONFIG_BLK_DEV_INTEGRITY) enum blk_integrity_flags { BLK_INTEGRITY_VERIFY = 1 << 0, BLK_INTEGRITY_GENERATE = 1 << 1, BLK_INTEGRITY_DEVICE_CAPABLE = 1 << 2, BLK_INTEGRITY_IP_CHECKSUM = 1 << 3, }; struct blk_integrity_iter { void *prot_buf; void *data_buf; sector_t seed; unsigned int data_size; unsigned short interval; const char *disk_name; }; typedef blk_status_t (integrity_processing_fn) (struct blk_integrity_iter *); typedef void (integrity_prepare_fn) (struct request *); typedef void (integrity_complete_fn) (struct request *, unsigned int); struct blk_integrity_profile { integrity_processing_fn *generate_fn; integrity_processing_fn *verify_fn; integrity_prepare_fn *prepare_fn; integrity_complete_fn *complete_fn; const char *name; }; extern void blk_integrity_register(struct gendisk *, struct blk_integrity *); extern void blk_integrity_unregister(struct gendisk *); extern int blk_integrity_compare(struct gendisk *, struct gendisk *); extern int blk_rq_map_integrity_sg(struct request_queue *, struct bio *, struct scatterlist *); extern int blk_rq_count_integrity_sg(struct request_queue *, struct bio *); static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk) { struct blk_integrity *bi = &disk->queue->integrity; if (!bi->profile) return NULL; return bi; } static inline struct blk_integrity *bdev_get_integrity(struct block_device *bdev) { return blk_get_integrity(bdev->bd_disk); } static inline bool blk_integrity_queue_supports_integrity(struct request_queue *q) { return q->integrity.profile; } static inline bool blk_integrity_rq(struct request *rq) { return rq->cmd_flags & REQ_INTEGRITY; } static inline void blk_queue_max_integrity_segments(struct request_queue *q, unsigned int segs) { q->limits.max_integrity_segments = segs; } static inline unsigned short queue_max_integrity_segments(const struct request_queue *q) { return q->limits.max_integrity_segments; } /** * bio_integrity_intervals - Return number of integrity intervals for a bio * @bi: blk_integrity profile for device * @sectors: Size of the bio in 512-byte sectors * * Description: The block layer calculates everything in 512 byte * sectors but integrity metadata is done in terms of the data integrity * interval size of the storage device. Convert the block layer sectors * to the appropriate number of integrity intervals. */ static inline unsigned int bio_integrity_intervals(struct blk_integrity *bi, unsigned int sectors) { return sectors >> (bi->interval_exp - 9); } static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi, unsigned int sectors) { return bio_integrity_intervals(bi, sectors) * bi->tuple_size; } /* * Return the first bvec that contains integrity data. Only drivers that are * limited to a single integrity segment should use this helper. */ static inline struct bio_vec *rq_integrity_vec(struct request *rq) { if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1)) return NULL; return rq->bio->bi_integrity->bip_vec; } #else /* CONFIG_BLK_DEV_INTEGRITY */ struct bio; struct block_device; struct gendisk; struct blk_integrity; static inline int blk_integrity_rq(struct request *rq) { return 0; } static inline int blk_rq_count_integrity_sg(struct request_queue *q, struct bio *b) { return 0; } static inline int blk_rq_map_integrity_sg(struct request_queue *q, struct bio *b, struct scatterlist *s) { return 0; } static inline struct blk_integrity *bdev_get_integrity(struct block_device *b) { return NULL; } static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk) { return NULL; } static inline bool blk_integrity_queue_supports_integrity(struct request_queue *q) { return false; } static inline int blk_integrity_compare(struct gendisk *a, struct gendisk *b) { return 0; } static inline void blk_integrity_register(struct gendisk *d, struct blk_integrity *b) { } static inline void blk_integrity_unregister(struct gendisk *d) { } static inline void blk_queue_max_integrity_segments(struct request_queue *q, unsigned int segs) { } static inline unsigned short queue_max_integrity_segments(const struct request_queue *q) { return 0; } static inline unsigned int bio_integrity_intervals(struct blk_integrity *bi, unsigned int sectors) { return 0; } static inline unsigned int bio_integrity_bytes(struct blk_integrity *bi, unsigned int sectors) { return 0; } static inline struct bio_vec *rq_integrity_vec(struct request *rq) { return NULL; } #endif /* CONFIG_BLK_DEV_INTEGRITY */ #ifdef CONFIG_BLK_INLINE_ENCRYPTION bool blk_ksm_register(struct blk_keyslot_manager *ksm, struct request_queue *q); void blk_ksm_unregister(struct request_queue *q); #else /* CONFIG_BLK_INLINE_ENCRYPTION */ static inline bool blk_ksm_register(struct blk_keyslot_manager *ksm, struct request_queue *q) { return true; } static inline void blk_ksm_unregister(struct request_queue *q) { } #endif /* CONFIG_BLK_INLINE_ENCRYPTION */ struct block_device_operations { blk_qc_t (*submit_bio) (struct bio *bio); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); unsigned int (*check_events) (struct gendisk *disk, unsigned int clearing); void (*unlock_native_capacity) (struct gendisk *); int (*revalidate_disk) (struct gendisk *); int (*getgeo)(struct block_device *, struct hd_geometry *); /* this callback is with swap_lock and sometimes page table lock held */ void (*swap_slot_free_notify) (struct block_device *, unsigned long); int (*report_zones)(struct gendisk *, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); char *(*devnode)(struct gendisk *disk, umode_t *mode); struct module *owner; const struct pr_ops *pr_ops; }; #ifdef CONFIG_COMPAT extern int blkdev_compat_ptr_ioctl(struct block_device *, fmode_t, unsigned int, unsigned long); #else #define blkdev_compat_ptr_ioctl NULL #endif extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int, unsigned long); extern int bdev_read_page(struct block_device *, sector_t, struct page *); extern int bdev_write_page(struct block_device *, sector_t, struct page *, struct writeback_control *); #ifdef CONFIG_BLK_DEV_ZONED bool blk_req_needs_zone_write_lock(struct request *rq); bool blk_req_zone_write_trylock(struct request *rq); void __blk_req_zone_write_lock(struct request *rq); void __blk_req_zone_write_unlock(struct request *rq); static inline void blk_req_zone_write_lock(struct request *rq) { if (blk_req_needs_zone_write_lock(rq)) __blk_req_zone_write_lock(rq); } static inline void blk_req_zone_write_unlock(struct request *rq) { if (rq->rq_flags & RQF_ZONE_WRITE_LOCKED) __blk_req_zone_write_unlock(rq); } static inline bool blk_req_zone_is_write_locked(struct request *rq) { return rq->q->seq_zones_wlock && test_bit(blk_rq_zone_no(rq), rq->q->seq_zones_wlock); } static inline bool blk_req_can_dispatch_to_zone(struct request *rq) { if (!blk_req_needs_zone_write_lock(rq)) return true; return !blk_req_zone_is_write_locked(rq); } #else static inline bool blk_req_needs_zone_write_lock(struct request *rq) { return false; } static inline void blk_req_zone_write_lock(struct request *rq) { } static inline void blk_req_zone_write_unlock(struct request *rq) { } static inline bool blk_req_zone_is_write_locked(struct request *rq) { return false; } static inline bool blk_req_can_dispatch_to_zone(struct request *rq) { return true; } #endif /* CONFIG_BLK_DEV_ZONED */ static inline void blk_wake_io_task(struct task_struct *waiter) { /* * If we're polling, the task itself is doing the completions. For * that case, we don't need to signal a wakeup, it's enough to just * mark us as RUNNING. */ if (waiter == current) __set_current_state(TASK_RUNNING); else wake_up_process(waiter); } unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors, unsigned int op); void disk_end_io_acct(struct gendisk *disk, unsigned int op, unsigned long start_time); unsigned long part_start_io_acct(struct gendisk *disk, struct hd_struct **part, struct bio *bio); void part_end_io_acct(struct hd_struct *part, struct bio *bio, unsigned long start_time); /** * bio_start_io_acct - start I/O accounting for bio based drivers * @bio: bio to start account for * * Returns the start time that should be passed back to bio_end_io_acct(). */ static inline unsigned long bio_start_io_acct(struct bio *bio) { return disk_start_io_acct(bio->bi_disk, bio_sectors(bio), bio_op(bio)); } /** * bio_end_io_acct - end I/O accounting for bio based drivers * @bio: bio to end account for * @start: start time returned by bio_start_io_acct() */ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time) { return disk_end_io_acct(bio->bi_disk, bio_op(bio), start_time); } int bdev_read_only(struct block_device *bdev); int set_blocksize(struct block_device *bdev, int size); const char *bdevname(struct block_device *bdev, char *buffer); struct block_device *lookup_bdev(const char *); void blkdev_show(struct seq_file *seqf, off_t offset); #define BDEVNAME_SIZE 32 /* Largest string for a blockdev identifier */ #define BDEVT_SIZE 10 /* Largest string for MAJ:MIN for blkdev */ #ifdef CONFIG_BLOCK #define BLKDEV_MAJOR_MAX 512 #else #define BLKDEV_MAJOR_MAX 0 #endif struct block_device *blkdev_get_by_path(const char *path, fmode_t mode, void *holder); struct block_device *blkdev_get_by_dev(dev_t dev, fmode_t mode, void *holder); int bd_prepare_to_claim(struct block_device *bdev, struct block_device *whole, void *holder); void bd_abort_claiming(struct block_device *bdev, struct block_device *whole, void *holder); void blkdev_put(struct block_device *bdev, fmode_t mode); struct block_device *I_BDEV(struct inode *inode); struct block_device *bdget_part(struct hd_struct *part); struct block_device *bdgrab(struct block_device *bdev); void bdput(struct block_device *); #ifdef CONFIG_BLOCK void invalidate_bdev(struct block_device *bdev); int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart, loff_t lend); int sync_blockdev(struct block_device *bdev); #else static inline void invalidate_bdev(struct block_device *bdev) { } static inline int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart, loff_t lend) { return 0; } static inline int sync_blockdev(struct block_device *bdev) { return 0; } #endif int fsync_bdev(struct block_device *bdev); struct super_block *freeze_bdev(struct block_device *bdev); int thaw_bdev(struct block_device *bdev, struct super_block *sb); #endif /* _LINUX_BLKDEV_H */
2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _KERNEL_EVENTS_INTERNAL_H #define _KERNEL_EVENTS_INTERNAL_H #include <linux/hardirq.h> #include <linux/uaccess.h> #include <linux/refcount.h> /* Buffer handling */ #define RING_BUFFER_WRITABLE 0x01 struct perf_buffer { refcount_t refcount; struct rcu_head rcu_head; #ifdef CONFIG_PERF_USE_VMALLOC struct work_struct work; int page_order; /* allocation order */ #endif int nr_pages; /* nr of data pages */ int overwrite; /* can overwrite itself */ int paused; /* can write into ring buffer */ atomic_t poll; /* POLL_ for wakeups */ local_t head; /* write position */ unsigned int nest; /* nested writers */ local_t events; /* event limit */ local_t wakeup; /* wakeup stamp */ local_t lost; /* nr records lost */ long watermark; /* wakeup watermark */ long aux_watermark; /* poll crap */ spinlock_t event_lock; struct list_head event_list; atomic_t mmap_count; unsigned long mmap_locked; struct user_struct *mmap_user; /* AUX area */ long aux_head; unsigned int aux_nest; long aux_wakeup; /* last aux_watermark boundary crossed by aux_head */ unsigned long aux_pgoff; int aux_nr_pages; int aux_overwrite; atomic_t aux_mmap_count; unsigned long aux_mmap_locked; void (*free_aux)(void *); refcount_t aux_refcount; int aux_in_sampling; void **aux_pages; void *aux_priv; struct perf_event_mmap_page *user_page; void *data_pages[]; }; extern void rb_free(struct perf_buffer *rb); static inline void rb_free_rcu(struct rcu_head *rcu_head) { struct perf_buffer *rb; rb = container_of(rcu_head, struct perf_buffer, rcu_head); rb_free(rb); } static inline void rb_toggle_paused(struct perf_buffer *rb, bool pause) { if (!pause && rb->nr_pages) rb->paused = 0; else rb->paused = 1; } extern struct perf_buffer * rb_alloc(int nr_pages, long watermark, int cpu, int flags); extern void perf_event_wakeup(struct perf_event *event); extern int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event, pgoff_t pgoff, int nr_pages, long watermark, int flags); extern void rb_free_aux(struct perf_buffer *rb); extern struct perf_buffer *ring_buffer_get(struct perf_event *event); extern void ring_buffer_put(struct perf_buffer *rb); static inline bool rb_has_aux(struct perf_buffer *rb) { return !!rb->aux_nr_pages; } void perf_event_aux_event(struct perf_event *event, unsigned long head, unsigned long size, u64 flags); extern struct page * perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff); #ifdef CONFIG_PERF_USE_VMALLOC /* * Back perf_mmap() with vmalloc memory. * * Required for architectures that have d-cache aliasing issues. */ static inline int page_order(struct perf_buffer *rb) { return rb->page_order; } #else static inline int page_order(struct perf_buffer *rb) { return 0; } #endif static inline unsigned long perf_data_size(struct perf_buffer *rb) { return rb->nr_pages << (PAGE_SHIFT + page_order(rb)); } static inline unsigned long perf_aux_size(struct perf_buffer *rb) { return rb->aux_nr_pages << PAGE_SHIFT; } #define __DEFINE_OUTPUT_COPY_BODY(advance_buf, memcpy_func, ...) \ { \ unsigned long size, written; \ \ do { \ size = min(handle->size, len); \ written = memcpy_func(__VA_ARGS__); \ written = size - written; \ \ len -= written; \ handle->addr += written; \ if (advance_buf) \ buf += written; \ handle->size -= written; \ if (!handle->size) { \ struct perf_buffer *rb = handle->rb; \ \ handle->page++; \ handle->page &= rb->nr_pages - 1; \ handle->addr = rb->data_pages[handle->page]; \ handle->size = PAGE_SIZE << page_order(rb); \ } \ } while (len && written == size); \ \ return len; \ } #define DEFINE_OUTPUT_COPY(func_name, memcpy_func) \ static inline unsigned long \ func_name(struct perf_output_handle *handle, \ const void *buf, unsigned long len) \ __DEFINE_OUTPUT_COPY_BODY(true, memcpy_func, handle->addr, buf, size) static inline unsigned long __output_custom(struct perf_output_handle *handle, perf_copy_f copy_func, const void *buf, unsigned long len) { unsigned long orig_len = len; __DEFINE_OUTPUT_COPY_BODY(false, copy_func, handle->addr, buf, orig_len - len, size) } static inline unsigned long memcpy_common(void *dst, const void *src, unsigned long n) { memcpy(dst, src, n); return 0; } DEFINE_OUTPUT_COPY(__output_copy, memcpy_common) static inline unsigned long memcpy_skip(void *dst, const void *src, unsigned long n) { return 0; } DEFINE_OUTPUT_COPY(__output_skip, memcpy_skip) #ifndef arch_perf_out_copy_user #define arch_perf_out_copy_user arch_perf_out_copy_user static inline unsigned long arch_perf_out_copy_user(void *dst, const void *src, unsigned long n) { unsigned long ret; pagefault_disable(); ret = __copy_from_user_inatomic(dst, src, n); pagefault_enable(); return ret; } #endif DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user) static inline int get_recursion_context(int *recursion) { unsigned int pc = preempt_count(); unsigned char rctx = 0; rctx += !!(pc & (NMI_MASK)); rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK)); rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)); if (recursion[rctx]) return -1; recursion[rctx]++; barrier(); return rctx; } static inline void put_recursion_context(int *recursion, int rctx) { barrier(); recursion[rctx]--; } #ifdef CONFIG_HAVE_PERF_USER_STACK_DUMP static inline bool arch_perf_have_user_stack_dump(void) { return true; } #define perf_user_stack_pointer(regs) user_stack_pointer(regs) #else static inline bool arch_perf_have_user_stack_dump(void) { return false; } #define perf_user_stack_pointer(regs) 0 #endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */ #endif /* _KERNEL_EVENTS_INTERNAL_H */
1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_BH_H #define _LINUX_BH_H #include <linux/preempt.h> #ifdef CONFIG_TRACE_IRQFLAGS extern void __local_bh_disable_ip(unsigned long ip, unsigned int cnt); #else static __always_inline void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) { preempt_count_add(cnt); barrier(); } #endif static inline void local_bh_disable(void) { __local_bh_disable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET); } extern void _local_bh_enable(void); extern void __local_bh_enable_ip(unsigned long ip, unsigned int cnt); static inline void local_bh_enable_ip(unsigned long ip) { __local_bh_enable_ip(ip, SOFTIRQ_DISABLE_OFFSET); } static inline void local_bh_enable(void) { __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET); } #endif /* _LINUX_BH_H */
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 1665 1666 1667 1668 1669 1670 1671 1672 1673 1674 1675 1676 1677 1678 1679 1680 1681 1682 1683 1684 1685 1686 1687 1688 1689 1690 1691 1692 1693 1694 1695 1696 1697 1698 1699 1700 1701 1702 1703 1704 1705 1706 1707 1708 1709 1710 1711 1712 1713 1714 1715 1716 1717 1718 1719 1720 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 1765 1766 1767 1768 1769 1770 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 2026 2027 2028 2029 2030 2031 2032 2033 2034 2035 2036 2037 2038 2039 2040 2041 2042 2043 2044 2045 2046 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 2081 2082 2083 2084 2085 2086 2087 2088 2089 2090 2091 2092 2093 2094 2095 2096 2097 2098 2099 2100 2101 2102 2103 2104 2105 2106 2107 2108 2109 2110 2111 2112 2113 2114 2115 2116 2117 2118 2119 2120 2121 2122 2123 2124 2125 2126 2127 2128 2129 2130 2131 2132 2133 2134 2135 2136 2137 2138 2139 2140 2141 2142 2143 2144 2145 2146 2147 2148 2149 2150 2151 2152 2153 2154 2155 2156 2157 2158 2159 2160 2161 2162 2163 2164 2165 2166 2167 2168 2169 2170 2171 2172 2173 2174 2175 2176 2177 2178 2179 2180 2181 2182 2183 2184 2185 2186 2187 2188 2189 2190 2191 2192 2193 2194 2195 2196 2197 2198 2199 2200 2201 2202 2203 2204 2205 2206 2207 2208 2209 2210 2211 2212 2213 2214 2215 2216 2217 2218 2219 2220 2221 2222 2223 2224 2225 2226 2227 2228 2229 2230 2231 2232 2233 2234 2235 2236 2237 2238 2239 2240 2241 2242 2243 2244 2245 2246 2247 2248 2249 2250 2251 2252 2253 2254 2255 2256 2257 2258 2259 2260 2261 2262 2263 2264 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 2280 2281 2282 2283 2284 2285 2286 2287 2288 2289 2290 2291 2292 2293 2294 2295 2296 2297 2298 2299 2300 2301 2302 2303 2304 2305 2306 2307 2308 2309 2310 2311 2312 2313 2314 2315 2316 2317 2318 2319 2320 2321 2322 2323 2324 2325 2326 2327 2328 2329 2330 2331 2332 2333 2334 2335 2336 2337 2338 2339 2340 2341 2342 2343 2344 2345 2346 2347 2348 2349 2350 2351 2352 2353 2354 2355 2356 2357 2358 2359 2360 2361 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 2372 2373 2374 2375 2376 2377 2378 2379 2380 2381 2382 2383 2384 2385 2386 2387 2388 2389 2390 2391 2392 2393 2394 2395 2396 2397 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 2409 2410 2411 2412 2413 2414 2415 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 2591 2592 2593 2594 2595 2596 2597 2598 2599 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 2713 2714 2715 2716 2717 2718 2719 2720 2721 2722 2723 2724 2725 2726 2727 2728 2729 2730 2731 2732 2733 2734 2735 2736 2737 2738 2739 2740 2741 2742 2743 2744 2745 2746 2747 2748 2749 2750 2751 2752 2753 2754 2755 2756 2757 2758 2759 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 2788 2789 2790 2791 2792 2793 2794 2795 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 2860 2861 2862 2863 2864 2865 2866 2867 2868 2869 2870 2871 2872 2873 2874 2875 2876 2877 2878 2879 2880 2881 2882 2883 2884 2885 2886 2887 2888 2889 2890 2891 2892 2893 2894 2895 2896 2897 2898 2899 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 2915 2916 2917 2918 2919 2920 2921 2922 2923 2924 2925 2926 2927 2928 2929 2930 2931 2932 2933 2934 2935 2936 2937 2938 2939 2940 2941 2942 2943 2944 2945 2946 2947 2948 2949 2950 2951 2952 2953 2954 2955 2956 2957 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 2968 2969 2970 2971 2972 2973 2974 2975 2976 2977 2978 2979 2980 2981 2982 2983 2984 2985 2986 2987 2988 2989 2990 2991 2992 2993 2994 2995 2996 2997 2998 2999 3000 3001 3002 3003 3004 3005 3006 3007 3008 3009 3010 3011 3012 3013 3014 3015 3016 3017 3018 3019 3020 3021 3022 3023 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 3055 3056 3057 3058 3059 3060 3061 3062 3063 3064 3065 3066 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 // SPDX-License-Identifier: GPL-2.0 /* * linux/fs/ext4/xattr.c * * Copyright (C) 2001-2003 Andreas Gruenbacher, <agruen@suse.de> * * Fix by Harrison Xing <harrison@mountainviewdata.com>. * Ext4 code with a lot of help from Eric Jarman <ejarman@acm.org>. * Extended attributes for symlinks and special files added per * suggestion of Luka Renko <luka.renko@hermes.si>. * xattr consolidation Copyright (c) 2004 James Morris <jmorris@redhat.com>, * Red Hat Inc. * ea-in-inode support by Alex Tomas <alex@clusterfs.com> aka bzzz * and Andreas Gruenbacher <agruen@suse.de>. */ /* * Extended attributes are stored directly in inodes (on file systems with * inodes bigger than 128 bytes) and on additional disk blocks. The i_file_acl * field contains the block number if an inode uses an additional block. All * attributes must fit in the inode and one additional block. Blocks that * contain the identical set of attributes may be shared among several inodes. * Identical blocks are detected by keeping a cache of blocks that have * recently been accessed. * * The attributes in inodes and on blocks have a different header; the entries * are stored in the same format: * * +------------------+ * | header | * | entry 1 | | * | entry 2 | | growing downwards * | entry 3 | v * | four null bytes | * | . . . | * | value 1 | ^ * | value 3 | | growing upwards * | value 2 | | * +------------------+ * * The header is followed by multiple entry descriptors. In disk blocks, the * entry descriptors are kept sorted. In inodes, they are unsorted. The * attribute values are aligned to the end of the block in no specific order. * * Locking strategy * ---------------- * EXT4_I(inode)->i_file_acl is protected by EXT4_I(inode)->xattr_sem. * EA blocks are only changed if they are exclusive to an inode, so * holding xattr_sem also means that nothing but the EA block's reference * count can change. Multiple writers to the same block are synchronized * by the buffer lock. */ #include <linux/init.h> #include <linux/fs.h> #include <linux/slab.h> #include <linux/mbcache.h> #include <linux/quotaops.h> #include <linux/iversion.h> #include "ext4_jbd2.h" #include "ext4.h" #include "xattr.h" #include "acl.h" #ifdef EXT4_XATTR_DEBUG # define ea_idebug(inode, fmt, ...) \ printk(KERN_DEBUG "inode %s:%lu: " fmt "\n", \ inode->i_sb->s_id, inode->i_ino, ##__VA_ARGS__) # define ea_bdebug(bh, fmt, ...) \ printk(KERN_DEBUG "block %pg:%lu: " fmt "\n", \ bh->b_bdev, (unsigned long)bh->b_blocknr, ##__VA_ARGS__) #else # define ea_idebug(inode, fmt, ...) no_printk(fmt, ##__VA_ARGS__) # define ea_bdebug(bh, fmt, ...) no_printk(fmt, ##__VA_ARGS__) #endif static void ext4_xattr_block_cache_insert(struct mb_cache *, struct buffer_head *); static struct buffer_head * ext4_xattr_block_cache_find(struct inode *, struct ext4_xattr_header *, struct mb_cache_entry **); static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value, size_t value_count); static void ext4_xattr_rehash(struct ext4_xattr_header *); static const struct xattr_handler * const ext4_xattr_handler_map[] = { [EXT4_XATTR_INDEX_USER] = &ext4_xattr_user_handler, #ifdef CONFIG_EXT4_FS_POSIX_ACL [EXT4_XATTR_INDEX_POSIX_ACL_ACCESS] = &posix_acl_access_xattr_handler, [EXT4_XATTR_INDEX_POSIX_ACL_DEFAULT] = &posix_acl_default_xattr_handler, #endif [EXT4_XATTR_INDEX_TRUSTED] = &ext4_xattr_trusted_handler, #ifdef CONFIG_EXT4_FS_SECURITY [EXT4_XATTR_INDEX_SECURITY] = &ext4_xattr_security_handler, #endif [EXT4_XATTR_INDEX_HURD] = &ext4_xattr_hurd_handler, }; const struct xattr_handler *ext4_xattr_handlers[] = { &ext4_xattr_user_handler, &ext4_xattr_trusted_handler, #ifdef CONFIG_EXT4_FS_POSIX_ACL &posix_acl_access_xattr_handler, &posix_acl_default_xattr_handler, #endif #ifdef CONFIG_EXT4_FS_SECURITY &ext4_xattr_security_handler, #endif &ext4_xattr_hurd_handler, NULL }; #define EA_BLOCK_CACHE(inode) (((struct ext4_sb_info *) \ inode->i_sb->s_fs_info)->s_ea_block_cache) #define EA_INODE_CACHE(inode) (((struct ext4_sb_info *) \ inode->i_sb->s_fs_info)->s_ea_inode_cache) static int ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array, struct inode *inode); #ifdef CONFIG_LOCKDEP void ext4_xattr_inode_set_class(struct inode *ea_inode) { lockdep_set_subclass(&ea_inode->i_rwsem, 1); } #endif static __le32 ext4_xattr_block_csum(struct inode *inode, sector_t block_nr, struct ext4_xattr_header *hdr) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); __u32 csum; __le64 dsk_block_nr = cpu_to_le64(block_nr); __u32 dummy_csum = 0; int offset = offsetof(struct ext4_xattr_header, h_checksum); csum = ext4_chksum(sbi, sbi->s_csum_seed, (__u8 *)&dsk_block_nr, sizeof(dsk_block_nr)); csum = ext4_chksum(sbi, csum, (__u8 *)hdr, offset); csum = ext4_chksum(sbi, csum, (__u8 *)&dummy_csum, sizeof(dummy_csum)); offset += sizeof(dummy_csum); csum = ext4_chksum(sbi, csum, (__u8 *)hdr + offset, EXT4_BLOCK_SIZE(inode->i_sb) - offset); return cpu_to_le32(csum); } static int ext4_xattr_block_csum_verify(struct inode *inode, struct buffer_head *bh) { struct ext4_xattr_header *hdr = BHDR(bh); int ret = 1; if (ext4_has_metadata_csum(inode->i_sb)) { lock_buffer(bh); ret = (hdr->h_checksum == ext4_xattr_block_csum(inode, bh->b_blocknr, hdr)); unlock_buffer(bh); } return ret; } static void ext4_xattr_block_csum_set(struct inode *inode, struct buffer_head *bh) { if (ext4_has_metadata_csum(inode->i_sb)) BHDR(bh)->h_checksum = ext4_xattr_block_csum(inode, bh->b_blocknr, BHDR(bh)); } static inline const struct xattr_handler * ext4_xattr_handler(int name_index) { const struct xattr_handler *handler = NULL; if (name_index > 0 && name_index < ARRAY_SIZE(ext4_xattr_handler_map)) handler = ext4_xattr_handler_map[name_index]; return handler; } static int ext4_xattr_check_entries(struct ext4_xattr_entry *entry, void *end, void *value_start) { struct ext4_xattr_entry *e = entry; /* Find the end of the names list */ while (!IS_LAST_ENTRY(e)) { struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(e); if ((void *)next >= end) return -EFSCORRUPTED; if (strnlen(e->e_name, e->e_name_len) != e->e_name_len) return -EFSCORRUPTED; e = next; } /* Check the values */ while (!IS_LAST_ENTRY(entry)) { u32 size = le32_to_cpu(entry->e_value_size); if (size > EXT4_XATTR_SIZE_MAX) return -EFSCORRUPTED; if (size != 0 && entry->e_value_inum == 0) { u16 offs = le16_to_cpu(entry->e_value_offs); void *value; /* * The value cannot overlap the names, and the value * with padding cannot extend beyond 'end'. Check both * the padded and unpadded sizes, since the size may * overflow to 0 when adding padding. */ if (offs > end - value_start) return -EFSCORRUPTED; value = value_start + offs; if (value < (void *)e + sizeof(u32) || size > end - value || EXT4_XATTR_SIZE(size) > end - value) return -EFSCORRUPTED; } entry = EXT4_XATTR_NEXT(entry); } return 0; } static inline int __ext4_xattr_check_block(struct inode *inode, struct buffer_head *bh, const char *function, unsigned int line) { int error = -EFSCORRUPTED; if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) || BHDR(bh)->h_blocks != cpu_to_le32(1)) goto errout; if (buffer_verified(bh)) return 0; error = -EFSBADCRC; if (!ext4_xattr_block_csum_verify(inode, bh)) goto errout; error = ext4_xattr_check_entries(BFIRST(bh), bh->b_data + bh->b_size, bh->b_data); errout: if (error) __ext4_error_inode(inode, function, line, 0, -error, "corrupted xattr block %llu", (unsigned long long) bh->b_blocknr); else set_buffer_verified(bh); return error; } #define ext4_xattr_check_block(inode, bh) \ __ext4_xattr_check_block((inode), (bh), __func__, __LINE__) static int __xattr_check_inode(struct inode *inode, struct ext4_xattr_ibody_header *header, void *end, const char *function, unsigned int line) { int error = -EFSCORRUPTED; if (end - (void *)header < sizeof(*header) + sizeof(u32) || (header->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC))) goto errout; error = ext4_xattr_check_entries(IFIRST(header), end, IFIRST(header)); errout: if (error) __ext4_error_inode(inode, function, line, 0, -error, "corrupted in-inode xattr"); return error; } #define xattr_check_inode(inode, header, end) \ __xattr_check_inode((inode), (header), (end), __func__, __LINE__) static int xattr_find_entry(struct inode *inode, struct ext4_xattr_entry **pentry, void *end, int name_index, const char *name, int sorted) { struct ext4_xattr_entry *entry, *next; size_t name_len; int cmp = 1; if (name == NULL) return -EINVAL; name_len = strlen(name); for (entry = *pentry; !IS_LAST_ENTRY(entry); entry = next) { next = EXT4_XATTR_NEXT(entry); if ((void *) next >= end) { EXT4_ERROR_INODE(inode, "corrupted xattr entries"); return -EFSCORRUPTED; } cmp = name_index - entry->e_name_index; if (!cmp) cmp = name_len - entry->e_name_len; if (!cmp) cmp = memcmp(name, entry->e_name, name_len); if (cmp <= 0 && (sorted || cmp == 0)) break; } *pentry = entry; return cmp ? -ENODATA : 0; } static u32 ext4_xattr_inode_hash(struct ext4_sb_info *sbi, const void *buffer, size_t size) { return ext4_chksum(sbi, sbi->s_csum_seed, buffer, size); } static u64 ext4_xattr_inode_get_ref(struct inode *ea_inode) { return ((u64)ea_inode->i_ctime.tv_sec << 32) | (u32) inode_peek_iversion_raw(ea_inode); } static void ext4_xattr_inode_set_ref(struct inode *ea_inode, u64 ref_count) { ea_inode->i_ctime.tv_sec = (u32)(ref_count >> 32); inode_set_iversion_raw(ea_inode, ref_count & 0xffffffff); } static u32 ext4_xattr_inode_get_hash(struct inode *ea_inode) { return (u32)ea_inode->i_atime.tv_sec; } static void ext4_xattr_inode_set_hash(struct inode *ea_inode, u32 hash) { ea_inode->i_atime.tv_sec = hash; } /* * Read the EA value from an inode. */ static int ext4_xattr_inode_read(struct inode *ea_inode, void *buf, size_t size) { int blocksize = 1 << ea_inode->i_blkbits; int bh_count = (size + blocksize - 1) >> ea_inode->i_blkbits; int tail_size = (size % blocksize) ?: blocksize; struct buffer_head *bhs_inline[8]; struct buffer_head **bhs = bhs_inline; int i, ret; if (bh_count > ARRAY_SIZE(bhs_inline)) { bhs = kmalloc_array(bh_count, sizeof(*bhs), GFP_NOFS); if (!bhs) return -ENOMEM; } ret = ext4_bread_batch(ea_inode, 0 /* block */, bh_count, true /* wait */, bhs); if (ret) goto free_bhs; for (i = 0; i < bh_count; i++) { /* There shouldn't be any holes in ea_inode. */ if (!bhs[i]) { ret = -EFSCORRUPTED; goto put_bhs; } memcpy((char *)buf + blocksize * i, bhs[i]->b_data, i < bh_count - 1 ? blocksize : tail_size); } ret = 0; put_bhs: for (i = 0; i < bh_count; i++) brelse(bhs[i]); free_bhs: if (bhs != bhs_inline) kfree(bhs); return ret; } #define EXT4_XATTR_INODE_GET_PARENT(inode) ((__u32)(inode)->i_mtime.tv_sec) static int ext4_xattr_inode_iget(struct inode *parent, unsigned long ea_ino, u32 ea_inode_hash, struct inode **ea_inode) { struct inode *inode; int err; inode = ext4_iget(parent->i_sb, ea_ino, EXT4_IGET_NORMAL); if (IS_ERR(inode)) { err = PTR_ERR(inode); ext4_error(parent->i_sb, "error while reading EA inode %lu err=%d", ea_ino, err); return err; } if (is_bad_inode(inode)) { ext4_error(parent->i_sb, "error while reading EA inode %lu is_bad_inode", ea_ino); err = -EIO; goto error; } if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) { ext4_error(parent->i_sb, "EA inode %lu does not have EXT4_EA_INODE_FL flag", ea_ino); err = -EINVAL; goto error; } ext4_xattr_inode_set_class(inode); /* * Check whether this is an old Lustre-style xattr inode. Lustre * implementation does not have hash validation, rather it has a * backpointer from ea_inode to the parent inode. */ if (ea_inode_hash != ext4_xattr_inode_get_hash(inode) && EXT4_XATTR_INODE_GET_PARENT(inode) == parent->i_ino && inode->i_generation == parent->i_generation) { ext4_set_inode_state(inode, EXT4_STATE_LUSTRE_EA_INODE); ext4_xattr_inode_set_ref(inode, 1); } else { inode_lock(inode); inode->i_flags |= S_NOQUOTA; inode_unlock(inode); } *ea_inode = inode; return 0; error: iput(inode); return err; } static int ext4_xattr_inode_verify_hashes(struct inode *ea_inode, struct ext4_xattr_entry *entry, void *buffer, size_t size) { u32 hash; /* Verify stored hash matches calculated hash. */ hash = ext4_xattr_inode_hash(EXT4_SB(ea_inode->i_sb), buffer, size); if (hash != ext4_xattr_inode_get_hash(ea_inode)) return -EFSCORRUPTED; if (entry) { __le32 e_hash, tmp_data; /* Verify entry hash. */ tmp_data = cpu_to_le32(hash); e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len, &tmp_data, 1); if (e_hash != entry->e_hash) return -EFSCORRUPTED; } return 0; } /* * Read xattr value from the EA inode. */ static int ext4_xattr_inode_get(struct inode *inode, struct ext4_xattr_entry *entry, void *buffer, size_t size) { struct mb_cache *ea_inode_cache = EA_INODE_CACHE(inode); struct inode *ea_inode; int err; err = ext4_xattr_inode_iget(inode, le32_to_cpu(entry->e_value_inum), le32_to_cpu(entry->e_hash), &ea_inode); if (err) { ea_inode = NULL; goto out; } if (i_size_read(ea_inode) != size) { ext4_warning_inode(ea_inode, "ea_inode file size=%llu entry size=%zu", i_size_read(ea_inode), size); err = -EFSCORRUPTED; goto out; } err = ext4_xattr_inode_read(ea_inode, buffer, size); if (err) goto out; if (!ext4_test_inode_state(ea_inode, EXT4_STATE_LUSTRE_EA_INODE)) { err = ext4_xattr_inode_verify_hashes(ea_inode, entry, buffer, size); if (err) { ext4_warning_inode(ea_inode, "EA inode hash validation failed"); goto out; } if (ea_inode_cache) mb_cache_entry_create(ea_inode_cache, GFP_NOFS, ext4_xattr_inode_get_hash(ea_inode), ea_inode->i_ino, true /* reusable */); } out: iput(ea_inode); return err; } static int ext4_xattr_block_get(struct inode *inode, int name_index, const char *name, void *buffer, size_t buffer_size) { struct buffer_head *bh = NULL; struct ext4_xattr_entry *entry; size_t size; void *end; int error; struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode); ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); if (!EXT4_I(inode)->i_file_acl) return -ENODATA; ea_idebug(inode, "reading block %llu", (unsigned long long)EXT4_I(inode)->i_file_acl); bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) return PTR_ERR(bh); ea_bdebug(bh, "b_count=%d, refcount=%d", atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount)); error = ext4_xattr_check_block(inode, bh); if (error) goto cleanup; ext4_xattr_block_cache_insert(ea_block_cache, bh); entry = BFIRST(bh); end = bh->b_data + bh->b_size; error = xattr_find_entry(inode, &entry, end, name_index, name, 1); if (error) goto cleanup; size = le32_to_cpu(entry->e_value_size); error = -ERANGE; if (unlikely(size > EXT4_XATTR_SIZE_MAX)) goto cleanup; if (buffer) { if (size > buffer_size) goto cleanup; if (entry->e_value_inum) { error = ext4_xattr_inode_get(inode, entry, buffer, size); if (error) goto cleanup; } else { u16 offset = le16_to_cpu(entry->e_value_offs); void *p = bh->b_data + offset; if (unlikely(p + size > end)) goto cleanup; memcpy(buffer, p, size); } } error = size; cleanup: brelse(bh); return error; } int ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name, void *buffer, size_t buffer_size) { struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry; struct ext4_inode *raw_inode; struct ext4_iloc iloc; size_t size; void *end; int error; if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR)) return -ENODATA; error = ext4_get_inode_loc(inode, &iloc); if (error) return error; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; error = xattr_check_inode(inode, header, end); if (error) goto cleanup; entry = IFIRST(header); error = xattr_find_entry(inode, &entry, end, name_index, name, 0); if (error) goto cleanup; size = le32_to_cpu(entry->e_value_size); error = -ERANGE; if (unlikely(size > EXT4_XATTR_SIZE_MAX)) goto cleanup; if (buffer) { if (size > buffer_size) goto cleanup; if (entry->e_value_inum) { error = ext4_xattr_inode_get(inode, entry, buffer, size); if (error) goto cleanup; } else { u16 offset = le16_to_cpu(entry->e_value_offs); void *p = (void *)IFIRST(header) + offset; if (unlikely(p + size > end)) goto cleanup; memcpy(buffer, p, size); } } error = size; cleanup: brelse(iloc.bh); return error; } /* * ext4_xattr_get() * * Copy an extended attribute into the buffer * provided, or compute the buffer size required. * Buffer is NULL to compute the size of the buffer required. * * Returns a negative error number on failure, or the number of bytes * used / required on success. */ int ext4_xattr_get(struct inode *inode, int name_index, const char *name, void *buffer, size_t buffer_size) { int error; if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) return -EIO; if (strlen(name) > 255) return -ERANGE; down_read(&EXT4_I(inode)->xattr_sem); error = ext4_xattr_ibody_get(inode, name_index, name, buffer, buffer_size); if (error == -ENODATA) error = ext4_xattr_block_get(inode, name_index, name, buffer, buffer_size); up_read(&EXT4_I(inode)->xattr_sem); return error; } static int ext4_xattr_list_entries(struct dentry *dentry, struct ext4_xattr_entry *entry, char *buffer, size_t buffer_size) { size_t rest = buffer_size; for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) { const struct xattr_handler *handler = ext4_xattr_handler(entry->e_name_index); if (handler && (!handler->list || handler->list(dentry))) { const char *prefix = handler->prefix ?: handler->name; size_t prefix_len = strlen(prefix); size_t size = prefix_len + entry->e_name_len + 1; if (buffer) { if (size > rest) return -ERANGE; memcpy(buffer, prefix, prefix_len); buffer += prefix_len; memcpy(buffer, entry->e_name, entry->e_name_len); buffer += entry->e_name_len; *buffer++ = 0; } rest -= size; } } return buffer_size - rest; /* total size */ } static int ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size) { struct inode *inode = d_inode(dentry); struct buffer_head *bh = NULL; int error; ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); if (!EXT4_I(inode)->i_file_acl) return 0; ea_idebug(inode, "reading block %llu", (unsigned long long)EXT4_I(inode)->i_file_acl); bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) return PTR_ERR(bh); ea_bdebug(bh, "b_count=%d, refcount=%d", atomic_read(&(bh->b_count)), le32_to_cpu(BHDR(bh)->h_refcount)); error = ext4_xattr_check_block(inode, bh); if (error) goto cleanup; ext4_xattr_block_cache_insert(EA_BLOCK_CACHE(inode), bh); error = ext4_xattr_list_entries(dentry, BFIRST(bh), buffer, buffer_size); cleanup: brelse(bh); return error; } static int ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size) { struct inode *inode = d_inode(dentry); struct ext4_xattr_ibody_header *header; struct ext4_inode *raw_inode; struct ext4_iloc iloc; void *end; int error; if (!ext4_test_inode_state(inode, EXT4_STATE_XATTR)) return 0; error = ext4_get_inode_loc(inode, &iloc); if (error) return error; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; error = xattr_check_inode(inode, header, end); if (error) goto cleanup; error = ext4_xattr_list_entries(dentry, IFIRST(header), buffer, buffer_size); cleanup: brelse(iloc.bh); return error; } /* * Inode operation listxattr() * * d_inode(dentry)->i_rwsem: don't care * * Copy a list of attribute names into the buffer * provided, or compute the buffer size required. * Buffer is NULL to compute the size of the buffer required. * * Returns a negative error number on failure, or the number of bytes * used / required on success. */ ssize_t ext4_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size) { int ret, ret2; down_read(&EXT4_I(d_inode(dentry))->xattr_sem); ret = ret2 = ext4_xattr_ibody_list(dentry, buffer, buffer_size); if (ret < 0) goto errout; if (buffer) { buffer += ret; buffer_size -= ret; } ret = ext4_xattr_block_list(dentry, buffer, buffer_size); if (ret < 0) goto errout; ret += ret2; errout: up_read(&EXT4_I(d_inode(dentry))->xattr_sem); return ret; } /* * If the EXT4_FEATURE_COMPAT_EXT_ATTR feature of this file system is * not set, set it. */ static void ext4_xattr_update_super_block(handle_t *handle, struct super_block *sb) { if (ext4_has_feature_xattr(sb)) return; BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get_write_access"); if (ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh) == 0) { ext4_set_feature_xattr(sb); ext4_handle_dirty_super(handle, sb); } } int ext4_get_inode_usage(struct inode *inode, qsize_t *usage) { struct ext4_iloc iloc = { .bh = NULL }; struct buffer_head *bh = NULL; struct ext4_inode *raw_inode; struct ext4_xattr_ibody_header *header; struct ext4_xattr_entry *entry; qsize_t ea_inode_refs = 0; void *end; int ret; lockdep_assert_held_read(&EXT4_I(inode)->xattr_sem); if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { ret = ext4_get_inode_loc(inode, &iloc); if (ret) goto out; raw_inode = ext4_raw_inode(&iloc); header = IHDR(inode, raw_inode); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; ret = xattr_check_inode(inode, header, end); if (ret) goto out; for (entry = IFIRST(header); !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) if (entry->e_value_inum) ea_inode_refs++; } if (EXT4_I(inode)->i_file_acl) { bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) { ret = PTR_ERR(bh); bh = NULL; goto out; } ret = ext4_xattr_check_block(inode, bh); if (ret) goto out; for (entry = BFIRST(bh); !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) if (entry->e_value_inum) ea_inode_refs++; } *usage = ea_inode_refs + 1; ret = 0; out: brelse(iloc.bh); brelse(bh); return ret; } static inline size_t round_up_cluster(struct inode *inode, size_t length) { struct super_block *sb = inode->i_sb; size_t cluster_size = 1 << (EXT4_SB(sb)->s_cluster_bits + inode->i_blkbits); size_t mask = ~(cluster_size - 1); return (length + cluster_size - 1) & mask; } static int ext4_xattr_inode_alloc_quota(struct inode *inode, size_t len) { int err; err = dquot_alloc_inode(inode); if (err) return err; err = dquot_alloc_space_nodirty(inode, round_up_cluster(inode, len)); if (err) dquot_free_inode(inode); return err; } static void ext4_xattr_inode_free_quota(struct inode *parent, struct inode *ea_inode, size_t len) { if (ea_inode && ext4_test_inode_state(ea_inode, EXT4_STATE_LUSTRE_EA_INODE)) return; dquot_free_space_nodirty(parent, round_up_cluster(parent, len)); dquot_free_inode(parent); } int __ext4_xattr_set_credits(struct super_block *sb, struct inode *inode, struct buffer_head *block_bh, size_t value_len, bool is_create) { int credits; int blocks; /* * 1) Owner inode update * 2) Ref count update on old xattr block * 3) new xattr block * 4) block bitmap update for new xattr block * 5) group descriptor for new xattr block * 6) block bitmap update for old xattr block * 7) group descriptor for old block * * 6 & 7 can happen if we have two racing threads T_a and T_b * which are each trying to set an xattr on inodes I_a and I_b * which were both initially sharing an xattr block. */ credits = 7; /* Quota updates. */ credits += EXT4_MAXQUOTAS_TRANS_BLOCKS(sb); /* * In case of inline data, we may push out the data to a block, * so we need to reserve credits for this eventuality */ if (inode && ext4_has_inline_data(inode)) credits += ext4_writepage_trans_blocks(inode) + 1; /* We are done if ea_inode feature is not enabled. */ if (!ext4_has_feature_ea_inode(sb)) return credits; /* New ea_inode, inode map, block bitmap, group descriptor. */ credits += 4; /* Data blocks. */ blocks = (value_len + sb->s_blocksize - 1) >> sb->s_blocksize_bits; /* Indirection block or one level of extent tree. */ blocks += 1; /* Block bitmap and group descriptor updates for each block. */ credits += blocks * 2; /* Blocks themselves. */ credits += blocks; if (!is_create) { /* Dereference ea_inode holding old xattr value. * Old ea_inode, inode map, block bitmap, group descriptor. */ credits += 4; /* Data blocks for old ea_inode. */ blocks = XATTR_SIZE_MAX >> sb->s_blocksize_bits; /* Indirection block or one level of extent tree for old * ea_inode. */ blocks += 1; /* Block bitmap and group descriptor updates for each block. */ credits += blocks * 2; } /* We may need to clone the existing xattr block in which case we need * to increment ref counts for existing ea_inodes referenced by it. */ if (block_bh) { struct ext4_xattr_entry *entry = BFIRST(block_bh); for (; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) if (entry->e_value_inum) /* Ref count update on ea_inode. */ credits += 1; } return credits; } static int ext4_xattr_inode_update_ref(handle_t *handle, struct inode *ea_inode, int ref_change) { struct mb_cache *ea_inode_cache = EA_INODE_CACHE(ea_inode); struct ext4_iloc iloc; s64 ref_count; u32 hash; int ret; inode_lock(ea_inode); ret = ext4_reserve_inode_write(handle, ea_inode, &iloc); if (ret) goto out; ref_count = ext4_xattr_inode_get_ref(ea_inode); ref_count += ref_change; ext4_xattr_inode_set_ref(ea_inode, ref_count); if (ref_change > 0) { WARN_ONCE(ref_count <= 0, "EA inode %lu ref_count=%lld", ea_inode->i_ino, ref_count); if (ref_count == 1) { WARN_ONCE(ea_inode->i_nlink, "EA inode %lu i_nlink=%u", ea_inode->i_ino, ea_inode->i_nlink); set_nlink(ea_inode, 1); ext4_orphan_del(handle, ea_inode); if (ea_inode_cache) { hash = ext4_xattr_inode_get_hash(ea_inode); mb_cache_entry_create(ea_inode_cache, GFP_NOFS, hash, ea_inode->i_ino, true /* reusable */); } } } else { WARN_ONCE(ref_count < 0, "EA inode %lu ref_count=%lld", ea_inode->i_ino, ref_count); if (ref_count == 0) { WARN_ONCE(ea_inode->i_nlink != 1, "EA inode %lu i_nlink=%u", ea_inode->i_ino, ea_inode->i_nlink); clear_nlink(ea_inode); ext4_orphan_add(handle, ea_inode); if (ea_inode_cache) { hash = ext4_xattr_inode_get_hash(ea_inode); mb_cache_entry_delete(ea_inode_cache, hash, ea_inode->i_ino); } } } ret = ext4_mark_iloc_dirty(handle, ea_inode, &iloc); if (ret) ext4_warning_inode(ea_inode, "ext4_mark_iloc_dirty() failed ret=%d", ret); out: inode_unlock(ea_inode); return ret; } static int ext4_xattr_inode_inc_ref(handle_t *handle, struct inode *ea_inode) { return ext4_xattr_inode_update_ref(handle, ea_inode, 1); } static int ext4_xattr_inode_dec_ref(handle_t *handle, struct inode *ea_inode) { return ext4_xattr_inode_update_ref(handle, ea_inode, -1); } static int ext4_xattr_inode_inc_ref_all(handle_t *handle, struct inode *parent, struct ext4_xattr_entry *first) { struct inode *ea_inode; struct ext4_xattr_entry *entry; struct ext4_xattr_entry *failed_entry; unsigned int ea_ino; int err, saved_err; for (entry = first; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) { if (!entry->e_value_inum) continue; ea_ino = le32_to_cpu(entry->e_value_inum); err = ext4_xattr_inode_iget(parent, ea_ino, le32_to_cpu(entry->e_hash), &ea_inode); if (err) goto cleanup; err = ext4_xattr_inode_inc_ref(handle, ea_inode); if (err) { ext4_warning_inode(ea_inode, "inc ref error %d", err); iput(ea_inode); goto cleanup; } iput(ea_inode); } return 0; cleanup: saved_err = err; failed_entry = entry; for (entry = first; entry != failed_entry; entry = EXT4_XATTR_NEXT(entry)) { if (!entry->e_value_inum) continue; ea_ino = le32_to_cpu(entry->e_value_inum); err = ext4_xattr_inode_iget(parent, ea_ino, le32_to_cpu(entry->e_hash), &ea_inode); if (err) { ext4_warning(parent->i_sb, "cleanup ea_ino %u iget error %d", ea_ino, err); continue; } err = ext4_xattr_inode_dec_ref(handle, ea_inode); if (err) ext4_warning_inode(ea_inode, "cleanup dec ref error %d", err); iput(ea_inode); } return saved_err; } static int ext4_xattr_restart_fn(handle_t *handle, struct inode *inode, struct buffer_head *bh, bool block_csum, bool dirty) { int error; if (bh && dirty) { if (block_csum) ext4_xattr_block_csum_set(inode, bh); error = ext4_handle_dirty_metadata(handle, NULL, bh); if (error) { ext4_warning(inode->i_sb, "Handle metadata (error %d)", error); return error; } } return 0; } static void ext4_xattr_inode_dec_ref_all(handle_t *handle, struct inode *parent, struct buffer_head *bh, struct ext4_xattr_entry *first, bool block_csum, struct ext4_xattr_inode_array **ea_inode_array, int extra_credits, bool skip_quota) { struct inode *ea_inode; struct ext4_xattr_entry *entry; bool dirty = false; unsigned int ea_ino; int err; int credits; /* One credit for dec ref on ea_inode, one for orphan list addition, */ credits = 2 + extra_credits; for (entry = first; !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) { if (!entry->e_value_inum) continue; ea_ino = le32_to_cpu(entry->e_value_inum); err = ext4_xattr_inode_iget(parent, ea_ino, le32_to_cpu(entry->e_hash), &ea_inode); if (err) continue; err = ext4_expand_inode_array(ea_inode_array, ea_inode); if (err) { ext4_warning_inode(ea_inode, "Expand inode array err=%d", err); iput(ea_inode); continue; } err = ext4_journal_ensure_credits_fn(handle, credits, credits, ext4_free_metadata_revoke_credits(parent->i_sb, 1), ext4_xattr_restart_fn(handle, parent, bh, block_csum, dirty)); if (err < 0) { ext4_warning_inode(ea_inode, "Ensure credits err=%d", err); continue; } if (err > 0) { err = ext4_journal_get_write_access(handle, bh); if (err) { ext4_warning_inode(ea_inode, "Re-get write access err=%d", err); continue; } } err = ext4_xattr_inode_dec_ref(handle, ea_inode); if (err) { ext4_warning_inode(ea_inode, "ea_inode dec ref err=%d", err); continue; } if (!skip_quota) ext4_xattr_inode_free_quota(parent, ea_inode, le32_to_cpu(entry->e_value_size)); /* * Forget about ea_inode within the same transaction that * decrements the ref count. This avoids duplicate decrements in * case the rest of the work spills over to subsequent * transactions. */ entry->e_value_inum = 0; entry->e_value_size = 0; dirty = true; } if (dirty) { /* * Note that we are deliberately skipping csum calculation for * the final update because we do not expect any journal * restarts until xattr block is freed. */ err = ext4_handle_dirty_metadata(handle, NULL, bh); if (err) ext4_warning_inode(parent, "handle dirty metadata err=%d", err); } } /* * Release the xattr block BH: If the reference count is > 1, decrement it; * otherwise free the block. */ static void ext4_xattr_release_block(handle_t *handle, struct inode *inode, struct buffer_head *bh, struct ext4_xattr_inode_array **ea_inode_array, int extra_credits) { struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode); u32 hash, ref; int error = 0; BUFFER_TRACE(bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bh); if (error) goto out; lock_buffer(bh); hash = le32_to_cpu(BHDR(bh)->h_hash); ref = le32_to_cpu(BHDR(bh)->h_refcount); if (ref == 1) { ea_bdebug(bh, "refcount now=0; freeing"); /* * This must happen under buffer lock for * ext4_xattr_block_set() to reliably detect freed block */ if (ea_block_cache) mb_cache_entry_delete(ea_block_cache, hash, bh->b_blocknr); get_bh(bh); unlock_buffer(bh); if (ext4_has_feature_ea_inode(inode->i_sb)) ext4_xattr_inode_dec_ref_all(handle, inode, bh, BFIRST(bh), true /* block_csum */, ea_inode_array, extra_credits, true /* skip_quota */); ext4_free_blocks(handle, inode, bh, 0, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); } else { ref--; BHDR(bh)->h_refcount = cpu_to_le32(ref); if (ref == EXT4_XATTR_REFCOUNT_MAX - 1) { struct mb_cache_entry *ce; if (ea_block_cache) { ce = mb_cache_entry_get(ea_block_cache, hash, bh->b_blocknr); if (ce) { ce->e_reusable = 1; mb_cache_entry_put(ea_block_cache, ce); } } } ext4_xattr_block_csum_set(inode, bh); /* * Beware of this ugliness: Releasing of xattr block references * from different inodes can race and so we have to protect * from a race where someone else frees the block (and releases * its journal_head) before we are done dirtying the buffer. In * nojournal mode this race is harmless and we actually cannot * call ext4_handle_dirty_metadata() with locked buffer as * that function can call sync_dirty_buffer() so for that case * we handle the dirtying after unlocking the buffer. */ if (ext4_handle_valid(handle)) error = ext4_handle_dirty_metadata(handle, inode, bh); unlock_buffer(bh); if (!ext4_handle_valid(handle)) error = ext4_handle_dirty_metadata(handle, inode, bh); if (IS_SYNC(inode)) ext4_handle_sync(handle); dquot_free_block(inode, EXT4_C2B(EXT4_SB(inode->i_sb), 1)); ea_bdebug(bh, "refcount now=%d; releasing", le32_to_cpu(BHDR(bh)->h_refcount)); } out: ext4_std_error(inode->i_sb, error); return; } /* * Find the available free space for EAs. This also returns the total number of * bytes used by EA entries. */ static size_t ext4_xattr_free_space(struct ext4_xattr_entry *last, size_t *min_offs, void *base, int *total) { for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { if (!last->e_value_inum && last->e_value_size) { size_t offs = le16_to_cpu(last->e_value_offs); if (offs < *min_offs) *min_offs = offs; } if (total) *total += EXT4_XATTR_LEN(last->e_name_len); } return (*min_offs - ((void *)last - base) - sizeof(__u32)); } /* * Write the value of the EA in an inode. */ static int ext4_xattr_inode_write(handle_t *handle, struct inode *ea_inode, const void *buf, int bufsize) { struct buffer_head *bh = NULL; unsigned long block = 0; int blocksize = ea_inode->i_sb->s_blocksize; int max_blocks = (bufsize + blocksize - 1) >> ea_inode->i_blkbits; int csize, wsize = 0; int ret = 0, ret2 = 0; int retries = 0; retry: while (ret >= 0 && ret < max_blocks) { struct ext4_map_blocks map; map.m_lblk = block += ret; map.m_len = max_blocks -= ret; ret = ext4_map_blocks(handle, ea_inode, &map, EXT4_GET_BLOCKS_CREATE); if (ret <= 0) { ext4_mark_inode_dirty(handle, ea_inode); if (ret == -ENOSPC && ext4_should_retry_alloc(ea_inode->i_sb, &retries)) { ret = 0; goto retry; } break; } } if (ret < 0) return ret; block = 0; while (wsize < bufsize) { brelse(bh); csize = (bufsize - wsize) > blocksize ? blocksize : bufsize - wsize; bh = ext4_getblk(handle, ea_inode, block, 0); if (IS_ERR(bh)) return PTR_ERR(bh); if (!bh) { WARN_ON_ONCE(1); EXT4_ERROR_INODE(ea_inode, "ext4_getblk() return bh = NULL"); return -EFSCORRUPTED; } ret = ext4_journal_get_write_access(handle, bh); if (ret) goto out; memcpy(bh->b_data, buf, csize); set_buffer_uptodate(bh); ext4_handle_dirty_metadata(handle, ea_inode, bh); buf += csize; wsize += csize; block += 1; } inode_lock(ea_inode); i_size_write(ea_inode, wsize); ext4_update_i_disksize(ea_inode, wsize); inode_unlock(ea_inode); ret2 = ext4_mark_inode_dirty(handle, ea_inode); if (unlikely(ret2 && !ret)) ret = ret2; out: brelse(bh); return ret; } /* * Create an inode to store the value of a large EA. */ static struct inode *ext4_xattr_inode_create(handle_t *handle, struct inode *inode, u32 hash) { struct inode *ea_inode = NULL; uid_t owner[2] = { i_uid_read(inode), i_gid_read(inode) }; int err; /* * Let the next inode be the goal, so we try and allocate the EA inode * in the same group, or nearby one. */ ea_inode = ext4_new_inode(handle, inode->i_sb->s_root->d_inode, S_IFREG | 0600, NULL, inode->i_ino + 1, owner, EXT4_EA_INODE_FL); if (!IS_ERR(ea_inode)) { ea_inode->i_op = &ext4_file_inode_operations; ea_inode->i_fop = &ext4_file_operations; ext4_set_aops(ea_inode); ext4_xattr_inode_set_class(ea_inode); unlock_new_inode(ea_inode); ext4_xattr_inode_set_ref(ea_inode, 1); ext4_xattr_inode_set_hash(ea_inode, hash); err = ext4_mark_inode_dirty(handle, ea_inode); if (!err) err = ext4_inode_attach_jinode(ea_inode); if (err) { iput(ea_inode); return ERR_PTR(err); } /* * Xattr inodes are shared therefore quota charging is performed * at a higher level. */ dquot_free_inode(ea_inode); dquot_drop(ea_inode); inode_lock(ea_inode); ea_inode->i_flags |= S_NOQUOTA; inode_unlock(ea_inode); } return ea_inode; } static struct inode * ext4_xattr_inode_cache_find(struct inode *inode, const void *value, size_t value_len, u32 hash) { struct inode *ea_inode; struct mb_cache_entry *ce; struct mb_cache *ea_inode_cache = EA_INODE_CACHE(inode); void *ea_data; if (!ea_inode_cache) return NULL; ce = mb_cache_entry_find_first(ea_inode_cache, hash); if (!ce) return NULL; WARN_ON_ONCE(ext4_handle_valid(journal_current_handle()) && !(current->flags & PF_MEMALLOC_NOFS)); ea_data = kvmalloc(value_len, GFP_KERNEL); if (!ea_data) { mb_cache_entry_put(ea_inode_cache, ce); return NULL; } while (ce) { ea_inode = ext4_iget(inode->i_sb, ce->e_value, EXT4_IGET_NORMAL); if (!IS_ERR(ea_inode) && !is_bad_inode(ea_inode) && (EXT4_I(ea_inode)->i_flags & EXT4_EA_INODE_FL) && i_size_read(ea_inode) == value_len && !ext4_xattr_inode_read(ea_inode, ea_data, value_len) && !ext4_xattr_inode_verify_hashes(ea_inode, NULL, ea_data, value_len) && !memcmp(value, ea_data, value_len)) { mb_cache_entry_touch(ea_inode_cache, ce); mb_cache_entry_put(ea_inode_cache, ce); kvfree(ea_data); return ea_inode; } if (!IS_ERR(ea_inode)) iput(ea_inode); ce = mb_cache_entry_find_next(ea_inode_cache, ce); } kvfree(ea_data); return NULL; } /* * Add value of the EA in an inode. */ static int ext4_xattr_inode_lookup_create(handle_t *handle, struct inode *inode, const void *value, size_t value_len, struct inode **ret_inode) { struct inode *ea_inode; u32 hash; int err; hash = ext4_xattr_inode_hash(EXT4_SB(inode->i_sb), value, value_len); ea_inode = ext4_xattr_inode_cache_find(inode, value, value_len, hash); if (ea_inode) { err = ext4_xattr_inode_inc_ref(handle, ea_inode); if (err) { iput(ea_inode); return err; } *ret_inode = ea_inode; return 0; } /* Create an inode for the EA value */ ea_inode = ext4_xattr_inode_create(handle, inode, hash); if (IS_ERR(ea_inode)) return PTR_ERR(ea_inode); err = ext4_xattr_inode_write(handle, ea_inode, value, value_len); if (err) { ext4_xattr_inode_dec_ref(handle, ea_inode); iput(ea_inode); return err; } if (EA_INODE_CACHE(inode)) mb_cache_entry_create(EA_INODE_CACHE(inode), GFP_NOFS, hash, ea_inode->i_ino, true /* reusable */); *ret_inode = ea_inode; return 0; } /* * Reserve min(block_size/8, 1024) bytes for xattr entries/names if ea_inode * feature is enabled. */ #define EXT4_XATTR_BLOCK_RESERVE(inode) min(i_blocksize(inode)/8, 1024U) static int ext4_xattr_set_entry(struct ext4_xattr_info *i, struct ext4_xattr_search *s, handle_t *handle, struct inode *inode, bool is_block) { struct ext4_xattr_entry *last, *next; struct ext4_xattr_entry *here = s->here; size_t min_offs = s->end - s->base, name_len = strlen(i->name); int in_inode = i->in_inode; struct inode *old_ea_inode = NULL; struct inode *new_ea_inode = NULL; size_t old_size, new_size; int ret; /* Space used by old and new values. */ old_size = (!s->not_found && !here->e_value_inum) ? EXT4_XATTR_SIZE(le32_to_cpu(here->e_value_size)) : 0; new_size = (i->value && !in_inode) ? EXT4_XATTR_SIZE(i->value_len) : 0; /* * Optimization for the simple case when old and new values have the * same padded sizes. Not applicable if external inodes are involved. */ if (new_size && new_size == old_size) { size_t offs = le16_to_cpu(here->e_value_offs); void *val = s->base + offs; here->e_value_size = cpu_to_le32(i->value_len); if (i->value == EXT4_ZERO_XATTR_VALUE) { memset(val, 0, new_size); } else { memcpy(val, i->value, i->value_len); /* Clear padding bytes. */ memset(val + i->value_len, 0, new_size - i->value_len); } goto update_hash; } /* Compute min_offs and last. */ last = s->first; for (; !IS_LAST_ENTRY(last); last = next) { next = EXT4_XATTR_NEXT(last); if ((void *)next >= s->end) { EXT4_ERROR_INODE(inode, "corrupted xattr entries"); ret = -EFSCORRUPTED; goto out; } if (!last->e_value_inum && last->e_value_size) { size_t offs = le16_to_cpu(last->e_value_offs); if (offs < min_offs) min_offs = offs; } } /* Check whether we have enough space. */ if (i->value) { size_t free; free = min_offs - ((void *)last - s->base) - sizeof(__u32); if (!s->not_found) free += EXT4_XATTR_LEN(name_len) + old_size; if (free < EXT4_XATTR_LEN(name_len) + new_size) { ret = -ENOSPC; goto out; } /* * If storing the value in an external inode is an option, * reserve space for xattr entries/names in the external * attribute block so that a long value does not occupy the * whole space and prevent futher entries being added. */ if (ext4_has_feature_ea_inode(inode->i_sb) && new_size && is_block && (min_offs + old_size - new_size) < EXT4_XATTR_BLOCK_RESERVE(inode)) { ret = -ENOSPC; goto out; } } /* * Getting access to old and new ea inodes is subject to failures. * Finish that work before doing any modifications to the xattr data. */ if (!s->not_found && here->e_value_inum) { ret = ext4_xattr_inode_iget(inode, le32_to_cpu(here->e_value_inum), le32_to_cpu(here->e_hash), &old_ea_inode); if (ret) { old_ea_inode = NULL; goto out; } } if (i->value && in_inode) { WARN_ON_ONCE(!i->value_len); ret = ext4_xattr_inode_alloc_quota(inode, i->value_len); if (ret) goto out; ret = ext4_xattr_inode_lookup_create(handle, inode, i->value, i->value_len, &new_ea_inode); if (ret) { new_ea_inode = NULL; ext4_xattr_inode_free_quota(inode, NULL, i->value_len); goto out; } } if (old_ea_inode) { /* We are ready to release ref count on the old_ea_inode. */ ret = ext4_xattr_inode_dec_ref(handle, old_ea_inode); if (ret) { /* Release newly required ref count on new_ea_inode. */ if (new_ea_inode) { int err; err = ext4_xattr_inode_dec_ref(handle, new_ea_inode); if (err) ext4_warning_inode(new_ea_inode, "dec ref new_ea_inode err=%d", err); ext4_xattr_inode_free_quota(inode, new_ea_inode, i->value_len); } goto out; } ext4_xattr_inode_free_quota(inode, old_ea_inode, le32_to_cpu(here->e_value_size)); } /* No failures allowed past this point. */ if (!s->not_found && here->e_value_size && !here->e_value_inum) { /* Remove the old value. */ void *first_val = s->base + min_offs; size_t offs = le16_to_cpu(here->e_value_offs); void *val = s->base + offs; memmove(first_val + old_size, first_val, val - first_val); memset(first_val, 0, old_size); min_offs += old_size; /* Adjust all value offsets. */ last = s->first; while (!IS_LAST_ENTRY(last)) { size_t o = le16_to_cpu(last->e_value_offs); if (!last->e_value_inum && last->e_value_size && o < offs) last->e_value_offs = cpu_to_le16(o + old_size); last = EXT4_XATTR_NEXT(last); } } if (!i->value) { /* Remove old name. */ size_t size = EXT4_XATTR_LEN(name_len); last = ENTRY((void *)last - size); memmove(here, (void *)here + size, (void *)last - (void *)here + sizeof(__u32)); memset(last, 0, size); } else if (s->not_found) { /* Insert new name. */ size_t size = EXT4_XATTR_LEN(name_len); size_t rest = (void *)last - (void *)here + sizeof(__u32); memmove((void *)here + size, here, rest); memset(here, 0, size); here->e_name_index = i->name_index; here->e_name_len = name_len; memcpy(here->e_name, i->name, name_len); } else { /* This is an update, reset value info. */ here->e_value_inum = 0; here->e_value_offs = 0; here->e_value_size = 0; } if (i->value) { /* Insert new value. */ if (in_inode) { here->e_value_inum = cpu_to_le32(new_ea_inode->i_ino); } else if (i->value_len) { void *val = s->base + min_offs - new_size; here->e_value_offs = cpu_to_le16(min_offs - new_size); if (i->value == EXT4_ZERO_XATTR_VALUE) { memset(val, 0, new_size); } else { memcpy(val, i->value, i->value_len); /* Clear padding bytes. */ memset(val + i->value_len, 0, new_size - i->value_len); } } here->e_value_size = cpu_to_le32(i->value_len); } update_hash: if (i->value) { __le32 hash = 0; /* Entry hash calculation. */ if (in_inode) { __le32 crc32c_hash; /* * Feed crc32c hash instead of the raw value for entry * hash calculation. This is to avoid walking * potentially long value buffer again. */ crc32c_hash = cpu_to_le32( ext4_xattr_inode_get_hash(new_ea_inode)); hash = ext4_xattr_hash_entry(here->e_name, here->e_name_len, &crc32c_hash, 1); } else if (is_block) { __le32 *value = s->base + le16_to_cpu( here->e_value_offs); hash = ext4_xattr_hash_entry(here->e_name, here->e_name_len, value, new_size >> 2); } here->e_hash = hash; } if (is_block) ext4_xattr_rehash((struct ext4_xattr_header *)s->base); ret = 0; out: iput(old_ea_inode); iput(new_ea_inode); return ret; } struct ext4_xattr_block_find { struct ext4_xattr_search s; struct buffer_head *bh; }; static int ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_block_find *bs) { struct super_block *sb = inode->i_sb; int error; ea_idebug(inode, "name=%d.%s, value=%p, value_len=%ld", i->name_index, i->name, i->value, (long)i->value_len); if (EXT4_I(inode)->i_file_acl) { /* The inode already has an extended attribute block. */ bs->bh = ext4_sb_bread(sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bs->bh)) { error = PTR_ERR(bs->bh); bs->bh = NULL; return error; } ea_bdebug(bs->bh, "b_count=%d, refcount=%d", atomic_read(&(bs->bh->b_count)), le32_to_cpu(BHDR(bs->bh)->h_refcount)); error = ext4_xattr_check_block(inode, bs->bh); if (error) return error; /* Find the named attribute. */ bs->s.base = BHDR(bs->bh); bs->s.first = BFIRST(bs->bh); bs->s.end = bs->bh->b_data + bs->bh->b_size; bs->s.here = bs->s.first; error = xattr_find_entry(inode, &bs->s.here, bs->s.end, i->name_index, i->name, 1); if (error && error != -ENODATA) return error; bs->s.not_found = error; } return 0; } static int ext4_xattr_block_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_block_find *bs) { struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; struct ext4_xattr_search s_copy = bs->s; struct ext4_xattr_search *s = &s_copy; struct mb_cache_entry *ce = NULL; int error = 0; struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode); struct inode *ea_inode = NULL, *tmp_inode; size_t old_ea_inode_quota = 0; unsigned int ea_ino; #define header(x) ((struct ext4_xattr_header *)(x)) if (s->base) { BUFFER_TRACE(bs->bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bs->bh); if (error) goto cleanup; lock_buffer(bs->bh); if (header(s->base)->h_refcount == cpu_to_le32(1)) { __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash); /* * This must happen under buffer lock for * ext4_xattr_block_set() to reliably detect modified * block */ if (ea_block_cache) mb_cache_entry_delete(ea_block_cache, hash, bs->bh->b_blocknr); ea_bdebug(bs->bh, "modifying in-place"); error = ext4_xattr_set_entry(i, s, handle, inode, true /* is_block */); ext4_xattr_block_csum_set(inode, bs->bh); unlock_buffer(bs->bh); if (error == -EFSCORRUPTED) goto bad_block; if (!error) error = ext4_handle_dirty_metadata(handle, inode, bs->bh); if (error) goto cleanup; goto inserted; } else { int offset = (char *)s->here - bs->bh->b_data; unlock_buffer(bs->bh); ea_bdebug(bs->bh, "cloning"); s->base = kmalloc(bs->bh->b_size, GFP_NOFS); error = -ENOMEM; if (s->base == NULL) goto cleanup; memcpy(s->base, BHDR(bs->bh), bs->bh->b_size); s->first = ENTRY(header(s->base)+1); header(s->base)->h_refcount = cpu_to_le32(1); s->here = ENTRY(s->base + offset); s->end = s->base + bs->bh->b_size; /* * If existing entry points to an xattr inode, we need * to prevent ext4_xattr_set_entry() from decrementing * ref count on it because the reference belongs to the * original block. In this case, make the entry look * like it has an empty value. */ if (!s->not_found && s->here->e_value_inum) { ea_ino = le32_to_cpu(s->here->e_value_inum); error = ext4_xattr_inode_iget(inode, ea_ino, le32_to_cpu(s->here->e_hash), &tmp_inode); if (error) goto cleanup; if (!ext4_test_inode_state(tmp_inode, EXT4_STATE_LUSTRE_EA_INODE)) { /* * Defer quota free call for previous * inode until success is guaranteed. */ old_ea_inode_quota = le32_to_cpu( s->here->e_value_size); } iput(tmp_inode); s->here->e_value_inum = 0; s->here->e_value_size = 0; } } } else { /* Allocate a buffer where we construct the new block. */ s->base = kzalloc(sb->s_blocksize, GFP_NOFS); /* assert(header == s->base) */ error = -ENOMEM; if (s->base == NULL) goto cleanup; header(s->base)->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC); header(s->base)->h_blocks = cpu_to_le32(1); header(s->base)->h_refcount = cpu_to_le32(1); s->first = ENTRY(header(s->base)+1); s->here = ENTRY(header(s->base)+1); s->end = s->base + sb->s_blocksize; } error = ext4_xattr_set_entry(i, s, handle, inode, true /* is_block */); if (error == -EFSCORRUPTED) goto bad_block; if (error) goto cleanup; if (i->value && s->here->e_value_inum) { /* * A ref count on ea_inode has been taken as part of the call to * ext4_xattr_set_entry() above. We would like to drop this * extra ref but we have to wait until the xattr block is * initialized and has its own ref count on the ea_inode. */ ea_ino = le32_to_cpu(s->here->e_value_inum); error = ext4_xattr_inode_iget(inode, ea_ino, le32_to_cpu(s->here->e_hash), &ea_inode); if (error) { ea_inode = NULL; goto cleanup; } } inserted: if (!IS_LAST_ENTRY(s->first)) { new_bh = ext4_xattr_block_cache_find(inode, header(s->base), &ce); if (new_bh) { /* We found an identical block in the cache. */ if (new_bh == bs->bh) ea_bdebug(new_bh, "keeping"); else { u32 ref; WARN_ON_ONCE(dquot_initialize_needed(inode)); /* The old block is released after updating the inode. */ error = dquot_alloc_block(inode, EXT4_C2B(EXT4_SB(sb), 1)); if (error) goto cleanup; BUFFER_TRACE(new_bh, "get_write_access"); error = ext4_journal_get_write_access(handle, new_bh); if (error) goto cleanup_dquot; lock_buffer(new_bh); /* * We have to be careful about races with * freeing, rehashing or adding references to * xattr block. Once we hold buffer lock xattr * block's state is stable so we can check * whether the block got freed / rehashed or * not. Since we unhash mbcache entry under * buffer lock when freeing / rehashing xattr * block, checking whether entry is still * hashed is reliable. Same rules hold for * e_reusable handling. */ if (hlist_bl_unhashed(&ce->e_hash_list) || !ce->e_reusable) { /* * Undo everything and check mbcache * again. */ unlock_buffer(new_bh); dquot_free_block(inode, EXT4_C2B(EXT4_SB(sb), 1)); brelse(new_bh); mb_cache_entry_put(ea_block_cache, ce); ce = NULL; new_bh = NULL; goto inserted; } ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1; BHDR(new_bh)->h_refcount = cpu_to_le32(ref); if (ref >= EXT4_XATTR_REFCOUNT_MAX) ce->e_reusable = 0; ea_bdebug(new_bh, "reusing; refcount now=%d", ref); ext4_xattr_block_csum_set(inode, new_bh); unlock_buffer(new_bh); error = ext4_handle_dirty_metadata(handle, inode, new_bh); if (error) goto cleanup_dquot; } mb_cache_entry_touch(ea_block_cache, ce); mb_cache_entry_put(ea_block_cache, ce); ce = NULL; } else if (bs->bh && s->base == bs->bh->b_data) { /* We were modifying this block in-place. */ ea_bdebug(bs->bh, "keeping this block"); ext4_xattr_block_cache_insert(ea_block_cache, bs->bh); new_bh = bs->bh; get_bh(new_bh); } else { /* We need to allocate a new block */ ext4_fsblk_t goal, block; WARN_ON_ONCE(dquot_initialize_needed(inode)); goal = ext4_group_first_block_no(sb, EXT4_I(inode)->i_block_group); /* non-extent files can't have physical blocks past 2^32 */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) goal = goal & EXT4_MAX_BLOCK_FILE_PHYS; block = ext4_new_meta_blocks(handle, inode, goal, 0, NULL, &error); if (error) goto cleanup; if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) BUG_ON(block > EXT4_MAX_BLOCK_FILE_PHYS); ea_idebug(inode, "creating block %llu", (unsigned long long)block); new_bh = sb_getblk(sb, block); if (unlikely(!new_bh)) { error = -ENOMEM; getblk_failed: ext4_free_blocks(handle, inode, NULL, block, 1, EXT4_FREE_BLOCKS_METADATA); goto cleanup; } error = ext4_xattr_inode_inc_ref_all(handle, inode, ENTRY(header(s->base)+1)); if (error) goto getblk_failed; if (ea_inode) { /* Drop the extra ref on ea_inode. */ error = ext4_xattr_inode_dec_ref(handle, ea_inode); if (error) ext4_warning_inode(ea_inode, "dec ref error=%d", error); iput(ea_inode); ea_inode = NULL; } lock_buffer(new_bh); error = ext4_journal_get_create_access(handle, new_bh); if (error) { unlock_buffer(new_bh); error = -EIO; goto getblk_failed; } memcpy(new_bh->b_data, s->base, new_bh->b_size); ext4_xattr_block_csum_set(inode, new_bh); set_buffer_uptodate(new_bh); unlock_buffer(new_bh); ext4_xattr_block_cache_insert(ea_block_cache, new_bh); error = ext4_handle_dirty_metadata(handle, inode, new_bh); if (error) goto cleanup; } } if (old_ea_inode_quota) ext4_xattr_inode_free_quota(inode, NULL, old_ea_inode_quota); /* Update the inode. */ EXT4_I(inode)->i_file_acl = new_bh ? new_bh->b_blocknr : 0; /* Drop the previous xattr block. */ if (bs->bh && bs->bh != new_bh) { struct ext4_xattr_inode_array *ea_inode_array = NULL; ext4_xattr_release_block(handle, inode, bs->bh, &ea_inode_array, 0 /* extra_credits */); ext4_xattr_inode_array_free(ea_inode_array); } error = 0; cleanup: if (ea_inode) { int error2; error2 = ext4_xattr_inode_dec_ref(handle, ea_inode); if (error2) ext4_warning_inode(ea_inode, "dec ref error=%d", error2); /* If there was an error, revert the quota charge. */ if (error) ext4_xattr_inode_free_quota(inode, ea_inode, i_size_read(ea_inode)); iput(ea_inode); } if (ce) mb_cache_entry_put(ea_block_cache, ce); brelse(new_bh); if (!(bs->bh && s->base == bs->bh->b_data)) kfree(s->base); return error; cleanup_dquot: dquot_free_block(inode, EXT4_C2B(EXT4_SB(sb), 1)); goto cleanup; bad_block: EXT4_ERROR_INODE(inode, "bad block %llu", EXT4_I(inode)->i_file_acl); goto cleanup; #undef header } int ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is) { struct ext4_xattr_ibody_header *header; struct ext4_inode *raw_inode; int error; if (EXT4_I(inode)->i_extra_isize == 0) return 0; raw_inode = ext4_raw_inode(&is->iloc); header = IHDR(inode, raw_inode); is->s.base = is->s.first = IFIRST(header); is->s.here = is->s.first; is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { error = xattr_check_inode(inode, header, is->s.end); if (error) return error; /* Find the named attribute. */ error = xattr_find_entry(inode, &is->s.here, is->s.end, i->name_index, i->name, 0); if (error && error != -ENODATA) return error; is->s.not_found = error; } return 0; } int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is) { struct ext4_xattr_ibody_header *header; struct ext4_xattr_search *s = &is->s; int error; if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */); if (error) return error; header = IHDR(inode, ext4_raw_inode(&is->iloc)); if (!IS_LAST_ENTRY(s->first)) { header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC); ext4_set_inode_state(inode, EXT4_STATE_XATTR); } else { header->h_magic = cpu_to_le32(0); ext4_clear_inode_state(inode, EXT4_STATE_XATTR); } return 0; } static int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is) { struct ext4_xattr_ibody_header *header; struct ext4_xattr_search *s = &is->s; int error; if (EXT4_I(inode)->i_extra_isize == 0) return -ENOSPC; error = ext4_xattr_set_entry(i, s, handle, inode, false /* is_block */); if (error) return error; header = IHDR(inode, ext4_raw_inode(&is->iloc)); if (!IS_LAST_ENTRY(s->first)) { header->h_magic = cpu_to_le32(EXT4_XATTR_MAGIC); ext4_set_inode_state(inode, EXT4_STATE_XATTR); } else { header->h_magic = cpu_to_le32(0); ext4_clear_inode_state(inode, EXT4_STATE_XATTR); } return 0; } static int ext4_xattr_value_same(struct ext4_xattr_search *s, struct ext4_xattr_info *i) { void *value; /* When e_value_inum is set the value is stored externally. */ if (s->here->e_value_inum) return 0; if (le32_to_cpu(s->here->e_value_size) != i->value_len) return 0; value = ((void *)s->base) + le16_to_cpu(s->here->e_value_offs); return !memcmp(value, i->value, i->value_len); } static struct buffer_head *ext4_xattr_get_block(struct inode *inode) { struct buffer_head *bh; int error; if (!EXT4_I(inode)->i_file_acl) return NULL; bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) return bh; error = ext4_xattr_check_block(inode, bh); if (error) { brelse(bh); return ERR_PTR(error); } return bh; } /* * ext4_xattr_set_handle() * * Create, replace or remove an extended attribute for this inode. Value * is NULL to remove an existing extended attribute, and non-NULL to * either replace an existing extended attribute, or create a new extended * attribute. The flags XATTR_REPLACE and XATTR_CREATE * specify that an extended attribute must exist and must not exist * previous to the call, respectively. * * Returns 0, or a negative error number on failure. */ int ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index, const char *name, const void *value, size_t value_len, int flags) { struct ext4_xattr_info i = { .name_index = name_index, .name = name, .value = value, .value_len = value_len, .in_inode = 0, }; struct ext4_xattr_ibody_find is = { .s = { .not_found = -ENODATA, }, }; struct ext4_xattr_block_find bs = { .s = { .not_found = -ENODATA, }, }; int no_expand; int error; if (!name) return -EINVAL; if (strlen(name) > 255) return -ERANGE; ext4_write_lock_xattr(inode, &no_expand); /* Check journal credits under write lock. */ if (ext4_handle_valid(handle)) { struct buffer_head *bh; int credits; bh = ext4_xattr_get_block(inode); if (IS_ERR(bh)) { error = PTR_ERR(bh); goto cleanup; } credits = __ext4_xattr_set_credits(inode->i_sb, inode, bh, value_len, flags & XATTR_CREATE); brelse(bh); if (jbd2_handle_buffer_credits(handle) < credits) { error = -ENOSPC; goto cleanup; } WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); } error = ext4_reserve_inode_write(handle, inode, &is.iloc); if (error) goto cleanup; if (ext4_test_inode_state(inode, EXT4_STATE_NEW)) { struct ext4_inode *raw_inode = ext4_raw_inode(&is.iloc); memset(raw_inode, 0, EXT4_SB(inode->i_sb)->s_inode_size); ext4_clear_inode_state(inode, EXT4_STATE_NEW); } error = ext4_xattr_ibody_find(inode, &i, &is); if (error) goto cleanup; if (is.s.not_found) error = ext4_xattr_block_find(inode, &i, &bs); if (error) goto cleanup; if (is.s.not_found && bs.s.not_found) { error = -ENODATA; if (flags & XATTR_REPLACE) goto cleanup; error = 0; if (!value) goto cleanup; } else { error = -EEXIST; if (flags & XATTR_CREATE) goto cleanup; } if (!value) { if (!is.s.not_found) error = ext4_xattr_ibody_set(handle, inode, &i, &is); else if (!bs.s.not_found) error = ext4_xattr_block_set(handle, inode, &i, &bs); } else { error = 0; /* Xattr value did not change? Save us some work and bail out */ if (!is.s.not_found && ext4_xattr_value_same(&is.s, &i)) goto cleanup; if (!bs.s.not_found && ext4_xattr_value_same(&bs.s, &i)) goto cleanup; if (ext4_has_feature_ea_inode(inode->i_sb) && (EXT4_XATTR_SIZE(i.value_len) > EXT4_XATTR_MIN_LARGE_EA_SIZE(inode->i_sb->s_blocksize))) i.in_inode = 1; retry_inode: error = ext4_xattr_ibody_set(handle, inode, &i, &is); if (!error && !bs.s.not_found) { i.value = NULL; error = ext4_xattr_block_set(handle, inode, &i, &bs); } else if (error == -ENOSPC) { if (EXT4_I(inode)->i_file_acl && !bs.s.base) { brelse(bs.bh); bs.bh = NULL; error = ext4_xattr_block_find(inode, &i, &bs); if (error) goto cleanup; } error = ext4_xattr_block_set(handle, inode, &i, &bs); if (!error && !is.s.not_found) { i.value = NULL; error = ext4_xattr_ibody_set(handle, inode, &i, &is); } else if (error == -ENOSPC) { /* * Xattr does not fit in the block, store at * external inode if possible. */ if (ext4_has_feature_ea_inode(inode->i_sb) && i.value_len && !i.in_inode) { i.in_inode = 1; goto retry_inode; } } } } if (!error) { ext4_xattr_update_super_block(handle, inode->i_sb); inode->i_ctime = current_time(inode); if (!value) no_expand = 0; error = ext4_mark_iloc_dirty(handle, inode, &is.iloc); /* * The bh is consumed by ext4_mark_iloc_dirty, even with * error != 0. */ is.iloc.bh = NULL; if (IS_SYNC(inode)) ext4_handle_sync(handle); } ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR); cleanup: brelse(is.iloc.bh); brelse(bs.bh); ext4_write_unlock_xattr(inode, &no_expand); return error; } int ext4_xattr_set_credits(struct inode *inode, size_t value_len, bool is_create, int *credits) { struct buffer_head *bh; int err; *credits = 0; if (!EXT4_SB(inode->i_sb)->s_journal) return 0; down_read(&EXT4_I(inode)->xattr_sem); bh = ext4_xattr_get_block(inode); if (IS_ERR(bh)) { err = PTR_ERR(bh); } else { *credits = __ext4_xattr_set_credits(inode->i_sb, inode, bh, value_len, is_create); brelse(bh); err = 0; } up_read(&EXT4_I(inode)->xattr_sem); return err; } /* * ext4_xattr_set() * * Like ext4_xattr_set_handle, but start from an inode. This extended * attribute modification is a filesystem transaction by itself. * * Returns 0, or a negative error number on failure. */ int ext4_xattr_set(struct inode *inode, int name_index, const char *name, const void *value, size_t value_len, int flags) { handle_t *handle; struct super_block *sb = inode->i_sb; int error, retries = 0; int credits; error = dquot_initialize(inode); if (error) return error; retry: error = ext4_xattr_set_credits(inode, value_len, flags & XATTR_CREATE, &credits); if (error) return error; handle = ext4_journal_start(inode, EXT4_HT_XATTR, credits); if (IS_ERR(handle)) { error = PTR_ERR(handle); } else { int error2; error = ext4_xattr_set_handle(handle, inode, name_index, name, value, value_len, flags); error2 = ext4_journal_stop(handle); if (error == -ENOSPC && ext4_should_retry_alloc(sb, &retries)) goto retry; if (error == 0) error = error2; } ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR); return error; } /* * Shift the EA entries in the inode to create space for the increased * i_extra_isize. */ static void ext4_xattr_shift_entries(struct ext4_xattr_entry *entry, int value_offs_shift, void *to, void *from, size_t n) { struct ext4_xattr_entry *last = entry; int new_offs; /* We always shift xattr headers further thus offsets get lower */ BUG_ON(value_offs_shift > 0); /* Adjust the value offsets of the entries */ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { if (!last->e_value_inum && last->e_value_size) { new_offs = le16_to_cpu(last->e_value_offs) + value_offs_shift; last->e_value_offs = cpu_to_le16(new_offs); } } /* Shift the entries by n bytes */ memmove(to, from, n); } /* * Move xattr pointed to by 'entry' from inode into external xattr block */ static int ext4_xattr_move_to_block(handle_t *handle, struct inode *inode, struct ext4_inode *raw_inode, struct ext4_xattr_entry *entry) { struct ext4_xattr_ibody_find *is = NULL; struct ext4_xattr_block_find *bs = NULL; char *buffer = NULL, *b_entry_name = NULL; size_t value_size = le32_to_cpu(entry->e_value_size); struct ext4_xattr_info i = { .value = NULL, .value_len = 0, .name_index = entry->e_name_index, .in_inode = !!entry->e_value_inum, }; struct ext4_xattr_ibody_header *header = IHDR(inode, raw_inode); int error; is = kzalloc(sizeof(struct ext4_xattr_ibody_find), GFP_NOFS); bs = kzalloc(sizeof(struct ext4_xattr_block_find), GFP_NOFS); buffer = kmalloc(value_size, GFP_NOFS); b_entry_name = kmalloc(entry->e_name_len + 1, GFP_NOFS); if (!is || !bs || !buffer || !b_entry_name) { error = -ENOMEM; goto out; } is->s.not_found = -ENODATA; bs->s.not_found = -ENODATA; is->iloc.bh = NULL; bs->bh = NULL; /* Save the entry name and the entry value */ if (entry->e_value_inum) { error = ext4_xattr_inode_get(inode, entry, buffer, value_size); if (error) goto out; } else { size_t value_offs = le16_to_cpu(entry->e_value_offs); memcpy(buffer, (void *)IFIRST(header) + value_offs, value_size); } memcpy(b_entry_name, entry->e_name, entry->e_name_len); b_entry_name[entry->e_name_len] = '\0'; i.name = b_entry_name; error = ext4_get_inode_loc(inode, &is->iloc); if (error) goto out; error = ext4_xattr_ibody_find(inode, &i, is); if (error) goto out; /* Remove the chosen entry from the inode */ error = ext4_xattr_ibody_set(handle, inode, &i, is); if (error) goto out; i.value = buffer; i.value_len = value_size; error = ext4_xattr_block_find(inode, &i, bs); if (error) goto out; /* Add entry which was removed from the inode into the block */ error = ext4_xattr_block_set(handle, inode, &i, bs); if (error) goto out; error = 0; out: kfree(b_entry_name); kfree(buffer); if (is) brelse(is->iloc.bh); if (bs) brelse(bs->bh); kfree(is); kfree(bs); return error; } static int ext4_xattr_make_inode_space(handle_t *handle, struct inode *inode, struct ext4_inode *raw_inode, int isize_diff, size_t ifree, size_t bfree, int *total_ino) { struct ext4_xattr_ibody_header *header = IHDR(inode, raw_inode); struct ext4_xattr_entry *small_entry; struct ext4_xattr_entry *entry; struct ext4_xattr_entry *last; unsigned int entry_size; /* EA entry size */ unsigned int total_size; /* EA entry size + value size */ unsigned int min_total_size; int error; while (isize_diff > ifree) { entry = NULL; small_entry = NULL; min_total_size = ~0U; last = IFIRST(header); /* Find the entry best suited to be pushed into EA block */ for (; !IS_LAST_ENTRY(last); last = EXT4_XATTR_NEXT(last)) { /* never move system.data out of the inode */ if ((last->e_name_len == 4) && (last->e_name_index == EXT4_XATTR_INDEX_SYSTEM) && !memcmp(last->e_name, "data", 4)) continue; total_size = EXT4_XATTR_LEN(last->e_name_len); if (!last->e_value_inum) total_size += EXT4_XATTR_SIZE( le32_to_cpu(last->e_value_size)); if (total_size <= bfree && total_size < min_total_size) { if (total_size + ifree < isize_diff) { small_entry = last; } else { entry = last; min_total_size = total_size; } } } if (entry == NULL) { if (small_entry == NULL) return -ENOSPC; entry = small_entry; } entry_size = EXT4_XATTR_LEN(entry->e_name_len); total_size = entry_size; if (!entry->e_value_inum) total_size += EXT4_XATTR_SIZE( le32_to_cpu(entry->e_value_size)); error = ext4_xattr_move_to_block(handle, inode, raw_inode, entry); if (error) return error; *total_ino -= entry_size; ifree += total_size; bfree -= total_size; } return 0; } /* * Expand an inode by new_extra_isize bytes when EAs are present. * Returns 0 on success or negative error number on failure. */ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, struct ext4_inode *raw_inode, handle_t *handle) { struct ext4_xattr_ibody_header *header; struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); static unsigned int mnt_count; size_t min_offs; size_t ifree, bfree; int total_ino; void *base, *end; int error = 0, tried_min_extra_isize = 0; int s_min_extra_isize = le16_to_cpu(sbi->s_es->s_min_extra_isize); int isize_diff; /* How much do we need to grow i_extra_isize */ retry: isize_diff = new_extra_isize - EXT4_I(inode)->i_extra_isize; if (EXT4_I(inode)->i_extra_isize >= new_extra_isize) return 0; header = IHDR(inode, raw_inode); /* * Check if enough free space is available in the inode to shift the * entries ahead by new_extra_isize. */ base = IFIRST(header); end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size; min_offs = end - base; total_ino = sizeof(struct ext4_xattr_ibody_header) + sizeof(u32); error = xattr_check_inode(inode, header, end); if (error) goto cleanup; ifree = ext4_xattr_free_space(base, &min_offs, base, &total_ino); if (ifree >= isize_diff) goto shift; /* * Enough free space isn't available in the inode, check if * EA block can hold new_extra_isize bytes. */ if (EXT4_I(inode)->i_file_acl) { struct buffer_head *bh; bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) { error = PTR_ERR(bh); goto cleanup; } error = ext4_xattr_check_block(inode, bh); if (error) { brelse(bh); goto cleanup; } base = BHDR(bh); end = bh->b_data + bh->b_size; min_offs = end - base; bfree = ext4_xattr_free_space(BFIRST(bh), &min_offs, base, NULL); brelse(bh); if (bfree + ifree < isize_diff) { if (!tried_min_extra_isize && s_min_extra_isize) { tried_min_extra_isize++; new_extra_isize = s_min_extra_isize; goto retry; } error = -ENOSPC; goto cleanup; } } else { bfree = inode->i_sb->s_blocksize; } error = ext4_xattr_make_inode_space(handle, inode, raw_inode, isize_diff, ifree, bfree, &total_ino); if (error) { if (error == -ENOSPC && !tried_min_extra_isize && s_min_extra_isize) { tried_min_extra_isize++; new_extra_isize = s_min_extra_isize; goto retry; } goto cleanup; } shift: /* Adjust the offsets and shift the remaining entries ahead */ ext4_xattr_shift_entries(IFIRST(header), EXT4_I(inode)->i_extra_isize - new_extra_isize, (void *)raw_inode + EXT4_GOOD_OLD_INODE_SIZE + new_extra_isize, (void *)header, total_ino); EXT4_I(inode)->i_extra_isize = new_extra_isize; cleanup: if (error && (mnt_count != le16_to_cpu(sbi->s_es->s_mnt_count))) { ext4_warning(inode->i_sb, "Unable to expand inode %lu. Delete some EAs or run e2fsck.", inode->i_ino); mnt_count = le16_to_cpu(sbi->s_es->s_mnt_count); } return error; } #define EIA_INCR 16 /* must be 2^n */ #define EIA_MASK (EIA_INCR - 1) /* Add the large xattr @inode into @ea_inode_array for deferred iput(). * If @ea_inode_array is new or full it will be grown and the old * contents copied over. */ static int ext4_expand_inode_array(struct ext4_xattr_inode_array **ea_inode_array, struct inode *inode) { if (*ea_inode_array == NULL) { /* * Start with 15 inodes, so it fits into a power-of-two size. * If *ea_inode_array is NULL, this is essentially offsetof() */ (*ea_inode_array) = kmalloc(offsetof(struct ext4_xattr_inode_array, inodes[EIA_MASK]), GFP_NOFS); if (*ea_inode_array == NULL) return -ENOMEM; (*ea_inode_array)->count = 0; } else if (((*ea_inode_array)->count & EIA_MASK) == EIA_MASK) { /* expand the array once all 15 + n * 16 slots are full */ struct ext4_xattr_inode_array *new_array = NULL; int count = (*ea_inode_array)->count; /* if new_array is NULL, this is essentially offsetof() */ new_array = kmalloc( offsetof(struct ext4_xattr_inode_array, inodes[count + EIA_INCR]), GFP_NOFS); if (new_array == NULL) return -ENOMEM; memcpy(new_array, *ea_inode_array, offsetof(struct ext4_xattr_inode_array, inodes[count])); kfree(*ea_inode_array); *ea_inode_array = new_array; } (*ea_inode_array)->inodes[(*ea_inode_array)->count++] = inode; return 0; } /* * ext4_xattr_delete_inode() * * Free extended attribute resources associated with this inode. Traverse * all entries and decrement reference on any xattr inodes associated with this * inode. This is called immediately before an inode is freed. We have exclusive * access to the inode. If an orphan inode is deleted it will also release its * references on xattr block and xattr inodes. */ int ext4_xattr_delete_inode(handle_t *handle, struct inode *inode, struct ext4_xattr_inode_array **ea_inode_array, int extra_credits) { struct buffer_head *bh = NULL; struct ext4_xattr_ibody_header *header; struct ext4_iloc iloc = { .bh = NULL }; struct ext4_xattr_entry *entry; struct inode *ea_inode; int error; error = ext4_journal_ensure_credits(handle, extra_credits, ext4_free_metadata_revoke_credits(inode->i_sb, 1)); if (error < 0) { EXT4_ERROR_INODE(inode, "ensure credits (error %d)", error); goto cleanup; } if (ext4_has_feature_ea_inode(inode->i_sb) && ext4_test_inode_state(inode, EXT4_STATE_XATTR)) { error = ext4_get_inode_loc(inode, &iloc); if (error) { EXT4_ERROR_INODE(inode, "inode loc (error %d)", error); goto cleanup; } error = ext4_journal_get_write_access(handle, iloc.bh); if (error) { EXT4_ERROR_INODE(inode, "write access (error %d)", error); goto cleanup; } header = IHDR(inode, ext4_raw_inode(&iloc)); if (header->h_magic == cpu_to_le32(EXT4_XATTR_MAGIC)) ext4_xattr_inode_dec_ref_all(handle, inode, iloc.bh, IFIRST(header), false /* block_csum */, ea_inode_array, extra_credits, false /* skip_quota */); } if (EXT4_I(inode)->i_file_acl) { bh = ext4_sb_bread(inode->i_sb, EXT4_I(inode)->i_file_acl, REQ_PRIO); if (IS_ERR(bh)) { error = PTR_ERR(bh); if (error == -EIO) { EXT4_ERROR_INODE_ERR(inode, EIO, "block %llu read error", EXT4_I(inode)->i_file_acl); } bh = NULL; goto cleanup; } error = ext4_xattr_check_block(inode, bh); if (error) goto cleanup; if (ext4_has_feature_ea_inode(inode->i_sb)) { for (entry = BFIRST(bh); !IS_LAST_ENTRY(entry); entry = EXT4_XATTR_NEXT(entry)) { if (!entry->e_value_inum) continue; error = ext4_xattr_inode_iget(inode, le32_to_cpu(entry->e_value_inum), le32_to_cpu(entry->e_hash), &ea_inode); if (error) continue; ext4_xattr_inode_free_quota(inode, ea_inode, le32_to_cpu(entry->e_value_size)); iput(ea_inode); } } ext4_xattr_release_block(handle, inode, bh, ea_inode_array, extra_credits); /* * Update i_file_acl value in the same transaction that releases * block. */ EXT4_I(inode)->i_file_acl = 0; error = ext4_mark_inode_dirty(handle, inode); if (error) { EXT4_ERROR_INODE(inode, "mark inode dirty (error %d)", error); goto cleanup; } ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_XATTR); } error = 0; cleanup: brelse(iloc.bh); brelse(bh); return error; } void ext4_xattr_inode_array_free(struct ext4_xattr_inode_array *ea_inode_array) { int idx; if (ea_inode_array == NULL) return; for (idx = 0; idx < ea_inode_array->count; ++idx) iput(ea_inode_array->inodes[idx]); kfree(ea_inode_array); } /* * ext4_xattr_block_cache_insert() * * Create a new entry in the extended attribute block cache, and insert * it unless such an entry is already in the cache. * * Returns 0, or a negative error number on failure. */ static void ext4_xattr_block_cache_insert(struct mb_cache *ea_block_cache, struct buffer_head *bh) { struct ext4_xattr_header *header = BHDR(bh); __u32 hash = le32_to_cpu(header->h_hash); int reusable = le32_to_cpu(header->h_refcount) < EXT4_XATTR_REFCOUNT_MAX; int error; if (!ea_block_cache) return; error = mb_cache_entry_create(ea_block_cache, GFP_NOFS, hash, bh->b_blocknr, reusable); if (error) { if (error == -EBUSY) ea_bdebug(bh, "already in cache"); } else ea_bdebug(bh, "inserting [%x]", (int)hash); } /* * ext4_xattr_cmp() * * Compare two extended attribute blocks for equality. * * Returns 0 if the blocks are equal, 1 if they differ, and * a negative error number on errors. */ static int ext4_xattr_cmp(struct ext4_xattr_header *header1, struct ext4_xattr_header *header2) { struct ext4_xattr_entry *entry1, *entry2; entry1 = ENTRY(header1+1); entry2 = ENTRY(header2+1); while (!IS_LAST_ENTRY(entry1)) { if (IS_LAST_ENTRY(entry2)) return 1; if (entry1->e_hash != entry2->e_hash || entry1->e_name_index != entry2->e_name_index || entry1->e_name_len != entry2->e_name_len || entry1->e_value_size != entry2->e_value_size || entry1->e_value_inum != entry2->e_value_inum || memcmp(entry1->e_name, entry2->e_name, entry1->e_name_len)) return 1; if (!entry1->e_value_inum && memcmp((char *)header1 + le16_to_cpu(entry1->e_value_offs), (char *)header2 + le16_to_cpu(entry2->e_value_offs), le32_to_cpu(entry1->e_value_size))) return 1; entry1 = EXT4_XATTR_NEXT(entry1); entry2 = EXT4_XATTR_NEXT(entry2); } if (!IS_LAST_ENTRY(entry2)) return 1; return 0; } /* * ext4_xattr_block_cache_find() * * Find an identical extended attribute block. * * Returns a pointer to the block found, or NULL if such a block was * not found or an error occurred. */ static struct buffer_head * ext4_xattr_block_cache_find(struct inode *inode, struct ext4_xattr_header *header, struct mb_cache_entry **pce) { __u32 hash = le32_to_cpu(header->h_hash); struct mb_cache_entry *ce; struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode); if (!ea_block_cache) return NULL; if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); ce = mb_cache_entry_find_first(ea_block_cache, hash); while (ce) { struct buffer_head *bh; bh = ext4_sb_bread(inode->i_sb, ce->e_value, REQ_PRIO); if (IS_ERR(bh)) { if (PTR_ERR(bh) == -ENOMEM) return NULL; bh = NULL; EXT4_ERROR_INODE(inode, "block %lu read error", (unsigned long)ce->e_value); } else if (ext4_xattr_cmp(header, BHDR(bh)) == 0) { *pce = ce; return bh; } brelse(bh); ce = mb_cache_entry_find_next(ea_block_cache, ce); } return NULL; } #define NAME_HASH_SHIFT 5 #define VALUE_HASH_SHIFT 16 /* * ext4_xattr_hash_entry() * * Compute the hash of an extended attribute. */ static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value, size_t value_count) { __u32 hash = 0; while (name_len--) { hash = (hash << NAME_HASH_SHIFT) ^ (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^ *name++; } while (value_count--) { hash = (hash << VALUE_HASH_SHIFT) ^ (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^ le32_to_cpu(*value++); } return cpu_to_le32(hash); } #undef NAME_HASH_SHIFT #undef VALUE_HASH_SHIFT #define BLOCK_HASH_SHIFT 16 /* * ext4_xattr_rehash() * * Re-compute the extended attribute hash value after an entry has changed. */ static void ext4_xattr_rehash(struct ext4_xattr_header *header) { struct ext4_xattr_entry *here; __u32 hash = 0; here = ENTRY(header+1); while (!IS_LAST_ENTRY(here)) { if (!here->e_hash) { /* Block is not shared if an entry's hash value == 0 */ hash = 0; break; } hash = (hash << BLOCK_HASH_SHIFT) ^ (hash >> (8*sizeof(hash) - BLOCK_HASH_SHIFT)) ^ le32_to_cpu(here->e_hash); here = EXT4_XATTR_NEXT(here); } header->h_hash = cpu_to_le32(hash); } #undef BLOCK_HASH_SHIFT #define HASH_BUCKET_BITS 10 struct mb_cache * ext4_xattr_create_cache(void) { return mb_cache_create(HASH_BUCKET_BITS); } void ext4_xattr_destroy_cache(struct mb_cache *cache) { if (cache) mb_cache_destroy(cache); }
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 /* SPDX-License-Identifier: GPL-2.0 */ /* * Released under the GPLv2 only. */ #include <linux/pm.h> #include <linux/acpi.h> struct usb_hub_descriptor; struct usb_dev_state; /* Functions local to drivers/usb/core/ */ extern int usb_create_sysfs_dev_files(struct usb_device *dev); extern void usb_remove_sysfs_dev_files(struct usb_device *dev); extern void usb_create_sysfs_intf_files(struct usb_interface *intf); extern void usb_remove_sysfs_intf_files(struct usb_interface *intf); extern int usb_create_ep_devs(struct device *parent, struct usb_host_endpoint *endpoint, struct usb_device *udev); extern void usb_remove_ep_devs(struct usb_host_endpoint *endpoint); extern void usb_enable_endpoint(struct usb_device *dev, struct usb_host_endpoint *ep, bool reset_toggle); extern void usb_enable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_toggles); extern void usb_disable_endpoint(struct usb_device *dev, unsigned int epaddr, bool reset_hardware); extern void usb_disable_interface(struct usb_device *dev, struct usb_interface *intf, bool reset_hardware); extern void usb_release_interface_cache(struct kref *ref); extern void usb_disable_device(struct usb_device *dev, int skip_ep0); extern int usb_deauthorize_device(struct usb_device *); extern int usb_authorize_device(struct usb_device *); extern void usb_deauthorize_interface(struct usb_interface *); extern void usb_authorize_interface(struct usb_interface *); extern void usb_detect_quirks(struct usb_device *udev); extern void usb_detect_interface_quirks(struct usb_device *udev); extern void usb_release_quirk_list(void); extern bool usb_endpoint_is_ignored(struct usb_device *udev, struct usb_host_interface *intf, struct usb_endpoint_descriptor *epd); extern int usb_remove_device(struct usb_device *udev); extern int usb_get_device_descriptor(struct usb_device *dev, unsigned int size); extern int usb_set_isoch_delay(struct usb_device *dev); extern int usb_get_bos_descriptor(struct usb_device *dev); extern void usb_release_bos_descriptor(struct usb_device *dev); extern char *usb_cache_string(struct usb_device *udev, int index); extern int usb_set_configuration(struct usb_device *dev, int configuration); extern int usb_choose_configuration(struct usb_device *udev); extern int usb_generic_driver_probe(struct usb_device *udev); extern void usb_generic_driver_disconnect(struct usb_device *udev); extern int usb_generic_driver_suspend(struct usb_device *udev, pm_message_t msg); extern int usb_generic_driver_resume(struct usb_device *udev, pm_message_t msg); static inline unsigned usb_get_max_power(struct usb_device *udev, struct usb_host_config *c) { /* SuperSpeed power is in 8 mA units; others are in 2 mA units */ unsigned mul = (udev->speed >= USB_SPEED_SUPER ? 8 : 2); return c->desc.bMaxPower * mul; } extern void usb_kick_hub_wq(struct usb_device *dev); extern int usb_match_one_id_intf(struct usb_device *dev, struct usb_host_interface *intf, const struct usb_device_id *id); extern int usb_match_device(struct usb_device *dev, const struct usb_device_id *id); extern const struct usb_device_id *usb_device_match_id(struct usb_device *udev, const struct usb_device_id *id); extern bool usb_driver_applicable(struct usb_device *udev, struct usb_device_driver *udrv); extern void usb_forced_unbind_intf(struct usb_interface *intf); extern void usb_unbind_and_rebind_marked_interfaces(struct usb_device *udev); extern void usb_hub_release_all_ports(struct usb_device *hdev, struct usb_dev_state *owner); extern bool usb_device_is_owned(struct usb_device *udev); extern int usb_hub_init(void); extern void usb_hub_cleanup(void); extern int usb_major_init(void); extern void usb_major_cleanup(void); extern int usb_device_supports_lpm(struct usb_device *udev); extern int usb_port_disable(struct usb_device *udev); #ifdef CONFIG_PM extern int usb_suspend(struct device *dev, pm_message_t msg); extern int usb_resume(struct device *dev, pm_message_t msg); extern int usb_resume_complete(struct device *dev); extern int usb_port_suspend(struct usb_device *dev, pm_message_t msg); extern int usb_port_resume(struct usb_device *dev, pm_message_t msg); extern void usb_autosuspend_device(struct usb_device *udev); extern int usb_autoresume_device(struct usb_device *udev); extern int usb_remote_wakeup(struct usb_device *dev); extern int usb_runtime_suspend(struct device *dev); extern int usb_runtime_resume(struct device *dev); extern int usb_runtime_idle(struct device *dev); extern int usb_enable_usb2_hardware_lpm(struct usb_device *udev); extern int usb_disable_usb2_hardware_lpm(struct usb_device *udev); extern void usbfs_notify_suspend(struct usb_device *udev); extern void usbfs_notify_resume(struct usb_device *udev); #else static inline int usb_port_suspend(struct usb_device *udev, pm_message_t msg) { return 0; } static inline int usb_port_resume(struct usb_device *udev, pm_message_t msg) { return 0; } #define usb_autosuspend_device(udev) do {} while (0) static inline int usb_autoresume_device(struct usb_device *udev) { return 0; } static inline int usb_enable_usb2_hardware_lpm(struct usb_device *udev) { return 0; } static inline int usb_disable_usb2_hardware_lpm(struct usb_device *udev) { return 0; } #endif extern struct bus_type usb_bus_type; extern struct mutex usb_port_peer_mutex; extern struct device_type usb_device_type; extern struct device_type usb_if_device_type; extern struct device_type usb_ep_device_type; extern struct device_type usb_port_device_type; extern struct usb_device_driver usb_generic_driver; static inline int is_usb_device(const struct device *dev) { return dev->type == &usb_device_type; } static inline int is_usb_interface(const struct device *dev) { return dev->type == &usb_if_device_type; } static inline int is_usb_endpoint(const struct device *dev) { return dev->type == &usb_ep_device_type; } static inline int is_usb_port(const struct device *dev) { return dev->type == &usb_port_device_type; } static inline int is_root_hub(struct usb_device *udev) { return (udev->parent == NULL); } /* Do the same for device drivers and interface drivers. */ static inline int is_usb_device_driver(struct device_driver *drv) { return container_of(drv, struct usbdrv_wrap, driver)-> for_devices; } /* for labeling diagnostics */ extern const char *usbcore_name; /* sysfs stuff */ extern const struct attribute_group *usb_device_groups[]; extern const struct attribute_group *usb_interface_groups[]; /* usbfs stuff */ extern struct usb_driver usbfs_driver; extern const struct file_operations usbfs_devices_fops; extern const struct file_operations usbdev_file_operations; extern int usb_devio_init(void); extern void usb_devio_cleanup(void); /* * Firmware specific cookie identifying a port's location. '0' == no location * data available */ typedef u32 usb_port_location_t; /* internal notify stuff */ extern void usb_notify_add_device(struct usb_device *udev); extern void usb_notify_remove_device(struct usb_device *udev); extern void usb_notify_add_bus(struct usb_bus *ubus); extern void usb_notify_remove_bus(struct usb_bus *ubus); extern void usb_hub_adjust_deviceremovable(struct usb_device *hdev, struct usb_hub_descriptor *desc); #ifdef CONFIG_ACPI extern int usb_acpi_register(void); extern void usb_acpi_unregister(void); extern acpi_handle usb_get_hub_port_acpi_handle(struct usb_device *hdev, int port1); #else static inline int usb_acpi_register(void) { return 0; }; static inline void usb_acpi_unregister(void) { }; #endif
1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 // SPDX-License-Identifier: GPL-2.0 /* * Tag allocation using scalable bitmaps. Uses active queue tracking to support * fairer distribution of tags between multiple submitters when a shared tag map * is used. * * Copyright (C) 2013-2014 Jens Axboe */ #include <linux/kernel.h> #include <linux/module.h> #include <linux/blk-mq.h> #include <linux/delay.h> #include "blk.h" #include "blk-mq.h" #include "blk-mq-tag.h" /* * If a previously inactive queue goes active, bump the active user count. * We need to do this before try to allocate driver tag, then even if fail * to get tag when first time, the other shared-tag users could reserve * budget for it. */ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) { if (blk_mq_is_sbitmap_shared(hctx->flags)) { struct request_queue *q = hctx->queue; struct blk_mq_tag_set *set = q->tag_set; if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) && !test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) atomic_inc(&set->active_queues_shared_sbitmap); } else { if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) && !test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) atomic_inc(&hctx->tags->active_queues); } return true; } /* * Wakeup all potentially sleeping on tags */ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve) { sbitmap_queue_wake_all(tags->bitmap_tags); if (include_reserve) sbitmap_queue_wake_all(tags->breserved_tags); } /* * If a previously busy queue goes inactive, potential waiters could now * be allowed to queue. Wake them up and check. */ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx) { struct blk_mq_tags *tags = hctx->tags; struct request_queue *q = hctx->queue; struct blk_mq_tag_set *set = q->tag_set; if (blk_mq_is_sbitmap_shared(hctx->flags)) { if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags)) return; atomic_dec(&set->active_queues_shared_sbitmap); } else { if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state)) return; atomic_dec(&tags->active_queues); } blk_mq_tag_wakeup_all(tags, false); } static int __blk_mq_get_tag(struct blk_mq_alloc_data *data, struct sbitmap_queue *bt) { if (!data->q->elevator && !(data->flags & BLK_MQ_REQ_RESERVED) && !hctx_may_queue(data->hctx, bt)) return BLK_MQ_NO_TAG; if (data->shallow_depth) return __sbitmap_queue_get_shallow(bt, data->shallow_depth); else return __sbitmap_queue_get(bt); } unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data) { struct blk_mq_tags *tags = blk_mq_tags_from_data(data); struct sbitmap_queue *bt; struct sbq_wait_state *ws; DEFINE_SBQ_WAIT(wait); unsigned int tag_offset; int tag; if (data->flags & BLK_MQ_REQ_RESERVED) { if (unlikely(!tags->nr_reserved_tags)) { WARN_ON_ONCE(1); return BLK_MQ_NO_TAG; } bt = tags->breserved_tags; tag_offset = 0; } else { bt = tags->bitmap_tags; tag_offset = tags->nr_reserved_tags; } tag = __blk_mq_get_tag(data, bt); if (tag != BLK_MQ_NO_TAG) goto found_tag; if (data->flags & BLK_MQ_REQ_NOWAIT) return BLK_MQ_NO_TAG; ws = bt_wait_ptr(bt, data->hctx); do { struct sbitmap_queue *bt_prev; /* * We're out of tags on this hardware queue, kick any * pending IO submits before going to sleep waiting for * some to complete. */ blk_mq_run_hw_queue(data->hctx, false); /* * Retry tag allocation after running the hardware queue, * as running the queue may also have found completions. */ tag = __blk_mq_get_tag(data, bt); if (tag != BLK_MQ_NO_TAG) break; sbitmap_prepare_to_wait(bt, ws, &wait, TASK_UNINTERRUPTIBLE); tag = __blk_mq_get_tag(data, bt); if (tag != BLK_MQ_NO_TAG) break; bt_prev = bt; io_schedule(); sbitmap_finish_wait(bt, ws, &wait); data->ctx = blk_mq_get_ctx(data->q); data->hctx = blk_mq_map_queue(data->q, data->cmd_flags, data->ctx); tags = blk_mq_tags_from_data(data); if (data->flags & BLK_MQ_REQ_RESERVED) bt = tags->breserved_tags; else bt = tags->bitmap_tags; /* * If destination hw queue is changed, fake wake up on * previous queue for compensating the wake up miss, so * other allocations on previous queue won't be starved. */ if (bt != bt_prev) sbitmap_queue_wake_up(bt_prev); ws = bt_wait_ptr(bt, data->hctx); } while (1); sbitmap_finish_wait(bt, ws, &wait); found_tag: /* * Give up this allocation if the hctx is inactive. The caller will * retry on an active hctx. */ if (unlikely(test_bit(BLK_MQ_S_INACTIVE, &data->hctx->state))) { blk_mq_put_tag(tags, data->ctx, tag + tag_offset); return BLK_MQ_NO_TAG; } return tag + tag_offset; } void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx, unsigned int tag) { if (!blk_mq_tag_is_reserved(tags, tag)) { const int real_tag = tag - tags->nr_reserved_tags; BUG_ON(real_tag >= tags->nr_tags); sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu); } else { BUG_ON(tag >= tags->nr_reserved_tags); sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu); } } struct bt_iter_data { struct blk_mq_hw_ctx *hctx; busy_iter_fn *fn; void *data; bool reserved; }; static struct request *blk_mq_find_and_get_req(struct blk_mq_tags *tags, unsigned int bitnr) { struct request *rq; unsigned long flags; spin_lock_irqsave(&tags->lock, flags); rq = tags->rqs[bitnr]; if (!rq || rq->tag != bitnr || !refcount_inc_not_zero(&rq->ref)) rq = NULL; spin_unlock_irqrestore(&tags->lock, flags); return rq; } static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) { struct bt_iter_data *iter_data = data; struct blk_mq_hw_ctx *hctx = iter_data->hctx; struct blk_mq_tags *tags = hctx->tags; bool reserved = iter_data->reserved; struct request *rq; bool ret = true; if (!reserved) bitnr += tags->nr_reserved_tags; /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assigning ->rqs[]. */ rq = blk_mq_find_and_get_req(tags, bitnr); if (!rq) return true; if (rq->q == hctx->queue && rq->mq_hctx == hctx) ret = iter_data->fn(hctx, rq, iter_data->data, reserved); blk_mq_put_rq_ref(rq); return ret; } /** * bt_for_each - iterate over the requests associated with a hardware queue * @hctx: Hardware queue to examine. * @bt: sbitmap to examine. This is either the breserved_tags member * or the bitmap_tags member of struct blk_mq_tags. * @fn: Pointer to the function that will be called for each request * associated with @hctx that has been assigned a driver tag. * @fn will be called as follows: @fn(@hctx, rq, @data, @reserved) * where rq is a pointer to a request. Return true to continue * iterating tags, false to stop. * @data: Will be passed as third argument to @fn. * @reserved: Indicates whether @bt is the breserved_tags member or the * bitmap_tags member of struct blk_mq_tags. */ static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt, busy_iter_fn *fn, void *data, bool reserved) { struct bt_iter_data iter_data = { .hctx = hctx, .fn = fn, .data = data, .reserved = reserved, }; sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data); } struct bt_tags_iter_data { struct blk_mq_tags *tags; busy_tag_iter_fn *fn; void *data; unsigned int flags; }; #define BT_TAG_ITER_RESERVED (1 << 0) #define BT_TAG_ITER_STARTED (1 << 1) #define BT_TAG_ITER_STATIC_RQS (1 << 2) static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data) { struct bt_tags_iter_data *iter_data = data; struct blk_mq_tags *tags = iter_data->tags; bool reserved = iter_data->flags & BT_TAG_ITER_RESERVED; struct request *rq; bool ret = true; bool iter_static_rqs = !!(iter_data->flags & BT_TAG_ITER_STATIC_RQS); if (!reserved) bitnr += tags->nr_reserved_tags; /* * We can hit rq == NULL here, because the tagging functions * test and set the bit before assigning ->rqs[]. */ if (iter_static_rqs) rq = tags->static_rqs[bitnr]; else rq = blk_mq_find_and_get_req(tags, bitnr); if (!rq) return true; if (!(iter_data->flags & BT_TAG_ITER_STARTED) || blk_mq_request_started(rq)) ret = iter_data->fn(rq, iter_data->data, reserved); if (!iter_static_rqs) blk_mq_put_rq_ref(rq); return ret; } /** * bt_tags_for_each - iterate over the requests in a tag map * @tags: Tag map to iterate over. * @bt: sbitmap to examine. This is either the breserved_tags member * or the bitmap_tags member of struct blk_mq_tags. * @fn: Pointer to the function that will be called for each started * request. @fn will be called as follows: @fn(rq, @data, * @reserved) where rq is a pointer to a request. Return true * to continue iterating tags, false to stop. * @data: Will be passed as second argument to @fn. * @flags: BT_TAG_ITER_* */ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt, busy_tag_iter_fn *fn, void *data, unsigned int flags) { struct bt_tags_iter_data iter_data = { .tags = tags, .fn = fn, .data = data, .flags = flags, }; if (tags->rqs) sbitmap_for_each_set(&bt->sb, bt_tags_iter, &iter_data); } static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, void *priv, unsigned int flags) { WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED); if (tags->nr_reserved_tags) bt_tags_for_each(tags, tags->breserved_tags, fn, priv, flags | BT_TAG_ITER_RESERVED); bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags); } /** * blk_mq_all_tag_iter - iterate over all requests in a tag map * @tags: Tag map to iterate over. * @fn: Pointer to the function that will be called for each * request. @fn will be called as follows: @fn(rq, @priv, * reserved) where rq is a pointer to a request. 'reserved' * indicates whether or not @rq is a reserved request. Return * true to continue iterating tags, false to stop. * @priv: Will be passed as second argument to @fn. * * Caller has to pass the tag map from which requests are allocated. */ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn, void *priv) { __blk_mq_all_tag_iter(tags, fn, priv, BT_TAG_ITER_STATIC_RQS); } /** * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set * @tagset: Tag set to iterate over. * @fn: Pointer to the function that will be called for each started * request. @fn will be called as follows: @fn(rq, @priv, * reserved) where rq is a pointer to a request. 'reserved' * indicates whether or not @rq is a reserved request. Return * true to continue iterating tags, false to stop. * @priv: Will be passed as second argument to @fn. * * We grab one request reference before calling @fn and release it after * @fn returns. */ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset, busy_tag_iter_fn *fn, void *priv) { int i; for (i = 0; i < tagset->nr_hw_queues; i++) { if (tagset->tags && tagset->tags[i]) __blk_mq_all_tag_iter(tagset->tags[i], fn, priv, BT_TAG_ITER_STARTED); } } EXPORT_SYMBOL(blk_mq_tagset_busy_iter); static bool blk_mq_tagset_count_completed_rqs(struct request *rq, void *data, bool reserved) { unsigned *count = data; if (blk_mq_request_completed(rq)) (*count)++; return true; } /** * blk_mq_tagset_wait_completed_request - wait until all completed req's * complete funtion is run * @tagset: Tag set to drain completed request * * Note: This function has to be run after all IO queues are shutdown */ void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset) { while (true) { unsigned count = 0; blk_mq_tagset_busy_iter(tagset, blk_mq_tagset_count_completed_rqs, &count); if (!count) break; msleep(5); } } EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request); /** * blk_mq_queue_tag_busy_iter - iterate over all requests with a driver tag * @q: Request queue to examine. * @fn: Pointer to the function that will be called for each request * on @q. @fn will be called as follows: @fn(hctx, rq, @priv, * reserved) where rq is a pointer to a request and hctx points * to the hardware queue associated with the request. 'reserved' * indicates whether or not @rq is a reserved request. * @priv: Will be passed as third argument to @fn. * * Note: if @q->tag_set is shared with other request queues then @fn will be * called for all requests on all queues that share that tag set and not only * for requests associated with @q. */ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn, void *priv) { struct blk_mq_hw_ctx *hctx; int i; /* * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx * while the queue is frozen. So we can use q_usage_counter to avoid * racing with it. */ if (!percpu_ref_tryget(&q->q_usage_counter)) return; queue_for_each_hw_ctx(q, hctx, i) { struct blk_mq_tags *tags = hctx->tags; /* * If no software queues are currently mapped to this * hardware queue, there's nothing to check */ if (!blk_mq_hw_queue_mapped(hctx)) continue; if (tags->nr_reserved_tags) bt_for_each(hctx, tags->breserved_tags, fn, priv, true); bt_for_each(hctx, tags->bitmap_tags, fn, priv, false); } blk_queue_exit(q); } static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth, bool round_robin, int node) { return sbitmap_queue_init_node(bt, depth, -1, round_robin, GFP_KERNEL, node); } static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags, int node, int alloc_policy) { unsigned int depth = tags->nr_tags - tags->nr_reserved_tags; bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR; if (bt_alloc(&tags->__bitmap_tags, depth, round_robin, node)) return -ENOMEM; if (bt_alloc(&tags->__breserved_tags, tags->nr_reserved_tags, round_robin, node)) goto free_bitmap_tags; tags->bitmap_tags = &tags->__bitmap_tags; tags->breserved_tags = &tags->__breserved_tags; return 0; free_bitmap_tags: sbitmap_queue_free(&tags->__bitmap_tags); return -ENOMEM; } int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int flags) { unsigned int depth = set->queue_depth - set->reserved_tags; int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags); bool round_robin = alloc_policy == BLK_TAG_ALLOC_RR; int i, node = set->numa_node; if (bt_alloc(&set->__bitmap_tags, depth, round_robin, node)) return -ENOMEM; if (bt_alloc(&set->__breserved_tags, set->reserved_tags, round_robin, node)) goto free_bitmap_tags; for (i = 0; i < set->nr_hw_queues; i++) { struct blk_mq_tags *tags = set->tags[i]; tags->bitmap_tags = &set->__bitmap_tags; tags->breserved_tags = &set->__breserved_tags; } return 0; free_bitmap_tags: sbitmap_queue_free(&set->__bitmap_tags); return -ENOMEM; } void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set) { sbitmap_queue_free(&set->__bitmap_tags); sbitmap_queue_free(&set->__breserved_tags); } struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags, unsigned int reserved_tags, int node, unsigned int flags) { int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags); struct blk_mq_tags *tags; if (total_tags > BLK_MQ_TAG_MAX) { pr_err("blk-mq: tag depth too large\n"); return NULL; } tags = kzalloc_node(sizeof(*tags), GFP_KERNEL, node); if (!tags) return NULL; tags->nr_tags = total_tags; tags->nr_reserved_tags = reserved_tags; spin_lock_init(&tags->lock); if (flags & BLK_MQ_F_TAG_HCTX_SHARED) return tags; if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) { kfree(tags); return NULL; } return tags; } void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags) { if (!(flags & BLK_MQ_F_TAG_HCTX_SHARED)) { sbitmap_queue_free(tags->bitmap_tags); sbitmap_queue_free(tags->breserved_tags); } kfree(tags); } int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags **tagsptr, unsigned int tdepth, bool can_grow) { struct blk_mq_tags *tags = *tagsptr; if (tdepth <= tags->nr_reserved_tags) return -EINVAL; /* * If we are allowed to grow beyond the original size, allocate * a new set of tags before freeing the old one. */ if (tdepth > tags->nr_tags) { struct blk_mq_tag_set *set = hctx->queue->tag_set; /* Only sched tags can grow, so clear HCTX_SHARED flag */ unsigned int flags = set->flags & ~BLK_MQ_F_TAG_HCTX_SHARED; struct blk_mq_tags *new; bool ret; if (!can_grow) return -EINVAL; /* * We need some sort of upper limit, set it high enough that * no valid use cases should require more. */ if (tdepth > 16 * BLKDEV_MAX_RQ) return -EINVAL; new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, tags->nr_reserved_tags, flags); if (!new) return -ENOMEM; ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth); if (ret) { blk_mq_free_rq_map(new, flags); return -ENOMEM; } blk_mq_free_rqs(set, *tagsptr, hctx->queue_num); blk_mq_free_rq_map(*tagsptr, flags); *tagsptr = new; } else { /* * Don't need (or can't) update reserved tags here, they * remain static and should never need resizing. */ sbitmap_queue_resize(tags->bitmap_tags, tdepth - tags->nr_reserved_tags); } return 0; } void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size) { sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags); } /** * blk_mq_unique_tag() - return a tag that is unique queue-wide * @rq: request for which to compute a unique tag * * The tag field in struct request is unique per hardware queue but not over * all hardware queues. Hence this function that returns a tag with the * hardware context index in the upper bits and the per hardware queue tag in * the lower bits. * * Note: When called for a request that is queued on a non-multiqueue request * queue, the hardware context index is set to zero. */ u32 blk_mq_unique_tag(struct request *rq) { return (rq->mq_hctx->queue_num << BLK_MQ_UNIQUE_TAG_BITS) | (rq->tag & BLK_MQ_UNIQUE_TAG_MASK); } EXPORT_SYMBOL(blk_mq_unique_tag);
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef LINUX_MLD_H #define LINUX_MLD_H #include <linux/in6.h> #include <linux/icmpv6.h> /* MLDv1 Query/Report/Done */ struct mld_msg { struct icmp6hdr mld_hdr; struct in6_addr mld_mca; }; #define mld_type mld_hdr.icmp6_type #define mld_code mld_hdr.icmp6_code #define mld_cksum mld_hdr.icmp6_cksum #define mld_maxdelay mld_hdr.icmp6_maxdelay #define mld_reserved mld_hdr.icmp6_dataun.un_data16[1] /* Multicast Listener Discovery version 2 headers */ /* MLDv2 Report */ struct mld2_grec { __u8 grec_type; __u8 grec_auxwords; __be16 grec_nsrcs; struct in6_addr grec_mca; struct in6_addr grec_src[]; }; struct mld2_report { struct icmp6hdr mld2r_hdr; struct mld2_grec mld2r_grec[]; }; #define mld2r_type mld2r_hdr.icmp6_type #define mld2r_resv1 mld2r_hdr.icmp6_code #define mld2r_cksum mld2r_hdr.icmp6_cksum #define mld2r_resv2 mld2r_hdr.icmp6_dataun.un_data16[0] #define mld2r_ngrec mld2r_hdr.icmp6_dataun.un_data16[1] /* MLDv2 Query */ struct mld2_query { struct icmp6hdr mld2q_hdr; struct in6_addr mld2q_mca; #if defined(__LITTLE_ENDIAN_BITFIELD) __u8 mld2q_qrv:3, mld2q_suppress:1, mld2q_resv2:4; #elif defined(__BIG_ENDIAN_BITFIELD) __u8 mld2q_resv2:4, mld2q_suppress:1, mld2q_qrv:3; #else #error "Please fix <asm/byteorder.h>" #endif __u8 mld2q_qqic; __be16 mld2q_nsrcs; struct in6_addr mld2q_srcs[]; }; #define mld2q_type mld2q_hdr.icmp6_type #define mld2q_code mld2q_hdr.icmp6_code #define mld2q_cksum mld2q_hdr.icmp6_cksum #define mld2q_mrc mld2q_hdr.icmp6_maxdelay #define mld2q_resv1 mld2q_hdr.icmp6_dataun.un_data16[1] /* RFC3810, 5.1.3. Maximum Response Code: * * If Maximum Response Code >= 32768, Maximum Response Code represents a * floating-point value as follows: * * 0 1 2 3 4 5 6 7 8 9 A B C D E F * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * |1| exp | mant | * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ #define MLDV2_MRC_EXP(value) (((value) >> 12) & 0x0007) #define MLDV2_MRC_MAN(value) ((value) & 0x0fff) /* RFC3810, 5.1.9. QQIC (Querier's Query Interval Code): * * If QQIC >= 128, QQIC represents a floating-point value as follows: * * 0 1 2 3 4 5 6 7 * +-+-+-+-+-+-+-+-+ * |1| exp | mant | * +-+-+-+-+-+-+-+-+ */ #define MLDV2_QQIC_EXP(value) (((value) >> 4) & 0x07) #define MLDV2_QQIC_MAN(value) ((value) & 0x0f) #define MLD_EXP_MIN_LIMIT 32768UL #define MLDV1_MRD_MAX_COMPAT (MLD_EXP_MIN_LIMIT - 1) static inline unsigned long mldv2_mrc(const struct mld2_query *mlh2) { /* RFC3810, 5.1.3. Maximum Response Code */ unsigned long ret, mc_mrc = ntohs(mlh2->mld2q_mrc); if (mc_mrc < MLD_EXP_MIN_LIMIT) { ret = mc_mrc; } else { unsigned long mc_man, mc_exp; mc_exp = MLDV2_MRC_EXP(mc_mrc); mc_man = MLDV2_MRC_MAN(mc_mrc); ret = (mc_man | 0x1000) << (mc_exp + 3); } return ret; } #endif
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _ASM_X86_COMPAT_H #define _ASM_X86_COMPAT_H /* * Architecture specific compatibility types */ #include <linux/types.h> #include <linux/sched.h> #include <linux/sched/task_stack.h> #include <asm/processor.h> #include <asm/user32.h> #include <asm/unistd.h> #include <asm-generic/compat.h> #define COMPAT_USER_HZ 100 #define COMPAT_UTS_MACHINE "i686\0\0" typedef u16 __compat_uid_t; typedef u16 __compat_gid_t; typedef u32 __compat_uid32_t; typedef u32 __compat_gid32_t; typedef u16 compat_mode_t; typedef u16 compat_dev_t; typedef u16 compat_nlink_t; typedef u16 compat_ipc_pid_t; typedef u32 compat_caddr_t; typedef __kernel_fsid_t compat_fsid_t; struct compat_stat { compat_dev_t st_dev; u16 __pad1; compat_ino_t st_ino; compat_mode_t st_mode; compat_nlink_t st_nlink; __compat_uid_t st_uid; __compat_gid_t st_gid; compat_dev_t st_rdev; u16 __pad2; u32 st_size; u32 st_blksize; u32 st_blocks; u32 st_atime; u32 st_atime_nsec; u32 st_mtime; u32 st_mtime_nsec; u32 st_ctime; u32 st_ctime_nsec; u32 __unused4; u32 __unused5; }; struct compat_flock { short l_type; short l_whence; compat_off_t l_start; compat_off_t l_len; compat_pid_t l_pid; }; #define F_GETLK64 12 /* using 'struct flock64' */ #define F_SETLK64 13 #define F_SETLKW64 14 /* * IA32 uses 4 byte alignment for 64 bit quantities, * so we need to pack this structure. */ struct compat_flock64 { short l_type; short l_whence; compat_loff_t l_start; compat_loff_t l_len; compat_pid_t l_pid; } __attribute__((packed)); struct compat_statfs { int f_type; int f_bsize; int f_blocks; int f_bfree; int f_bavail; int f_files; int f_ffree; compat_fsid_t f_fsid; int f_namelen; /* SunOS ignores this field. */ int f_frsize; int f_flags; int f_spare[4]; }; #define COMPAT_RLIM_INFINITY 0xffffffff typedef u32 compat_old_sigset_t; /* at least 32 bits */ #define _COMPAT_NSIG 64 #define _COMPAT_NSIG_BPW 32 typedef u32 compat_sigset_word; #define COMPAT_OFF_T_MAX 0x7fffffff struct compat_ipc64_perm { compat_key_t key; __compat_uid32_t uid; __compat_gid32_t gid; __compat_uid32_t cuid; __compat_gid32_t cgid; unsigned short mode; unsigned short __pad1; unsigned short seq; unsigned short __pad2; compat_ulong_t unused1; compat_ulong_t unused2; }; struct compat_semid64_ds { struct compat_ipc64_perm sem_perm; compat_ulong_t sem_otime; compat_ulong_t sem_otime_high; compat_ulong_t sem_ctime; compat_ulong_t sem_ctime_high; compat_ulong_t sem_nsems; compat_ulong_t __unused3; compat_ulong_t __unused4; }; struct compat_msqid64_ds { struct compat_ipc64_perm msg_perm; compat_ulong_t msg_stime; compat_ulong_t msg_stime_high; compat_ulong_t msg_rtime; compat_ulong_t msg_rtime_high; compat_ulong_t msg_ctime; compat_ulong_t msg_ctime_high; compat_ulong_t msg_cbytes; compat_ulong_t msg_qnum; compat_ulong_t msg_qbytes; compat_pid_t msg_lspid; compat_pid_t msg_lrpid; compat_ulong_t __unused4; compat_ulong_t __unused5; }; struct compat_shmid64_ds { struct compat_ipc64_perm shm_perm; compat_size_t shm_segsz; compat_ulong_t shm_atime; compat_ulong_t shm_atime_high; compat_ulong_t shm_dtime; compat_ulong_t shm_dtime_high; compat_ulong_t shm_ctime; compat_ulong_t shm_ctime_high; compat_pid_t shm_cpid; compat_pid_t shm_lpid; compat_ulong_t shm_nattch; compat_ulong_t __unused4; compat_ulong_t __unused5; }; /* * The type of struct elf_prstatus.pr_reg in compatible core dumps. */ typedef struct user_regs_struct compat_elf_gregset_t; /* Full regset -- prstatus on x32, otherwise on ia32 */ #define PRSTATUS_SIZE(S, R) (R != sizeof(S.pr_reg) ? 144 : 296) #define SET_PR_FPVALID(S, V, R) \ do { *(int *) (((void *) &((S)->pr_reg)) + R) = (V); } \ while (0) #ifdef CONFIG_X86_X32_ABI #define COMPAT_USE_64BIT_TIME \ (!!(task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT)) #endif static inline void __user *arch_compat_alloc_user_space(long len) { compat_uptr_t sp; if (test_thread_flag(TIF_IA32)) { sp = task_pt_regs(current)->sp; } else { /* -128 for the x32 ABI redzone */ sp = task_pt_regs(current)->sp - 128; } return (void __user *)round_down(sp - len, 16); } static inline bool in_x32_syscall(void) { #ifdef CONFIG_X86_X32_ABI if (task_pt_regs(current)->orig_ax & __X32_SYSCALL_BIT) return true; #endif return false; } static inline bool in_32bit_syscall(void) { return in_ia32_syscall() || in_x32_syscall(); } #ifdef CONFIG_COMPAT static inline bool in_compat_syscall(void) { return in_32bit_syscall(); } #define in_compat_syscall in_compat_syscall /* override the generic impl */ #define compat_need_64bit_alignment_fixup in_ia32_syscall #endif struct compat_siginfo; #ifdef CONFIG_X86_X32_ABI int copy_siginfo_to_user32(struct compat_siginfo __user *to, const kernel_siginfo_t *from); #define copy_siginfo_to_user32 copy_siginfo_to_user32 #endif /* CONFIG_X86_X32_ABI */ #endif /* _ASM_X86_COMPAT_H */
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 67