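InfiniBand on FreeBSD
I saw that iSER had become usable with iSCSI,
so I got motivated and put together an InfiniBand environment on FreeBSD.
I built the environment using this as a reference.
Confirmed that the MHZH29-XTC works on FreeBSD 10.0-BETA1.
A ConnectX-3 exposed through SR-IOV on KVM failed to be recognized.
The 10.x series reportedly carries a driver equivalent to the one in Linux 3.7, and I have confirmed that cards up to ConnectX-3 work.
Getting InfiniBand running on FreeBSD requires rebuilding both the userland and the kernel.
The driver was updated in 10.1,
with performance improvements among other changes.
It still does not work in a KVM guest that uses SR-IOV.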
Cards confirmed to work (InfiniBand + 10GbE)
Card | OS Version | Notes |
---|---|---|
InfiniHost III MHEA28-1TC | FreeBSD 10.0 BETA3 | ports 1 and 2 IB working |
InfiniHost III MHEA28-XTC | FreeBSD 10.0 BETA3 | ports 1 and 2 IB working |
InfiniHost LX MHES18-XTC | FreeBSD 10.0 BETA3 | port 1 IB working |
InfiniHost LX MHES18-XT | FreeBSD 10.0 BETA3 | port 1 IB working |
ConnectX MHRH19-XTC | FreeBSD 10.0 BETA3 | port 1 IB working |
ConnectX MHQH29-XTC | FreeBSD 10.0 BETA3 | ports 1 and 2 IB working |
ConnectX-2 MHQH19B-XTR | FreeBSD 10.0 BETA3 | port 1 IB working |
ConnectX-2 MHQH29B-XTR | FreeBSD 10.0 BETA3 | port 1 IB, port 2 10GbE working |
ConnectX-2 MHZH29-XTC | FreeBSD 10.0 BETA3 | port 1 IB, port 2 10GbE working |
ConnectX-3 MCX353A-FCBT | FreeBSD 10.0 BETA3 | port 1 IB working |
ConnectX-3 MCX354A-FCBT | FreeBSD 10.0 BETA3 | Not working under KVM SR-IOV. Ports 1 and 2 IB working. Ethernet not tested. |
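How to build an environment for InfiniBand
The stock GENERIC kernel does not include the InfiniBand drivers.
The userland does not include the InfiniBand commands either.
So the first step is to rebuild the userland and the kernel with InfiniBand support.
Obtaining the source is not covered here.
Rebuilding the kernel
After extracting the full system source to /usr/src, run (from /usr/src)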
cp sys/amd64/conf/GENERIC sys/amd64/conf/IB
to create "IB", a kernel config file for InfiniBand.
To use InfiniBand, change and add the following in the "IB" file:
ident IB-TEST # renamed from GENERIC to something recognizable
options OFED # Infiniband protocol stack and support
options SDP # Sockets Direct Protocol for infiniband
options IPOIB_CM # Use connect mode ipoib
device ipoib # IP over IB devices
device mthca # InfiniHostIII
device mlx4ib # ConnectX Infiniband support
device mlxen # ConnectX Ethernet support
The settings above enable:
- the InfiniHost driver
- the ConnectX InfiniBand driver
- the ConnectX Ethernet driver
- the IPoIB driver
- IPoIB connected mode (CM)
- SDP
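As an alternative to editing a full copy of GENERIC, the same options can be layered on top of it with the include directive that kernel config files accept. The following is only a sketch: the function name write_ib_config is made up for illustration, and the option and device names are simply the ones listed above.

```shell
#!/bin/sh
# Write a small kernel config that pulls in GENERIC and adds the
# InfiniBand options on top of it.
# write_ib_config is a hypothetical helper name, not part of FreeBSD.
# Usage: write_ib_config /usr/src/sys/amd64/conf/IB
write_ib_config() {
    dest=$1
    cat > "$dest" <<'EOF'
# IB: GENERIC plus the OFED InfiniBand stack
include GENERIC
ident   IB-TEST
options OFED        # Infiniband protocol stack and support
options SDP         # Sockets Direct Protocol for infiniband
options IPOIB_CM    # Use connect mode ipoib
device  ipoib       # IP over IB devices
device  mthca       # InfiniHostIII
device  mlx4ib      # ConnectX Infiniband support
device  mlxen       # ConnectX Ethernet support
EOF
}
```

With this layout the IB file stays short, and later changes to GENERIC are picked up without re-copying it.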
Rebuilding the userland
For the userland, create /etc/src.conf and add the following line:
WITH_OFED="YES"
Then run the following in /usr/src to build and install the kernel and the userland:
make buildworld installworld kernel KERNCONF=IB
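The single make invocation above can also be split into stages, which makes it easier to see where a build fails. The sketch below follows what I understand to be the conventional order (install the new kernel, reboot, then install the world). The helper names run, stage_build and stage_install are hypothetical, and by default the script only prints the commands.

```shell
#!/bin/sh
# Staged rebuild sketch. With DRY_RUN=1 (the default here) each command is
# only printed; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stage 1: build everything and install only the new kernel.
stage_build() {
    kernconf=${1:-IB}
    run make -C /usr/src buildworld || return 1
    run make -C /usr/src buildkernel KERNCONF="$kernconf" || return 1
    run make -C /usr/src installkernel KERNCONF="$kernconf" || return 1
}

# Stage 2: install the userland, normally after rebooting into the new kernel.
stage_install() {
    run make -C /usr/src installworld || return 1
}
```

Running `stage_build IB` prints the three stage-1 commands. Once they look right, `DRY_RUN=0 stage_build IB` executes them, and `DRY_RUN=0 stage_install` follows after the reboot.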
After the install completes and the machine is rebooted, the IB card is recognized.
- With mthca
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xe8000000-0xe87fffff,0xe0000000-0xe7ffffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
- With mlx4en/mlx4ib
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
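A quick way to check which of these drivers announced themselves is to filter the kernel messages for the banner prefixes shown in the logs above. This is a minimal sketch: ib_drivers_seen is a hypothetical helper name, and it assumes the banners keep the `name:` prefix seen here.

```shell
#!/bin/sh
# Read kernel messages on stdin and print each InfiniBand-related driver
# that printed a banner line, one name per line, without duplicates.
# ib_drivers_seen is a hypothetical helper name.
ib_drivers_seen() {
    awk -F: '/^(ib_mthca|mlx4_core|mlx4_en|mlx4_ib):/ { print $1 }' | sort -u
}
```

Typical use would be `dmesg | ib_drivers_seen`. Empty output suggests the kernel was built without the drivers or the card was not detected.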
Tips
Setting node_desc
The node description can be set with:
sysctl sys.class.infiniband.mlx4_0.node_desc=freebsd-ib
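To keep the description across reboots, the same assignment could go into /etc/sysctl.conf. The sketch below appends the line only if it is not already there. It assumes the OID already exists when sysctl.conf is processed at boot, which I have not verified, and persist_node_desc is a hypothetical helper name.

```shell
#!/bin/sh
# Append a node_desc assignment to a sysctl.conf-style file unless the exact
# line is already present.
# Usage: persist_node_desc DEVICE DESCRIPTION [FILE]
# persist_node_desc is a hypothetical helper; FILE defaults to /etc/sysctl.conf.
persist_node_desc() {
    dev=$1
    desc=$2
    conf=${3:-/etc/sysctl.conf}
    line="sys.class.infiniband.${dev}.node_desc=${desc}"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

For example, `persist_node_desc mlx4_0 freebsd-ib`. Changing the description later appends a second line rather than editing the first, so old entries need to be removed by hand.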
Changing the port type
For example, switching a port from ib to eth. Requires ConnectX-2 or later.
Kernel logs and quick iperf benchmarks for the cards I have
The iperf numbers were taken before any network tuning, so treat them as rough reference values only.
MHEA28-1TC FreeBSD 10.0 mthca + ipoib_cm driver
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xe8000000-0xe87fffff,0xe0000000-0xe7ffffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1
ib1: Attached to mthca0 port 2

# iperf -c 192.168.17.12 -i 1 -w 128K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 128 KByte)
------------------------------------------------------------
[ 3] local 192.168.17.21 port 60800 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 222 MBytes 1.86 Gbits/sec
[ 3] 1.0- 2.0 sec 278 MBytes 2.33 Gbits/sec
[ 3] 2.0- 3.0 sec 280 MBytes 2.35 Gbits/sec
[ 3] 3.0- 4.0 sec 279 MBytes 2.34 Gbits/sec
[ 3] 4.0- 5.0 sec 276 MBytes 2.31 Gbits/sec
[ 3] 5.0- 6.0 sec 271 MBytes 2.28 Gbits/sec
[ 3] 6.0- 7.0 sec 277 MBytes 2.32 Gbits/sec
[ 3] 7.0- 8.0 sec 275 MBytes 2.31 Gbits/sec
[ 3] 8.0- 9.0 sec 280 MBytes 2.35 Gbits/sec
[ 3] 9.0-10.0 sec 279 MBytes 2.34 Gbits/sec
[ 3] 0.0-10.0 sec 2.65 GBytes 2.28 Gbits/sec

# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 648 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.17.21 port 60804 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 71.4 MBytes 599 Mbits/sec
[ 3] 1.0- 2.0 sec 182 MBytes 1.53 Gbits/sec
[ 3] 2.0- 3.0 sec 268 MBytes 2.24 Gbits/sec
[ 3] 3.0- 4.0 sec 283 MBytes 2.38 Gbits/sec
[ 3] 4.0- 5.0 sec 265 MBytes 2.23 Gbits/sec
[ 3] 5.0- 6.0 sec 267 MBytes 2.24 Gbits/sec
[ 3] 6.0- 7.0 sec 289 MBytes 2.42 Gbits/sec
[ 3] 7.0- 8.0 sec 293 MBytes 2.46 Gbits/sec
[ 3] 8.0- 9.0 sec 277 MBytes 2.32 Gbits/sec
[ 3] 9.0-10.0 sec 291 MBytes 2.44 Gbits/sec
[ 3] 0.0-10.0 sec 2.43 GBytes 2.08 Gbits/sec
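When comparing several runs like these, only the final summary line usually matters. A minimal sketch for pulling it out of saved iperf output follows. It assumes the report ends with the whole-run line, as in the logs in this article, and iperf_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Read iperf client output on stdin and print the bandwidth from the last
# report line (the whole-run summary), e.g. "2.28 Gbits/sec".
# iperf_summary is a hypothetical helper name.
iperf_summary() {
    awk '/bits\/sec/ { last = $(NF-1) " " $NF }
         END { if (last != "") print last }'
}
```

For example, `iperf -c 192.168.17.12 -i 1 | iperf_summary` prints just the overall figure for that run.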
MHEA28-XTC FreeBSD 10.0 mthca + ipoib_cm driver
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1
ib1: Attached to mthca0 port 2
01:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)
Fast host
# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 512 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 3] local 192.168.17.55 port 52472 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 286 MBytes 2.40 Gbits/sec
[ 3] 1.0- 2.0 sec 368 MBytes 3.08 Gbits/sec
[ 3] 2.0- 3.0 sec 367 MBytes 3.08 Gbits/sec
[ 3] 3.0- 4.0 sec 367 MBytes 3.08 Gbits/sec
[ 3] 4.0- 5.0 sec 368 MBytes 3.08 Gbits/sec
[ 3] 5.0- 6.0 sec 366 MBytes 3.07 Gbits/sec
[ 3] 6.0- 7.0 sec 367 MBytes 3.08 Gbits/sec
[ 3] 7.0- 8.0 sec 367 MBytes 3.08 Gbits/sec
[ 3] 8.0- 9.0 sec 367 MBytes 3.08 Gbits/sec
[ 3] 9.0-10.0 sec 366 MBytes 3.07 Gbits/sec
[ 3] 0.0-10.0 sec 3.50 GBytes 3.01 Gbits/sec
Slow host
# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 648 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.17.61 port 46974 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 77.8 MBytes 652 Mbits/sec
[ 3] 1.0- 2.0 sec 298 MBytes 2.50 Gbits/sec
[ 3] 2.0- 3.0 sec 299 MBytes 2.51 Gbits/sec
[ 3] 3.0- 4.0 sec 299 MBytes 2.51 Gbits/sec
[ 3] 4.0- 5.0 sec 298 MBytes 2.50 Gbits/sec
[ 3] 5.0- 6.0 sec 207 MBytes 1.74 Gbits/sec
[ 3] 6.0- 7.0 sec 297 MBytes 2.49 Gbits/sec
[ 3] 7.0- 8.0 sec 297 MBytes 2.49 Gbits/sec
[ 3] 8.0- 9.0 sec 298 MBytes 2.50 Gbits/sec
[ 3] 9.0-10.0 sec 297 MBytes 2.49 Gbits/sec
[ 3] 0.0-10.0 sec 2.60 GBytes 2.24 Gbits/sec
MHES18-XTC FreeBSD 10.0 mthca + ipoib_cm driver
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1
01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)
Fast host (ConnectX-2)
# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 512 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 3] local 192.168.17.55 port 52474 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 276 MBytes 2.31 Gbits/sec
[ 3] 1.0- 2.0 sec 354 MBytes 2.97 Gbits/sec
[ 3] 2.0- 3.0 sec 353 MBytes 2.96 Gbits/sec
[ 3] 3.0- 4.0 sec 344 MBytes 2.88 Gbits/sec
[ 3] 4.0- 5.0 sec 351 MBytes 2.94 Gbits/sec
[ 3] 5.0- 6.0 sec 350 MBytes 2.93 Gbits/sec
[ 3] 6.0- 7.0 sec 344 MBytes 2.88 Gbits/sec
[ 3] 7.0- 8.0 sec 345 MBytes 2.89 Gbits/sec
[ 3] 8.0- 9.0 sec 343 MBytes 2.88 Gbits/sec
[ 3] 9.0-10.0 sec 344 MBytes 2.89 Gbits/sec
[ 3] 0.0-10.0 sec 3.32 GBytes 2.85 Gbits/sec
Slow host (ConnectX-3)
# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 512 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 3] local 192.168.17.61 port 47123 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 108 MBytes 908 Mbits/sec
[ 3] 1.0- 2.0 sec 285 MBytes 2.39 Gbits/sec
[ 3] 2.0- 3.0 sec 287 MBytes 2.41 Gbits/sec
[ 3] 3.0- 4.0 sec 288 MBytes 2.41 Gbits/sec
[ 3] 4.0- 5.0 sec 287 MBytes 2.40 Gbits/sec
[ 3] 5.0- 6.0 sec 282 MBytes 2.37 Gbits/sec
[ 3] 6.0- 7.0 sec 281 MBytes 2.36 Gbits/sec
[ 3] 7.0- 8.0 sec 281 MBytes 2.36 Gbits/sec
[ 3] 8.0- 9.0 sec 282 MBytes 2.36 Gbits/sec
[ 3] 9.0-10.0 sec 282 MBytes 2.36 Gbits/sec
[ 3] 0.0-10.0 sec 2.60 GBytes 2.23 Gbits/sec
MHES18-XT FreeBSD 10.0 mthca + ipoib_cm driver
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1

# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size: 648 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.17.55 port 52466 connected with 192.168.17.12 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 92.1 MBytes 773 Mbits/sec
[ 3] 1.0- 2.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 2.0- 3.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 3.0- 4.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 4.0- 5.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 5.0- 6.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 6.0- 7.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 7.0- 8.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 8.0- 9.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 9.0-10.0 sec 348 MBytes 2.92 Gbits/sec
[ 3] 0.0-10.0 sec 3.15 GBytes 2.70 Gbits/sec
MHQH29-XTC log
By default, both ports are recognized as IB ports. UDP RSS is disabled (it is only enabled on ConnectX-2 or later).
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: UDP RSS is not supported on this device.
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
ib1: Attached to mlx4_0 port 2
Setting the ports to eth in sysctl.conf
sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port2=eth
Log with eth specified in sysctl.conf
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: UDP RSS is not supported on this device.
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
ib1: Attached to mlx4_0 port 2
mlx4_core0: Only same port types supported on this HCA, aborting.
"qpn 0x48: invalid attribute mask specified " "for transition 0 to 6. qp_type 4," " attr_mask 0x1\134n"<4>ib0: Failed to modify QP to ERROR state
"qpn 0x49: invalid attribute mask specified " "for transition 0 to 6. qp_type 4," " attr_mask 0x1\134n"<4>ib1: Failed to modify QP to ERROR state
mlx4_en mlx4_core0: UDP RSS is not supported on this device.
mlx4_en mlx4_core0: Using 5 tx rings for port:1
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:1
mlx4_en mlx4_core0: Using 5 tx rings for port:2
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:2
mlx4_en mlx4_core0: Activating port:1
mlx4_en: mlx4_core0: Port 1: Port: 1, invalid mac burned: 0x0, quiting
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
Something looks like it failed, so
check with ifconfig
root@freebsd-ib:~ # ifconfig -a
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
    ether 00:25:90:54:5d:5e
    inet 192.168.11.12 netmask 0xffffff00 broadcast 192.168.11.255
    inet6 fe80::225:90ff:fe54:5d5e%em0 prefixlen 64 scopeid 0x1
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
em1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
    ether 00:25:90:54:5d:5f
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
pflog0: flags=0<> metric 0 mtu 33160
pfsync0: flags=0<> metric 0 mtu 1500
    syncpeer: 0.0.0.0 maxupd: 128 defer: off
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
    inet 127.0.0.1 netmask 0xff000000
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
orz
Neither an mlxen nor an ib interface shows up, so this needs more investigation.
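Scanning a long ifconfig listing by eye gets tedious when retrying different settings. Here is a small sketch that lists interface names and tests for an mlxen interface; the mlxen name is taken from the MHZH29-XTC log in this article. list_ifaces and has_mlxen are hypothetical helper names.

```shell
#!/bin/sh
# Read `ifconfig -a` output on stdin and print one interface name per line.
# Only header lines ("name: flags=...") start in the first column.
list_ifaces() {
    awk '/^[a-z][a-z0-9_]*: flags=/ { sub(/:$/, "", $1); print $1 }'
}

# Succeed (exit 0) if the ifconfig output on stdin has an mlxen interface.
has_mlxen() {
    list_ifaces | grep -q '^mlxen[0-9]'
}
```

For example, `ifconfig -a | has_mlxen || echo "no mlxen interface"` after each change to the port type.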
MHQH19B-XTR log
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
MHZH29-XTC log
Recognized as one IB port and one Ethernet port.
This card has a QSFP port (IB) and an SFP+ port (10GbE).
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: Using 5 tx rings for port:2
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:2
mlx4_en mlx4_core0: Activating port:2
mlxen0: Ethernet address: 00:02:c9:ff:ff:ff
mlx4_en: mlx4_core0: Port 2: Using 5 TX rings
mlx4_en: mlx4_core0: Port 2: Using 2 RX rings
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
MCX353A-FCBT
On 10.x, ConnectX-3 is recognized as well.
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_core0: 64B EQEs/CQEs supported by the device but not enabled
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: link state changed to DOWN
ib0: Attached to mlx4_0 port 1
mlx4_ib: Port 1 logical link is up
ib0: link state changed to UP
MCX354A-FCBT with KVM SR-IOV
mlx4_core0: <mlx4_core> mem 0xf1800000-0xf1ffffff at device 8.0 on pci0
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_core0: Missing DCS, aborting.(driver_data: 0x0, pci_resource_flags(pdev, 0):0x0)
device_attach: mlx4_core0 attach returned 19