FreeBSD InfiniBand



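I came across a report that iSER is now usable with iSCSI,
so I got motivated and built an InfiniBand environment on FreeBSD.
I built the environment with the help of existing notes on the subject.
Confirmed that the MHZH29-XTC works on FreeBSD 10.0-BETA1.
A ConnectX-3 exposed through SR-IOV to a KVM guest failed to be recognized.
The 10.x series reportedly carries drivers equivalent to those in Linux 3.7, and I have confirmed that cards up to ConnectX-3 work.
Getting InfiniBand running on FreeBSD requires rebuilding both the userland and the kernel.
The drivers were updated in 10.1,
with performance improvements among other things.
It still does not work in a KVM guest that uses SR-IOV.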

 Cards confirmed working (InfiniBand + 10GbE)

Card                          OS Version           Notes
InfiniHost III MHEA28-1TC     FreeBSD 10.0 BETA3   ports 1 and 2: IB working
InfiniHost III MHEA28-XTC     FreeBSD 10.0 BETA3   ports 1 and 2: IB working
InfiniHost III Lx MHES18-XTC  FreeBSD 10.0 BETA3   port 1: IB working
InfiniHost III Lx MHES18-XT   FreeBSD 10.0 BETA3   port 1: IB working
ConnectX MHRH19-XTC           FreeBSD 10.0 BETA3   port 1: IB working
ConnectX MHQH29-XTC           FreeBSD 10.0 BETA3   ports 1 and 2: IB working
ConnectX-2 MHQH19B-XTR        FreeBSD 10.0 BETA3   port 1: IB working
ConnectX-2 MHQH29B-XTR        FreeBSD 10.0 BETA3   port 1: IB, port 2: 10GbE working
ConnectX-2 MHZH29-XTC         FreeBSD 10.0 BETA3   port 1: IB, port 2: 10GbE working
ConnectX-3 MCX353A-FCBT       FreeBSD 10.0 BETA3   port 1: IB working
ConnectX-3 MCX354A-FCBT       FreeBSD 10.0 BETA3   ports 1 and 2: IB working; not working under KVM SR-IOV; Ethernet untested

 Setting up an environment for InfiniBand

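The stock GENERIC kernel does not include the InfiniBand drivers.
The userland likewise does not include the InfiniBand commands.
The first step is therefore to rebuild the userland and the kernel with InfiniBand support.
Obtaining the source is not covered here.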

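Rebuilding the kernel

After extracting the full system source into /usr/src, run the following from that directory:

cp sys/amd64/conf/GENERIC sys/amd64/conf/IB

This creates a kernel config file named "IB" for InfiniBand.
To enable InfiniBand, change or add the following lines in the "IB" file: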

ident          IB-TEST         # changed from GENERIC to a recognizable name
options        OFED            # Infiniband protocol stack and support
options        SDP             # Sockets Direct Protocol for infiniband
options        IPOIB_CM        # Use connect mode ipoib
device         ipoib           # IP over IB devices
device         mthca           # InfiniHostIII
device         mlx4ib          # ConnectX Infiniband support
device         mlxen           # ConnectX Ethernet support

With the settings above, the following are enabled:

  • InfiniHost driver
  • ConnectX InfiniBand driver
  • ConnectX Ethernet driver
  • IPoIB driver
  • IPoIB connected mode (CM)
  • SDP
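
Before starting a long build, it may be worth confirming that the config file really contains every entry listed above. The following is only a minimal sketch and not part of the original procedure: the function name check_ib_kernconf is hypothetical, and the path in the usage line assumes the source tree lives in /usr/src.

```shell
#!/bin/sh
# check_ib_kernconf FILE  (hypothetical helper, for illustration only)
# Prints each expected "options"/"device" entry that is missing from FILE
# and returns non-zero if at least one entry is missing.
check_ib_kernconf() {
    conf=$1
    missing=0
    for entry in "options OFED" "options SDP" "options IPOIB_CM" \
                 "device ipoib" "device mthca" "device mlx4ib" "device mlxen"
    do
        kind=${entry% *}   # "options" or "device"
        name=${entry#* }   # e.g. "OFED"
        # Match "<kind> <name>" at the start of a line; a trailing comment is allowed.
        if ! grep -Eq "^[[:space:]]*${kind}[[:space:]]+${name}([[:space:]]|#|\$)" "$conf"; then
            echo "missing: ${kind} ${name}"
            missing=1
        fi
    done
    return $missing
}
```

Usage, under the same assumptions: check_ib_kernconf /usr/src/sys/amd64/conf/IB && echo "IB config looks complete"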

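Rebuilding the userland

For the userland, create /etc/src.conf and add

WITH_OFED="YES"

to it.
Then, in /usr/src, run

make buildworld installworld kernel KERNCONF=IB

to build and install the kernel and the userland.
After the installation completes, reboot and the IB card is recognized.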

  • With mthca
ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xe8000000-0xe87fffff,0xe0000000-0xe7ffffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca

  • With mlxen/mlx4ib
mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1 
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011) 
mlx4_core: Initializing mlx4_core 
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
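
To check after the reboot which of these drivers actually initialized, the boot messages can be filtered for the banner lines shown above. This is a minimal sketch: ib_drivers_in_log is a hypothetical helper name, and it only recognizes the message formats quoted in these notes, so other driver versions may print different text.

```shell
#!/bin/sh
# ib_drivers_in_log  (hypothetical helper, for illustration only)
# Reads kernel messages on stdin and prints, one per line, the Mellanox
# drivers whose banner or "Initializing" lines appear in the input.
ib_drivers_in_log() {
    awk '
        /^ib_mthca: Initializing ib_mthca/                { seen["mthca"] = 1 }
        /^mlx4_core: Initializing mlx4_core/              { seen["mlx4_core"] = 1 }
        /^mlx4_en: Mellanox ConnectX HCA Ethernet driver/ { seen["mlx4_en"] = 1 }
        /^mlx4_ib: Mellanox ConnectX InfiniBand driver/   { seen["mlx4_ib"] = 1 }
        END {
            # Print in a fixed order so results are easy to compare.
            n = split("mthca mlx4_core mlx4_en mlx4_ib", order, " ")
            for (i = 1; i <= n; i++)
                if (order[i] in seen) print order[i]
        }
    '
}
```

Typical use would be: dmesg | ib_drivers_in_log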

 Tips

Setting node_desc

The node description can be set with:

sysctl sys.class.infiniband.mlx4_0.node_desc=freebsd-ib

Changing the port type


For cases such as switching a port from ib to eth. Requires ConnectX-2 or later.

 Kernel logs and quick iperf benchmark results for the cards I have

The iperf numbers were taken before any network tuning, so treat them as rough reference values only.
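
Because several runs are listed in this section, it can help to extract just the final summary bandwidth of each run for comparison. The sketch below assumes iperf 2 style client output in the format shown in these notes, and assumes decimal prefixes for the bit-rate units; iperf_summary_gbps is a hypothetical helper name.

```shell
#!/bin/sh
# iperf_summary_gbps  (hypothetical helper, for illustration only)
# Reads iperf (version 2 style) client output on stdin and prints the
# bandwidth of the last interval line, normalized to Gbits/sec.
# In the runs shown here, the last interval line is the 0.0-10.0 sec summary.
iperf_summary_gbps() {
    awk '
        # Interval lines end with "<number> <prefix>bits/sec".
        $NF ~ /^[KMG]?bits\/sec$/ {
            value = $(NF - 1)
            unit  = $NF
            if      (unit == "Gbits/sec") gbps = value
            else if (unit == "Mbits/sec") gbps = value / 1000
            else if (unit == "Kbits/sec") gbps = value / 1000000
            else                          gbps = value / 1000000000
            found = 1
        }
        END { if (found) printf "%.2f\n", gbps }
    '
}
```

Typical use would be: iperf -c 192.168.17.12 -i 1 | iperf_summary_gbps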

MHEA28-1TC FreeBSD 10.0 mthca + ipoib_cm driver


ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xe8000000-0xe87fffff,0xe0000000-0xe7ffffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1
ib1: Attached to mthca0 port 2

# iperf -c 192.168.17.12 -i 1 -w 128K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  256 KByte (WARNING: requested  128 KByte)
------------------------------------------------------------
[  3] local 192.168.17.21 port 60800 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   222 MBytes  1.86 Gbits/sec
[  3]  1.0- 2.0 sec   278 MBytes  2.33 Gbits/sec
[  3]  2.0- 3.0 sec   280 MBytes  2.35 Gbits/sec
[  3]  3.0- 4.0 sec   279 MBytes  2.34 Gbits/sec
[  3]  4.0- 5.0 sec   276 MBytes  2.31 Gbits/sec
[  3]  5.0- 6.0 sec   271 MBytes  2.28 Gbits/sec
[  3]  6.0- 7.0 sec   277 MBytes  2.32 Gbits/sec
[  3]  7.0- 8.0 sec   275 MBytes  2.31 Gbits/sec
[  3]  8.0- 9.0 sec   280 MBytes  2.35 Gbits/sec
[  3]  9.0-10.0 sec   279 MBytes  2.34 Gbits/sec
[  3]  0.0-10.0 sec  2.65 GBytes  2.28 Gbits/sec

# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  648 KByte (default)
------------------------------------------------------------
[  3] local 192.168.17.21 port 60804 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  71.4 MBytes   599 Mbits/sec
[  3]  1.0- 2.0 sec   182 MBytes  1.53 Gbits/sec
[  3]  2.0- 3.0 sec   268 MBytes  2.24 Gbits/sec
[  3]  3.0- 4.0 sec   283 MBytes  2.38 Gbits/sec
[  3]  4.0- 5.0 sec   265 MBytes  2.23 Gbits/sec
[  3]  5.0- 6.0 sec   267 MBytes  2.24 Gbits/sec
[  3]  6.0- 7.0 sec   289 MBytes  2.42 Gbits/sec
[  3]  7.0- 8.0 sec   293 MBytes  2.46 Gbits/sec
[  3]  8.0- 9.0 sec   277 MBytes  2.32 Gbits/sec
[  3]  9.0-10.0 sec   291 MBytes  2.44 Gbits/sec
[  3]  0.0-10.0 sec  2.43 GBytes  2.08 Gbits/sec

MHEA28-XTC FreeBSD 10.0 mthca + ipoib_cm driver


ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1
ib1: Attached to mthca0 port 2

01:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)

Fast host

# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  512 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  3] local 192.168.17.55 port 52472 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   286 MBytes  2.40 Gbits/sec
[  3]  1.0- 2.0 sec   368 MBytes  3.08 Gbits/sec
[  3]  2.0- 3.0 sec   367 MBytes  3.08 Gbits/sec
[  3]  3.0- 4.0 sec   367 MBytes  3.08 Gbits/sec
[  3]  4.0- 5.0 sec   368 MBytes  3.08 Gbits/sec
[  3]  5.0- 6.0 sec   366 MBytes  3.07 Gbits/sec
[  3]  6.0- 7.0 sec   367 MBytes  3.08 Gbits/sec
[  3]  7.0- 8.0 sec   367 MBytes  3.08 Gbits/sec
[  3]  8.0- 9.0 sec   367 MBytes  3.08 Gbits/sec
[  3]  9.0-10.0 sec   366 MBytes  3.07 Gbits/sec
[  3]  0.0-10.0 sec  3.50 GBytes  3.01 Gbits/sec

Slow host

# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  648 KByte (default)
------------------------------------------------------------
[  3] local 192.168.17.61 port 46974 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  77.8 MBytes   652 Mbits/sec
[  3]  1.0- 2.0 sec   298 MBytes  2.50 Gbits/sec
[  3]  2.0- 3.0 sec   299 MBytes  2.51 Gbits/sec
[  3]  3.0- 4.0 sec   299 MBytes  2.51 Gbits/sec
[  3]  4.0- 5.0 sec   298 MBytes  2.50 Gbits/sec
[  3]  5.0- 6.0 sec   207 MBytes  1.74 Gbits/sec
[  3]  6.0- 7.0 sec   297 MBytes  2.49 Gbits/sec
[  3]  7.0- 8.0 sec   297 MBytes  2.49 Gbits/sec
[  3]  8.0- 9.0 sec   298 MBytes  2.50 Gbits/sec
[  3]  9.0-10.0 sec   297 MBytes  2.49 Gbits/sec
[  3]  0.0-10.0 sec  2.60 GBytes  2.24 Gbits/sec

MHES18-XTC FreeBSD 10.0 mthca + ipoib_cm driver


ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1

01:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)

Fast host (ConnectX-2)

# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  512 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  3] local 192.168.17.55 port 52474 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   276 MBytes  2.31 Gbits/sec
[  3]  1.0- 2.0 sec   354 MBytes  2.97 Gbits/sec
[  3]  2.0- 3.0 sec   353 MBytes  2.96 Gbits/sec
[  3]  3.0- 4.0 sec   344 MBytes  2.88 Gbits/sec
[  3]  4.0- 5.0 sec   351 MBytes  2.94 Gbits/sec
[  3]  5.0- 6.0 sec   350 MBytes  2.93 Gbits/sec
[  3]  6.0- 7.0 sec   344 MBytes  2.88 Gbits/sec
[  3]  7.0- 8.0 sec   345 MBytes  2.89 Gbits/sec
[  3]  8.0- 9.0 sec   343 MBytes  2.88 Gbits/sec
[  3]  9.0-10.0 sec   344 MBytes  2.89 Gbits/sec
[  3]  0.0-10.0 sec  3.32 GBytes  2.85 Gbits/sec

Slow host (ConnectX-3)

# iperf -c 192.168.17.12 -i 1 -w 256K
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  512 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  3] local 192.168.17.61 port 47123 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   108 MBytes   908 Mbits/sec
[  3]  1.0- 2.0 sec   285 MBytes  2.39 Gbits/sec
[  3]  2.0- 3.0 sec   287 MBytes  2.41 Gbits/sec
[  3]  3.0- 4.0 sec   288 MBytes  2.41 Gbits/sec
[  3]  4.0- 5.0 sec   287 MBytes  2.40 Gbits/sec
[  3]  5.0- 6.0 sec   282 MBytes  2.37 Gbits/sec
[  3]  6.0- 7.0 sec   281 MBytes  2.36 Gbits/sec
[  3]  7.0- 8.0 sec   281 MBytes  2.36 Gbits/sec
[  3]  8.0- 9.0 sec   282 MBytes  2.36 Gbits/sec
[  3]  9.0-10.0 sec   282 MBytes  2.36 Gbits/sec
[  3]  0.0-10.0 sec  2.60 GBytes  2.23 Gbits/sec

MHES18-XT FreeBSD 10.0 mthca + ipoib_cm driver


ib_mthca0: <ib_mthca> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
ib_mthca: Mellanox InfiniBand HCA driver v1.0-ofed1.5.2 (August 4, 2010)
ib_mthca: Initializing ib_mthca
ib0: Attached to mthca0 port 1

# iperf -c 192.168.17.12 -i 1
------------------------------------------------------------
Client connecting to 192.168.17.12, TCP port 5001
TCP window size:  648 KByte (default)
------------------------------------------------------------
[  3] local 192.168.17.55 port 52466 connected with 192.168.17.12 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  92.1 MBytes   773 Mbits/sec
[  3]  1.0- 2.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  2.0- 3.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  3.0- 4.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  4.0- 5.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  5.0- 6.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  6.0- 7.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  7.0- 8.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  8.0- 9.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  9.0-10.0 sec   348 MBytes  2.92 Gbits/sec
[  3]  0.0-10.0 sec  3.15 GBytes  2.70 Gbits/sec

MHQH29-XTC log

By default, both ports are recognized as IB ports. UDP RSS is disabled (it is only enabled on ConnectX-2 and later).

mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1 
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011) 
mlx4_core: Initializing mlx4_core 
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: UDP RSS is not supported on this device. 
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
ib1: Attached to mlx4_0 port 2

Setting the ports to eth in sysctl.conf

sys.device.mlx4_core0.mlx4_port1=eth
sys.device.mlx4_core0.mlx4_port2=eth

Log with eth specified in sysctl.conf

mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: UDP RSS is not supported on this device.
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1
ib1: Attached to mlx4_0 port 2
mlx4_core0: Only same port types supported on this HCA, aborting.
"qpn 0x48: invalid attribute mask specified " "for transition 0 to 6. qp_type 4," " attr_mask 0x1\134n"<4>ib0: Failed to modify QP to ERROR state
"qpn 0x49: invalid attribute mask specified " "for transition 0 to 6. qp_type 4," " attr_mask 0x1\134n"<4>ib1: Failed to modify QP to ERROR state
mlx4_en mlx4_core0: UDP RSS is not supported on this device.
mlx4_en mlx4_core0: Using 5 tx rings for port:1
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:1
mlx4_en mlx4_core0: Using 5 tx rings for port:2
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:2
mlx4_en mlx4_core0: Activating port:1
mlx4_en: mlx4_core0: Port 1: Port: 1, invalid mac burned: 0x0, quiting
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)

Something appears to have failed,
so I checked with ifconfig:

root@freebsd-ib:~ # ifconfig -a
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
        ether 00:25:90:54:5d:5e
        inet 192.168.11.12 netmask 0xffffff00 broadcast 192.168.11.255
        inet6 fe80::225:90ff:fe54:5d5e%em0 prefixlen 64 scopeid 0x1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
        ether 00:25:90:54:5d:5f
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
pflog0: flags=0<> metric 0 mtu 33160
pfsync0: flags=0<> metric 0 mtu 1500
        syncpeer: 0.0.0.0 maxupd: 128 defer: off
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

orz
Neither the ib nor the mlxen interfaces show up. This needs further investigation.

MHQH19B-XTR log

mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1

MHZH29-XTC log

The card is recognized as IB plus Ethernet.
It has one QSFP port (IB) and one SFP+ port (10GbE).

mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011) 
mlx4_core: Initializing mlx4_core
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_en mlx4_core0: Using 5 tx rings for port:2
mlx4_en mlx4_core0: Defaulting to 2 rx rings for port:2
mlx4_en mlx4_core0: Activating port:2
mlxen0: Ethernet address: 00:02:c9:ff:ff:ff
mlx4_en: mlx4_core0: Port 2: Using 5 TX rings 
mlx4_en: mlx4_core0: Port 2: Using 2 RX rings
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: Attached to mlx4_0 port 1

MCX353A-FCBT

On the 10.x series, ConnectX-3 is recognized as well.

mlx4_core0: <mlx4_core> mem 0xf7d00000-0xf7dfffff,0xf0000000-0xf07fffff irq 16 at device 0.0 on pci1
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_core0: 64B EQEs/CQEs supported by the device but not enabled
mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.2 (July 2010)
mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
ib0: link state changed to DOWN
ib0: Attached to mlx4_0 port 1
mlx4_ib: Port 1 logical link is up
ib0: link state changed to UP

MCX354A-FCBT with KVM SR-IOV

mlx4_core0: <mlx4_core> mem 0xf1800000-0xf1ffffff at device 8.0 on pci0
mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
mlx4_core: Initializing mlx4_core
mlx4_core0: Missing DCS, aborting.(driver_data: 0x0, pci_resource_flags(pdev, 0):0x0)
device_attach: mlx4_core0 attach returned 19