Performance Tuning on CentOS 7

High availability cluster

2 x 2U servers, each connected via SAS3 to 4 x 90-bay JBODs

Top SAS throughput based on 32 x RAIDZ2 (9+2) vdevs

Single x86 server throughput can reach 23 GB/s
Average throughput is about 15~17 GB/s

BIOS setting

Enable X2APIC

CPU setting
tuned-adm profile throughput-performance
tuned-adm active
cpupower idle-set -d 4
cpupower idle-set -d 3
cpupower idle-set -d 2
cpupower frequency-set -g performance
# for more info /usr/lib/tuned/throughput-performance/tuned.conf
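The cpupower settings above are lost on reboot. One way to reapply them at boot is a one-shot systemd unit; the unit name below is hypothetical, and /usr/bin/cpupower is the CentOS 7 path from the kernel-tools package:

```ini
# /etc/systemd/system/cpu-tuning.service  (hypothetical unit name)
[Unit]
Description=Reapply CPU idle-state and governor tuning at boot

[Service]
Type=oneshot
ExecStart=/usr/bin/cpupower idle-set -d 4
ExecStart=/usr/bin/cpupower idle-set -d 3
ExecStart=/usr/bin/cpupower idle-set -d 2
ExecStart=/usr/bin/cpupower frequency-set -g performance

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable cpu-tuning.service`.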
sysctl
kernel.numa_balancing = 0

net.core.netdev_max_backlog = 300000
net.ipv4.tcp_sack = 0
net.core.netdev_budget = 600
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_rmem = 16384 349520 16777216
net.ipv4.tcp_wmem = 16384 349520 16777216
net.ipv4.tcp_mem = 2314209 3085613 4628418
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.somaxconn = 2048
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_window_scaling = 1
# UDP buffer
net.core.rmem_max = 16777216
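Values set with `sysctl -w` are also lost on reboot. To persist them, the whole list can go into a drop-in file (the filename below is an assumption) loaded with `sysctl --system`:

```
# /etc/sysctl.d/90-throughput.conf  (hypothetical filename)
# Paste the full list of settings from above, e.g.:
kernel.numa_balancing = 0
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```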
Dual-port 25GbE
NIC setting
ethtool -C enp131s0f0 adaptive-rx off rx-usecs 0 rx-frames 0
ethtool -G enp131s0f0 rx 8192 tx 8192
ethtool -G enp131s0f1 rx 8192 tx 8192
ethtool -N enp131s0f0 rx-flow-hash udp4 sdfn
ip link set dev enp131s0f0 txqueuelen 1500
ip link set dev enp131s0f1 txqueuelen 1500
ip link set dev bond0 txqueuelen 3000
MTU=9000
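On CentOS 7 with initscripts networking, the ring, coalescing, and MTU settings can be persisted per port in the ifcfg file. This is a sketch for one port (repeat for enp131s0f1), assuming the stock ifup-eth handling of semicolon-separated ETHTOOL_OPTS:

```
# /etc/sysconfig/network-scripts/ifcfg-enp131s0f0  (relevant lines only)
DEVICE=enp131s0f0
MTU=9000
ETHTOOL_OPTS="-C ${DEVICE} adaptive-rx off rx-usecs 0 rx-frames 0; -G ${DEVICE} rx 8192 tx 8192"
```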
Bond info
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 7c:fe:90:de:08:38
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 1
Partner Key: 449
Partner Mac Address: 38:bc:01:79:33:51

Slave Interface: enp131s0f0
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 7c:fe:90:de:08:38
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 7c:fe:90:de:08:38
port key: 1
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 32768
system mac address: 38:bc:01:79:33:51
oper key: 449
port priority: 32768
port number: 1
port state: 61

Slave Interface: enp131s0f1
MII Status: up
Speed: 25000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 7c:fe:90:de:08:39
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 7c:fe:90:de:08:38
port key: 1
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 32768
system mac address: 38:bc:01:79:33:51
oper key: 449
port priority: 32768
port number: 97
port state: 61
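The status above corresponds roughly to a bond configured like this on CentOS 7 (a sketch; addressing lines omitted). Note that xmit_hash_policy=layer2 hashes on MAC addresses only, so traffic to a single peer MAC always uses one port; layer3+4 usually spreads flows across both 25GbE links better:

```
# /etc/sysconfig/network-scripts/ifcfg-bond0  (sketch)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 lacp_rate=slow xmit_hash_policy=layer2"
MTU=9000
ONBOOT=yes
BOOTPROTO=none
```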
Get the CPU and PCIe adapter topology

Pin different CPU cores to each adapter's interrupts
Disable irqbalance

HT enabled; CPU lists per NUMA node:
0,1,10,11,2,24,25,26,27,28,29,3,30,31,32,33,34,35,4,5,6,7,8,9,
12,13,14,15,16,17,18,19,20,21,22,23,36,37,38,39,40,41,42,43,44,45,46,47,
cpu numa node number is: 2
cpu-mask(hex) ht-sibling-mask(hex) numa_node
200 200000000 0
40 40000000 0
8 8000000 0
1 1000000 0
10 10000000 0
2 2000000 0
20 20000000 0
80 80000000 0
400 400000000 0
100 100000000 0
4 4000000 0
800 800000000 0
400000 400000000000 1
8000 8000000000 1
800000 800000000000 1
20000 20000000000 1
40000 40000000000 1
1000 1000000000 1
100000 100000000000 1
80000 80000000000 1
10000 10000000000 1
2000 2000000000 1
200000 200000000000 1
4000 4000000000 1

NUMA node 0
02:00.0 Non-Volatile memory controller: Intel Corporation Device 0a53 (rev 02)
03:00.0 Non-Volatile memory controller: Intel Corporation Device 0a53 (rev 02)
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3
NUMA node 1

Next time, I will move the Ethernet adapters to NUMA node 0.

83:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
83:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
82:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
84:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

mpt3sas
$ cat /etc/modprobe.d/mpt3sas.conf
options mpt3sas max_msix_vectors=6
storage device setting
HDD /sys/block/*/queue/scheduler deadline
SSD /sys/block/*/queue/scheduler noop
/sys/block/*/queue/nr_requests 1024
/sys/block/*/device/queue_depth 256 #For throughput; reduce it for lower latency
/sys/block/*/device/scsi_disk/*/cache_type write back #For throughput

/sys/block/*/device/scsi_disk/*/cache_type write through #For latency
rr_min_io_rq 1 in /etc/multipath.conf #For latency
/sys/block/*/queue/nomerges 1 #For latency
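The sysfs values above reset on reboot and device re-add. A udev rule can reapply the scheduler and queue settings automatically; the filename and the rotational-flag matching below are assumptions:

```
# /etc/udev/rules.d/99-io-tuning.rules  (hypothetical filename)
# Rotational disks: deadline scheduler, deep request queue (throughput)
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="deadline", ATTR{queue/nr_requests}="1024"
# SSDs: noop scheduler
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"
```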

my scripts
./pin_cpu.sh SAS3008 Mellanox Non-Volatile
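pin_cpu.sh itself is not shown; below is a minimal sketch of what such an IRQ-pinning script can look like (this is not the author's script -- the device patterns and CPU lists are assumptions). It stops irqbalance and round-robins each adapter's IRQs across CPUs local to that adapter's NUMA node, printing the commands instead of writing them so the result can be reviewed before running as root:

```shell
#!/bin/sh
# Hypothetical IRQ-pinning sketch -- not the author's pin_cpu.sh.

cpu_to_mask() {
    # CPU number -> hex bitmask for /proc/irq/<n>/smp_affinity
    # e.g. CPU 4 -> 10, CPU 33 -> 200000000
    printf '%x\n' "$((1 << $1))"
}

pin_device_irqs() {
    # $1 = pattern matching the device in /proc/interrupts (e.g. mlx5, mpt3sas)
    # remaining args = CPUs local to the device's NUMA node
    pattern=$1; shift
    i=0
    grep "$pattern" /proc/interrupts 2>/dev/null | cut -d: -f1 | while read -r irq; do
        eval "cpu=\${$(( i % $# + 1 ))}"     # round-robin over the CPU list
        echo "echo $(cpu_to_mask "$cpu") > /proc/irq/$irq/smp_affinity"
        i=$((i + 1))
    done
}

systemctl stop irqbalance 2>/dev/null        # pinning is pointless while
systemctl disable irqbalance 2>/dev/null     # irqbalance keeps rewriting it

pin_device_irqs mlx5    12 13 14 15          # NIC IRQs -> node-1 CPUs (assumed)
pin_device_irqs mpt3sas 16 17 18 19          # HBA IRQs -> node-1 CPUs (assumed)
```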

Trigger raw_spin_lock

With the SAS cables reconnected and no multipath, ZFS throughput is 10 GB/s

GlusterFS/Lustre can reach 5~6 GB/s over 2 x 25GbE

ZFS for all SAS SSD JBOD
zfs set logbias=throughput tank
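For an all-SSD pool, logbias=throughput is often combined with a few other properties. The lines below are a hedged sketch (the pool name tank is from above; the values are common starting points, not measured on this system):

```shell
zfs set logbias=throughput tank
zfs set atime=off tank          # skip access-time writes
zfs set recordsize=1M tank      # large records favor streaming throughput
zfs set compression=lz4 tank    # cheap compression, often a net win
```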
Other tips

Run your app under numactl, bound to a single NUMA node
Use libvma to bypass the kernel network stack
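For the numactl point, a small wrapper illustrates the idea (node 0 here and the function name are assumptions -- pick the node local to your NIC and HBA):

```shell
#!/bin/sh
# Run a command bound to one NUMA node's CPUs and memory when numactl
# is available; fall back to an unbound run otherwise.
run_on_node() {
    node=$1; shift
    if command -v numactl >/dev/null 2>&1; then
        numactl --cpunodebind="$node" --membind="$node" "$@"
    else
        "$@"
    fi
}

run_on_node 0 echo "bound to node 0"

# Kernel bypass with Mellanox VMA works by preloading the library:
#   LD_PRELOAD=libvma.so ./your_app
```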