I am running some experiments on a private cloud powered by OpenNebula. Not sure how relevant this is, but the experiments are Hyperledger Fabric benchmarks: a blockchain network of 15 containers distributed across 5 worker nodes, plus one manager node (the node this question is about) that drives the benchmark. Networking and container orchestration are handled by Docker Swarm.
When I run the benchmark on the private cloud, a kernel thread called ksoftirqd uses 100% of one CPU and the benchmark crashes. To rule out the software, Docker Swarm, etc., I repeated the exact same experiment on a very similar setup of Google Cloud VMs; there nothing crashed and the benchmark completed successfully. Here is top on the manager node while the benchmark is running:
top - 06:25:30 up 10 min, 4 users, load average: 3.03, 0.99, 0.43
Tasks: 253 total, 2 running, 152 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.5 sy, 0.0 ni, 86.7 id, 5.7 wa, 0.0 hi, 6.7 si, 0.2 st
KiB Mem : 14331012 total, 11549292 free, 1983412 used, 798308 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12053012 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54 root 20 0 0 0 0 R 100 0.0 0:16.32 ksoftir+
221 root 20 0 0 0 0 I 20.8 0.0 0:00.69 kworker+
9090 root 20 0 8104 2300 1972 D 6.3 0.0 0:01.85 updated+
520 root 0 -20 0 0 0 I 1.3 0.0 0:00.07 kworker+
1696 root 20 0 2524388 142968 49000 S 0.7 1.0 0:12.89 dockerd
7643 root 20 0 14772 3232 2540 S 0.7 0.0 0:00.95 watch
7666 root 20 0 44480 4368 3576 R 0.7 0.0 0:01.42 top
95 root rt 0 0 0 0 S 0.3 0.0 0:00.37 migrati+
7298 nima 20 0 1442936 83188 26064 S 0.3 0.6 0:07.78 node
7442 root 20 0 107980 7100 6092 S 0.3 0.0 0:00.07 sshd
8775 nima 20 0 1547888 218960 26456 S 0.3 1.5 0:24.46 node
1 root 20 0 78136 9268 6704 S 0.0 0.1 0:12.68 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par+
5 root 20 0 0 0 0 I 0.0 0.0 0:00.00 kworker+
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker+
7 root 20 0 0 0 0 I 0.0 0.0 0:00.10 kworker+
8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_perc+
9 root 20 0 0 0 0 S 0.0 0.0 0:00.02 ksoftir+
10 root 20 0 0 0 0 I 0.0 0.0 0:00.29 rcu_sch+
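I have not captured it here, but the per-CPU softirq share can also be watched with mpstat from the sysstat package; a minimal sketch:

```
# per-CPU utilization once per second; the %soft column is softirq time,
# which should sit near 100% on the CPU that ksoftirqd is saturating
mpstat -P ALL 1
```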
Looking at the interrupts, I realized that IRQ 11 (uhci_hcd:usb1, ens3), where ens3 is the network interface that carries the traffic between the VMs, is being serviced almost exclusively by CPU7.
Every 2.0s: cat /proc/interrupts caliper-latest1: Sat May 2 06:25:31 2020
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
0: 29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 2-edge timer
1: 0 0 0 0 0 0 0 9 0 0 0 0 0 0 0 0 IO-APIC 1-edge i8042
6: 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 IO-APIC 6-edge floppy
8: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
10: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 10-fasteoi virtio1
11: 0 0 0 0 0 909 0 321335 0 0 0 0 0 0 0 0 IO-APIC 11-fasteoi uhci_hcd:usb1, ens3
12: 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 IO-APIC 12-edge i8042
14: 0 0 0 260 0 0 0 0 0 0 0 0 0 0 729 0 IO-APIC 14-edge ata_piix
15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IO-APIC 15-edge ata_piix
24: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 65536-edge virtio0-config
25: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40349 PCI-MSI 65537-edge virtio0-req.0
NMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Non-maskable interrupts
LOC: 24949 21841 25547 27193 17287 18251 27781 20564 20127 18850 18684 18947 18961 23488 19858 18301 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 15499 14382 14482 13364 10644 13499 15352 8655 13473 13083 8528 14588 16648 14164 14177 13493 Rescheduling interrupts
CAL: 5999 5715 7934 4806 4225 4683 4689 3410 5292 5338 9532 5868 2765 7084 3869 2875 Function call interrupts
TLB: 2147 3053 6996 4550 3418 4793 4459 4029 3515 6444 7028 3968 1070 2846 4118 4763 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 Machine check polls
HYP: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Hypervisor callback interrupts
HRE: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Hyper-V reenlightenment interrupts
HVS: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Hyper-V stimer0 interrupts
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Posted-interrupt notification event
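IRQ 11 is a legacy IO-APIC line shared by uhci_hcd:usb1 and ens3, so all of its interrupts land on whichever CPU its affinity mask selects, which is CPU7 here (see the affinity masks below). Whether irqbalance is installed and responsible for that placement can be checked with:

```
# is irqbalance running and deciding where IRQ 11 lands?
systemctl status irqbalance
```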
And here are the per-IRQ affinity masks:
root@caliper-latest1:~# grep . /proc/irq/*/smp_affinity
/proc/irq/0/smp_affinity:ffff
/proc/irq/1/smp_affinity:0100
/proc/irq/10/smp_affinity:0010
/proc/irq/11/smp_affinity:0080
/proc/irq/12/smp_affinity:0002
/proc/irq/13/smp_affinity:ffff
/proc/irq/14/smp_affinity:4000
/proc/irq/15/smp_affinity:0008
/proc/irq/2/smp_affinity:ffff
/proc/irq/24/smp_affinity:0800
/proc/irq/25/smp_affinity:ffff
/proc/irq/3/smp_affinity:ffff
/proc/irq/4/smp_affinity:ffff
/proc/irq/5/smp_affinity:ffff
/proc/irq/6/smp_affinity:0400
/proc/irq/7/smp_affinity:ffff
/proc/irq/8/smp_affinity:1000
/proc/irq/9/smp_affinity:0020
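As a first workaround, the affinity mask of IRQ 11 could be rewritten by hand, either widening it to all CPUs or moving it to a less busy core; a minimal sketch, assuming irqbalance (if installed) is stopped first so it does not rewrite the mask. With a single-queue device this only relocates the interrupt work rather than spreading it, as far as I understand:

```
systemctl stop irqbalance            # only if irqbalance is installed
# widen IRQ 11 (uhci_hcd:usb1, ens3) from CPU7 only (0080) to all 16 CPUs
echo ffff > /proc/irq/11/smp_affinity
# or pin it to a specific, otherwise idle CPU, e.g. CPU3
echo 0008 > /proc/irq/11/smp_affinity
```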
After a closer look at /proc/softirqs I realized that the NET_TX/NET_RX and TASKLET softirqs are unevenly distributed among the CPUs and are bombarding CPU7.
Every 2.0s: cat /proc/softirqs caliper-latest1: Sat May 2 06:25:44 2020
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
HI: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
TIMER: 21656 15833 14169 13228 13466 13059 13383 10565 16011 12679 16139 14041 12160 15680 14883 14232
NET_TX: 14 15 12 16 20 477 106 79183 19 14 10 17 2 22 13 9
NET_RX: 8195 11835 12037 12203 12227 9200 6719 116134 6855 6418 7086 8225 5721 7992 12477 8105
BLOCK: 3646 2950 6351 2031 1354 1851 1814 689 3132 2434 6847 3007 570 4276 1149 1569
IRQ_POLL: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TASKLET: 0 2 0 0 1 24 2 597 14 0 1 1 1 0 0 1
SCHED: 7943 4241 3528 3305 3152 3138 2759 4318 3789 2955 3730 3298 2532 3483 3660 3383
HRTIMER: 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0
RCU: 18913 10231 18753 10353 18891 8120 10553 14994 10600 6451 12809 9338 7204 9445 10576 10050
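Since the hardware interrupt is a single line anyway, what I am considering trying is Receive Packet Steering (RPS), which spreads the NET_RX softirq work across CPUs in software instead of leaving it all on the CPU that takes the interrupt; a minimal sketch, assuming ens3 has the usual single rx-0 queue:

```
# let NET_RX processing for ens3 run on all 16 CPUs (mask ffff) instead of only the IRQ CPU
echo ffff > /sys/class/net/ens3/queues/rx-0/rps_cpus
# optional: RFS on top of RPS, so flows follow the CPU of the consuming application
sysctl -w net.core.rps_sock_flow_entries=32768
echo 4096 > /sys/class/net/ens3/queues/rx-0/rps_flow_cnt
```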
And here is some system info:
root@caliper-latest1:~# inxi -Fxz
System: Host: caliper-latest1 Kernel: 5.3.0-51-generic x86_64
bits: 64 gcc: 7.5.0
Console: tty 3 Distro: Ubuntu 18.04.4 LTS
Machine: Device: kvm System: QEMU product: Standard PC (i440FX + PIIX 1996) v: pc-i440fx-2.8 serial: N/A
Mobo: N/A model: N/A serial: N/A
BIOS: SeaBIOS v: 1.10.2-1 date: 04/01/2014
CPU(s): 16 Single core QEMU Virtual version 2.5+s (-SMP-)
arch: P6 II rev.3 cache: 262144 KB
flags: (lm nx sse sse2 sse3) bmips: 85119
clock speeds: max: 2659 MHz 1: 2659 MHz 2: 2659 MHz 3: 2659 MHz
4: 2659 MHz 5: 2659 MHz 6: 2659 MHz 7: 2659 MHz 8: 2659 MHz
9: 2659 MHz 10: 2659 MHz 11: 2659 MHz 12: 2659 MHz 13: 2659 MHz
14: 2659 MHz 15: 2659 MHz 16: 2659 MHz
Graphics: Card: Cirrus Logic GD 5446 bus-ID: 00:02.0
Display Server: X.org 1.20.5 driver: cirrus
tty size: 77x28 Advanced Data: N/A for root out of X
Network: Card: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter
driver: 8139cp v: 1.3 port: c000 bus-ID: 00:03.0
IF: ens3 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives: HDD Total Size: 32.2GB (24.1% used)
ID-1: /dev/vda model: N/A size: 32.2GB
Partition: ID-1: / size: 29G used: 7.3G (26%) fs: ext4 dev: /dev/vda1
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: None detected - is lm-sensors installed and configured?
Info: Processes: 249 Uptime: 15 min Memory: 492.3/13995.1MB
Init: systemd runlevel: 5 Gcc sys: 7.5.0
Client: Shell (bash 4.4.201) inxi: 2.3.56
So my personal suspicion is that the emulated Realtek RTL-8100/8101L/8139 adapter (8139cp driver) is a single-queue device and simply cannot spread its interrupt and softirq processing across multiple cores. Could that be why ksoftirqd saturates a single CPU and crashes the benchmark, and how can I spread this network load across the other CPUs?
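To confirm the queue layout, I believe something like this should work (8139cp may simply report that channel configuration is not supported, which would itself point at a single-queue device):

```
# hardware rx/tx channels exposed by the driver (may return "Operation not supported")
ethtool -l ens3
# queues the kernel created for the interface (I expect only rx-0 and tx-0)
ls /sys/class/net/ens3/queues/
```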