Unbalanced interrupt handling on a multi-CPU OpenNebula VM instance leads to a crash (uhci_hcd:usb1, ens3)

I am running some experiments on a private cloud powered by OpenNebula. Not sure how relevant this is, but the experiments are Hyperledger Fabric benchmarks: a blockchain network of 15 containers distributed across 5 worker nodes, plus one manager node (the one discussed here) that runs the benchmark. Networking and container orchestration are handled by Docker Swarm.

When I run the benchmark on the private cloud, a kernel thread called ksoftirqd uses 100% of one CPU and the benchmark crashes. To rule out the software, Docker Swarm, etc., I repeated the exact experiment on a very similar setup on Google Cloud VMs; nothing crashed and the benchmark completed successfully.

top - 06:25:30 up 10 min,  4 users,  load average: 3.03, 0.99, 0.43
Tasks: 253 total,   2 running, 152 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.5 sy,  0.0 ni, 86.7 id,  5.7 wa,  0.0 hi,  6.7 si,  0.2
KiB Mem : 14331012 total, 11549292 free,  1983412 used,   798308 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12053012 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   54 root      20   0       0      0      0 R  100  0.0   0:16.32 ksoftir+
  221 root      20   0       0      0      0 I  20.8  0.0   0:00.69 kworker+
 9090 root      20   0    8104   2300   1972 D   6.3  0.0   0:01.85 updated+
  520 root       0 -20       0      0      0 I   1.3  0.0   0:00.07 kworker+
 1696 root      20   0 2524388 142968  49000 S   0.7  1.0   0:12.89 dockerd
 7643 root      20   0   14772   3232   2540 S   0.7  0.0   0:00.95 watch
 7666 root      20   0   44480   4368   3576 R   0.7  0.0   0:01.42 top
   95 root      rt   0       0      0      0 S   0.3  0.0   0:00.37 migrati+
 7298 nima      20   0 1442936  83188  26064 S   0.3  0.6   0:07.78 node
 7442 root      20   0  107980   7100   6092 S   0.3  0.0   0:00.07 sshd
 8775 nima      20   0 1547888 218960  26456 S   0.3  1.5   0:24.46 node
    1 root      20   0   78136   9268   6704 S   0.0  0.1   0:12.68 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par+
    5 root      20   0       0      0      0 I   0.0  0.0   0:00.00 kworker+
    6 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 kworker+
    7 root      20   0       0      0      0 I   0.0  0.0   0:00.10 kworker+
    8 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 mm_perc+
    9 root      20   0       0      0      0 S   0.0  0.0   0:00.02 ksoftir+
   10 root      20   0       0      0      0 I   0.0  0.0   0:00.29 rcu_sch+

Looking at the interrupts, I realized that IRQ 11 (uhci_hcd:usb1, ens3), which serves ens3, the network interface I use for networking between the VMs, is handled almost exclusively by CPU7.

Every 2.0s: cat /proc/interrupts                                                                                                                                                                                                       caliper-latest1: Sat May  2 06:25:31 2020

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
  0:         29          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC   2-edge      timer
  1:          0          0          0          0          0          0          0          9          0          0          0          0          0          0          0          0   IO-APIC   1-edge      i8042
  6:          0          0          0          0          0          0          0          0          0          3          0          0          0          0          0          0   IO-APIC   6-edge      floppy
  8:          0          0          0          0          0          0          0          0          1          0          0          0          0          0          0          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC   9-fasteoi   acpi
 10:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC  10-fasteoi   virtio1
 11:          0          0          0          0          0        909          0     321335          0          0          0          0          0          0          0          0   IO-APIC  11-fasteoi   uhci_hcd:usb1, ens3
 12:          0          0          0          0          0          0         15          0          0          0          0          0          0          0          0          0   IO-APIC  12-edge      i8042
 14:          0          0          0        260          0          0          0          0          0          0          0          0          0          0        729          0   IO-APIC  14-edge      ata_piix
 15:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   IO-APIC  15-edge      ata_piix
 24:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 65536-edge      virtio0-config
 25:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0      40349   PCI-MSI 65537-edge      virtio0-req.0
NMI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:      24949      21841      25547      27193      17287      18251      27781      20564      20127      18850      18684      18947      18961      23488      19858      18301   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          1   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      15499      14382      14482      13364      10644      13499      15352       8655      13473      13083       8528      14588      16648      14164      14177      13493   Rescheduling interrupts
CAL:       5999       5715       7934       4806       4225       4683       4689       3410       5292       5338       9532       5868       2765       7084       3869       2875   Function call interrupts
TLB:       2147       3053       6996       4550       3418       4793       4459       4029       3515       6444       7028       3968       1070       2846       4118       4763   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          2          2          2          2          2          2          2          2          2          2          2          2          2          2          2          2   Machine check polls
HYP:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Hypervisor callback interrupts
HRE:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Hyper-V stimer0 interrupts
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0   Posted-interrupt notification event

and:

root@caliper-latest1:~#  grep . /proc/irq/*/smp_affinity
/proc/irq/0/smp_affinity:ffff
/proc/irq/1/smp_affinity:0100
/proc/irq/10/smp_affinity:0010
/proc/irq/11/smp_affinity:0080
/proc/irq/12/smp_affinity:0002
/proc/irq/13/smp_affinity:ffff
/proc/irq/14/smp_affinity:4000
/proc/irq/15/smp_affinity:0008
/proc/irq/2/smp_affinity:ffff
/proc/irq/24/smp_affinity:0800
/proc/irq/25/smp_affinity:ffff
/proc/irq/3/smp_affinity:ffff
/proc/irq/4/smp_affinity:ffff
/proc/irq/5/smp_affinity:ffff
/proc/irq/6/smp_affinity:0400
/proc/irq/7/smp_affinity:ffff
/proc/irq/8/smp_affinity:1000
/proc/irq/9/smp_affinity:0020
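
For reference when reading the masks above: each smp_affinity value is a hexadecimal bitmap in which bit n stands for CPU n, so 0080 for IRQ 11 selects CPU7 only, which matches the /proc/interrupts counts. A minimal Python sketch of that decoding follows; the helper name cpus_from_mask is made up for illustration and is not part of any existing tool.

```python
from typing import List


def cpus_from_mask(mask: str) -> List[int]:
    """Return the CPU indices whose bits are set in a hex smp_affinity mask."""
    # On machines with more than 32 CPUs the kernel prints the mask in
    # comma-separated 32-bit groups, so strip the commas before parsing.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


# Masks copied from the grep output above (IRQ number -> mask).
for irq, mask in {"11": "0080", "14": "4000", "25": "ffff"}.items():
    print("IRQ", irq, "->", cpus_from_mask(mask))
```

With these inputs, IRQ 11 decodes to CPU 7 alone, while ffff covers all 16 CPUs.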

After a closer look at /proc/softirqs, I realised that the NET_TX, NET_RX, and TASKLET softirqs are unevenly distributed among the CPUs and are bombarding CPU7.

Every 2.0s: cat /proc/softirqs                                                                                                                          caliper-latest1: Sat May  2 06:25:44 2020

                    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15

          HI:          0          0          0          0          0          0          1          0          0          0          0          0          0          0          0          0
       TIMER:      21656      15833      14169      13228      13466      13059      13383      10565      16011      12679      16139      14041      12160      15680      14883      14232
      NET_TX:         14         15         12         16         20        477        106      79183         19         14         10         17          2         22         13          9
      NET_RX:       8195      11835      12037      12203      12227       9200       6719     116134       6855       6418       7086       8225       5721       7992      12477       8105
       BLOCK:       3646       2950       6351       2031       1354       1851       1814        689       3132       2434       6847       3007        570       4276       1149       1569
    IRQ_POLL:          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0          0
     TASKLET:          0          2          0          0          1         24          2        597         14          0          1          1          1          0          0          1
       SCHED:       7943       4241       3528       3305       3152       3138       2759       4318       3789       2955       3730       3298       2532       3483       3660       3383
     HRTIMER:          0          0          0          0          0          0          0          5          0          0          0          0          0          0          0          0
         RCU:      18913      10231      18753      10353      18891       8120      10553      14994      10600       6451      12809       9338       7204       9445      10576      10050
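
To put a number on the imbalance, the per-CPU counters can be reduced to the busiest CPU per softirq and its share of the row total. The Python sketch below does this for two rows copied from the output above; softirq_hotspots and SAMPLE are illustrative names, and on a live system the text would presumably come from reading /proc/softirqs instead.

```python
SAMPLE = """\
                    CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
      NET_TX:         14         15         12         16         20        477        106      79183         19         14         10         17          2         22         13          9
      NET_RX:       8195      11835      12037      12203      12227       9200       6719     116134       6855       6418       7086       8225       5721       7992      12477       8105
"""


def softirq_hotspots(text):
    """Map each softirq name to (busiest CPU index, its share of the row total)."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header and blank lines contain no colon
        counts = [int(token) for token in rest.split()]
        total = sum(counts)
        if total == 0:
            continue  # nothing recorded for this softirq yet
        busiest = max(range(len(counts)), key=counts.__getitem__)
        result[name.strip()] = (busiest, counts[busiest] / total)
    return result


for name, (cpu, share) in softirq_hotspots(SAMPLE).items():
    print(f"{name}: CPU{cpu} handled {share:.0%} of all events")
```

For the sample rows, CPU7 comes out as the busiest CPU for both NET_TX and NET_RX, with NET_TX being almost entirely concentrated there.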

And here is some system info:

root@caliper-latest1:~# inxi -Fxz
System:    Host: caliper-latest1 Kernel: 5.3.0-51-generic x86_64
           bits: 64 gcc: 7.5.0
           Console: tty 3 Distro: Ubuntu 18.04.4 LTS
Machine:   Device: kvm System: QEMU product: Standard PC (i440FX + PIIX 1996) v: pc-i440fx-2.8 serial: N/A
           Mobo: N/A model: N/A serial: N/A
           BIOS: SeaBIOS v: 1.10.2-1 date: 04/01/2014
CPU(s):    16 Single core QEMU Virtual version 2.5+s (-SMP-)
           arch: P6 II rev.3 cache: 262144 KB
           flags: (lm nx sse sse2 sse3) bmips: 85119
           clock speeds: max: 2659 MHz 1: 2659 MHz 2: 2659 MHz 3: 2659 MHz
           4: 2659 MHz 5: 2659 MHz 6: 2659 MHz 7: 2659 MHz 8: 2659 MHz
           9: 2659 MHz 10: 2659 MHz 11: 2659 MHz 12: 2659 MHz 13: 2659 MHz
           14: 2659 MHz 15: 2659 MHz 16: 2659 MHz
Graphics:  Card: Cirrus Logic GD 5446 bus-ID: 00:02.0
           Display Server: X.org 1.20.5 driver: cirrus
           tty size: 77x28 Advanced Data: N/A for root out of X
Network:   Card: Realtek RTL-8100/8101L/8139 PCI Fast Ethernet Adapter
           driver: 8139cp v: 1.3 port: c000 bus-ID: 00:03.0
           IF: ens3 state: up speed: 100 Mbps duplex: full mac: <filter>
Drives:    HDD Total Size: 32.2GB (24.1% used)
           ID-1: /dev/vda model: N/A size: 32.2GB
Partition: ID-1: / size: 29G used: 7.3G (26%) fs: ext4 dev: /dev/vda1
RAID:      No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors:   None detected - is lm-sensors installed and configured?
Info:      Processes: 249 Uptime: 15 min Memory: 492.3/13995.1MB
           Init: systemd runlevel: 5 Gcc sys: 7.5.0
           Client: Shell (bash 4.4.201) inxi: 2.3.56


So my suspicion is that the Realtek RTL-8100/8101L/8139 adapter (driver 8139cp) does not support spreading its interrupts across multiple cores/threads. Is that the case?