K3s-cluster // server role stuck in deploying status

Hi all,
During mu k3s cluster deployment, i have registered issue on server role which is stuck on deploying status, meanwhile noted that the corresponding VM is running and in ready status.
i check oneflow service logs which view exception as shown bellow :

oneadmin@node0:~$ journalctl -u opennebula-flow.service
Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal' can see all messages.
      Pass -q to turn off this notice.
-- Logs begin at Sun 2021-07-11 20:54:21 MDT, end at Tue 2021-07-13 06:09:41 MDT. --
Jul 11 20:54:39 node0.testlab.cord-testdrive-pg0.apt.emulab.net logrotate[8847]: error: error renaming temp state file /var/lib/one/.logrotate.status.tmp
Jul 11 20:54:40 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: == Sinatra (v2.1.0) has taken the stage on 2474 for development with backup from Thin
Jul 12 05:43:45 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: == Sinatra has ended his set (crowd applauds)
Jul 12 05:43:45 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: 2021-07-11 20:54:40 -0600 Thin web server (v1.8.0 codename Possessed Pickle)
Jul 12 05:43:45 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: 2021-07-11 20:54:40 -0600 Maximum connections set to 1024
Jul 12 05:43:45 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: 2021-07-11 20:54:40 -0600 Listening on 0.0.0.0:2474, CTRL+C to stop
Jul 12 05:43:45 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[8875]: 2021-07-12 05:43:45 -0600 Stopping ...
Jul 12 05:45:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: == Sinatra (v2.1.0) has taken the stage on 2474 for development with backup from Thin
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: #<Thread:0x00005617d6edb1f8@/usr/lib/one/ruby/ActionManager.rb:198 run> terminated with exception (report_on_exception is true):
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: /usr/lib/one/oneflow/lib/EventManager.rb:439:in `check_nodes_event': undefined local variable or method `vm_state' for #<EventManager:0x00005617d68df1
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: Did you mean?  lcm_state
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]:                h_state
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]:         from /usr/lib/one/oneflow/lib/EventManager.rb:372:in `wait_report_ready'
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]:         from /usr/lib/one/oneflow/lib/EventManager.rb:107:in `wait_deploy_action'
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]:         from /usr/lib/one/ruby/ActionManager.rb:200:in `call'
Jul 12 22:23:27 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]:         from /usr/lib/one/ruby/ActionManager.rb:200:in `block in run_action'
Jul 13 05:53:10 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: == Sinatra has ended his set (crowd applauds)
Jul 13 05:53:10 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: 2021-07-12 05:45:27 -0600 Thin web server (v1.8.0 codename Possessed Pickle)
Jul 13 05:53:10 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: 2021-07-12 05:45:27 -0600 Maximum connections set to 1024
Jul 13 05:53:10 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: 2021-07-12 05:45:27 -0600 Listening on 0.0.0.0:2474, CTRL+C to stop
Jul 13 05:53:10 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[9614]: 2021-07-13 05:53:10 -0600 Stopping ...
Jul 13 05:53:11 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: == Sinatra (v2.1.0) has taken the stage on 2474 for development with backup from Thin
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: #<Thread:0x000055b19e30ac00@/usr/lib/one/ruby/ActionManager.rb:198 run> terminated with exception (report_on_exception is true):
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: /usr/lib/one/oneflow/lib/EventManager.rb:439:in `check_nodes_event': undefined local variable or method `vm_state' for #<EventManager:0x000055b19e1ff
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: Did you mean?  lcm_state
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:                h_state
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/oneflow/lib/EventManager.rb:372:in `wait_report_ready'
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/oneflow/lib/EventManager.rb:107:in `wait_deploy_action'
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/ruby/ActionManager.rb:200:in `call'
Jul 13 05:58:57 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/ruby/ActionManager.rb:200:in `block in run_action'
Jul 13 06:09:41 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: #<Thread:0x000055b19e1fe370@/usr/lib/one/oneflow/lib/LifeCycleManager.rb:95 run> terminated with exception (report_on_exception is true):
Jul 13 06:09:41 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]: /usr/lib/one/oneflow/lib/ServiceAutoScaler.rb:46:in `block in start': undefined method `[]' for nil:NilClass (NoMethodError)
Jul 13 06:09:41 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/oneflow/lib/ServiceAutoScaler.rb:37:in `loop'
Jul 13 06:09:41 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/oneflow/lib/ServiceAutoScaler.rb:37:in `start'
Jul 13 06:09:41 node0.testlab.cord-testdrive-pg0.apt.emulab.net ruby[25248]:         from /usr/lib/one/oneflow/lib/LifeCycleManager.rb:99:in `block in initialize'

VM instance :

oneadmin@node0:~$ onevm list
  ID USER     GROUP    NAME                                                                                                     STAT  CPU     MEM HOST                                                                        TIME
  34 oneadmin oneadmin server_0_(service_22)                                                                                    runn    1      2G node3                                                                   0d 00h50

Service visualize :

oneadmin@node0:~$ oneflow show 22
SERVICE 22 INFORMATION                                                          
ID                  : 22                  
NAME                : k3s-cluster         
USER                : oneadmin            
GROUP               : oneadmin            
STRATEGY            : straight            
SERVICE STATE       : DEPLOYING           
START TIME          : 07/13 05:54:54      

PERMISSIONS                                                                     
OWNER               : um-                 
GROUP               : ---                 
OTHER               : ---                 

ROLE server
ROLE STATE          : DEPLOYING           
VM TEMPLATE         : 3                   
CARDINALITY         : 1                   
MIN VMS             : 1                   
MAX VMS             : 1                   

NODES INFORMATION
 VM_ID NAME                     USER            GROUP          
    34 server_0_(service_22)    oneadmin        oneadmin

ROLE agent
ROLE STATE          : PENDING             
PARENTS             : server              
VM TEMPLATE         : 3                   
CARDINALITY         : 1                   
MIN VMS             : 1                   
MAX VMS             : 10                  

NODES INFORMATION
 VM_ID NAME                     USER            GROUP          

LOG MESSAGES                                                                    
07/13/21 05:54 [I] New state: DEPLOYING

Hi @htwalid

It seems that the server didn’t send the READY status and so oneflow will not deploy the agent VM.

Can you ssh to the server VM and execute the following:

onegate vm update --data READY=YES

and see if the server status change?

Thanks

Hi Marco,
thanks for your fast reply :slight_smile:
Unfortunately i couldn’t access server VM through ssh and i found errors in fireedge logs related to guacamole connections :

[2021-07-Jul 12, 2021 21:49:31] [Connection 1]  Closing connection with error:  1000
[2021-07-Jul 12, 2021 22:21:37] [Connection 2]  Closing connection with error:  1000
[2021-07-Jul 13, 2021 20:02:58] [Connection 3]  Closing connection with error:  1000
[2021-07-Jul 13, 2021 20:03:02] [Connection 4]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:21:31] [Connection 1]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:21:55] [Connection 2]  Closing connection with error:  Error: not opened
    at WebSocket.send (/usr/lib/one/fireedge/node_modules/ws/lib/WebSocket.js:218:38)
    at ClientConnection.send (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/ClientConnection.js:126:24)
    at GuacdClient.sendBufferToWebSocket (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/GuacdClient.js:172:35)
    at GuacdClient.processReceivedData (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/GuacdClient.js:163:14)
    at Socket.emit (events.js:314:20)
    at Socket.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at Socket.Readable.push (_stream_readable.js:213:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23)
[2021-07-Jul 14, 2021 02:24:57] [Connection 3]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:27:26] [Connection 4]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:43:16] [Connection 5]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:44:56] [Connection 6]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:45:35] [Connection 7]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:48:08] [Connection 8]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:48:11] [Connection 9]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 02:53:31] [Connection 10]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:00:16] [Connection 11]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:01:08] [Connection 12]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:01:56] [Connection 13]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:02:13] [Connection 14]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:24:52] [Connection 15]  Closing connection with error:  1000
[2021-07-Jul 14, 2021 03:44:54] [Connection 1]  Closing connection with error:  Error: not opened
    at WebSocket.send (/usr/lib/one/fireedge/node_modules/ws/lib/WebSocket.js:218:38)
    at ClientConnection.send (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/ClientConnection.js:126:24)
    at GuacdClient.sendBufferToWebSocket (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/GuacdClient.js:172:35)
    at GuacdClient.processReceivedData (/usr/lib/one/fireedge/node_modules/opennebula-guacamole/lib/GuacdClient.js:163:14)
    at Socket.emit (events.js:314:20)
    at Socket.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at Socket.Readable.push (_stream_readable.js:213:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23)
[2021-07-Jul 14, 2021 03:45:08] [Connection 2]  Closing connection with error:  1000