Reinstalled custom pack is not getting rendered on the Web UI #315

Open
Bindu-yadav8 opened this issue Jun 6, 2022 · 31 comments
Labels
question Further information is requested

Comments

@Bindu-yadav8

We uninstalled one of our custom packs and reinstalled it using the command st2 pack install (file path of all the actions, rules, and workflows).

Running this command installed the files on the K8s cluster. However, the workflows and actions are not rendering properly in the StackStorm Web UI; the pack still shows the older actions even after it was uninstalled.

Uninstalled the custom pack:
[screenshot]

Reinstalled the same custom pack:
[screenshot]

We are not able to see the new actions in the Web UI of the K8s instance, and the "edit parameters" and "executions" options are not visible on the right-hand side of the Actions tab:
[screenshot]

Is this a known issue? How can we enable the options on the right-hand side of the Actions tab?

@Bindu-yadav8
Author

@armab Could you please look into this issue?

@cognifloyd
Member

You have not answered the question here:
#313 (comment)

This issue and #313 probably have the same cause. To help you, we need to know how you have configured packs. In particular, what are your values for st2.packs.volumes and st2.packs.images?

@Bindu-yadav8
Author

@cognifloyd @armab We have used Shared Volumes to install the custom packs as per the documentation.
https://docs.stackstorm.com/install/k8s_ha.html#custom-st2-packs

[screenshot]

One of the API calls is returning a 404 response. We have not used any custom APIs apart from the default StackStorm APIs.

@Bindu-yadav8
Author

[email protected] - Please look into this

@Bindu-yadav8
Author

[screenshot]
While trying to run the action from the custom pack, it gives an error. However, the action from the built-in pack "core" executes without issue.

st2 run cnas.trigger_notification is the action from the custom pack cnas.

st2 run core.local cmd="echo 'OK'" works without any error.

@cognifloyd
Member

Again: please show me your values. Feel free to replace any secrets with ****.

@amanda11

amanda11 commented Jun 7, 2022

Looking at the command you are running, you need to be careful about your use of '.

It looks like you are trying to send a message parameter containing a JSON string of {'title':'..', ... }. But because your outer quote is the same type as the inner ones, it all gets a bit messed up.

You get away with it on the core.local call because sending cmd='echo 'OK'' is actually treated as the concatenation of strings:

  • 'echo '
  • OK
  • ''

Hence why, in the parameters to the first call, it says the cmd passed was echo OK rather than echo 'OK'.

So I'd suggest using "" as the outer quotes, so that the single quotes are kept.
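
For example, something along these lines (the message value here is just an illustrative placeholder, not your real payload):

# Outer double quotes let the inner single quotes survive shell parsing
st2 run cnas.trigger_notification message="{'title': 'Test', 'text': 'Hello'}"

# With single outer quotes, the shell concatenates 'echo ' + OK + ''
# and the action receives cmd=echo OK (the inner quotes are lost)
st2 run core.local cmd='echo 'OK''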

@Bindu-yadav8
Author

Again: please show me your values. Feel free to replace any secrets with ****.

@cognifloyd Here is the values.yaml file for your reference below. We have used shared volumes for the custom pack. Initially, when we tried to install the custom pack CNAS, the pack and its underlying actions/workflows got registered. However, when we try to update the pack (by uninstalling the previous pack and reinstalling the cnas pack), the files are not rendered properly on the UI; it still shows the older files. You can see the screenshots above.

[screenshot]
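
For reference, a PVC-based shared-volumes section of values.yaml for this chart looks roughly like the sketch below (the claim names are placeholders, not necessarily the actual ones from our file):

st2:
  packs:
    volumes:
      enabled: true
      packs:
        persistentVolumeClaim:
          claimName: st2-packs-pvc          # placeholder claim name
      virtualenvs:
        persistentVolumeClaim:
          claimName: st2-virtualenvs-pvc    # placeholder claim name
      configs:
        persistentVolumeClaim:
          claimName: st2-configs-pvc        # placeholder claim name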

@Bindu-yadav8
Author

So I'd suggest using "" as the outer quotes, so that the single quotes are kept.

@amanda11 Thanks for the suggestion. Yes, I gave the parameter with double quotes and tried executing the action. But it failed.

[screenshot]

@cognifloyd
Member

Ok, so you are using persistentVolumeClaims for the packs and virtualenvs volumes. Could you show the values for the virtualenvs volume and the configs volume as well?

Can you also share your persistentVolume resource definition? (You would have created this outside the chart.)

@Bindu-yadav8
Author

@cognifloyd Here are the values for the virtualenvs volume and the configs volume:

[screenshot]

Also, the persistentVolume resource definition outside the chart:

[screenshot]

@cognifloyd
Member

Access Mode: RWO is probably the source of your issues.

RWO, or ReadWriteOnce: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

the volume can be mounted as read-write by a single node. ReadWriteOnce access mode still can allow multiple pods to access the volume when the pods are running on the same node.

Are all of your pods on the same node? If not, then only the pods on the node with the read/write mount will be able to successfully modify the files. Plus, the read/write mounts for virtualenvs and packs could end up on different nodes, leading to partially successful pack installs.

At a minimum, you need a PVC+PV with ReadWriteMany access mode.
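
A minimal sketch of such a claim (the name and storage class are placeholders; the class must map to storage that actually supports RWX):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: st2-packs-pvc            # placeholder
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client   # placeholder; must be backed by RWX-capable storage
  resources:
    requests:
      storage: 1Gi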

@arm4b arm4b added the question Further information is requested label Jun 9, 2022
@arm4b
Member

arm4b commented Jun 13, 2022

Good find @cognifloyd 💯

While https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes mentions the use of NFS shares as an example (which provide RWX capabilities under the hood), it looks like it would make sense to explicitly mention the ReadWriteMany requirement in the README for this way of sharing pack content.

@Bindu-yadav8 could you please create a PR against https://github.com/StackStorm/stackstorm-k8s#method-2-shared-volumes and mention that somewhere? It would definitely help others.

@Bindu-yadav8
Author

@cognifloyd @armab We tried to delete the existing PVC in order to reapply the changes, but the PVC is stuck in the "Terminating" state, as shown below:

We also checked for volume attachments and tried to patch the PVC, hoping that would fix the issue, but it didn't work.

[screenshot: MicrosoftTeams-image (13)]
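
For reference, the usual way to investigate a PVC stuck in Terminating is roughly the following (names in angle brackets are placeholders; the finalizer patch is a last resort, only once nothing mounts the volume any more):

# Inspect what is holding the claim
kubectl get pvc <pvc-name> -o jsonpath='{.metadata.finalizers}'
kubectl get volumeattachment | grep <pv-name>

# Last resort once no pod or attachment still uses the volume
kubectl patch pvc <pvc-name> -p '{"metadata":{"finalizers":null}}'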

@arm4b
Member

arm4b commented Jun 13, 2022

@Bindu-yadav8 Do you have a storage class backed by the infrastructure that supports ReadWriteMany?
Many means it's a distributed file system shared across the nodes; it's not just a usual volume. You need something like Ceph or NFS or similar.

See the table here with different providers: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
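
For example, a statically provisioned NFS-backed PersistentVolume can offer ReadWriteMany; this is only a sketch, with the server address and export path as placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: st2-packs-pv             # placeholder
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10            # placeholder NFS server address
    path: /exports/st2/packs     # placeholder export path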

@arms11
Contributor

arms11 commented Jun 13, 2022

@armab good catch.
In the screenshot above I see this is Azure Disk. As per the link you posted, it does not offer that capability, so an appropriate StorageClass will need to be configured for the PVC for this to work.

@Bindu-yadav8
Author

@armab We are using Azure Disk

[screenshot: MicrosoftTeams-image (14)]

@arms11
Contributor

arms11 commented Jun 14, 2022

@Bindu-yadav8 as suggested, please consider using a different StorageClass that supports ReadWriteMany.

Perhaps this could help:
https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
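
As a rough sketch, on AKS a PVC against the built-in Azure Files storage class can request ReadWriteMany (the exact class name, e.g. azurefile or azurefile-csi, depends on your cluster version):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: st2-packs-pvc            # placeholder
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile    # or azurefile-csi, depending on the AKS version
  resources:
    requests:
      storage: 5Gi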

@Bindu-yadav8
Author

@armab @cognifloyd We need your assistance in deleting the existing PVC, since it has been stuck in the Terminating state for a long time now.

[screenshot: MicrosoftTeams-image (13)]

@Bindu-yadav8
Author

Hi @armab @cognifloyd,

We have added NFS shares for packs, virtualenvs, and configs. We connected to the "st2client" pod, cloned our custom pack repository inside it, and then ran the following command inside the client pod to install the custom pack:

stackstorm-ha-st2client-pod- st2 pack install file://custompack

However, we are seeing the below error:

stderr: "st2.actions.python.St2RegisterAction: DEBUG Calling client method "register" with kwargs "{'types': ['all'], 'packs': ['cnas']}"
Traceback (most recent call last):
File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 395, in
obj.run()
File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 214, in run
output = action.run(**self._parameters)
File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 76, in run
result = self._run_client_method(
File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 155, in _run_client_method
result = method(**method_kwargs)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 45, in decorate
return func(*args, **kwargs)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 684, in register
self.handle_error(response)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 218, in handle_error
response.raise_for_status()
File "/opt/stackstorm/st2/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request
MESSAGE: Failed to register pack "cnas": Pack "/opt/stackstorm/packs/cnas" is missing pack.yaml file for url: http://stackstorm-ha-st2api:9101/v1/packs/register
"
stdout: ''

[screenshot: custom pack install error]

Even when we place the custom pack in the shared NFS location and do a POST call to the StackStorm API to install the pack, we see the same error. Changes in the custom pack are not reflected in the Web UI; it still shows the old installation.

API: POST METHOD- https://api.stackstorm.com/api/v1/packs/#/packs_controller.install.post

@cognifloyd
Member

The key piece of that error message is:

MESSAGE: Failed to register pack "cnas": Pack "/opt/stackstorm/packs/cnas" is missing pack.yaml 

Is there a pack.yaml file in /opt/stackstorm/packs/cnas?
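
You can check from outside the pod with something like this (the pod name is a placeholder):

kubectl exec -it <stackstorm-ha-st2api-pod> -- ls -la /opt/stackstorm/packs/cnas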

@Kapildev2018

@cognifloyd we had the pack.yaml before issuing the st2 pack install command; after issuing the command, we see that pack.yaml is no longer there.

@arm4b
Member

arm4b commented Sep 16, 2022

I see you used st2 pack install file://custompack based on a local pack.
Do you see the same issue if you install a pack from the official StackStorm Exchange, or maybe from a custom git repository?
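
For example (the git URL is just a placeholder):

# Pack from the official StackStorm Exchange
st2 pack install email

# Pack from a custom git repository
st2 pack install https://github.com/your-org/your-pack.git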

@Kapildev2018

Kapildev2018 commented Sep 16, 2022

I see you used st2 pack install file://custompack based on a local pack. Do you see the same issue if you install pack from the official stackstorm exchange or maybe using a custom git repository?

Yes, when I tried to install the email pack from the StackStorm Exchange, the error below came up.

2022-09-16 13:36:22,808 ERROR [-] [63247bb3e1a857d29fce672c] Workflow execution completed with errors.
2022-09-16 13:36:22,815 ERROR [-] [63247bb3e1a857d29fce672c] {'type': 'error', 'message': 'Execution failed. See result for details.', 'task_id': 'register_pack', 'result': {'stdout': '', 'stderr': 'st2.actions.python.St2RegisterAction: DEBUG    Calling client method "register" with kwargs "{\'types\': [\'all\'], \'packs\': [\'email\']}"\nTraceback (most recent call last):\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 395, in <module>\n    obj.run()\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/python_runner/python_action_wrapper.py", line 214, in run\n    output = action.run(**self._parameters)\n  File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 76, in run\n    result = self._run_client_method(\n  File "/opt/stackstorm/packs/packs/actions/pack_mgmt/register.py", line 155, in _run_client_method\n    result = method(**method_kwargs)\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 45, in decorate\n    return func(*args, **kwargs)\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 684, in register\n    self.handle_error(response)\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2client/models/core.py", line 218, in handle_error\n    response.raise_for_status()\n  File "/opt/stackstorm/st2/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status\n    raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 400 Client Error: Bad Request\nMESSAGE: Failed to register pack "email": Pack "/opt/stackstorm/packs/email" is missing pack.yaml file for url: http://stackstorm-ha-st2api:9101/v1/packs/register\n', 'exit_code': 1, 'result': 'None'}}

@arm4b
Member

arm4b commented Sep 19, 2022

Yeah, same message with missing pack.yaml.

Can you check the contents of /opt/stackstorm/packs/email to see whether pack.yaml is present or not?
Also verify that the contents of that directory are in sync across all the pods, like st2api and st2actionrunner.
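
Something along these lines should show whether the directory is in sync across pods (the name filter is an assumption; adjust it to your release):

for pod in $(kubectl get pods -o name | grep -E 'st2api|st2actionrunner'); do
  echo "== $pod"
  kubectl exec "$pod" -- ls /opt/stackstorm/packs/email
done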

@Kapildev2018

@armab & @cognifloyd we see default packs in st2api, st2actionrunner & st2 client pods at /opt/stackstorm/packs. But no email or cnas folder there.

We see some log like amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue 'st2.preinit' in vhost '/' is down or inaccessible
.

But when we did a curl from client pod to rabitmq endpoint we got rabitmq management page. Please suggest.
[screenshots: email_Pack, StackstormClient, AllPods, WebInterface, Rabitmq]

@arm4b
Member

arm4b commented Sep 20, 2022

Check the RabbitMQ logs for any errors.
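
For example, something like this (pod name taken from your earlier output; adjust as needed):

kubectl logs stackstorm-ha-rabbitmq-0 | grep -iE 'error|crash'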

Also, could you provide more information about your K8s cluster?
The version, setup, resources behind the cluster. Is it a bare-metal or cloud environment?

@Kapildev2018

Kapildev2018 commented Sep 22, 2022

@armab & @cognifloyd, we checked the RabbitMQ logs & cluster status. Cluster status seems running from all the 3 pods, ping & erlang_cookie_sources listed too. But log in the rabbitmq pods has below errors.

Pod Logs=======================

** Last message in was emit_stats
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},true,false,none,[],<0.1770.0>,[],[],[],[{vhost,<<"/">>},{name,<<"ha">>},{pattern,<<>>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>},{<<"ha-sync-batch-size">>,10},{<<"ha-sync-mode">>,<<"automatic">>}]},{priority,0}],undefined,[{<0.1773.0>,<0.1770.0>}],[],live,0,[],<<"/">>,#{user => <<"admin">>},rabbit_classic_queue,#{}},none,true,rabbit_mirror_queue_master,{state,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},<0.1773.0>,<0.1772.0>,rabbit_priority_queue,{passthrough,rabbit_variable_queue,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},{0,{[],[]}},{0,{[],[]}},0,{0,nil},{0,nil},{0,nil},{qistate,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/6OG9XBWUJYQSYMSWOI75W46QV",{#{},[]},undefined,0,32768,#Fun<rabbit_variable_queue.10.100744432>,#Fun<rabbit_variable_queue.11.100744432>,{0,nil},{0,nil},[],[],{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>}},{{client_msstate,<0.519.0>,<<103,101,208,145,209,39,162,172,115,108,112,70,114,234,217,68>>,#{},{state,#Ref<0.1326239266.4150919169.134558>,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent"},rabbit_msg_store_ets_index,"/bitnami/rabbitmq/mnesia/rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent",<0.522.0>,#Ref<0.1326239266.4150919169.134559>,#Ref<0.1326239266.4150919169.134555>,#Ref<0.1326239266.4150919169.134560>,#Ref<0.1326239266.4150919169.134561>,{4000,800}},{client_msstate,<0.515.0>,<<63,232,113,194,157,163,163,40,156,37,...>>,...}},...}},...},...}
** Reason for termination ==
** {{badmatch,{error,not_found}},[{rabbit_amqqueue_process,i,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1151}]},{rabbit_amqqueue_process,'-infos/2-fun-0-',4,[{file,"src/rabbit_amqqueue_process.erl"},{line,1070}]},{lists,foldr,3,[{file,"lists.erl"},{line,1276}]},{rabbit_amqqueue_process,emit_stats,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1177}]},{rabbit_amqqueue_process,handle_info,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1683}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
2022-09-19 10:59:00.978 [error] <0.4526.5> Restarting crashed queue '**st2.sensor.watch.sensor_container**-1640d834e2' in vhost '/'.
2022-09-19 10:59:00.978 [info] <0.1770.0> [{initial_call,{rabbit_prequeue,init,['Argument__1']}},{pid,<0.1770.0>},{registered_name,[]},{error_info,{exit,{{badmatch,{error,not_found}},[{rabbit_amqqueue_process,i,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1151}]},{rabbit_amqqueue_process,'-infos/2-fun-0-',4,[{file,"src/rabbit_amqqueue_process.erl"},{line,1070}]},{lists,foldr,3,[{file,"lists.erl"},{line,1276}]},{rabbit_amqqueue_process,emit_stats,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1177}]},{rabbit_amqqueue_process,handle_info,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1683}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]},[{gen_server2,terminate,3,[{file,"src/gen_server2.erl"},{line,1183}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}},{ancestors,[<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{messages,[]},{links,[<0.1772.0>,<0.1769.0>]},{dictionary,[{process_name,{rabbit_amqqueue_process,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>}}},{guid,{{1072198082,2644747048,2619673130,4006597486},1}},{rand_seed,{#{jump => #Fun<rand.3.8986388>,max => 288230376151711743,next => #Fun<rand.2.8986388>,type => exsplus},[14465161831692881|57586813653482372]}},{{ch,<0.1767.0>},{cr,<0.1767.0>,#Ref<0.1326239266.4150788097.144685>,{0,{[],[]}},1,{queue,[],[],0},{qstate,<0.1766.0>,dormant,{0,nil}},0}}]},{trap_exit,true},{status,running},{heap_size,2586},{stack_size,27},{reductions,22278}], [{neighbour,[{pid,<0.1773.0>},{registered_name,[]},{initial_call,{gm,init,['Argument__1']}},{current_function,{gen_server2,process_next_msg,1}},{ancestors,[<0.1772.0>,<0.1770.0>,<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{links,[<0.1772.0>]},{trap_exit,false},{status,waiting},{heap_size,376},{stack_size,9},{reductions,76511681},{current_stacktrace,[{gen_server2,process_next_msg,1,[{file,"src/gen_server2.erl"},{line,685}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}]},{neighbour,[{pid,<0.1772.0>},{registered_name,[]},{initial_call,{rabbit_mirror_queue_coordinator,init,['Argument__1']}},{current_function,{erlang,hibernate,3}},{ancestors,[<0.1770.0>,<0.1769.0>,<0.526.0>,<0.510.0>,<0.509.0>,rabbit_vhost_sup_sup,rabbit_sup,<0.269.0>]},{message_queue_len,0},{links,[<0.1770.0>,<0.1773.0>]},{trap_exit,false},{status,waiting},{heap_size,240},{stack_size,0},{reductions,339},{current_stacktrace,[]}]}]
2022-09-19 10:59:00.978 [error] <0.1770.0> **CRASH REPORT Process** <0.1770.0> with 2 neighbours exited with reason: no match of right hand value {error,not_found} in rabbit_amqqueue_process:i/2 line 1151 in gen_server2:terminate/3 line 1183
2022-09-19 10:59:00.979 [error] <0.1769.0> Supervisor {<0.1769.0>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"/">>,queue,<<"st2.sensor.watch.sensor_container-1640d834e2">>},true,false,...}, declare, <0.1768.0>) at <0.1770.0> exit with reason no match of right hand value {error,not_found} in rabbit_amqqueue_process:i/2 line 1151 in context child_terminated
2022-09-19 10:59:00.979 [error] <0.1761.0> Error on AMQP connection <0.1761.0> (10.244.1.20:45060 -> 10.244.0.39:5672, vhost: '/', user: 'admin', state: running), channel 1:
 {{{{badmatch,{error,not_found}},
   [{rabbit_amqqueue_process,i,2,
                             [{file,"src/rabbit_amqqueue_process.erl"},
                              {line,1151}]},
    {rabbit_amqqueue_process,'-infos/2-fun-0-',4,
                             [{file,"src/rabbit_amqqueue_process.erl"},
                              {line,1070}]},
    {lists,foldr,3,[{file,"lists.erl"},{line,1276}]},
    {rabbit_amqqueue_process,emit_stats,2,
                             [{file,"src/rabbit_amqqueue_process.erl"},
                              {line,1177}]},
    {rabbit_amqqueue_process,handle_info,2,
                             [{file,"src/rabbit_amqqueue_process.erl"},
                              {line,1683}]},
    {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},
    {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]},
  {gen_server2,call,
               [<0.1770.0>,
                {basic_cancel,<0.1767.0>,<<"None1">>,
                              {'basic.cancel_ok',<<"None1">>},
                              <<"admin">>},
                infinity]}},
 [{gen_server2,call,3,[{file,"src/gen_server2.erl"},{line,346}]},
  {rabbit_amqqueue,basic_cancel,6,
                   [{file,"src/rabbit_amqqueue.erl"},{line,1770}]},
  {rabbit_misc,with_exit_handler,2,[{file,"src/rabbit_misc.erl"},{line,532}]},
  {rabbit_channel,handle_method,3,
                  [{file,"src/rabbit_channel.erl"},{line,1541}]},
  {rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,630}]},
  {gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},
  {proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}
2022-09-19 10:59:00.979 [error] <0.1767.0> ** Generic server <0.1767.0> terminating

Cluster Status========================================
$ rabbitmq-diagnostics check_running
Checking if RabbitMQ is running on node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local ...
RabbitMQ on node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local is fully booted and running
$ rabbitmq-diagnostics ping
Will ping rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Ping succeeded
$ rabbitmqctl cluster_status
Cluster status of node rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local ...
Basics

Cluster name: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local

Disk Nodes

rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local

Running Nodes

rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local

Versions

rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3
rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3
rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local: RabbitMQ 3.8.9 on Erlang 22.3

Maintenance status

Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-1.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@stackstorm-ha-rabbitmq-2.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: drop_unroutable_metric, state: disabled
Flag: empty_basic_get_metric, state: disabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: virtual_host_metadata, state: enabled
$ rabbitmq-diagnostics erlang_cookie_sources
Listing Erlang cookie sources used by CLI tools...
Cookie File

Effective user: (none)
Effective home directory: /opt/bitnami/rabbitmq/.rabbitmq
Cookie file path: /opt/bitnami/rabbitmq/.rabbitmq/.erlang.cookie
Cookie file exists? true
Cookie file type: regular
Cookie file access: read
Cookie file size: 41

Cookie CLI Switch

--erlang-cookie value set? false
--erlang-cookie value length: 0

Env variable  (Deprecated)

RABBITMQ_ERLANG_COOKIE value set? false
RABBITMQ_ERLANG_COOKIE value length: 0

We are using a K8s cluster in Azure; the Kubernetes version is 1.22.6, with 6 nodes running Ubuntu 18.04.6 LTS and kernel version 5.4.0-1089-azure.

Please suggest.

@cognifloyd
Member

Can you share your rabbitmq helm values?

@Kapildev2018

Kapildev2018 commented Sep 26, 2022

Can you share your rabbitmq helm values?

@cognifloyd & @armab ,

We have the following Helm values/configuration for RabbitMQ. We previously had some of these values commented out; we uncommented them and upgraded the StackStorm deployment, but we still have the same issue: pack installation is not progressing.

## RabbitMQ configuration (3rd party chart dependency)
##
## For values.yaml reference:
## https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq
##
rabbitmq:
  # Change to `false` to disable in-cluster rabbitmq deployment.
  # Specify your external [messaging] connection parameters under st2.config
  enabled: true
  clustering:
    # On unclean cluster restarts forceBoot is required to cleanup Mnesia tables (see: https://github.com/helm/charts/issues/13485)
    # Use it only if you prefer availability over integrity.
    forceBoot: true
  # Authentication Details
  auth:
    username: admin
    # TODO: Use default random 10 character password, but need to fetch this string for use by downstream services
    password: 9jS+w1u07NbHtZke1m+jW4Cj
    # Up to 255 character string, should be fixed so that re-deploying the chart does not fail (see: https://github.com/helm/charts/issues/12371)
    # NB! It's highly recommended to change the default insecure rabbitmqErlangCookie value!
    erlangCookie: 8MrqQdCQ6AQ8U3MacSubHE5RqkSfvNaRHzvxuFcG
  # RabbitMQ Memory high watermark. See: http://www.rabbitmq.com/memory.html
  # Default values might not be enough for StackStorm deployment to work properly. We recommend to adjust these settings for you needs as well as enable Pod memory limits via "resources".
  rabbitmqMemoryHighWatermark: 512MB
  rabbitmqMemoryHighWatermarkType: absolute
  persistence:
    enabled: true
  # Enable Queue Mirroring between nodes
  # See https://www.rabbitmq.com/ha.html
  # This code block is commented out waiting for
  # https://github.com/bitnami/charts/issues/4635
  loadDefinition:
    enabled: true
    existingSecret: "{{ .Release.Name }}-rabbitmq-definitions"
  extraConfiguration: |
    load_definitions = /app/rabbitmq-definitions.json
  # We recommend to set the memory limit for RabbitMQ-HA Pods in production deployments.
  # Make sure to also change the rabbitmqMemoryHighWatermark following the formula:
  #   rabbitmqMemoryHighWatermark = 0.4 * resources.limits.memory
  resources: {}
  # number of replicas in the rabbit cluster
  replicaCount: 3
  # As RabbitMQ enabled prometheus operator monitoring by default, disable it for non-prometheus users
  metrics:
    enabled: false

=========================================================
Some logs of stackstorm

2022-09-25 05:18:12,846 INFO [-] 99e478fd-ee37-4b06-9aec-d91772c95dec - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='99e478fd-ee37-4b06-9aec-d91772c95dec')
2022-09-25 05:18:12,847 INFO [-] 99e478fd-ee37-4b06-9aec-d91772c95dec - 404 50 0.635ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.635,content_length=50,request_id='99e478fd-ee37-4b06-9aec-d91772c95dec')
2022-09-25 05:25:48,426 INFO [-] 3fd71e07-480e-42b6-8806-e9c92fe42e5e - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='3fd71e07-480e-42b6-8806-e9c92fe42e5e')
2022-09-25 05:25:48,426 INFO [-] 3fd71e07-480e-42b6-8806-e9c92fe42e5e - 404 50 0.579ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.579,content_length=50,request_id='3fd71e07-480e-42b6-8806-e9c92fe42e5e')
2022-09-25 05:37:48,587 INFO [-] ad61af77-a62e-4ee4-9844-a09dfdc4268a - GET / with query={} (method='GET',path='/',remote_addr='10.244.4.4',query={},request_id='ad61af77-a62e-4ee4-9844-a09dfdc4268a')
2022-09-25 05:37:48,588 INFO [-] ad61af77-a62e-4ee4-9844-a09dfdc4268a - **404** 50 0.531ms (method='GET',path='/',remote_addr='10.244.4.4',status=404,runtime=0.531,content_length=50,request_id='ad61af77-a62e-4ee4-9844-a09dfdc4268a')

2022-09-26 05:17:02,834 INFO [-] Sleeping for 600 seconds before next garbage collection...
  File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/connection.py", line 535, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
    listener(*args)
  File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/channel.py", line 277, in _on_close
    raise error_for_code(
amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue 'st2.preinit' in vhost '/' is down or inaccessible
2

2022-09-23 11:56:33,564 INFO [-] The status of action execution is changed from requested to scheduled. <LiveAction.id=632d9ef128e8de20488a169e, ActionExecution.id=632d9ef128e8de20488a169f>
    return self.channels[channel_id].dispatch_method(
  File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/abstract_channel.py", line 143, in dispatch_method
    listener(*args)
  File "/opt/stackstorm/st2/lib/python3.8/site-packages/amqp/channel.py", line 277, in _on_close
    raise error_for_code(
amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - home node 'rabbit@stackstorm-ha-rabbitmq-0.stackstorm-ha-rabbitmq-headless.hasstorm.svc.cluster.local' of durable queue **'st2.preinit' in vhost '/' is down or inaccessible**
2

@Kapildev2018

Hi @cognifloyd & @armab, is RabbitMQ helm values fine or any change required, can you please suggest us.
