Published events are pending in the stream

Symptom

You publish events, but some of them are not received by the subscriber and stay pending in the stream.

Cause

When the NATS EventingBackend has more than one replica and the Clustering property on the NATS Server is enabled, a leader election takes place at the stream and consumer levels (see NATS Documentation). Once a leader is elected, all messages are replicated across the replicas.

Sometimes a replica goes out of sync with the other replicas. As a result, messages on some consumers can stop being acknowledged and start piling up in the stream.

Remedy

There are two ways to fix the "broken" consumers with pending messages: trigger a leader reelection either on the consumers with pending messages or on the stream level.

Trigger consumer leader election

First, find out which consumers have pending messages. For that, you need the latest version of the NATS CLI installed on your machine. You can find the broken consumers in two ways: with the Grafana dashboard or directly with a NATS CLI command.

Find the broken consumers using the Grafana dashboard

Access and Expose Grafana
Find the NATS JetStream Dashboard and check the pending messages
Find the consumer with pending messages and encode it as an md5 hash:

echo -n "tunas-testing/test-noapp3/kyma.noapp.order.created.v1" | md5

This shell command prints ebcabfe5c902612f0ba3ebde7653f30b, which is the name of the consumer.
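The md5 command is typically available on macOS and BSD systems, while most Linux distributions ship md5sum instead. The following is a minimal sketch of a helper that works with either tool; the function name consumer_hash is hypothetical and not part of any Kyma or NATS tooling.

```shell
# Hypothetical helper: print the MD5 hex digest of the first argument.
# Uses md5sum (GNU coreutils) if it is installed, otherwise falls back to md5.
consumer_hash() {
  if command -v md5sum >/dev/null 2>&1; then
    # md5sum prints "<hash>  -" for stdin input; keep only the hash field.
    printf '%s' "$1" | md5sum | cut -d' ' -f1
  else
    # md5 prints only the hash when it reads from stdin.
    printf '%s' "$1" | md5
  fi
}

consumer_hash "tunas-testing/test-noapp3/kyma.noapp.order.created.v1"
```

printf '%s' is used instead of echo -n because it does not append a trailing newline in any POSIX shell, so the hash is computed over the exact string.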

Then, find the consumer's leader:

nats consumer info sap ebcabfe5c902612f0ba3ebde7653f30b

Information for Consumer sap > ebcabfe5c902612f0ba3ebde7653f30b created 2022-10-24T15:49:43+02:00
Configuration:
                Name: ebcabfe5c902612f0ba3ebde7653f30b
         Description: tunas-testing/test-noapp3/kyma.noapp.order.created.v1
          ...
Cluster Information:
                Name: eventing-nats
              Leader: eventing-nats-1 # that's what we need
             Replica: eventing-nats-0, current, seen 0.96s ago
             Replica: eventing-nats-2, current, seen 0.96s ago

The output shows that the consumer's leader is the eventing-nats-1 replica.

Find the broken consumers using the NATS CLI

With the NATS CLI and jq installed on your machine, run the following shell script:

for consumer in $(nats consumer list -n sap) # sap is the stream name
do
  nats consumer info sap $consumer -j | jq -c '{name: .name, pending: .num_pending, leader: .cluster.leader}'
done

The output looks similar to the following:

{"name":"ebcabfe5c902612f0ba3ebde7653f30b","pending":25,"leader":"eventing-nats-1"}
{"name":"c74c20756af53b592f87edebff67bdf8","pending":0,"leader":"eventing-nats-0"}

Here, the consumer ebcabfe5c902612f0ba3ebde7653f30b has pending messages and its leader is eventing-nats-1. The other consumer has no pending messages and is processing events successfully.
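When a stream has many consumers, it can help to print only those with pending messages. The following is a minimal sketch that builds on the loop above, assuming the NATS CLI and jq are installed and the NATS server is reachable; the function name pending_consumers is hypothetical.

```shell
# Hypothetical helper: list only the consumers of a stream that have pending messages.
# Usage: pending_consumers <stream>
pending_consumers() {
  stream="$1"
  for consumer in $(nats consumer list -n "$stream"); do
    # select() drops every consumer whose num_pending is 0.
    nats consumer info "$stream" "$consumer" -j |
      jq -c 'select(.num_pending > 0) | {name: .name, pending: .num_pending, leader: .cluster.leader}'
  done
}
```

For example, pending_consumers sap prints one JSON line per consumer of the sap stream that still has pending messages, together with its current leader.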

Trigger the consumer leader reelection

Now that you know the name of the broken consumer and its leader, you can trigger the reelection.

First, port-forward the leader replica of the broken consumer:

kubectl port-forward -n kyma-system eventing-nats-1 4222

Then, in a separate terminal (the port-forward command keeps running in the foreground), trigger the leader reelection:

nats consumer cluster step-down sap ebcabfe5c902612f0ba3ebde7653f30b

After execution, you see the following message:

New leader elected "eventing-nats-2"
Information for Consumer sap > ebcabfe5c902612f0ba3ebde7653f30b created 2022-10-24T15:49:43+02:00

Check the consumer again to confirm that the pending messages are now being dispatched.
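One way to confirm this is to read the pending count a few times and see whether it goes down. The following is a minimal sketch, assuming the NATS CLI and jq are installed and the port-forward is still active; the function name watch_pending and its default values are hypothetical.

```shell
# Hypothetical helper: print a consumer's pending count and leader several times.
# Usage: watch_pending <stream> <consumer> [checks] [interval-seconds]
watch_pending() {
  stream="$1"
  consumer="$2"
  checks="${3:-5}"     # number of readings, defaults to 5
  interval="${4:-5}"   # seconds between readings, defaults to 5
  i=0
  while [ "$i" -lt "$checks" ]; do
    nats consumer info "$stream" "$consumer" -j |
      jq -r '"pending: \(.num_pending), leader: \(.cluster.leader)"'
    i=$((i + 1))
    # Sleep between readings, but not after the last one.
    if [ "$i" -lt "$checks" ]; then
      sleep "$interval"
    fi
  done
}
```

For example, watch_pending sap ebcabfe5c902612f0ba3ebde7653f30b takes five readings, five seconds apart. A count that shrinks between readings suggests that the consumer is dispatching messages again.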

Restart the NATS Pods and trigger the stream leader reelection

Sometimes triggering the leader reelection on the broken consumers doesn't help. In that case, try to trigger a leader reelection on the stream level:

nats stream cluster step-down sap

The result looks similar to the following:

11:08:22 Requesting leader step down of "eventing-nats-1" in a 3 peer RAFT group
11:08:23 New leader elected "eventing-nats-0"
Information for Stream sap created 2022-10-24 15:47:19
             Subjects: kyma.>
             Replicas: 3
              Storage: File
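After the stream step-down, you may want to check that the new leader is in place and that the replicas have caught up. The following is a minimal sketch, assuming the NATS CLI and jq are installed and that the JSON output of nats stream info exposes the cluster state under .cluster with leader and replicas fields; the function name stream_cluster_state is hypothetical.

```shell
# Hypothetical helper: print the leader and replica state of a stream.
# Usage: stream_cluster_state <stream>
stream_cluster_state() {
  nats stream info "$1" -j |
    jq -r '.cluster | "leader: \(.leader)", (.replicas[]? | "replica: \(.name), current: \(.current), lag: \(.lag // 0)")'
}
```

For example, stream_cluster_state sap prints one line for the leader and one line per replica. A replica reported as not current, or with a growing lag, is a candidate for being out of sync.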