
NATS 2.11 - What's New?

Gain deeper insights and enhanced control over your NATS deployments with new capabilities in NATS Server v2.11

In his overview of NATS 2.11, Neil Twigg from Synadia introduced key features aimed at improving system visibility, enabling new development patterns, and enhancing reliability.

Highlights include distributed message tracing for clear insights into complex message paths, per-message TTLs that optimize message lifecycle management, and consumer priority groups designed for smarter load balancing and resilient message handling.

Additional improvements like consumer pause simplify maintenance operations, while multi-message get enables efficient retrieval of stream data without dedicated consumers, laying groundwork for future capabilities.

“In order to use the distributed message tracing, you publish a message on whatever the subject is that you're interested in with a trace header. And our trace header instructs servers that are handling the message to feed back information about what they're going to do with that message or where they're going to deliver it.

“So you can do tracing whilst also delivering the message, but if you just want to find out where it's gonna go as a dry run, you can also do that as well without the remote applications ever being aware that it took place.”

— Neil Twigg, Synadia

Go Deeper

Full Transcript

Neil Twigg:

Hey. My name's Neil, and I'm the lead developer on the server at Synadia. And I'd like to spend just a little bit of time today talking about the new 2.11 server release. Now we've spent quite a bit of time on this release, and we focused on three main areas, which are improving visibility into what the NATS server is doing, enabling new patterns for users and developers, and improving correctness, making sure the system is resilient against failures. Now we have quite a list of new features, and I highly encourage you to go and have a look at the release notes and the changelog to find out what all of them are.

But I'd like to call out and introduce five headline features today, which are distributed message tracing, per-message TTLs, consumer pause, pull consumer priority groups, and stream multi-get. And I'd like to just take you on a very quick tour of what these five things are for and why you would use them. Now distributed message tracing is simply designed to answer the question: what happens to a message when you publish it into a NATS system? Because most NATS systems are not single servers. Quite often, they're clusters that have got routes, or they're superclusters with gateways.

There can be leaf nodes extending the topology. You can have imports and exports over account boundaries, and you can also have subject transforms happening in some of these places as well. In order to use the distributed message tracing, you publish a message on whatever the subject is that you're interested in with a trace header. And our trace header instructs servers that are handling the message to feed back information about what they're going to do with that message or where they're going to deliver it. Now, we've also added in support to the NATS CLI for this as well.

So you can just do nats trace <subject>, and you'll basically get a printout like a normal IP traceroute, where it says your message has been here, it's gonna be sent to this connection and to this server, and it's gonna go across this import-export boundary. Basically giving you the story of what's actually happening to the message. Now this is not just for server operators. This is also for users of a NATS system, developers who want to understand what's actually happening. Furthermore, you can do this without ever actually delivering the message to the application.

So you can do tracing whilst also delivering the message, but if you just want to find out where it's gonna go as a dry run, you can also do that as well without the remote applications ever being aware that it took place.
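To make that concrete, here is a minimal Go sketch of a dry-run trace using the nats.go client. The header names (Nats-Trace-Dest for the subject that receives trace events, Nats-Trace-Only for the dry run) follow the 2.11 tracing design, and the subjects are invented for illustration, so verify both against the release documentation.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Listen on an ordinary subject where servers will deliver trace events.
	// "trace.events" is an arbitrary name chosen for this example.
	traceSub, err := nc.SubscribeSync("trace.events")
	if err != nil {
		log.Fatal(err)
	}

	// Publish the message to be traced. Nats-Trace-Dest tells every server
	// handling the message where to report what it did with it, and
	// Nats-Trace-Only requests a dry run so subscribers on "orders.new"
	// never see the message. (Header names are assumptions from the 2.11
	// tracing design; double-check them against the release notes.)
	msg := nats.NewMsg("orders.new")
	msg.Header.Set("Nats-Trace-Dest", "trace.events")
	msg.Header.Set("Nats-Trace-Only", "true")
	msg.Data = []byte("example payload")
	if err := nc.PublishMsg(msg); err != nil {
		log.Fatal(err)
	}

	// Each server on the message's path reports back a JSON trace event.
	for {
		ev, err := traceSub.NextMsg(2 * time.Second)
		if err != nil {
			break // no more events within the timeout
		}
		fmt.Printf("trace event: %s\n", ev.Data)
	}
}
```

The nats trace <subject> command mentioned above wraps the same mechanism and prints the traceroute-style summary for you.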

Per-message TTLs are the first time that we've been able to introduce a limit that applies only to individual messages rather than to the entire stream. So today, you can have stream limits such as the max number of messages, max number of bytes in the stream, max number of messages per subject, and so on. But you can now also set the NATS TTL header when publishing into a stream with TTLs enabled, which allows you to age out individual messages after that period of time. This is quite a powerful construct actually, because those messages don't have to have a TTL.

It is optional. And they don't have to have the same TTL as other messages that you've just published. Now one place that we're using this, or will be using this, is in the KV abstraction. Because if you're using deletes quite heavily in a KV bucket at the moment, you'll probably be aware of those delete markers building up, and that can make the stream underlying the KV bucket quite big and quite expensive to operate on. We will be changing those delete markers to use TTLs so that even if you have KV watchers looking for deletes, you'll still be able to get those notifications, but those delete markers aren't gonna stay in the stream forever, which will reduce the need to run compaction operations manually.
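As a rough sketch of the publishing side, the example below sets a TTL on one message and leaves the next one untouched. It assumes the nats.go client, an existing stream covering events.> with per-message TTLs enabled in its configuration, and the Nats-TTL header with a duration-string value as described in the 2.11 notes; treat those names and formats as assumptions to check against the docs.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// This message ages out of the stream roughly 60 seconds after it is
	// stored. The Nats-TTL header name and duration-string value are taken
	// from the 2.11 per-message TTL design; the stream covering "events.>"
	// is assumed to already exist with per-message TTLs enabled.
	withTTL := nats.NewMsg("events.session.started")
	withTTL.Header.Set("Nats-TTL", "60s")
	withTTL.Data = []byte("ephemeral state")
	if _, err := js.PublishMsg(withTTL); err != nil {
		log.Fatal(err)
	}

	// TTLs are optional and per message: this one has no TTL at all and is
	// only subject to the stream's ordinary retention limits.
	if _, err := js.Publish("events.session.started", []byte("durable state")); err != nil {
		log.Fatal(err)
	}
}
```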

Pull consumer priority groups are a new feature designed to make it possible to control the behavior around having multiple clients pulling from a single consumer, which isn't possible today in 2.10. If you want to be able to control messages overflowing, for example, from one client to another when one is either not keeping up with demand or has just become unavailable for some reason, then priority groups will allow you to set policies to overflow based on, for example, the number of pending messages on the consumer, or the number of pending acks that are outstanding.
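As an illustration of what that can look like on the wire, the sketch below creates a pull consumer with a single priority group and the overflow policy, then issues a pull that only receives messages once more than 1,000 are pending. The JSON field names (priority_groups, priority_policy, group, min_pending) are taken from the 2.11 design documents, and the stream and consumer names are invented, so treat this as a sketch to verify rather than a finished client API.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Create (or update) a durable pull consumer on the ORDERS stream with a
	// single priority group and the overflow policy. Field names follow the
	// 2.11 priority-groups design and should be verified against the docs.
	createReq := []byte(`{
	  "stream_name": "ORDERS",
	  "config": {
	    "durable_name": "workers",
	    "ack_policy": "explicit",
	    "priority_groups": ["jobs"],
	    "priority_policy": "overflow"
	  }
	}`)
	resp, err := nc.Request("$JS.API.CONSUMER.DURABLE.CREATE.ORDERS.workers", createReq, 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("consumer create response: %s\n", resp.Data)

	// A standby client can then pull on behalf of the "jobs" group but ask to
	// be served only when more than 1000 messages are pending, i.e. when the
	// primary clients are not keeping up.
	inbox := nats.NewInbox()
	sub, err := nc.SubscribeSync(inbox)
	if err != nil {
		log.Fatal(err)
	}
	pullReq := []byte(`{"batch": 10, "expires": 5000000000, "group": "jobs", "min_pending": 1000}`)
	if err := nc.PublishRequest("$JS.API.CONSUMER.MSG.NEXT.ORDERS.workers", inbox, pullReq); err != nil {
		log.Fatal(err)
	}
	if msg, err := sub.NextMsg(6 * time.Second); err == nil {
		fmt.Printf("received: %s\n", msg.Data)
	}
}
```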

We'll be extending this functionality more in the future to allow more elaborate policies, but at the moment, you'll be able to pin clients and overflow based on these two controls.

Consumer pause is another feature which will actually be more useful during maintenance windows, because it allows you to suspend delivery of messages to a consumer without the application being aware that something has happened. The way this works is by specifying a pause deadline in the consumer config.

So you just do this as a normal consumer update. And if you are before that deadline, message delivery is suspended. Now this is exactly the same whether it's a pull consumer or a push consumer. And when you reach that deadline, message delivery just resumes automatically. The important part of this, though, is that the consumer heartbeats continue to be sent to the client, so you won't get unexpected errors being surfaced from your client SDKs because the consumer is paused.
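A minimal sketch of that, assuming a durable consumer named workers on an ORDERS stream and a pause_until field in the consumer config (the field name is an assumption from the 2.11 design), might look like this:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Pause the "workers" consumer on the ORDERS stream until 15 minutes from
	// now. As described in the talk, this is just a consumer update that adds
	// a pause deadline; in practice you would send the consumer's full
	// existing config with the deadline added. The pause_until field name is
	// an assumption to verify against the 2.11 documentation.
	deadline := time.Now().Add(15 * time.Minute).UTC().Format(time.RFC3339)
	updateReq := []byte(fmt.Sprintf(`{
	  "stream_name": "ORDERS",
	  "config": {
	    "durable_name": "workers",
	    "ack_policy": "explicit",
	    "pause_until": %q
	  }
	}`, deadline))
	resp, err := nc.Request("$JS.API.CONSUMER.DURABLE.CREATE.ORDERS.workers", updateReq, 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("update response: %s\n", resp.Data)
	// Delivery stays suspended (with heartbeats still flowing) until the
	// deadline passes, then resumes automatically with no further action.
}
```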

Consumer pause should make it much easier to manage deployment operations, migrations, or application maintenance, where you don't want applications accidentally pulling messages and processing them while you're trying to do something.

And then finally, the last feature that I'd like to call out is the stream multi-get. Now this is for applications that aren't using consumers, but instead are trying to pull messages from streams directly. And the abilities introduced with stream multi-get are that you can provide multiple subject filters, you can provide additional bounds such as only up to a certain sequence number or time, and you'll receive a batch of messages back from the system in one go, in sequence order.
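A rough sketch of one such request against the JetStream direct-get API is shown below. It assumes a stream named ORDERS with direct get enabled, and the request and response field names (multi_last, up_to_seq, and the Nats-Subject and Nats-Sequence headers) follow the direct-get design for 2.11, so check them against the documentation before relying on them.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// The batched direct-get API returns several messages on one reply
	// subject, so subscribe to an inbox rather than using a plain request.
	inbox := nats.NewInbox()
	sub, err := nc.SubscribeSync(inbox)
	if err != nil {
		log.Fatal(err)
	}

	// Ask the ORDERS stream (assumed to have direct get enabled) for the
	// latest message on each matching subject, bounded by a sequence number.
	// Field names are assumptions from the direct-get design for 2.11.
	req := []byte(`{"multi_last": ["orders.*.created", "orders.*.shipped"], "up_to_seq": 1000}`)
	if err := nc.PublishRequest("$JS.API.DIRECT.GET.ORDERS", inbox, req); err != nil {
		log.Fatal(err)
	}

	// Messages arrive in sequence order; an empty payload with status
	// headers marks the end of the batch.
	for {
		m, err := sub.NextMsg(2 * time.Second)
		if err != nil || len(m.Data) == 0 {
			break
		}
		fmt.Printf("%s @ seq %s: %s\n",
			m.Header.Get("Nats-Subject"), m.Header.Get("Nats-Sequence"), m.Data)
	}
}
```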

Now this is the first of a number of batch operations that we will be introducing into the server in the future. Now this has just been a very quick overview of some of the headline features, and there are still more features to see as well. Take a look at the release notes in the NATS docs or the GitHub release, which is available now, for the full changelog. I'll be around in a little while to answer questions at the Q&A session, so make sure to queue some up. Apart from that, I hope you're enjoying RethinkConn, and thank you so much for joining us.