NATS Weekly #6

Week of December 20 - 26, 2021

⚠️ This is part one of this week since I am a little behind reviewing all of the Slack conversations for the “Recently asked questions” section due to the holiday. I intend to publish #6.5 in a day or two. There are a lot of great questions so I want to be sure to include them!

🗞 Announcements, writings, and projects

A short list of announcements, blog posts, project updates, and other news.

⚡Official releases

nats-kafka - v1.1.0
nats.ex - v1.4.0 (links to a commit since a tag has not been created yet)

⚙️ Community projects

graphrpc - RPC with GraphQL over NATS and JetStream

📖 Books and articles

Support Chanaka Fernando and order their book!

Seasonal offer from @PacktPub. My book and many other books are offered at a discounted price. #books #microservices @nats_io @PacktAuthors https://t.co/ZPcmKewjaW— Chanaka Fernando (@chanakaudaya) December 22, 2021

💡 Recently asked questions

Questions sourced from Slack, Twitter, or individuals. Responses and examples are in my own words, unless otherwise noted.

Is there a max length to a NATS subject?

Quoting core maintainer @wallyqs directly.

It has to be within the limit of the max control line, so would be less than 4K.

What is the max control line? From the docs:

Maximum length of a protocol line (including combined length of subject and queue group). Increasing this value may require client changes to be used. Applies to all traffic.

This does not include the message payload itself, which can be up to 1MB in size by default (see the max_payload limit right below the max_control_line limit in the docs link above).

The control line is essentially the first part of the message that gives the server information for routing. The payload follows the control line.

Given this context, and back to the original question, subject lengths can be quite long if absolutely required. However, as with any data, adding more bytes requires more bandwidth and memory, so keep subject names sufficiently descriptive, but not verbose.
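To make the split between control line and payload concrete, here is a minimal sketch that builds the control line of a PUB message and compares its length to the 4KB default. The helper names (pubControlLine, fitsControlLine) are my own and not part of any NATS library, and the length check is only an approximation of what the server enforces.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultMaxControlLine mirrors the server's default max_control_line (4KB).
// A deployment may configure a different value.
const defaultMaxControlLine = 4096

// pubControlLine builds the control line of a PUB protocol message:
// "PUB <subject> [reply-to] <#bytes>\r\n". The payload is sent after this
// line and is not part of it. (Hypothetical helper for illustration.)
func pubControlLine(subject, reply string, payloadSize int) string {
	if reply == "" {
		return fmt.Sprintf("PUB %s %d\r\n", subject, payloadSize)
	}
	return fmt.Sprintf("PUB %s %s %d\r\n", subject, reply, payloadSize)
}

// fitsControlLine is a conservative, approximate check that the whole
// control line is shorter than the limit. The server's exact accounting
// may differ slightly.
func fitsControlLine(line string, limit int) bool {
	return len(line) < limit
}

func main() {
	short := pubControlLine("orders.us.created", "", 11)
	long := pubControlLine(strings.Repeat("a", 5000), "", 11)

	fmt.Println(len(short), fitsControlLine(short, defaultMaxControlLine))
	fmt.Println(len(long), fitsControlLine(long, defaultMaxControlLine))
}
```

Note that the payload size only appears as a number in the control line, so a large payload does not eat into the subject budget.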

Does NATS use a connection pool?

Using Go as an example, NATS client connections are established via the Connect function which takes one or more server URLs (as a string).

nc, err := nats.Connect("localhost:4222,localhost:4223,localhost:4224")

Given the connection value nc, we can now create multiple subscriptions as well as publish messages using this connection. This is shown in the Basic Usage section of the nats.go docs.

The NATS protocol is implemented using TCP as the transport layer. So the question is, does the client transparently establish separate TCP connections under the hood if there are concurrent subscriptions created or publishers in use?

It does not! There is one TCP connection per nats.Connect call. All subscriptions and publishers are multiplexed over this single TCP connection. As one example, here is the rough call stack of a subscription:

The top-level [Subscribe](https://github.com/nats-io/nats.go/blob/2ea8d393bbd4eb781462695533a0f1bf321ca7e5/nats.go#L3647-L3652) method creates an async subscription by delegating to an internal helper method. The other variants of *Subscribe follow the same pattern.
The [subscribeLocked](https://github.com/nats-io/nats.go/blob/2ea8d393bbd4eb781462695533a0f1bf321ca7e5/nats.go#L3738) method does the work of registering the subscription with the local connection and spinning up a goroutine when the subscription is asynchronous.
The [waitForMsgs](https://github.com/nats-io/nats.go/blob/2ea8d393bbd4eb781462695533a0f1bf321ca7e5/nats.go#L2536-L2538) method (run in the goroutine) works in tandem with [processMsg](https://github.com/nats-io/nats.go/blob/2ea8d393bbd4eb781462695533a0f1bf321ca7e5/nats.go#L2650-L2654), which is called whenever a message is received and parsed from the connection.
The last high-level bit is [readLoop](https://github.com/nats-io/nats.go/blob/2ea8d393bbd4eb781462695533a0f1bf321ca7e5/nats.go#L2493-L2496), which is the single reader on the connection that receives the raw TCP byte stream and parses it as NATS protocol messages.
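The pattern in the call stack above can be sketched with only the standard library. This is a heavily simplified model and not the nats.go implementation: the conn and msg types, the "subject data" line format, and the feed helper are all assumptions made for illustration. The idea it shows is one reader parsing a shared stream in order and handing each message to a per-subscription goroutine.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// msg is a simplified stand-in for a parsed NATS message.
type msg struct {
	subject string
	data    string
}

// conn is a hypothetical, much-reduced model of a client connection:
// a single input stream shared by every subscription.
type conn struct {
	mu   sync.Mutex
	subs map[string]chan msg // subject -> per-subscription queue
	wg   sync.WaitGroup
}

func newConn() *conn {
	return &conn{subs: make(map[string]chan msg)}
}

// subscribe registers a queue for the subject and starts one goroutine that
// drains it, loosely mirroring subscribeLocked and waitForMsgs.
func (c *conn) subscribe(subject string, handler func(msg)) {
	ch := make(chan msg, 16)
	c.mu.Lock()
	c.subs[subject] = ch
	c.mu.Unlock()

	c.wg.Add(1)
	go func() {
		defer c.wg.Done()
		for m := range ch {
			handler(m)
		}
	}()
}

// readLoop is the single reader. It parses "subject data" lines in order and
// hands each one to the matching subscription, similar in spirit to
// processMsg. The end of the stream stands in for the connection closing.
func (c *conn) readLoop(r io.Reader) {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), " ", 2)
		if len(parts) != 2 {
			continue // skip lines that do not match the toy format
		}
		c.mu.Lock()
		ch, ok := c.subs[parts[0]]
		c.mu.Unlock()
		if ok {
			ch <- msg{subject: parts[0], data: parts[1]}
		}
	}
	// Close every queue and wait for the handlers to finish.
	c.mu.Lock()
	for _, ch := range c.subs {
		close(ch)
	}
	c.mu.Unlock()
	c.wg.Wait()
}

// feed runs the read loop over an in-memory stream instead of a TCP socket.
func (c *conn) feed(stream string) {
	c.readLoop(strings.NewReader(stream))
}

func main() {
	c := newConn()
	c.subscribe("orders", func(m msg) { fmt.Println("orders handler got:", m.data) })
	c.subscribe("billing", func(m msg) { fmt.Println("billing handler got:", m.data) })
	c.feed("orders created\nbilling charged\norders shipped\n")
}
```

Messages for one subject are handled in the order they were read, while handlers for different subjects run concurrently. If a subscription's queue fills up in this sketch, the reader blocks and every other subscription waits behind it, which is a toy version of the contention discussed further down.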

One other point of clarification about the server URLs: these are referred to as the seed servers for a connection. By default, a connection will be established against one of the URLs at random. Once connected, the client learns about the other servers in the cluster.

This is independent of whether a client establishes multiple TCP connections to the cluster. If a separate nats.Connect call is made, it will again randomly choose a seed server and create a new TCP connection.
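As a rough illustration of the seed server idea, the sketch below splits a comma-separated URL string and picks one entry at random. It is not how nats.go implements server selection internally. The parseSeedServers and pickSeed helpers are hypothetical and only model the "one of the URLs at random" behavior described above.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// parseSeedServers splits a comma-separated URL string, like the one passed
// to nats.Connect, into individual seed server addresses.
// (Hypothetical helper for illustration.)
func parseSeedServers(urls string) []string {
	var servers []string
	for _, u := range strings.Split(urls, ",") {
		if u = strings.TrimSpace(u); u != "" {
			servers = append(servers, u)
		}
	}
	return servers
}

// pickSeed chooses one seed server at random. Each call models a separate
// connection attempt, so two calls may land on different servers.
func pickSeed(servers []string) (string, bool) {
	if len(servers) == 0 {
		return "", false
	}
	return servers[rand.Intn(len(servers))], true
}

func main() {
	servers := parseSeedServers("localhost:4222,localhost:4223,localhost:4224")

	// Two independent "connections", each choosing its own seed server.
	for i := 1; i <= 2; i++ {
		if seed, ok := pickSeed(servers); ok {
			fmt.Printf("connection %d would dial %s\n", i, seed)
		}
	}
}
```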

So the final question is: do you always want to rely on one connection to the cluster per application process? Not necessarily. Subscriptions and publishers on a single connection can end up competing with one another, since all messages are received and parsed in order. If you observe increased latency with certain subscribers, it may make sense to establish two separate connections so that messages can be received and processed in parallel.

There is also the more extreme situation where the TCP connection itself is saturated and it makes sense to split up the workload. If the NIC is saturated, then you need to split up the workload onto a separate host.