NATS Weekly #3

Week of November 29 - December 5, 2021

🗞 Announcements, writings, and projects

A short list of announcements, blog posts, project updates, and other news.

⚡Releases

📖 Info

💡 Recently asked questions

Questions sourced from Slack, Twitter, or individuals. Responses and examples are in my own words, unless otherwise noted.

What is the quality of service NATS provides?

In this context, quality of service (QoS) corresponds to the performance and reliability of messages being published and received. There are a number of factors that can be evaluated, but a common one is message delivery guarantees.

Core NATS does not store or buffer messages (for the purpose of retries) when they are received from publishers. By default, published messages are relayed to currently connected subscribers. Any subscriber that connects later will not see previously published messages. Likewise, if there is a network partition between NATS and its clients, messages can be dropped since acknowledgements are not used.

This mechanism provides an at-most-once guarantee: for any given client, a message may be delivered zero times or once.
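
As a minimal sketch (assuming a local NATS server on the default port and the Go client), the fire-and-forget nature of core NATS publish/subscribe looks like this:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a local NATS server (assumed to be running on the default port).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Subscribe before publishing; a subscriber that connects later
	// would never see this message (at-most-once delivery).
	sub, err := nc.Subscribe("greet.hello", func(msg *nats.Msg) {
		log.Printf("received: %s", string(msg.Data))
	})
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Unsubscribe()

	// Publish is fire-and-forget: the publisher gets no signal about
	// whether any subscriber received the message.
	if err := nc.Publish("greet.hello", []byte("hi")); err != nil {
		log.Fatal(err)
	}

	// Give the async subscription a moment to process in this toy example.
	time.Sleep(100 * time.Millisecond)
}
```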

It takes some time to get comfortable with the fact that when a publisher sends a message, beyond the NATS server receiving it, the publisher has no knowledge of whether there are subscribers on the subject the message was published to or, if there are, whether they received the message.

One messaging pattern that provides application-level at-least-once delivery is request-reply, which chains two messages into one logical request. The publisher sends a message that includes a unique reply subject; any receiver of that message can reply on it to indicate receipt and/or provide some data back to the publisher.
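
A minimal request-reply sketch in Go, assuming the same local server; the "time.now" subject is illustrative:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// The responder subscribes to the subject and replies on the
	// auto-generated reply subject carried by each request.
	_, err = nc.Subscribe("time.now", func(msg *nats.Msg) {
		msg.Respond([]byte(time.Now().Format(time.RFC3339)))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Request blocks until a reply arrives or the timeout elapses,
	// giving the publisher application-level confirmation.
	reply, err := nc.Request("time.now", nil, 2*time.Second)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("reply: %s", string(reply.Data))
}
```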

Even though the request-reply pattern improves the QoS, this still has two trade-offs:

  • The publisher now needs to wait for a subscriber to respond rather than just NATS itself

  • Future/late subscribers will never see that message after it was published and acknowledged

JetStream provides native at-least-once QoS by persisting messages in memory or on disk. The two primary benefits of this include:

  • Publishers can simply wait for an ACK from NATS confirming the message was stored rather than waiting for a downstream client to process it (see the sketch after this list).

  • Subscribers can connect later and receive/replay messages that were previously published
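
A minimal JetStream publish sketch in Go; the "ORDERS" stream and "orders.>" subjects are illustrative assumptions:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Assumed stream "ORDERS" capturing "orders.>" subjects; create it
	// if it does not already exist.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:     "ORDERS",
		Subjects: []string{"orders.>"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// Publish returns only after the server acknowledges the message has
	// been persisted in the stream (at-least-once from the publisher's
	// point of view).
	ack, err := js.Publish("orders.created", []byte(`{"id": 1}`))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("stored in stream %s at sequence %d", ack.Stream, ack.Sequence)
}
```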

The holy grail of QoS is exactly once. There are two flavors of exactly once: delivery and processing. NATS supports partial exactly-once delivery using message de-duplication combined with double acknowledgements. I say partial because it is theoretically impossible to guarantee it under all failure cases.
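
A rough sketch of publish-side de-duplication using the Nats-Msg-Id header; it assumes the ORDERS stream from the previous sketch and an illustrative message ID:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Publishing the same payload twice with the same Nats-Msg-Id within
	// the stream's duplicate window stores only one copy; the second
	// PubAck reports Duplicate == true.
	ack1, err := js.Publish("orders.created", []byte(`{"id": 1}`), nats.MsgId("order-1"))
	if err != nil {
		log.Fatal(err)
	}
	ack2, err := js.Publish("orders.created", []byte(`{"id": 1}`), nats.MsgId("order-1"))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("first:  seq=%d duplicate=%v", ack1.Sequence, ack1.Duplicate)
	log.Printf("second: seq=%d duplicate=%v", ack2.Sequence, ack2.Duplicate)
}
```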

Exactly once processing is generally more feasible since it pertains to the consumer side. Whether a consumer keeps track of message sequence numbers or the Nats-Msg-Id header on published messages, a client can ensure messages are not processed twice even if they are re-delivered (in rare cases).
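
A rough sketch of consumer-side de-duplication combined with a synchronous (double) acknowledgement; the durable name and the in-memory dedup map are illustrative assumptions:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical in-memory dedup keyed by the Nats-Msg-Id header; a real
	// consumer would persist this state alongside its own data.
	processed := make(map[string]bool)

	_, err = js.Subscribe("orders.created", func(msg *nats.Msg) {
		id := msg.Header.Get(nats.MsgIdHdr)
		if processed[id] {
			// Already handled on a previous delivery; just ack again.
			msg.Ack()
			return
		}
		// ... process the message ...
		processed[id] = true
		// AckSync waits for the server to confirm it received the ack
		// (the "double acknowledgement" mentioned above).
		msg.AckSync()
	}, nats.Durable("orders-processor"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the subscriber running in this sketch
}
```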

Are there integrations available for feeding messages into databases?

There is no official set of integrations like you would find with Kafka Connect. However, depending on the kind of integration and the scale or performance requirements, prototyping one is straightforward.

There are two types of integrations: a source and a sink. For example, a source could be a database (e.g. Postgres, MongoDB, etc.). As data changes in the database, change data capture (CDC) records could be written to a NATS stream for downstream consumption. A sink involves taking data in a NATS stream and writing it out to some target (database, file, etc.).

Source integrations tend to be much trickier to implement in a generic way since they require a fairly deep understanding of how the source system emits these CDC records. For example, the most reliable approach with Postgres is to hook into its logical replication feature.

That said, you could also simply follow the outbox pattern if you have a specific set of tables or records you want to target. Most implementations require the source application to include this secondary write to the outbox table. If changing the source application is not feasible, there are strategies for taking snapshots of tables and diffing them over time to determine changes.
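
A hedged sketch of the outbox write itself, assuming Postgres and illustrative "orders" and "outbox" tables; a separate relay process (not shown) would publish outbox rows to a NATS stream:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// createOrder writes the business row and the outbox record in one
// transaction, so the change is either fully captured or not at all.
// Table and column names are illustrative.
func createOrder(db *sql.DB, id int, payload []byte) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op if the transaction was committed

	if _, err := tx.Exec(`INSERT INTO orders (id, data) VALUES ($1, $2)`, id, payload); err != nil {
		return err
	}
	if _, err := tx.Exec(`INSERT INTO outbox (subject, payload) VALUES ($1, $2)`, "orders.created", payload); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	// The connection string is illustrative.
	db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	if err := createOrder(db, 1, []byte(`{"id": 1}`)); err != nil {
		log.Fatal(err)
	}
}
```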

Whether a source or sink is being implemented, a NATS stream should definitely be used so messages are persisted and can be retried when connection failures occur. The happy path with these kinds of integrations is fairly straightforward, but failure cases can be challenging.

For example, in the sink scenario, a message could be translated and inserted into a database table, but then the ack on that message back to NATS fails (due to some network issue). If that message is redelivered, the connector needs to ensure idempotent writes.
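
A hedged sketch of an idempotent sink write, assuming Postgres, an illustrative "events" table with a unique constraint on the stream sequence, and the Go client:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
	"github.com/nats-io/nats.go"
)

func handle(db *sql.DB, msg *nats.Msg) error {
	// Metadata carries the stream sequence assigned by JetStream.
	meta, err := msg.Metadata()
	if err != nil {
		return err
	}
	// ON CONFLICT DO NOTHING makes a redelivered message a no-op, so a
	// failed ack followed by redelivery does not duplicate rows.
	_, err = db.Exec(
		`INSERT INTO events (stream_seq, payload) VALUES ($1, $2)
		 ON CONFLICT (stream_seq) DO NOTHING`,
		meta.Sequence.Stream, msg.Data,
	)
	if err != nil {
		return err
	}
	// Ack only after the row is durable.
	return msg.Ack()
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}
	// Assumed subject and durable name from the earlier sketches.
	_, err = js.Subscribe("orders.created", func(msg *nats.Msg) {
		if err := handle(db, msg); err != nil {
			log.Printf("handle failed (message will be redelivered): %v", err)
		}
	}, nats.Durable("pg-sink"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the sink running in this sketch
}
```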

When managing accounts with nsc, where should this directory tree be stored?

[nsc](https://docs.nats.io/using-nats/nats-tools/nsc) is a tool for creating operators, accounts, and users. The backup section of the JWT tutorial provides some insights into the sensitivity of the files created by nsc. Here is a summary and some takeaways:

  • By default, nsc creates two directories, ~/.nkeys and ~/.nsc; however, these locations can be changed using the NKEYS_PATH environment variable and by setting the JWT store path with nsc env --store <path>, respectively.

  • The .nsc directory contains JWTs that are used by the built-in nats-resolver to indicate the hierarchy of accounts and permissions. Although these JWTs are not credentials or secrets, if the subjects or names used are sensitive in your respective domain, then they should be treated as sensitive.

  • The .nkeys directory, unsurprisingly, stores the keys as well as user credentials. In practice, this directory should only be materialized for the period of time needed to create accounts or users. Otherwise, the files should reside in some encrypted store or secrets manager.

  • An additional recommendation is to use signing keys, which are derived from the identity keys initially created by nsc. Once a signing key is created, the identity key should no longer be used except if/when the signing key is lost or compromised. At that point, the signing key can be rotated by the identity key so the existing signing key can no longer be used.

  • Accounts typically correspond to a team, service, or product for which one or more users can be created. This allows the account signing key to be distributed to the corresponding team to manage rather than centralizing all keys in one place, though that is an organizational decision.

  • User credentials can be stored in a secrets manager (e.g. Vault) and fetched at deploy time (see the connection sketch after this list).
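
For example (a minimal sketch, assuming the .creds file has already been fetched to an illustrative path, and an illustrative server URL):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	// The credentials file would be placed here by the deployment process
	// after fetching it from the secrets manager.
	nc, err := nats.Connect(
		"nats://nats.example.com:4222",
		nats.UserCredentials("/run/secrets/service.creds"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()
	log.Printf("connected to %s", nc.ConnectedUrl())
}
```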

The short of it is that the .nsc directory needs to be maintained in order to sync accounts and users to the server. However, the .nkeys directory can be decomposed, stored in a secrets manager, and only fetched when needed.

What are strategies for moving a stream to a different set of nodes in the cluster?

I asked this question on NATS GitHub discussions 😉