Why Kafka was a non-starter for Sveltos—and how NATS’ simpler, more flexible, and lightweight design unlocked new capabilities
Project Sveltos, an open-source Kubernetes cluster management tool, integrated NATS to extend its event-driven framework beyond native cluster events. NATS was chosen over heavier solutions like Kafka due to its lightweight nature, ease of deployment at scale, and straightforward Go integration.
Sveltos creator and maintainer Gianluca Mardente particularly appreciated NATS’ clear documentation, simple setup, and low resource consumption, all of which allowed for rapid prototyping and a smooth developer experience.
“I would just say, just go into the documentation and build a prototype. Do not spend too much time trying to find the perfect solution. Just go with the prototype, see that it works, and then pretty much from there you can build whatever you need. At least that was my experience.
For me, writing and developing the clients and the server, you know, publishing events and listening to events and putting a NATS server in the middle, was very easy. It was half a day of work and I had a prototype working.”
— Gianluca Mardente, Creator, Project Sveltos
Go Deeper
Full Transcript
Andrew Connolly:
Folks, thank you for being here for our next Community Lightning Talk. Today I'm joined by Gianluca Mardente, who is a principal engineer at Cisco and, more importantly for today's talk, the founder and creator of Project Sveltos, an open source Kubernetes cluster management tool that simplifies cluster management. Gianluca, thanks for being here. Could you start by telling us briefly about Sveltos, why you started it, and what problems it solves?
Gianluca Mardente:
Great. Thanks for having me here. I started Sveltos a few years back, it's almost three years now, because I was working on a project where we had to create, like, multiple clusters on demand, on prem, and I felt there was a need to be able to consistently distribute add-ons and applications on those clusters. When you're managing a fleet of clusters, you wanna have, like, a simple tool where you can say, this is the configuration that has to go to a production cluster, this is the configuration that has to go to a staging cluster, and have a single place where you define this configuration, and it applies to the matching clusters.
That's where I started with Sveltos.
Andrew Connolly:
Okay. And before you incorporated NATS into Sveltos, what did the tool look like? Were you using an alternate solution? And what made NATS a valuable add?
Gianluca Mardente:
Well, I'll take a step back before I go to NATS because essentially, Sveltos has an event-driven framework. You can essentially instruct Sveltos to watch for certain Kubernetes resources, and you can tell Sveltos what to do when certain events happen. For instance, a Kubernetes service is created, I want to create a network policy that exposes this Kubernetes service, and you do that with the event framework. So before NATS, Sveltos was limited to only watching Kubernetes resources and events related to Kubernetes resources. But many Sveltos users asked whether it was possible to watch for events which were happening outside the Kubernetes clusters.
For instance, you may have a user logging in or logging out, you might have a new user added, onboarded, and so I needed something to be able to watch for those events happening outside of the cluster. And then I read a post by Justina Beck on LinkedIn, I think it was called the Nomadic Apps Manifesto, and it sounded really, really interesting, and so I started looking into NATS and I decided to use it. The main reason Sveltos, before NATS, did not have any integration to watch for events outside of the cluster was that every other solution was too heavy. You know, I could have used Kafka but it's very heavy. It consumes a lot of resources.
It's not really that straightforward to set it up, like, on a Kubernetes cluster. Now think if you have to set it up, like, in 100 clusters. And, as I said, when I started playing with NATS it was super easy. I took the NATS Helm chart; Sveltos has the ability to say, okay, this is a Helm chart, deploy it in all the clusters which are matching this set of labels. And so this is essentially what I did.
Like in five minutes I had Sveltos deploying NATS in all my staging clusters, and NATS was up and running. And so that was the first part, and it was like super easy to bring it up. It was not consuming a lot of resources, and, you know, for a project like this that is important because you have to pay for those resources, so it's important that it doesn't consume much. And then also integrating was super straightforward. So the way Sveltos works is that Sveltos deploys an agent in every managed cluster, and this is the agent that watches for events.
And so what I did, I simply extended this agent to connect to NATS and listen for events which were being published on certain topics, and that's what I did. And I think it took me less than half a day to have a prototype which was working, and from that point it was super exciting, so I kept advancing. But honestly the main reason was it's very easy to start and it is very light. It doesn't consume much. And for me, those were the main keys. Like, okay.
This is what I need. And this is what, like, Sveltos users need.
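The "topics" the agent listens on are called subjects in NATS: dot-separated tokens, where a subscription may use `*` to match exactly one token and `>` as a final token to match one or more remaining tokens. The sketch below is a small, illustrative reimplementation of that matching rule using only the Go standard library. It is not the NATS server's or Sveltos' code, and the subject names (`user.login`, `billing.paid`, and so on) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchSubject reports whether a concrete subject matches a subscription
// pattern using NATS-style wildcards: "*" matches exactly one token and
// ">" (valid only as the last token) matches one or more remaining tokens.
// Illustrative only; empty tokens and other invalid subjects are not rejected.
func matchSubject(pattern, subject string) bool {
	p := strings.Split(pattern, ".")
	s := strings.Split(subject, ".")
	for i, tok := range p {
		if tok == ">" {
			// ">" must be last and needs at least one token left to consume.
			return i == len(p)-1 && len(s) > i
		}
		if i >= len(s) {
			return false // pattern is longer than the subject
		}
		if tok != "*" && tok != s[i] {
			return false // literal token mismatch
		}
	}
	return len(p) == len(s)
}

func main() {
	// Hypothetical subjects for events raised outside the cluster.
	subjects := []string{"user.login", "user.logout", "user.admin.added", "billing.paid"}
	for _, subj := range subjects {
		fmt.Printf("%-18s user.*=%v user.>=%v\n", subj,
			matchSubject("user.*", subj), matchSubject("user.>", subj))
	}
}
```

With a real client the matching happens on the server side; an agent would only choose the pattern it subscribes with.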
Andrew Connolly:
Very nice. Yeah. It sounds like with NATS you were able to add functionality to Sveltos that otherwise wouldn't have made sense. Right? Kafka wasn't a fit for enabling this capability because it was too heavyweight for Sveltos?
Gianluca Mardente:
Yeah. I mean, it's because in the end, I'm deploying NATS not just on one cluster. You know, Sveltos is meant to manage a fleet of Kubernetes clusters. And so I need to deploy NATS in every cluster where I want to watch for events which are happening outside of the cluster. It's not a requirement.
You know, a user can have, like, NATS deployed in a single cluster, and all the events are pushed to that NATS server and from there distributed everywhere by Sveltos. But if you have a fleet of clusters and you have different tenants managing different clusters, maybe, you know, every tenant wants to have their own NATS server. And so then I needed something which I was able to deploy in more than one cluster. And of course, you know, when you scale the number of clusters where you have to deploy, the less resources you consume, the more important it becomes. Because, you know, if it consumes a lot in one cluster and then you have to deploy in 10 clusters, now your costs are going up by a factor of 10, right?
So with NATS it was very, very easy. It's also very easy to explain. It's also very easy for Sveltos users to create their own client that publishes events, and so every user creates their own client and publishes events to this NATS server, and then from there Sveltos takes over and does the rest of the work. So it's not just deploying, but it's also the ability for users to create their own NATS server or publish events to the NATS server. That was like one of the keys.
Because, you know, if it's too complex, nobody's gonna use it. If it's very easy, then everybody's gonna use it.
Andrew Connolly:
Nice. Yeah. I think that tracks with things that we've seen with other users and customers, both a smooth or easy operator experience, but also the developer experience being relatively straightforward. That leads me to my next...
Gianluca Mardente:
And also, sorry to interrupt, but also the documentation, honestly, it's wonderful. You just go there, you read it. Sveltos is written in Go. So, you know, I just picked the Go section of the documentation. It's very easy.
Just import this, this, this, and it's up and running. It's like a few lines of code and it works.
Andrew Connolly:
Fantastic. Yeah. And it's great to hear that you're also a Go fan because of course that is the language of NATS servers. So always a good fit when someone's using Go. We sort of touched on this, but as someone who's relatively new to NATS and onboarded quickly and ramped up quickly and found success quickly, what advice would you give to attendees who are considering building with NATS or just getting started building with NATS?
What things did you do or what things did you find were useful in getting started quickly?
Gianluca Mardente:
I would just say, just go into the documentation and build a prototype. Do not spend too much time trying to find the perfect solution. Just go with the prototype, see that it works, and then pretty much from there you can build whatever you need. At least that was my experience. Usually you go and spend a lot of time reading because things are not clear, but here it was super easy to create a client and to set up a server, which, you know, I also needed for my testing.
So for me, writing and developing the clients and the server, you know, publishing events and listening to events and putting a NATS server in the middle, was very easy. It was half a day of work and I had a prototype working. And then, you know, of course from there I had to build other things that were related to Sveltos, but at least I knew that I had something working in half a day, and after that I could improve whatever else I needed. So just read the documentation, because it goes straight to the point and it's actually what you need.
Andrew Connolly:
Cool. How about any future plans for expanding or, you know, growing the scope of how NATS is used within Sveltos? I guess I didn't ask yet, are you using just core NATS or are you using JetStream? What specific capabilities are most important to Sveltos, and are there any you're looking to add in the future?
Gianluca Mardente:
So this is actually up to the users, because Sveltos is integrated to be able to receive events from both NATS and JetStream. Whether the user wants to use NATS or JetStream depends on the type of events that they want Sveltos to be notified of. You know, if they don't want to lose an event, if they want to make sure that the event is delivered, then they have to go with JetStream. But that is up to the user. I would say that, likely, a Sveltos user is gonna use NATS for prototyping their own client and seeing that the end-to-end flow works.
And then when they go to production, very likely, they're gonna switch to JetStream, because, you know, you have reliability, which is fundamental when you go to production. But Sveltos is integrated to work with both. And as for future plans, actually, there was a nice request which I already got. Essentially, as I was mentioning, Sveltos resides in a management cluster and, from this management cluster, manages add-ons and applications in a fleet of clusters.
And one request was, okay, so now with Sveltos I can deploy NATS in all those clusters, and Sveltos is receiving those events. The request was: can you also have NATS in the management cluster, so we can watch for events in all the managed clusters, with Sveltos sending those events to the management cluster? And from the management cluster, Sveltos sends them to NATS, so at this point they can build their own client on the other side which receives those events. And so at this point you have events which can be published through NATS in any cluster which is managed.
They're all forwarded to the management cluster. And from the management cluster, there is another NATS server which forwards them to where, you know, the client wants those events to be sent. So it's actually an interesting use case. It's actually a pretty cool one.
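The trade-off described at the start of this answer, core NATS for prototyping and JetStream when an event must not be lost, can be pictured with a toy in-memory model. The sketch below does not use the NATS or JetStream APIs. The `coreBus` and `stream` types and the event names are hypothetical stand-ins that only show why a consumer that connects late misses fire-and-forget messages but can still read persisted ones.

```go
package main

import "fmt"

// coreBus models fire-and-forget delivery: a message published while
// nobody is subscribed is simply dropped.
type coreBus struct{ handler func(string) }

func (b *coreBus) publish(msg string) {
	if b.handler != nil {
		b.handler(msg)
	}
}

// stream models persisted delivery: every message is stored, so a
// consumer that attaches later can still read from the beginning.
type stream struct{ log []string }

func (s *stream) publish(msg string) { s.log = append(s.log, msg) }

// readFrom returns every stored message from offset onward,
// or nil when offset is out of range.
func (s *stream) readFrom(offset int) []string {
	if offset < 0 || offset > len(s.log) {
		return nil
	}
	return s.log[offset:]
}

// lateSubscriber publishes one event before any consumer exists and one
// after, then returns what each kind of consumer ends up seeing.
func lateSubscriber() (core, persisted []string) {
	bus, st := &coreBus{}, &stream{}

	bus.publish("user.login") // no subscriber yet: dropped
	st.publish("user.login")  // stored

	bus.handler = func(m string) { core = append(core, m) }

	bus.publish("user.logout") // delivered to the live subscriber
	st.publish("user.logout")  // stored

	return core, st.readFrom(0)
}

func main() {
	core, persisted := lateSubscriber()
	fmt.Println("fire-and-forget consumer saw:", core)
	fmt.Println("persisted-stream consumer saw:", persisted)
}
```

In this model the fire-and-forget consumer only sees the event published after it subscribed, which is usually acceptable while prototyping and usually not once a missed event means a missed deployment.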
Andrew Connolly:
Very nice.
Gianluca Mardente:
And, you know, the beauty of NATS is that it makes it very easy to build this entire flow. So...
Andrew Connolly:
Yeah. I've heard similar stories: when end users of NATS applications or tooling realize what NATS is under the covers, they want more access. They want more data, more events, because they realize how easy it is. And so, yeah, it seems to spread once it gets inside a tool or inside an application. Well, Gianluca, anything you want to add before we wrap up?
Gianluca Mardente:
No, I just want to congratulate the team again, because my experience was great. As a developer, it was a great experience, and it's not so common for it to be that great. So kudos to the team.
Andrew Connolly:
Well, Gianluca, thank you for being a part of the NATS community, and thank you for making time to chat today. We appreciate it.
Nate Emerson:
Alright. And we have Gianluca joining us live for a quick Q&A. Thank you so much for being here. We had two questions that came through that I think were particularly interesting. So first off, what is your strategy on schemas and breaking changes when dealing with multiple customer-managed clusters?
Gianluca Mardente:
That is actually on the user side, because the integration that I did between Sveltos and NATS is that Sveltos expects to receive CloudEvents from NATS, and then it processes those events to deploy other stuff. So the event might be that a user logged in, and I want to create a Postgres database in response, or I want to create an entry in a Postgres database in response, so that is pretty much how Sveltos is used. So I'm not much concerned about schema, because the content for Sveltos is a CloudEvent, and it's up to the user what they wanna put in the CloudEvent and how to consume it, essentially.
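As a rough illustration of what "the content is a CloudEvent" can mean in practice, the sketch below builds and parses a JSON-encoded event with the four attributes the CloudEvents 1.0 specification requires (`specversion`, `id`, `source`, `type`) plus an optional data payload. It uses only the Go standard library. The event type, source URL, ID, and data fields are hypothetical, and the exact encoding Sveltos expects on the wire is an assumption that should be checked against the Sveltos documentation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// CloudEvent holds the CloudEvents 1.0 attributes used in this sketch.
// specversion, id, source and type are required by the specification;
// datacontenttype and data are optional.
type CloudEvent struct {
	SpecVersion     string          `json:"specversion"`
	ID              string          `json:"id"`
	Source          string          `json:"source"`
	Type            string          `json:"type"`
	DataContentType string          `json:"datacontenttype,omitempty"`
	Data            json.RawMessage `json:"data,omitempty"`
}

// newUserLoginEvent builds the JSON payload a user-written client might
// publish. The source and type values are hypothetical.
func newUserLoginEvent(id, user string) ([]byte, error) {
	data, err := json.Marshal(map[string]string{"user": user})
	if err != nil {
		return nil, err
	}
	return json.Marshal(CloudEvent{
		SpecVersion:     "1.0",
		ID:              id,
		Source:          "https://example.com/auth", // hypothetical producer
		Type:            "com.example.user.login",   // hypothetical event type
		DataContentType: "application/json",
		Data:            data,
	})
}

// parseEvent decodes a payload and rejects it when a required attribute
// is missing. What goes inside Data is left entirely to the publisher.
func parseEvent(payload []byte) (CloudEvent, error) {
	var ev CloudEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return CloudEvent{}, err
	}
	if ev.SpecVersion == "" || ev.ID == "" || ev.Source == "" || ev.Type == "" {
		return CloudEvent{}, errors.New("missing required CloudEvents attribute")
	}
	return ev, nil
}

func main() {
	payload, err := newUserLoginEvent("evt-0001", "alice")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))

	ev, err := parseEvent(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println("type:", ev.Type, "data:", string(ev.Data))
}
```

Only the envelope is fixed; the shape of `data` stays under the publisher's control, which is why schema evolution is described above as a concern on the user side.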
Nate Emerson:
Yeah. That makes sense. Tell us how quickly you integrated NATS. So Sveltos was a pretty feature-complete platform that you then extended by integrating NATS capabilities. So what was that journey like, and how hard was it to get that kind of initial integration in?
Gianluca Mardente:
It was honestly super easy. In the past, I had to use Kafka as a messaging system, and my experience with Kafka was what was stopping me from adding support in Sveltos for watching events happening outside the cluster, because it was too heavy. And every day, you have to deal with something: this is not coming up and that is not coming up, and you have to restart this and you have to reset that. So when I started reading the documentation, it's like, okay. It looks super easy.
I just have to import this one, and this is how I connect, and I just need to find a way to pass the credentials so I can connect, and that's it. So it was honestly, like, amazing, and that's what kept me going, because the first step was, like, let me see if I can actually integrate quickly. And I was able to bring it up, integrate quickly, and then I said, okay. Now it makes sense. This works, because I tried multiple times.
So let me build the entire flow end to end. But, yeah, it was one of the rare experiences where you go to the documentation and you don't have to read 10 pages to get to the point of, okay, this is what I need. So it was straight there. It's like, you're using Go? Just click this link.
This is the Go documentation. Import this package and use it.
Nate Emerson:
That's awesome. Yeah. The documentation is something we are always working to upgrade and improve, and we've had some feedback that it can be more difficult to get into, especially for newer users. But there's so much good stuff in there too that really dives into the conceptual background, so I appreciate that positive feedback.
Gianluca Mardente:
Absolutely. And there are a bunch of examples as well. I found that just amazing. Like, you wanna do this? This is an example.
You want this? This is an example. So it's amazing.
Nate Emerson:
Alright. Well, thank you so much for joining us. We're gonna get moving right along with our schedule here, but thank you again for your time.