Can Corrosion be a standalone service catalog? #161

Open
hkrutzer opened this issue Feb 26, 2024 · 13 comments

@hkrutzer

Hi,

I saw Corrosion has the option to import data from Consul. I am looking for an alternative to Consul's service directory, as I am running a relatively small number of servers, and I don't want to add 3 servers to host a Consul cluster.

It seems I can simply put my services in a SQLite table and have a DNS server like CoreDNS with a SQL plugin serve DNS queries from the table. Then all I would need to build is a service that does health checking, with a table for that as well, and that should cover the basics. Would Corrosion be suitable for this? Are there any components for this that you will add in the future? (E.g. I take it you won't add a DNS server, but perhaps you will.)
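To make the idea concrete, here is a minimal sketch of the health-checking piece. Everything in it is an assumption for illustration: the `services` and `service_checks` tables, their columns, the `passing`/`critical` status values, and the plain TCP connect probe are hypothetical, and an in-memory SQLite database stands in for the replicated one.

```python
import socket
import sqlite3
import time


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def record_checks(conn: sqlite3.Connection, check=tcp_check) -> None:
    """Probe every registered service and store the latest result per service."""
    rows = conn.execute("SELECT id, address, port FROM services").fetchall()
    for service_id, address, port in rows:
        status = "passing" if check(address, port) else "critical"
        # One row per service: a newer result replaces the older one.
        conn.execute(
            "INSERT OR REPLACE INTO service_checks (service_id, status, checked_at) "
            "VALUES (?, ?, ?)",
            (service_id, status, int(time.time())),
        )
    conn.commit()


# Hypothetical schema; an in-memory database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE services (
        id TEXT PRIMARY KEY NOT NULL,
        name TEXT NOT NULL DEFAULT '',
        address TEXT NOT NULL DEFAULT '',
        port INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE service_checks (
        service_id TEXT PRIMARY KEY NOT NULL,
        status TEXT NOT NULL DEFAULT '',
        checked_at INTEGER NOT NULL DEFAULT 0
    );
    """
)
```

Calling `record_checks(conn)` on a timer would keep `service_checks` current for the services registered on the local node. The `check` parameter is only there so the probe can be swapped out, for example for an HTTP check.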

@jeromegn
Member

jeromegn commented Feb 26, 2024

Yes, you can use Corrosion as a replacement for Consul if you build health checking and have a way to serve DNS from the database.

On our end, we built our own DNS server using data from Corrosion, but we still use Consul for health checking and as the source of truth for services we manage.

You may not need a centralized Consul server cluster. corrosion consul sync pulls data from the local Consul agent only, so in theory you can run single-node Consul agents that don't talk to Consul servers and it should still work. Corrosion would take care of replicating the data to every node in an eventually consistent manner. You would still have to handle DNS yourself, which could be tricky if you're working off of the consul_services schema directly, but you can add virtual columns that expose meta fields as columns to help integrate with an existing DNS server. Using Consul agents should at least give you health checks for "free".
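As a rough illustration of the virtual-column idea, the sketch below uses SQLite generated columns to surface keys from a JSON `meta` field. The table `consul_services_demo` is a simplified stand-in, not the actual consul_services schema, and the meta keys `dns_name` and `region` are made up for the example. It assumes SQLite 3.31 or newer with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE consul_services_demo (
        node TEXT NOT NULL,
        id TEXT NOT NULL,
        name TEXT NOT NULL DEFAULT '',
        address TEXT NOT NULL DEFAULT '',
        port INTEGER NOT NULL DEFAULT 0,
        meta TEXT NOT NULL DEFAULT '{}',
        -- Virtual columns are computed from meta when read; nothing extra is stored.
        dns_name TEXT GENERATED ALWAYS AS (json_extract(meta, '$.dns_name')) VIRTUAL,
        region TEXT GENERATED ALWAYS AS (json_extract(meta, '$.region')) VIRTUAL,
        PRIMARY KEY (node, id)
    )
    """
)

# Only the real columns are written; the virtual ones follow from meta.
conn.executemany(
    "INSERT INTO consul_services_demo (node, id, name, address, port, meta) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("node-a", "web-1", "web", "10.0.0.1", 8080,
         json.dumps({"dns_name": "web.internal", "region": "ams"})),
        ("node-b", "web-2", "web", "10.0.0.2", 8080,
         json.dumps({"dns_name": "web.internal", "region": "fra"})),
    ],
)

# A DNS integration can now filter and select on plain columns.
ams_rows = conn.execute(
    "SELECT dns_name, address, port FROM consul_services_demo WHERE region = 'ams'"
).fetchall()
```

The point of the design is that a DNS plugin that only understands flat columns never has to parse JSON itself.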

As general advice: you probably want to use the latest main for your deployment. There have been countless improvements since the 0.1.0 release.

@hkrutzer
Author

Interesting, thanks!

I think I should be able to use something like https://github.com/eadz/coredns_sqlite3, but with a view instead of a table, so that it takes the service and health status etc. into account. Is there a reason you have your own DNS server, or is that unrelated to Corrosion/Consul? And are you looking to move more Consul features into Corrosion in the future?

@jeromegn
Member

coredns_sqlite3 with a view should work!
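A sketch of what such a view could look like follows. The record shape (`name`, `record_type`, `ttl`, `content`) is an assumption and not necessarily what coredns_sqlite3 expects, and the `services` and `service_checks` tables, the `.service.internal.` suffix, and the 30-second TTL are hypothetical as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE services (
        id TEXT PRIMARY KEY NOT NULL,
        name TEXT NOT NULL DEFAULT '',
        address TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE service_checks (
        service_id TEXT PRIMARY KEY NOT NULL,
        status TEXT NOT NULL DEFAULT ''
    );

    -- Only instances whose latest check is passing become DNS records.
    CREATE VIEW dns_records AS
    SELECT s.name || '.service.internal.' AS name,
           'A' AS record_type,
           30 AS ttl,
           s.address AS content
    FROM services s
    JOIN service_checks c ON c.service_id = s.id
    WHERE c.status = 'passing';

    INSERT INTO services (id, name, address) VALUES
        ('web-1', 'web', '10.0.0.1'),
        ('web-2', 'web', '10.0.0.2');
    INSERT INTO service_checks (service_id, status) VALUES
        ('web-1', 'passing'),
        ('web-2', 'critical');
    """
)

records = conn.execute(
    "SELECT name, record_type, ttl, content FROM dns_records"
).fetchall()
```

Because the view is computed on read, a failing check removes the instance from DNS answers without any extra bookkeeping.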

Our DNS server has its own special handling for authorization based on source IP (information is encoded in IPv6 addresses) and many of its queries aren't straightforward. We also built this a long while ago, when the DNS landscape was different. It turns out it's less than 1000 lines of Rust and that pleases me :) (we use the domain crate)

@hkrutzer
Author

hkrutzer commented Mar 2, 2024

Another question: I could run Consul without a cluster or build my own healthchecks, but what if an entire node goes down and it no longer provides any health information? Does Corrosion provide a view of the cluster membership?

@jeromegn
Member

jeromegn commented Mar 5, 2024

It does, in a roundabout way. It stores the SWIM cluster state for each node, serialized as JSON, in the foca_state column of the __corro_members table. This table is internal and subject to change, but it should be relatively safe to use it.

The JSON representation contains a field state that can take a value of either "Alive", "Down" or "Suspect".

This is the state agreed upon by the whole cluster.
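For illustration, reading that field could look roughly like the sketch below. The `members_demo` table is a stand-in for __corro_members with only the columns needed here, and the JSON documents contain nothing but the `state` field described above; the real serialized state likely carries more fields whose layout is not covered in this thread.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the internal table: an address plus the serialized SWIM state.
conn.execute(
    "CREATE TABLE members_demo (address TEXT PRIMARY KEY NOT NULL, foca_state TEXT)"
)
conn.executemany(
    "INSERT INTO members_demo (address, foca_state) VALUES (?, ?)",
    [
        ("10.0.0.1:8787", json.dumps({"state": "Alive"})),
        ("10.0.0.2:8787", json.dumps({"state": "Suspect"})),
        ("10.0.0.3:8787", json.dumps({"state": "Down"})),
    ],
)


def members_by_state(conn: sqlite3.Connection) -> dict:
    """Group member addresses by the state field of their JSON document."""
    grouped = {}
    query = (
        "SELECT address, json_extract(foca_state, '$.state') "
        "FROM members_demo ORDER BY address"
    )
    for address, state in conn.execute(query):
        grouped.setdefault(state, []).append(address)
    return grouped


states = members_by_state(conn)
```

Since the table is internal and subject to change, keeping this lookup in a single helper makes it easier to adjust later.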

@hkrutzer
Author

hkrutzer commented Mar 5, 2024

Would you consider a feature request to add the node that is queried to that table? If I used this table and I want to e.g. write a template with all available nodes, it wouldn't include the node that the template is written on.

It's also a little hard to relate this table to a services table, the services table would include some kind of instance ID, which would have to be exposed in some way so that it can be inserted, or at least the address from __corro_members. Is there any way to do this?

@jeromegn
Member

jeromegn commented Mar 5, 2024

Yes, we can probably add the current node in __corro_members.

> It's also a little hard to relate this table to a services table, the services table would include some kind of instance ID, which would have to be exposed in some way so that it can be inserted, or at least the address from __corro_members. Is there any way to do this?

The way we've done it here is with a new table, replicated by Corrosion, called corro_meta. It contains the actor_id and the hostname. We use the hostname for most things so we JOIN on that small table to get that info. You can find the current actor_id by doing this query: SELECT site_id FROM crsql_site_id WHERE ordinal = 0

Doing a separate table may sound cumbersome, but it's the most flexible way to attach metadata about a node without assuming too much from unknown-to-us use cases.
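A minimal sketch of that pattern, under several assumptions: `crsql_site_id` is created by hand here as a stand-in (cr-sqlite provides the real one), the exact column types of `corro_meta` are guessed from the description above, and the `services` table with a `hostname` column is hypothetical.

```python
import sqlite3

LOCAL_SITE_ID = bytes([1]) * 16  # made-up 16-byte identifier for the demo

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the table cr-sqlite maintains.
    CREATE TABLE crsql_site_id (site_id BLOB NOT NULL, ordinal INTEGER PRIMARY KEY);
    -- Replicated metadata table, as described above.
    CREATE TABLE corro_meta (
        actor_id BLOB PRIMARY KEY NOT NULL,
        hostname TEXT NOT NULL DEFAULT ''
    );
    -- Hypothetical services table keyed to a node by hostname.
    CREATE TABLE services (
        id TEXT PRIMARY KEY NOT NULL,
        name TEXT NOT NULL DEFAULT '',
        hostname TEXT NOT NULL DEFAULT ''
    );
    """
)
conn.execute("INSERT INTO crsql_site_id (site_id, ordinal) VALUES (?, 0)", (LOCAL_SITE_ID,))

# The query quoted above: ordinal 0 is the current node.
local_actor_id = conn.execute(
    "SELECT site_id FROM crsql_site_id WHERE ordinal = 0"
).fetchone()[0]

# Each node registers its own actor_id and hostname once.
conn.execute(
    "INSERT OR REPLACE INTO corro_meta (actor_id, hostname) VALUES (?, ?)",
    (local_actor_id, "node-a"),
)
conn.execute("INSERT INTO services (id, name, hostname) VALUES ('web-1', 'web', 'node-a')")

# JOIN on the small metadata table to relate services to actors.
joined = conn.execute(
    "SELECT s.name, m.hostname, hex(m.actor_id) "
    "FROM services s JOIN corro_meta m ON m.hostname = s.hostname"
).fetchall()
```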

@hkrutzer
Author

hkrutzer commented Mar 6, 2024

> Doing a separate table may sound cumbersome, but it's the most flexible way to attach metadata about a node without assuming too much from unknown-to-us use cases.

That sounds fine to me! Do you periodically clean this table? As you can't create a foreign key on the actor_id inside the JSON in __corro_members to delete it automatically.

@jeromegn
Member

jeromegn commented Mar 7, 2024

We don't clean it, yet. It doesn't matter if there are extra rows in that table and we rarely deprovision nodes.

> As you can't create a foreign key on the actor_id inside the JSON in __corro_members to delete it automatically.

Even worse: foreign keys are not supported by cr-sqlite and therefore aren't supported by Corrosion. Perhaps we should add that to the docs. Basically anything, on a table, that has a restrictive effect on a different table doesn't work due to how conflicts are resolved. Another example is unique indexes (outside of primary keys).

@gedw99

gedw99 commented May 14, 2024

I was also wondering if foreign keys work. Definitely worth adding to the docs that they're not supported.

@psviderski

psviderski commented Sep 25, 2024

> It stores the SWIM cluster state for each node, serialized as JSON, in the foca_state column of the __corro_members table. The JSON representation contains a field state that can take a value of either "Alive", "Down" or "Suspect".

According to the code

let mut diff_last_states_every = tokio::time::interval(Duration::from_secs(60));

the __corro_members table is synchronised with the actual foca state only once a minute. This means that in the worst case there could be a one-minute delay before learning that a particular node is down. This applies even to a node that has gracefully left or joined the cluster.

@jeromegn I wonder, do you use something different to __corro_members to promptly react to cluster membership changes? I can probably watch the output of corrosion cluster members, which instantly reflects added or removed nodes. Does it provide a reliable source of information? For example, I'm not sure whether it includes Suspect nodes or not. Is there a better way?

@jeromegn
Member

> do you use something different to __corro_members to promptly react to cluster membership changes?

We're not using memberships outside of Corrosion itself. We usually care more about 1:1 failures (A <-> B) than the global view of the cluster which is what is represented in __corro_members and by the corrosion cluster members command.

If you do want the global view of the SWIM cluster's state: yes, the corrosion cluster members command should reflect that precisely, since it uses what's in memory. It includes suspect members as well.

@psviderski

> We're not using memberships outside of Corrosion itself. We usually care more about 1:1 failures (A <-> B) than the global view of the cluster

Not sure I'm following. I'm still trying to understand how Corrosion can be used as a service catalog when substituting Consul. Let's imagine we have nodes A and B in the cluster. Node A runs a Consul agent for health checks and runs corrosion consul sync to sync local Consul services and health checks to distributed consul_services and consul_checks tables respectively.

On node B, we can see services and their statuses from node A. All of them are reported as healthy. Then imagine a network partition occurs between nodes A and B or node A just crashes. Now, the service records from node A are no longer updated in consul_services and consul_checks tables but they still show a healthy status.
The question is: How should node B determine whether particular service records are current or already outdated?
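To make the question concrete, one untested idea is to combine the pieces mentioned earlier in the thread: map each service's node to an actor through corro_meta, then keep only records whose node is "Alive" in the membership state. All tables below are simplified stand-ins; in particular, whether membership rows can be matched on an `actor_id` column like this is an assumption. The local node is treated as alive because, as noted earlier in the thread, it does not appear in its own membership table, and the membership table can lag by up to a minute.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE consul_services_demo (
        node TEXT NOT NULL,
        id TEXT NOT NULL,
        name TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (node, id)
    );
    CREATE TABLE corro_meta (
        actor_id BLOB PRIMARY KEY NOT NULL,
        hostname TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE members_demo (actor_id BLOB PRIMARY KEY NOT NULL, foca_state TEXT);
    """
)

A, B, C = bytes([1]) * 16, bytes([2]) * 16, bytes([3]) * 16  # made-up actor ids
conn.executemany(
    "INSERT INTO corro_meta (actor_id, hostname) VALUES (?, ?)",
    [(A, "node-a"), (B, "node-b"), (C, "node-c")],
)
conn.executemany(
    "INSERT INTO consul_services_demo (node, id, name) VALUES (?, ?, ?)",
    [("node-a", "web-1", "web"), ("node-b", "web-2", "web"), ("node-c", "web-3", "web")],
)
# Seen from node-a: it has no row for itself, node-b is alive, node-c is down.
conn.executemany(
    "INSERT INTO members_demo (actor_id, foca_state) VALUES (?, ?)",
    [(B, json.dumps({"state": "Alive"})), (C, json.dumps({"state": "Down"}))],
)


def current_services(conn: sqlite3.Connection, local_hostname: str) -> list:
    """Return (node, id) pairs for the local node and for nodes reported Alive."""
    return conn.execute(
        """
        SELECT s.node, s.id
        FROM consul_services_demo s
        JOIN corro_meta m ON m.hostname = s.node
        LEFT JOIN members_demo cm ON cm.actor_id = m.actor_id
        WHERE m.hostname = :local
           OR json_extract(cm.foca_state, '$.state') = 'Alive'
        ORDER BY s.node, s.id
        """,
        {"local": local_hostname},
    ).fetchall()


live = current_services(conn, "node-a")
```

Whether "Suspect" nodes should also count as current is a policy choice this sketch does not settle; here they are excluded.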
