bug: resolver.cloudflare-eth.com is down; change default ENS resolver? #771

Closed
lidel opened this issue Dec 23, 2024 · 3 comments · Fixed by #781
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization

Comments

@lidel
Member

lidel commented Dec 23, 2024

Problem

DNSLinks that use ENS (websites on .eth TLD) are broken for users that run boxo/gateway with default settings (incl. Kubo, Rainbow, IPFS Desktop).

The Cloudflare resolver at https://resolver.cloudflare-eth.com/dns-query is currently broken and returns no DNSLink records (note "Answer":[] at the end):

$ curl -s -H "accept: application/dns-json" "https://resolver.cloudflare-eth.com/dns-query?name=_dnslink.vitalik.eth&type=TXT"
{"AD":true,"CD":false,"RA":true,"RD":true,"TC":false,"Status":3,"Question":[{"name":"_dnslink.vitalik.eth.","type":16}],"Answer":[]
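
This failure mode can be detected programmatically. Below is a minimal Python sketch, assuming the `application/dns-json` response shape shown above. The helper name `summarize_doh_response` is hypothetical, and the sample string adds the closing brace that the pasted output lacks so that it parses:

```python
import json

NXDOMAIN = 3  # DNS RCODE 3: the queried name does not exist
TXT = 16      # DNS record type number for TXT


def summarize_doh_response(body: str) -> dict:
    """Summarize a DoH JSON response: numeric status and number of TXT answers."""
    doc = json.loads(body)
    # Some resolvers send Status as a number, others as a string ("0").
    status = int(doc.get("Status", -1))
    txt_answers = [a for a in doc.get("Answer") or [] if a.get("type") == TXT]
    return {
        "status": status,
        "nxdomain": status == NXDOMAIN,
        "txt_answers": len(txt_answers),
    }


# Response from the broken resolver (closing brace added, see note above).
broken = (
    '{"AD":true,"CD":false,"RA":true,"RD":true,"TC":false,"Status":3,'
    '"Question":[{"name":"_dnslink.vitalik.eth.","type":16}],"Answer":[]}'
)
print(summarize_doh_response(broken))
```

Note that the resolver answers with status 3 (NXDOMAIN) rather than an HTTP error or SERVFAIL, so the response looks like a legitimate "no such name" answer.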

We've reported the outage to Cloudflare, but if it is not fixed by January, when the team is back from holidays, we should consider removing .eth support from the implicit defaults in Boxo (and Kubo), or switching the implicit default to a different DoH resolver.

Solution

We could make things more robust by supporting fallbacks (ipfs/kubo#8173), but for that we need more than one resolver, and it seems ENS has only one stable resolver atm.

So the options are:

Solution (A): do nothing, wait for resolver.cloudflare-eth.com to be fixed

This happens once a year on average. Acceptable? 🤷

Solution (B): remove default resolver for .eth

This would break all ENS websites, and require all end users to choose or set up their own resolver.
Probably not what we want given ENS+IPFS use in the wider ecosystem, but writing this down here for completeness.
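
To illustrate what removing the implicit default would mean in practice, here is a small Python sketch of per-TLD resolver selection where user settings override built-in defaults. It is not Boxo's actual code: the names `DEFAULT_RESOLVERS` and `pick_resolver`, and the `"eth."` key format, are assumptions made for this example.

```python
from typing import Dict, Optional

# Hypothetical implicit defaults (the URL is the current default discussed in this issue).
DEFAULT_RESOLVERS: Dict[str, str] = {
    "eth.": "https://resolver.cloudflare-eth.com/dns-query",
}


def pick_resolver(
    fqdn: str,
    user_resolvers: Dict[str, str],
    defaults: Dict[str, str],
) -> Optional[str]:
    """Return the DoH URL for the longest matching suffix, or None if nothing matches.

    User-provided entries take precedence over implicit defaults.
    """
    merged = {**defaults, **user_resolvers}
    name = fqdn.rstrip(".").lower() + "."  # normalize to a fully qualified name
    best = None
    for suffix, url in merged.items():
        matches = name == suffix or name.endswith("." + suffix)
        if matches and (best is None or len(suffix) > len(best[0])):
            best = (suffix, url)
    return best[1] if best else None


# With the implicit default removed and no user configuration, .eth has no resolver.
print(pick_resolver("vitalik.eth", {}, {}))
# A user who sets up their own resolver gets ENS names working again.
print(pick_resolver("vitalik.eth", {"eth.": "https://dns.eth.limo/dns-query"}, {}))
```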

Solution (C): switch URL to https://dns.eth.limo/dns-query

The DoH endpoint at https://dns.eth.limo/dns-query is a good candidate: it seems to be well maintained and returns a non-empty DNSLink in "Answer":

$ curl -s -H "accept: application/dns-json" "https://dns.eth.limo/dns-query?name=_dnslink.vitalik.eth&type=TXT"
{"Status":"0","TC":false,"Question":[{"name":"_dnslink.vitalik.eth","type":16}],"Answer":[{"name":"_dnslink.vitalik.eth","data":"dnslink=/ipfs/bafybeifvusbh4iunpvwjlowu47sxnt4hjlebx46kxi4yz5zdsoecfpkkei","type":16,"ttl":300}]}
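
For comparison, a client could pull the content path out of a healthy response roughly like this. It is a Python sketch that assumes the JSON shape shown above; `extract_dnslink` is a hypothetical helper, not an existing API.

```python
import json
from typing import Optional

TXT = 16  # DNS record type number for TXT


def extract_dnslink(body: str) -> Optional[str]:
    """Return the first DNSLink path (e.g. "/ipfs/<cid>") in a DoH JSON response, or None."""
    doc = json.loads(body)
    for answer in doc.get("Answer") or []:
        if answer.get("type") != TXT:
            continue
        # TXT data may arrive wrapped in double quotes, depending on the resolver.
        data = str(answer.get("data", "")).strip().strip('"')
        if data.startswith("dnslink=/"):
            return data[len("dnslink="):]
    return None


# Response copied from the eth.limo query above.
working = (
    '{"Status":"0","TC":false,"Question":[{"name":"_dnslink.vitalik.eth","type":16}],'
    '"Answer":[{"name":"_dnslink.vitalik.eth",'
    '"data":"dnslink=/ipfs/bafybeifvusbh4iunpvwjlowu47sxnt4hjlebx46kxi4yz5zdsoecfpkkei",'
    '"type":16,"ttl":300}]}'
)
print(extract_dnslink(working))
```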

(this is not a final decision; consider this commit a way of kicking off a conversation about what we should do in 2025 to minimize issues with proxied naming systems like ENS)

Solution (D): ?

Other ideas welcome.

@lidel lidel added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Dec 23, 2024
@aschmahmann
Contributor

A few other options:

Solution (D): Mandate multiple resolvers for defaults

Insist that any naming system that wants to be included by default in boxo provide multiple (2 or 3?) independently operated trusted DNS resolvers and then add some logic in boxo to be able to try the others should one of them fail consistently.

Some notes:

  • It can't just be a check for a 4xx / 5xx response or a SERVFAIL DNS response, because some outages (like the Cloudflare one above) fail differently (e.g. with an incorrect NXDOMAIN), which makes this more complicated
  • This is roughly the same as "Add DNS Fallback Resolvers" (kubo#8173 (comment)), but taking into account that some failures might not be obvious
  • The extra logic here might be reusable for something like querying IPNI providers
  • We'd have to consider removing .crypto support unless their community was willing to operate 1-2 more resolvers
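
A rough Python sketch of the fallback idea in (D) follows. It assumes each resolver is a plain callable that returns a DNSLink path, returns `None` for a negative answer, or raises on transport errors. The name `resolve_with_fallback` and the fake resolvers are hypothetical, and the sketch does not track which resolvers "fail consistently" over time.

```python
from typing import Callable, Iterable, Optional

# A resolver takes a name and returns a DNSLink path, returns None for a
# negative answer (e.g. NXDOMAIN), or raises on transport/server errors.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(name: str, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Try resolvers in order and return the first positive answer.

    A negative answer from a single resolver is not trusted on its own:
    the name is reported as missing only if no resolver returns a result.
    """
    for resolve in resolvers:
        try:
            result = resolve(name)
        except Exception:
            # Obvious failure (4xx / 5xx, SERVFAIL, timeout): try the next resolver.
            continue
        if result is not None:
            return result
        # Possibly an incorrect NXDOMAIN: keep going instead of giving up.
    return None


# Fake resolvers standing in for independently operated DoH endpoints.
def wrongly_negative(name: str) -> Optional[str]:
    return None


def unreachable(name: str) -> Optional[str]:
    raise ConnectionError("resolver unreachable")


def healthy(name: str) -> Optional[str]:
    return "/ipfs/example-cid"


print(resolve_with_fallback("_dnslink.example.eth", [wrongly_negative, unreachable, healthy]))
```

The trade-off in this sketch is that every genuine NXDOMAIN costs one query per configured resolver, and a single misbehaving resolver that returns a wrong positive answer is still trusted.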

Solution (E): Have the dweb.link maintainers maintain public DNSLink infra that can be more agile / adaptable to issues

This basically hides the existing problem behind a party that is likely to be more closely aligned with the boxo maintainers. On its own this basically trades one centralized party for another. That the centralized party is closer aligned with the library and its dependents can be helpful, but it also seems reasonable that responsibility around continuity of these naming systems should reside with their communities. Continuity questions are ones we'd have to bring up when considering new defaults.

Solution (F): Use default Ethereum JSON RPC endpoints rather than default ENS resolvers

There are many more Ethereum JSON RPC providers than ENS providers so we could use a library (e.g. https://github.com/wealdtech/go-ens) to do the ENS translation (and handle CCIP, etc.).

The two major downsides here are:

  1. Given that folks have built businesses around being ETH RPC providers, it seems unlikely we'll have a stable one we can include by default, and requiring users to configure an ETH RPC endpoint defeats the purpose of a default. This is likely what kills this idea ... although if this turns out to be incorrect that'd be great.
  2. It adds both maintenance and gatekeeping work for the boxo maintainers. To be fair, it could be argued that we have some of this already by virtue of including defaults (as shown by us needing to worry about this Cloudflare outage).

Solution (G): Implement verifiable decentralized ENS resolution

Instead of using a DoH resolver for ENS, implement a way of fetching ENS data verifiably from a distributed set of peers.

Personally I'd like to see this happen, but I suspect that realistically it's a lot of work and would require some coordination and funding from the ENS community. While I tend to agree with not wanting to pick "winners", perform gatekeeping, or deal with extra maintenance work, I feel less bad about giving some preferential treatment to protocols where:

  1. There is adoption of the protocol
  2. We can promote verifiable and resilient systems over trusted and brittle proxies

Realistically, I suspect the best path forward for now is:

  1. If this outage goes on for a while and the eth.limo folks are ok with it, go with option C
  2. Take a look at the viability / ease of implementing D, since that should be the easiest way to gain resiliency
  3. I'd be curious whether G has enough interest to be a thing; we can discuss it with the ENS community (e.g. on their forums, or ping them about this issue), but it's definitely not a near-term solution

@MicahZoltu

For (G) I believe the recently launched Portal Network is likely at least part of the solution, combined with some sort of light client. The Portal Network is a state storage system that allows participants to distribute all state over a large decentralized network. You would still need a light client that can do ENS resolution (which likely requires running an EVM). In theory, one should be able to build a light client to achieve these goals, but I don't believe one exists as of yet.

@lidel
Member Author

lidel commented Jan 6, 2025

Ref. https://developers.cloudflare.com/fundamentals/api/reference/deprecations/#2025-07-01:

Cloudflare decommissioned resolver.cloudflare-eth.com/dns-query, and it now redirects to https://dns.eth.link/dns-query, which afaik is an alias for eth.limo.

This provisionally fixes ENS resolution for all boxo users (incl. kubo, rainbow, gateways), but effectively re-routes user queries to a new entity.

Proposed next step: (C)

  • Until there is a working PoC for (G), or for (F) plus someone sponsoring an unpaid, reliable endpoint, Boxo should update the default DoH resolver to (C), using eth.limo and not eth.link to avoid exposure to this mess.
  • Rationale: Cloudflare already redirects to eth.limo, so we effectively use eth.limo already. By hitting it directly we avoid the performance penalty of the HTTP 308 redirect, and we no longer depend on 2 extra domain names (resolver.cloudflare-eth.com may be killed like https://cloudflare-ipfs.com was, and eth.link is just an alias that effectively splits the HTTP cache in two)

Short term:

  • get approval from eth.limo operators
  • PR that switches .eth resolution to https://dns.eth.limo/dns-query

lidel added a commit that referenced this issue Jan 7, 2025
Cloudflare decommissioned their resolver and started redirecting
(HTTP 308) to https://dns.eth.link/dns-query which is an alias for
eth.limo.

We change default to https://dns.eth.limo/dns-query
to avoid redirects and cache splitting.

Closes #771
lidel added a commit to ipfs/rainbow that referenced this issue Jan 7, 2025
@lidel lidel closed this as completed in #781 Jan 8, 2025
@lidel lidel closed this as completed in 5518e1a Jan 8, 2025
lidel added a commit to ipfs/rainbow that referenced this issue Jan 8, 2025