
Matrix badges not working #10138

Open
nadonomy opened this issue May 7, 2024 · 9 comments
Labels
question Support questions, usage questions, unconfirmed bugs, discussions, ideas

Comments

@nadonomy

nadonomy commented May 7, 2024

Are you experiencing an issue with...

shields.io

🐛 Description

It looks like the matrix badges aren't working.

Here's how the badge embedded in the Element Web README (source) shows for me in Safari:

[screenshot]

And then Chrome:

[screenshot]

The badge itself should be configured correctly.

🔗 Link to the badge

https://github.com/element-hq/element-web

💡 Possible Solution

No response

@nadonomy nadonomy added the question Support questions, usage questions, unconfirmed bugs, discussions, ideas label May 7, 2024
@nadonomy
Author

nadonomy commented May 7, 2024

@t3chguy if you're happy to post the screenshot here from when you tested with an alternate server, that'd be helpful

@thibaultamartin tagging for your radar too in case you can help debug/fix :)

@chris48s
Member

chris48s commented May 7, 2024

Hello.

In general, the matrix badges are working.

Here are some examples that should work:

- https://img.shields.io/matrix/aio-libs:matrix.org
- https://img.shields.io/matrix/cemu:cemu.info?server_fqdn=matrix.cemu.info

We do seem to have an issue though. Having looked into it, I'd describe the problem more like: The badge/upstream endpoint we're calling does not scale well when the number of users is very large.

For the example you've posted - https://img.shields.io/matrix/element-web:matrix.org
One of two things will happen if you view it in a GitHub README:

  • GitHub's camo proxy will wait 4 seconds for us to return a response, then give up when it doesn't get one (that's what you're seeing in the first screenshot)
  • We'll return an error response, having failed with Error: Maximum response size exceeded, because the endpoint we're calling on matrix.org sends back over 10 MB of JSON (we refuse to accept a response larger than 10 MB from an upstream endpoint)
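The size cap described in the second bullet can be sketched roughly like this (a hypothetical helper, not Shields' actual implementation): accumulate response chunks and bail out as soon as a byte limit is exceeded.

```javascript
// Hypothetical sketch of a response-size cap: accumulate chunks from an
// upstream response and abort once a byte limit is exceeded.
const MAX_RESPONSE_BYTES = 10 * 1024 * 1024; // 10 MB

function collectWithLimit(chunks, maxBytes = MAX_RESPONSE_BYTES) {
  let total = 0;
  const parts = [];
  for (const chunk of chunks) {
    total += chunk.length;
    if (total > maxBytes) {
      // Mirrors the "Maximum response size exceeded" failure mode above.
      throw new Error('Maximum response size exceeded');
    }
    parts.push(chunk);
  }
  return parts.join('');
}
```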

Fundamentally, I think in order to fix this we would need to be able to call a more efficient endpoint on the upstream service, but I'm not sure there is one.

@t3chguy

t3chguy commented May 7, 2024

Likely the implementation needs updating. Instead of /_matrix/client/r0/rooms/$roomId/state, you can try matrix-org/matrix-spec-proposals#3266 on certain servers, which will return a near-constant-size payload with things like the name and number of users in it.

{
  "room_id": "!ol19s:bleecker.street",
  "avatar_url": "mxc://bleecker.street/CHEDDARandBRIE",
  "guest_can_join": false,
  "name": "CHEESE",
  "num_joined_members": 37,
  "topic": "Tasty tasty cheese",
  "world_readable": true,
  "join_rule": "public",
  "room_type": "m.space",
  "membership": "invite",
  "encryption": "m.megolm.v100",
  "room_version": "9001"
}

This API also lets you skip guest registration (as it is unauthenticated), and once enabled on gitter.im it would also resolve #9714.

https://matrix.org/_matrix/client/unstable/im.nheko.summary/summary/%23GeoJsonCircleToPolygon:gitter.im for example.
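Building that URL from a room alias can be sketched as follows (the helper name is hypothetical, not part of Shields' codebase); the alias must be percent-encoded because the leading "#" would otherwise be read as a URL fragment:

```javascript
// Hypothetical helper: build the MSC3266 room summary URL for a given
// room alias (e.g. "#element-web:matrix.org") and homeserver FQDN.
function summaryUrl(roomAlias, serverFqdn) {
  // encodeURIComponent turns "#" into "%23" (and ":" into "%3A").
  const encoded = encodeURIComponent(roomAlias);
  return `https://${serverFqdn}/_matrix/client/unstable/im.nheko.summary/summary/${encoded}`;
}
```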

@chris48s
Member

chris48s commented May 7, 2024

That would be a big improvement. The fact that we have to call an endpoint that requires guest auth is another thing that makes this slow (because we have to make multiple requests to render a badge) and a source of complexity. If we can bin that off, I'm all for it.

The link you've posted there is to an open PR and it seems like we would currently have to call something like:

https://matrix.org/_matrix/client/unstable/im.nheko.summary/summary/%23element-web:matrix.org
rather than a stable endpoint like https://matrix.org/_matrix/client/v1/summary/%23element-web:matrix.org (as described in those docs).

Do you have an idea of how many servers out there have that endpoint? Is it basically just matrix.org? What would be the minimum version of matrix we would say a user needs to be running for this to work?
I guess the format of that JSON response is subject to change while that PR is still open?
Presumably that URL would also need to change at some point in future?

@t3chguy

t3chguy commented May 7, 2024

Do you have an idea of how many servers out there have that endpoint?

I wouldn't be able to tell you that

Is it basically just matrix.org?

No, any Synapse can enable it but it isn't on by default

What would be the minimum version of matrix we would say a user needs to be running for this to work?

None specifically, given the MSC is still unstable; once merged, it'd be in the following Matrix spec release

I guess the format of that JSON response is subject to change while that PR is still open?

Possible but quite unlikely at this point

Presumably that URL would also need to change at some point in future?

Yes, it'd move to a stable path


All of the above is redundant if you were to try the new API call before falling back to the current behaviour, which is known to break for large rooms.

@chris48s
Member

All of the above is redundant

I find this response dismissive. As an outsider, I'm looking here at an unstable endpoint and an unmerged PR that has been open for 3 years. I think my questions are pretty valid.

That said, I'm going to attempt to take a charitable interpretation and move on.

The most common place shields badges are viewed is in READMEs on GitHub. GitHub's image proxy puts a hard limit of 4 seconds on the time it will wait for a response, which is part of the problem here.

As it stands, the matrix badge already requires multiple round-trips on the network. It is one of our slower badges, even when the list of users in a channel isn't huge. I think the happy path is 3 requests and the sad path is 4. One reason I'm asking this stuff is that adding another network round-trip (or trips) ahead of a process that is already slow (even for smaller rooms) might be unhelpful if we expect that most servers don't have that feature enabled yet and may not for some time. For some users this will make things faster, but for some it would make it slower.

I'm also scoping out the future follow-up tasks that might be required if we rely on an unstable endpoint. This isn't unreasonable in my view, particularly given none of the maintainers use Matrix or actively follow its development.

In terms of an implementation, I feel like the most "set it and forget it" implementation would be:

  • Fire off 2 requests to both _matrix/client/v1/summary and _matrix/client/unstable/im.nheko.summary/summary in parallel (as opposed to in sequence - to minimise the amount of extra wait time we're adding to the front of this).
  • If _matrix/client/v1/summary gives us a 200, use that in preference to any response from _matrix/client/unstable/im.nheko.summary/summary.
  • If they both fail, fall back to the existing code.
  • If we were to get a 200 back from either endpoint that doesn't have a num_joined_members key, treat it as a validation error and fail rather than falling back and masking the issue. Then at least someone will complain it is broken and we'll update the code. If there is no change to the response format, then it is a non-issue.

That would deal with the eventuality that at various points in future there would be some servers out there exposing the unstable endpoint, some exposing the stable one, and some exposing neither without us having to revisit this a lot. Then at some point way in the far future, we ditch the legacy code and the call to _matrix/client/unstable/im.nheko.summary/summary and document that it only works for servers running some version [tbc] or greater.
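The parallel-with-preference logic in that plan can be sketched like this (hypothetical function names, not Shields' real code): fire both requests, prefer the stable endpoint's response, return null to signal "fall back to the legacy flow", and fail loudly if the response format drops num_joined_members.

```javascript
// Sketch of the fallback strategy described above. fetchStable and
// fetchUnstable are assumed to be functions returning promises for the
// parsed JSON of the stable and unstable summary endpoints.
async function fetchRoomSummary(fetchStable, fetchUnstable) {
  // Run both requests in parallel to avoid adding sequential wait time.
  const [stable, unstable] = await Promise.allSettled([
    fetchStable(),
    fetchUnstable(),
  ]);
  const pick =
    stable.status === 'fulfilled' ? stable.value
    : unstable.status === 'fulfilled' ? unstable.value
    : null; // both failed: caller falls back to the existing code path
  if (pick !== null && typeof pick.num_joined_members !== 'number') {
    // Treat a missing key as a hard error rather than silently falling
    // back, so a response-format change surfaces quickly.
    throw new Error('num_joined_members missing from summary response');
  }
  return pick;
}
```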

That is, however, predicated on the assumption this is on a path to becoming stable and rolling out.

@t3chguy

t3chguy commented May 12, 2024

I find this response dismissive.

Not the intent, just trying to help you find a solution which is indeed not ideal for this interim.

As an outsider, I'm looking here at an unstable endpoint and an unmerged PR that has been open for 3 years.

The slowness of that MSC being accepted is criminal. It is critical for first-party services like matrix.to which currently have to do the exact same dance you do to get at similar data.

Your implementation plan sounds sane. I have not yet had the chance to dig through your code in depth, but if you have any way to cache state between calls, then you could cache which APIs are supported, keyed by Matrix server name, and minimise parallel/fallback work that way. Though given you are having to register a new guest account for each badge, it does seem as though that is not an option here, or you might already have been caching guest credentials to save time.

You could also choose to hit only matrix.org's summary API before falling back to registering on the specified server if that API doesn't produce data, rather than trying both APIs on the target server. matrix.org is in very many public rooms and it has that API enabled right now, so it could be a good stopgap until the MSC merges and the API is more widely supported.

@chris48s
Member

Keeping a cache of which Matrix servers support which endpoints is a good idea 👍

We don't currently cache credentials between requests either, but that would also take one or two round-trips off in many cases when using the existing approach.
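A minimal sketch of such a capability cache (hypothetical, assuming a single-process in-memory deployment): remember per-homeserver whether the summary endpoint worked, with a TTL so servers that later enable it get re-probed.

```javascript
// Hypothetical per-homeserver capability cache: records whether a
// server supported the summary endpoint, expiring entries after a TTL.
class EndpointSupportCache {
  constructor(ttlMs = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // serverFqdn -> { supported, expires }
  }
  set(serverFqdn, supported) {
    this.entries.set(serverFqdn, {
      supported,
      expires: Date.now() + this.ttlMs,
    });
  }
  // Returns true/false if we have a fresh answer, undefined otherwise
  // (unknown server or expired entry), meaning the caller should probe.
  get(serverFqdn) {
    const entry = this.entries.get(serverFqdn);
    if (!entry || Date.now() > entry.expires) return undefined;
    return entry.supported;
  }
}
```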

@t3chguy

t3chguy commented May 12, 2024

but that would also take one or two round-trips off in many cases when using the existing approach.

And also being a nicer citizen to the servers which you are using, not bloating their DB and whatnot :P
