Support for multipart/mixed (and possibly better multipart/* support in general) #3721

Open
handrews opened this issue Apr 19, 2024 · 11 comments

handrews commented Apr 19, 2024

We occasionally see folks trying to use OAS with multipart/mixed, which the OAS does not currently support. It would not be hard to define a mapping along the lines of multipart/form-data, but with more of an array model. See also discussion #2599. I think @jeremyfiel also has experience with this.

[EDIT: See also multipart/x-mixed-replace, which AFAICT does not have a usable specification: WHATWG explicitly states in that section that their spec "describes processing rules for web browsers" and relegates everything else to multipart/mixed... I guess non-web-browsers aren't supposed to process it? 😒 ]

@handrews handrews added the media and encoding Issues regarding media type support and how to encode data (outside of query/path params) label Apr 19, 2024
@handrews handrews self-assigned this Apr 21, 2024
@handrews
Contributor Author

It's also worth noting that when uploading multiple files for a form field with multipart/form-data, RFC 7578 requires having multiple parts with the same name, which can't be modeled by the current "treat it as a JSON object" approach.
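
To make the mismatch concrete, here is a minimal Python sketch using only the standard library `email` parser. The boundary string, field name `files`, and file contents are made up for illustration. It parses a form-data body containing two parts with the same name and shows that a list keeps both parts while an object-style (dict) view keeps only one.

```python
from email import message_from_bytes

# Hypothetical multipart/form-data body: two parts share the name "files".
BOUNDARY = "example-boundary"
raw = (
    f"Content-Type: multipart/form-data; boundary={BOUNDARY}\r\n"
    "\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="files"; filename="a.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "first file\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="files"; filename="b.txt"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "second file\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

message = message_from_bytes(raw)


def part_name(part):
    """Read the `name` parameter of a part's Content-Disposition header."""
    return part.get_param("name", header="content-disposition")


# Array view: every part survives, in wire order.
parts = [(part_name(p), p.get_payload(decode=True)) for p in message.get_payload()]

# Object view: later parts with the same name overwrite earlier ones.
as_object = dict(parts)

print([name for name, _ in parts])
print(len(as_object))
```

The list retains both `files` parts, whereas the dict ends up with a single key, which is the information loss described above.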

@handrews handrews added this to the v3.2.0 milestone Apr 21, 2024
@jeremyfiel

jeremyfiel commented Apr 23, 2024

EDIT: guess I should have read your link.

Another multipart option missing is the ability to send multiple files in a single form field.

It should be a form-data payload with a nested multipart body with individual parts for each file. There is no way to define a nested multipart payload.

This is as far as I can get with it, but there is no way to define the nested body parts if I were to send multiple file types:

requestBody:
  content:
    'multipart/form-data':
      schema:
        type: object
        properties:
          name:
            type: string
          file:
            type: string
            format: binary
      encoding:
        name:
          headers:
            content-disposition:
              schema:
                type: string
          contentType: 'application/json'
        file:
          headers:
            content-disposition:
              schema:
                type: string
          contentType: 'multipart/mixed'

@handrews
Contributor Author

@jeremyfiel I know that multipart/mixed and similar MIME-derived formats explicitly allow nesting.

I noticed that the most recent multipart/form-data RFC, RFC 7578, deprecates the nesting in favor of multiple parts with the same name. According to the RFC's appendix, the nested approach wasn't known to be implemented but I assume you have an implementation that does it.

@jeremyfiel

A big missing piece from the multipart/mixed RFC is the ability to identify the body parts in the same way form-data uses the Content-Disposition header.

Because the RFC is so old, and was originally written for email messaging, apparently there was no need to identify the body parts, and the form-data RFC didn't address the /mixed RFC when updates were made for the new header.

If we could depend on this same header, I think it would help quite a lot to define mixed bodies with OAS, using the same syntax as form-data and application/x-www-form-urlencoded bodies.

@jeremyfiel

> @jeremyfiel I know that multipart/mixed and similar MIME-derived formats explicitly allow nesting.
>
> I noticed that the most recent multipart/form-data RFC, RFC 7578, deprecates the nesting in favor of multiple parts with the same name. According to the RFC's appendix, the nested approach wasn't known to be implemented but I assume you have an implementation that does it.

I'm not entirely sure we have a nested implementation, but we do heavily use /mixed payloads because we like having JSON metadata alongside a file representation.

@handrews
Contributor Author

handrews commented May 19, 2024

So it turns out that, in theory, support for multipart/mixed was added for 3.0:

The question remains, how is it supposed to work? Somewhere during the extensive GitHub archaeology involved in digging this up, I thought I found an example of someone using a name parameter to Content-Disposition with multipart/mixed.

Unfortunately, name is not defined by RFC 2183, the most recent RFC defining that header in a multipart context (there is a later RFC that only applies to it with respect to top-level HTTP message bodies). However, the set of parameters is extensible, and multipart/form-data added name.

But I'm guessing that even if you can use name as a parameter of a per-part Content-Disposition header with multipart/mixed, tools, including any legacy code involved in generating or parsing such messages, are unlikely to support it. So I'm a bit baffled as to how this is supposed to work.

I still need to take another pass at reading how array instances work with the Encoding Object, as there might be something there, at least if all of the parts are of the same type (which kind of goes against the whole mixed part).

🤔

@jeremyfiel

related #3827 (reply in thread)

@karenetheridge
Contributor

> It's also worth noting that when uploading multiple files for a form field with multipart/form-data, RFC 7578 requires having multiple parts with the same name, which can't be modeled by the current "treat it as a JSON object" approach.

There is another thing that the RFC specifies that is not currently supported by the "deserialize into a JSON object" mechanism: the ordering of the parts must be preserved. I think that, combined with duplicate names being allowed, means that we need to pivot to deserializing message parts (of any multipart/* type) as array items, rather than as object values. This is of course a breaking change.

@handrews
Contributor Author

handrews commented May 21, 2024

@karenetheridge the case of multiple files with the same name is handled with a property that is itself an array (see example at the end of the linked section). It's on my TODO list to figure out whether an array property using prefixItems would be sufficient to handle ordered+unnamed multipart media types, including multipart/mixed, in 3.1.

For multipart/form-data ordering is not always present, but we do not have a way to set it. I suspect prefixItems on a single array property will not suffice there, and I've been thinking of writing a proposal for an ordering Encoding Object field in 3.2 rather than switching over to an array entirely.

My current thought is that it would define relative numeric order (so people don't have to worry about keeping a perfect sequential set of numbers), ordering 0, 1, etc. from the beginning, -1, -2, etc. from the end, and putting any fields without an ordering in the middle (in implementation-defined ordering).
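
A rough Python sketch of that ordering rule, to check that it is well defined. The function name `order_parts` and the `ordering` mapping are hypothetical stand-ins for the proposed Encoding Object field, not anything in the current specification. Here the "implementation-defined" middle group simply keeps its input order.

```python
def order_parts(names, ordering):
    """Arrange part names by relative order.

    names:    part names in their incoming (e.g. schema) order
    ordering: hypothetical mapping of part name -> integer `ordering` value;
              names missing from the mapping have no ordering.
    """
    # Non-negative values count from the beginning: 0, 1, ... (gaps allowed).
    head = sorted(
        (n for n in names if n in ordering and ordering[n] >= 0),
        key=lambda n: ordering[n],
    )
    # Negative values count from the end: -1 is last, -2 is second to last.
    tail = sorted(
        (n for n in names if n in ordering and ordering[n] < 0),
        key=lambda n: ordering[n],
    )
    # Parts without an ordering land in the middle; this sketch keeps input order.
    middle = [n for n in names if n not in ordering]
    return head + middle + tail


names = ["c", "meta", "b", "sig", "a"]
ordering = {"meta": 0, "a": 10, "sig": -1, "b": -2}
print(order_parts(names, ordering))
```

Because only relative values matter, `meta: 0` and `a: 10` sort the same way as `0` and `1` would. The sketch does not decide what happens when two parts share the same value; Python's stable sort leaves them in input order.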

This would also be important for supporting #1502 as query strings can also have duplicate names and are considered to be ordered (although whether the order of Parameter Objects with in: query is considered to be significant has never been clear to me).

@handrews
Contributor Author

handrews commented May 22, 2024

It looks like prefixItems won't help with multipart/mixed because the Encoding Object applies to every item in the array, which works fine with the single-schema items syntax that was the only one allowed in 3.0, but does not work with prefixItems. At least not unless they all have the same encoding, which seems to be too weird of a subset of cases to highlight as an option.

Also, there is a Content-Disposition registry, and unfortunately it makes it clear that you cannot use the name parameter with dispositions other than form-data (note that multipart/mixed uses the attachment disposition).

I really have no idea how the changes for 3.0 were supposed to support multipart/mixed.

@handrews
Contributor Author

handrews commented May 22, 2024

@jeremyfiel @karenetheridge OK I've been thinking on this more, and I realized that my statement that multipart/mixed uses the attachment disposition is incorrect. multipart/mixed is defined in §5.1.3 of RFC 2046, which predates the Content-Disposition header. However, another section of RFC 2046 notes that Content-Disposition is the expected solution to some other obsolete thing from an even older RFC.

So... I don't see anything that says you can't use Content-Disposition: form-data; name=foo with multipart/mixed. In fact, it would make sense if you are basically treating the mixed data as a (logical) form submission.

We can also look at the definition of multipart/related in RFC 2387, which has a section dedicated to Content-Disposition, including a clause about user agents that do not understand related falling back to mixed and using the Content-Disposition to help process the parts. So clearly it is valid to use Content-Disposition in multipart/mixed.

Now, how many tools out there will support such a thing? No clue. But it is the only way I can think of that PR #878 could have resulted in issue #303 being closed: the whole Encoding Object depends on a name to correlate with the schema property name, and using Content-Type: multipart/mixed with Content-Disposition: form-data;name=foo is the only way I can think of to get that to work.
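
As a quick feasibility check of that combination, here is a small Python sketch using the standard library `email` package, which serializes MIME in email style rather than as an HTTP body. The part names `metadata` and `file`, the helper `named_part`, and the payloads are assumptions for illustration only. It builds a multipart/mixed body whose parts carry `Content-Disposition: form-data; name=...`, then parses it back and correlates parts by name, the way an Encoding Object lookup would need to.

```python
import json
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def named_part(name, data, subtype):
    """Build an application/<subtype> part labeled with a form-data style name."""
    part = MIMEApplication(data, _subtype=subtype)
    # Emits: Content-Disposition: form-data; name="<name>"
    part.add_header("Content-Disposition", "form-data", name=name)
    return part


# JSON metadata alongside a file representation, in wire order.
body = MIMEMultipart("mixed")
body.attach(named_part("metadata", json.dumps({"title": "report"}).encode(), "json"))
body.attach(named_part("file", b"placeholder file bytes", "pdf"))

# Round-trip through bytes, then index the parts by their name parameter.
parsed = message_from_bytes(body.as_bytes())
by_name = {
    p.get_param("name", header="content-disposition"): p
    for p in parsed.get_payload()
}

print(parsed.get_content_type())
print(sorted(by_name))
print(json.loads(by_name["metadata"].get_payload(decode=True)))
```

This only shows that one general-purpose MIME library will produce and read such a message; it says nothing about whether HTTP tooling in general accepts it.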

We might as well document that in 3.0.4 (because 3.0.3 actually has a multipart/mixed example without explaining it!)

(But we should still also be able to support ordered multipart media types without relying on splicing together ideas from multiple RFCs that were probably not intended to work that way)
