aws-sdk-translate - translate_document returns text in wrong encoding #2897
Comments
The difference from translate_text is that the translate_document API returns a base64-encoded string, which, when processed by https://github.com/aws/aws-sdk-ruby/blob/9a4278dbe51fd1a7125973772c021dd02d328226/gems/aws-sdk-core/lib/aws-sdk-core/json/parser.rb#L69C11-L69C52, ends up tagged as ASCII-8BIT. I don't know whether the generic implementation in the AWS core library can assume that all Blob shapes are UTF-8, so it probably cannot be fixed there. I would prefer a method override in the Aws::Translate::Types::TranslatedDocument class that forces the encoding, but that class appears to be auto-generated from the API JSON definitions, so I'm at a loss as to how to fix it. Ideally, I think the API definitions should include some specification of, or assumptions about, the character encodings. Maybe it is assumed for string types, but Blobs could conceivably be strings or binary, so in addition to the content type, it would be nice if the API response also specified the character encoding. But I am not an expert in this matter. :)
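To illustrate the behavior described in this comment, here is a minimal sketch in plain Ruby with no AWS calls. It assumes the blob is decoded the way Ruby's standard Base64 module does it (Base64.decode64 is a thin wrapper around String#unpack1("m")); the variable names are illustrative only and do not come from the SDK.

```ruby
# Plain Ruby, no gems required. The sample text is an assumption for the demo.
original = "héllo wörld"               # UTF-8 text, as a translated document would be
wire     = [original].pack("m0")       # Base64 form, as carried in the JSON response
decoded  = wire.unpack1("m0")          # decoding always yields a binary-tagged String

# The bytes survive the round trip, but the encoding tag is binary (ASCII-8BIT),
# so the decoded String does not compare equal to the UTF-8 original.
same_bytes   = decoded.bytes == original.bytes
equal_before = decoded == original

# force_encoding only relabels the bytes; it does not transcode them.
fixed       = decoded.dup.force_encoding(Encoding::UTF_8)
equal_after = fixed == original
```

Because force_encoding only changes the label, it is safe only when the bytes really are in the named encoding; checking valid_encoding? afterwards is a cheap sanity test.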
I think this can possibly be fixed with a plugin/customization in the aws-sdk-translate service for specifically this operation and API member. I can look into this on Monday.
The TranslateDocument API "supports text, HTML, or Word documents as the input document." The output is documented as "The document format matches the source document format." So I think in cases such as a Word doc we would not want to apply an encoding to this string (and instead your application would need to interpret it as binary data). Possibly we could add a custom plugin that looks at the type and encoding of the input document and applies the same encoding to the response (e.g., if the input document is a String with UTF-8 encoding, then we can ensure the output document is also a String with UTF-8 encoding).
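The idea in this comment could be sketched roughly as follows. This is not SDK code: apply_input_encoding and TEXT_CONTENT_TYPES are hypothetical names, and the list of text content types is an assumption about which inputs are safe to treat as character data.

```ruby
# Hypothetical helper, not part of aws-sdk-translate.
# Assumption: only these content types carry plain character data.
TEXT_CONTENT_TYPES = ["text/plain", "text/html"].freeze

# Returns the output relabeled with the input's encoding when that is safe,
# otherwise returns the output untouched (still binary).
def apply_input_encoding(input_content, output_content, content_type)
  # Word documents and other non-text formats stay binary.
  return output_content unless TEXT_CONTENT_TYPES.include?(content_type)
  # A binary-tagged input gives us nothing to copy.
  return output_content if input_content.encoding == Encoding::BINARY

  candidate = output_content.dup.force_encoding(input_content.encoding)
  # Only trust the relabeled String if its bytes are valid in that encoding.
  candidate.valid_encoding? ? candidate : output_content
end
```

Falling back to the untouched binary String when validation fails keeps the helper from handing callers a String that raises on the first text operation.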
Is there a document that explains the high-level architecture of the aws-sdk-ruby build? I see code for plugins etc., but it all appears to be auto-generated, and I can't find any documentation on how to work within the system...
We don't have good documentation on how to add plugins. But if you want to add a plugin in your own code, you can do something like:

class FixTranslateDocumentEncoding < Seahorse::Client::Plugin
  class Handler < Seahorse::Client::Handler
    def call(context)
      # detect encoding
      encoding = "UTF-8" # TODO: actually detect it and ensure it doesn't break for non-string inputs
      # call the rest of the stack; this will build the request, sign it, send it, and parse the output
      resp = @handler.call(context)
      # modify the response before returning it upwards in the stack
      resp.translated_document.content = resp.translated_document.content.force_encoding(encoding)
      resp
    end
  end

  def add_handlers(handlers, _config)
    # Handler is early in the call stack
    handlers.add(Handler, step: :initialize, operations: [:translate_document])
  end
end

# Add the plugin to the client
Aws::Translate::Client.add_plugin(FixTranslateDocumentEncoding)

This would apply to all instances of Translate::Client.
Describe the bug
When calling translate_document, the response content contains the correct UTF-8 bytes, as expected, but Ruby 3.0.6 reports the string's encoding as ASCII-8BIT. A Rails console excerpt is provided below.
Expected Behavior
Expect the content to be UTF-8 encoded:
Current Behavior
Reproduction Steps
Possible Solution
No response
Additional Information/Context
No response
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-translate
Environment details (Version of Ruby, OS environment)
ruby 3.0.6, OS X 13.4.1