Force UTF-8 when opening README.md #76

Aqaao · 2022-12-22T11:21:26Z

Otherwise, setup.py fails with the following error:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Aqaao\AppData\Local\Temp\pip-req-build-2dcr43hl\setup.py", line 8, in <module>
        README = fh.read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x90 in position 757: illegal multibyte sequence
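A minimal sketch of the change the title describes, assuming setup.py reads README.md with a plain open() call. The helper name read_readme is hypothetical and not part of the project. Without an encoding argument, open() uses the locale codec (gbk in the traceback above), so passing encoding="utf-8" explicitly should avoid the decode error.

```python
def read_readme(path="README.md"):
    """Read the README as UTF-8, independent of the platform's locale codec."""
    # Hypothetical helper: an explicit encoding replaces the locale default
    # (gbk on the machine that produced the traceback above).
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

In setup.py this would be used as README = read_readme().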

Aqaao · 2022-12-22T11:47:34Z

A similar error occurs here, with a non-UTF-8 codec:

    Traceback (most recent call last):
      File "autosub/main.py", line 170, in <module>
        main()
      File "autosub/main.py", line 161, in main
        ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
      File "autosub/main.py", line 69, in ds_process_audio
        write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
      File "C:\env\python-venv\deepspeech\lib\site-packages\autosub\writeToFile.py", line 43, in write_to_file
        file_handle.write(inferred_text + "\n\n")
    UnicodeEncodeError: 'gbk' codec can't encode character '\udce9' in position 0: illegal multibyte sequence

Edit: the "utf-8" codec raises an error too, and I don't know why.

    Traceback (most recent call last):
      File "autosub/main.py", line 170, in <module>
        main()
      File "autosub/main.py", line 161, in main
        ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
      File "autosub/main.py", line 69, in ds_process_audio
        write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
      File "C:\env\python-venv\deepspeech\lib\site-packages\autosub\writeToFile.py", line 43, in write_to_file
        file_handle.write(inferred_text + "\n\n")
    UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-262: surrogates not allowed
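A small sketch of why both codecs fail on '\udce9'. It assumes the surrogate came from bytes decoded with the surrogateescape error handler somewhere upstream; that is one plausible source and is not confirmed in this thread. The helper can_encode is hypothetical and only illustrates the behaviour.

```python
# Decoding the undecodable byte 0xE9 with 'surrogateescape' yields the lone
# surrogate U+DCE9, the same character named in the gbk traceback above.
text = b"\xe9".decode("utf-8", errors="surrogateescape")


def can_encode(s, codec):
    """Return True if s can be encoded with codec using strict error handling."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Neither codec accepts a lone surrogate in strict mode.
gbk_ok = can_encode(text, "gbk")
utf8_ok = can_encode(text, "utf-8")

# With 'surrogateescape' on the encoding side, the original byte comes back.
round_trip = text.encode("utf-8", errors="surrogateescape")
```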

abhirooptalasila · 2022-12-23T10:03:06Z

Weird. Which language is your audio in?

Aqaao · 2022-12-23T10:35:19Z

> Weird. Which language is your audio in?

Mandarin. I found that many people have the same problem in Python (see mozilla/DeepSpeech#3557), but I didn't find a solution.

abhirooptalasila · 2022-12-23T10:51:08Z

Aah yes. You'll need to add .decode('utf-8', 'ignore') and .encode(...) while writing to file/saving

Aqaao · 2022-12-23T11:38:23Z

> Aah yes. You'll need to add .decode('utf-8', 'ignore') and .encode(...) while writing to file/saving

Thanks, it worked.

AutoSub/autosub/writeToFile.py, line 43 in 5dc2314:

    file_handle.write(inferred_text + "\n\n")

changed to:

    file_handle.write(inferred_text.decode('utf-8', 'ignore').encode('utf-8') + "\n\n")

AutoSub/autosub/main.py, line 140 in 5dc2314:

    output_file_handle_dict[format] = open(output_filename, "w")

changed to:

    output_file_handle_dict[format] = open(output_filename, "w", encoding='utf-8', errors='surrogateescape')
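A hedged note on the write-side change: the "surrogates not allowed" message only appears on Python 3, where str has no .decode() method, so the .decode(...).encode(...) order quoted above may need to be reversed there. The sketch below shows one possible Python 3 variant. The helper names drop_surrogates and write_cue are hypothetical, not functions from AutoSub.

```python
def drop_surrogates(text):
    """Return text with lone surrogates removed so it can be written as UTF-8."""
    # Encoding with errors="ignore" silently skips characters UTF-8 cannot
    # represent (lone surrogates); decoding restores an ordinary str.
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def write_cue(file_handle, inferred_text):
    """Write one block of inferred text followed by a blank line."""
    file_handle.write(drop_surrogates(inferred_text) + "\n\n")
```

With this approach the output file can be opened with open(output_filename, "w", encoding="utf-8"). The alternative shown in the thread, errors='surrogateescape', keeps the data by writing the escaped bytes back unchanged, which may leave bytes in the file that are not valid UTF-8.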
