[BUG] minSpeaker / maxSpeaker setting equal not possible #31401

Open
SebastianBodza opened this issue Nov 4, 2024 · 1 comment
Labels
bug: This issue requires a change to an existing behavior in the product in order to be resolved.
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
data-plane
question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that

API Spec link

https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/Speech/SpeechToText/preview/2024-05-15-preview/speechtotext.json

API Spec version

2024-05-15-preview

Describe the bug

When minSpeakers and maxSpeakers are set to the same value, the API returns:
The value of min speakers should be less than the value of max speakers.

The descriptions of minSpeakers and maxSpeakers, however, do not impose this restriction and explicitly allow equal values:
minCount: A hint for the minimum number of speakers for diarization. Must be smaller than or equal to the maxSpeakers property.
maxCount: The maximum number of speakers for diarization. Must be less than 36 and larger than or equal to the minSpeakers

Ideally, the API would allow specifying an exact number of speakers, i.e. minSpeakers == maxSpeakers. Otherwise, the description should be reworked.

Expected behavior

Allow setting minSpeakers equal to maxSpeakers to request a fixed speaker count.

Actual behavior

Returns a 400 error with the message: The value of min speakers should be less than the value of max speakers.
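To make the mismatch concrete, here is a minimal sketch that encodes both rules side by side. The function names are hypothetical and not part of any SDK. `documented_rule` follows the wording of the spec descriptions quoted above; `observed_rule` is only inferred from the 400 error message, so the service's actual validation may differ in other details.

```python
# Hypothetical helpers for illustration only; not part of the Speech API or SDK.
MAX_SPEAKERS_LIMIT = 36  # the description says maxSpeakers must be less than 36


def documented_rule(min_speakers: int, max_speakers: int) -> bool:
    """Constraint as worded in the spec description: min <= max < 36."""
    return min_speakers <= max_speakers < MAX_SPEAKERS_LIMIT


def observed_rule(min_speakers: int, max_speakers: int) -> bool:
    """Constraint implied by the error message (assumption): min < max < 36."""
    return min_speakers < max_speakers < MAX_SPEAKERS_LIMIT


# Compare both rules for a range, an exact count, and an inverted range.
for pair in [(1, 2), (2, 2), (3, 2)]:
    print(pair, "documented:", documented_rule(*pair), "observed:", observed_rule(*pair))
```

The two rules disagree only when both values are equal, which is exactly the fixed speaker count case this issue asks for.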

Reproduction Steps

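import json

import requests

# Placeholders: substitute your own audio file and Speech resource key.
AUDIO = "<AUDIO_FILE>.mp3"
SPEECH_API_KEY = "<SPEECH_API_KEY>"
REGION = "westeurope"
url = f"https://{REGION}.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe"
params = {"api-version": "2024-05-15-preview"}

headers = {
    "Accept": "application/json",
    "Ocp-Apim-Subscription-Key": SPEECH_API_KEY,
}

# Equal minSpeakers and maxSpeakers values trigger the 400 response.
definition = {
    "locales": ["de-DE"],
    "diarizationSettings": {
        "minSpeakers": 2,
        "maxSpeakers": 2
    },
    "profanityFilterMode": "Masked",
}

# Send the audio and the JSON definition as multipart form data.
with open(AUDIO, "rb") as audio_file:
    files = {
        "audio": ("test.mp3", audio_file),
        "definition": (None, json.dumps(definition)),
    }
    response = requests.post(url, params=params, headers=headers, files=files)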

Environment

No response

v-jiaodi (Member) commented Nov 5, 2024

@bexxx Please help take a look, thanks.
