Commit 8b9f6a8

feat(stt, tts): add more models
1 parent d2d6fbf commit 8b9f6a8

3 files changed: 86 additions and 74 deletions
ibm_watson/speech_to_text_v1.py

Lines changed: 75 additions & 72 deletions
@@ -27,11 +27,10 @@
 have minimum sampling rates of 16 kHz. Narrowband and telephony models have minimum
 sampling rates of 8 kHz. The next-generation models offer high throughput and greater
 transcription accuracy.
-Effective **15 March 2022**, previous-generation models for all languages other than
-Arabic and Japanese are deprecated. The deprecated models remain available until **31 July
-2023**, when they will be removed from the service and the documentation. You must migrate
-to the equivalent next-generation model by the end of service date. For more information,
-see [Migrating to next-generation
+Effective **31 July 2023**, all previous-generation models will be removed from the
+service and the documentation. Most previous-generation models were deprecated on 15 March
+2022. You must migrate to the equivalent next-generation model by 31 July 2023. For more
+information, see [Migrating to next-generation
 models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).{:
 deprecated}
 For speech recognition, the service supports synchronous and asynchronous HTTP
@@ -278,11 +277,10 @@ def recognize(self,
 * `keywords` and `keywords_threshold`
 * `processing_metrics` and `processing_metrics_interval`
 * `word_alternatives_threshold`
-**Important:** Effective **15 March 2022**, previous-generation models for all
-languages other than Arabic and Japanese are deprecated. The deprecated models
-remain available until **31 July 2023**, when they will be removed from the
-service and the documentation. You must migrate to the equivalent next-generation
-model by the end of service date. For more information, see [Migrating to
+**Important:** Effective **31 July 2023**, all previous-generation models will be
+removed from the service and the documentation. Most previous-generation models
+were deprecated on 15 March 2022. You must migrate to the equivalent
+next-generation model by 31 July 2023. For more information, see [Migrating to
 next-generation
 models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
 **See also:**
@@ -352,14 +350,18 @@ def recognize(self,
 to words from the custom language model compared to those from the base
 model for the current request.
 Specify a value between 0.0 and 1.0. Unless a different customization
-weight was specified for the custom model when it was trained, the default
-value is 0.3. A customization weight that you specify overrides a weight
-that was specified when the custom model was trained.
-The default value yields the best performance in general. Assign a higher
-value if your audio makes frequent use of OOV words from the custom model.
-Use caution when setting the weight: a higher value can improve the
-accuracy of phrases from the custom model's domain, but it can negatively
-affect performance on non-domain phrases.
+weight was specified for the custom model when the model was trained, the
+default value is:
+* 0.3 for previous-generation models
+* 0.2 for most next-generation models
+* 0.1 for next-generation English and Japanese models
+A customization weight that you specify overrides a weight that was
+specified when the custom model was trained. The default value yields the
+best performance in general. Assign a higher value if your audio makes
+frequent use of OOV words from the custom model. Use caution when setting
+the weight: a higher value can improve the accuracy of phrases from the
+custom model's domain, but it can negatively affect performance on
+non-domain phrases.
 See [Using customization
 weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 :param int inactivity_timeout: (optional) The time in seconds after which,
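
Note on the hunk above: the new docstring makes the default customization weight depend on the base model, and keeps the rule that a weight passed with the request overrides one set at training time. A minimal sketch of that lookup follows. The helper names `default_customization_weight` and `effective_weight` are hypothetical and not part of the SDK, and the model-name heuristics (previous-generation names end in `BroadbandModel` or `NarrowbandModel`; "English and Japanese" means the `en-*` and `ja-*` language codes) are assumptions made for illustration only.

```python
from typing import Optional


def default_customization_weight(model: str) -> float:
    """Return the documented default weight for a base model name (heuristic)."""
    language, _, kind = model.partition("_")
    # Assumption: previous-generation names end in BroadbandModel/NarrowbandModel.
    if kind.endswith(("BroadbandModel", "NarrowbandModel")):
        return 0.3
    # Assumption: "English and Japanese" maps to the en-* and ja-* language codes.
    if language.split("-")[0] in ("en", "ja"):
        return 0.1
    # Every other next-generation model.
    return 0.2


def effective_weight(model: str,
                     trained_weight: Optional[float] = None,
                     request_weight: Optional[float] = None) -> float:
    """Request weight overrides trained weight, which overrides the default."""
    if request_weight is not None:
        return request_weight
    if trained_weight is not None:
        return trained_weight
    return default_customization_weight(model)
```

For example, `effective_weight('ko-KR_Multimedia')` falls back to the documented 0.2, while passing `request_weight` always wins over a trained weight.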
@@ -466,12 +468,12 @@ def recognize(self,
 default, the service returns no audio metrics.
 See [Audio
 metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
-:param float end_of_phrase_silence_time: (optional) If `true`, specifies
-the duration of the pause interval at which the service splits a transcript
-into multiple final results. If the service detects pauses or extended
-silence before it reaches the end of the audio stream, its response can
-include multiple final results. Silence indicates a point at which the
-speaker pauses between spoken words or phrases.
+:param float end_of_phrase_silence_time: (optional) Specifies the duration
+of the pause interval at which the service splits a transcript into
+multiple final results. If the service detects pauses or extended silence
+before it reaches the end of the audio stream, its response can include
+multiple final results. Silence indicates a point at which the speaker
+pauses between spoken words or phrases.
 Specify a value for the pause interval in the range of 0.0 to 120.0.
 * A value greater than 0 specifies the interval that the service is to use
 for speech recognition.
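
Note on the hunk above: dropping "If `true`" makes it clear that `end_of_phrase_silence_time` is a float in the range 0.0 to 120.0 and not a boolean switch. A caller could guard against the old misreading with a small client-side check. This is only a sketch: `validate_end_of_phrase_silence_time` is a hypothetical helper, and the SDK itself may not perform any such validation.

```python
def validate_end_of_phrase_silence_time(value):
    """Return the value as a float if it lies in the documented 0.0-120.0 range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("end_of_phrase_silence_time must be a number, not a flag")
    if not 0.0 <= value <= 120.0:
        raise ValueError("end_of_phrase_silence_time must be between 0.0 and 120.0")
    return float(value)
```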
@@ -545,13 +547,11 @@ def recognize(self,
 * For more information about the `low_latency` parameter, see [Low
 latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 :param float character_insertion_bias: (optional) For next-generation
-`Multimedia` and `Telephony` models, an indication of whether the service
-is biased to recognize shorter or longer strings of characters when
-developing transcription hypotheses. By default, the service is optimized
-for each individual model to balance its recognition of strings of
-different lengths. The model-specific bias is equivalent to 0.0.
-The value that you specify represents a change from a model's default bias.
-The allowable range of values is -1.0 to 1.0.
+models, an indication of whether the service is biased to recognize shorter
+or longer strings of characters when developing transcription hypotheses.
+By default, the service is optimized to produce the best balance of strings
+of different lengths.
+The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
 * Negative values bias the service to favor hypotheses with shorter strings
 of characters.
 * Positive values bias the service to favor hypotheses with longer strings
@@ -562,8 +562,7 @@ def recognize(self,
 -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the
 transcription results. Then experiment with different values as necessary,
 adjusting the value by small increments.
-The parameter is not available for previous-generation `Broadband` and
-`Narrowband` models.
+The parameter is not available for previous-generation models.
 See [Character insertion
 bias](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#insertion-bias).
 :param dict headers: A `dict` containing the request headers
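
Note on the two hunks above: the revised text fixes the default `character_insertion_bias` at 0.0, keeps the -1.0 to 1.0 range, and recommends starting from small values. One way to follow that advice is to prepare one set of request arguments per starting value and compare the transcripts. The sketch below only builds keyword dictionaries and calls no service. `STARTING_BIASES` and `bias_request_kwargs` are hypothetical names, and detecting previous-generation models by the `BroadbandModel`/`NarrowbandModel` suffix is an assumption.

```python
# Starting values suggested by the docstring.
STARTING_BIASES = (-0.1, -0.05, 0.05, 0.1)


def bias_request_kwargs(model, bias=0.0):
    """Build keyword arguments for a recognition request with a checked bias."""
    if not -1.0 <= bias <= 1.0:
        raise ValueError("character_insertion_bias must be between -1.0 and 1.0")
    # Assumption: previous-generation model names end in these suffixes.
    if model.endswith(("BroadbandModel", "NarrowbandModel")):
        raise ValueError("character_insertion_bias needs a next-generation model")
    return {"model": model, "character_insertion_bias": bias}


# One request per starting value, to be compared against the 0.0 baseline.
trials = [bias_request_kwargs("ja-JP_Telephony", b) for b in STARTING_BIASES]
```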
@@ -891,11 +890,10 @@ def create_job(self,
 * `keywords` and `keywords_threshold`
 * `processing_metrics` and `processing_metrics_interval`
 * `word_alternatives_threshold`
-**Important:** Effective **15 March 2022**, previous-generation models for all
-languages other than Arabic and Japanese are deprecated. The deprecated models
-remain available until **31 July 2023**, when they will be removed from the
-service and the documentation. You must migrate to the equivalent next-generation
-model by the end of service date. For more information, see [Migrating to
+**Important:** Effective **31 July 2023**, all previous-generation models will be
+removed from the service and the documentation. Most previous-generation models
+were deprecated on 15 March 2022. You must migrate to the equivalent
+next-generation model by 31 July 2023. For more information, see [Migrating to
 next-generation
 models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
 **See also:**
@@ -987,14 +985,18 @@ def create_job(self,
 to words from the custom language model compared to those from the base
 model for the current request.
 Specify a value between 0.0 and 1.0. Unless a different customization
-weight was specified for the custom model when it was trained, the default
-value is 0.3. A customization weight that you specify overrides a weight
-that was specified when the custom model was trained.
-The default value yields the best performance in general. Assign a higher
-value if your audio makes frequent use of OOV words from the custom model.
-Use caution when setting the weight: a higher value can improve the
-accuracy of phrases from the custom model's domain, but it can negatively
-affect performance on non-domain phrases.
+weight was specified for the custom model when the model was trained, the
+default value is:
+* 0.3 for previous-generation models
+* 0.2 for most next-generation models
+* 0.1 for next-generation English and Japanese models
+A customization weight that you specify overrides a weight that was
+specified when the custom model was trained. The default value yields the
+best performance in general. Assign a higher value if your audio makes
+frequent use of OOV words from the custom model. Use caution when setting
+the weight: a higher value can improve the accuracy of phrases from the
+custom model's domain, but it can negatively affect performance on
+non-domain phrases.
 See [Using customization
 weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).
 :param int inactivity_timeout: (optional) The time in seconds after which,
@@ -1123,12 +1125,12 @@ def create_job(self,
 default, the service returns no audio metrics.
 See [Audio
 metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).
-:param float end_of_phrase_silence_time: (optional) If `true`, specifies
-the duration of the pause interval at which the service splits a transcript
-into multiple final results. If the service detects pauses or extended
-silence before it reaches the end of the audio stream, its response can
-include multiple final results. Silence indicates a point at which the
-speaker pauses between spoken words or phrases.
+:param float end_of_phrase_silence_time: (optional) Specifies the duration
+of the pause interval at which the service splits a transcript into
+multiple final results. If the service detects pauses or extended silence
+before it reaches the end of the audio stream, its response can include
+multiple final results. Silence indicates a point at which the speaker
+pauses between spoken words or phrases.
 Specify a value for the pause interval in the range of 0.0 to 120.0.
 * A value greater than 0 specifies the interval that the service is to use
 for speech recognition.
@@ -1202,13 +1204,11 @@ def create_job(self,
 * For more information about the `low_latency` parameter, see [Low
 latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).
 :param float character_insertion_bias: (optional) For next-generation
-`Multimedia` and `Telephony` models, an indication of whether the service
-is biased to recognize shorter or longer strings of characters when
-developing transcription hypotheses. By default, the service is optimized
-for each individual model to balance its recognition of strings of
-different lengths. The model-specific bias is equivalent to 0.0.
-The value that you specify represents a change from a model's default bias.
-The allowable range of values is -1.0 to 1.0.
+models, an indication of whether the service is biased to recognize shorter
+or longer strings of characters when developing transcription hypotheses.
+By default, the service is optimized to produce the best balance of strings
+of different lengths.
+The default bias is 0.0. The allowable range of values is -1.0 to 1.0.
 * Negative values bias the service to favor hypotheses with shorter strings
 of characters.
 * Positive values bias the service to favor hypotheses with longer strings
@@ -1219,8 +1219,7 @@ def create_job(self,
 -0.1, -0.05, 0.05, or 0.1, and assess how the value impacts the
 transcription results. Then experiment with different values as necessary,
 adjusting the value by small increments.
-The parameter is not available for previous-generation `Broadband` and
-`Narrowband` models.
+The parameter is not available for previous-generation models.
 See [Character insertion
 bias](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#insertion-bias).
 :param dict headers: A `dict` containing the request headers
@@ -1437,11 +1436,10 @@ def create_language_model(self,
 The service returns an error if you attempt to create more than 1024 models. You
 do not lose any models, but you cannot create any more until your model count is
 below the limit.
-**Important:** Effective **15 March 2022**, previous-generation models for all
-languages other than Arabic and Japanese are deprecated. The deprecated models
-remain available until **31 July 2023**, when they will be removed from the
-service and the documentation. You must migrate to the equivalent next-generation
-model by the end of service date. For more information, see [Migrating to
+**Important:** Effective **31 July 2023**, all previous-generation models will be
+removed from the service and the documentation. Most previous-generation models
+were deprecated on 15 March 2022. You must migrate to the equivalent
+next-generation model by 31 July 2023. For more information, see [Migrating to
 next-generation
 models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
 **See also:**
@@ -1738,7 +1736,10 @@ def train_language_model(self,
 weight for the custom language model. The customization weight tells the
 service how much weight to give to words from the custom language model
 compared to those from the base model for speech recognition. Specify a
-value between 0.0 and 1.0; the default is 0.3.
+value between 0.0 and 1.0. The default value is:
+* 0.3 for previous-generation models
+* 0.2 for most next-generation models
+* 0.1 for next-generation English and Japanese models
 The default value yields the best performance in general. Assign a higher
 value if your audio makes frequent use of OOV words from the custom model.
 Use caution when setting the weight: a higher value can improve the
@@ -2950,11 +2951,10 @@ def create_acoustic_model(self,
 below the limit.
 **Note:** Acoustic model customization is supported only for use with
 previous-generation models. It is not supported for next-generation models.
-**Important:** Effective **15 March 2022**, previous-generation models for all
-languages other than Arabic and Japanese are deprecated. The deprecated models
-remain available until **31 July 2023**, when they will be removed from the
-service and the documentation. You must migrate to the equivalent next-generation
-model by the end of service date. For more information, see [Migrating to
+**Important:** Effective **31 July 2023**, all previous-generation models will be
+removed from the service and the documentation. Most previous-generation models
+were deprecated on 15 March 2022. You must migrate to the equivalent
+next-generation model by 31 July 2023. For more information, see [Migrating to
 next-generation
 models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).
 **See also:** [Create a custom acoustic
@@ -3907,6 +3907,7 @@ class ModelId(str, Enum):
 JA_JP_BROADBANDMODEL = 'ja-JP_BroadbandModel'
 JA_JP_MULTIMEDIA = 'ja-JP_Multimedia'
 JA_JP_NARROWBANDMODEL = 'ja-JP_NarrowbandModel'
+JA_JP_TELEPHONY = 'ja-JP_Telephony'
 KO_KR_BROADBANDMODEL = 'ko-KR_BroadbandModel'
 KO_KR_MULTIMEDIA = 'ko-KR_Multimedia'
 KO_KR_NARROWBANDMODEL = 'ko-KR_NarrowbandModel'
@@ -4019,6 +4020,7 @@ class Model(str, Enum):
 JA_JP_BROADBANDMODEL = 'ja-JP_BroadbandModel'
 JA_JP_MULTIMEDIA = 'ja-JP_Multimedia'
 JA_JP_NARROWBANDMODEL = 'ja-JP_NarrowbandModel'
+JA_JP_TELEPHONY = 'ja-JP_Telephony'
 KO_KR_BROADBANDMODEL = 'ko-KR_BroadbandModel'
 KO_KR_MULTIMEDIA = 'ko-KR_Multimedia'
 KO_KR_NARROWBANDMODEL = 'ko-KR_NarrowbandModel'
@@ -4131,6 +4133,7 @@ class Model(str, Enum):
 JA_JP_BROADBANDMODEL = 'ja-JP_BroadbandModel'
 JA_JP_MULTIMEDIA = 'ja-JP_Multimedia'
 JA_JP_NARROWBANDMODEL = 'ja-JP_NarrowbandModel'
+JA_JP_TELEPHONY = 'ja-JP_Telephony'
 KO_KR_BROADBANDMODEL = 'ko-KR_BroadbandModel'
 KO_KR_MULTIMEDIA = 'ko-KR_Multimedia'
 KO_KR_NARROWBANDMODEL = 'ko-KR_NarrowbandModel'
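
Note on the three enum hunks above: each adds the same `JA_JP_TELEPHONY` member to a `str`-based `Enum`. Because the enums mix in `str`, a member compares equal to its plain model-name string, so callers can use either form. The class below is a cut-down stand-in written for illustration. It is not the SDK's `ModelId` or `Model` class and carries only two of the members shown in the diff.

```python
from enum import Enum


class ModelSketch(str, Enum):
    """Illustrative stand-in for the SDK's str-based model enums."""
    JA_JP_MULTIMEDIA = 'ja-JP_Multimedia'
    JA_JP_TELEPHONY = 'ja-JP_Telephony'


# A str-mixin member compares equal to its raw string value.
same_as_string = ModelSketch.JA_JP_TELEPHONY == 'ja-JP_Telephony'

# Lookup by value returns the existing member.
looked_up = ModelSketch('ja-JP_Telephony')

# .value yields the plain string to send to the service.
model_name = ModelSketch.JA_JP_TELEPHONY.value
```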

ibm_watson/text_to_speech_adapter_v1.py

Lines changed: 2 additions & 2 deletions
@@ -31,8 +31,8 @@ def synthesize_using_websocket(self,
 timings=None,
 customization_id=None,
 spell_out_mode=None,
-rate_percentage= None,
-pitch_percentage= None,
+rate_percentage=None,
+pitch_percentage=None,
 http_proxy_host=None,
 http_proxy_port=None,
 **kwargs):
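
Note on the hunk above: the only change is the removal of the space after `=` in two keyword defaults, which is a style fix with no behavioral effect. One way to confirm that is to parse both spellings and compare their syntax trees. The sketch below uses a shortened, hypothetical signature `f` in place of `synthesize_using_websocket`.

```python
import ast

# Shortened stand-ins for the old and new parameter lists.
old_source = "def f(self, rate_percentage= None, pitch_percentage= None, **kwargs):\n    pass\n"
new_source = "def f(self, rate_percentage=None, pitch_percentage=None, **kwargs):\n    pass\n"

# ast.dump omits line and column positions by default, so only structure is compared.
identical = ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))
```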