fix(webui): Experimental checkbox bugfixes and visual warning label

- We can't use the original "Show experimental features" checkbox implementation, because it *deeply* breaks Gradio.

- Gradio's `gr.Examples()` API binds itself to the original state of the user interface. Gradio crashes and causes various bugs if we try to change the available UI controls later.

- Instead, we must use `gr.Dataset()`, which acts as a custom input/output control and doesn't bind directly to the target components. We must also provide a hidden "all mode choices" component so that the Dataset knows every "control mode" label that can appear in the examples (a condensed sketch of this pattern follows this list).

- We now also show a clearly visible warning label in the user interface to mark the experimental features.

- Bugs fixed:

* The code was unable to toggle the visibility of experimental demos in the Examples list. That isn't possible with `gr.Examples()` (it is a wrapper around `gr.Dataset()`, but it keeps its own internal state/copy of all example data). Instead, we use a `gr.Dataset()` and manipulate its sample list directly.

* Gradio crashes with a `gradio.exceptions.Error` if the user loads an example that uses an experimental feature whose UI choice has been removed. This is because `gr.Examples()` binds to the original user interface and *remembers* the list of choices, and it *cannot* dynamically select a value that did not exist when the `gr.Examples()` was initially created. Switching to `gr.Dataset()` fixes this.

* Furthermore, Gradio's `gr.Examples()` handler remembers and caches the list of UI options. Every time an example is loaded, it rewrites the "Emotion Control Mode" selection menu to show only the options that were available when the Examples table was created. So even if we keep the "Show experimental features" checkbox, Gradio itself erases the experimental mode from the Control Mode menu whenever the user loads an example, and there are no callbacks or "update" functions that let us override this automatic behavior. Switching to `gr.Dataset()` avoids this deep binding entirely.

* The "Show experimental features" checkbox is no longer tied to a column in the examples-table, to avoid fighting between Gradio's example table trying to set the mode, and the experimental checkbox being toggled and also trying to set the mode.

* Lastly, the "Show experimental features" checkbox now remembers and restores the user's current mode selection when it is toggled, instead of always resetting to the default mode ("same as voice reference"), which makes the UI more convenient.
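
For reference, a condensed, self-contained sketch of the whole pattern described above: a hidden "all modes" radio that the `gr.Dataset()` reads its labels from, a `click()` handler that copies the clicked row into the real inputs, and a checkbox handler that swaps the sample list while preserving the user's current mode. All names, labels and example rows below are simplified placeholders, not the actual webui.py identifiers.

```python
# Condensed sketch of the pattern (simplified names/data, not the real webui.py code).
import gradio as gr

ALL_MODES = ["Same as the voice reference",
             "Emotion reference audio",
             "Emotion vectors",
             "Emotion text description (experimental)"]
OFFICIAL_MODES = ALL_MODES[:-1]  # hide the experimental mode by default

# Each example row holds one value per Dataset column: [mode label, text].
EXAMPLE_ROWS = [
    [ALL_MODES[0], "A normal example"],
    [ALL_MODES[3], "An experimental text-description example"],
]

def get_examples(include_experimental=False):
    if include_experimental:
        return EXAMPLE_ROWS
    return [row for row in EXAMPLE_ROWS if row[0] != ALL_MODES[3]]

with gr.Blocks() as demo:
    show_experimental = gr.Checkbox(label="Show experimental features", value=False)
    mode = gr.Radio(choices=OFFICIAL_MODES, value=OFFICIAL_MODES[0],
                    label="Emotion Control Mode")
    # Hidden "all choices" radio: the Dataset reads its labels from here,
    # so experimental mode labels can still be rendered in the examples table.
    mode_all = gr.Radio(choices=ALL_MODES, value=ALL_MODES[0], visible=False)
    text = gr.Textbox(label="Text")

    examples = gr.Dataset(label="Examples",
                          components=[mode_all, text],  # labels come from the hidden radio
                          samples=get_examples(False),
                          type="values")

    def on_example_click(row):
        # The Dataset is not bound to the real inputs, so copy the values over manually.
        return gr.update(value=row[0]), gr.update(value=row[1])

    examples.click(on_example_click, inputs=[examples], outputs=[mode, text])

    def on_experimental_change(is_experimental, current_mode):
        choices = ALL_MODES if is_experimental else OFFICIAL_MODES
        # Keep the user's current selection if it still exists, otherwise reset.
        value = current_mode if current_mode in choices else choices[0]
        return (gr.update(choices=choices, value=value),
                gr.update(samples=get_examples(is_experimental)))

    show_experimental.change(on_experimental_change,
                             inputs=[show_experimental, mode],
                             outputs=[mode, examples])

demo.launch()
```
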
Author: Arcitec
Date: 2025-09-14 21:04:20 +02:00
Parent: c5f9a31127
Commit: ec368de932

4 changed files with 108 additions and 45 deletions

examples/cases.jsonl

@@ -4,9 +4,9 @@
{"prompt_audio":"voice_04.wav","text":"你就需要我这种专业人士的帮助,就像手无缚鸡之力的人进入雪山狩猎,一定需要最老练的猎人指导。","emo_mode":0}
{"prompt_audio":"voice_05.wav","text":"在真正的日本剑道中,格斗过程极其短暂,常常短至半秒,最长也不超过两秒,利剑相击的转瞬间,已有一方倒在血泊中。但在这电光石火的对决之前,双方都要以一个石雕般凝固的姿势站定,长时间的逼视对方,这一过程可能长达十分钟!","emo_mode":0}
{"prompt_audio":"voice_06.wav","text":"今天呢,咱们开一部新书,叫《赛博朋克二零七七》。这词儿我听着都新鲜。这赛博朋克啊,简单理解就是“高科技,低生活”。这一听,我就明白了,于老师就爱用那高科技的东西,手机都得拿脚纹开,大冬天为了解锁脱得一丝不挂,冻得跟王八蛋似的。","emo_mode":0}
{"prompt_audio":"voice_07.wav","emo_audio":"emo_sad.wav","emo_weight": 0.65, "emo_mode":1,"text":"酒楼丧尽天良,开始借机竞拍房间,哎,一群蠢货。"}
{"prompt_audio":"voice_08.wav","emo_audio":"emo_hate.wav","emo_weight": 0.65, "emo_mode":1,"text":"你看看你,对我还有没有一点父子之间的信任了。"}
{"prompt_audio":"voice_09.wav","emo_vec_3":0.8,"emo_mode":2,"text":"对不起嘛!我的记性真的不太好,但是和你在一起的事情,我都会努力记住的~"}
{"prompt_audio":"voice_10.wav","emo_vec_7":1.0,"emo_mode":2,"text":"哇塞!这个爆率也太高了!欧皇附体了!"}
{"prompt_audio":"voice_07.wav","emo_audio":"emo_sad.wav","emo_weight":0.65,"emo_mode":1,"text":"酒楼丧尽天良,开始借机竞拍房间,哎,一群蠢货。"}
{"prompt_audio":"voice_08.wav","emo_audio":"emo_hate.wav","emo_weight":0.65,"emo_mode":1,"text":"你看看你,对我还有没有一点父子之间的信任了。"}
{"prompt_audio":"voice_09.wav","emo_weight": 0.8,"emo_mode":2,"emo_vec_3":0.8,"text":"对不起嘛!我的记性真的不太好,但是和你在一起的事情,我都会努力记住的~"}
{"prompt_audio":"voice_10.wav","emo_weight": 0.8,"emo_mode":2,"emo_vec_7":1.0,"text":"哇塞!这个爆率也太高了!欧皇附体了!"}
{"prompt_audio":"voice_11.wav","emo_mode":3,"emo_text":"极度悲伤","text":"这些年的时光终究是错付了... "}
{"prompt_audio":"voice_12.wav","emo_mode":3,"emo_text":"You scared me to death! What are you, a ghost?","text":"快躲起来!是他要来了!他要来抓我们了!"}

i18n strings (en_US)

@@ -42,8 +42,9 @@
"请上传情感参考音频": "Please upload the emotion reference audio",
"当前模型版本": "Current model version: ",
"请输入目标文本": "Please input the text to synthesize",
"例如:委屈巴巴、危险在悄悄逼近": "e.g. deeply sad, danger is creeping closer",
"例如:委屈巴巴、危险在悄悄逼近": "e.g. \"deeply sad\", \"danger is creeping closer\"",
"与音色参考音频相同": "Same as the voice reference",
"情感随机采样": "Randomize emotion sampling",
"显示实验功能": "Show experimental features"
"显示实验功能": "Show experimental features",
"提示:此功能为实验版,结果尚不稳定,我们正在持续优化中。": "Note: This feature is currently experimental and may not produce satisfactory results. We're dedicated to improving its performance in a future release."
}

i18n strings (zh_CN)

@@ -39,6 +39,9 @@
"参数会影响音频多样性和生成速度详见": "参数会影响音频多样性和生成速度详见",
"是否进行采样": "是否进行采样",
"生成Token最大数量过小导致音频被截断": "生成Token最大数量过小导致音频被截断",
"例如:委屈巴巴、危险在悄悄逼近": "例如:委屈巴巴、危险在悄悄逼近",
"与音色参考音频相同": "与音色参考音频相同",
"情感随机采样": "情感随机采样",
"显示实验功能": "显示实验功能",
"例如:委屈巴巴、危险在悄悄逼近": "例如:委屈巴巴、危险在悄悄逼近"
"提示:此功能为实验版,结果尚不稳定,我们正在持续优化中。": "提示:此功能为实验版,结果尚不稳定,我们正在持续优化中。"
}

webui.py (135 changed lines)

@@ -1,3 +1,4 @@
+import html
import json
import os
import sys
@@ -63,19 +64,18 @@ LANGUAGES = {
"中文": "zh_CN",
"English": "en_US"
}
-EMO_CHOICES = [i18n("与音色参考音频相同"),
+EMO_CHOICES_ALL = [i18n("与音色参考音频相同"),
i18n("使用情感参考音频"),
i18n("使用情感向量控制"),
i18n("使用情感描述文本控制")]
-EMO_CHOICES_BASE = EMO_CHOICES[:3] # 基础选项
-EMO_CHOICES_EXPERIMENTAL = EMO_CHOICES # 全部选项(包括文本描述)
+EMO_CHOICES_OFFICIAL = EMO_CHOICES_ALL[:-1] # skip experimental features
os.makedirs("outputs/tasks",exist_ok=True)
os.makedirs("prompts",exist_ok=True)
MAX_LENGTH_TO_USE_SPEED = 70
+example_cases = []
with open("examples/cases.jsonl", "r", encoding="utf-8") as f:
-example_cases = []
for line in f:
line = line.strip()
if not line:
@@ -85,8 +85,9 @@ with open("examples/cases.jsonl", "r", encoding="utf-8") as f:
emo_audio_path = os.path.join("examples",example["emo_audio"])
else:
emo_audio_path = None
example_cases.append([os.path.join("examples", example.get("prompt_audio", "sample_prompt.wav")),
-EMO_CHOICES[example.get("emo_mode",0)],
+EMO_CHOICES_ALL[example.get("emo_mode",0)],
example.get("text"),
emo_audio_path,
example.get("emo_weight",1.0),
@@ -99,8 +100,14 @@ with open("examples/cases.jsonl", "r", encoding="utf-8") as f:
example.get("emo_vec_6",0),
example.get("emo_vec_7",0),
example.get("emo_vec_8",0),
example.get("emo_text") is not None]
)
])
def get_example_cases(include_experimental = False):
if include_experimental:
return example_cases # show every example
# exclude emotion control mode 3 (emotion from text description)
return [x for x in example_cases if x[1] != EMO_CHOICES_ALL[3]]
def gen_single(emo_control_method,prompt, text,
emo_ref_path, emo_weight,
@@ -159,6 +166,12 @@ def update_prompt_audio():
update_button = gr.update(interactive=True)
return update_button
+def create_warning_message(warning_text):
+return gr.HTML(f"<div style=\"padding: 0.5em 0.8em; border-radius: 0.5em; background: #ffa87d; color: #000; font-weight: bold\">{html.escape(warning_text)}</div>")
+def create_experimental_warning_message():
+return create_warning_message(i18n('提示:此功能为实验版,结果尚不稳定,我们正在持续优化中。'))
with gr.Blocks(title="IndexTTS Demo") as demo:
mutex = threading.Lock()
gr.HTML('''
@@ -181,14 +194,24 @@ with gr.Blocks(title="IndexTTS Demo") as demo:
input_text_single = gr.TextArea(label=i18n("文本"),key="input_text_single", placeholder=i18n("请输入目标文本"), info=f"{i18n('当前模型版本')}{tts.model_version or '1.0'}")
gen_button = gr.Button(i18n("生成语音"), key="gen_button",interactive=True)
output_audio = gr.Audio(label=i18n("生成结果"), visible=True,key="output_audio")
-experimental_checkbox = gr.Checkbox(label=i18n("显示实验功能"),value=False)
+experimental_checkbox = gr.Checkbox(label=i18n("显示实验功能"), value=False)
with gr.Accordion(i18n("功能设置")):
# 情感控制选项部分
with gr.Row():
emo_control_method = gr.Radio(
-choices=EMO_CHOICES_BASE,
+choices=EMO_CHOICES_OFFICIAL,
type="index",
-value=EMO_CHOICES_BASE[0],label=i18n("情感控制方式"))
+value=EMO_CHOICES_OFFICIAL[0],label=i18n("情感控制方式"))
+# we MUST have an extra, INVISIBLE list of *all* emotion control
+# methods so that gr.Dataset() can fetch ALL control mode labels!
+# otherwise, the gr.Dataset()'s experimental labels would be empty!
+emo_control_method_all = gr.Radio(
+choices=EMO_CHOICES_ALL,
+type="index",
+value=EMO_CHOICES_ALL[0], label=i18n("情感控制方式"),
+visible=False) # do not render
# 情感参考音频部分
with gr.Group(visible=False) as emotion_reference_group:
with gr.Row():
@@ -213,13 +236,13 @@ with gr.Blocks(title="IndexTTS Demo") as demo:
vec8 = gr.Slider(label=i18n("平静"), minimum=0.0, maximum=1.0, value=0.0, step=0.05)
with gr.Group(visible=False) as emo_text_group:
+create_experimental_warning_message()
with gr.Row():
emo_text = gr.Textbox(label=i18n("情感描述文本"),
placeholder=i18n("请输入情绪描述(或留空以自动使用目标文本作为情绪描述)"),
value="",
info=i18n("例如:委屈巴巴、危险在悄悄逼近"))
with gr.Row(visible=False) as emo_weight_group:
emo_weight = gr.Slider(label=i18n("情感权重"), minimum=0.0, maximum=1.0, value=0.65, step=0.01)
@@ -261,23 +284,55 @@ with gr.Blocks(title="IndexTTS Demo") as demo:
# typical_sampling, typical_mass,
]
if len(example_cases) > 0:
-example_table = gr.Examples(
-examples=(
-example_cases[:-2]
-if len(example_cases) > 2
-else example_cases
-),
-examples_per_page=20,
-inputs=[prompt_audio,
-emo_control_method,
+# we must use `gr.Dataset` to support dynamic UI rewrites, since `gr.Examples`
+# binds tightly to UI and always restores the initial state of all components,
+# such as the list of available choices in emo_control_method.
+example_table = gr.Dataset(label="Examples",
+samples_per_page=20,
+samples=get_example_cases(include_experimental=False),
+type="values",
+# these components are NOT "connected". it just reads the column labels/available
+# states from them, so we MUST link to the "all options" versions of all components,
+# such as `emo_control_method_all` (to be able to see EXPERIMENTAL text labels)!
+components=[prompt_audio,
+emo_control_method_all, # important: support all mode labels!
input_text_single,
emo_upload,
emo_weight,
emo_text,
-vec1, vec2, vec3, vec4, vec5, vec6, vec7, vec8,
-experimental_checkbox]
-)
+vec1, vec2, vec3, vec4, vec5, vec6, vec7, vec8]
+)
+def on_example_click(example):
+print(f"Example clicked: ({len(example)} values) = {example!r}")
+return (
+gr.update(value=example[0]),
+gr.update(value=example[1]),
+gr.update(value=example[2]),
+gr.update(value=example[3]),
+gr.update(value=example[4]),
+gr.update(value=example[5]),
+gr.update(value=example[6]),
+gr.update(value=example[7]),
+gr.update(value=example[8]),
+gr.update(value=example[9]),
+gr.update(value=example[10]),
+gr.update(value=example[11]),
+gr.update(value=example[12]),
+gr.update(value=example[13]),
+)
+# click() event works on both desktop and mobile UI
+example_table.click(on_example_click,
+inputs=[example_table],
+outputs=[prompt_audio,
+emo_control_method,
+input_text_single,
+emo_upload,
+emo_weight,
+emo_text,
+vec1, vec2, vec3, vec4, vec5, vec6, vec7, vec8]
+)
def on_input_text_change(text, max_text_tokens_per_segment):
if text and len(text) > 0:
@@ -328,14 +383,6 @@ with gr.Blocks(title="IndexTTS Demo") as demo:
gr.update(visible=False)
)
-def on_experimental_change(is_exp):
-# 切换情感控制选项
-# 第三个返回值实际没有起作用
-if is_exp:
-return gr.update(choices=EMO_CHOICES_EXPERIMENTAL, value=EMO_CHOICES_EXPERIMENTAL[0]), gr.update(value=example_cases)
-else:
-return gr.update(choices=EMO_CHOICES_BASE, value=EMO_CHOICES_BASE[0]), gr.update(value=example_cases[:-2])
emo_control_method.change(on_method_change,
inputs=[emo_control_method],
outputs=[emotion_reference_group,
@@ -345,18 +392,30 @@ with gr.Blocks(title="IndexTTS Demo") as demo:
emo_weight_group]
)
+def on_experimental_change(is_experimental, current_mode_index):
+# 切换情感控制选项
+new_choices = EMO_CHOICES_ALL if is_experimental else EMO_CHOICES_OFFICIAL
+# if their current mode selection doesn't exist in new choices, reset to 0.
+# we don't verify that OLD index means the same in NEW list, since we KNOW it does.
+new_index = current_mode_index if current_mode_index < len(new_choices) else 0
+return (
+gr.update(choices=new_choices, value=new_choices[new_index]),
+gr.update(samples=get_example_cases(include_experimental=is_experimental)),
+)
+experimental_checkbox.change(
+on_experimental_change,
+inputs=[experimental_checkbox, emo_control_method],
+outputs=[emo_control_method, example_table]
+)
input_text_single.change(
on_input_text_change,
inputs=[input_text_single, max_text_tokens_per_segment],
outputs=[segments_preview]
)
-experimental_checkbox.change(
-on_experimental_change,
-inputs=[experimental_checkbox],
-outputs=[emo_control_method, example_table.dataset] # 高级参数Accordion
-)
max_text_tokens_per_segment.change(
on_input_text_change,
inputs=[input_text_single, max_text_tokens_per_segment],