PyWhisperCpp API Reference

pywhispercpp.model

This module contains a simple Python API on top of the C-style whisper.cpp API.

Segment

Segment(t0, t1, text, probability=np.nan)

A small class representing a transcription segment

Parameters:

  • t0 (int) –

    start time

  • t1 (int) –

    end time

  • text (str) –

    text

  • probability (float, default: nan ) –

    Confidence score for the segment, computed as the geometric mean of the token probabilities for the segment (NaN if not calculated). This makes it interpretable as a probability in [0, 1].

Source code in pywhispercpp/model.py
def __init__(self, t0: int, t1: int, text: str, probability: float = np.nan):
    """
    :param t0: start time
    :param t1: end time
    :param text: text
    :param probability: Confidence score for the segment, computed as the geometric mean of
        the token probabilities for the segment (NaN if not calculated).
        This makes it interpretable as a probability in [0, 1].
    """
    self.t0 = t0
    self.t1 = t1
    self.text = text
    self.probability = probability
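
The geometric mean mentioned for probability can be sketched in a few lines. This only illustrates the formula and is not the library's implementation; the helper name `geometric_mean` and the handling of zero probabilities are assumptions.

```python
import math
from typing import Sequence


def geometric_mean(token_probs: Sequence[float]) -> float:
    """Geometric mean of token probabilities (hypothetical helper).

    Returns NaN for an empty sequence, mirroring the "not calculated" default.
    """
    if not token_probs:
        return math.nan
    # A single zero probability forces the product, and so the mean, to zero.
    if any(p <= 0.0 for p in token_probs):
        return 0.0
    # Average in log space to avoid underflow on long segments.
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


print(geometric_mean([0.9, 0.8, 0.95]))
```

Because every input lies in [0, 1], the result does too, which is why the score can be read as a probability.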

Model

Model(
    model="tiny",
    models_dir=None,
    params_sampling_strategy=0,
    redirect_whispercpp_logs_to=False,
    use_openvino=False,
    openvino_model_path=None,
    openvino_device="CPU",
    openvino_cache_dir=None,
    context_params=None,
    **params
)

This class defines a whisper.cpp model.

Example usage:

model = Model('base.en', n_threads=6)
segments = model.transcribe('file.mp3')
for segment in segments:
    print(segment.text)

Parameters:

  • model (str, default: 'tiny' ) –

    model name, default tiny, or a direct path to a ggml model file.

  • models_dir (Optional[str], default: None ) –

    directory containing model files; if omitted, uses MODELS_DIR unless model is already a direct file path.

  • params_sampling_strategy (int, default: 0 ) –

    sampling strategy selector; 0 uses greedy decoding and any other value uses beam search.

  • redirect_whispercpp_logs_to (Union[bool, TextIO, str, None], default: False ) –

    log redirection target. Use False for no redirection, None for /dev/null, a file path string, or sys.stdout/sys.stderr.

  • use_openvino (bool, default: False ) –

    whether to initialize the OpenVINO encoder backend.

  • openvino_model_path (Optional[str], default: None ) –

    path to the OpenVINO model directory or files.

  • openvino_device (str, default: 'CPU' ) –

    OpenVINO device name, default CPU.

  • openvino_cache_dir (Optional[str], default: None ) –

    OpenVINO cache directory.

  • context_params (Optional[ContextParams], default: None ) –

    optional whisper context loader params. Accepted keys are use_gpu, flash_attn, gpu_device, dtw_token_timestamps, dtw_aheads_preset, dtw_n_top, and dtw_mem_size. Omitted keys inherit from whisper_context_default_params().

  • params

    keyword-only decode parameters matching the public API documented in model.pyi. These values are forwarded to whisper_full_params and remain active for future calls. Supported keys:

      - n_threads: number of inference threads. Default is min(4, hardware_concurrency()).
      - n_max_text_ctx: max prompt-text tokens carried into the decoder. Default 16384.
      - offset_ms: audio start offset in milliseconds. Default 0.
      - duration_ms: audio duration to process in milliseconds. Default 0.
      - translate: translate output to English. Default False.
      - no_context: disable reuse of past transcription context. Default True.
      - no_timestamps: disable timestamp generation. Default False.
      - single_segment: force a single output segment. Default False.
      - print_special: print special tokens. Default False.
      - print_progress: print progress information. Default True.
      - print_realtime: print realtime output from whisper.cpp. Default False.
      - print_timestamps: print timestamps during realtime output. Default True.
      - token_timestamps: enable token-level timestamps. Default False.
      - thold_pt: token timestamp probability threshold. Default 0.01.
      - thold_ptsum: token timestamp sum threshold. Default 0.01.
      - max_len: max segment length in characters. Default 0.
      - split_on_word: split on words when max_len is used. Default False.
      - max_tokens: max tokens per segment. Default 0.
      - debug_mode: enable whisper.cpp debug mode. Default False.
      - audio_ctx: override audio context size. Default 0.
      - tdrz_enable: enable tinydiarize speaker-turn detection. Default False.
      - initial_prompt: initial text prompt prepended before decoding. Default None.
      - prompt_tokens: explicit prompt token sequence. Default None.
      - prompt_n_tokens: number of prompt tokens. Default 0.
      - carry_initial_prompt: prepend the initial prompt to each decode window. Default False.
      - language: language code. Default ''.
      - detect_language: enable automatic language detection during transcription. Default False.
      - suppress_blank: suppress blank outputs. Default True.
      - suppress_non_speech_tokens: Python alias for suppress_nst. Default False.
      - suppress_nst: suppress non-speech tokens. Default False.
      - suppress_regex: regex pattern used to suppress matching text during decoding. Default ''.
      - temperature: initial decoding temperature. Default 0.0.
      - max_initial_ts: maximum initial timestamp. Default 1.0.
      - length_penalty: length penalty. Default -1.0.
      - temperature_inc: fallback temperature increment. Default 0.2.
      - entropy_thold: entropy threshold. Default 2.4.
      - logprob_thold: logprob threshold. Default -1.0.
      - no_speech_thold: no-speech threshold. Default 0.6.
      - greedy: greedy-decoder settings, typically {"best_of": 5}.
      - beam_search: beam-search settings. Default {"beam_size": -1, "patience": -1.0}.
      - vad: enable VAD. Default False.
      - vad_model_path: path to the VAD model. Default None.

Source code in pywhispercpp/model.py
def __init__(self,
             model: str = 'tiny',
             models_dir: Optional[str] = None,
             params_sampling_strategy: int = 0,
             redirect_whispercpp_logs_to: Union[bool, TextIO, str, None] = False,
             use_openvino: bool = False,
             openvino_model_path: Optional[str] = None,
             openvino_device: str = 'CPU',
             openvino_cache_dir: Optional[str] = None,
             context_params: Optional[ContextParams] = None,
             **params):
    """
    :param model: model name, default `tiny`, or a direct path to a ggml model file.
    :param models_dir: directory containing model files; if omitted, uses `MODELS_DIR` unless `model`
                       is already a direct file path.
    :param params_sampling_strategy: sampling strategy selector; `0` uses greedy decoding and any
                                     other value uses beam search.
    :param redirect_whispercpp_logs_to: log redirection target. Use `False` for no redirection, `None`
                                        for `/dev/null`, a file path string, or `sys.stdout`/`sys.stderr`.
    :param use_openvino: whether to initialize the OpenVINO encoder backend.
    :param openvino_model_path: path to the OpenVINO model directory or files.
    :param openvino_device: OpenVINO device name, default `CPU`.
    :param openvino_cache_dir: OpenVINO cache directory.
    :param context_params: optional whisper context loader params. Accepted keys are `use_gpu`,
                           `flash_attn`, `gpu_device`, `dtw_token_timestamps`,
                           `dtw_aheads_preset`, `dtw_n_top`, and `dtw_mem_size`. Omitted keys inherit
                           from `whisper_context_default_params()`.
    :param params: keyword-only decode parameters matching the public API documented in `model.pyi`.
        These values are forwarded to `whisper_full_params` and remain active for future calls.
        Supported keys:
        - `n_threads`: number of inference threads. Default is `min(4, hardware_concurrency())`.
        - `n_max_text_ctx`: max prompt-text tokens carried into the decoder. Default `16384`.
        - `offset_ms`: audio start offset in milliseconds. Default `0`.
        - `duration_ms`: audio duration to process in milliseconds. Default `0`.
        - `translate`: translate output to English. Default `False`.
        - `no_context`: disable reuse of past transcription context. Default `True`.
        - `no_timestamps`: disable timestamp generation. Default `False`.
        - `single_segment`: force a single output segment. Default `False`.
        - `print_special`: print special tokens. Default `False`.
        - `print_progress`: print progress information. Default `True`.
        - `print_realtime`: print realtime output from whisper.cpp. Default `False`.
        - `print_timestamps`: print timestamps during realtime output. Default `True`.
        - `token_timestamps`: enable token-level timestamps. Default `False`.
        - `thold_pt`: token timestamp probability threshold. Default `0.01`.
        - `thold_ptsum`: token timestamp sum threshold. Default `0.01`.
        - `max_len`: max segment length in characters. Default `0`.
        - `split_on_word`: split on words when `max_len` is used. Default `False`.
        - `max_tokens`: max tokens per segment. Default `0`.
        - `debug_mode`: enable whisper.cpp debug mode. Default `False`.
        - `audio_ctx`: override audio context size. Default `0`.
        - `tdrz_enable`: enable tinydiarize speaker-turn detection. Default `False`.
        - `initial_prompt`: initial text prompt prepended before decoding. Default `None`.
        - `prompt_tokens`: explicit prompt token sequence. Default `None`.
        - `prompt_n_tokens`: number of prompt tokens. Default `0`.
        - `carry_initial_prompt`: prepend the initial prompt to each decode window. Default `False`.
        - `language`: language code. Default `''`.
        - `detect_language`: enable automatic language detection during transcription. Default `False`.
        - `suppress_blank`: suppress blank outputs. Default `True`.
        - `suppress_non_speech_tokens`: Python alias for `suppress_nst`. Default `False`.
        - `suppress_nst`: suppress non-speech tokens. Default `False`.
        - `suppress_regex`: regex pattern used to suppress matching text during decoding. Default `''`.
        - `temperature`: initial decoding temperature. Default `0.0`.
        - `max_initial_ts`: maximum initial timestamp. Default `1.0`.
        - `length_penalty`: length penalty. Default `-1.0`.
        - `temperature_inc`: fallback temperature increment. Default `0.2`.
        - `entropy_thold`: entropy threshold. Default `2.4`.
        - `logprob_thold`: logprob threshold. Default `-1.0`.
        - `no_speech_thold`: no-speech threshold. Default `0.6`.
        - `greedy`: greedy-decoder settings, typically `{"best_of": 5}`.
        - `beam_search`: beam-search settings. Default `{"beam_size": -1, "patience": -1.0}`.
        - `vad`: enable VAD. Default `False`.
        - `vad_model_path`: path to the VAD model. Default `None`.
    """
    self.model_path = utils.resolve_model_path(model, models_dir)
    self._ctx = None
    self._context_params = self._resolve_context_params(context_params)
    self._sampling_strategy = pw.whisper_sampling_strategy.WHISPER_SAMPLING_GREEDY if params_sampling_strategy == 0 else \
        pw.whisper_sampling_strategy.WHISPER_SAMPLING_BEAM_SEARCH
    self._params = pw.whisper_full_default_params(self._sampling_strategy)
    # assign params
    self.params = params
    self._set_params(params)
    self.redirect_whispercpp_logs_to = redirect_whispercpp_logs_to
    self.use_openvino = use_openvino
    self.openvino_model_path = openvino_model_path
    self.openvino_device = openvino_device
    self.openvino_cache_dir = openvino_cache_dir
    # TODO: maybe set up default callbacks for segments and abort, globally and/or per model instance?
    self._new_segment_callback = None
    # init the model
    self._init_model()
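
Because the constructor passes its keyword arguments through _set_params, and the parameter description says the values remain active for future calls, overrides behave as instance state. The sketch below imitates that with a stand-in class. `StickyParams` and its two default keys are hypothetical and exist only to illustrate the pattern; they are not part of pywhispercpp.

```python
from typing import Any, Dict


class StickyParams:
    """Stand-in that mimics how decode parameters persist across calls."""

    def __init__(self, **params: Any) -> None:
        # Two defaults borrowed from the documented parameter list.
        self._params: Dict[str, Any] = {"language": "", "translate": False}
        self._set_params(params)

    def _set_params(self, params: Dict[str, Any]) -> None:
        # Overrides are written into shared state, not a per-call copy.
        self._params.update(params)

    def transcribe(self, **params: Any) -> Dict[str, Any]:
        self._set_params(params)
        return dict(self._params)


model = StickyParams(language="en")
first = model.transcribe(translate=True)
second = model.transcribe()  # no overrides, yet translate is still set
print(first, second)
```

Under this model, going back to a default means passing that value explicitly on a later call.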

transcribe

transcribe(
    media,
    n_processors=None,
    new_segment_callback=None,
    abort_callback=None,
    extract_probability=False,
    **params
)

Transcribes the media provided as input and returns a list of Segment objects. Accepts a media file path (audio/video) or a raw numpy array.

Parameters:

  • media (AudioInput) –

    Media file path or a numpy array

  • n_processors (Optional[int], default: None ) –

    number of worker processes for whisper_full_parallel. If omitted, runs a single-process whisper_full() decode.

  • new_segment_callback (Optional[Callable[[Segment], None]], default: None ) –

    callback invoked for each newly produced Segment during decoding.

  • abort_callback (Optional[Callable[[], bool]], default: None ) –

    callback function returning True to abort an in-flight transcription early.

  • extract_probability (bool, default: False ) –

    If True, calculates the geometric mean of token probabilities for each segment, providing a confidence score interpretable as a probability in [0, 1].

  • params

    additional keyword-only decode parameters matching the public API documented in model.pyi, with the same supported keys and defaults as Model.__init__. Any overrides applied here remain active for future calls.

Returns:

  • List[Segment]

    List of transcription segments

Source code in pywhispercpp/model.py
def transcribe(self,
               media: Union[str, np.ndarray],
               n_processors: Optional[int] = None,
               new_segment_callback: Optional[Callable[[Segment], None]] = None,
               abort_callback: Optional[Callable[[], bool]] = None,
               extract_probability: bool = False,
               **params) -> List[Segment]:
    """
    Transcribes the media provided as input and returns list of `Segment` objects.
    Accepts a media_file path (audio/video) or a raw numpy array.

    :param media: Media file path or a numpy array
    :param n_processors: number of worker processes for `whisper_full_parallel`. If omitted, runs a
                 single-process `whisper_full()` decode.
    :param new_segment_callback: callback invoked for each newly produced `Segment` during decoding.
    :param abort_callback: callback function returning True to abort an in-flight transcription early.
    :param extract_probability: If True, calculates the geometric mean of token probabilities for each segment,
        providing a confidence score interpretable as a probability in [0, 1].
    :param params: additional keyword-only decode parameters matching the public API documented in
        `model.pyi`, with the same supported keys and defaults as `Model.__init__`.
        Any overrides applied here remain active for future calls.
    :return: List of transcription segments
    """
    if isinstance(media, np.ndarray):
        audio = media
    else:
        if not Path(media).exists():
            raise FileNotFoundError(media)
        audio = self._load_audio(media)

    # update params if any
    self._set_params(params)

    # Set up the callback. Make sure self._new_segment_callback is None when new_segment_callback is None,
    # since the callback is no longer bound to the Model class but to this instance (self).
    self._new_segment_callback = new_segment_callback
    pw.assign_new_segment_callback(
        self._params,
        self.__call_new_segment_callback if new_segment_callback is not None else None,
    )

    pw.assign_abort_callback(self._params, abort_callback)

    # run inference
    start_time = time()
    logger.info("Transcribing ...")
    res = self._transcribe(audio, n_processors=n_processors, extract_probability=extract_probability)
    end_time = time()
    logger.info(f"Inference time: {end_time - start_time:.3f} s")
    return res
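
A sketch of how the two callbacks might be written. `make_deadline_abort` and `collect_text` are hypothetical helper names; only the callback signatures (a Segment argument for new_segment_callback, a no-argument function returning bool for abort_callback) come from the documentation above. The pywhispercpp call itself is left as a comment so the block runs without a model file.

```python
import time
from typing import Callable, List


def make_deadline_abort(seconds: float,
                        clock: Callable[[], float] = time.monotonic) -> Callable[[], bool]:
    """Build an abort callback that returns True once `seconds` have elapsed."""
    deadline = clock() + seconds

    def should_abort() -> bool:
        return clock() >= deadline

    return should_abort


def collect_text(sink: List[str]) -> Callable[[object], None]:
    """Build a segment callback that appends each segment's text to `sink`."""

    def on_segment(segment) -> None:
        sink.append(segment.text)

    return on_segment


texts: List[str] = []
on_segment = collect_text(texts)
abort_after_30s = make_deadline_abort(30.0)

# Intended use, assuming a loaded Model instance named `model`:
# model.transcribe("file.mp3",
#                  new_segment_callback=on_segment,
#                  abort_callback=abort_after_30s)
```

Injecting the clock keeps the abort logic testable without waiting for real time to pass.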

get_params

get_params()

Returns a dict representation of the current params

Returns:

  • Dict[str, Any]

    params dict

Source code in pywhispercpp/model.py
def get_params(self) -> dict:
    """
    Returns a `dict` representation of the actual params

    :return: params dict
    """
    res = {}
    for param in dir(self._params):
        if param.startswith('__'):
            continue
        try:
            res[param] = getattr(self._params, param)
        except Exception:
            # ignore callback functions
            continue
    return res

get_params_schema staticmethod

get_params_schema()

A simple link to constants.PARAMS_SCHEMA

Returns:

  • Dict[str, Dict[str, Any]]

    dict of params schema

Source code in pywhispercpp/model.py
@staticmethod
def get_params_schema() -> dict:
    """
    A simple link to ::: constants.PARAMS_SCHEMA
    :return: dict of params schema
    """
    return constants.PARAMS_SCHEMA

lang_max_id staticmethod

lang_max_id()

Largest language id (i.e. number of available languages - 1). Direct binding to whisper.cpp/whisper_lang_max_id.

Returns:

  • int

Source code in pywhispercpp/model.py
@staticmethod
def lang_max_id() -> int:
    """
    Largest language id (i.e. number of available languages - 1)
    Direct binding to whisper.cpp/lang_max_id
    :return:
    """
    return pw.whisper_lang_max_id()

print_timings

print_timings()

Direct binding to whisper.cpp/whisper_print_timings

Returns:

  • None

    None

Source code in pywhispercpp/model.py
def print_timings(self) -> None:
    """
    Direct binding to whisper.cpp/whisper_print_timings

    :return: None
    """
    pw.whisper_print_timings(self._ctx)

system_info staticmethod

system_info()

Direct binding to whisper.cpp/whisper_print_system_info

Returns:

  • Any

    None

Source code in pywhispercpp/model.py
@staticmethod
def system_info() -> None:
    """
    Direct binding to whisper.cpp/whisper_print_system_info

    :return: None
    """
    return pw.whisper_print_system_info()

available_languages staticmethod

available_languages()

Returns a list of supported language codes

Returns:

  • List[str]

    list of supported language codes

Source code in pywhispercpp/model.py
@staticmethod
def available_languages() -> List[str]:
    """
    Returns a list of supported language codes

    :return: list of supported language codes
    """
    n = pw.whisper_lang_max_id()
    res = []
    for i in range(n+1):
        res.append(pw.whisper_lang_str(i))
    return res

auto_detect_language

auto_detect_language(media, offset_ms=None, n_threads=None)

Automatic language detection using whisper.cpp/whisper_pcm_to_mel and whisper.cpp/whisper_lang_auto_detect

Parameters:

  • media (AudioInput) –

    Media file path or a numpy array

  • offset_ms (Optional[int], default: None ) –

    offset in milliseconds; when omitted, uses the model's current offset_ms

  • n_threads (Optional[int], default: None ) –

    number of threads to use; when omitted, uses the model's current n_threads

Returns:

  • Tuple[Tuple[str, float32], Dict[str, float32]]

    ((detected_language, probability), probabilities for all languages)

Source code in pywhispercpp/model.py
def auto_detect_language(self, media: Union[str, np.ndarray], offset_ms: Optional[int] = None, n_threads: Optional[int] = None) -> Tuple[Tuple[str, np.float32], Dict[str, np.float32]]:
    """
    Automatic language detection using whisper.cpp/whisper_pcm_to_mel and whisper.cpp/whisper_lang_auto_detect

    :param media: Media file path or a numpy array
    :param offset_ms: offset in milliseconds; when omitted, uses the model's current `offset_ms`
    :param n_threads: number of threads to use; when omitted, uses the model's current `n_threads`
    :return: ((detected_language, probability), probabilities for all languages)
    """
    if isinstance(media, np.ndarray):
        audio = media
    else:
        if not Path(media).exists():
            raise FileNotFoundError(media)
        audio = self._load_audio(media)

    if offset_ms is None:
        offset_ms = self._params.offset_ms

    if n_threads is None:
        n_threads = self._params.n_threads

    pw.whisper_pcm_to_mel(self._ctx, audio, len(audio), n_threads)
    lang_count = self.lang_max_id() + 1
    probs = np.zeros(lang_count, dtype=np.float32)
    auto_detect = pw.whisper_lang_auto_detect(self._ctx, offset_ms, n_threads, probs)
    langs = self.available_languages()
    lang_probs = {langs[i]: probs[i] for i in range(lang_count)}
    return (langs[auto_detect], np.float32(probs[auto_detect])), lang_probs
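
The second element of the return value is a plain dict, so ranking candidate languages is straightforward. A minimal sketch, assuming a result shaped like the documented return type; the probabilities below are made-up illustration values and `top_languages` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def top_languages(lang_probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable languages, highest first."""
    ranked = sorted(lang_probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up values standing in for the result of model.auto_detect_language(media).
(detected, prob), lang_probs = ("en", 0.91), {"en": 0.91, "de": 0.05, "fr": 0.03, "es": 0.01}

print(detected, prob)
print(top_languages(lang_probs, k=2))
```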

pywhispercpp.constants

Constants

WHISPER_SAMPLE_RATE module-attribute

WHISPER_SAMPLE_RATE = WHISPER_SAMPLE_RATE

MODELS_BASE_URL module-attribute

MODELS_BASE_URL = (
    "https://huggingface.co/ggerganov/whisper.cpp"
)

MODELS_PREFIX_URL module-attribute

MODELS_PREFIX_URL = 'resolve/main/ggml'

PACKAGE_NAME module-attribute

PACKAGE_NAME = 'pywhispercpp'

MODELS_DIR module-attribute

MODELS_DIR = Path(user_data_dir(PACKAGE_NAME)) / 'models'

AVAILABLE_MODELS module-attribute

AVAILABLE_MODELS = [
    "base",
    "base-q5_1",
    "base-q8_0",
    "base.en",
    "base.en-q5_1",
    "base.en-q8_0",
    "large-v1",
    "large-v2",
    "large-v2-q5_0",
    "large-v2-q8_0",
    "large-v3",
    "large-v3-q5_0",
    "large-v3-turbo",
    "large-v3-turbo-q5_0",
    "large-v3-turbo-q8_0",
    "medium",
    "medium-q5_0",
    "medium-q8_0",
    "medium.en",
    "medium.en-q5_0",
    "medium.en-q8_0",
    "small",
    "small-q5_1",
    "small-q8_0",
    "small.en",
    "small.en-q5_1",
    "small.en-q8_0",
    "tiny",
    "tiny-q5_1",
    "tiny-q8_0",
    "tiny.en",
    "tiny.en-q5_1",
    "tiny.en-q8_0",
]

PARAMS_SCHEMA module-attribute

PARAMS_SCHEMA = {
    "n_threads": {
        "type": int,
        "description": "Number of threads to allocate for the inference, defaults to min(4, available hardware_concurrency)",
        "options": None,
        "default": None,
    },
    "n_max_text_ctx": {
        "type": int,
        "description": "max tokens to use from past text as prompt for the decoder",
        "options": None,
        "default": 16384,
    },
    "offset_ms": {
        "type": int,
        "description": "start offset in ms",
        "options": None,
        "default": 0,
    },
    "duration_ms": {
        "type": int,
        "description": "audio duration to process in ms",
        "options": None,
        "default": 0,
    },
    "translate": {
        "type": bool,
        "description": "whether to translate the audio to English",
        "options": None,
        "default": False,
    },
    "no_context": {
        "type": bool,
        "description": "do not use past transcription (if any) as initial prompt for the decoder",
        "options": None,
        "default": True,
    },
    "no_timestamps": {
        "type": bool,
        "description": "do not generate timestamps",
        "options": None,
        "default": False,
    },
    "single_segment": {
        "type": bool,
        "description": "force single segment output (useful for streaming)",
        "options": None,
        "default": False,
    },
    "print_special": {
        "type": bool,
        "description": "print special tokens (e.g. <SOT>, <EOT>, <BEG>, etc.)",
        "options": None,
        "default": False,
    },
    "print_progress": {
        "type": bool,
        "description": "print progress information",
        "options": None,
        "default": True,
    },
    "print_realtime": {
        "type": bool,
        "description": "print results from within whisper.cpp (avoid it, use callback instead)",
        "options": None,
        "default": False,
    },
    "print_timestamps": {
        "type": bool,
        "description": "print timestamps for each text segment when printing realtime",
        "options": None,
        "default": True,
    },
    "token_timestamps": {
        "type": bool,
        "description": "enable token-level timestamps",
        "options": None,
        "default": False,
    },
    "thold_pt": {
        "type": float,
        "description": "timestamp token probability threshold (~0.01)",
        "options": None,
        "default": 0.01,
    },
    "thold_ptsum": {
        "type": float,
        "description": "timestamp token sum probability threshold (~0.01)",
        "options": None,
        "default": 0.01,
    },
    "max_len": {
        "type": int,
        "description": "max segment length in characters, note: token_timestamps needs to be set to True for this to work",
        "options": None,
        "default": 0,
    },
    "split_on_word": {
        "type": bool,
        "description": "split on word rather than on token (when used with max_len)",
        "options": None,
        "default": False,
    },
    "max_tokens": {
        "type": int,
        "description": "max tokens per segment (0 = no limit)",
        "options": None,
        "default": 0,
    },
    "debug_mode": {
        "type": bool,
        "description": "enable debug mode in whisper.cpp",
        "options": None,
        "default": False,
    },
    "audio_ctx": {
        "type": int,
        "description": "overwrite the audio context size (0 = use default)",
        "options": None,
        "default": 0,
    },
    "tdrz_enable": {
        "type": bool,
        "description": "enable tinydiarize speaker turn detection",
        "options": None,
        "default": False,
    },
    "initial_prompt": {
        "type": str,
        "description": "Initial prompt, these are prepended to any existing text context from a previous call",
        "options": None,
        "default": None,
    },
    "prompt_tokens": {
        "type": Tuple,
        "description": "tokens to provide to the whisper decoder as initial prompt",
        "options": None,
        "default": None,
    },
    "prompt_n_tokens": {
        "type": int,
        "description": "number of tokens in prompt_tokens",
        "options": None,
        "default": 0,
    },
    "carry_initial_prompt": {
        "type": bool,
        "description": "always prepend the initial prompt to each decode window",
        "options": None,
        "default": False,
    },
    "language": {
        "type": str,
        "description": 'for auto-detection, set to None, "" or "auto"',
        "options": None,
        "default": "",
    },
    "detect_language": {
        "type": bool,
        "description": "enable automatic language detection during transcription",
        "options": None,
        "default": False,
    },
    "suppress_blank": {
        "type": bool,
        "description": "suppress blank outputs",
        "options": None,
        "default": True,
    },
    "suppress_non_speech_tokens": {
        "type": bool,
        "description": "Python alias for suppress_nst",
        "options": None,
        "default": False,
    },
    "suppress_nst": {
        "type": bool,
        "description": "canonical whisper.cpp name for non-speech token suppression",
        "options": None,
        "default": False,
    },
    "suppress_regex": {
        "type": str,
        "description": "regex pattern used to suppress matching text during decoding",
        "options": None,
        "default": "",
    },
    "temperature": {
        "type": float,
        "description": "initial decoding temperature",
        "options": None,
        "default": 0.0,
    },
    "max_initial_ts": {
        "type": float,
        "description": "max_initial_ts",
        "options": None,
        "default": 1.0,
    },
    "length_penalty": {
        "type": float,
        "description": "length_penalty",
        "options": None,
        "default": -1.0,
    },
    "temperature_inc": {
        "type": float,
        "description": "temperature_inc",
        "options": None,
        "default": 0.2,
    },
    "entropy_thold": {
        "type": float,
        "description": 'similar to OpenAI\'s "compression_ratio_threshold"',
        "options": None,
        "default": 2.4,
    },
    "logprob_thold": {
        "type": float,
        "description": "logprob_thold",
        "options": None,
        "default": -1.0,
    },
    "no_speech_thold": {
        "type": float,
        "description": "no_speech_thold",
        "options": None,
        "default": 0.6,
    },
    "greedy": {
        "type": dict,
        "description": "greedy",
        "options": None,
        "default": {"best_of": 5},
    },
    "beam_search": {
        "type": dict,
        "description": "beam_search",
        "options": None,
        "default": {"beam_size": -1, "patience": -1.0},
    },
    "vad": {
        "type": bool,
        "description": "Enable VAD",
        "options": None,
        "default": False,
    },
    "vad_model_path": {
        "type": str,
        "description": "Path to VAD model",
        "options": None,
        "default": None,
    },
}
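
Since each schema entry carries a type and a default, the dict can drive simple checks on user-supplied keyword arguments. A sketch under assumptions: `SCHEMA_EXCERPT` copies three entries from above, and `check_params` and `defaults` are hypothetical helpers, not something pywhispercpp is known to provide.

```python
from typing import Any, Dict, List

# Three entries copied from PARAMS_SCHEMA for illustration.
SCHEMA_EXCERPT: Dict[str, Dict[str, Any]] = {
    "n_threads": {"type": int, "default": None},
    "translate": {"type": bool, "default": False},
    "language": {"type": str, "default": ""},
}


def check_params(params: Dict[str, Any], schema: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, value in params.items():
        if key not in schema:
            problems.append(f"unknown parameter: {key}")
            continue
        expected = schema[key]["type"]
        # None is tolerated because several defaults in the schema are None.
        if value is not None and not isinstance(value, expected):
            problems.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
    return problems


def defaults(schema: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Collect the default value of every schema entry."""
    return {key: entry["default"] for key, entry in schema.items()}


print(check_params({"translate": "yes", "bogus": 1}, SCHEMA_EXCERPT))
```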

pywhispercpp.utils

Helper functions

download_model

download_model(
    model_name, download_dir=None, chunk_size=1024
)

Helper function to download the ggml models

Parameters:

  • model_name (str) –

    name of the model, one of constants.AVAILABLE_MODELS

  • download_dir

    Where to store the models

  • chunk_size

    size of the download chunk

Returns:

  • str

    Absolute path of the downloaded model

Source code in pywhispercpp/utils.py
def download_model(model_name: str, download_dir=None, chunk_size=1024) -> str:
    """
    Helper function to download the `ggml` models
    :param model_name: name of the model, one of ::: constants.AVAILABLE_MODELS
    :param download_dir: Where to store the models
    :param chunk_size: size of the download chunk

    :return: Absolute path of the downloaded model
    """
    if model_name not in AVAILABLE_MODELS:
        logger.error(f"Invalid model name `{model_name}`, available models are: {AVAILABLE_MODELS}")
        return
    if download_dir is None:
        download_dir = MODELS_DIR
        logger.info(f"No download directory was provided, models will be downloaded to {download_dir}")

    os.makedirs(download_dir, exist_ok=True)

    url = _get_model_url(model_name=model_name)
    file_path = Path(download_dir) / os.path.basename(url)
    # check if the file is already there
    if file_path.exists():
        logger.info(f"Model {model_name} already exists in {download_dir}")
    else:
        # download it from huggingface
        resp = requests.get(url, stream=True)
        total = int(resp.headers.get('content-length', 0))

        progress_bar = tqdm(desc=f"Downloading Model {model_name} ...",
                            total=total,
                            unit='iB',
                            unit_scale=True,
                            unit_divisor=1024)

        try:
            with open(file_path, 'wb') as file, progress_bar:
                for data in resp.iter_content(chunk_size=chunk_size):
                    size = file.write(data)
                    progress_bar.update(size)
            logger.info(f"Model downloaded to {file_path.absolute()}")
        except Exception as e:
            # error download, just remove the file
            os.remove(file_path)
            raise e
    return str(file_path.absolute())
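
The write-then-clean-up pattern in the download loop can be isolated from the network. A minimal sketch, assuming the chunks come from any iterable of bytes (in the function above they come from resp.iter_content); `write_chunks` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], file_path: Path) -> int:
    """Write chunks to file_path; delete the partial file if anything fails."""
    written = 0
    try:
        with open(file_path, "wb") as file:
            for data in chunks:
                written += file.write(data)
    except Exception:
        # Do not leave a truncated model on disk.
        if file_path.exists():
            os.remove(file_path)
        raise
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ggml-demo.bin"
    total = write_chunks([b"abc", b"de"], target)
    print(total, target.stat().st_size)
```

Removing the partial file matters here because a later call would otherwise see the file, assume the model is already present, and skip the download.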

resolve_model_path

resolve_model_path(model_name, models_dir=None)

Resolve a model name to a local model file.

Resolution order:

1. If model_name is an existing file path, return it.
2. Look for model_name and model_name.bin in models_dir.
3. If no local file is found, fall back to downloading a built-in model.

Parameters:

  • model_name (str) –

    A built-in model name, a custom model name, or a direct path to a model file.

  • models_dir

    Directory to search for local models before downloading. Defaults to MODELS_DIR.

Returns:

  • str

    Absolute path to the resolved model file.

Source code in pywhispercpp/utils.py
def resolve_model_path(model_name: str, models_dir=None) -> str:
    """
    Resolve a model name to a local model file.

    Resolution order:
    1. If `model_name` is an existing file path, return it.
    2. Look for `model_name` and `model_name.bin` in `models_dir`.
    3. If no local file is found, fall back to downloading a built-in model.

    :param model_name: A built-in model name, a custom model name, or a direct path to a model file.
    :param models_dir: Directory to search for local models before downloading. Defaults to `MODELS_DIR`.
    :return: Absolute path to the resolved model file.
    """
    if Path(model_name).is_file():
        return str(Path(model_name).resolve())

    search_dir = Path(models_dir) if models_dir is not None else MODELS_DIR

    candidates = [
        search_dir / model_name,
        search_dir / f"{model_name}.bin",
    ]

    for candidate in candidates:
        if candidate.is_file():
            return str(candidate.resolve())

    return download_model(model_name, search_dir)
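
The resolution order can be tried out without downloading anything. The sketch below mirrors the three steps using only `pathlib`; `resolve_path_sketch` and `fake_download` are hypothetical stand-ins (the real function calls `download_model` as its fallback), and the model names are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_path_sketch(model_name: str, search_dir: Path,
                        fallback: Callable[[str, Path], str]) -> str:
    """Return the first match: direct file, then search_dir candidates, then fallback."""
    # Step 1: model_name is itself a path to an existing file.
    if Path(model_name).is_file():
        return str(Path(model_name).resolve())
    # Step 2: look for the name, then the name with a .bin suffix, in search_dir.
    for candidate in (search_dir / model_name, search_dir / f"{model_name}.bin"):
        if candidate.is_file():
            return str(candidate.resolve())
    # Step 3: nothing found locally, hand over to the fallback.
    return fallback(model_name, search_dir)


models_dir = Path(tempfile.mkdtemp())
(models_dir / "sketch-model.bin").write_bytes(b"\0")

calls = []


def fake_download(name: str, directory: Path) -> str:
    """Hypothetical fallback that records the request instead of downloading."""
    calls.append(name)
    return f"downloaded:{name}"


by_path = resolve_path_sketch(str(models_dir / "sketch-model.bin"), models_dir, fake_download)
by_name = resolve_path_sketch("sketch-model", models_dir, fake_download)
missing = resolve_path_sketch("sketch-missing", models_dir, fake_download)
```

Only the last call reaches the fallback; the first two resolve to the same local file.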

to_timestamp

to_timestamp(t, separator=',')

376 -> 00:00:03,760
1344 -> 00:00:13,440

Implementation from whisper.cpp/examples/main

Parameters:

  • t (int) –

    input time from whisper timestamps

  • separator

    separator between seconds and milliseconds

Returns:

  • str

    time representation in hh:mm:ss[separator]ms

Source code in pywhispercpp/utils.py
def to_timestamp(t: int, separator=',') -> str:
    """
    376 -> 00:00:03,760
    1344 -> 00:00:13,440

    Implementation from `whisper.cpp/examples/main`

    :param t: input time from whisper timestamps
    :param separator: separator between seconds and milliseconds
    :return: time representation in hh:mm:ss[separator]ms
    """
    # logic exactly from whisper.cpp

    msec = t * 10
    hr = msec // (1000 * 60 * 60)
    msec = msec - hr * (1000 * 60 * 60)
    min = msec // (1000 * 60)
    msec = msec - min * (1000 * 60)
    sec = msec // 1000
    msec = msec - sec * 1000
    return f"{int(hr):02,.0f}:{int(min):02,.0f}:{int(sec):02,.0f}{separator}{int(msec):03,.0f}"
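
As `msec = t * 10` shows, the input is counted in units of 10 ms. The same arithmetic can be written with `divmod`; the sketch below (with the hypothetical name `to_timestamp_sketch`) reproduces the two examples from the docstring and is meant as an illustration rather than a replacement.

```python
def to_timestamp_sketch(t: int, separator: str = ",") -> str:
    """Convert a count of 10 ms ticks to hh:mm:ss<separator>mmm."""
    msec_total = t * 10
    hours, rest = divmod(msec_total, 3_600_000)   # milliseconds per hour
    minutes, rest = divmod(rest, 60_000)          # milliseconds per minute
    seconds, msec = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{msec:03d}"


srt_style = to_timestamp_sketch(376)                    # comma separator, as in SRT
vtt_style = to_timestamp_sketch(1344, separator=".")    # dot separator, as in VTT
```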

output_txt

output_txt(segments, output_file_path)

Writes the raw text of a list of segments to a .txt file, one segment per line

Implementation from whisper.cpp/examples/main

Parameters:

  • segments (list) –

    list of segments

Returns:

  • str

    path of the file

Source code in pywhispercpp/utils.py
def output_txt(segments: list, output_file_path: str) -> str:
    """
    Creates a raw text from a list of segments

    Implementation from `whisper.cpp/examples/main`

    :param segments: list of segments
    :return: path of the file
    """
    if not output_file_path.endswith('.txt'):
        output_file_path = output_file_path + '.txt'

    absolute_path = Path(output_file_path).absolute()

    with open(str(absolute_path), 'w') as file:
        for seg in segments:
            file.write(seg.text)
            file.write('\n')
    return absolute_path

output_vtt

output_vtt(segments, output_file_path)

Creates a vtt file from a list of segments

Implementation from whisper.cpp/examples/main

Parameters:

  • segments (list) –

    list of segments

Returns:

  • str

    Absolute path of the file

Source code in pywhispercpp/utils.py
def output_vtt(segments: list, output_file_path: str) -> str:
    """
    Creates a vtt file from a list of segments

    Implementation from `whisper.cpp/examples/main`

    :param segments: list of segments
    :return: Absolute path of the file
    """
    if not output_file_path.endswith('.vtt'):
        output_file_path = output_file_path + '.vtt'

    absolute_path = Path(output_file_path).absolute()

    with open(absolute_path, 'w') as file:
        file.write("WEBVTT\n\n")
        for seg in segments:
            file.write(f"{to_timestamp(seg.t0, separator='.')} --> {to_timestamp(seg.t1, separator='.')}\n")
            file.write(f"{seg.text}\n\n")
    return absolute_path

output_srt

output_srt(segments, output_file_path)

Creates a srt file from a list of segments

Parameters:

  • segments (list) –

    list of segments

Returns:

  • str

    Absolute path of the file

Source code in pywhispercpp/utils.py
def output_srt(segments: list, output_file_path: str) -> str:
    """
    Creates a srt file from a list of segments

    :param segments: list of segments
    :return: Absolute path of the file
    """
    if not output_file_path.endswith('.srt'):
        output_file_path = output_file_path + '.srt'

    absolute_path = Path(output_file_path).absolute()

    with open(absolute_path, 'w') as file:
        for i in range(len(segments)):
            seg = segments[i]
            file.write(f"{i+1}\n")
            file.write(f"{to_timestamp(seg.t0, separator=',')} --> {to_timestamp(seg.t1, separator=',')}\n")
            file.write(f"{seg.text}\n\n")
    return absolute_path
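
To see what the loop above produces, the sketch below builds the same SRT layout in memory: a 1-based index, a time range with a comma before the milliseconds, the text, and a blank line. `Seg`, `ts`, and `srt_text` are hypothetical stand-ins defined here so the example runs without pywhispercpp; the segment values are made up.

```python
from collections import namedtuple

# Stand-in for pywhispercpp's Segment: only the attributes the writer reads.
Seg = namedtuple("Seg", ["t0", "t1", "text"])


def ts(t: int, separator: str = ",") -> str:
    """10 ms ticks -> hh:mm:ss<separator>mmm (same arithmetic as to_timestamp)."""
    hours, rest = divmod(t * 10, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, msec = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{separator}{msec:03d}"


def srt_text(segments) -> str:
    """Build the SRT body: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(f"{index}\n{ts(seg.t0)} --> {ts(seg.t1)}\n{seg.text}\n\n")
    return "".join(blocks)


demo = [Seg(0, 376, "Hello."), Seg(376, 1344, "How are you?")]
result = srt_text(demo)
```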

output_csv

output_csv(segments, output_file_path)

Creates a csv file from a list of segments

Parameters:

  • segments (list) –

    list of segments

Returns:

  • str

    Absolute path of the file

Source code in pywhispercpp/utils.py
def output_csv(segments: list, output_file_path: str) -> str:
    """
    Creates a csv file from a list of segments

    :param segments: list of segments
    :return: Absolute path of the file
    """
    if not output_file_path.endswith('.csv'):
        output_file_path = output_file_path + '.csv'

    absolute_path = Path(output_file_path).absolute()

    with open(absolute_path, 'w') as file:
        for seg in segments:
            file.write(f"{10 * seg.t0}, {10 * seg.t1}, \"{seg.text}\"\n")
    return absolute_path
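
`output_csv` formats each row by hand: start and end are multiplied by 10 (giving milliseconds) and the text is wrapped in double quotes, so a double quote inside the text is written unescaped. The sketch below shows an alternative based on the standard `csv` module, which escapes such quotes; it is not what the library does, its separator has no trailing space, and `Seg` and `csv_text` are hypothetical names.

```python
import csv
import io
from collections import namedtuple

# Stand-in for pywhispercpp's Segment: only the attributes the writer reads.
Seg = namedtuple("Seg", ["t0", "t1", "text"])


def csv_text(segments) -> str:
    """Return CSV rows of start_ms, end_ms, text with proper quoting."""
    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes the text column and leaves the integers bare.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for seg in segments:
        writer.writerow([10 * seg.t0, 10 * seg.t1, seg.text])
    return buffer.getvalue()


rows = csv_text([Seg(0, 376, 'He said "hi", then left.')])
parsed = list(csv.reader(io.StringIO(rows)))
```

Reading the row back with `csv.reader` recovers the original text, including the embedded quotes and comma.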

redirect_stderr

redirect_stderr(to=False)

Redirect stderr to the specified target.

Parameters:

  • to (bool | TextIO | str | None, default: False ) –
    • None to suppress output (redirect to devnull),
    • sys.stdout to redirect to stdout,
    • A file path (str) to redirect to a file,
    • False to do nothing (no redirection).

Source code in pywhispercpp/utils.py
@contextlib.contextmanager
def redirect_stderr(to: bool | TextIO | str | None = False) -> None:
    """
    Redirect stderr to the specified target.

    :param to:
        - None to suppress output (redirect to devnull),
        - sys.stdout to redirect to stdout,
        - A file path (str) to redirect to a file,
        - False to do nothing (no redirection).
    """

    if to is False:
        # do nothing
        yield
        return

    def _resolve_target(target):
        opened_stream = None
        if target is None:
            opened_stream = open(os.devnull, "w")
            return opened_stream, True
        if isinstance(target, str):
            opened_stream = open(target, "w")
            return opened_stream, True
        if hasattr(target, "write"):
            return target, False
        raise ValueError(
            "Invalid `to` parameter; expected None, a filepath string, or a file-like object."
        )

    sys.stderr.flush()
    try:
        original_fd = sys.stderr.fileno()
    except (AttributeError, OSError):
        # Jupyter or non-standard stderr implementations
        original_fd = None

    stream, should_close = _resolve_target(to)

    if original_fd is not None and hasattr(stream, "fileno"):
        saved_fd = os.dup(original_fd)
        try:
            os.dup2(stream.fileno(), original_fd)
            yield
        finally:
            os.dup2(saved_fd, original_fd)
            os.close(saved_fd)
            if should_close:
                stream.close()
        return

    # Fallback: Python-level redirect
    try:
        with contextlib.redirect_stderr(stream):
            yield
    finally:
        if should_close:
            stream.close()
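
The first branch above redirects at the file-descriptor level, which matters because output written by native code does not pass through Python's `sys.stderr` object, and `contextlib.redirect_stderr` only swaps that object. The sketch below illustrates the `os.dup` / `os.dup2` save-and-restore sequence on its own; `capture_fd2` is a hypothetical helper, and it assumes that file descriptor 2 is the process's stderr.

```python
import os
import tempfile


def capture_fd2(action) -> str:
    """Run action() with file descriptor 2 pointed at a temp file; return what was written."""
    with tempfile.TemporaryFile(mode="w+b") as target:
        saved_fd = os.dup(2)              # keep a handle on the real stderr
        try:
            os.dup2(target.fileno(), 2)   # fd 2 now writes into the temp file
            action()
        finally:
            os.dup2(saved_fd, 2)          # always restore the original stderr
            os.close(saved_fd)
        target.seek(0)
        return target.read().decode()


# os.write bypasses Python's sys.stderr object, much like output from a C library would.
captured = capture_fd2(lambda: os.write(2, b"log line from native code\n"))
```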

pywhispercpp.examples

assistant

A simple example showcasing the use of pywhispercpp as an assistant. A voice activity detector (VAD) detects speech (this example uses webrtcvad), and inference runs once speech has been detected.

Assistant

Assistant(
    model="tiny",
    input_device=None,
    silence_threshold=8,
    q_threshold=16,
    block_duration=30,
    commands_callback=None,
    **model_params
)

Assistant class

Example usage

from pywhispercpp.examples.assistant import Assistant

my_assistant = Assistant(commands_callback=print, n_threads=8)
my_assistant.start()

Parameters:

  • model

    whisper.cpp model name or a direct path to a ggml model

  • input_device (int, default: None ) –

    The input device (aka microphone), keep it None to take the default

  • silence_threshold (int, default: 8 ) –

    The duration of silence after which inference is run

  • q_threshold (int, default: 16 ) –

    Inference does not run until the data queue holds at least q_threshold elements

  • block_duration (int, default: 30 ) –

    Minimum duration of each audio update, in ms

  • commands_callback (Callable[[str], None], default: None ) –

    The callback to run when a command is received

  • model_log_level

    Logging level

  • model_params

    any other parameter to pass to the whisper.cpp model, see pywhispercpp.constants.PARAMS_SCHEMA
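
One way to read silence_threshold and q_threshold together is as a gate on when inference is triggered. The sketch below is a hypothetical illustration of such a gate, based only on the parameter descriptions above: the audio callback itself is not shown in this reference, so the function name, the assumption that silence is counted in blocks, and the exact comparisons are all assumptions.

```python
def should_run_inference(silence_blocks: int, queued_blocks: int,
                         silence_threshold: int = 8, q_threshold: int = 16) -> bool:
    """Hypothetical gate: enough trailing silence AND enough queued audio."""
    enough_silence = silence_blocks >= silence_threshold   # assumed: counted in blocks
    enough_audio = queued_blocks >= q_threshold            # "at least q_threshold elements"
    return enough_silence and enough_audio


ready = should_run_inference(silence_blocks=8, queued_blocks=16)
still_speaking = should_run_inference(silence_blocks=7, queued_blocks=100)
too_little_audio = should_run_inference(silence_blocks=20, queued_blocks=3)
```

Under this reading, a short noise followed by silence would not trigger inference, because too few blocks were queued.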

Source code in pywhispercpp/examples/assistant.py
def __init__(self,
             model='tiny',
             input_device: int = None,
             silence_threshold: int = 8,
             q_threshold: int = 16,
             block_duration: int = 30,
             commands_callback: Callable[[str], None] = None,
             **model_params):

    """
    :param model: whisper.cpp model name or a direct path to a `ggml` model
    :param input_device: The input device (aka microphone), keep it None to take the default
    :param silence_threshold: The duration of silence after which the inference will be running
    :param q_threshold: The inference won't be running until the data queue is having at least `q_threshold` elements
    :param block_duration: minimum time audio updates in ms
    :param commands_callback: The callback to run when a command is received
    :param model_log_level: Logging level
    :param model_params: any other parameter to pass to the whisper.cpp model, see `pywhispercpp.constants.PARAMS_SCHEMA`
    """

    self.input_device = input_device
    self.sample_rate = constants.WHISPER_SAMPLE_RATE  # same as whisper.cpp
    self.channels = 1  # same as whisper.cpp
    self.block_duration = block_duration
    self.block_size = int(self.sample_rate * self.block_duration / 1000)
    self.q = queue.Queue()

    self.vad = webrtcvad.Vad()
    self.silence_threshold = silence_threshold
    self.q_threshold = q_threshold
    self._silence_counter = 0

    self.pwccp_model = Model(model,
                             print_realtime=False,
                             print_progress=False,
                             print_timestamps=False,
                             single_segment=True,
                             no_context=True,
                             **model_params)
    self.commands_callback = commands_callback
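
The `block_size` line converts a duration in milliseconds into a number of samples. The worked example below assumes that `constants.WHISPER_SAMPLE_RATE` is 16000 Hz, which is the sample rate whisper.cpp works with; the function name is hypothetical. The three durations are the frame lengths that webrtcvad's documentation describes as supported.

```python
WHISPER_SAMPLE_RATE = 16_000  # assumed value of constants.WHISPER_SAMPLE_RATE (Hz)


def block_size(block_duration_ms: int, sample_rate: int = WHISPER_SAMPLE_RATE) -> int:
    """Number of samples in one audio block of the given duration."""
    return int(sample_rate * block_duration_ms / 1000)


# Samples per block for 10, 20, and 30 ms blocks.
sizes = {ms: block_size(ms) for ms in (10, 20, 30)}
```

With the default `block_duration=30`, each callback would therefore receive 480 samples under this assumption.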

start

start()

Use this function to start the assistant

Returns:

  • None

    None

Source code in pywhispercpp/examples/assistant.py
def start(self) -> None:
    """
    Use this function to start the assistant
    :return: None
    """
    logging.info(f"Starting Assistant ...")
    with sd.InputStream(
            device=self.input_device,  # the default input device
            channels=self.channels,
            samplerate=constants.WHISPER_SAMPLE_RATE,
            blocksize=self.block_size,
            callback=self._audio_callback):

        try:
            logging.info(f"Assistant is listening ... (CTRL+C to stop)")
            while True:
                time.sleep(0.1)
        except KeyboardInterrupt:
            logging.info("Assistant stopped")

gui

WorkerSignals

Bases: QObject

Defines signals available from a running worker thread. Supported signals are:

  • finished: No data
  • error: tuple (exctype, value, traceback.format_exc())
  • result: list (the transcribed segments)
  • progress: int (0-100)
  • status_update: str

PyWhisperCppWorker

PyWhisperCppWorker(
    audio_file_path, model_name, **transcribe_params
)

Bases: Thread

Source code in pywhispercpp/examples/gui.py
def __init__(self, audio_file_path, model_name, **transcribe_params):
    super().__init__()
    self.audio_file_path = audio_file_path
    self.model_name = model_name
    self.transcribe_params = transcribe_params
    self.signals = WorkerSignals()
    self._is_running = False

run

run()

Executes the transcription process.

Source code in pywhispercpp/examples/gui.py
def run(self):
    """
    Executes the transcription process.
    """
    try:
        self._is_running = True
        self.signals.status_update.emit(f"Loading model: {self.model_name}...")

        # pywhispercpp will download the specified model if not found
        model_init_params = {}
        if 'n_threads' in self.transcribe_params and self.transcribe_params['n_threads'] is not None:
            model_init_params['n_threads'] = self.transcribe_params['n_threads']
            # Remove from transcribe_params as it's a model init param
            del self.transcribe_params['n_threads']

        model = Model(self.model_name, **model_init_params)

        self.signals.status_update.emit("Model loaded. Starting transcription...")

        def new_segment_callback(segment):
            if not self._is_running:
                raise RuntimeError("Transcription manually stopped")
            self.signals.segment.emit(segment)

        segments = model.transcribe(self.audio_file_path,
                                    new_segment_callback=new_segment_callback,
                                    progress_callback=lambda progress: self.signals.progress.emit(progress),
                                    **self.transcribe_params)

        self.signals.status_update.emit("Transcription complete!")
        self.signals.result.emit(segments)

    except Exception as e:
        print(e)
        self.signals.status_update.emit(f"Error: {str(e)}")
        self.signals.error.emit((type(e), e, str(e)))
    finally:
        self._is_running = False
        self.signals.finished.emit()
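
The worker stops cooperatively: the per-segment callback checks a flag and raises, which unwinds the transcription call into the `except` and `finally` blocks. The sketch below reproduces that pattern with only the standard library. `StoppableWorker`, `fake_transcribe`, and the `stop_after` argument (which simulates a user pressing Stop after a given number of segments) are hypothetical and stand in for the Qt signals and the real `Model.transcribe`.

```python
import threading


def fake_transcribe(segments, new_segment_callback):
    """Stand-in for a transcription call that reports segments one by one."""
    for segment in segments:
        new_segment_callback(segment)


class StoppableWorker(threading.Thread):
    def __init__(self, segments, stop_after):
        super().__init__()
        self.segments = segments
        self.stop_after = stop_after   # simulates the moment Stop is pressed
        self.received = []
        self.error = None
        self._is_running = False

    def stop(self):
        self._is_running = False

    def _on_segment(self, segment):
        # Raising inside the callback is what aborts the running transcription.
        if not self._is_running:
            raise RuntimeError("Transcription manually stopped")
        self.received.append(segment)
        if len(self.received) == self.stop_after:
            self.stop()

    def run(self):
        try:
            self._is_running = True
            fake_transcribe(self.segments, self._on_segment)
        except Exception as exc:
            self.error = exc           # the GUI worker emits an error signal here
        finally:
            self._is_running = False


worker = StoppableWorker(["one", "two", "three"], stop_after=2)
worker.start()
worker.join()
```

The stop only takes effect when the next segment arrives, so the latency of a stop request depends on how often the callback is invoked.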

TranscriptionApp

TranscriptionApp()

Bases: QWidget

Source code in pywhispercpp/examples/gui.py
def __init__(self):
    super().__init__()
    self.selected_file_path = None
    self.whisper_thread = None
    # Settings widgets
    self.model_combo = None
    self.language_input = None
    self.translate_checkbox = None
    self.n_threads_spinbox = None
    self.no_context_checkbox = None
    self.temperature_spinbox = None
    self.settings_content_frame = None  # Frame to hold collapsible settings
    self.toggle_settings_button = None  # Button to toggle settings
    self.status_bar_label = None  # New label for the status bar
    self.about_button = None  # About button
    self.segments = []  # Store segments for export
    self.copy_text_button = None  # New button for copy text

    self.initUI()

initUI

initUI()

Initializes the user interface of the application.

Source code in pywhispercpp/examples/gui.py
def initUI(self):
    """
    Initializes the user interface of the application.
    """
    self.setWindowTitle('PyWhisperCpp Simple GUI')
    self.setGeometry(100, 100, 450, 500)
    # Apply the updated stylesheet
    self.setStyleSheet(STYLESHEET)

    # Main vertical layout
    main_layout = QVBoxLayout()
    # Set bottom margin to 0 for the main layout to ensure status bar is flush
    main_layout.setContentsMargins(4, 4, 4, 0)
    main_layout.setSpacing(10)

    # --- Header (Title + About Button) ---
    header_layout = QHBoxLayout()
    title_label = QLabel("PyWhisperCpp Simple GUI")  # Updated main title label
    title_label.setObjectName("TitleLabel")  # Add objectName for styling
    title_label.setAlignment(Qt.AlignLeft)  # Left-align the title within its allocated space

    # Adding stretch before and after title to center it
    # header_layout.addStretch()
    header_layout.addWidget(title_label)
    header_layout.addStretch()

    # About button
    self.about_button = QPushButton("About")
    self.about_button.clicked.connect(self.show_about_dialog)
    # Removed setFixedSize to allow text to fit, or adjust as needed
    # self.about_button.setFixedSize(50, 25)
    header_layout.addWidget(self.about_button)  # Add it to the header layout

    main_layout.addLayout(header_layout)  # Add the combined header to main layout

    # --- File Selection Area ---
    file_frame = QFrame()
    file_layout = QHBoxLayout(file_frame)
    file_layout.setContentsMargins(0, 0, 0, 0)
    file_layout.setSpacing(10)

    self.select_button = QPushButton("Select Audio File")
    self.select_button.clicked.connect(self.select_file)

    self.file_label = QLabel("No file selected.")
    self.file_label.setObjectName("file_label")  # Added objectName for styling
    self.file_label.setSizePolicy(QSizePolicy.Expanding, QSizePolicy.Preferred)

    file_layout.addWidget(self.select_button)
    file_layout.addWidget(self.file_label)
    main_layout.addWidget(file_frame)

    # --- Collapsible Settings Section ---
    settings_group = QGroupBox()  # No title here, using QToolButton for title
    settings_group_layout = QVBoxLayout(settings_group)
    settings_group_layout.setContentsMargins(5, 5, 5, 5)

    # Custom title bar for the collapsible group box
    header_layout_settings = QHBoxLayout()  # Renamed to avoid clash
    self.toggle_settings_button = QToolButton(settings_group)
    self.toggle_settings_button.setText("Transcription Settings")
    self.toggle_settings_button.setToolButtonStyle(Qt.ToolButtonTextBesideIcon)
    self.toggle_settings_button.setArrowType(Qt.RightArrow)
    self.toggle_settings_button.setCheckable(True)
    self.toggle_settings_button.setChecked(False)  # Start collapsed
    self.toggle_settings_button.clicked.connect(self.toggle_settings_visibility)

    header_layout_settings.addWidget(self.toggle_settings_button)
    header_layout_settings.addStretch()  # Push button to left

    settings_group_layout.addLayout(header_layout_settings)

    # Frame to hold the actual settings form (this will be hidden/shown)
    self.settings_content_frame = QFrame()
    settings_form_layout = QFormLayout(self.settings_content_frame)
    settings_form_layout.setContentsMargins(15, 5, 10, 10)
    settings_form_layout.setSpacing(8)

    # Model Selection
    self.model_combo = QComboBox()
    self.model_combo.addItems(AVAILABLE_MODELS)
    self.model_combo.setCurrentText("tiny")  # Default to 'tiny' as requested
    settings_form_layout.addRow("Model:", self.model_combo)

    # Language Input
    self.language_input = QLineEdit()
    self.language_input.setPlaceholderText('e.g., "en", "es", or leave empty for auto-detect')
    self.language_input.setText("")  # Default to auto-detect
    settings_form_layout.addRow("Language:", self.language_input)

    # Translate Checkbox
    self.translate_checkbox = QCheckBox("Translate to English")
    self.translate_checkbox.setChecked(False)  # Default
    settings_form_layout.addRow("Translate:", self.translate_checkbox)

    # N Threads Spinbox
    self.n_threads_spinbox = QSpinBox()
    self.n_threads_spinbox.setRange(1, os.cpu_count() if os.cpu_count() else 8)  # Max threads based on CPU cores
    self.n_threads_spinbox.setValue(4)  # Sensible default
    settings_form_layout.addRow("Number of Threads:", self.n_threads_spinbox)

    # No Context Checkbox
    self.no_context_checkbox = QCheckBox("No Context (do not use past transcription)")
    self.no_context_checkbox.setChecked(False)  # Default
    settings_form_layout.addRow("No Context:", self.no_context_checkbox)

    # Temperature Spinbox
    self.temperature_spinbox = QDoubleSpinBox()
    self.temperature_spinbox.setRange(0.0, 1.0)
    self.temperature_spinbox.setSingleStep(0.1)
    self.temperature_spinbox.setValue(0.0)  # Default
    settings_form_layout.addRow("Temperature:", self.temperature_spinbox)

    settings_group_layout.addWidget(self.settings_content_frame)
    self.settings_content_frame.setVisible(False)  # Initially hidden

    main_layout.addWidget(settings_group)

    # --- Transcription Button ---
    self.transcribe_button = QPushButton("Transcribe")
    self.transcribe_button.setObjectName("TranscribeButton")  # Add objectName for styling
    self.transcribe_button.setEnabled(False)
    self.transcribe_button.clicked.connect(self.start_transcription)
    main_layout.addWidget(self.transcribe_button)

    # --- Stop Button ---
    self.stop_button = QPushButton("Stop")
    self.stop_button.setObjectName("StopButton")  # Add objectName for styling
    self.stop_button.setEnabled(True)
    self.stop_button.setVisible(False)
    self.stop_button.clicked.connect(self.stop_transcription)
    main_layout.addWidget(self.stop_button)

    # --- Progress Bar ---
    progress_frame = QFrame()
    progress_layout = QVBoxLayout(progress_frame)
    progress_layout.setContentsMargins(0, 5, 0, 5)
    progress_layout.setSpacing(5)

    self.progress_bar = QProgressBar()
    self.progress_bar.setVisible(False)

    progress_layout.addWidget(self.progress_bar)
    main_layout.addWidget(progress_frame)

    # --- Transcription Output Table ---
    output_label = QLabel("Transcription Output:")
    main_layout.addWidget(output_label)

    self.results_table = QTableWidget()
    self.results_table.setColumnCount(3)
    self.results_table.setHorizontalHeaderLabels(["Start Time", "End Time", "Text"])
    header = self.results_table.horizontalHeader()
    header.setSectionResizeMode(0, QHeaderView.ResizeToContents)
    header.setSectionResizeMode(1, QHeaderView.ResizeToContents)
    header.setSectionResizeMode(2, QHeaderView.Stretch)
    self.results_table.verticalHeader().setVisible(False)
    main_layout.addWidget(self.results_table)

    # --- Output Buttons (Export and Copy) ---
    output_buttons_layout = QHBoxLayout()
    output_buttons_layout.addStretch()  # Pushes buttons to the right

    # Export Button with Menu
    self.export_button = QPushButton("Export as...")
    self.export_button.setEnabled(False)
    self.export_menu = QMenu(self)

    self.export_action_txt = self.export_menu.addAction("Plain Text (.txt)")
    self.export_action_srt = self.export_menu.addAction("SRT Subtitle (.srt)")
    self.export_action_vtt = self.export_menu.addAction("VTT Subtitle (.vtt)")
    self.export_action_csv = self.export_menu.addAction("CSV (.csv)")

    self.export_action_txt.triggered.connect(lambda: self.export_transcription("txt"))
    self.export_action_srt.triggered.connect(lambda: self.export_transcription("srt"))
    self.export_action_vtt.triggered.connect(lambda: self.export_transcription("vtt"))
    self.export_action_csv.triggered.connect(lambda: self.export_transcription("csv"))

    self.export_button.setMenu(self.export_menu)
    output_buttons_layout.addWidget(self.export_button)

    # Copy Text Button
    self.copy_text_button = QPushButton("Copy Text")
    self.copy_text_button.setEnabled(False)  # Initially disabled
    self.copy_text_button.clicked.connect(self.copy_all_text_to_clipboard)  # Connect to new method
    output_buttons_layout.addWidget(self.copy_text_button)

    main_layout.addLayout(output_buttons_layout)

    # --- Status Bar at the very bottom ---
    self.status_bar_label = QLabel("Ready.")
    self.status_bar_label.setObjectName("status_bar_label")  # Add objectName for styling
    self.status_bar_label.setAlignment(Qt.AlignLeft | Qt.AlignVCenter)
    self.status_bar_label.setContentsMargins(5, 2, 5, 2)
    main_layout.addWidget(self.status_bar_label)

    self.setLayout(main_layout)

toggle_settings_visibility

toggle_settings_visibility()

Toggles the visibility of the settings content frame and updates the arrow.

Source code in pywhispercpp/examples/gui.py
def toggle_settings_visibility(self):
    """Toggles the visibility of the settings content frame and updates the arrow."""
    is_visible = self.settings_content_frame.isVisible()
    self.settings_content_frame.setVisible(not is_visible)
    if not is_visible:
        self.toggle_settings_button.setArrowType(Qt.DownArrow)
    else:
        self.toggle_settings_button.setArrowType(Qt.RightArrow)

select_file

select_file()

Opens a file dialog to select an audio file.

Source code in pywhispercpp/examples/gui.py
def select_file(self):
    """
    Opens a file dialog to select an audio file.
    """
    options = QFileDialog.Options()
    file_path, _ = QFileDialog.getOpenFileName(
        self, "Select a Media File", "",
        "All Files (*)",
        options=options
    )
    if file_path:
        self.selected_file_path = file_path
        self.file_label.setText(f"Selected: {os.path.basename(file_path)}")
        self.transcribe_button.setEnabled(True)
        self.results_table.setRowCount(0)
        self.export_button.setEnabled(False)  # Disable export until transcription
        self.copy_text_button.setEnabled(False)  # Disable copy until transcription
        self.update_status("File selected: " + os.path.basename(file_path))  # Update new status bar

start_transcription

start_transcription()

Starts the transcription process in a separate thread, passing selected settings.

Source code in pywhispercpp/examples/gui.py
def start_transcription(self):
    """
    Starts the transcription process in a separate thread, passing selected settings.
    """
    if self.selected_file_path:
        self.transcribe_button.setVisible(False)
        self.stop_button.setVisible(True)
        self.select_button.setEnabled(False)
        self.progress_bar.setVisible(True)
        self.progress_bar.setValue(0)
        self.results_table.setRowCount(0)
        self.export_button.setEnabled(False)  # Disable export during transcription
        self.copy_text_button.setEnabled(False)  # Disable copy during transcription
        self.update_status("Starting transcription...")
        self.segments = []  # Clear segments for new transcription

        # Gather settings from GUI widgets
        selected_model = self.model_combo.currentText()
        transcribe_params = {
            "language": self.language_input.text() if self.language_input.text() else None,
            "translate": self.translate_checkbox.isChecked(),
            "n_threads": self.n_threads_spinbox.value(),
            "no_context": self.no_context_checkbox.isChecked(),
            "temperature": self.temperature_spinbox.value(),
        }
        # Remove None values to use pywhispercpp defaults where applicable
        transcribe_params = {k: v for k, v in transcribe_params.items() if v is not None}

        # Create and start the worker thread
        self.whisper_thread = PyWhisperCppWorker(
            self.selected_file_path,
            selected_model,
            **transcribe_params
        )
        self.whisper_thread.signals.result.connect(self.on_transcription_result)
        self.whisper_thread.signals.segment.connect(self.on_new_segment)
        self.whisper_thread.signals.finished.connect(self.on_transcription_finished)
        self.whisper_thread.signals.error.connect(self.on_transcription_error)
        self.whisper_thread.signals.progress.connect(self.update_progress)
        self.whisper_thread.signals.status_update.connect(self.update_status)
        self.whisper_thread.start()
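
The settings dictionary is filtered with `is not None` rather than by truthiness, so an unchecked box (`False`) or a temperature of `0.0` is still passed on, while an empty language field is dropped. The sketch below isolates that step; `drop_none` is a hypothetical helper name and the values are made up.

```python
def drop_none(params: dict) -> dict:
    """Keep every entry whose value is not None (False and 0.0 are kept)."""
    return {key: value for key, value in params.items() if value is not None}


language_text = ""  # an empty language field is mapped to None before filtering
cleaned = drop_none({
    "language": language_text if language_text else None,
    "translate": False,
    "n_threads": 4,
    "temperature": 0.0,
})
```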

format_time

format_time(milliseconds)

Converts milliseconds to HH:MM:SS.ms format.

Source code in pywhispercpp/examples/gui.py
def format_time(self, milliseconds):
    """Converts milliseconds to HH:MM:SS.ms format."""
    seconds_total = milliseconds / 1000
    minutes, seconds = divmod(seconds_total, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{seconds:06.3f}"
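
As a worked example, 3,723,456 ms is 1 hour, 2 minutes, and 3.456 seconds. The sketch below is an integer-arithmetic variant of the conversion (hypothetical name `format_time_ms`, assuming a whole number of milliseconds); it is not the method above, which divides by 1000 first and formats the fractional seconds as a float.

```python
def format_time_ms(milliseconds: int) -> str:
    """Format a whole number of milliseconds as HH:MM:SS.mmm."""
    total_seconds, msec = divmod(int(milliseconds), 1000)
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{msec:03d}"


example = format_time_ms(3_723_456)
```

Staying in integers keeps the seconds field from being rounded up to 60 for inputs just below a minute boundary.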

on_transcription_result

on_transcription_result(segments)

Populates the results table with the transcription segments. Stores segments for export.

Source code in pywhispercpp/examples/gui.py
def on_transcription_result(self, segments):
    """
    Populates the results table with the transcription segments.
    Stores segments for export.
    """
    self.segments = segments  # Store segments
    self.export_button.setEnabled(bool(segments))  # Enable export if segments exist
    self.copy_text_button.setEnabled(bool(segments))  # Enable copy if segments exist

on_transcription_finished

on_transcription_finished()

Cleans up after the transcription thread is finished.

Source code in pywhispercpp/examples/gui.py
def on_transcription_finished(self):
    """
    Cleans up after the transcription thread is finished.
    """
    self.transcribe_button.setVisible(True)
    self.transcribe_button.setEnabled(True)
    self.stop_button.setVisible(False)
    self.select_button.setEnabled(True)
    self.progress_bar.setVisible(False)
    if self.results_table.rowCount() == 0:
        self.update_status("Finished. No transcription data.")
    else:
        self.update_status("Transcription finished successfully!")
    self.whisper_thread = None
on_transcription_error
on_transcription_error(err)

Displays an error message if transcription fails.

Source code in pywhispercpp/examples/gui.py
def on_transcription_error(self, err):
    """
    Displays an error message if transcription fails.
    """
    exctype, value, tb = err
    error_message = f"Error: {value}"
    self.update_status(error_message)  # Update new status bar
    self.on_transcription_finished()
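The handler unpacks `err` as a three-item tuple `(exctype, value, tb)`. The sketch below shows one common way a worker could build such a tuple and how the status message is derived from it. The helpers `run_and_capture` and `format_error` are hypothetical names, and the way the real worker thread packs the tuple is an assumption, not something confirmed by the listing above.

```python
import sys
import traceback


def run_and_capture(task):
    """Run task(); return (result, None) on success or (None, err) on failure.

    err mimics the (exctype, value, tb) shape the handler unpacks
    (assumed layout; here the third item is the formatted traceback string).
    """
    try:
        return task(), None
    except Exception:
        exctype, value = sys.exc_info()[:2]
        return None, (exctype, value, traceback.format_exc())


def format_error(err) -> str:
    """Build the same status text as on_transcription_error."""
    exctype, value, tb = err
    return f"Error: {value}"


def failing_task():
    raise ValueError("bad audio file")


if __name__ == "__main__":
    result, err = run_and_capture(failing_task)
    if err is not None:
        print(format_error(err))
```

Only `value` ends up in the status bar; the exception type and traceback are unpacked but not shown to the user.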
export_transcription
export_transcription(format_type)

Handles exporting the transcription to a chosen file format.

Source code in pywhispercpp/examples/gui.py
def export_transcription(self, format_type):
    """
    Handles exporting the transcription to a chosen file format.
    """
    if not self.segments:
        self.update_status("No transcription data to export.")
        return

    file_dialog_filter = {
        "txt": "Plain Text Files (*.txt)",
        "srt": "SRT Subtitle Files (*.srt)",
        "vtt": "VTT Subtitle Files (*.vtt)",
        "csv": "CSV (Comma Separated Values) Files (*.csv)",
    }

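    if self.selected_file_path:
        # Reuse the input file's name, swapping its extension for the export format
        base_name = os.path.basename(self.selected_file_path).rsplit('.', 1)[0]
        default_file_name = f"{base_name}.{format_type}"
    else:
        default_file_name = f"transcription.{format_type}"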

    options = QFileDialog.Options()
    file_path, _ = QFileDialog.getSaveFileName(
        self, f"Save Transcription as {format_type.upper()}",
        default_file_name,
        file_dialog_filter.get(format_type, "All Files (*)"),
        options=options
    )

    if file_path:
        try:
            # Use pywhispercpp.utils functions based on format_type
            if format_type == "txt":
                # For TXT, join the stripped text of every segment, one per line
                output_txt_content = "\n".join(segment.text.strip() for segment in self.segments)
                with open(file_path, 'w', encoding='utf-8') as f:
                    f.write(output_txt_content)

            elif format_type == "srt":
                if output_srt:
                    output_srt(self.segments, file_path)
                else:
                    raise ImportError("pywhispercpp.utils.output_srt not available.")
            elif format_type == "vtt":
                if output_vtt:
                    output_vtt(self.segments, file_path)
                else:
                    raise ImportError("pywhispercpp.utils.output_vtt not available.")
            elif format_type == "csv":
                if output_csv:
                    # pywhispercpp.utils.output_csv takes the list of segments and a file path
                    output_csv(self.segments, file_path)
                else:
                    raise ImportError("pywhispercpp.utils.output_csv not available.")

            self.update_status(f"Transcription successfully exported to {os.path.basename(file_path)}")
        except Exception as e:
            self.update_status(f"Error exporting to {format_type.upper()}: {e}")
    else:
        self.update_status("Export cancelled.")
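Two pieces of the export logic are easy to try without Qt: deriving the suggested file name and building the plain-text output. The sketch below isolates them. `FakeSegment`, `default_export_name` and `segments_to_text` are hypothetical stand-ins for illustration; only the `text` attribute of a segment is assumed.

```python
import os
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeSegment:
    """Minimal stand-in for a transcription segment (only .text is used)."""
    text: str


def default_export_name(selected_file_path: Optional[str], format_type: str) -> str:
    """Suggest '<input stem>.<format>' or fall back to 'transcription.<format>'."""
    if selected_file_path:
        # Drop the directory, then cut at the last dot to remove the extension.
        stem = os.path.basename(selected_file_path).rsplit(".", 1)[0]
        return f"{stem}.{format_type}"
    return f"transcription.{format_type}"


def segments_to_text(segments: Iterable[FakeSegment]) -> str:
    """Join the stripped text of each segment, one segment per line."""
    return "\n".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    print(default_export_name("/tmp/talk.mp3", "srt"))
    print(segments_to_text([FakeSegment(" Hello "), FakeSegment("world.")]))
```

The same joined text is what `copy_all_text_to_clipboard` places on the clipboard.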
copy_all_text_to_clipboard
copy_all_text_to_clipboard()

Concatenates all text from segments and copies it to the clipboard.

Source code in pywhispercpp/examples/gui.py
def copy_all_text_to_clipboard(self):
    """
    Concatenates all text from segments and copies it to the clipboard.
    """
    if not self.segments:
        self.update_status("No transcription data to copy.")
        return

    all_text = []
    for segment in self.segments:
        all_text.append(segment.text.strip())

    QApplication.clipboard().setText("\n".join(all_text))
    self.update_status("Text copied to clipboard!")
show_about_dialog
show_about_dialog()

Opens a small dialog with About information.

Source code in pywhispercpp/examples/gui.py
def show_about_dialog(self):
    """Opens a small dialog with About information."""
    about_dialog = QDialog(self)
    about_dialog.setWindowTitle("About PyWhisperCPP Simple GUI")
    about_dialog.setFixedSize(400, 220)

    dialog_layout = QVBoxLayout(about_dialog)
    dialog_layout.setContentsMargins(20, 20, 20, 20)

    info_text = QLabel()
    info_text.setTextFormat(Qt.RichText)
    info_text.setText(
        "<b>PyWhisperCPP Simple GUI</b><br>"
        f"Version {__version__}<br>"
        "<br>"
        "A simple graphical user interface for PyWhisperCpp using PyQt.<br><br>"
        "<a href='https://github.com/absadiki/pywhispercpp'>PyWhisperCpp GitHub repository</a><br>"
        "<br>"
        f"Copyright © {datetime.now().year}"
    )
    info_text.setOpenExternalLinks(True)

    dialog_layout.addWidget(info_text)

    close_button = QPushButton("Close")
    close_button.clicked.connect(about_dialog.accept)
    dialog_layout.addWidget(close_button, alignment=Qt.AlignCenter)

    about_dialog.exec_()

livestream

Quick and dirty realtime livestream transcription.

Not fully satisfying though :) You are welcome to make it better.

LiveStream

LiveStream(
    url,
    model="tiny.en",
    block_size=1024,
    buffer_size=20,
    sample_size=4,
    output_device=None,
    model_log_level=logging.CRITICAL,
    **model_params
)

LiveStream class

Note

Performance depends heavily on the machine's processing power; with the wrong parameters, CPU usage will quickly jump to 100%.

Example usage

from pywhispercpp.examples.livestream import LiveStream

url = ""  # Make sure it is a direct stream URL
ls = LiveStream(url=url, n_threads=4)
ls.start()

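Parameters:

  • url –

    live stream URL (must be a direct stream URL)

  • model (default: 'tiny.en' ) –

    whisper.cpp model

  • block_size (int, default: 1024 ) –

    block size

  • buffer_size (int, default: 20 ) –

    number of blocks used for buffering

  • sample_size (int, default: 4 ) –

    sample size

  • output_device (int, default: None ) –

    the output device (the speaker); leave as None to use the default

  • model_log_level (default: logging.CRITICAL ) –

    logging level

  • model_params –

    any other whisper.cpp params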

Source code in pywhispercpp/examples/livestream.py
def __init__(self,
             url,
             model='tiny.en',
             block_size: int = 1024,
             buffer_size: int = 20,
             sample_size: int = 4,
             output_device: int = None,
             model_log_level=logging.CRITICAL,
             **model_params):

    """
    :param url: Live stream url <a direct stream URL>
    :param model: whisper.cpp model
    :param block_size: block size, default to 1024
    :param buffer_size: number of blocks used for buffering, default to 20
    :param sample_size: sample size
    :param output_device: the output device, aka the speaker, leave it None to take the default
    :param model_log_level: logging level
    :param model_params: any other whisper.cpp params
    """
    self.url = url
    self.block_size = block_size
    self.buffer_size = buffer_size
    self.sample_size = sample_size
    self.output_device = output_device

    self.channels = 1
    self.samplerate = constants.WHISPER_SAMPLE_RATE

    self.q = queue.Queue(maxsize=buffer_size)
    self.audio_data = np.array([])

    self.pwccp_model = Model(model,
                             log_level=model_log_level,
                             print_realtime=True,
                             print_progress=False,
                             print_timestamps=False,
                             single_segment=True,
                             **model_params)
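The constructor creates a bounded queue of `buffer_size` blocks. The sketch below shows how such a queue behaves once it is full and how much audio the default settings would hold. It assumes that each queued block carries `block_size` mono samples at 16 kHz (the whisper.cpp sample rate); the helper names `buffered_seconds` and `fill` are hypothetical and not part of the `LiveStream` class.

```python
import queue

WHISPER_SAMPLE_RATE = 16000  # samples per second expected by whisper.cpp


def buffered_seconds(block_size: int, buffer_size: int,
                     sample_rate: int = WHISPER_SAMPLE_RATE) -> float:
    """Seconds of mono audio a full buffer holds (assumes block_size samples per block)."""
    return block_size * buffer_size / sample_rate


def fill(q: queue.Queue, blocks) -> int:
    """Push blocks without blocking; stop at the first one that does not fit."""
    accepted = 0
    for block in blocks:
        try:
            q.put_nowait(block)
            accepted += 1
        except queue.Full:
            break
    return accepted


if __name__ == "__main__":
    block_size, buffer_size = 1024, 20  # LiveStream defaults
    q = queue.Queue(maxsize=buffer_size)
    # Offer more blocks than the queue can hold; dummy byte strings stand in for audio.
    accepted = fill(q, (bytes(block_size) for _ in range(25)))
    print(accepted, "blocks accepted,", buffered_seconds(block_size, buffer_size), "s buffered")
```

Under these assumptions, a larger `buffer_size` means more audio is held before processing (more latency, more memory), while a smaller one makes the producer hit the full-queue case sooner.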

main

A simple Command Line Interface to test the package

recording

A simple example showcasing how to use pywhispercpp to transcribe a recording.

Recording

Recording(duration, model='tiny.en', **model_params)

Recording class

Example usage

from pywhispercpp.examples.recording import Recording

myrec = Recording(5)
myrec.start()

Source code in pywhispercpp/examples/recording.py
def __init__(self,
             duration: int,
             model: str = 'tiny.en',
             **model_params):
    self.duration = duration
    self.sample_rate = pywhispercpp.constants.WHISPER_SAMPLE_RATE
    self.channels = 1
    self.pwcpp_model = Model(model, print_realtime=True, **model_params)