AI Speech-to-Text Web App: Install the Development Environment

In this guide, we set up the development environment for our AI-powered speech-to-text web application: we create a Python virtual environment, install the required packages, and prepare the folder structure.

Create a Project Directory

# Create a new project directory
mkdir speech-to-text-app
cd speech-to-text-app

Create a Python Virtual Environment

A virtual environment keeps this project's dependencies isolated from other projects and from the system-wide Python installation:

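# Create the environment in a folder named venv
# (on macOS/Linux, use python3 instead of python if the python command is not found)
python -m venv venv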

# Activate the environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

When the environment is active, its name appears in parentheses at the start of the terminal prompt, for example (venv).
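If the prompt does not make it obvious, Python itself can report whether it is running inside a virtual environment. The sketch below relies on the documented behavior of the venv module: inside a venv, sys.prefix points to the environment folder while sys.base_prefix still points to the base installation. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the venv directory, while sys.base_prefix
    # still refers to the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter:", sys.executable)
```

Run it with python inside the activated environment; the interpreter path should point into the venv folder.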

Install Required Packages

pip install --upgrade pip
pip install flask openai-whisper pyaudio numpy
pip install flask-bootstrap

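Flask is the web framework, openai-whisper provides the Whisper speech-recognition model, PyAudio records audio from the microphone, NumPy handles the numerical audio data, and Flask-Bootstrap integrates the Bootstrap CSS framework into Flask templates. Whisper also needs the ffmpeg command-line tool on the system to decode audio files (for example brew install ffmpeg on macOS, sudo apt install ffmpeg on Ubuntu/Debian, or choco install ffmpeg on Windows).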
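Note that a pip package name does not always match the module you import: openai-whisper is imported as whisper, and flask-bootstrap as flask_bootstrap. A small check script, sketched below, can confirm that each package is importable without actually importing it. The mapping follows the install commands above; the function name check_modules is hypothetical.

```python
import importlib.util

# Pip package name -> module name it provides (taken from the install commands above)
MODULES = {
    "flask": "flask",
    "openai-whisper": "whisper",
    "pyaudio": "pyaudio",
    "numpy": "numpy",
    "flask-bootstrap": "flask_bootstrap",
}


def check_modules(modules=MODULES) -> dict:
    """Return {package: True/False} depending on whether its module can be found."""
    # find_spec locates a top-level module without executing it and returns None if absent
    return {pkg: importlib.util.find_spec(mod) is not None for pkg, mod in modules.items()}


if __name__ == "__main__":
    for package, found in check_modules().items():
        print(f"{package:16} {'OK' if found else 'MISSING'}")
```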

Troubleshooting PyAudio

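PyAudio depends on the PortAudio C library, and a failed installation usually means PortAudio (or a prebuilt PyAudio package) is missing. Try the steps for your platform: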

Windows

pip install pipwin
pipwin install pyaudio

macOS

brew install portaudio
pip install pyaudio

Linux (Ubuntu/Debian)

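# Install the PortAudio headers, then build PyAudio inside the virtual environment
# (python3-pyaudio would install PyAudio system-wide, where the venv cannot see it)
sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio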
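Once PyAudio installs, a quick way to confirm it can reach the microphone is to list the devices that accept input. This sketch uses PyAudio's get_device_count and get_device_info_by_index; the helper name list_input_devices is hypothetical, and the filtering is written against any object with those two methods so the logic can be checked without audio hardware.

```python
def list_input_devices(pa):
    """Return (index, name) pairs for every device that can record audio."""
    devices = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        # Output-only devices (e.g. speakers) report zero input channels
        if info.get("maxInputChannels", 0) > 0:
            devices.append((index, info["name"]))
    return devices


def main():
    try:
        import pyaudio
    except ImportError:
        print("PyAudio is not installed; see the troubleshooting steps above.")
        return
    pa = pyaudio.PyAudio()
    try:
        for index, name in list_input_devices(pa):
            print(f"{index}: {name}")
    finally:
        pa.terminate()  # release PortAudio resources


if __name__ == "__main__":
    main()
```

An empty list usually means the operating system has not granted microphone access or no input device is connected.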

Create requirements.txt

pip freeze > requirements.txt
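The file can later recreate the same environment with pip install -r requirements.txt. Because pip freeze writes names in their PyPI spelling (for example Flask or PyAudio), a quick sanity check should compare names case-insensitively. A minimal sketch, assuming the five packages installed above; the function names are hypothetical and the normalization follows PyPI's rule that case and runs of -, _ and . are insignificant.

```python
import re
from pathlib import Path

REQUIRED = ["flask", "openai-whisper", "pyaudio", "numpy", "flask-bootstrap"]


def normalize(name: str) -> str:
    """Normalize a distribution name: lowercase, with -, _ and . runs turned into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_requirements(text: str, required=REQUIRED) -> list:
    """Return the required packages that do not appear in requirements text."""
    pinned = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The package name is the leading run of name characters (before ==, @, etc.);
        # option lines such as "-e ." start with '-' and are skipped
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            pinned.add(normalize(match.group(0)))
    return [name for name in required if normalize(name) not in pinned]


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        print("Missing:", missing_requirements(path.read_text()) or "none")
    else:
        print("requirements.txt not found")
```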

Structure the Project

# macOS/Linux (or Git Bash on Windows)
mkdir -p static/js static/css templates
touch app.py templates/index.html static/js/main.js static/css/style.css
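mkdir -p and touch are Unix shell commands, and touch is not available in the default Windows shells. A cross-platform sketch with pathlib creates the same folders and empty files; the helper name create_structure is hypothetical and the paths mirror the commands above.

```python
from pathlib import Path

DIRECTORIES = ["static/js", "static/css", "templates"]
FILES = ["app.py", "templates/index.html", "static/js/main.js", "static/css/style.css"]


def create_structure(root=Path(".")) -> list:
    """Create the project folders and empty files below root, like mkdir -p plus touch."""
    root = Path(root)
    for directory in DIRECTORIES:
        # parents=True behaves like mkdir -p; exist_ok=True makes reruns harmless
        (root / directory).mkdir(parents=True, exist_ok=True)
    created = []
    for name in FILES:
        path = root / name
        # Creates an empty file, or only updates the timestamp of an existing one
        path.touch(exist_ok=True)
        created.append(path)
    return created
```

Calling create_structure() from a Python session started inside speech-to-text-app produces the same layout as the shell commands.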

Test the Environment

Create a minimal Flask app in app.py that renders templates/index.html, then start it with python app.py. Open http://127.0.0.1:5000/ in a browser; if the page loads, the installation works.
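While python app.py is running, the same check can be scripted from a second terminal. This is a standard-library sketch; the function name check_server is hypothetical, and the default URL is Flask's development address used in this guide.

```python
import urllib.request


def check_server(url: str = "http://127.0.0.1:5000/", timeout: float = 3.0) -> bool:
    """Return True if the server at url answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("Server reachable:", check_server())
```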

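Download the Whisper Model

Whisper downloads the model weights the first time a model is loaded and caches them (by default under ~/.cache/whisper), so later runs start faster:

import whisper

# "base" is one of the smaller Whisper models, trading some accuracy for speed
model = whisper.load_model("base")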
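The loaded model's transcribe method accepts either a file path (decoded with ffmpeg) or a NumPy float32 array of mono audio at 16 kHz. A microphone recording from PyAudio arrives as raw 16-bit integer bytes, so it must be rescaled to the range -1.0 to 1.0 first. The sketch below assumes the stream was opened as 16-bit, mono, 16000 Hz; the helper names pcm16_to_float32 and transcribe_audio are hypothetical.

```python
import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert 16-bit signed PCM bytes to float32 samples in [-1.0, 1.0)."""
    # Assumes native byte order, which is what PyAudio returns for paInt16
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0


def transcribe_audio(model, audio, language=None) -> str:
    """Run a Whisper model on a file path or float32 array and return the text."""
    options = {"language": language} if language else {}  # None lets Whisper detect it
    result = model.transcribe(audio, **options)
    return result["text"].strip()


# Example usage (needs the model loaded above, ffmpeg for file input, and real audio;
# "sample.wav" and raw_bytes are placeholders):
# print(transcribe_audio(model, "sample.wav"))
# print(transcribe_audio(model, pcm16_to_float32(raw_bytes), language="en"))
```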

Conclusion

The environment is now ready, and we are prepared to build the actual application.