You can also read this in Portuguese 🇧🇷
Backend API for audio and video transcription with Google authentication and Stripe payments.
Caption Generator is an application that allows users to upload audio or video files and receive real-time transcriptions. The backend handles authentication, file processing, Whisper transcriptions, and payments.
This project's main focus is the practical application of four essential technical concepts:
- Node.js Streams and Server-Sent Events for real-time file processing
- Complete Stripe integration for payment system and subscriptions
- Terraform to automate deployment and manage infrastructure on AWS
- CI/CD for continuous deployment (GitHub Actions) on AWS
The development prioritizes learning and experimentation with these technologies, implementing an architecture that demonstrates how to integrate data streaming, secure payments, and Infrastructure as Code (IaC). Some features were developed with an MVP approach to accelerate the learning and prototyping process of the core concepts.
💡 Details about the project's infrastructure and deployment (CI/CD) on AWS at this link.
This project implements a CI/CD pipeline using GitHub Actions to automate the entire build and deployment process of the application.
The workflow works as follows:
- On every push to the main branch, the pipeline is triggered.
- The application is built into a Docker image and pushed to Amazon Elastic Container Registry (ECR).
- After that, the pipeline connects to an Amazon EC2 instance via SSH.
- The latest image is pulled from ECR, and the old container is stopped and removed.
- A new container is started with the updated version of the application.
This makes the deployment process automated, secure, and reproducible.
- 🔐 Google Authentication via Passport.js
- 🎵 Audio/video transcription using Whisper
- 💳 Payment system with Stripe (Free and Premium plans)
- 🔄 Real-time processing with Server-Sent Events (SSE)
- 📁 File upload with type and size validation
- 🗄️ Database managed via Prisma ORM
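The type and size validation mentioned in the feature list could look roughly like the sketch below. The allowed MIME prefixes and the 100 MiB cap are assumptions made for illustration; the project's real limits are not stated here, and `validateUpload` is a hypothetical helper name.

```typescript
// Hypothetical limits: the actual allowed types and size cap are not documented here.
const ALLOWED_MIME_PREFIXES = ["audio/", "video/"];
const MAX_SIZE_BYTES = 100 * 1024 * 1024; // assumed 100 MiB cap

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Checks the declared MIME type first, then the file size.
function validateUpload(mimeType: string, sizeBytes: number): ValidationResult {
  const typeAllowed = ALLOWED_MIME_PREFIXES.some((prefix) =>
    mimeType.startsWith(prefix),
  );
  if (!typeAllowed) {
    return { ok: false, reason: `Unsupported media type: ${mimeType}` };
  }
  if (sizeBytes <= 0 || sizeBytes > MAX_SIZE_BYTES) {
    return {
      ok: false,
      reason: `File size must be between 1 and ${MAX_SIZE_BYTES} bytes`,
    };
  }
  return { ok: true };
}
```

With Multer, checks of this kind are usually wired in through its `fileFilter` and `limits.fileSize` options rather than called by hand.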
- Node.js + TypeScript
- Express.js - Web framework
- Passport.js - Google authentication
- Stripe - Payment processing
- Prisma ORM - Database management
- Supabase - PostgreSQL database
- Whisper - Audio to text transcription
- Multer - File handling
- Docker - Containerization
- Terraform - Infrastructure as code
- GitHub Actions - Continuous integration
- Node.js (v18 or higher)
- Docker and Docker Compose
- Google Cloud Console account (for OAuth)
- Stripe account
- Supabase account
- Clone the repository
git clone https://github.com/Darlan0307/Capition-Generate-API.git
cd Capition-Generate-API/backend
- Configure environment variables
cp .env.example .env
Fill the .env file with your configurations:
PORT=4000
WHISPER_MODEL_PATH= # Path to Whisper model (base.en, tiny.en, other)
WHISPER_BIN= # Path to Whisper binary (whisper.cli, whisper.cpp, other)
DATABASE_URL=
DIRECT_URL=
FRONTEND_URL=
AUTH_SECRET=
JWT_SECRET=
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GOOGLE_CALLBACK_URL=
NODE_ENV=
STRIPE_WEBHOOK_SECRET=
STRIPE_SECRET_KEY=
STRIPE_SUBSCRIPTION_PRICE_ID=
- Run with Docker
docker compose up -d
The server will be running at http://localhost:4000
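A missing variable in the .env file tends to surface only when the affected route is first hit, so a startup check can save debugging time. The sketch below uses the variable names from the .env listing above; treating every one of them as required, and the helper names `findMissingEnvVars` and `resolvePort`, are assumptions for illustration.

```typescript
// Names come from the .env listing; marking all of them as required is an assumption.
const REQUIRED_ENV_VARS = [
  "DATABASE_URL", "DIRECT_URL", "FRONTEND_URL", "AUTH_SECRET", "JWT_SECRET",
  "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_CALLBACK_URL",
  "STRIPE_WEBHOOK_SECRET", "STRIPE_SECRET_KEY", "STRIPE_SUBSCRIPTION_PRICE_ID",
  "WHISPER_MODEL_PATH", "WHISPER_BIN",
] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Falls back to 4000 (the port shown in the listing) when PORT is unset.
function resolvePort(env: Env): number {
  const port = Number(env["PORT"] ?? "4000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env["PORT"]}`);
  }
  return port;
}
```

At startup, one option is to call `findMissingEnvVars(process.env)` and exit with a clear message if the returned list is not empty.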
Initiates Google OAuth authentication process
Callback to process Google authentication return
Uploads and transcribes audio/video file
Headers:
cookie: auth-token=<jwt-token>
Body:
media: Audio or video file
Response: Event stream (SSE) with transcription progress
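To make the event-stream response more concrete, here is a minimal sketch of how transcription segments could be written as SSE frames. The event names `segment` and `done`, the payload shape, and the `SseResponse` interface are assumptions for illustration, not the project's actual contract. The interface only lists the methods the sketch needs, which Node's `http.ServerResponse` and Express's response object are expected to provide.

```typescript
// Minimal response shape used by this sketch (an assumption, not the project's type).
interface SseResponse {
  setHeader(name: string, value: string): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

// One SSE frame: an "event:" line, a "data:" line, then a blank line.
// JSON.stringify never emits raw newlines, so the data fits on one line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Writes each segment as it becomes available, then a final "done" event.
async function streamTranscription(
  res: SseResponse,
  segments: AsyncIterable<string> | Iterable<string>,
): Promise<void> {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  let index = 0;
  for await (const text of segments) {
    res.write(formatSseEvent("segment", { index, text }));
    index += 1;
  }
  res.write(formatSseEvent("done", { segments: index }));
  res.end();
}
```

Accepting an `AsyncIterable` means the same function can consume a Node.js readable stream of Whisper output lines, so segments reach the client as soon as they are produced instead of after the whole file is processed.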
Creates Stripe checkout session for subscription
Processes Stripe webhook events
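Webhook requests should be verified against `STRIPE_WEBHOOK_SECRET` before they are trusted. The official `stripe` library does this with `stripe.webhooks.constructEvent`, which is the usual choice in production. The sketch below only illustrates the underlying scheme using Node's `crypto` module: the `Stripe-Signature` header carries a timestamp `t` and one or more `v1` signatures, each an HMAC-SHA256 of `<t>.<raw body>`. The function names are hypothetical, and whether this project verifies signatures this way is an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC-SHA256 over "<timestamp>.<rawBody>", hex encoded.
function computeSignature(rawBody: string, secret: string, timestamp: string): string {
  return createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest("hex");
}

// Parses a header of the form "t=<unix seconds>,v1=<hex>[,v1=<hex>...]".
function parseStripeSignature(header: string): { timestamp: string; signatures: string[] } {
  let timestamp = "";
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t" && value) timestamp = value;
    if (key === "v1" && value) signatures.push(value);
  }
  return { timestamp, signatures };
}

// Returns true only if a v1 signature matches and the timestamp is recent.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  const { timestamp, signatures } = parseStripeSignature(header);
  if (!timestamp || signatures.length === 0) return false;

  // Reject stale events to limit replay attacks.
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > toleranceSeconds) return false;

  const expected = Buffer.from(computeSignature(rawBody, secret, timestamp), "utf8");
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig, "utf8");
    // timingSafeEqual requires equal lengths, so compare lengths first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

The signature covers the exact bytes Stripe sent, so the route needs the raw request body (for example via `express.raw({ type: "application/json" })`) rather than JSON that has already been parsed and re-serialized.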
The project includes Docker configuration with:
- Whisper installation
- Node.js environment setup
- System dependencies for audio processing
The project uses Prisma ORM with Supabase (PostgreSQL).
To run migrations:
npx prisma migrate dev
To view the database:
npx prisma studio
