Caption Generator - Backend

You can also read this in Portuguese 🇧🇷

Backend API for audio and video transcription with Google authentication and Stripe payments.

📋 About the Project

Caption Generator is an application that allows users to upload audio or video files and receive real-time transcriptions. The backend handles authentication, file processing, Whisper transcriptions, and payments.

🎯 Project Objective

This project's main focus is the practical application of four essential technical concepts:

Node.js Streams and Server-Sent Events for real-time file processing
Complete Stripe integration for payments and subscriptions
Terraform to automate deployment and manage infrastructure on AWS
CI/CD for continuous deployment (GitHub Actions) on AWS

Development prioritizes learning and experimentation with these technologies, using an architecture that shows how to integrate data streaming, secure payments, and Infrastructure as Code (IaC). Some features were built as an MVP to speed up learning and prototyping of the core concepts.

🤖 Pipeline Description

💡 Details about the project's infrastructure and deployment (CI/CD) on AWS at this link.

This project implements a CI/CD pipeline using GitHub Actions to automate the application's entire build and deployment process.
The workflow runs as follows:

On every push to the main branch, the pipeline is triggered.
The application is built into a Docker image and pushed to Amazon Elastic Container Registry (ECR).
After that, the pipeline connects to an Amazon EC2 instance via SSH.
The latest image is pulled from ECR, and the old container is stopped and removed.
A new container is started with the updated version of the application.

This makes the deployment process automated, secure, and reproducible.
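As a rough illustration of the steps above, the commands executed on the EC2 host could be modelled as plain data. This is only a sketch: the `DeployConfig` fields, the container name, the `--env-file .env` option, and the port mapping are assumptions made for the example, not values taken from the repository's workflow file.

```typescript
// Hypothetical deployment settings; none of these values come from the repository.
interface DeployConfig {
  registry: string;      // e.g. "<account>.dkr.ecr.<region>.amazonaws.com"
  repository: string;    // ECR repository name
  region: string;        // AWS region of the registry
  containerName: string; // name of the running container on EC2
  port: number;          // port exposed by the API
}

// Commands run on the EC2 host over SSH, in the order the pipeline describes.
function remoteDeployCommands(cfg: DeployConfig, tag = "latest"): string[] {
  const image = `${cfg.registry}/${cfg.repository}:${tag}`;
  return [
    // Authenticate Docker against ECR.
    `aws ecr get-login-password --region ${cfg.region} | docker login --username AWS --password-stdin ${cfg.registry}`,
    // Pull the image that the build job pushed.
    `docker pull ${image}`,
    // "|| true" keeps the first deployment from failing when no container exists yet.
    `docker stop ${cfg.containerName} || true`,
    `docker rm ${cfg.containerName} || true`,
    // Start the new container in the background.
    `docker run -d --name ${cfg.containerName} -p ${cfg.port}:${cfg.port} --env-file .env ${image}`,
  ];
}
```

Keeping the sequence in one function makes the order explicit: the old container is only stopped after the new image has been pulled successfully.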

📸 Frontend Screenshots

Homepage

Subscription Page

✨ Key Features

🔐 Google Authentication via Passport.js
🎵 Audio/video transcription using Whisper
💳 Payment system with Stripe (Free and Premium plans)
🔄 Real-time processing with Server-Sent Events (SSE)
📁 File upload with type and size validation
🗄️ Database managed via Prisma ORM
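The type and size validation mentioned in the feature list could look roughly like the sketch below. The 25 MiB limit and the `audio/` and `video/` prefixes are assumptions chosen for illustration; the project's real limits are not stated in this README. The `mimetype` and `size` fields mirror the properties Multer exposes on an uploaded file.

```typescript
// Hypothetical limits; the project's real values are not stated in this README.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024; // assumed 25 MiB cap
const ALLOWED_MIME_PREFIXES = ["audio/", "video/"];

interface UploadInfo {
  mimetype: string; // MIME type as reported by the upload middleware
  size: number;     // file size in bytes
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Accept only non-empty audio/video files that fit within the size cap.
function validateUpload(file: UploadInfo, maxBytes = MAX_UPLOAD_BYTES): ValidationResult {
  const typeAllowed = ALLOWED_MIME_PREFIXES.some((prefix) => file.mimetype.startsWith(prefix));
  if (!typeAllowed) return { ok: false, reason: `unsupported type: ${file.mimetype}` };
  if (file.size <= 0) return { ok: false, reason: "empty file" };
  if (file.size > maxBytes) return { ok: false, reason: "file too large" };
  return { ok: true };
}
```

With Multer, the same rules are usually expressed through its `fileFilter` and `limits.fileSize` options so that oversized uploads are rejected before they are fully written to disk.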

🛠️ Technologies

Node.js + TypeScript
Express.js - Web framework
Passport.js - Google authentication
Stripe - Payment processing
Prisma ORM - Database management
Supabase - PostgreSQL database
Whisper - Audio to text transcription
Multer - File handling
Docker - Containerization
Terraform - Infrastructure as code
GitHub Actions - Continuous integration and deployment (CI/CD)

🚀 How to Run Locally

Prerequisites

Node.js (v18 or higher)
Docker and Docker Compose
Google Cloud Console account (for OAuth)
Stripe account
Supabase account

Installation

Clone the repository

git clone https://github.com/Darlan0307/Capition-Generate-API.git

cd Capition-Generate-API/backend

Configure environment variables

cp .env.example .env

Fill in the .env file with your settings:

PORT=4000

WHISPER_MODEL_PATH= # Path to Whisper model (base.en, tiny.en, other)
WHISPER_BIN= # Path to Whisper binary (whisper.cli, whisper.cpp, other)

DATABASE_URL=
DIRECT_URL=
FRONTEND_URL=

AUTH_SECRET=
JWT_SECRET=

GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GOOGLE_CALLBACK_URL=

NODE_ENV=

STRIPE_WEBHOOK_SECRET=
STRIPE_SECRET_KEY=
STRIPE_SUBSCRIPTION_PRICE_ID=
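Because a missing variable tends to surface only later as a confusing runtime error, it can help to validate the environment once at startup. The sketch below is one possible way to do that. The variable names come from the listing above, but treating all of them as required (and `NODE_ENV` as optional) is an assumption, and `loadConfig` is a hypothetical helper, not a function from the repository.

```typescript
// Variable names are taken from the .env listing; which ones are truly
// required is an assumption of this sketch.
const REQUIRED_VARS = [
  "WHISPER_MODEL_PATH", "WHISPER_BIN",
  "DATABASE_URL", "DIRECT_URL", "FRONTEND_URL",
  "AUTH_SECRET", "JWT_SECRET",
  "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_CALLBACK_URL",
  "STRIPE_WEBHOOK_SECRET", "STRIPE_SECRET_KEY", "STRIPE_SUBSCRIPTION_PRICE_ID",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];
type AppConfig = Record<RequiredVar, string> & { PORT: number };

// Fail fast with one error that lists every missing variable.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const values = {} as Record<RequiredVar, string>;
  for (const name of REQUIRED_VARS) values[name] = env[name] as string;

  // PORT falls back to 4000, the value shown in the listing.
  const port = Number(env.PORT ?? "4000");
  if (!Number.isInteger(port) || port <= 0) throw new Error(`Invalid PORT: ${env.PORT}`);
  return { ...values, PORT: port };
}

// Typical call at application startup:
// const config = loadConfig(process.env);
```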

Run with Docker

docker compose up -d

The server will be running at http://localhost:4000

📚 API Documentation

Authentication

`GET /auth/google`

Starts the Google OAuth authentication flow.

`GET /auth/google/callback`

Callback that handles the redirect back from Google after authentication.

Transcription

`POST /transcribe`

Uploads and transcribes an audio or video file.

Headers:

cookie: auth-token=<jwt-token>

Body:

media: Audio or video file

Response: Event stream (SSE) with transcription progress
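To make the event-stream response more concrete, here is a minimal sketch of how transcript segments could be written as Server-Sent Events. The event names `segment` and `end`, the `{ text }` payload shape, and the `streamTranscription` helper are hypothetical; the README does not document the actual event format. The `segments` parameter stands in for whatever produces transcript text, for example a Node.js readable stream over the Whisper process output, since readable streams are async iterable.

```typescript
// Format one SSE frame: optional "event:" line, one "data:" line per line
// of payload, terminated by a blank line.
function formatSseEvent(data: string | Record<string, unknown>, event?: string): string {
  const payload = typeof data === "string" ? data : JSON.stringify(data);
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of payload.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Minimal shape of the response object needed here; Node's http.ServerResponse
// (and therefore Express's response) satisfies it.
interface SseResponse {
  writeHead(status: number, headers: Record<string, string>): unknown;
  write(chunk: string): unknown;
  end(): unknown;
}

// Hypothetical handler body: forwards each segment as soon as it is available.
async function streamTranscription(
  res: SseResponse,
  segments: AsyncIterable<string>,
): Promise<void> {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  for await (const text of segments) {
    res.write(formatSseEvent({ text }, "segment"));
  }
  // Tell the client that no more segments will follow.
  res.write(formatSseEvent("done", "end"));
  res.end();
}
```

Note that the browser's `EventSource` API only issues GET requests, so a client consuming a POST response like this one would typically read the body with `fetch` and parse the frames itself.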

Payments

`POST /checkout-session`

Creates a Stripe Checkout session for the subscription.
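As an illustration of what such a request might send to Stripe, the sketch below builds the parameters for a subscription Checkout Session. The field names are real Checkout Session parameters, but the `/success` and `/cancel` routes, the use of `client_reference_id`, and the `buildCheckoutParams` helper are assumptions made for this example rather than the project's actual code.

```typescript
// Subset of Stripe Checkout Session parameters used in this sketch.
interface CheckoutParams {
  mode: "subscription";
  line_items: { price: string; quantity: number }[];
  success_url: string;
  cancel_url: string;
  client_reference_id: string;
}

// priceId would come from STRIPE_SUBSCRIPTION_PRICE_ID, frontendUrl from FRONTEND_URL.
function buildCheckoutParams(priceId: string, frontendUrl: string, userId: string): CheckoutParams {
  const base = frontendUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    mode: "subscription",
    line_items: [{ price: priceId, quantity: 1 }],
    success_url: `${base}/success`, // hypothetical frontend route
    cancel_url: `${base}/cancel`,   // hypothetical frontend route
    client_reference_id: userId,    // lets the webhook map the session back to a user
  };
}
```

With the official Stripe SDK, an object like this would be passed to `stripe.checkout.sessions.create(...)`, and the returned session's `url` sent back to the frontend for redirection.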

`POST /webhook`

Processes Stripe webhook events.
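Webhook requests should be authenticated before they are trusted. The project most likely relies on the Stripe SDK (`stripe.webhooks.constructEvent`) with `STRIPE_WEBHOOK_SECRET`; the manual version below is only a sketch of what that check does, written with Node's built-in `crypto` module. The function name and its parameters are hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Stripe-Signature header of the form "t=<unix seconds>,v1=<hex hmac>[,v1=...]".
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp = "";
  const candidates: string[] = [];
  for (const part of header.split(",")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const key = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (key === "t") timestamp = value;
    if (key === "v1") candidates.push(value);
  }
  if (!/^\d+$/.test(timestamp) || candidates.length === 0) return false;

  // Reject old events to limit replay attacks.
  if (Math.abs(nowSeconds - Number(timestamp)) > toleranceSeconds) return false;

  // Stripe signs "<timestamp>.<raw request body>" with HMAC-SHA256.
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  // Constant-time comparison against every v1 signature in the header.
  return candidates.some((hex) => {
    const given = Buffer.from(hex, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The signature covers the raw request body, so in Express the webhook route needs the unparsed payload, for example via `express.raw({ type: "application/json" })`, rather than the output of the JSON body parser.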

🐳 Docker

The project includes Docker configuration with:

Whisper installation
Node.js environment setup
System dependencies for audio processing

🗄️ Database

The project uses Prisma ORM with Supabase (PostgreSQL).

To run migrations:

npx prisma migrate dev

To view the database:

npx prisma studio

🌐 Demo

https://site-caption-generator.vercel.app/
