21 changes: 11 additions & 10 deletions .devcontainer/devcontainer.json
@@ -1,23 +1,24 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/python
{
"name": "Python 3",
"name": "Data Formulator Dev",
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"image": "mcr.microsoft.com/devcontainers/python:1-3.12-bullseye",
"image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye",

// Features to add to the dev container. More info: https://containers.dev/features.
"features": {
"ghcr.io/devcontainers/features/node:1": {
"version": "18"
},
"ghcr.io/devcontainers/features/azure-cli:1": {}
},
"features": {
"ghcr.io/devcontainers/features/node:1": {
"version": "18"
},
"ghcr.io/devcontainers/features/azure-cli:1": {},
"ghcr.io/astral-sh/uv:1": {}
},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
"forwardPorts": [5000, 5173],

// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "cd /workspaces/data-formulator && npm install && npm run build && python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install -e /workspaces/data-formulator --verbose && data_formulator"
"postCreateCommand": "cd /workspaces/data-formulator && npm install && npm run build && uv sync && uv run data_formulator"

// Configure tool-specific properties.
// "customizations": {},
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@

*env
.venv/
*api-keys.env
**/*.ipynb_checkpoints/
.DS_Store
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11
139 changes: 127 additions & 12 deletions DEVELOPMENT.md
@@ -2,16 +2,34 @@
How to set up your local machine.

## Prerequisites
* Python > 3.11
* Python >= 3.11
* Node.js
* Yarn
* [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Backend (Python)

### Option 1: With uv (recommended)

uv is faster than pip and provides reproducible builds via a lockfile.

```bash
uv sync # Creates .venv and installs all dependencies
uv run data_formulator # Run app (opens browser automatically)
uv run data_formulator --dev # Run backend only (for frontend development)
```

**Which command to use:**
- **End users / testing the full app**: `uv run data_formulator` - starts server and opens browser to http://localhost:5000
- **Frontend development**: `uv run data_formulator --dev` - starts backend server only, then run `yarn start` separately for the Vite dev server on http://localhost:5173

### Option 2: With pip (fallback)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
source venv/bin/activate # Unix
# or .\venv\Scripts\activate # Windows
```

- **Install Dependencies**
@@ -41,14 +59,16 @@ How to set up your local machine.


- **Run the app**
- **Windows**
```bash
.\local_server.bat
```

- **Unix-based**
```bash
# Unix
./local_server.sh

# Windows
.\local_server.bat

# Or directly
data_formulator # Opens browser automatically
data_formulator --dev # Backend only (for frontend development)
```

## Frontend (TypeScript)
@@ -61,7 +81,12 @@ How to set up your local machine.

- **Development mode**

Run the front-end in development mode using, allowing real-time edits and previews:
First, start the backend server (in a separate terminal):
```bash
uv run data_formulator --dev # or ./local_server.sh
```

Then, run the frontend in development mode with hot reloading:
```bash
yarn start
```
@@ -81,6 +106,10 @@ How to set up your local machine.
Then, build python package:

```bash
# With uv
uv build

# Or with pip
pip install build
python -m build
```
@@ -116,9 +145,10 @@ When deploying Data Formulator to production, please be aware of the following s

1. **Local DuckDB Files**: When database functionality is enabled (default), Data Formulator stores DuckDB database files locally on the server. These files contain user data and are stored in the system's temporary directory or a configured `LOCAL_DB_DIR`.

2. **Session Management**:
- When database is **enabled**: Session IDs are stored in Flask sessions (cookies) and linked to local DuckDB files
- When database is **disabled**: No persistent storage is used, and no cookies are set. Session IDs are generated per request for API consistency
2. **Identity Management**:
- Each user's data is isolated by a namespaced identity key (e.g., `user:alice@example.com` or `browser:550e8400-...`)
- Anonymous users get a browser-based UUID stored in localStorage
- Authenticated users get their verified user ID from the auth provider

3. **Data Persistence**: User data processed through Data Formulator may be temporarily stored in these local DuckDB files, which could be a security risk in multi-tenant environments.

@@ -142,5 +172,90 @@ For production deployment, consider:
python -m data_formulator.app --disable-database
```

## Authentication Architecture

Data Formulator uses a **hybrid identity system** that supports both anonymous and authenticated users.

### Identity Flow Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│ Frontend Request │
├─────────────────────────────────────────────────────────────────────┤
│ Headers: │
│ X-Identity-Id: "browser:550e8400-..." (namespace sent by client) │
│ Authorization: Bearer <jwt> (if custom auth implemented) │
│ (Azure also adds X-MS-CLIENT-PRINCIPAL-ID automatically) │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Backend Identity Resolution │
│ (auth.py: get_identity_id) │
├─────────────────────────────────────────────────────────────────────┤
│ Priority 1: Azure X-MS-CLIENT-PRINCIPAL-ID → "user:<azure_id>" │
│ Priority 2: JWT Bearer token (if implemented) → "user:<jwt_sub>" │
│ Priority 3: X-Identity-Id header → ALWAYS "browser:<id>" │
│ (client-provided namespace is IGNORED for security) │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Storage Isolation │
├─────────────────────────────────────────────────────────────────────┤
│ "user:alice@example.com" → alice's DuckDB file (ONLY via auth) │
│ "browser:550e8400-..." → anonymous user's DuckDB file │
└─────────────────────────────────────────────────────────────────────┘
```
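
The storage-isolation step can be pictured with a minimal sketch. It assumes each identity key is hashed into a filesystem-safe file name under `LOCAL_DB_DIR` (or the system temp directory). The function name `db_path_for_identity`, the `df_` prefix, and the hashing scheme are hypothetical illustrations, not the project's actual naming.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def db_path_for_identity(identity_key: str, local_db_dir: Optional[str] = None) -> Path:
    """Map a namespaced identity key to its own DuckDB file path (hypothetical scheme)."""
    # Fall back to the system temp directory when no directory is configured.
    base = Path(local_db_dir) if local_db_dir else Path(tempfile.gettempdir())
    # Hash the full key, prefix included, so "user:x" and "browser:x" never collide
    # and characters such as ":" or "@" never reach the file system.
    digest = hashlib.sha256(identity_key.encode("utf-8")).hexdigest()[:32]
    return base / f"df_{digest}.duckdb"
```

Because the namespace prefix is part of the hashed input, a forged `browser:alice@example.com` key resolves to a different file than the authenticated `user:alice@example.com` key.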

### Security Model

**Critical Security Rule:** The backend NEVER trusts the namespace prefix from the client-provided `X-Identity-Id` header. Even if a client sends `X-Identity-Id: "user:alice@..."`, the backend strips the prefix and forces `browser:alice@...`. Only verified authentication (Azure headers or JWT) can result in a `user:` prefixed identity.

The key security principle is **namespaced isolation with forced prefixing**:

| Scenario | X-Identity-Id Sent | Backend Resolution | Storage Key |
|----------|-------------------|-------------------|-------------|
| Anonymous user | `browser:550e8400-...` | Strips prefix, forces `browser:` | `browser:550e8400-...` |
| Azure logged-in user | `browser:550e8400-...` | Uses Azure header (priority 1) | `user:alice@...` |
| Attacker spoofing | `user:alice@...` (forged) | No valid auth, strips & forces `browser:` | `browser:alice@...` |

**Why this is secure:** An attacker sending `X-Identity-Id: user:alice@...` gets `browser:alice@...` as their storage key, which is completely separate from the real `user:alice@...` that only authenticated Alice can access.
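
The forced-prefix rule in the table above can be sketched as follows. This is an illustration under stated assumptions, not the project's `get_identity_id()`: the function name `resolve_identity` is hypothetical, headers are passed as a plain mapping (case-sensitive, unlike Flask's header object), the JWT branch is omitted, and raising `ValueError` when no identity is sent is an assumed behavior.

```python
from typing import Mapping


def resolve_identity(headers: Mapping[str, str]) -> str:
    """Resolve a namespaced identity key from request headers (illustrative only)."""
    # Priority 1: the header that Azure EasyAuth adds to authenticated requests.
    azure_id = headers.get("X-MS-CLIENT-PRINCIPAL-ID")
    if azure_id:
        return f"user:{azure_id}"

    # Priority 2 (JWT bearer token) is omitted in this sketch.

    # Priority 3: client-provided identity. Drop whatever namespace the client
    # claimed and always force the "browser:" prefix.
    raw = headers.get("X-Identity-Id", "")
    _, sep, rest = raw.partition(":")
    bare_id = rest if sep else raw
    if not bare_id:
        raise ValueError("request carries no usable identity")  # assumed behavior
    return f"browser:{bare_id}"
```

For example, `resolve_identity({"X-Identity-Id": "user:alice@example.com"})` returns `browser:alice@example.com`, matching the "Attacker spoofing" row.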

### Implementing Custom Authentication

To add JWT-based authentication:

1. **Backend** (`tables_routes.py`): Uncomment and configure the JWT verification code in `get_identity_id()`
2. **Frontend** (`utils.tsx`): Implement `getAuthToken()` to retrieve the JWT from your auth context
3. **Add JWT secret** to Flask config: `current_app.config['JWT_SECRET']`
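
To show what the JWT step might involve, here is a standard-library-only sketch of HS256 verification that turns a bearer token into a `user:<sub>` identity. The names `sign_hs256` and `identity_from_bearer` are hypothetical, the secret is passed in as an argument rather than read from Flask config, and the commented-out code in the repository may look quite different. A real deployment would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _hs256(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: str) -> str:
    """Build an HS256 token; included only to exercise the verifier below."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    return f"{header}.{body}.{_b64url_encode(_hs256(f'{header}.{body}', secret))}"


def identity_from_bearer(authorization: Optional[str], secret: str) -> Optional[str]:
    """Return "user:<sub>" for a valid, unexpired HS256 bearer token, else None."""
    if not authorization or not authorization.startswith("Bearer "):
        return None
    parts = authorization[len("Bearer "):].split(".")
    if len(parts) != 3:
        return None
    header_b64, body_b64, sig_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm and check the signature before trusting any claim.
        if not isinstance(header, dict) or header.get("alg") != "HS256":
            return None
        expected = _hs256(f"{header_b64}.{body_b64}", secret)
        if not hmac.compare_digest(_b64url_decode(sig_b64), expected):
            return None
        claims = json.loads(_b64url_decode(body_b64))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if exp is not None and (not isinstance(exp, (int, float)) or exp < time.time()):
        return None
    sub = claims.get("sub")
    return f"user:{sub}" if isinstance(sub, str) and sub else None
```

Returning `None` on any failure lets the caller fall through to the lower-priority `X-Identity-Id` handling instead of rejecting the request outright; whether that fallback is desirable is a design decision for the deployment.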

### Azure App Service Authentication

When deployed to Azure with EasyAuth enabled:
- Azure automatically adds `X-MS-CLIENT-PRINCIPAL-ID` header to authenticated requests
- The backend reads this header first (highest priority)
- No frontend changes needed - Azure handles the auth flow

### Frontend Identity Management

The frontend (`src/app/identity.ts`) manages identity as follows:

```typescript
// Identity is always initialized with browser ID
identity: { type: 'browser', id: getBrowserId() }

// If user logs in (e.g., via Azure), it's updated to:
identity: { type: 'user', id: userInfo.userId }

// All API requests send namespaced identity:
// X-Identity-Id: "browser:550e8400-..." or "user:alice@..."
```

This ensures:
1. **Anonymous users**: Work immediately with localStorage-based browser ID
2. **Logged-in users**: Get their verified user ID from the auth provider
3. **Cross-tab consistency**: Browser ID is shared via localStorage across all tabs

## Usage
See the [Usage section on the README.md page](README.md#usage).
34 changes: 29 additions & 5 deletions README.md
@@ -11,7 +11,7 @@
<p align="center">
<a href="https://data-formulator.ai"><img src="https://img.shields.io/badge/🚀_Try_Online_Demo-data--formulator.ai-F59E0B?style=for-the-badge" alt="Try Online Demo"></a>
&nbsp;
<a href="#get-started"><img src="https://img.shields.io/badge/💻_Install_Locally-pip_install-3776AB?style=for-the-badge" alt="Install Locally"></a>
<a href="#get-started"><img src="https://img.shields.io/badge/💻_Install_Locally-uvx_|_pip-3776AB?style=for-the-badge" alt="Install Locally"></a>
</p>

<p align="center">
@@ -32,6 +32,9 @@ https://github.com/user-attachments/assets/8ca57b68-4d7a-42cb-bcce-43f8b1681ce2


## News 🔥🔥🔥
[01-31-2025] **uv support** — Faster installation with uv
- 🚀 **Install with uv**: Data Formulator now supports installation via [uv](https://docs.astral.sh/uv/), the ultra-fast Python package manager. Get started in seconds with `uvx data_formulator` or `uv pip install data_formulator`.

[01-25-2025] **Data Formulator 0.6** — Real-time insights from live data
- ⚡ **Connect to live data**: Connect to URLs and databases with automatic refresh intervals. Visualizations update automatically as your data changes, giving you live insights. [Demo: track the International Space Station's position and speed live](https://github.com/microsoft/data-formulator/releases/tag/0.6)
- 🎨 **UI Updates**: Unified UI for data loading; direct drag-and-drop fields from the data table to update visualization designs.
@@ -127,9 +130,30 @@ Data Formulator enables analysts to iteratively explore and visualize data. Star

Play with Data Formulator with one of the following options:

- **Option 1: Install via Python PIP**
- **Option 1: Install via uv (recommended)**

[uv](https://docs.astral.sh/uv/) is an extremely fast Python package manager. If you have uv installed, you can run Data Formulator directly without any setup:

```bash
# Run data formulator directly (no install needed)
uvx data_formulator
```

Or install it in a project/virtual environment:

```bash
# Install data_formulator
uv pip install data_formulator

# Run data formulator
python -m data_formulator
```

Data Formulator will be automatically opened in the browser at [http://localhost:5000](http://localhost:5000).

- **Option 2: Install via pip**

Use Python PIP for an easy setup experience, running locally (recommend: install it in a virtual environment).
Use pip for installation (recommended: install it in a virtual environment).

```bash
# install data_formulator
@@ -143,13 +167,13 @@ Play with Data Formulator with one of the following options:

*You can specify a different port (e.g., 8080) with `python -m data_formulator --port 8080` if the default port is occupied.*

- **Option 2: Codespaces (5 minutes)**
- **Option 3: Codespaces (5 minutes)**

You can also run Data Formulator in Codespaces; we have everything pre-configured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 3: Working in the developer mode**
- **Option 4: Working in the developer mode**

You can build Data Formulator locally if you prefer full control over your development environment and develop your own version on top. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).

9 changes: 8 additions & 1 deletion local_server.bat
@@ -7,4 +7,11 @@
:: set https_proxy=http://127.0.0.1:7890

set FLASK_RUN_PORT=5000
python -m py-src.data_formulator.app --port %FLASK_RUN_PORT% --dev

:: Use uv if available, otherwise fall back to python
where uv >nul 2>nul
if %ERRORLEVEL% EQU 0 (
uv run data_formulator --port %FLASK_RUN_PORT% --dev
) else (
python -m data_formulator.app --port %FLASK_RUN_PORT% --dev
)
9 changes: 7 additions & 2 deletions local_server.sh
@@ -5,6 +5,11 @@
# export http_proxy=http://127.0.0.1:7890
# export https_proxy=http://127.0.0.1:7890

#env FLASK_APP=py-src/data_formulator/app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
export FLASK_RUN_PORT=5000
python -m py-src.data_formulator.app --port ${FLASK_RUN_PORT} --dev

# Use uv if available, otherwise fall back to python
if command -v uv &> /dev/null; then
uv run data_formulator --port ${FLASK_RUN_PORT} --dev
else
python -m data_formulator.app --port ${FLASK_RUN_PORT} --dev
fi
Binary file added public/screenshot-stock-price-live.webp
2 changes: 1 addition & 1 deletion py-src/data_formulator/__init__.py
@@ -3,7 +3,7 @@

def run_app():
"""Launch the Data Formulator Flask application."""
# Import app only when actually running to avoid side effects
# Import app only when actually running to avoid heavy imports at package load
from data_formulator.app import run_app as _run_app
return _run_app()
