Skip to content

Commit 468df54

Browse files
passrenPeng Ren
andauthored
0.2.4 - Fixed several bugs and add alias support (#6)
* Update connection string pattern * Fixed pagination bug * Fixed the type_code issue in desc * Reformat code * Unify the format of code * Support alias in SQL query * Update README to reflect the new changes * Fix test case issue --------- Co-authored-by: Peng Ren <ia250@cummins.com>
1 parent f0738f4 commit 468df54

24 files changed

+439
-288
lines changed

README.md

Lines changed: 28 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,16 +14,18 @@ PyMongoSQL is a Python [DB API 2.0 (PEP 249)](https://www.python.org/dev/peps/pe
1414

1515
## Objectives
1616

17-
PyMongoSQL implements the DB API 2.0 interfaces to provide SQL-like access to MongoDB. The project aims to:
17+
PyMongoSQL implements the DB API 2.0 interfaces to provide SQL-like access to MongoDB, built on PartiQL syntax for querying semi-structured data. The project aims to:
1818

19-
- Bridge the gap between SQL and NoSQL by providing SQL capabilities for MongoDB
20-
- Support standard SQL DQL (Data Query Language) operations including SELECT statements with WHERE, ORDER BY, and LIMIT clauses
19+
- Bridge the gap between SQL and NoSQL by providing SQL capabilities for MongoDB's nested document structures
20+
- Support standard SQL DQL (Data Query Language) operations including SELECT statements with WHERE, ORDER BY, and LIMIT clauses on nested and hierarchical data
2121
- Provide seamless integration with existing Python applications that expect DB API 2.0 compliance
22-
- Enable easy migration from traditional SQL databases to MongoDB
22+
- Enable easy migration from traditional SQL databases to MongoDB without rewriting queries for document traversal
2323

2424
## Features
2525

2626
- **DB API 2.0 Compliant**: Full compatibility with Python Database API 2.0 specification
27+
- **PartiQL-based SQL Syntax**: Built on [PartiQL](https://partiql.org/tutorial.html) (SQL for semi-structured data), enabling seamless SQL querying of nested and hierarchical MongoDB documents
28+
- **Nested Structure Support**: Query and filter deeply nested fields and arrays within MongoDB documents using standard SQL syntax
2729
- **SQLAlchemy Integration**: Complete ORM and Core support with dedicated MongoDB dialect
2830
- **SQL Query Support**: SELECT statements with WHERE conditions, field selection, and aliases
2931
- **Connection String Support**: MongoDB URI format for easy configuration
@@ -140,6 +142,7 @@ while users:
140142
### SELECT Statements
141143
- Field selection: `SELECT name, age FROM users`
142144
- Wildcards: `SELECT * FROM products`
145+
- **Field aliases**: `SELECT name as user_name, age as user_age FROM users`
143146
- **Nested fields**: `SELECT profile.name, profile.age FROM users`
144147
- **Array access**: `SELECT items[0], items[1].name FROM orders`
145148

@@ -176,6 +179,27 @@ while users:
176179

177180
These features are on our development roadmap and contributions are welcome!
178181

182+
## Apache Superset Integration
183+
184+
PyMongoSQL can be used as a database driver in Apache Superset for querying and visualizing MongoDB data:
185+
186+
1. **Install PyMongoSQL**: Install PyMongoSQL on the Superset app server:
187+
```bash
188+
pip install pymongosql
189+
```
190+
2. **Create Connection**: Connect to your MongoDB instance using the connection URI with superset mode:
191+
```
192+
mongodb://username:password@host:port/database?mode=superset
193+
```
194+
or for MongoDB Atlas:
195+
```
196+
mongodb+srv://username:password@host/database?mode=superset
197+
```
198+
3. **Use SQL Lab**: Write and execute SQL queries against MongoDB collections directly in Superset's SQL Lab
199+
4. **Create Visualizations**: Build charts and dashboards from your MongoDB queries using Superset's visualization tools
200+
201+
This allows seamless integration between MongoDB data and Superset's BI capabilities without requiring data migration to traditional SQL databases.
202+
179203
## Contributing
180204

181205
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

pymongosql/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
if TYPE_CHECKING:
77
from .connection import Connection
88

9-
__version__: str = "0.2.3"
9+
__version__: str = "0.2.4"
1010

1111
# Globals https://www.python.org/dev/peps/pep-0249/#globals
1212
apilevel: str = "2.0"

pymongosql/connection.py

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -38,14 +38,18 @@ def __init__(
3838
3939
Supports connection string patterns:
4040
- mongodb://host:port/database - Core driver (no subquery support)
41-
- mongodb+superset://host:port/database - Superset driver with subquery support
41+
- mongodb+srv://host:port/database - Cloud/SRV connection string
42+
- mongodb://host:port/database?mode=superset - Superset driver with subquery support
43+
- mongodb+srv://host:port/database?mode=superset - Cloud SRV with superset mode
44+
45+
Mode is specified via the ?mode= query parameter. If not specified, defaults to "standard".
4246
4347
See PyMongo MongoClient documentation for full parameter details.
4448
https://www.mongodb.com/docs/languages/python/pymongo-driver/current/connect/mongoclient/
4549
"""
4650
# Check if connection string specifies mode
4751
connection_string = host if isinstance(host, str) else None
48-
mode, host = ConnectionHelper.parse_connection_string(connection_string)
52+
mode, db_from_uri, host = ConnectionHelper.parse_connection_string(connection_string)
4953

5054
self._mode = kwargs.pop("mode", None)
5155
if not self._mode and mode:
@@ -56,7 +60,10 @@ def __init__(
5660
self._port = port or 27017
5761

5862
# Handle database parameter separately (not a MongoClient parameter)
63+
# Explicit 'database' parameter takes precedence over database in URI
5964
self._database_name = kwargs.pop("database", None)
65+
if not self._database_name and db_from_uri:
66+
self._database_name = db_from_uri
6067

6168
# Store all PyMongo parameters to pass through directly
6269
self._pymongo_params = kwargs.copy()

pymongosql/executor.py

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,4 @@
11
# -*- coding: utf-8 -*-
2-
"""
3-
Query execution strategies for handling both simple and subquery-based SQL operations.
4-
5-
This module provides different execution strategies:
6-
- StandardExecution: Direct MongoDB query for simple SELECT statements
7-
8-
The intermediate database is configurable - any backend implementing QueryDatabase
9-
interface can be used (SQLite3, PostgreSQL, MySQL, etc.).
10-
"""
11-
122
import logging
133
from abc import ABC, abstractmethod
144
from dataclasses import dataclass

pymongosql/helper.py

Lines changed: 64 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77

88
import logging
99
from typing import Optional, Tuple
10-
from urllib.parse import urlparse
10+
from urllib.parse import parse_qs, urlparse
1111

1212
_logger = logging.getLogger(__name__)
1313

@@ -17,52 +17,80 @@ class ConnectionHelper:
1717
1818
Supports connection string patterns:
1919
- mongodb://host:port/database - Core driver (no subquery support)
20-
- mongodb+superset://host:port/database - Superset driver with subquery support
20+
- mongodb+srv://host:port/database - Cloud/SRV connection string
21+
- mongodb://host:port/database?mode=superset - Superset driver with subquery support
22+
- mongodb+srv://host:port/database?mode=superset - Cloud SRV with superset mode
23+
24+
Mode is specified via query parameter (?mode=superset) and defaults to "standard" if not specified.
2125
"""
2226

2327
@staticmethod
24-
def parse_connection_string(connection_string: str) -> Tuple[str, str, Optional[str], int, Optional[str]]:
28+
def parse_connection_string(connection_string: Optional[str]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
2529
"""
26-
Parse PyMongoSQL connection string and determine driver mode.
30+
Parse MongoDB connection string and extract driver mode from query parameters.
31+
32+
Mode is extracted from the 'mode' query parameter and removed from the normalized
33+
connection string. Database name is extracted from the path. If mode is not specified,
34+
it defaults to "standard".
35+
36+
Supports all standard MongoDB connection string patterns:
37+
mongodb://[username:password@]host1[:port1][,host2[:port2]...][/[defaultauthdb]?options]
38+
39+
Args:
40+
connection_string: MongoDB connection string
41+
42+
Returns:
43+
Tuple of (mode, database_name, normalized_connection_string)
44+
- mode: "standard" (default) or other mode values specified via ?mode= parameter
45+
- database_name: extracted database name from path, or None if not specified
46+
- normalized_connection_string: connection string without the mode parameter
2747
"""
2848
try:
2949
if not connection_string:
30-
return "standard", None
50+
return "standard", None, None
3151

3252
parsed = urlparse(connection_string)
33-
scheme = parsed.scheme
3453

3554
if not parsed.scheme:
36-
return "standard", connection_string
37-
38-
base_scheme = "mongodb"
39-
mode = "standard"
40-
41-
# Determine mode from scheme
42-
if "+" in scheme:
43-
base_scheme = scheme.split("+")[0].lower()
44-
mode = scheme.split("+")[-1].lower()
45-
46-
host = parsed.hostname or "localhost"
47-
port = parsed.port or 27017
48-
database = parsed.path.lstrip("/") if parsed.path else None
49-
50-
# Build normalized connection string with mongodb scheme (removing any +mode)
51-
# Reconstruct netloc with credentials if present
52-
netloc = host
53-
if parsed.username:
54-
creds = parsed.username
55-
if parsed.password:
56-
creds += f":{parsed.password}"
57-
netloc = f"{creds}@{host}"
58-
netloc += f":{port}"
59-
60-
query_part = f"?{parsed.query}" if parsed.query else ""
61-
normalized_connection_string = f"{base_scheme}://{netloc}/{database or ''}{query_part}"
62-
63-
_logger.debug(f"Parsed connection string - Mode: {mode}, Host: {host}, Port: {port}, Database: {database}")
64-
65-
return mode, normalized_connection_string
55+
return "standard", None, connection_string
56+
57+
# Extract mode from query parameters (defaults to "standard" if not specified)
58+
query_params = parse_qs(parsed.query, keep_blank_values=True) if parsed.query else {}
59+
mode = query_params.get("mode", ["standard"])[0]
60+
61+
# Extract database name from path
62+
database_name = None
63+
if parsed.path:
64+
# Remove leading slash and trailing slashes
65+
path_parts = parsed.path.strip("/").split("/")
66+
if path_parts and path_parts[0]: # Get the first path segment as database name
67+
database_name = path_parts[0]
68+
69+
# Remove mode from query parameters
70+
query_params.pop("mode", None)
71+
72+
# Rebuild query string without mode parameter
73+
query_string = (
74+
"&".join(f"{k}={v}" if v else k for k, v_list in query_params.items() for v in v_list)
75+
if query_params
76+
else ""
77+
)
78+
79+
# Reconstruct the connection string without mode parameter
80+
if query_string:
81+
if parsed.path:
82+
normalized_connection_string = f"{parsed.scheme}://{parsed.netloc}{parsed.path}?{query_string}"
83+
else:
84+
normalized_connection_string = f"{parsed.scheme}://{parsed.netloc}?{query_string}"
85+
else:
86+
if parsed.path:
87+
normalized_connection_string = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
88+
else:
89+
normalized_connection_string = f"{parsed.scheme}://{parsed.netloc}"
90+
91+
_logger.debug(f"Parsed connection string - Mode: {mode}, Database: {database_name}")
92+
93+
return mode, database_name, normalized_connection_string
6694

6795
except Exception as e:
6896
_logger.error(f"Failed to parse connection string: {e}")

pymongosql/result_set.py

Lines changed: 9 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,7 @@ def __init__(
4242
self._is_closed = False
4343
self._cache_exhausted = False
4444
self._total_fetched = 0
45-
self._description: Optional[List[Tuple[str, str, None, None, None, None, None]]] = None
45+
self._description: Optional[List[Tuple[str, Any, None, None, None, None, None]]] = None
4646
self._column_names: Optional[List[str]] = None # Track column order for sequences
4747
self._errors: List[Dict[str, str]] = []
4848

@@ -69,20 +69,22 @@ def _build_description(self) -> None:
6969
if not self._execution_plan.projection_stage:
7070
# No projection specified, build description from column names if available
7171
if self._column_names:
72-
self._description = [
73-
(col_name, "VARCHAR", None, None, None, None, None) for col_name in self._column_names
74-
]
72+
self._description = [(col_name, str, None, None, None, None, None) for col_name in self._column_names]
7573
else:
7674
# Will be built dynamically when columns are established
7775
self._description = None
7876
return
7977

8078
# Build description from projection (now in MongoDB format {field: 1})
8179
description = []
80+
column_aliases = getattr(self._execution_plan, "column_aliases", {})
81+
8282
for field_name, include_flag in self._execution_plan.projection_stage.items():
8383
# SQL cursor description format: (name, type_code, display_size, internal_size, precision, scale, null_ok)
8484
if include_flag == 1: # Field is included in projection
85-
description.append((field_name, "VARCHAR", None, None, None, None, None))
85+
# Use alias if available, otherwise use field name
86+
display_name = column_aliases.get(field_name, field_name)
87+
description.append((display_name, str, None, None, None, None, None))
8688

8789
self._description = description
8890

@@ -202,19 +204,15 @@ def rowcount(self) -> int:
202204
@property
203205
def description(
204206
self,
205-
) -> Optional[List[Tuple[str, str, None, None, None, None, None]]]:
207+
) -> Optional[List[Tuple[str, Any, None, None, None, None, None]]]:
206208
"""Return column description"""
207209
if self._description is None:
208210
# Try to build description from established column names
209211
try:
210-
if not self._cache_exhausted:
211-
# Fetch one result to establish column names if needed
212-
self._ensure_results_available(1)
213-
214212
if self._column_names:
215213
# Build description from established column names
216214
self._description = [
217-
(col_name, "VARCHAR", None, None, None, None, None) for col_name in self._column_names
215+
(col_name, str, None, None, None, None, None) for col_name in self._column_names
218216
]
219217
except Exception as e:
220218
_logger.warning(f"Could not build dynamic description: {e}")
@@ -272,7 +270,6 @@ def fetchall(self) -> List[Sequence[Any]]:
272270

273271
# Now get everything from cache
274272
all_results.extend(self._cached_results)
275-
self._total_fetched += len(self._cached_results)
276273
self._cached_results.clear()
277274
self._cache_exhausted = True
278275

pymongosql/sql/ast.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -50,9 +50,11 @@ def parse_to_execution_plan(self) -> ExecutionPlan:
5050
"""Convert the parse result to an ExecutionPlan using BuilderFactory"""
5151
builder = BuilderFactory.create_query_builder().collection(self._parse_result.collection)
5252

53-
builder.filter(self._parse_result.filter_conditions).project(self._parse_result.projection).sort(
54-
self._parse_result.sort_fields
55-
).limit(self._parse_result.limit_value).skip(self._parse_result.offset_value)
53+
builder.filter(self._parse_result.filter_conditions).project(self._parse_result.projection).column_aliases(
54+
self._parse_result.column_aliases
55+
).sort(self._parse_result.sort_fields).limit(self._parse_result.limit_value).skip(
56+
self._parse_result.offset_value
57+
)
5658

5759
return builder.build()
5860

pymongosql/sql/builder.py

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,4 @@
11
# -*- coding: utf-8 -*-
2-
"""
3-
Query builder for constructing MongoDB queries in a fluent, readable way
4-
"""
52
import logging
63
from dataclasses import dataclass, field
74
from typing import Any, Dict, List, Optional, Union
@@ -16,6 +13,7 @@ class ExecutionPlan:
1613
collection: Optional[str] = None
1714
filter_stage: Dict[str, Any] = field(default_factory=dict)
1815
projection_stage: Dict[str, Any] = field(default_factory=dict)
16+
column_aliases: Dict[str, str] = field(default_factory=dict) # Maps field_name -> alias
1917
sort_stage: List[Dict[str, int]] = field(default_factory=list)
2018
limit_stage: Optional[int] = None
2119
skip_stage: Optional[int] = None
@@ -56,6 +54,7 @@ def copy(self) -> "ExecutionPlan":
5654
collection=self.collection,
5755
filter_stage=self.filter_stage.copy(),
5856
projection_stage=self.projection_stage.copy(),
57+
column_aliases=self.column_aliases.copy(),
5958
sort_stage=self.sort_stage.copy(),
6059
limit_stage=self.limit_stage,
6160
skip_stage=self.skip_stage,
@@ -156,6 +155,16 @@ def skip(self, count: int) -> "MongoQueryBuilder":
156155
_logger.debug(f"Set skip to: {count}")
157156
return self
158157

158+
def column_aliases(self, aliases: Dict[str, str]) -> "MongoQueryBuilder":
159+
"""Set column aliases mapping (field_name -> alias)"""
160+
if not isinstance(aliases, dict):
161+
self._add_error("Column aliases must be a dictionary")
162+
return self
163+
164+
self._execution_plan.column_aliases = aliases
165+
_logger.debug(f"Set column aliases to: {aliases}")
166+
return self
167+
159168
def where(self, field: str, operator: str, value: Any) -> "MongoQueryBuilder":
160169
"""Add a where condition in a readable format"""
161170
condition = self._build_condition(field, operator, value)

pymongosql/sql/handler.py

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,4 @@
11
# -*- coding: utf-8 -*-
2-
"""
3-
Expression handlers for converting SQL expressions to MongoDB query format
4-
"""
52
import logging
63
import re
74
from abc import ABC, abstractmethod
@@ -42,6 +39,7 @@ class ParseResult:
4239
# Visitor parsing state fields
4340
collection: Optional[str] = None
4441
projection: Dict[str, Any] = field(default_factory=dict)
42+
column_aliases: Dict[str, str] = field(default_factory=dict) # Maps field_name -> alias
4543
sort_fields: List[Dict[str, int]] = field(default_factory=list)
4644
limit_value: Optional[int] = None
4745
offset_value: Optional[int] = None
@@ -894,14 +892,19 @@ def can_handle(self, ctx: Any) -> bool:
894892

895893
def handle_visitor(self, ctx: PartiQLParser.SelectItemsContext, parse_result: "ParseResult") -> Any:
896894
projection = {}
895+
column_aliases = {}
897896

898897
if hasattr(ctx, "projectionItems") and ctx.projectionItems():
899898
for item in ctx.projectionItems().projectionItem():
900899
field_name, alias = self._extract_field_and_alias(item)
901900
# Use MongoDB standard projection format: {field: 1} to include field
902901
projection[field_name] = 1
902+
# Store alias if present
903+
if alias:
904+
column_aliases[field_name] = alias
903905

904906
parse_result.projection = projection
907+
parse_result.column_aliases = column_aliases
905908
return projection
906909

907910
def _extract_field_and_alias(self, item) -> Tuple[str, Optional[str]]:

0 commit comments

Comments
 (0)