Commit 71e8337

chore: update docs
1 parent 4a0202d commit 71e8337

3 files changed

+205
-41
lines changed

.claude/CLAUDE.md

Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
44

55
## Repository Overview
66

7-
A comprehensive geographical database (151k+ cities, 5k+ states, 250 countries) available in 9 formats. This is a **data repository** focused on data integrity and multi-format exports.
7+
A comprehensive geographical database (153k+ cities, 5k+ states, 250 countries) available in 9 formats. This is a **data repository** focused on data integrity and multi-format exports.
88

99
## Architecture: Two-Phase Build System
1010

@@ -26,7 +26,7 @@ contributions/ → [Python Import] → MySQL → [PHP Export] → jso
2626
**Phase 2: PHP Export** (`bin/Commands/Export*.php`)
2727
- Symfony Console commands (one per format)
2828
- Reads directly from MySQL via SELECT queries
29-
- Memory limit: unlimited (handles 151k+ records)
29+
- Memory limit: unlimited
3030
- Auto-discovered by `bin/console` application
3131

3232
## Data Contribution Workflows
@@ -73,7 +73,7 @@ mysql -uroot -proot -e "CREATE DATABASE world CHARACTER SET utf8mb4 COLLATE utf8
7373
mysql -uroot -proot --default-character-set=utf8mb4 world < sql/world.sql
7474

7575
# Validate
76-
mysql -uroot -proot -e "USE world; SELECT COUNT(*) FROM cities;" # Should be ~151,024
76+
mysql -uroot -proot -e "USE world; SELECT COUNT(*) FROM cities;"
7777
```
7878

7979
### Import & Export (Local Testing)
@@ -241,7 +241,7 @@ mysql -uroot -e "USE world; SHOW TABLES;" # Run queries
241241

242242
## Performance Expectations
243243

244-
- MySQL import: ~3 seconds (151k+ records)
244+
- MySQL import: ~3 seconds
245245
- JSON export: ~4 seconds
246246
- CSV export: ~1 second
247247
- XML export: ~9 seconds

.github/CONTRIBUTING.md

Lines changed: 151 additions & 2 deletions
@@ -62,9 +62,19 @@ We've made contributing easier! You can now edit simple JSON files organized by
6262
**Important for Contributors:**
6363
- **DO**: Edit JSON files in `contributions/` directory
6464
- **DON'T**: Edit SQL, CSV, XML, YAML, or other export files (auto-generated)
65+
- **DON'T**: Edit GeoJSON or TOON format files (auto-generated from database)
6566
- **DON'T**: Run build scripts or exports locally (GitHub Actions handles this)
6667
- 🔒 **MySQL workflow**: Reserved for repository maintainers only
6768

69+
### Understanding Export Formats
70+
71+
All data you contribute via JSON is automatically exported to **11 different formats**:
72+
- **Core Formats**: JSON, MySQL, PostgreSQL, SQLite, SQL Server, MongoDB, XML, YAML, CSV
73+
- **Geographic Format**: GeoJSON (RFC 7946 standard for mapping applications)
74+
- **AI-Optimized Format**: TOON (Token-Oriented Object Notation - reduces LLM token usage by ~40%)
75+
76+
You don't need to worry about these formats - they're automatically generated from the MySQL database!
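
To make the GeoJSON export concrete, here is a minimal Python sketch of how one city record might map to an RFC 7946 Feature with Point geometry. The `city_to_feature` helper and the sample record are hypothetical, and the chosen property names are assumptions rather than the repository's actual export layout. The one firm detail is that RFC 7946 orders coordinates as longitude first, then latitude.

```python
import json


def city_to_feature(city: dict) -> dict:
    """Build a GeoJSON Feature (Point geometry) from a city record.

    Hypothetical helper: the real exporter may pick different properties.
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # RFC 7946 position order is [longitude, latitude].
            "coordinates": [float(city["longitude"]), float(city["latitude"])],
        },
        "properties": {
            "name": city["name"],
            "timezone": city.get("timezone"),
        },
    }


# Illustrative record only; the values are sample data, not database content.
sample_city = {
    "name": "Example City",
    "latitude": "40.71427",
    "longitude": "-74.00597",
    "timezone": "America/New_York",
}

feature = city_to_feature(sample_city)
print(json.dumps(feature, indent=2))
```

Swapping latitude and longitude is the most common GeoJSON mistake, which is why the sketch keeps the ordering comment next to the coordinates.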
77+
6878
## Glance at Table Structure
6979

7080
### regions.sql
@@ -128,15 +138,17 @@ We've made contributing easier! You can now edit simple JSON files organized by
128138
| ----------------- | --------------- | -------------- | -------------- |
129139
| `id` | integer | Unique ID - omit for new states (auto-assigned) | Auto |
130140
| `name` | string | The official name of the state. Use WikiData or Wikipedia or some other legitimate source. | required |
141+
| `state_code` | string | State/province code (e.g., "CA" for California) | required |
131142
| `country_id` | integer | Unique id of parent country from `countries.sql` | required |
132143
| `country_code` | string | ISO2 code of the parent country | required |
133144
| `fips_code` | string | FIPS code of the state |
134145
| `iso2` | string | ISO2 code of the state |
135146
| `iso3166_2` | string | ISO 3166-2 subdivision code |
136-
| `type` | string | Type of state (province, state, etc.) |
147+
| `type` | string | Type of state (province, state, region, etc.) |
137148
| `level` | integer | Administrative level of the subdivision |
138149
| `parent_id` | integer | ID of parent administrative division |
139150
| `native` | string | Native name of the state |
151+
| `population` | integer | Population of the state - [Wikipedia](https://en.wikipedia.org/wiki/List_of_states_by_population) |
140152
| `latitude` | decimal | Latitude coordinates |
141153
| `longitude` | decimal | Longitude coordinates |
142154
| `timezone` | string | IANA timezone identifier (e.g., America/New_York) |
@@ -158,9 +170,146 @@ We've made contributing easier! You can now edit simple JSON files organized by
158170
| `latitude` | decimal | Latitude coordinates | required |
159171
| `longitude` | decimal | Longitude coordinates | required |
160172
| `native` | string | Native name of the city |
161-
| `timezone` | string | IANA timezone identifier (e.g., America/New_York) |
173+
| `population` | integer | Population of the city - [Wikipedia](https://en.wikipedia.org/wiki/List_of_cities_by_population) |
174+
| `type` | string | Type of settlement (city, town, village, etc.) |
175+
| `level` | integer | Administrative level |
176+
| `parent_id` | integer | ID of parent administrative division |
177+
| `timezone` | string | IANA timezone identifier (e.g., America/New_York) - **REQUIRED for all cities** |
162178
| `translations` | text | JSON object with name translations |
163179
| `created_at` | timestamp | Optional - Creation timestamp (ISO 8601 format). If omitted, database uses default value. |
164180
| `updated_at` | timestamp | Optional - Last update timestamp (ISO 8601 format). If omitted, database auto-updates. |
165181
| `flag`| boolean | Optional - Auto-managed by system, defaults to 1. Contributors can omit this field. |
166182
| `wikiDataId` | string | The unique ID from wikiData.org |
183+
184+
## Data Quality Guidelines
185+
186+
### Required Data Standards
187+
188+
#### Timezone Information (Critical!)
189+
- **100% of cities MUST have valid IANA timezone identifiers**
190+
- Use tools like [TimeZoneDB](https://timezonedb.com/) or [GeoNames](https://www.geonames.org/) to find correct timezones
191+
- Format: `Continent/City` (e.g., `America/New_York`, `Europe/London`, `Asia/Tokyo`)
192+
- **Why it matters**: This database maintains 100% timezone coverage - don't break it!
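
As a rough illustration of the format rule above, the following Python sketch checks only the *shape* of a timezone string. The `looks_like_iana_timezone` name and the regular expression are assumptions made for this example, not part of the repository's tooling. A shape check catches abbreviations such as `PST`, but it cannot tell whether a zone actually exists; for that, the IANA database itself is needed (for example via Python's `zoneinfo` module, where tz data is installed).

```python
import re

# Shape of an identifier such as "America/New_York" or
# "America/Argentina/Buenos_Aires": an area, then one or two location parts.
IANA_SHAPE = re.compile(r"^[A-Za-z_]+(?:/[A-Za-z0-9_+\-]+){1,2}$")


def looks_like_iana_timezone(value: str) -> bool:
    """Return True when the string has the Area/Location shape.

    This is a format check only; it does not prove the zone exists.
    """
    return bool(IANA_SHAPE.match(value))


for candidate in ("America/New_York", "Asia/Tokyo", "PST", "america/new york"):
    print(candidate, looks_like_iana_timezone(candidate))
```

Note that slash-free names such as `UTC` are rejected by this sketch, which is acceptable for city records because each city maps to a regional zone.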
193+
194+
#### Coordinates Accuracy
195+
- Use precise decimal coordinates (minimum 5 decimal places recommended)
196+
- Verify coordinates using Google Maps, OpenStreetMap, or official sources
197+
- Format:
198+
- Latitude: -90 to +90 (negative = South, positive = North)
199+
- Longitude: -180 to +180 (negative = West, positive = East)
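
The range and precision rules above can be sketched as a small Python check. The `check_coordinates` function is hypothetical, and it assumes coordinates arrive as decimal strings; the five-place minimum mirrors the recommendation in this section and is a default, not a hard rule.

```python
from decimal import Decimal, InvalidOperation


def check_coordinates(latitude: str, longitude: str, min_places: int = 5) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for label, raw, limit in (("latitude", latitude, 90), ("longitude", longitude, 180)):
        try:
            value = Decimal(raw)
        except InvalidOperation:
            problems.append(f"{label} is not a number: {raw!r}")
            continue
        # Reject NaN/Infinity and anything outside the valid range.
        if not value.is_finite() or abs(value) > limit:
            problems.append(f"{label} must be between -{limit} and +{limit}: {raw}")
            continue
        # The exponent of a Decimal is the negated count of decimal places.
        places = -value.as_tuple().exponent
        if places < min_places:
            problems.append(f"{label} has {places} decimal places, {min_places} recommended")
    return problems


print(check_coordinates("40.71427", "-74.00597"))
print(check_coordinates("95.00000", "10.00000"))
```

Parsing with `Decimal` rather than `float` keeps trailing zeros, so `"40.70000"` still counts as five decimal places.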
200+
201+
#### Naming Conventions
202+
- Use official, commonly recognized names in English
203+
- Add native names in the `native` field
204+
- Use proper capitalization (e.g., "New York" not "new york")
205+
- Avoid abbreviations unless officially used (e.g., "St." in "St. Louis" is acceptable)
206+
207+
### Data Sources
208+
209+
**Recommended Sources (in priority order):**
210+
1. **Official Government Websites** - Most authoritative
211+
2. **WikiData** ([wikidata.org](https://www.wikidata.org/)) - Structured, multilingual data
212+
3. **Wikipedia** - Well-sourced, community-verified
213+
4. **GeoNames** ([geonames.org](https://www.geonames.org/)) - Comprehensive geographic database
214+
5. **OpenStreetMap** - Community-maintained geographic data
215+
216+
**Always include source in your PR description!**
217+
218+
### Common Mistakes to Avoid
219+
220+
**Don't Do This:**
221+
- Adding cities without timezone information
222+
- Using approximate coordinates (e.g., country center for city location)
223+
- Copying data without verification
224+
- Adding duplicate entries (check first!)
225+
- Using non-standard timezone names (e.g., "PST" instead of "America/Los_Angeles")
226+
227+
**Do This Instead:**
228+
- Research proper IANA timezone for each city
229+
- Use precise coordinates for the city center or main landmark
230+
- Verify data from multiple reliable sources
231+
- Search existing data before adding new entries
232+
- Use official IANA timezone database format
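
One way to act on the "check first" advice is a quick duplicate scan before opening a PR. The sketch below is an assumption-laden example: `find_duplicates` is a hypothetical helper, and treating `name` plus `country_code` as the identity of a record is an illustrative choice, since the real uniqueness rules may involve other fields.

```python
from collections import Counter


def find_duplicates(records, key_fields=("name", "country_code")):
    """Return the keys that occur more than once, ignoring case and padding."""

    def key(record):
        return tuple(str(record.get(field, "")).strip().casefold() for field in key_fields)

    counts = Counter(key(record) for record in records)
    return sorted(k for k, n in counts.items() if n > 1)


# Sample records for illustration only.
sample = [
    {"name": "Springfield", "country_code": "US"},
    {"name": " springfield ", "country_code": "us"},
    {"name": "Shelbyville", "country_code": "US"},
]
print(find_duplicates(sample))
```

Case-folding and trimming before comparison catches near-identical entries that differ only in capitalization or stray whitespace.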
233+
234+
### Population Data (Optional but Recommended)
235+
236+
When adding population data:
237+
- Use recent census data or official estimates
238+
- Include source year if possible in PR description
239+
- Round to reasonable precision (avoid false precision)
240+
- **Format**: Integer (e.g., `1000000` not `"1,000,000"`)
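
The integer-format rule can be illustrated with a short Python sketch. `normalize_population` is a hypothetical helper, not part of the repository; it assumes the only formatting to undo is thousands separators written as commas.

```python
def normalize_population(value) -> int:
    """Convert a population value to a plain non-negative integer.

    Accepts an int, or a string such as "1,000,000"; rejects anything else.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so exclude it explicitly.
        raise TypeError("population must be an integer, not a boolean")
    if isinstance(value, int):
        result = value
    elif isinstance(value, str):
        digits = value.replace(",", "").strip()
        if not (digits.isascii() and digits.isdigit()):
            raise ValueError(f"not a whole number: {value!r}")
        result = int(digits)
    else:
        raise TypeError(f"unsupported population type: {type(value).__name__}")
    if result < 0:
        raise ValueError("population cannot be negative")
    return result


print(normalize_population("1,000,000"))
```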
241+
242+
### How to Find Foreign Keys
243+
244+
**Finding State IDs:**
245+
```bash
246+
# Search in contributions/states/states.json
247+
grep -A 5 '"name": "California"' contributions/states/states.json
248+
```
249+
250+
**Finding Country IDs:**
251+
```bash
252+
# Search in contributions/countries/countries.json
253+
grep -A 5 '"name": "United States"' contributions/countries/countries.json
254+
```
255+
256+
Or use the [CSC Update Tool](https://manager.countrystatecity.in/) which automatically looks up IDs for you!
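
For contributors who prefer a script over `grep`, the same lookup might be sketched in Python as follows. This assumes each contributions file is a JSON array of objects carrying `id` and `name` fields, which the `grep` patterns above suggest but which is not confirmed here; `find_ids_by_name` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_ids_by_name(json_path, name):
    """Return the ids of all records whose "name" matches exactly.

    Assumes the file holds a JSON array of objects with "id" and "name".
    """
    records = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [record["id"] for record in records if record.get("name") == name and "id" in record]


if __name__ == "__main__":
    # Path taken from the grep example above; skipped if the file is absent.
    states_file = Path("contributions/states/states.json")
    if states_file.exists():
        print(find_ids_by_name(states_file, "California"))
```

Returning a list rather than a single value makes it obvious when a name is ambiguous, for example when several countries have a state with the same name.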
257+
258+
## Pull Request Guidelines
259+
260+
### Before Submitting
261+
262+
- [ ] Data verified from authoritative sources
263+
- [ ] Timezone validated using IANA timezone database
264+
- [ ] Coordinates checked on a map
265+
- [ ] No duplicate entries
266+
- [ ] Source included in PR description
267+
- [ ] Only JSON files in `contributions/` edited
268+
269+
### PR Description Template
270+
271+
```markdown
272+
## Summary
273+
[Brief description of changes]
274+
275+
## Type of Change
276+
- [ ] New city/state/country
277+
- [ ] Update existing data
278+
- [ ] Fix incorrect data
279+
- [ ] Add missing fields
280+
281+
## Data Sources
282+
- Source 1: [URL]
283+
- Source 2: [URL]
284+
285+
## Checklist
286+
- [ ] Timezones verified
287+
- [ ] Coordinates verified
288+
- [ ] Data sources cited
289+
- [ ] No duplicate entries
290+
```
291+
292+
### Review Process
293+
294+
1. **Automated Checks**: GitHub Actions validates JSON format
295+
2. **Data Import**: Your changes are imported to MySQL
296+
3. **Export Generation**: All 11 formats regenerated
297+
4. **Maintainer Review**: Human review of data quality and sources
298+
5. **Merge**: Changes go live in next release!
299+
300+
## Need Help?
301+
302+
### Tools & Resources
303+
- **[CSC Update Tool](https://manager.countrystatecity.in/)** - Easiest way to contribute (GUI)
304+
- **[API Documentation](https://docs.countrystatecity.in/)** - Explore existing data
305+
- **[Demo Database](https://demo.countrystatecity.in/)** - Browse online
306+
- **[IANA Timezone Database](https://www.iana.org/time-zones)** - Official timezone reference
307+
308+
### Questions?
309+
- Open a [GitHub Discussion](https://github.com/dr5hn/countries-states-cities-database/discussions)
310+
- Check existing [Issues](https://github.com/dr5hn/countries-states-cities-database/issues)
311+
- Review [contributions/README.md](../contributions/README.md) for detailed examples
312+
313+
## Recognition
314+
315+
All contributors are recognized in our [README](../README.md) and commit history. Thank you for helping maintain the most comprehensive open geographical database! 🌍

README.md

Lines changed: 50 additions & 35 deletions
@@ -8,15 +8,15 @@
88
![release](https://img.shields.io/github/v/release/dr5hn/countries-states-cities-database?style=flat-square)
99
![size](https://img.shields.io/github/repo-size/dr5hn/countries-states-cities-database?label=size&style=flat-square)
1010

11-
Full Database of city state country available in JSON, MYSQL, PSQL, SQLITE, SQLSERVER, XML, YAML, MONGODB & CSV format.
11+
Full Database of city state country available in **11 formats**: JSON, MYSQL, PSQL, SQLITE, SQLSERVER, XML, YAML, MONGODB, CSV, GEOJSON & TOON.
1212
All Countries, States & Cities are Covered & Populated with Different Combinations & Versions.
1313

1414
## Why Choose This Database?
1515

16-
- **Most Comprehensive** - 151,024+ cities from 250 countries with timezone & multilingual support (19 languages)
16+
- **Most Comprehensive** - 153K+ cities from 250 countries with 100% timezone coverage & multilingual support (19 languages)
1717
- **Multiple Integration Options** - NPM/PyPI packages, REST API, Export Tool, or direct downloads
1818
- **Production Ready** - Trusted by thousands of developers, monthly updates
19-
- **Every Format You Need** - JSON, SQL, MongoDB, CSV, XML, YAML - use what fits your stack
19+
- **Every Format You Need** - JSON, SQL, MongoDB, CSV, XML, YAML, GeoJSON, TOON - use what fits your stack
2020
- **100% Free & Open Source** - ODbL licensed, no usage restrictions, developer-friendly
2121

2222
Save hundreds of hours collecting and maintaining geographical data. Get accurate, structured, ready-to-use data right now.
@@ -113,32 +113,38 @@ npm install @countrystatecity/timezones
113113

114114
## Available Formats
115115

116-
- JSON
117-
- MYSQL
118-
- PSQL
119-
- SQLITE
120-
- SQLSERVER
121-
- MONGODB
122-
- XML
123-
- YAML
124-
- CSV
116+
### Core Formats
117+
- **JSON** - Lightweight data interchange format
118+
- **MYSQL** - MySQL database dumps with complete schema
119+
- **PSQL** - PostgreSQL database exports
120+
- **SQLITE** - Portable, self-contained database files
121+
- **SQLSERVER** - Microsoft SQL Server compatible scripts
122+
- **MONGODB** - NoSQL document collections + dump
123+
- **XML** - Structured markup language format
124+
- **YAML** - Human-readable configuration format
125+
- **CSV** - Spreadsheet-compatible tabular data
125126

126-
**Note:** DuckDB format is available via manual conversion from SQLite files. See the [Export to DuckDB](#export-to-duckdb) section for instructions.
127+
### Geographic & AI-Optimized Formats
128+
- **GEOJSON** - RFC 7946 standard for geographic features (Point geometry)
129+
- **TOON** - Token-Oriented Object Notation for LLM consumption (~40% fewer tokens vs JSON) [📖 Format Spec](https://github.com/toon-format/toon)
130+
131+
### Optional Formats
132+
- **DuckDB** - Available via manual conversion from SQLite files. See [Export to DuckDB](#export-to-duckdb) for instructions.
127133

128134
## Distribution Files Info
129135

130-
| File | JSON | MYSQL | PSQL | SQLITE | SQLSERVER | MONGODB | XML | YAML | CSV |
131-
| :------------------------- | :--- | :---- | :--- | :----- | :-------- | :------ | :-- | :--- | :-- |
132-
| Regions |🗜️ |🗜️ |||||🗜️ ||🗜️ |
133-
| Subregions |🗜️ |🗜️ |||||🗜️ ||🗜️ |
134-
| Countries |🗜️ |🗜️ |||||🗜️ ||🗜️ |
135-
| States |🗜️ |🗜️ |||||🗜️ ||🗜️ |
136-
| Cities |🗜️ |🗜️ |||||🗜️ ||🗜️ |
137-
| Country+States |🗜️ | NA | NA | NA | NA | NA | NA | NA | NA |
138-
| Country+Cities |🗜️ | NA | NA | NA | NA | NA | NA | NA | NA |
139-
| Country+State+Cities/World |🗜️ |🗜️ ||||| NA | NA | NA |
136+
| File | JSON | MYSQL | PSQL | SQLITE | SQLSERVER | MONGODB | XML | YAML | CSV | GEOJSON | TOON |
137+
| :------------------------- | :--- | :---- | :--- | :----- | :-------- | :------ | :-- | :--- | :-- | :------ | :--- |
138+
| Regions                    | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | NA | NA |
139+
| Subregions                 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | NA | NA |
140+
| Countries                  | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
141+
| States                     | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
142+
| Cities                     | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
143+
| Country+States             | ✅ | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
144+
| Country+Cities             | ✅ | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
145+
| Country+State+Cities/World | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | NA | NA | NA | NA | NA |
140146

141-
**Legend:** ✅ = Available | 🗜️ = Compressed (.gz) version also available
147+
**Legend:** ✅ = Available | NA = Not applicable for this format
142148

143149

144150
## Demo
@@ -150,11 +156,11 @@ https://dr5hn.github.io/countries-states-cities-database/
150156
Total Regions : 6 <br>
151157
Total Sub Regions : 22 <br>
152158
Total Countries : 250 <br>
153-
Total States/Regions/Municipalities : 5,038 <br>
154-
Total Cities/Towns/Districts : 151,024 <br>
155-
Total Timezones : 423 (97.9% IANA coverage) <br>
159+
Total States/Regions/Municipalities : 5,299 <br>
160+
Total Cities/Towns/Districts : 153,765 <br>
161+
Total Timezones : 423 (100% IANA coverage) <br>
156162

157-
Last Updated On : 03th Dec 2025
163+
Last Updated On : 3rd Dec 2025
158164

159165
## Repository Architecture
160166

@@ -217,13 +223,22 @@ The conversion script will create DuckDB database files that maintain the same s
217223
### Export Performance
218224
| Format | Export Time | World DB Size | Compressed (.gz) |
219225
|--------|-------------|---------------|------------------|
220-
| **CSV** | ~1s | 45 MB | 9 MB (fastest) |
221-
| **JSON** | ~4s | 161 MB | 18 MB |
222-
| **MongoDB** | ~1s | 140 MB | - |
223-
| **SQL** | ~3s | 180 MB | 22 MB |
224-
| **SQLite** | ~45s | 85 MB | - |
225-
| **XML** | ~9s | 220 MB | 15 MB |
226-
| **YAML** | ~17s | 195 MB | - |
226+
| **CSV** | ~1s | 40 MB | 9 MB (fastest) |
227+
| **JSON** | ~4s | 271 MB | 18 MB |
228+
| **MongoDB** | ~1s | 30 MB | 20 MB (dump) |
229+
| **SQL** | ~3s | 86 MB | 22 MB |
230+
| **SQLite** | ~45s | 89 MB | - |
231+
| **XML** | ~9s | 91 MB | 15 MB |
232+
| **YAML** | ~17s | 68 MB | - |
233+
| **GeoJSON** | ~8s | 208 MB | 24 MB |
234+
| **TOON** | ~5s | 23 MB | 20 MB |
235+
236+
> **💡 Format Recommendations:**
237+
> - **Web/Mobile Apps**: Use JSON or CSV for easy parsing
238+
> - **Databases**: Import SQL, PSQL, or SQLite files directly
239+
> - **GIS/Mapping**: Use GeoJSON for Leaflet, Mapbox, or PostGIS
240+
> - **AI/LLM Projects**: Use TOON format to reduce token usage by ~40%
241+
> - **Analytics**: DuckDB or SQLite for fast analytical queries
227242
228243
### API Response Times (Average)
229244
- Countries: ~50ms | States: ~180ms | Cities by State: ~80ms | Search: ~120ms
