diff --git a/di/simtick/README.md b/di/simtick/README.md new file mode 100644 index 00000000..142589ab --- /dev/null +++ b/di/simtick/README.md @@ -0,0 +1,235 @@ +# di.simtick + +Realistic intraday tick data simulator for KDB-X with configurable market microstructure. + +For a detailed explanation of the mathematical foundations, see the [Technical Paper](docs/IntradayTickSimulatorPaper.pdf). + +## About + +Realistic synthetic tick data is valuable for many quantitative finance workflows. This module generates trade and quote data that captures key statistical properties of real markets. + +The module is designed for **progressive complexity**: configure from simple to sophisticated scenarios by adjusting parameters: + +- **Baseline**: Set `alpha:0` and equal multipliers for basic Poisson arrivals with GBM prices +- **Add seasonality**: Vary `openmult`, `midmult`, `closemult` for U-shape or J-shape intraday patterns +- **Add clustering**: Increase `alpha` to enable Hawkes self-excitation for realistic trade bursts +- **Add jumps**: Switch to `pricemodel:jump` for discontinuous price moves +- **Add quotes**: Set `generatequotes:1b` for bid-ask spread dynamics + +This flexibility allows the same module to serve quick prototypes and sophisticated stress-testing scenarios. + +### Key Features + +- **Trade clustering** — real trades arrive in bursts, not uniformly. We use a Hawkes process to model this self-exciting behavior. +- **Intraday seasonality** — trading activity is high at open and close, low at midday. Configurable U-shape or J-shape patterns. +- **Price dynamics** — GBM with optional jump-diffusion captures continuous price movement and occasional discontinuities. +- **Microstructure** — bid-ask spreads that widen at open/close, quote updates between trades. + +### Use Cases + +**Stress testing and scenario analysis** — Generate data under severe but plausible conditions. 
Simulate liquidity shocks by lowering `baseintensity`, gap moves using the jump-diffusion model (`pricemodel:jump`), or extreme volatility regimes by increasing `vol`. Test how your systems behave when markets break from normal patterns. + +**Sensitivity and robustness testing** — Vary parameters systematically to understand how strategies respond to changes in volatility, trade frequency, or spread dynamics. Identify breaking points before they occur in production. + +**System development** — Stress-test data ingestion pipelines by adjusting trade arrival rates. Increase `baseintensity` (e.g., from 0.5 to 50) and `alpha` (keeping it below `beta`; `run` rejects configs with `alpha >= beta`) to simulate high-frequency bursts. This lets you verify that your database, message queues, and processing logic handle peak loads without data loss or latency spikes. + +**Real-time demos** — Feed simulated data to dashboards, visualization tools, or trading interfaces. Useful for demos, training sessions, or testing UI responsiveness without connecting to live markets. + +### Limitations + +This module emphasizes **trade generation** and derives quotes in a simplified manner. Quotes are constructed *after* trades to ensure consistency with executed prices. This approach is computationally efficient but inverts the true market causality, in which quotes exist first and trades result from order matching. + +**Not suitable for:** + +- **Advanced market-making research** — no order book queue dynamics, no queue position modeling +- **Execution optimization** — no realistic fill probability or market impact simulation +- **HFT strategy development** — quote generation is not causally realistic + +For these advanced use cases, a full limit order book simulator with queue dynamics is a better fit. + +### Next Steps + +A future module will extend this simulator to support **multi-instrument generation with correlation**.
Using the KDB-X module hierarchy, a new `di.simmulti` module will build on `di.simtick` as the single-instrument foundation, adding: + +- Correlated price processes +- Configurable correlation matrices +- Synchronized or independent arrival processes + +Correlated price paths across assets are essential for: + +- **Portfolio risk management** — stress testing diversified portfolios under correlated drawdowns +- **Value at Risk (VaR) and Expected Shortfall (ES)** — generating scenarios for tail risk estimation +- **Cross-asset strategy testing** — pairs trading, statistical arbitrage, index replication + + +### Configuration + +Simulations are driven by a configuration dictionary containing all model parameters (arrival rates, volatility, spread settings, etc.). Rather than building these manually, the module reads configurations from a **CSV file**. + +A ready-to-use file `presets.csv` is included with five market scenarios (default, liquid, illiquid, volatile, jumpy). You can: + +- Use presets directly: ``cfg:cfgs`default`` +- Modify values for specific runs: ``cfg[`vol]:0.4`` +- Add new rows to define custom scenarios +- Create your own CSV following the same schema + +To see all available parameters and their descriptions: +```q +q)simtick.describe[] +``` + + +## Overview + +A KDB-X module for simulating realistic intraday trade and quote data. Features: + +- **Hawkes process** for trade arrivals (self-exciting, captures trade clustering) +- **GBM / Jump-diffusion** for price dynamics +- **Configurable intraday patterns** (U-shape or J-shape intensity) +- **Quote generation** with realistic bid-ask spreads +- **CSV-based presets** for different market scenarios + +## Installation + +1. Add this repository to your `QPATH`: +```bash +export QPATH=$QPATH:/path/to/kdbx-modules +``` + +2.
Load the module: +```q +q)simtick:use`di.simtick +``` + +## Usage + +### Basic usage +```q +q)simtick:use`di.simtick +q)cfgs:simtick.loadconfig`:di/simtick/presets.csv +q)cfg:cfgs`default +q)simtick.run[cfg] +time price qty +------------------------------------------ +2026.01.20D09:30:02.487640474 100 43 +2026.01.20D09:30:03.846514899 100.0011 32 +2026.01.20D09:30:04.444929571 100.0122 78 +... +``` + +### With quote generation +```q +q)cfg[`generatequotes]:1b +q)result:simtick.run[cfg] +q)result`trade +q)result`quote +``` + +## API + +| Function | Description | +|----------|-------------| +| `simtick.run[cfg]` | Full simulation; returns a trade table, or a dictionary of `trade` and `quote` tables when `generatequotes` is `1b` | +| `simtick.arrivals[cfg]` | Generate arrival times only (seconds from open) | +| `simtick.price[cfg;times]` | Generate prices for given arrival times (seconds from open) | +| `simtick.loadconfig[filepath]` | Load presets from CSV | +| `simtick.describe[]` | Return configuration schema as table | + +## Presets + +| Preset | Description | +|--------|-------------| +| `default` | Standard trading day | +| `liquid` | Higher arrival rate (`baseintensity` 2.0), stronger clustering, larger trades | +| `illiquid` | Lower arrival rate (`baseintensity` 0.1), weaker clustering, smaller trades | +| `volatile` | Higher price volatility (`vol` 0.4 vs 0.2) | +| `jumpy` | Jump-diffusion price model | + +## Configuration Parameters + +| Parameter | Description | Example | +|-----------|-------------|---------| +| `baseintensity` | Base arrival rate (trades/sec) | 0.5 | +| `alpha` | Hawkes excitation (0 = Poisson) | 0.3 | +| `beta` | Hawkes decay (must be > alpha) | 1.0 | +| `vol` | Annualized volatility | 0.2 | +| `drift` | Annualized drift | 0.05 | +| `transitionpoint` | Intraday shape (0.3=J, 0.5=U) | 0.3 | +| `pricemodel` | `gbm` or `jump` | `gbm` | +| `qtymodel` | `lognormal` or `constant` | `lognormal` | +| `avgqty` | Average trade size | 100 | +| `basespread` | Base bid-ask spread (fraction of price) | 0.001 | +| `generatequotes` | Generate quotes flag | 0b | +| `openmult` | Opening intensity multiplier | 1.5 | +| `midmult` | Midday intensity multiplier | 0.5 | +| `closemult` |
Closing intensity multiplier | 3.0 | + +## Testing + +```q +q)k4unit:use`di.k4unit +q)k4unit.moduletest`di.simtick +``` + +### Test Coverage + +| Group | Tests | Description | +|-------|-------|-------------| +| Validation | 3 | Bad configs throw correct errors (alpha >= beta, negative intensity, zero multipliers) | +| Arrivals | 5 | Output properties: non-empty, sorted, positive, within duration, correct type | +| Shape | 3 | Intraday pattern: open > mid, close > mid, J-shape verification | +| Price | 6 | Positive prices, startprice correct, realized vol within tolerance, jump model works | +| Trades | 8 | Correct schema, sorted times, positive prices/qty, integer qty, within session | +| Quotes | 8 | Correct schema, sorted times, bid < ask, positive sizes, quote before first trade | +| Config | 7 | Keyed table, correct column count, correct types (float, symbol, date) | +| Describe | 3 | Returns table, correct columns, correct parameter count | +| Constant Qty | 2 | All quantities equal, quantity equals avgqty | +| Reproducibility | 1 | Same seed produces same output | +| **Total** | **46** | | + +## Documentation + +The `docs/` folder contains: + +- **[IntradayTickSimulatorPaper.pdf](docs/IntradayTickSimulatorPaper.pdf)** — Technical paper detailing the mathematical foundations of this module (Hawkes process, GBM, jump-diffusion, quote generation) +- **[HawkesProcessesInFinance.pdf](docs/HawkesProcessesInFinance.pdf)** — Reference paper on Hawkes processes in finance (Bacry et al., 2015) + +## Notebooks + +An interactive **[example](notebooks/simtickDemo.ipynb)** using PyKX is available in `notebooks/`. 
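The notebook converts the simulated trades into a pandas DataFrame. One hedged way to sanity-check a run against its `vol` parameter is to compute annualized realized volatility from the trade prices. The sketch below is illustrative only: `annualized_realized_vol` and `simulate_gbm_session` are hypothetical helpers, not part of the module, and the annualization assumes the convention used by `price` in `init.q`, where one session spans 1/`tradingdays` of a year. Because the variance of GBM log returns scales with each interval's length, the sum of squared returns estimates vol² times elapsed time regardless of how irregularly the Hawkes arrivals are spaced.

```python
import math
import random
from typing import List, Sequence


def annualized_realized_vol(prices: Sequence[float], tradingdays: int = 252) -> float:
    """Annualized realized volatility of a single simulated session.

    Assumes the session covers 1/tradingdays of a year, so the sum of
    squared log returns estimates vol**2 / tradingdays.
    """
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    realized_var = sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))
    return math.sqrt(realized_var * tradingdays)


def simulate_gbm_session(n: int, vol: float, drift: float = 0.05, start: float = 100.0,
                         tradingdays: int = 252, seed: int = 42) -> List[float]:
    """Hypothetical stand-in for simtick output: n evenly spaced GBM steps over one session."""
    rng = random.Random(seed)
    dt = 1.0 / (tradingdays * n)  # length of each step in years
    prices = [start]
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        step = (drift - 0.5 * vol * vol) * dt + vol * math.sqrt(dt) * eps
        prices.append(prices[-1] * math.exp(step))
    return prices


if __name__ == "__main__":
    path = simulate_gbm_session(30_000, vol=0.2)
    print(f"realized vol: {annualized_realized_vol(path):.4f}")  # expected to land near 0.2
```

In the notebook, a call such as `annualized_realized_vol(trades["price"].tolist(), 252)` would apply the same check to real module output. Expect a slight downward bias there, since trades start after the open and stop before the close; with the `jumpy` preset the estimate also absorbs jump variance and will typically exceed `vol`.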
+ +### Setup + +```bash +cd di/simtick +python -m venv .venv +source .venv/bin/activate # Linux/Mac +pip install -r requirements.txt +jupyter lab +``` + +### Available Notebooks + +| Notebook | Description | +|----------|-------------| +| `simtickDemo.ipynb` | Load module, run simulation, visualize price and quantity | + +## Project Structure + +``` +di/simtick/ +├── init.q # Module code +├── presets.csv # Market scenario presets +├── test.csv # Unit tests (k4unit format) +├── README.md # This file +├── requirements.txt # Python dependencies +├── docs/ +│ ├── IntradayTickSimulatorPaper.pdf +│ └── HawkesProcessesInFinance.pdf +└── notebooks/ + └── simtickDemo.ipynb +``` + +## License + +MIT diff --git a/di/simtick/docs/HawkesProcessesInFinance.pdf b/di/simtick/docs/HawkesProcessesInFinance.pdf new file mode 100644 index 00000000..37edd4c9 Binary files /dev/null and b/di/simtick/docs/HawkesProcessesInFinance.pdf differ diff --git a/di/simtick/docs/IntradayTickSimulatorPaper.pdf b/di/simtick/docs/IntradayTickSimulatorPaper.pdf new file mode 100644 index 00000000..3dc6b152 Binary files /dev/null and b/di/simtick/docs/IntradayTickSimulatorPaper.pdf differ diff --git a/di/simtick/init.q b/di/simtick/init.q new file mode 100644 index 00000000..f8d62ed5 --- /dev/null +++ b/di/simtick/init.q @@ -0,0 +1,543 @@ +/ di.simtick - realistic intraday tick simulator + +/ Hawkes process: safety multiplier for lambda upper bound +/ ensures thinning algorithm acceptance rate stays reasonable +/ higher values = more conservative bound = slower but safer +excitebuffer:3 + +/ quote generation: maximum intermediate quote updates between trades +/ caps computation cost for large time gaps +maxquoteupdates:10 + +/ quote generation: random jitter range for initial quote offset (milliseconds) +/ adds realism by varying the pre-trade quote timing +initquotejitterms:100 + +/ price movement: fractional tick size for intermediate quote mid-price drift +/ controls how much the mid moves between 
trades (as fraction of price) +quoteticksize:0.0001 + +/ time unit conversions +nsperms:1000000 +nspersec:1000000000 + + +val.haskeys:{[cfg;reqkeys;fn] + / check config dictionary has all required keys + / cfg: configuration dictionary + / reqkeys: symbol list of required keys + / fn: function name string for error context + if[count missing:reqkeys where not reqkeys in key cfg; + '"(",fn,"): missing config keys - ",", " sv string missing]; + }; + +val.nonempty:{[x;name;fn] + / check list is non-empty + / x: list to check + / name: parameter name for error message + / fn: function name string for error context + if[not count x; '"(",fn,"): ",name," cannot be empty"]; + }; + +val.hascols:{[t;reqcols;fn] + / check table has required columns + / t: table to check + / reqcols: symbol list of required columns + / fn: function name string for error context + if[not all reqcols in cols t; + '"(",fn,"): table missing columns - ",", " sv string reqcols where not reqcols in cols t]; + }; + + +rng.boxmuller:{[n] + / Box-Muller transform for n standard normal random variates + / n: number of samples required + / returns: list of n standard normal floats + m:2*(n+1) div 2; / ensure even count + u:m?1.0; + u:2 0N#u; + r:sqrt -2f*log u 0; + theta:2f*acos[-1]*u 1; + n#(r*cos theta),r*sin theta + }; + +rng.normal:{[n;cfg] + / generate n standard normal random samples + / n: number of samples required + / cfg: config dict containing `rngmodel + / returns: list of n standard normal floats + model:cfg`rngmodel; + $[model=`pseudo; .z.m.rng.boxmuller[n]; + '"rng.normal: unknown rngmodel - ",string model] + }; + + +shape:{[cfg;progress] + / intraday intensity multiplier using cosine interpolation + / cfg: config dict with `openmult`midmult`closemult`transitionpoint + / progress: fraction of trading day elapsed (0 to 1) + / returns: intensity multiplier for current time + / + / transitionpoint controls when to switch from open->mid to mid->close + / 0.5 = symmetric (U-shape), 0.3 = 
asymmetric (J-shape) + openmult:cfg`openmult; + midmult:cfg`midmult; + closemult:cfg`closemult; + tp:cfg`transitionpoint; + $[progress=duration; :state,enlist[`done]!enlist 1b]; + + / decay excitation + excitation:state[`excitation]*exp neg beta*wait; + + / current intensity + progress:t%duration; + lambda0:baseintensity*.z.m.shape[cfg;progress]; + lambda:lambda0+excitation; + + / accept/reject + accept:(first 1?1.0)=close; '"arrivals: openingtime must be before closingtime"]; + duration:`long$(close-open)%nspersec; + + / upper bound for intensity (for thinning) + maxmult:cfg[`openmult]|cfg[`midmult]|cfg`closemult; + excitationbuffer:1+excitebuffer*alpha%beta; + lambdamax:baseintensity*maxmult*excitationbuffer; + + / params for step function + params:`duration`lambdamax`baseintensity`alpha`beta`cfg!( + duration;lambdamax;baseintensity;alpha;beta;cfg); + + / initial state + init:`t`excitation`times`done!(0f;0f;`float$();0b); + + / run until done + final:.z.m.hawkes.step[params]/[{not x`done};init]; + + final`times + }; + +gbm:{[s;r;eps;t] + / GBM single-step return factor + / s: annualized volatility (sigma) + / r: annualized drift (mu) + / eps: standard normal random variate + / t: time step in years + / returns: multiplicative return factor exp((r - 0.5*s^2)*t + s*sqrt(t)*eps) + exp (t*r-.5*s*s)+eps*s*sqrt t + }; + +pricegbm:{[cfg;dts] + / generate price path using geometric Brownian motion + / cfg: config dict with `startprice`vol`drift`rngmodel + / dts: list of time deltas in years (first element is time to first trade) + / returns: list of prices corresponding to each time point + eps:.z.m.rng.normal[-1+count dts;cfg]; + cfg[`startprice]*prds 1.0,.z.m.gbm[cfg`vol;cfg`drift;eps;1_ dts] + }; + +pricejump:{[cfg;dts] + / generate price path using Merton jump-diffusion model + / dS/S = μdt + σdW + J·dN where J is lognormal, N is Poisson + / cfg: config dict with `startprice`vol`drift`tradingdays`jumpintensity`jumpmean`jumpvol`rngmodel + / dts: list of time deltas in 
years + / returns: list of prices corresponding to each time point + n:-1+count dts; + stepdts:1_ dts; + + / diffusion component + eps:.z.m.rng.normal[n;cfg]; + diffusion:.z.m.gbm[cfg`vol;cfg`drift;eps;stepdts]; + + / jump component: Poisson arrivals with lognormal sizes + dtdays:stepdts*cfg`tradingdays; + hasjump:(n?1.0)<1-exp neg cfg[`jumpintensity]*dtdays; + epsj:.z.m.rng.normal[n;cfg]; + jumps:exp hasjump*(cfg[`jumpmean]+cfg[`jumpvol]*epsj); + + cfg[`startprice]*prds 1.0,diffusion*jumps + }; + +price:{[cfg;times] + / generate prices for given arrival times + / cfg: configuration dictionary + / times: list of arrival times in seconds from session start + / returns: list of prices corresponding to each arrival time + / + / Required config keys: + / openingtime, closingtime, tradingdays, pricemodel, startprice, vol, drift + / For jump model: jumpintensity, jumpmean, jumpvol + + / validate inputs + .z.m.val.nonempty[times;"times";"price"]; + if[any times<0; '"price: times must be non-negative"]; + + reqkeys:`openingtime`closingtime`tradingdays`pricemodel`startprice`vol`drift; + .z.m.val.haskeys[cfg;reqkeys;"price"]; + + / convert times to dt in years + open:`timespan$cfg`openingtime; + close:`timespan$cfg`closingtime; + secsperyear:cfg[`tradingdays]*`long$(close-open)%nspersec; + dts:deltas[times]%secsperyear; + + $[cfg[`pricemodel]=`jump; .z.m.pricejump[cfg;dts]; .z.m.pricegbm[cfg;dts]] + }; + +qty.constant:{[n;cfg] + / generate constant quantities + / n: number of quantities + / cfg: config dict with `avgqty + / returns: list of n identical quantities + n#cfg`avgqty + }; + +qty.lognormal:{[n;cfg] + / generate lognormal random quantities + / n: number of quantities + / cfg: config dict with `avgqty`qtyvol`rngmodel + / returns: list of n integer quantities (minimum 1) + avgqty:cfg`avgqty; + qtyvol:cfg`qtyvol; + mu:log[avgqty]-0.5*qtyvol*qtyvol; + eps:.z.m.rng.normal[n;cfg]; + `long$1|floor exp mu+qtyvol*eps + }; + +qty.gen:{[n;cfg] + / dispatch to appropriate quantity
generator + / n: number of quantities + / cfg: config dict with `qtymodel and model-specific params + / returns: list of n quantities + model:cfg`qtymodel; + $[model=`constant; .z.m.qty.constant[n;cfg]; + model=`lognormal; .z.m.qty.lognormal[n;cfg]; + '"qty.gen: unknown qtymodel - ",string model] + }; + +quote.generate:{[cfg;trades] + / generate quote updates for trades (fully vectorized) + / cfg: configuration dictionary + / trades: trade table with `time`price columns + / returns: quote table with `time`bid`ask`bidsize`asksize + + / validate inputs + .z.m.val.hascols[trades;`time`price;"quote.generate"]; + + n:count trades; + if[n=0; :([]time:`timestamp$();bid:`float$();ask:`float$();bidsize:`long$();asksize:`long$())]; + + tradetimes:trades`time; + tradeprices:trades`price; + + / parameters + basespread:cfg`basespread; + pretradeoffset:cfg`pretradeoffset; + quoteupdaterate:cfg`quoteupdaterate; + avgquotesize:cfg`avgquotesize; + + / === 1. initial quote (before first trade) === + initoffset:`timespan$`long$nsperms*pretradeoffset+first 1?initquotejitterms; + inittime:tradetimes[0]-initoffset; + initprice:tradeprices[0]; + initspread:basespread*initprice*cfg`spreadopenmult; + + / === 2. pre-trade quotes (one per trade, vectorized) === + / times: random offset before each trade + randoffsets:n?pretradeoffset; + pretimes:tradetimes-`timespan$`long$(pretradeoffset+randoffsets)*nsperms; + + / spreads based on time of day (vectorized) + prespreadmults:.z.m.quote.spreadmults[cfg;tradetimes]; + prespreads:basespread*tradeprices*prespreadmults; + prebids:tradeprices-prespreads%2; + preasks:tradeprices+prespreads%2; + + / sizes (vectorized) + prebidsizes:avgquotesize+`long$100*.z.m.rng.boxmuller[n]; + preasksizes:avgquotesize+`long$100*.z.m.rng.boxmuller[n]; + + / === 3. 
intermediate quotes (vectorized) === + / only if we have at least 2 trades + intresult:$[n>1; + .z.m.quote.intermediates[cfg;tradetimes;tradeprices;basespread;pretradeoffset;quoteupdaterate;avgquotesize]; + `times`bids`asks`bidsizes`asksizes!5#enlist`float$() + ]; + + / === 4. combine all quotes === + alltimes:(enlist inittime),intresult[`times],pretimes; + allbids:(enlist initprice-initspread%2),intresult[`bids],prebids; + allasks:(enlist initprice+initspread%2),intresult[`asks],preasks; + allbidsizes:(enlist avgquotesize),intresult[`bidsizes],prebidsizes; + allasksizes:(enlist avgquotesize),intresult[`asksizes],preasksizes; + + / build table, enforce minimum size of 1, sort by time + quotes:([]time:alltimes;bid:allbids;ask:allasks;bidsize:allbidsizes;asksize:allasksizes); + quotes:update bidsize:1|bidsize,asksize:1|asksize from quotes; + `time xasc quotes + }; + +quote.intermediates:{[cfg;tradetimes;tradeprices;basespread;pretradeoffset;quoteupdaterate;avgquotesize] + / generate all intermediate quotes across all gaps (fully vectorized) + / returns dict with `times`bids`asks`bidsizes`asksizes + n:count tradetimes; + empty:`times`bids`asks`bidsizes`asksizes!5#enlist`float$(); + + / gap times in ms between consecutive trades + prevtimes:tradetimes til n-1; + nexttimes:tradetimes 1+til n-1; + prevprices:tradeprices til n-1; + nextprices:tradeprices 1+til n-1; + gaps:`long$(nexttimes-prevtimes)%nsperms; + + / number of intermediate quotes per gap (capped) + nupdates:maxquoteupdates&`long$floor quoteupdaterate*gaps%1000; + + / filter gaps that are too short (need room for quotes before pretradeoffset) + mingap:2*pretradeoffset; + nupdates:nupdates*gaps>mingap; + + totint:sum nupdates; + if[totint=0; :empty]; + + / expand gap indices: create nupdates[i] copies of index i for each gap + / e.g., if nupdates=(0 2 0 3), gapidx=(1 1 3 3 3) + gapidx:raze {x#y}'[nupdates; til count nupdates]; + + / position within each gap (0, 1, 2, ... 
for each gap) + / e.g., if nupdates=(0 2 0 3), positions=(0 1 0 1 2) + positions:raze til each nupdates; + + / gap-specific values expanded to each intermediate quote + gapnupdates:nupdates gapidx; + gapprevtimes:prevtimes gapidx; + gapnexttimes:nexttimes gapidx; + gapprevprices:prevprices gapidx; + gapnextprices:nextprices gapidx; + + / times: evenly spaced within [prevtime, nexttime - pretradeoffset] + / parenthesize the gap: q evaluates right to left, so a-b-c would mean a-(b-c) + availdurations:(gapnexttimes-gapprevtimes)-`timespan$`long$pretradeoffset*nsperms; + fractions:(1+positions)%1+gapnupdates; + inttimes:gapprevtimes+`timespan$`long$fractions*`long$availdurations; + + / prices: interpolate from prev toward next trade price, plus noise + midprices:gapprevprices+fractions*(gapnextprices-gapprevprices); + noise:quoteticksize*midprices*.z.m.rng.boxmuller[totint]; + midprices+:noise; + + / spreads (vectorized across all intermediate quotes) + intspreadmults:.z.m.quote.spreadmults[cfg;inttimes]; + spreadvar:1+0.1*abs .z.m.rng.boxmuller[totint]; + intspreads:basespread*midprices*intspreadmults*spreadvar; + intbids:midprices-intspreads%2; + intasks:midprices+intspreads%2; + + / sizes + intbidsizes:avgquotesize+`long$100*.z.m.rng.boxmuller[totint]; + intasksizes:avgquotesize+`long$100*.z.m.rng.boxmuller[totint]; + + `times`bids`asks`bidsizes`asksizes!(inttimes;intbids;intasks;intbidsizes;intasksizes) + }; + +quote.spreadmults:{[cfg;times] + / spread multiplier based on time of day (vectorized) + / cfg: config dict with spread parameters + / times: list of timestamps + / returns: list of spread multipliers (wider at open/close, tighter at midday) + opentime:`timespan$cfg`openingtime; + closetime:`timespan$cfg`closingtime; + duration:closetime-opentime; + + / time of day as timespan + timeofday:times-`timestamp$`date$times; + + / progress through trading day (0 to 1) + progress:(timeofday-opentime)%duration; + progress:0f|progress&1f; + + / vectorized conditional: early part vs late part of day +
earlyvals:cfg[`spreadopenmult]+(cfg[`spreadmidmult]-cfg`spreadopenmult)*2*progress; + latevals:cfg[`spreadmidmult]+(cfg[`spreadclosemult]-cfg`spreadmidmult)*2*progress-0.5; + early:progress<0.5; + (early*earlyvals)+(not early)*latevals + }; + +validate:{[cfg] + / validate configuration dictionary for run + / cfg: configuration dictionary + / returns: cfg if valid, throws error otherwise + / + / Checks: + / - Hawkes stability: alpha < beta + / - Positive multipliers: openmult, midmult, closemult > 0 + / - Positive base intensity + / - Transitionpoint in valid range (prevents division by zero) + + / check Hawkes stability condition + if[cfg[`alpha]>=cfg`beta; '"validate: Hawkes unstable - alpha must be < beta"]; + / check multipliers positive + if[0>=min cfg`openmult`midmult`closemult; '"validate: multipliers must be positive"]; + / check base intensity + if[0>=cfg`baseintensity; '"validate: baseintensity must be positive"]; + / check transitionpoint bounds (prevents division by zero in shape function) + if[not cfg[`transitionpoint] within 0.01 0.99; + '"validate: transitionpoint must be between 0.01 and 0.99"]; + cfg + }; + + +run:{[cfg] + / main simulation entry point + / cfg: configuration dictionary (typically loaded via loadconfig) + / returns: trade table if generatequotes=0b, else dict with `trade`quote + / + / Example: + / cfg:first loadconfig`:presets.csv + / trades:run[cfg] + / cfg[`generatequotes]:1b + / result:run[cfg] / result`trade, result`quote + cfg:.z.m.validate[cfg]; + + / set seed for reproducibility + if[cfg[`seed]>0; system "S ",string cfg`seed]; + + / generate arrival times (seconds from open) + arrs:.z.m.arrivals[cfg]; + n:count arrs; + + if[n=0; + trades:([]time:`timestamp$();price:`float$();qty:`long$()); + :$[cfg`generatequotes; + `trade`quote!(trades;([]time:`timestamp$();bid:`float$();ask:`float$();bidsize:`long$();asksize:`long$())); + trades] + ]; + + / convert to timestamps + basetime:cfg[`tradingdate]+`timespan$cfg`openingtime; + 
times:basetime+`timespan$`long$arrs*nspersec; + + / generate prices + prices:.z.m.price[cfg;arrs]; + + / generate quantities + qtys:.z.m.qty.gen[n;cfg]; + + trades:([]time:times;price:prices;qty:qtys); + + / return trades only or dictionary with quotes + $[cfg`generatequotes; + `trade`quote!(trades;.z.m.quote.generate[cfg;trades]); + trades] + }; + +/ configuration schema: column name -> (type; description) +/ type codes: S=symbol, D=date, U=minute, F=float, J=long, B=boolean +schema:()!() +schema[`name]:("S";"preset name (key)") +schema[`tradingdate]:("D";"simulation date") +schema[`openingtime]:("U";"market open time") +schema[`closingtime]:("U";"market close time") +schema[`startprice]:("F";"initial price") +schema[`seed]:("J";"random seed (0 = no seed)") +schema[`rngmodel]:("S";"RNG model (`pseudo)") +schema[`drift]:("F";"annualized drift") +schema[`vol]:("F";"annualized volatility") +schema[`tradingdays]:("J";"trading days per year") +schema[`pricemodel]:("S";"price model (`gbm or `jump)") +schema[`jumpintensity]:("F";"jump arrival rate (jumps/day)") +schema[`jumpmean]:("F";"log jump mean") +schema[`jumpvol]:("F";"log jump volatility") +schema[`baseintensity]:("F";"base trade arrival rate (trades/sec)") +schema[`alpha]:("F";"Hawkes excitation parameter") +schema[`beta]:("F";"Hawkes decay parameter (must be > alpha)") +schema[`transitionpoint]:("F";"intraday shape parameter (0.3=J, 0.5=U)") +schema[`openmult]:("F";"intensity multiplier at open") +schema[`midmult]:("F";"intensity multiplier at midday") +schema[`closemult]:("F";"intensity multiplier at close") +schema[`qtymodel]:("S";"quantity model (`constant or `lognormal)") +schema[`avgqty]:("J";"average trade quantity") +schema[`qtyvol]:("F";"quantity volatility (for lognormal)") +schema[`generatequotes]:("B";"generate quotes flag") +schema[`basespread]:("F";"base bid-ask spread (fraction of price)") +schema[`spreadopenmult]:("F";"spread multiplier at open") +schema[`spreadmidmult]:("F";"spread multiplier at 
midday") +schema[`spreadclosemult]:("F";"spread multiplier at close") +schema[`pretradeoffset]:("J";"min ms before trade for quote") +schema[`quoteupdaterate]:("F";"quote updates per second") +schema[`avgquotesize]:("J";"average quote size") + +/ derive type string from schema +csvtypes:raze first each value schema + +loadconfig:{[filepath] + / load preset configurations from CSV file + / filepath: file handle to CSV (e.g., `:presets.csv) + / returns: keyed table with preset name as key + / + / Example: + / cfgs:loadconfig`:di/simtick/presets.csv + / cfg:cfgs`default + / run[cfg] + if[not -11h=type filepath; '"loadconfig: filepath must be a file handle"]; + 1!(.z.m.csvtypes;enlist csv) 0: filepath + }; + +describe:{[] + / return configuration schema as a table + / useful for documentation and introspection + / Example: + / simtick.describe[] + ([]param:key .z.m.schema;typ:first each value .z.m.schema;description:last each value .z.m.schema) + }; + +/ export public interface +export:([run;arrivals;price;loadconfig;describe]) diff --git a/di/simtick/notebooks/simtickDemo.ipynb b/di/simtick/notebooks/simtickDemo.ipynb new file mode 100644 index 00000000..b2a93b93 --- /dev/null +++ b/di/simtick/notebooks/simtickDemo.ipynb @@ -0,0 +1,304 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "257066ec-d9f0-4274-87ca-5e4d260eb658", + "metadata": {}, + "source": [ + "# di.simtick Demo\n", + "\n", + "This notebook demonstrates the `di.simtick` module — a realistic intraday tick data simulator for KDB-X." 
+ ] + }, + { + "cell_type": "markdown", + "id": "50b498eb-e2e1-4173-a4da-978e63f5ee9e", + "metadata": {}, + "source": [ + "## Import Libraries" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "59f2cf0b-f88f-47f8-9244-279040513e5a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Module loaded successfully!\n" + ] + } + ], + "source": [ + "import os\n", + "import pykx as kx\n", + "import matplotlib.pyplot as plt\n", + "import pandas as pd\n", + "\n", + "# Get paths\n", + "module_path = os.path.expanduser('~/kdbx-modules')\n", + "presets_path = f\"{module_path}/di/simtick/presets.csv\"\n", + "\n", + "# Set QPATH and load module\n", + "kx.q(f'setenv[`QPATH;\"{module_path}\"]')\n", + "kx.q('simtick:use`di.simtick')\n", + "kx.q(f'cfgs:simtick.loadconfig`$\":{presets_path}\"')\n", + "\n", + "print(\"Module loaded successfully!\")" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "cc3c8523-a2f8-4a67-8568-fad87a9afbc1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " name\n", + "0 default\n", + "1 liquid\n", + "2 illiquid\n", + "3 volatile\n", + "4 jumpy" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "kx.q('key cfgs').pd()" + ] + }, + { + "cell_type": "markdown", + "id": "71356987-cb0c-4c41-b058-42ca13316271", + "metadata": {}, + "source": [ + "## Select a scenario" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "6bf1641a-6eb8-4231-b219-1dd39d5d347f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Generated 30191 trades\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
" + ], + "text/plain": [ + " time price qty\n", + "0 2026-01-20 09:30:00.689317695 100.000000 60\n", + "1 2026-01-20 09:30:01.868664471 100.000824 79\n", + "2 2026-01-20 09:30:03.164244531 99.996334 60\n", + "3 2026-01-20 09:30:03.216028839 99.995734 81\n", + "4 2026-01-20 09:30:03.665811537 100.009424 118" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "scenario = 'default' # Change to: 'liquid', 'illiquid', 'volatile', 'jumpy'\n", + "\n", + "result = kx.q(f'simtick.run[cfgs`{scenario}]')\n", + "trades = result.pd()\n", + "\n", + "print(f\"Generated {len(trades)} trades\")\n", + "trades.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "e1b5e47c-a135-4bdd-86a2-2ed163799c12", + "metadata": { + "jupyter": { + "source_hidden": true + } + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# Cell: Plot price and quantity\n", + "fig, ax1 = plt.subplots(figsize=(12, 5))\n", + "\n", + "# Price on left axis\n", + "ax1.plot(trades['time'], trades['price'], 'b-', linewidth=0.5, label='Price')\n", + "ax1.set_xlabel('Time')\n", + "ax1.set_ylabel('Price', color='blue')\n", + "ax1.tick_params(axis='y', labelcolor='blue')\n", + "\n", + "# Quantity on right axis\n", + "ax2 = ax1.twinx()\n", + "ax2.bar(trades['time'], trades['qty'], width=0.0001, alpha=0.3, color='orange', label='Quantity')\n", + "ax2.set_ylabel('Quantity', color='orange')\n", + "ax2.tick_params(axis='y', labelcolor='orange')\n", + "\n", + "plt.title(f'Simulated Trades - {scenario}')\n", + "plt.tight_layout()\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0c2675a-f174-49d3-bbc0-daab09d6fb8c", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/di/simtick/presets.csv b/di/simtick/presets.csv new file mode 100644 index 00000000..ad2bc28c --- /dev/null +++ b/di/simtick/presets.csv @@ -0,0 +1,6 @@ +name,tradingdate,openingtime,closingtime,startprice,seed,rngmodel,drift,vol,tradingdays,pricemodel,jumpintensity,jumpmean,jumpvol,baseintensity,alpha,beta,transitionpoint,openmult,midmult,closemult,qtymodel,avgqty,qtyvol,generatequotes,basespread,spreadopenmult,spreadmidmult,spreadclosemult,pretradeoffset,quoteupdaterate,avgquotesize 
+default,2026.01.20,09:30,16:00,100.0,0,pseudo,0.05,0.2,252,gbm,0.0,-0.01,0.03,0.5,0.3,1.0,0.3,1.5,0.5,3.0,lognormal,100,0.8,0,0.001,1.5,1.0,2.0,50,2.0,500
+liquid,2026.01.20,09:30,16:00,100.0,0,pseudo,0.05,0.2,252,gbm,0.0,-0.01,0.03,2.0,0.5,1.0,0.3,1.5,0.5,3.0,lognormal,200,0.8,0,0.001,1.5,1.0,2.0,50,2.0,500
+illiquid,2026.01.20,09:30,16:00,100.0,0,pseudo,0.05,0.2,252,gbm,0.0,-0.01,0.03,0.1,0.2,1.0,0.3,1.5,0.5,3.0,lognormal,50,0.8,0,0.001,1.5,1.0,2.0,50,2.0,500
+volatile,2026.01.20,09:30,16:00,100.0,0,pseudo,0.05,0.4,252,gbm,0.0,-0.01,0.03,0.5,0.6,1.5,0.3,1.5,0.5,3.0,lognormal,100,0.8,0,0.001,1.5,1.0,2.0,50,2.0,500
+jumpy,2026.01.20,09:30,16:00,100.0,0,pseudo,0.05,0.2,252,jump,2.0,-0.005,0.02,0.5,0.3,1.0,0.3,1.5,0.5,3.0,lognormal,100,0.8,0,0.001,1.5,1.0,2.0,50,2.0,500
diff --git a/di/simtick/requirements.txt b/di/simtick/requirements.txt
new file mode 100644
index 00000000..893eae72
--- /dev/null
+++ b/di/simtick/requirements.txt
@@ -0,0 +1,5 @@
+pykx==4.0.0b4
+jupyterlab
+matplotlib
+pandas
+ipywidgets
diff --git a/di/simtick/test.csv b/di/simtick/test.csv
new file mode 100644
index 00000000..70addaf2
--- /dev/null
+++ b/di/simtick/test.csv
@@ -0,0 +1,103 @@
+action,ms,bytes,lang,code,repeat,comment
+before,0,0,q,simtick:use`di.simtick,1,,Initialize module
+before,0,0,q,cfgs:simtick.loadconfig`:di/simtick/presets.csv,1,,Load presets
+before,0,0,q,cfg:cfgs`default,1,,Get default config
+before,0,0,q,duration:23400,1,,Session duration in seconds (6.5 hours)
+comment,0,0,q,,1,,=== Validation Tests ===
+before,0,0,q,badcfg1:cfg,1,,Setup bad config for alpha >= beta
+before,0,0,q,badcfg1[`alpha]:2.0,1,,Set alpha > beta
+fail,0,0,q,simtick.run[badcfg1],1,,alpha >= beta throws error
+before,0,0,q,badcfg2:cfg,1,,Setup bad config for negative baseintensity
+before,0,0,q,badcfg2[`baseintensity]:-1.0,1,,Set negative baseintensity
+fail,0,0,q,simtick.run[badcfg2],1,,Negative baseintensity throws error
+before,0,0,q,badcfg3:cfg,1,,Setup bad config for zero multiplier
+before,0,0,q,badcfg3[`openmult]:0.0,1,,Set zero openmult
+fail,0,0,q,simtick.run[badcfg3],1,,Zero multiplier throws error
+comment,0,0,q,,1,,=== Arrivals Tests ===
+before,0,0,q,cfg[`seed]:42,1,,Set seed for reproducibility
+before,0,0,q,arr:simtick.arrivals[cfg],1,,Generate arrivals
+true,0,0,q,0duration-3600,1,,Count last hour arrivals
+true,0,0,q,firsthour>midhour,1,,Open activity > mid activity
+true,0,0,q,lasthour>midhour,1,,Close activity > mid activity
+true,0,0,q,lasthour>firsthour,1,,J-shape: close > open (transitionpoint 0.3)
+comment,0,0,q,,1,,=== Price Tests ===
+before,0,0,q,times:1+til 100,1,,Setup times
+before,0,0,q,prices:simtick.price[cfg;times],1,,Generate prices
+true,0,0,q,(count times)=count prices,1,,Price count matches input
+true,0,0,q,all 0=opentime,1,,All trades at or after open
+true,0,0,q,all trades[`time]<=closetime,1,,All trades at or before close
+comment,0,0,q,,1,,=== Quotes Tests ===
+before,0,0,q,cfg[`generatequotes]:1b,1,,Enable quotes
+before,0,0,q,result:simtick.run[cfg],1,,Generate trades and quotes
+before,0,0,q,trades:result`trade,1,,Extract trades
+before,0,0,q,quotes:result`quote,1,,Extract quotes
+true,0,0,q,99h=type result,1,,Result with quotes is dict
+true,0,0,q,`trade`quote~key result,1,,Dict has trade and quote keys
+true,0,0,q,`time`bid`ask`bidsize`asksize~cols quotes,1,,Quotes has correct columns
+true,0,0,q,(asc quotes`time)~quotes`time,1,,Quotes sorted by time
+true,0,0,q,all quotes[`bid]