-
Notifications
You must be signed in to change notification settings - Fork 340
Description
Describe the bug
KQL datetime types are implicity converted to a pandas/numpy datetime type but timespans are not converted and returned as strings.
To Reproduce
Setup:
import msticpy as mp
mp.init_notebook()
qp_sentinel = mp.QueryProvider('MSSentinel')
qp_sentinel.connect()Issue:
q = '''
let TimeSpanSmall = 1ms;
print(TimeSpanSmall)
'''
df_timespan_small = qp_sentinel.exec_query(query=q)
print(df_timespan_small['print_0'].dtype)
print(type(df_timespan_small['print_0'].values[0]))Output:
object
<class 'str'>Expected behavior
timedelta64[ns]
<class 'numpy.timedelta64'>Screenshots and/or Traceback
From a test notebook: msticpy_large_timespan_type_conversion_fails.zip
Environment (please complete the following information):
- Python Version: 3.11
- OS: Ubuntu
- Python environment: venv
- MSTICPy Version: 2.15
Additional context
It's arguable if this should be classified as a feature request vs bug. However, given the behavior for datetime64, I argue that the lack of consistency and convenience makes it a bug.
Another problem is that for larger timespan values, pandas does support handling the string format for type conversion to timedeltas.
- Smaller timespans (< 1 day) can be converted to pandas timedelata types.
- Larger timestamps (>= 1 day) cannot be readily converted to pandas timedelta types without a helper function.
E.g. an "ValueError: expecting hh:mm:ss format, received: 1.00:00:00" exception occurs for a 1 day timespan.
Demo in notebook: msticpy_large_timespan_type_conversion_fails.zip
Client wrapper function to help compensate.
from typing import Optional, Union
import re
import pandas as pd
import pprint
def parse_timespan(timespan: Optional[Union[str, pd.Timedelta]]) -> pd.Timedelta:
"""
Parse a Kusto timespan-like formatted string into a pandas Timedelta object.
- Timespans a day or larger are returned by Kusto as strings, e.g. "1.19:37:05.1697513", are not converted correctly into timedeltas which expects only hh:mm:ss formats of less than one day
- Require at least "hh:mm:ss" to be present.
- Return a ns resolution Timedelta object (match pandas default).
"""
if isinstance(timespan, pd.Timedelta):
return timespan
# require at minium an "hh:mm:ss" format
pattern = re.compile(r'(?P<sign>-?)(?:(?P<days>\d+)\.)?(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)(?P<frac_seconds>\.\d+)?')
match = pattern.match(timespan)
if match:
match_groups = match.groupdict()
sign = -1 if match_groups['sign'] == '-' else 1
days = int(match_groups['days']) if match_groups['days'] else 0
hours = int(match_groups['hours'])
minutes = int(match_groups['minutes'])
seconds = int(match_groups['seconds'])
frac_seconds = float(match_groups['frac_seconds']) if match_groups['frac_seconds'] else 0
nano_seconds = int(round(frac_seconds, ndigits=9) * 1e9)
return sign * pd.Timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds, nanoseconds=nano_seconds)
else:
raise ValueError(f"Invalid timespan format: {timespan}")
# Tests
parse_timespan_tests = dict(
pos_small_frac = '00:00:00.001',
neg_small_frac = '-00:00:00.001',
zero_frac = '00:00:00.000',
pos_large = '1.00:00:00',
neg_large = '-1.00:00:00',
pos_large_frac_round = '3.23:59:59.99999999999',
neg_large_frac_round = '-3.23:59:59.99999999999'
)
parse_timespan_tests = {k: (v, parse_timespan(v)) for k, v in parse_timespan_tests.items()}
pprint.pp(parse_timespan_tests)
