Skip to content

[Bug]: msticpy KQL timespans are not implicity converted to timedeltas #829

@JPvRiel

Description

@JPvRiel

Describe the bug

KQL datetime types are implicity converted to a pandas/numpy datetime type but timespans are not converted and returned as strings.

To Reproduce

Setup:

import msticpy as mp


mp.init_notebook()
qp_sentinel = mp.QueryProvider('MSSentinel')
qp_sentinel.connect()

Issue:

q = '''
let TimeSpanSmall = 1ms;
print(TimeSpanSmall)
'''
df_timespan_small = qp_sentinel.exec_query(query=q)
print(df_timespan_small['print_0'].dtype)
print(type(df_timespan_small['print_0'].values[0]))

Output:

object
<class 'str'>

Expected behavior

timedelta64[ns]
<class 'numpy.timedelta64'>

Screenshots and/or Traceback

From a test notebook: msticpy_large_timespan_type_conversion_fails.zip

Image

Environment (please complete the following information):

  • Python Version: 3.11
  • OS: Ubuntu
  • Python environment: venv
  • MSTICPy Version: 2.15

Additional context

It's arguable if this should be classified as a feature request vs bug. However, given the behavior for datetime64, I argue that the lack of consistency and convenience makes it a bug.

Another problem is that for larger timespan values, pandas does support handling the string format for type conversion to timedeltas.

  • Smaller timespans (< 1 day) can be converted to pandas timedelata types.
  • Larger timestamps (>= 1 day) cannot be readily converted to pandas timedelta types without a helper function.

E.g. an "ValueError: expecting hh:mm:ss format, received: 1.00:00:00" exception occurs for a 1 day timespan.

Image

Demo in notebook: msticpy_large_timespan_type_conversion_fails.zip

Client wrapper function to help compensate.

from typing import Optional, Union
import re
import pandas as pd
import pprint


def parse_timespan(timespan: Optional[Union[str, pd.Timedelta]]) -> pd.Timedelta:
    """
    Parse a Kusto timespan-like formatted string into a pandas Timedelta object.

    - Timespans a day or larger are returned by Kusto as strings, e.g. "1.19:37:05.1697513", are not converted correctly into timedeltas which expects only hh:mm:ss formats of less than one day
    - Require at least "hh:mm:ss" to be present.
    - Return a ns resolution Timedelta object (match pandas default).
    """

    if isinstance(timespan, pd.Timedelta):
        return timespan

    # require at minium an "hh:mm:ss" format
    pattern = re.compile(r'(?P<sign>-?)(?:(?P<days>\d+)\.)?(?P<hours>\d+):(?P<minutes>\d+):(?P<seconds>\d+)(?P<frac_seconds>\.\d+)?')
    match = pattern.match(timespan)
    if match:
        match_groups = match.groupdict()
        sign = -1 if match_groups['sign'] == '-' else 1
        days = int(match_groups['days']) if match_groups['days'] else 0
        hours = int(match_groups['hours'])
        minutes = int(match_groups['minutes'])
        seconds = int(match_groups['seconds'])
        frac_seconds = float(match_groups['frac_seconds']) if match_groups['frac_seconds'] else 0
        nano_seconds = int(round(frac_seconds, ndigits=9) * 1e9)
        return sign * pd.Timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds, nanoseconds=nano_seconds)
    else:
        raise ValueError(f"Invalid timespan format: {timespan}")


# Tests
parse_timespan_tests = dict(
    pos_small_frac = '00:00:00.001',
    neg_small_frac = '-00:00:00.001',
    zero_frac = '00:00:00.000',
    pos_large = '1.00:00:00',
    neg_large = '-1.00:00:00',
    pos_large_frac_round = '3.23:59:59.99999999999',
    neg_large_frac_round = '-3.23:59:59.99999999999'
)
parse_timespan_tests = {k: (v, parse_timespan(v)) for k, v in parse_timespan_tests.items()}
pprint.pp(parse_timespan_tests)

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions