
Conversation

@bcmeireles
Contributor

@bcmeireles bcmeireles commented Mar 14, 2025

Summary

Currently, when creating a reflection, dbt marks the job as done as soon as it sends the API call to Dremio. The new options introduced here allow users to continue only after the reflection is fully created on Dremio's side.

Description

  • Introduces a new config option, reflection_strategy, with three valid values: trigger, wait, and depend.
    • trigger is the default and keeps the old behavior
    • wait waits for the reflection to be live, up to a maximum of max_wait_time seconds (default 30), and then moves on
    • depend will not proceed until the reflection is live
  • New config options for reflections:
    • max_wait_time: maximum time in seconds that the wait strategy will wait for the reflection to be ready. Default: 30
    • check_interval: time in seconds between the API calls that check whether a reflection is ready. Default: 5

Test Results

Added two extra tests, one for trigger and one for wait.

Changelog

  • Added a summary of what this PR accomplishes to CHANGELOG.md

Related Issue

Suggested in #184

@bcmeireles bcmeireles added the enhancement New feature or request label Mar 14, 2025
@bcmeireles bcmeireles self-assigned this Mar 14, 2025
@bcmeireles bcmeireles linked an issue Mar 14, 2025 that may be closed by this pull request
@bcmeireles bcmeireles marked this pull request as ready for review March 14, 2025 18:06
@bcmeireles bcmeireles changed the title from "Add depend strategy for reflections" to "Add strategies to reflections" Mar 17, 2025
- `strategy`: `trigger`, `wait` or `depend`
- `max_wait_time`: time in seconds to wait for the reflection to be created before moving on to other models
- default: `30`
- `wait_interval`: time in seconds to wait between checks for the reflection to be created
Member

I think you updated this to be called check_interval instead

Contributor Author

Good call

computations: List[str], partition_by: List[str], partition_transform: List[str],
partition_method: str, distribute_by: List[str], localsort_by: List[str],
arrow_cache: bool) -> None:
arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Member

Should the connector check for constraints on the values provided here? For instance, a negative or zero value of check_interval might lead to errors or accidental rate limiting by Dremio. (See DC rate limits here)

Contributor Author

Will be adding a > 0 constraint to this
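
A minimal sketch of such a guard, assuming it runs before any API call is made (the helper name and the use of ValueError are illustrative, not the PR's actual implementation):

```python
def _validate_reflection_config(max_wait_time: int, check_interval: int) -> None:
    # Hypothetical guard: reject non-positive values up front so a bad
    # check_interval cannot turn the polling loop into a rate-limit problem.
    if max_wait_time <= 0:
        raise ValueError(f"max_wait_time must be > 0, got {max_wait_time}")
    if check_interval <= 0:
        raise ValueError(f"check_interval must be > 0, got {check_interval}")
```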

"name_reflection_from_alias.sql": name_reflection_from_alias_model,
"name_reflection_from_filename.sql": name_reflection_from_filename_model,
"wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection,
"trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection,
Member

I realize testing depend is not that easy, but there must be a way. Is it possible to create a reflection that will never finish materializing and then try to depend on it?

Contributor Author

The idea of depend is that it stays stuck at the job running that reflection until it is complete. Even if we could make a reflection that would never materialize, we'd just be stuck waiting for it forever.

Member

My idea would've been to assert that the function hasn't returned after a few seconds and then kill the thread. But I don't know how complex we want this to get.
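
A minimal sketch of that approach, assuming the depend logic is importable as a standalone callable (the function name here is hypothetical, and Python threads cannot actually be killed, only abandoned as daemons):

```python
import threading

def test_depend_strategy_keeps_waiting():
    # Hypothetical test: run the depend loop against a reflection that never
    # materializes and assert it is still blocked after a few seconds.
    worker = threading.Thread(
        target=wait_for_reflection_depend,  # hypothetical function under test
        args=("reflection-that-never-finishes",),
        daemon=True,  # daemon threads are abandoned at interpreter exit, not killed
    )
    worker.start()
    worker.join(timeout=3)
    assert worker.is_alive(), "depend returned before the reflection was ready"
```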

Contributor

Yes, I agree with Simon that we should have guarantees that it will end. Waiting without proof of a green result may be frustrating for customers.

Contributor Author

That would be the wait strategy where the customer would set the maximum time they'd want to wait. Should we remove depend in that case? Or have it act as wait but throw an error instead of just skipping?


def testWaitStrategyTimeoutReflection(self, project):
(results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"])
assert "did not become available within 1 seconds, skipping wait" in log_output
Member

Can we get a positive test case for the wait strategy as well?

Contributor Author

We could, but I was avoiding adding it: a reflection going through is already covered by all the other reflection tests, and it has the potential to become flaky if the reflection takes longer than expected to materialize.

Member

That's fair

Contributor

Without proper mocks, these tests could all eventually become flaky.
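
A minimal sketch of what a mocked version could look like, assuming the wait logic can be exercised with an injected client (the method name, status payloads, and the commented-out call are assumptions, not the adapter's actual API):

```python
from unittest import mock

def test_wait_strategy_returns_once_ready():
    # Hypothetical unit test: stub the status poll so the reflection reports
    # ready on the second check, removing any dependency on real Dremio timing.
    rest_client = mock.Mock()
    rest_client.get_reflection.side_effect = [
        {"status": {"combinedStatus": "REFRESHING"}},
        {"status": {"combinedStatus": "CAN_ACCELERATE"}},
    ]
    # wait_for_reflection(rest_client, "some-id", max_wait_time=30, check_interval=0)
    # would then be asserted to return without hitting the timeout.
```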

Contributor

@howareyouman howareyouman left a comment

We should polish the depend strategy somehow before the release (or at least create a ticket for it).


- Added 3 strategies for reflections: `trigger`, `wait` and `depend`
- `trigger` keeps the previous behavior for reflections, where the job to create them is triggered and then dbt-dremio moves on to other models
- `wait` will wait for the reflection to be created before moving on to other models, up to a `max_wait_time` timeout
Contributor

Could you please also add here what happens once max_wait_time is exceeded.

computations: List[str], partition_by: List[str], partition_transform: List[str],
partition_method: str, distribute_by: List[str], localsort_by: List[str],
arrow_cache: bool) -> None:
arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Contributor

Should we define the default values here or in the template? What do you think?

Contributor Author

They are being defined in the reflection materialization; reflection.sql has the following lines:

 {%- set max_wait_time = config.get('max_wait_time', validator=validation.any[int]) or 30 -%}
 {%- set check_interval = config.get('check_interval', validator=validation.any[int]) or 5 -%}

This is on par with how we do it in other situations.

rest_client.create_reflection(payload)
created_reflection = rest_client.create_reflection(payload)

if reflection_strategy == "wait":
Contributor

Let's use a match statement here, and put all the possible values in an enum somewhere.

Updated: I saw the validation in the Jinja; let's at least add a match here. If/elif seems like a bad design pattern.
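
For reference, a minimal sketch of how that suggestion could look (the enum, function name, and placement are hypothetical, and match requires Python 3.10+, so the oldest supported interpreter matters here):

```python
from enum import Enum

class ReflectionStrategy(str, Enum):
    TRIGGER = "trigger"
    WAIT = "wait"
    DEPEND = "depend"

def handle_created_reflection(strategy: str, reflection_id: str) -> None:
    # Hypothetical dispatch replacing the if/elif chain; ReflectionStrategy(strategy)
    # also rejects unknown values with a ValueError.
    match ReflectionStrategy(strategy):
        case ReflectionStrategy.TRIGGER:
            return  # fire and forget: the pre-existing behavior
        case ReflectionStrategy.WAIT:
            ...  # poll until ready or until max_wait_time elapses, then move on
        case ReflectionStrategy.DEPEND:
            ...  # poll until ready, with no timeout
```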

Contributor Author

Not exactly sure if I'm seeing what you're suggesting here?

elif reflection_strategy == "depend":
reflection_id = created_reflection["id"]

while True:
Contributor

Should we give some state of the reflection in this case: how much time is left, what the status is, whether everything is OK? A bare while True without any status updates will be painful for customers.

Contributor Author

Maybe we could show the status (combinedStatus) that comes back in the API response to our call. Docs here
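
A minimal sketch of what surfacing that status could look like in the depend loop, assuming the REST client exposes a reflection lookup (the method name, logger, and the exact "ready" value are assumptions based on the linked docs):

```python
import time

def wait_for_reflection_with_status(rest_client, reflection_id, check_interval, logger):
    # Hypothetical polling loop that logs combinedStatus on every check instead
    # of spinning silently inside `while True`.
    elapsed = 0
    while True:
        reflection = rest_client.get_reflection(reflection_id)  # assumed client method
        status = reflection.get("status", {}).get("combinedStatus", "UNKNOWN")
        if status == "CAN_ACCELERATE":  # assumed "ready" value
            logger.info(f"Reflection {reflection_id} became available after {elapsed}s")
            return
        logger.info(f"Reflection {reflection_id} status: {status} ({elapsed}s elapsed)")
        time.sleep(check_interval)
        elapsed += check_interval
```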

Contributor

We could ask our reflections team about it.

Contributor Author

I've reached out to them 👍

"name_reflection_from_alias.sql": name_reflection_from_alias_model,
"name_reflection_from_filename.sql": name_reflection_from_filename_model,
"wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection,
"trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with Simon that we should have guarantees that it will end. Waiting without a proof for a green result may be frustrating for customers.


def testWaitStrategyTimeoutReflection(self, project):
(results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"])
assert "did not become available within 1 seconds, skipping wait" in log_output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a proper mocks these tests could be all eventually flaky

@bcmeireles bcmeireles marked this pull request as draft March 21, 2025 10:26