Add strategies to reflections #285
base: main
Conversation
- `strategy`: `trigger`, `wait` or `depend`
- `max_wait_time`: time in seconds to wait for the reflection to be created before moving on to other models
  - default: `30`
- `wait_interval`: time in seconds to wait between checks for the reflection to be created
I think you updated this to be called check_interval instead.
Good call
  computations: List[str], partition_by: List[str], partition_transform: List[str],
  partition_method: str, distribute_by: List[str], localsort_by: List[str],
- arrow_cache: bool) -> None:
+ arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should the connector check for constraints on the values provided here? For instance, a negative or 0 value of check_interval might lead to errors or accidental rate limiting by Dremio. (See DC rate limits here)
Will be adding a > 0 constraint to this
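For illustration, that constraint could look something like the following (a hypothetical sketch; the function name and where the connector actually performs the check are assumptions):

```python
# Hypothetical sketch of the "> 0" constraint discussed above; not the
# connector's actual validation code.
def validate_reflection_timings(max_wait_time: int, check_interval: int) -> None:
    if max_wait_time <= 0:
        raise ValueError(f"max_wait_time must be greater than 0, got {max_wait_time}")
    if check_interval <= 0:
        raise ValueError(f"check_interval must be greater than 0, got {check_interval}")
```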
| "name_reflection_from_alias.sql": name_reflection_from_alias_model, | ||
| "name_reflection_from_filename.sql": name_reflection_from_filename_model, | ||
| "wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection, | ||
| "trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection, |
I realize testing depend is not that easy, but there must be a way. Is it possible to create a reflection that will never finish materializing and then try to depend on it?
The idea of depend is that it will be stuck at the job running that reflection until it is complete. Even if we could make a reflection that would never materialize, we'd just be stuck waiting for it forever.
My idea would've been to assert that the function hasn't returned after a few seconds and then kill the thread. But I don't know how complex we want this to get.
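A minimal sketch of that idea (hypothetical; blocking_call stands in for the connector code that runs the depend strategy, and since Python threads can't be forcibly killed, a daemon thread is used so the test can still exit):

```python
import threading

def assert_depend_is_still_blocking(blocking_call, timeout_seconds: int = 3) -> None:
    # blocking_call is a stand-in for running the depend strategy against a
    # reflection that never finishes materializing.
    thread = threading.Thread(target=blocking_call, daemon=True)
    thread.start()
    thread.join(timeout_seconds)
    # If the thread is still alive after the timeout, the depend strategy is
    # (as intended) still waiting; the daemon thread is simply dropped when
    # the test process exits, since Python has no clean way to kill it.
    assert thread.is_alive(), "depend strategy returned before the reflection was ready"
```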
Yes, I agree with Simon that we should have guarantees that it will end. Waiting without proof of a green result may be frustrating for customers.
That would be the wait strategy where the customer would set the maximum time they'd want to wait. Should we remove depend in that case? Or have it act as wait but throw an error instead of just skipping?
def testWaitStrategyTimeoutReflection(self, project):
    (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"])
    assert "did not become available within 1 seconds, skipping wait" in log_output
Can we get a positive test case for the wait strategy as well?
We could, but I was avoiding adding it: a reflection going through successfully is already covered by the other reflection tests, and it has the potential to become flaky if materialization takes longer than expected.
That's fair
Without proper mocks these tests could all eventually become flaky.
howareyouman
left a comment
We should polish the depend strategy somehow before the release (or at least create a ticket for it).
- Added 3 strategies for reflections: `trigger`, `wait` and `depend`
  - `trigger` keeps the previous behavior for reflections, where the job to create them is triggered and then dbt-dremio moves on to other models
  - `wait` will wait for the reflection to be created before moving on to other models, up to a `max_wait_time` timeout
Could you please add here what will happen after max_wait_time is over?
  computations: List[str], partition_by: List[str], partition_transform: List[str],
  partition_method: str, distribute_by: List[str], localsort_by: List[str],
- arrow_cache: bool) -> None:
+ arrow_cache: bool, reflection_strategy: str, max_wait_time: int, check_interval: int) -> None:
Should we define the default values here or in the template? What do you think?
They are being defined in the reflection materialization; reflection.sql has the following lines:
{%- set max_wait_time = config.get('max_wait_time', validator=validation.any[int]) or 30 -%}
{%- set check_interval = config.get('check_interval', validator=validation.any[int]) or 5 -%}
This is on par with how we do it in other situations.
- rest_client.create_reflection(payload)
+ created_reflection = rest_client.create_reflection(payload)

  if reflection_strategy == "wait":
Let's use a match statement here, and put all the possible values in an enum somewhere.
Updated. I saw a validation in the Jinja; let's add a match here at least. If/elif seems like a bad design pattern.
Not exactly sure if I'm seeing what you're suggesting here?
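For reference, the match/enum suggestion could be read as something along these lines (a hypothetical sketch with illustrative names, not the connector's actual code; match requires Python 3.10+):

```python
from enum import Enum

class ReflectionStrategy(str, Enum):
    TRIGGER = "trigger"
    WAIT = "wait"
    DEPEND = "depend"

def apply_reflection_strategy(strategy: str, created_reflection: dict) -> None:
    # ReflectionStrategy(strategy) raises ValueError for any unknown value,
    # which centralizes validation of the allowed strategies.
    match ReflectionStrategy(strategy):
        case ReflectionStrategy.TRIGGER:
            pass  # fire and forget: the refresh job was already triggered
        case ReflectionStrategy.WAIT:
            pass  # poll until ready or until max_wait_time elapses, then move on
        case ReflectionStrategy.DEPEND:
            pass  # poll until ready, with no upper bound on the wait
```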
elif reflection_strategy == "depend":
    reflection_id = created_reflection["id"]

    while True:
Should we in this case give some state of the reflection: how much time is left? What is the status? Is everything OK?
Because a while True loop without any status update will be painful for customers.
Maybe we could show the status -> combinedStatus that is given by the API response to our call. Docs here
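A rough sketch of what that could look like (hypothetical; get_reflection and the "CAN_ACCELERATE" terminal state are assumptions based on the Dremio reflections API docs, not the connector's actual code):

```python
import time

def wait_for_reflection(rest_client, reflection_id: str, check_interval: int = 5, log=print) -> None:
    # Poll the reflection and report its combinedStatus on every check, so the
    # user sees progress instead of a silent wait.
    while True:
        reflection = rest_client.get_reflection(reflection_id)
        status = reflection.get("status", {}).get("combinedStatus")
        if status == "CAN_ACCELERATE":
            log(f"Reflection {reflection_id} is ready")
            return
        log(f"Reflection {reflection_id} status: {status}; checking again in {check_interval}s")
        time.sleep(check_interval)
```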
We could ask our reflections team about it.
I've reached out to them 👍
| "name_reflection_from_alias.sql": name_reflection_from_alias_model, | ||
| "name_reflection_from_filename.sql": name_reflection_from_filename_model, | ||
| "wait_strategy_timeout_reflection.sql": wait_strategy_timeout_reflection, | ||
| "trigger_strategy_timeout_reflection.sql": trigger_strategy_timeout_reflection, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with Simon that we should have guarantees that it will end. Waiting without a proof for a green result may be frustrating for customers.
|
|
||
| def testWaitStrategyTimeoutReflection(self, project): | ||
| (results, log_output) = run_dbt_and_capture(["run", "--select", "view1", "wait_strategy_timeout_reflection"]) | ||
| assert "did not become available within 1 seconds, skipping wait" in log_output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a proper mocks these tests could be all eventually flaky
Summary
Currently, when creating a reflection, dbt marks the job as done as soon as it sends the API call to Dremio. This new option allows users to continue only after the reflection is fully created on Dremio's side.
Description
- `reflection_strategy` with 3 valid values: `trigger`, `wait` and `depend`
  - `trigger` is the default and keeps the old behavior
  - `wait` will try to wait for the reflection to be live, up to a maximum of `max_wait_time` seconds, `30` being the default value
  - `depend` will not proceed until the reflection is live
- `max_wait_time`: Maximum time in seconds that the `wait` strategy will wait for the reflection to be ready. Default value is `30`
- `check_interval`: Time between the API calls that check if a reflection is ready. Default value is `5`
Test Results
Added 2 extra tests, one for `trigger` and another one for `wait`
Changelog
Related Issue
Suggested in #184