Update README to reflect project vision #71

andygrove · 2025-03-02T16:10:49Z

This PR updates the README to better explain the features that we plan to support.

Rendered version: https://github.com/andygrove/datafusion-ray/blob/new-readme/README.md

robtandy · 2025-03-02T16:24:36Z

A few suggestions:

Can we rename Greedy to Streaming? I think it's more communicative about how it functions, and it conveys its major feature in the name.

For the code snippet, I think it should read:

import ray
from datafusion_ray import DFRayContext

ray.init()
session = DFRayContext()
df = session.sql("SELECT * FROM my_table WHERE value > 100")
df.show()

I'm not sure about the trade-offs as written. I think it's possible that, depending on the query, a batch mode could be faster than a streaming mode for smaller queries due to less overhead. We'll have to implement the batch mode to define this more clearly.
We should indicate that batch mode is planned for 0.2.0 and that 0.1.0 will include Streaming only.

robtandy · 2025-03-02T16:29:30Z

For the code snippet, I forgot to include ray.init(runtime_env=df_ray_runtime_env).

We should have, I think,

import ray
from datafusion_ray import DFRayContext, df_ray_runtime_env

ray.init(runtime_env=df_ray_runtime_env)
session = DFRayContext()
df = session.sql("SELECT * FROM my_table WHERE value > 100")
df.show()

This is because df_ray_runtime_env is necessary to set up logging correctly in Ray workers.
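If someone needs their own environment variables on the workers too, something like the following might work. This is only a sketch: it assumes df_ray_runtime_env is a plain dict of the kind Ray accepts for runtime_env (with an optional "env_vars" entry), which this thread does not confirm. The helper name merge_runtime_env and the variable names are made up for illustration, and base_env is a stand-in so the example runs without Ray installed.

```python
from copy import deepcopy


def merge_runtime_env(base: dict, extra_env_vars: dict) -> dict:
    """Return a copy of `base` with `extra_env_vars` added under "env_vars".

    Hypothetical helper: `base` is left unmodified, and keys in
    `extra_env_vars` override same-named keys already in `base`.
    """
    merged = deepcopy(base)
    env_vars = dict(merged.get("env_vars", {}))
    env_vars.update(extra_env_vars)
    merged["env_vars"] = env_vars
    return merged


# Stand-in for df_ray_runtime_env; its real contents are an assumption here.
base_env = {"env_vars": {"EXAMPLE_LOG_LEVEL": "INFO"}}

runtime_env = merge_runtime_env(base_env, {"MY_APP_SETTING": "1"})
print(runtime_env)
```

The merged dict would then be passed as ray.init(runtime_env=runtime_env) in place of df_ray_runtime_env, so whatever the base environment sets up is kept.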

andygrove · 2025-03-02T16:30:21Z

I made the following updates:

* Use Streaming and Batch terminology
* Added note that batch is not implemented yet, with link to the tracking issue
* Updated code example

Do you have specific suggestions for updating the trade-offs?

andygrove added 3 commits March 2, 2025 09:10

Update README

09e22b3

update

69ee301

update

fbeb8d1

andygrove added 3 commits March 2, 2025 09:25

use terms batch vs streaming

322a6a3

update code sample

b131d5a

add note that batch execution is not yet implemented

da23593

andygrove added 2 commits March 2, 2025 09:36

remove trade offs

aab64fd

address feedback

a7ddca5

robtandy approved these changes Mar 2, 2025

andygrove merged commit 8e1a56a into apache:main Mar 2, 2025
1 check passed

andygrove deleted the new-readme branch March 2, 2025 20:21
