Skip to content

Conversation

@khustup2
Copy link
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

Copilot AI review requested due to automatic review settings January 19, 2026 19:34
@sonarqubecloud
Copy link

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the row count reporting in pg-deeplake by correcting deeplake_relation_size to return the actual physical storage size (1 block) instead of incorrectly converting total dataset bytes to blocks. This ensures PostgreSQL's ANALYZE command properly updates pg_class.reltuples statistics, which customer tooling relies on.

Changes:

  • Modified deeplake_relation_size to return exactly BLCKSZ (1 block) matching the physical storage created in relation_set_new_node
  • Updated references from num_rows() to num_total_rows() throughout the codebase to ensure accurate row counts including uncommitted/staged data
  • Enhanced deeplake_scan_analyze_next_block to properly consume blocks from the read stream, ensuring PostgreSQL's BlockSampler correctly tracks sampled blocks

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
postgres/tests/py_tests/test_statistics_integration.py Added comprehensive tests validating pg_class.reltuples updates correctly after ANALYZE for both regular and small tables
cpp/deeplake_pg/timing_guard.hpp New header file defining a timing guard utility class for performance instrumentation
cpp/deeplake_pg/timing_guard.cpp Implementation of timing guard utility for logging operation durations
cpp/deeplake_pg/table_scan_impl.hpp Updated to use num_total_rows() and removed empty loop
cpp/deeplake_pg/table_scan.hpp Added get_num_rows() getter method
cpp/deeplake_pg/table_data_impl.hpp Enhanced row count tracking with comments explaining refresh behavior and delete handling
cpp/deeplake_pg/table_am.cpp Core fix: corrected deeplake_relation_size, updated ANALYZE block handling, and added physical storage creation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

get_session_start_time();

std::ostringstream filename;
filename << "/home/ubuntu/deeplake_timing_" << std::this_thread::get_id() << ".log";
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded path /home/ubuntu/ is not portable. Consider using an environment variable, a configurable path, or a system temp directory to ensure the code works across different environments and users.

Copilot uses AI. Check for mistakes.

uint64_t deeplake_relation_size(Relation rel, ForkNumber)
uint64_t deeplake_relation_size(Relation rel, ForkNumber fork)
{
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter fork is declared but never used in the function body. If it's required by the function signature/interface, consider adding a comment explaining why it's unused or use a cast to void to indicate intentional non-use.

Suggested change
{
{
/* ForkNumber is part of the table AM API but is not used by Deeplake. */
(void)fork;

Copilot uses AI. Check for mistakes.
@khustup2 khustup2 merged commit c502c44 into main Jan 19, 2026
6 checks passed
@khustup2 khustup2 deleted the fix-stats branch January 19, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants