Skip to content

Conversation

@rpadma2
Copy link
Contributor

@rpadma2 rpadma2 commented Dec 23, 2025

Test-tag: DlckBasicFaultTest DlckBasicTest

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Test-tag: DlckBasicFaultTest DlckBasicTest

Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
@github-actions
Copy link

Ticket title is 'Basic dlck test: scan the DAOS system by running the dlck tool.'
Status is 'In Progress'
Labels: 'test_2.8,testp2'
https://daosio.atlassian.net/browse/DAOS-17519

Test-tag: DlckBasicTest DlckBasicFaultTest
Skip-func-hw-test-medium: false
Skip-func-hw-test-medium-md-on-ssd: false
Skip-unit-test: true
Skip-fault-injection-test: true

Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
'id': '131584',
'probability_x': '100',
'probability_y': '100',
'interval': '2'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO without a comment it is hard to understand why the interval value is as it is.

Suggested change
'interval': '2'},
'interval': '2'}, # skip sys_db

'id': '131584',
'probability_x': '100',
'probability_y': '100',
'interval': '2'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you decided to skip the max_faults parameter for a number of faults?

Suggested change
'interval': '2'},
'interval': '2',
'max_faults': '1'},

'id': '131586',
'probability_x': '100',
'probability_y': '100',
'interval': '2'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here. Please apply to all instances below.

Suggested change
'interval': '2'},
'interval': '2'}, # skip sys_db

Comment on lines +241 to +242
'probability_x': '100',
'probability_y': '100',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know these are defaults so adding them to all of the records just bloats the code with no added value.

Suggested change
'probability_x': '100',
'probability_y': '100',

Comment on lines +304 to +309
'DAOS_FAULT_BTREE_OPEN_INV_CLASS': {
'id': '131590',
'probability_x': '100',
'probability_y': '100',
'interval': '28',
'max_faults': '1'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fault can be fine-tuned to hit either the containers tree (interval = 28) or the gc tree (interval = 29). Hence it seems we need two records with distinctive keys. I am sorry it was not obvious from the fault_injection_dlck.yaml file.

And IMHO we really need to give comments here explaining where 28 and 29 come from. Otherwise you won't reverse-engineer their meaning and considering these values are fine-tuned it could be crucial to adjust them in the future.

Suggested change
'DAOS_FAULT_BTREE_OPEN_INV_CLASS': {
'id': '131590',
'probability_x': '100',
'probability_y': '100',
'interval': '28',
'max_faults': '1'},
'DAOS_FAULT_BTREE_OPEN_INV_CLASS_28': {
'id': '131590',
'probability_x': '100',
'probability_y': '100',
'interval': '28', # containers tree fine-tuned
'max_faults': '1'},
'DAOS_FAULT_BTREE_OPEN_INV_CLASS_29': {
'id': '131590',
'probability_x': '100',
'probability_y': '100',
'interval': '29', # gc tree fine-tuned
'max_faults': '1'},

Comment on lines +82 to +87
if self.server_managers[0].manager.job.using_control_metadata:
dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], nvme_conf=nvme_conf,
storage_mount=scm_mount, env_str=env_str)
else:
dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], storage_mount=scm_mount,
env_str=env_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above. There is no need to have an if just to provide nvme_conf=None for some cases. Please reduce.

Comment on lines +88 to +91
result = dlck_cmd.run()
if not result.passed:
errors.append(f"dlck failed on {result.failed_hosts}")
self.log.info("dlck basic test output:\n%s", result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as with dumping the contents of fault injection file you are processing and printing the command result twice in this code. Please write a helper function.

self.log.info("dlck basic test output:\n%s", result)
dmg.system_start()
if not errors:
self.fail("No Errors detected:\n{}".format("\n".join(errors)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a very elaborate way of printing an empty list. 😉

Suggested change
self.fail("No Errors detected:\n{}".format("\n".join(errors)))
self.fail("No Errors detected.")

Comment on lines +14 to +16
1:
targets: 4
storage: auto
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here. A single engine should be enough.

result = dlck_cmd.run()
if not result.passed:
errors.append(f"dlck failed on {result.failed_hosts}")
self.log.info("dlck basic test output:\n%s", result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.log.info("dlck basic test output:\n%s", result)
self.log.info(f"dlck basic test output:\n{result}")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants