DAOS-17519 test: Automate dlck testing (basic/fault_injection) #17307
base: master
Test-tag: DlckBasicFaultTest DlckBasicTest Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
Ticket title is 'Basic dlck test: scan the DAOS system by running the dlck tool.'
Test-tag: DlckBasicTest DlckBasicFaultTest Skip-func-hw-test-medium: false Skip-func-hw-test-medium-md-on-ssd: false Skip-unit-test: true Skip-fault-injection-test: true Signed-off-by: rpadma2 <ravindran.padmanabhan@hpe.com>
| 'id': '131584', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
IMHO without a comment it is hard to understand why the interval value is as it is.
| 'interval': '2'}, | |
| 'interval': '2'}, # skip sys_db |
| 'id': '131584', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
Why did you decide to omit the max_faults parameter for a number of faults?
| 'interval': '2'}, | |
| 'interval': '2', | |
| 'max_faults': '1'}, |
| 'id': '131586', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '2'}, |
The same here. Please apply to all instances below.
| 'interval': '2'}, | |
| 'interval': '2'}, # skip sys_db |
| 'probability_x': '100', | ||
| 'probability_y': '100', |
As far as I know these are the defaults, so adding them to every record just bloats the code with no added value.
| 'probability_x': '100', | |
| 'probability_y': '100', |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS': { | ||
| 'id': '131590', | ||
| 'probability_x': '100', | ||
| 'probability_y': '100', | ||
| 'interval': '28', | ||
| 'max_faults': '1'}, |
This fault can be tuned to hit either the containers tree (interval = 28) or the gc tree (interval = 29), so it seems we need two records with distinct keys. I am sorry this was not obvious from the fault_injection_dlck.yaml file.
IMHO we also really need comments here explaining where 28 and 29 come from. Otherwise nobody will be able to reverse-engineer their meaning, and since these values are fine-tuned, it may be crucial to adjust them in the future.
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '28', | |
| 'max_faults': '1'}, | |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS_28': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '28', # containers tree fine-tuned | |
| 'max_faults': '1'}, | |
| 'DAOS_FAULT_BTREE_OPEN_INV_CLASS_29': { | |
| 'id': '131590', | |
| 'probability_x': '100', | |
| 'probability_y': '100', | |
| 'interval': '29', # gc tree fine-tuned | |
| 'max_faults': '1'}, |
| if self.server_managers[0].manager.job.using_control_metadata: | ||
| dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], nvme_conf=nvme_conf, | ||
| storage_mount=scm_mount, env_str=env_str) | ||
| else: | ||
| dlck_cmd = DlckCommand(host, self.bin, pool_uuids[0], storage_mount=scm_mount, | ||
| env_str=env_str) |
The same as above. There is no need to have an if just to provide nvme_conf=None for some cases. Please reduce.
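A minimal sketch of the reduction, assuming `DlckCommand` treats `nvme_conf=None` the same as omitting the argument (the comment above implies this, but the constructor is not shown here). `build_dlck_kwargs` is a hypothetical helper, not part of the DAOS test framework:

```python
def build_dlck_kwargs(using_control_metadata, nvme_conf, scm_mount, env_str):
    """Return keyword arguments for a single DlckCommand(...) call.

    Hypothetical helper: nvme_conf is passed through only when control
    metadata is in use; otherwise it is set to None.
    """
    return {
        "nvme_conf": nvme_conf if using_control_metadata else None,
        "storage_mount": scm_mount,
        "env_str": env_str,
    }
```

The call site could then be a single statement, for example `DlckCommand(host, self.bin, pool_uuids[0], **build_dlck_kwargs(...))`, with no `if`/`else` around it.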
| result = dlck_cmd.run() | ||
| if not result.passed: | ||
| errors.append(f"dlck failed on {result.failed_hosts}") | ||
| self.log.info("dlck basic test output:\n%s", result) |
As with dumping the contents of the fault injection file: you process and print the command result twice in this code. Please write a helper function.
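One possible shape for such a helper, as a sketch only. The name `run_and_record` and its parameters are hypothetical; it assumes `cmd.run()` returns an object with `passed` and `failed_hosts` attributes, as in the snippet under review:

```python
import logging


def run_and_record(cmd, errors, log, label="dlck"):
    """Run cmd, log its result once, and record a failure in errors.

    Hypothetical helper: cmd only needs a run() method whose result
    exposes `passed` and `failed_hosts`.
    """
    result = cmd.run()
    if not result.passed:
        errors.append(f"{label} failed on {result.failed_hosts}")
    # Lazy %-style formatting, matching the logging call in the diff.
    log.info("%s output:\n%s", label, result)
    return result
```

Both call sites could then shrink to `run_and_record(dlck_cmd, errors, self.log)`.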
| self.log.info("dlck basic test output:\n%s", result) | ||
| dmg.system_start() | ||
| if not errors: | ||
| self.fail("No Errors detected:\n{}".format("\n".join(errors))) |
It is a very elaborate way of printing an empty list. 😉
| self.fail("No Errors detected:\n{}".format("\n".join(errors))) | |
| self.fail("No Errors detected.") |
| 1: | ||
| targets: 4 | ||
| storage: auto |
The same here. A single engine should be enough.
| result = dlck_cmd.run() | ||
| if not result.passed: | ||
| errors.append(f"dlck failed on {result.failed_hosts}") | ||
| self.log.info("dlck basic test output:\n%s", result) |
| self.log.info("dlck basic test output:\n%s", result) | |
| self.log.info(f"dlck basic test output:\n{result}") |
Test-tag: DlckBasicFaultTest DlckBasicTest
Steps for the author:
After all prior steps are complete: