Description
What would you like to report?
Here is a list of requirements that (in my opinion) should be checked to ensure that the QuAcc/Parsl combo is production-ready.
- Use the full capabilities of the software and the HPC: run flexible parallel commands without additional complexity. For now, this is done by providing the `parsl_resource_specification` parameter in the job decorator. This gives access to the `PARSL_MPI_PREFIX` environment variable, which should be used to launch the job. This is critical because, without it, jobs rely entirely on the good faith of the task executor (mainly srun) to dispatch them correctly to the nodes. For example, this does not work on the MMM Young cluster, which uses `mpirun`.
- HPC time limits should not be a burden. Parsl checkpoints seem like a natural solution here: they allow users to keep results in the cache and to restart a workflow from where it stopped.
- It should be possible to request retries when a job fails. Ideally, there would be different options depending on the nature of the error, and it should be possible to customize the input parameters for retries (e.g. my meta-GGA calculation failed, so I lower the `mixing_beta`, etc.).
For the first point, this does not seem to work correctly at the moment: the environment variable is read in `settings.py`, which probably happens before the job is launched (#2411).
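To illustrate the suspected problem, here is a minimal sketch of the difference between reading an environment variable once when a settings object is created and reading it when the command is actually built. Everything except the `PARSL_MPI_PREFIX` name is an assumption made for illustration: `EagerSettings`, `build_command`, the `pw.x` executable and the `mpiexec -n 4` value are hypothetical and are not taken from the QuAcc or Parsl code.

```python
import os

ENV_VAR = "PARSL_MPI_PREFIX"


class EagerSettings:
    """Hypothetical settings object that reads the variable once, at creation."""

    def __init__(self) -> None:
        # Evaluated when the settings are built, i.e. possibly before any
        # job has been launched and before the variable has been set.
        self.parallel_prefix = os.environ.get(ENV_VAR, "")


def build_command(executable: str) -> str:
    """Hypothetical helper that reads the variable lazily, at launch time."""
    prefix = os.environ.get(ENV_VAR, "")
    return f"{prefix} {executable}".strip()


# Settings are created first, while the variable is still unset.
os.environ.pop(ENV_VAR, None)
settings = EagerSettings()

# Later, inside the task, the variable becomes available. The value below is
# a made-up stand-in, not what Parsl would necessarily export.
os.environ[ENV_VAR] = "mpiexec -n 4"

eager_command = f"{settings.parallel_prefix} pw.x".strip()
lazy_command = build_command("pw.x")

print(eager_command)  # the prefix is missing
print(lazy_command)   # the prefix is picked up
```

If this is indeed what happens, reading the variable inside the job (the lazy variant) would be one way to avoid the problem.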
For the second point, I think Parsl checkpointing currently does not work with QuAcc, as Parsl keeps complaining that `Atoms` objects are not hashable (to be checked).
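One possible direction, sketched below under stated assumptions, is to derive a deterministic byte key from the fields that define a structure. `FakeAtoms` is a hypothetical stand-in for an ASE `Atoms` object and `atoms_memo_key` is an illustrative helper; neither comes from QuAcc, ASE or Parsl. If I read the Parsl documentation correctly, its memoization module exposes an `id_for_memo` single-dispatch hook where such a function could be registered for a custom type, but I have not verified that this works with QuAcc.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FakeAtoms:
    """Hypothetical stand-in for an ASE Atoms object (not the real class)."""

    numbers: tuple
    positions: tuple
    cell: tuple
    pbc: tuple


def _clean(value: float) -> float:
    # Round to absorb floating-point noise; adding 0.0 turns -0.0 into 0.0.
    return round(float(value), 8) + 0.0


def atoms_memo_key(atoms: FakeAtoms) -> bytes:
    """Return deterministic bytes that identify the structure."""
    payload = (
        tuple(int(n) for n in atoms.numbers),
        tuple(tuple(_clean(x) for x in pos) for pos in atoms.positions),
        tuple(tuple(_clean(x) for x in vec) for vec in atoms.cell),
        tuple(bool(p) for p in atoms.pbc),
    )
    return hashlib.sha256(repr(payload).encode()).digest()


cell = ((10, 0, 0), (0, 10, 0), (0, 0, 10))
h2_a = FakeAtoms((1, 1), ((0, 0, 0), (0, 0, 0.74)), cell, (False,) * 3)
h2_b = FakeAtoms((1, 1), ((0.0, 0.0, 0.0), (0.0, 0.0, 0.74)), cell, (False,) * 3)
h2_stretched = FakeAtoms((1, 1), ((0, 0, 0), (0, 0, 0.80)), cell, (False,) * 3)

key_a = atoms_memo_key(h2_a)
key_b = atoms_memo_key(h2_b)
key_stretched = atoms_memo_key(h2_stretched)
```

Identical structures give the same key and a modified structure gives a different one, which is the property a checkpoint lookup needs. A real implementation would also have to decide whether calculator settings, constraints or other attached information belong in the key.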
For the third point, Parsl's retry handling does not allow new parameters to be specified. This makes it a poor fit for computational chemistry in general, since most users have little interest in blindly repeating the same calculation. Some Parsl team members suggested that, conceptually, anyone who wants to change the parameters should launch a new job instead. This approach will definitely be possible with built-in try-except statements after (#2410), although it is somewhat limited.
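As a rough sketch of the "launch a new job with different parameters" idea, the try-except pattern could look like the following. This is plain Python and uses neither the Parsl nor the QuAcc API. `run_with_adjustments` and `fake_scf` are hypothetical names, and the convergence threshold is invented for the demonstration.

```python
from typing import Any, Callable, Iterable


def run_with_adjustments(
    job: Callable[..., Any],
    base_params: dict,
    adjustments: Iterable[dict],
) -> tuple:
    """Run `job`, retrying with adjusted parameters after each failure.

    The first attempt uses `base_params` unchanged. Each later attempt applies
    one set of overrides on top of the base parameters (not cumulatively).
    """
    last_error = None
    for overrides in [{}, *adjustments]:
        params = {**base_params, **overrides}
        try:
            return job(**params), params
        except RuntimeError as err:
            last_error = err
    raise RuntimeError("all attempts failed") from last_error


def fake_scf(mixing_beta: float) -> str:
    """Hypothetical calculation that only converges for a small mixing_beta."""
    if mixing_beta > 0.3:
        raise RuntimeError("SCF did not converge")
    return f"converged with mixing_beta={mixing_beta}"


result, used_params = run_with_adjustments(
    fake_scf,
    {"mixing_beta": 0.7},
    [{"mixing_beta": 0.5}, {"mixing_beta": 0.3}],
)
print(result)
```

Here every attempt is a fresh call with its own parameters. The limitation is that the list of adjustments has to be written in advance and does not depend on the kind of error that occurred.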
Additionally, when reading the QuAcc documentation, it is a little confusing which functionality is actually usable with QuAcc.
I will work on this when I have more time to do what I like (end of September 😅), if I still have access to some HPCs.