What problem would this solve?
When a workflow runs multiple jobs in parallel and one of them fails, the failed job cannot be re-run until all the other parallel jobs have finished.
This causes unnecessary delays, especially in workflows with long-running parallel jobs.
Users should be able to re-run a failed job immediately and independently, without waiting for the remaining parallel jobs to complete.
Screenshots
- Screenshot 1: A failed parallel job where the Re-run buttons are not yet available because other parallel jobs are still running.
- Screenshot 2: The same workflow 7 minutes later, after all parallel jobs have finished, where the Re-run buttons finally appear.
This demonstrates that re-running a failed job is currently blocked until every parallel job completes.
What do you propose?
If a job fails in a parallel execution stage:
- The failed job should become available for re-run immediately.
- Other parallel jobs should continue running normally.
- The user should not need to wait for all parallel jobs to finish before retrying the failed job.
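To make the requested change concrete, here is a minimal sketch that contrasts the two eligibility rules. It is only an illustration of the behavior described above, not the product's actual implementation: the `Job` and `Status` types and both `can_rerun_*` functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    # Hypothetical model of a job in a parallel stage.
    name: str
    status: Status


def can_rerun_current(job, stage):
    """Behavior observed today: re-run unlocks only once no job in the stage is still running."""
    stage_finished = all(j.status is not Status.RUNNING for j in stage)
    return job.status is Status.FAILED and stage_finished


def can_rerun_proposed(job, stage):
    """Proposed behavior: a failed job is re-runnable right away, whatever its siblings are doing."""
    return job.status is Status.FAILED


# One failed job while two sibling jobs are still running.
stage = [
    Job("build", Status.RUNNING),
    Job("lint", Status.FAILED),
    Job("test", Status.RUNNING),
]
lint = stage[1]

print(can_rerun_current(lint, stage))   # False: blocked by running siblings
print(can_rerun_proposed(lint, stage))  # True: available immediately
```

Under the proposed rule, the running jobs are untouched. Only the check that gates the Re-run button for the failed job changes.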