Add autoscaling policy to add an instance when load is high #1317

@KlaasH

Description

We have an autoscaling policy that reduces the instance count when the site is not heavily loaded, but none that increases capacity when it is.
This is causing downtime. There is always some worker churn within the app tasks, and as load rises, so does the chance that all 4 workers are down at the same time. When that happens, the health check fails and ECS replaces the task. But since we only have one EC2 instance, ECS can't start a new task before stopping the old one, so the service is down entirely while it makes the switch. We also seem to be waiting on some sort of timeout or cooldown, because the switch usually takes a little more than an hour, which is longer than it should.
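
To make the placement problem concrete, here is a rough sketch of the capacity check involved. It is a simplified model, not how ECS actually schedules: the function name and the `tasks_per_instance=1` default are assumptions for illustration (for example, one app task per instance because of a fixed host port or memory reservation).

```python
def can_place_replacement(instance_count: int, running_tasks: int,
                          tasks_per_instance: int = 1) -> bool:
    """Return True if a new task could start while the old ones keep running.

    Hypothetical model: each instance has room for `tasks_per_instance`
    app tasks (assumed to be 1 here, which is not stated in the issue).
    """
    free_slots = instance_count * tasks_per_instance - running_tasks
    return free_slots > 0


# One instance, one running task: no room, so the old task must stop first.
print(can_place_replacement(instance_count=1, running_tasks=1))
# Two instances, one running task: the replacement can start alongside it.
print(can_place_replacement(instance_count=2, running_tasks=1))
```

Under this model, a second instance is what lets the replacement task start before the failing one is stopped.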

Increasing the instance count when the load is high, and keeping the desired task count in ECS permanently high, should mean that a new instance comes up and, hopefully, a new task is running on it by the time the existing task gets killed for failing its health check. It might also reduce the chance of a health check failure in the first place by taking some of the load. Either way, it seems worth doing.
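
As a starting point, a scale-out policy could look something like the sketch below. It only builds the parameter dictionaries for the boto3 `put_scaling_policy` and `put_metric_alarm` calls. The group name, the 70% CPU threshold, the periods, and the cooldown are all placeholder assumptions rather than values from our setup, and the project's actual infrastructure tooling may call for a different form.

```python
def scale_out_policy(asg_name: str, cooldown_seconds: int = 300) -> dict:
    """Parameters for autoscaling.put_scaling_policy: add one instance."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-scale-out",
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,          # add a single instance per trigger
        "Cooldown": cooldown_seconds,    # assumed value, tune as needed
    }


def high_load_alarm(asg_name: str, policy_arn: str,
                    cpu_threshold: float = 70.0) -> dict:
    """Parameters for cloudwatch.put_metric_alarm that triggers the policy."""
    return {
        "AlarmName": f"{asg_name}-high-cpu",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,                   # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,          # two periods in a row above threshold
        "Threshold": cpu_threshold,      # placeholder threshold
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }


# Usage with boto3 (not run here; "example-asg" is a hypothetical group name):
#   import boto3
#   resp = boto3.client("autoscaling").put_scaling_policy(
#       **scale_out_policy("example-asg"))
#   boto3.client("cloudwatch").put_metric_alarm(
#       **high_load_alarm("example-asg", resp["PolicyARN"]))
```

For the policy to have any effect, the Auto Scaling group's maximum size would also need to allow more than one instance.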
