DR Process
- decide which stack to launch and in which region. to restore in any region other than us-east-1, we would first need to enable the region within control tower
- copy the primary stack definition, change the region value, and remove the cluster attribute. we generally do not want to deploy multiple clusters within a dr region, so do not append date string attributes to eks and helm resources
- use leapp to establish a terminal session to the Palolo root account
- execute the drup workflow
time atmos workflow up -f provision.yaml -s palolo-labs-us-east-1
real 55m47.187s
user 12m18.104s
sys 0m53.208s
- connect dr vpc to network hub
time atmos terraform apply vpc-peering -s palolo-network-us-west-2
real 5m3.527s
user 7m39.813s
sys 0m56.083s
- use leapp to establish a terminal session to Product deployment account (labs in this case)
- use klogin to authenticate to new cluster
klogin us-east-1
- restore database
time dbdr
real 4m45s
- apply helm dependencies to cluster
time atmos workflow helm -f provision.yaml -s palolo-labs-us-east-1
real 4m40.589s
user 3m13.503s
sys 0m10.705s
- check for or create the overrides file for the env/region
- add the new cluster name to the deployment workflow(s)
- use github actions to perform deployment to dr cluster
real 10m 0s
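the command sequence above can be collected into a small dry-run script, sketched below. the commands and stack names are the ones recorded in these notes; the variable names, the run and manual helpers, and the DRY_RUN switch are assumptions made for illustration, not existing tooling. it also assumes klogin and dbdr are callable from a script (if they are shell aliases they would need to become functions or executables) and that a leapp session change is picked up by the same terminal; otherwise the script would have to be split at the MANUAL steps.

```shell
#!/usr/bin/env bash
# sketch of the dr process as a dry-run script.
# commands come from the notes; DR_STACK, NETWORK_STACK, DR_REGION, DRY_RUN,
# run() and manual() are hypothetical names introduced for this example.
set -eu

DR_STACK="${DR_STACK:-palolo-labs-us-east-1}"
NETWORK_STACK="${NETWORK_STACK:-palolo-network-us-west-2}"
DR_REGION="${DR_REGION:-us-east-1}"
DRY_RUN="${DRY_RUN:-1}"   # default only prints; set DRY_RUN=0 to execute

# print the command, and execute it unless DRY_RUN=1
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# steps that need a human (leapp session switch, github actions deployment)
manual() {
  echo "MANUAL: $*"
  if [ "$DRY_RUN" != "1" ]; then
    printf 'press enter when done: '
    read -r _ack
  fi
}

dr_up() {
  manual "use leapp to open a session in the Palolo root account"
  run atmos workflow up -f provision.yaml -s "$DR_STACK"
  run atmos terraform apply vpc-peering -s "$NETWORK_STACK"
  manual "use leapp to open a session in the product deployment account"
  run klogin "$DR_REGION"
  run dbdr
  run atmos workflow helm -f provision.yaml -s "$DR_STACK"
  manual "check/create the overrides file, add the new cluster name to the deployment workflow(s)"
  manual "use github actions to deploy to the dr cluster"
}

dr_up
```

run as-is it only prints the plan, which doubles as a checklist when practicing the process. DRY_RUN=0 would execute each step and pause at the manual ones.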
concerns
- we should keep the original backup bucket in place until the transition is completed: remove it from tf state and manually manage its deletion when it is no longer needed
key improvements
- CDN naming collision () -- regional dns deployments?
- s3 backup bucket naming collision with primary -- regionally named backup buckets
- we need to be able to customize the aws region in the application helm file
- we need to be able to customize the archive bucket for backup
- inject database_host into deployment templates
- DATABASE_HOST url is statically coded in the application
- sqs component needs to be applied twice to succeed
- if the eks cluster attribute is not specified, generate one
- modify klogin so we can specify the region
- practice and document the process for restoring the database
  https://stackoverflow.com/questions/31062365/get-last-modified-object-from-s3-using-aws-cli
- [-] archive buckets
  [x] remove existing buckets from terraform state
  [x] create new buckets
  [ ] reconfigure app deployments to use new buckets
  [ ] remove old buckets
- [-] cdn resources
  [x] remove existing from terraform state
  [x] create new
  [ ] reconfigure app deployments to use new
  [ ] remove old
- CDN url is statically coded in the application
- kustomize envrc relies on leapp to set the operating region
- implement cross-region secrets replication -- pull secrets from the region we are deploying to in the event of a catastrophic region failure
- i would like to have consistent vpc cidr ranges: even numbers in the us-west-2 region, odd numbers in the us-east-1 region
- pre-create dr stack definitions. we have established the process and effectiveness of using attributes to facilitate launching multiple eks clusters within a region, but for the most part we should avoid this. it's probably better to practice the DR process in the situations where launching another cluster might be beneficial
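for the database restore item above, the linked stack overflow question is about finding the most recently modified object in a bucket with the aws cli. a minimal sketch of that lookup follows. the function name, bucket and prefix are placeholders, and how dbdr itself selects a backup is not covered in these notes, so treat this as a starting point for documenting the restore rather than a description of what dbdr does.

```shell
#!/usr/bin/env bash
# sketch: print the key of the most recently modified object in a bucket.
# latest_backup_key is a hypothetical helper; bucket and prefix are
# placeholders, not the real archive bucket layout.

# usage: latest_backup_key <bucket> [prefix]
latest_backup_key() {
  local bucket="$1"
  shift
  # only pass --prefix when a non-empty prefix was given
  if [ -n "${1:-}" ]; then
    set -- --prefix "$1"
  else
    set --
  fi
  # sort_by(...)[-1] selects the entry with the newest LastModified value.
  # caveats: an empty listing is not handled here, and listings are paged
  # (1000 keys per page), so keep the prefix narrow for large buckets.
  aws s3api list-objects-v2 \
    --bucket "$bucket" \
    "$@" \
    --query 'sort_by(Contents, &LastModified)[-1].Key' \
    --output text
}
```

the result could then be fetched with something like `aws s3 cp "s3://<bucket>/$(latest_backup_key <bucket> <prefix>)" .` where the bucket and prefix are filled in for the region being restored.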