1. Which systems should undergo chaos testing?
2. What are the entry and exit criteria for chaos?
3. What are the tool selection criteria and process?
4. Chaos engineering framework – How should we perform chaos?
1. Which systems should undergo chaos testing?
Response: Any system composed of multiple distributed components or services, where the unavailability of any one component may impact the SLA.
2. What are the entry and exit criteria for chaos? (Others may be able to provide a better answer.)
Response: Entry criteria: the system is not able to meet its SLAs.
Exit criteria: I don't see any fixed exit criteria, since chaos testing is a continuous, automated evaluation of system resiliency. That said, meeting the SLA can serve as one exit criterion.
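To make the SLA-based criterion concrete, here is a minimal sketch of a check that compares measured values against SLA targets. The metric names (p99 latency, availability), the thresholds, and the class and function names are all hypothetical and chosen only for illustration; a real SLA may track different indicators.

```python
from dataclasses import dataclass


@dataclass
class SlaTarget:
    # Hypothetical SLA targets; substitute the indicators your SLA defines.
    max_p99_latency_ms: float
    min_availability: float  # fraction of successful requests, e.g. 0.999


@dataclass
class Measurement:
    p99_latency_ms: float
    availability: float


def meets_sla(measured: Measurement, target: SlaTarget) -> bool:
    """Return True only if every measured indicator is within its target."""
    return (
        measured.p99_latency_ms <= target.max_p99_latency_ms
        and measured.availability >= target.min_availability
    )


if __name__ == "__main__":
    target = SlaTarget(max_p99_latency_ms=500.0, min_availability=0.999)
    during_chaos = Measurement(p99_latency_ms=650.0, availability=0.9995)
    print("SLA met under chaos:", meets_sla(during_chaos, target))
```

Because the evaluation is continuous, a check like this would be run after every chaos execution rather than once; a passing result marks one exit condition for that run, not the end of the practice.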
3. What are the tool selection criteria and process?
Response:
1. Security
2. Ease of integration with the system
3. Ease of chaos execution
4. Types of chaos tests supported
5. Types of systems supported
6. On-prem, off-prem, and hybrid support
7. Restoration of the system after chaos execution
8. Cost of chaos injection
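One possible way to turn the criteria above into a selection process is a weighted scoring matrix. The sketch below is only an illustration: the weights, the 1 to 5 rating scale, and the dictionary keys are assumptions, not part of the original list, and should be adjusted to your own priorities.

```python
from typing import Mapping

# Hypothetical weights for the eight criteria listed above (higher = more important).
CRITERIA_WEIGHTS = {
    "security": 3,
    "ease_of_integration": 2,
    "ease_of_execution": 2,
    "chaos_test_types": 2,
    "system_types": 2,
    "deployment_support": 1,  # on-prem, off-prem, hybrid
    "restoration": 3,
    "cost": 1,
}


def score_tool(ratings: Mapping[str, int]) -> float:
    """Return the weighted average of 1-5 ratings across all criteria."""
    missing = set(CRITERIA_WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(CRITERIA_WEIGHTS.values())
    weighted_sum = sum(CRITERIA_WEIGHTS[name] * ratings[name] for name in CRITERIA_WEIGHTS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Ratings for an imaginary candidate tool; the values are made up.
    candidate = {name: 4 for name in CRITERIA_WEIGHTS}
    candidate["cost"] = 2
    print(f"candidate score: {score_tool(candidate):.2f}")
```

Scoring each candidate tool the same way gives a comparable number per tool, while the per-criterion ratings remain available for discussion.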
4. Chaos engineering framework – How should we perform chaos?
Response: Run chaos experiments in your dev/test environments first, before running them in production. Also run chaos from your CI/CD pipeline so that system resiliency is tested continuously. It should not be a one-time activity.
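The flow described above can be sketched as a small runner that a CI/CD job might call. This is a sketch under stated assumptions, not a reference to any particular chaos tool: the function names, the environment list, and the inject/check/rollback callables are hypothetical placeholders for whatever your chosen tool provides.

```python
from typing import Callable, List

# Order taken from the response above: lower environments before production.
ENVIRONMENTS = ["dev", "test", "production"]


def run_experiment(
    inject: Callable[[], None],
    check: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Inject a fault, evaluate the system, and always restore it afterwards."""
    try:
        inject()
        return check()
    finally:
        # Restoration runs even if injection or the check raises.
        rollback()


def promote_through_environments(run_in_env: Callable[[str], bool]) -> List[str]:
    """Run the experiment per environment in order, stopping at the first failure.

    Returns the list of environments that passed.
    """
    passed: List[str] = []
    for env in ENVIRONMENTS:
        if not run_in_env(env):
            break
        passed.append(env)
    return passed


if __name__ == "__main__":
    # Stand-in experiment: pretend the system is resilient everywhere except "test".
    def fake_run(env: str) -> bool:
        return run_experiment(
            inject=lambda: print(f"[{env}] injecting fault"),
            check=lambda: env != "test",
            rollback=lambda: print(f"[{env}] restoring system"),
        )

    print("passed:", promote_through_environments(fake_run))
```

In a pipeline, the job would typically exit with a non-zero status when an environment fails, so that the experiment never reaches production until dev and test have passed.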