Triaging and managing Skampi bugs
This document defines a process for triaging and managing Skampi bugs, so that any SKA team member knows how to handle the funnel of incoming bugs and how those bugs are allocated, distributed and managed.
The standard process for changing software includes the following phases:
- Problem/modification identification, classification, and prioritization
- Regression/system testing
- Acceptance testing
The process above applies equally to triaging and managing a bug in Skampi. This document focuses on how to identify a problem or bug from incoming information and event notifications, and how to assign it to the right team(s).
The problem identification phase starts when there is an indication of a failure. This can be raised by a developer (in any shared Slack channel, such as team-system-support) or by an alert in the following Slack channels:
Any project member can join these channels to gain visibility of this information.
Other sources of information are:
Allocating ownership to teams
The following are general rules for allocating ownership to teams:
- Primary responsibility for a failed pipeline lies with the author of the first commit to the branch since the last successful run of the pipeline. It is therefore the committer's responsibility to follow up on the pipeline status after each git push.
- For every failing test case, the creator(s) of the test must be involved so that the bug can be assigned to the appropriate team.
- The System Team should be involved in problem identification to determine whether the problem is infrastructure-related (that is, related to a k8s cluster or any layer below it: Docker, VMs, virtualization, etc.).
- For Prometheus alerts, the System Team must analyse the alert details to understand the cause, and give input into assigning the alert to the right team(s).
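The first rule above (the author of the first commit since the last successful pipeline run is primarily responsible) can be sketched in Python. This is a minimal illustration only. It assumes a hypothetical data model in which the branch history and the pipeline runs are already available as lists ordered oldest first; `Commit`, `PipelineRun` and `responsible_author` are invented names, not part of any Skampi or GitLab API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str


@dataclass(frozen=True)
class PipelineRun:
    sha: str     # commit the pipeline ran against (hypothetical field)
    status: str  # e.g. "success" or "failed" (hypothetical values)


def responsible_author(history: Sequence[Commit],
                       runs: Sequence[PipelineRun]) -> Optional[str]:
    """Return the author of the first commit after the last green commit.

    history: commits on the branch, oldest first.
    runs: pipeline runs on the branch (order does not matter here).
    Returns None if the newest commit already has a successful run.
    """
    position = {commit.sha: i for i, commit in enumerate(history)}
    # Index of the most recent commit with a successful run; -1 means the
    # pipeline has never passed, so the very first commit is the candidate.
    last_green = -1
    for run in runs:
        if run.status == "success" and run.sha in position:
            last_green = max(last_green, position[run.sha])
    first_after = last_green + 1
    if first_after >= len(history):
        return None  # nothing has been pushed since the last green run
    return history[first_after].author


if __name__ == "__main__":
    history = [Commit("a1", "alice"), Commit("b2", "bob"), Commit("c3", "carol")]
    runs = [PipelineRun("a1", "success"), PipelineRun("c3", "failed")]
    print(responsible_author(history, runs))  # bob
```

Here the pipeline last passed on `a1`, so `bob`, who authored the next commit `b2`, is the first point of contact even though the failing run was on `c3`. With plain git, a similar answer can be obtained by taking the first line of `git log --reverse --format='%an' <last-green-sha>..HEAD`, where `<last-green-sha>` stands for the commit of the last successful pipeline run.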