In this step, we should be able to identify that ‘Paying for an order’ and ‘Payment for your item list’ describe the same scenario, and that both involve the entity ‘Delivery address’ and the action ‘Confirm’. This helps us trace each test case to its requirements.
Similarly, while analyzing the defect description, we can identify that the ‘Confirm’ action is not being performed on the ‘Delivery address’ entity, which lets us trace the defect to its requirement. In the second defect, the extracted terms are ‘Default’ and ‘Account settings’; however, when we consider the combined entity ‘Payment mode in Account settings’ as a whole, we find that it is not related to any requirement considered in this example.
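The matching described above can be sketched as a comparison of extracted (entity, action) pairs. In the snippet below, the requirement IDs, pairs, and the extraction step itself are all assumptions for illustration; a real pipeline would produce the pairs with an NLP toolkit such as spaCy or NLTK.

```python
# Hypothetical sketch: trace a defect to requirements by intersecting
# (entity, action) pairs extracted from each artifact. The IDs and
# pairs below are invented for illustration.
requirements = {
    "REQ-1": {("delivery address", "confirm"), ("order", "pay")},
    "REQ-2": {("item list", "pay")},
}

def trace_defect(defect_pairs, requirements):
    """Return IDs of requirements sharing at least one (entity, action) pair."""
    return sorted(
        req_id for req_id, pairs in requirements.items()
        if pairs & defect_pairs  # non-empty set intersection -> a match
    )

# Defect 1 mentions 'Confirm' on 'Delivery address' -> traces to REQ-1
print(trace_defect({("delivery address", "confirm")}, requirements))

# Defect 2 is about 'Payment mode in Account settings' -> no match
print(trace_defect({("payment mode in account settings", "default")}, requirements))
```

A defect whose pairs intersect no requirement, like the second one above, is flagged as untraceable, mirroring the ‘Payment mode in Account settings’ case in the text.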
Step 4: Similarity score normalization
Certain entities and actions may occur frequently across the set of requirements and test cases. Such terms are usually not the primary entity or action on which traceability should be decided. For example, most test cases include a login and authentication step, so the weightage for the ‘Login’ action should be low, and the other actions present in the test case should be given more importance. Similarly, entities like ‘system’ or ‘application’ and actions like ‘click’ may occur frequently.
All the extracted entities and actions are put together to form the corpus used to derive vector representations. A normalization factor is then derived from term frequency, with more weightage given to less frequent terms.
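As a minimal illustration of this normalization factor, the snippet below computes an IDF-style weight from document frequency: terms that appear in every artifact (like ‘login’) get weight zero, while rarer terms score higher. The toy corpus is invented; a production pipeline would typically use a library implementation such as scikit-learn's TfidfVectorizer.

```python
# Sketch of frequency-based normalization (the IDF part of TF-IDF).
# Each document is the set of entities/actions extracted from one
# test case; the contents are assumed data for illustration.
import math

corpus = [
    {"login", "delivery address", "confirm"},
    {"login", "item list", "pay"},
    {"login", "order", "pay"},
]

def idf(term, corpus):
    df = sum(1 for doc in corpus if term in doc)  # document frequency
    return math.log(len(corpus) / df)

print(round(idf("login", corpus), 3))             # appears everywhere -> 0.0
print(round(idf("delivery address", corpus), 3))  # rare -> high weight
```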
Some basic steps for semantic score normalization are as follows:
- Remove stop words (articles, prepositions, etc.) and punctuation from the corpus. Natural language toolkits such as NLTK can be used for this purpose
- Get vector representations of entities by training word embedding models (Word2Vec or GloVe) on the corpus
- Wherever possible, it is better to form vectors by considering noun phrases (e.g. ‘item list’) and verb phrases (e.g. ‘paying order’)
- Normalize term significance with techniques like TF-IDF
- Finally, derive the final score (between 0 and 1) using cosine similarity. Typically, 0.7 is a reasonable threshold for shortlisting matches
What are the advantages?
We have illustrated how to develop traceability between requirements, test cases, and defects using the four-step NLP-based approach. This approach offers the following benefits:
• Automated impact analysis
Lifecycle traceability gives the relationships between requirements, test cases, and defects in one go, making it easy to analyze the impact of any change request
• Improved requirement and test coverage
NLP techniques help us determine whether the test cases for a particular requirement cover all the scenarios and stipulations mentioned in it. This ensures that all requirements are developed without gaps and perform as expected. Non-atomic requirements are also validated for all the actions they contain
• Updated traceability matrix
An NLP approach to traceability can keep the test cases used for regression testing updated and valid. Further, outdated and duplicate test cases from previous releases and sprints can be identified and removed
Conclusions and future work
Research shows that, on average, only 29% of projects were delivered successfully in 2015. Poor requirement quality is a major contributor to this problem, since poor requirements definition causes 40% to 60% of software defects.
An NLP-based approach to SDLC traceability is more efficient and effective than the traditional approach based on manual intervention. It brings higher predictability to software delivery by ensuring that the different stages of the product being developed are traced back to business needs with minimal effort. While it is difficult to calculate an exact ROI, given the multiplicity of factors impacting project success, based on relative value and initial pilots we believe NLP-based traceability has the potential to bring down review and rework effort by up to 20% on average, as major slippages and oversights related to business requirements are avoided. Further, this approach aids and accelerates change management, making project teams more responsive to evolving business needs.
In the future, the effectiveness of NLP-based traceability on project KPIs can be further augmented with the recommendations below:
- Detailed designs can be processed using NLP to obtain traceability between requirements and design elements. As a pre-requisite, this may require that development teams standardize and publish upfront the nomenclature used for various application elements.
- In the case of ETL projects, mapping documents can be generated to include database-related artefacts in traceability
- Reverse engineering and code coverage tools can be explored to extend traceability between developed code and functional specs