Abstract
In the software development lifecycle (SDLC), traceability is an important discipline: it maps requirements through every stage of the application development process, ensuring that the delivered software fulfills the user's expectations. In this paper, we examine the limitations of the traditional traceability process and show how natural language processing (NLP) techniques can enhance traceability across the SDLC.
Introduction: Why traceability matters
Customer satisfaction is today the primary differentiator for both sustaining and growing a business. Customers are satisfied with a product when it fulfills their business needs. The key to delivering the right product is ensuring that what is developed fulfills all the agreed requirements and is tested to the accepted levels of performance. To meet these criteria, requirements need to be mapped to all the major SDLC phases (Fig. 1 illustrates the SDLC flow).
How to measure, monitor and report traceability
Today, traceability is maintained manually for project artifacts even where Application Lifecycle Management (ALM) tools are deployed, and complete coverage of artifacts across the SDLC is missing. This manual approach is effort-intensive, error-prone, and hard to keep current as artifacts evolve.
In software development, both business and system requirements are typically captured in natural language. In this paper, we discuss how NLP can be used to process this information to automate traceability and extend it to cover all the artifacts across the SDLC.
NLP-based approach
The chances of success of the delivered product increase significantly if (a) traceability extends beyond requirements and test cases, and (b) traceability for significant upstream artifacts is established automatically. The NLP-based approach accomplishes both goals by extracting the entities, relationships, and actions hidden in each artifact.
Step 1: Data pre-processing
System requirements are first broken down into tokens and tagged with their part-of-speech (POS) labels. The tokens are then filtered against an exclusion list containing auxiliary verbs and generic nouns that are not critical to our process. For further refinement, named entity recognition (NER) and a domain-based dictionary are used to identify the nouns and actions correctly.
Apart from auxiliary verbs, certain main verbs such as 'be' are also filtered out through the exclusion list.
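To make this step concrete, here is a minimal sketch in Python using spaCy; the example requirement and the contents of the exclusion list are illustrative assumptions, and a real pipeline would plug in the domain-based dictionary and a tuned exclusion list.

import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative exclusion list (assumed values): auxiliary/main verbs like
# 'be' and generic nouns that carry no traceability signal.
EXCLUDED_LEMMAS = {"be", "have", "do", "system", "application"}

def preprocess(requirement):
    """Tokenize a requirement, tag parts of speech, and drop excluded terms."""
    doc = nlp(requirement)
    kept = []
    for token in doc:
        if token.pos_ == "AUX":                      # auxiliary verbs: is, will, must
            continue
        if token.lemma_.lower() in EXCLUDED_LEMMAS:  # exclusion-list filtering
            continue
        if not token.is_alpha:                       # drop punctuation and numbers
            continue
        # The NER label helps identify nouns correctly (e.g. ORG, PRODUCT)
        kept.append((token.text, token.pos_, token.ent_type_ or "-"))
    return kept

print(preprocess("The user must be able to confirm the delivery address in account settings."))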
Step 2: Phrase extraction
In this stage, we apply our NLP rules to identify noun phrases and verb phrases. Noun phrases are the candidates for key entities, and verb phrases are the candidates for actions.
Prepositions are used to identify scenarios and cases. In the example we are considering, it is important to account for the preposition 'in' between the noun phrases 'Delivery Address' and 'Account Settings': the two noun phrases are combined to form one entity. As another example, 'configuration in the system' and 'system configuration' refer to the same entity. This shows the importance of including the preposition between noun phrases.
Also, the scenario from the second requirement, 'paying - Order', is identified based on the preposition 'while'.
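A minimal sketch of phrase extraction with spaCy follows; merging noun phrases through the dependency tree is one plausible way to implement the preposition rule described above, and the example sentence is assumed.

import spacy

nlp = spacy.load("en_core_web_sm")

def extract_phrases(text):
    """Return candidate entities (noun phrases, plus preposition-merged
    combinations) and candidate actions (verb lemmas)."""
    doc = nlp(text)
    entities = [chunk.text.lower() for chunk in doc.noun_chunks]
    actions = [tok.lemma_ for tok in doc if tok.pos_ == "VERB"]

    # Combine noun phrases joined by a preposition, e.g. 'delivery address'
    # + 'in' + 'account settings' -> one entity, as described above.
    for chunk in doc.noun_chunks:
        prep = chunk.root.head
        if prep.dep_ == "prep":      # this chunk is the object of a preposition
            governor = prep.head     # the word the preposition attaches to
            entities.append(f"{governor.text} {prep.text} {chunk.text}".lower())
    return entities, actions

entities, actions = extract_phrases(
    "The user can confirm the delivery address in account settings.")
print(entities)  # includes the merged entity 'address in account settings'
print(actions)   # ['confirm']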
Step 3: Semantic similarity
Semantic similarity between the entities extracted across documents helps us create traceability links automatically.
In this step, we should be able to identify that 'Paying for an order' and 'Payment for your item list' describe the same scenario, and that it involves the entity 'Delivery address' and the action 'Confirm'. This lets us trace test cases back to their requirements.
Similarly, while analyzing a defect description, we can identify that the 'Confirmation' action is not being performed on the 'Delivery address' entity; this lets us track the defect back to its requirement. In the second defect, the surface terms are 'Default' and 'Account settings'; however, when we look at the combined entity as a whole, 'Payment mode in Account settings', we understand that it is not related to any requirement considered in this example.
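A minimal sketch of this similarity step, using sentence embeddings and cosine similarity via the sentence-transformers library (the model name and the example texts are assumptions):

from sentence_transformers import SentenceTransformer, util

# Assumed model choice; any general-purpose sentence-embedding model works.
model = SentenceTransformer("all-MiniLM-L6-v2")

requirement = "Confirm the delivery address while paying for an order"
test_case = "Verify the delivery address is confirmed during payment for your item list"
defect = "Default payment mode in account settings is wrong"

emb = model.encode([requirement, test_case, defect])

# Cosine similarity between the requirement and each downstream artifact:
# a high score suggests a trace link; a low score suggests none.
print(util.cos_sim(emb[0], emb[1]).item())  # high: same scenario, traceable
print(util.cos_sim(emb[0], emb[2]).item())  # low: unrelated to this requirement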
Step 4: Similarity score normalization
Certain entities and actions may occur frequently across the set of requirements and test cases. Such terms are usually not the primary entity or action on which traceability should be decided. For example, most test cases contain a login and authentication step; the weightage for the 'Login' action should therefore be low, and the other actions present in the test case given more importance. Similarly, entities like 'system' and 'application', or actions like 'click', may occur frequently.
All the extracted entities and actions are pooled to form the corpus from which vector representations are derived. A normalization factor is then derived from term frequency, with more weightage given to less frequent terms.
Some basic steps for semantic score normalization are as follows (a minimal sketch is shown after the list):
• Pool all the extracted entities and actions to form the corpus used to derive vector representations.
• Compute the frequency of each term across the artifacts.
• Derive the normalization factor from these frequencies, giving more weightage to less frequent terms.
• Apply the normalization factor when computing the similarity scores between artifacts.
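As one common way to realize this down-weighting (not necessarily the exact scheme used here), a minimal TF-IDF sketch with scikit-learn, on an assumed toy corpus of extracted terms:

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (assumed values): each artifact is represented by the
# entities and actions extracted in Steps 1-3.
artifacts = [
    "login authenticate confirm delivery address",
    "login authenticate pay order delivery address",
    "login authenticate default payment mode account settings",
]

# IDF down-weights terms that occur in most artifacts ('login',
# 'authenticate') and boosts discriminative terms ('payment mode').
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(artifacts)

for term, idf in sorted(zip(vectorizer.get_feature_names_out(), vectorizer.idf_),
                        key=lambda pair: pair[1]):
    print(f"{term:12s} idf={idf:.2f}")

Terms with a low IDF, i.e. those appearing in every artifact, then contribute little to the similarity score, exactly as the login example above requires.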
What are the advantages?
We have illustrated how to develop traceability between requirements, test cases, and defects using the four-step NLP-based approach. This approach can lead to the following benefits:
• Automated impact analysis
Lifecycle traceability gives the relationships between requirements, test cases, and defects in one go, making it easy to analyze the impact of any change request.
• Improved requirement and test coverage
The NLP technique helps us determine whether the test cases for a particular requirement cover all the scenarios and stipulations mentioned in that requirement. This ensures that all requirements are developed without gaps and perform as expected. Non-atomic requirements are also validated for all the actions they contain.
• Updated traceability matrix
The NLP approach to traceability can be used to keep the test cases used for regression testing updated and valid. Further, outdated and duplicate test cases from previous releases and sprints can be identified and removed.
Conclusions and future work
Research shows that, on average, only 29% [1] of projects were successfully delivered in 2015. Poor requirement quality is a major contributor to this problem, since poor requirements definition causes 40% to 60% [2] of the defects in software.
The NLP-based approach to SDLC traceability is more efficient and effective than the traditional approach based on manual intervention. It brings higher predictability to software delivery by ensuring that the different stages of the product being developed are traced back to business needs with minimal intervention. While it is difficult to calculate an exact ROI given the multiplicity of factors impacting project success, based on relative value and initial pilots we believe NLP-based traceability has the potential to bring down review and rework effort by up to 20% on average, as major slippages and oversights related to business requirements are avoided. Further, this approach aids and accelerates change management, making project teams more responsive to evolving business needs.
In the future, the effectiveness of NLP-based traceability on project KPIs can be augmented further.
References
[1] https://www.infoq.com/articles/standish-chaos-2015
[2] http://tynerblain.com/blog/2005/12/28/why-we-should-invest-in-requirements-management/
http://tynerblain.com/blog/2006/01/06/the-best-way-to-improve-roi-is-with-good-requirements/
IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 8, Aug. 2006
Aman Chandra has 25 years of experience in the IT industry, focused on delivery excellence, software engineering, automation, and natural language processing. He currently drives the SDLC Automation charter for Wipro HOLMES™ and is responsible for the design, development, and rollout of RPA and cognitive assets across the application development lifecycle, enabling project teams to deliver high-quality applications in an accelerated mode. Aman has also filed several patents on the application of NLP-based algorithms for achieving a left shift of quality and productivity enhancements. Earlier, as head of the central Tools group, Aman was responsible for the rollout of Wipro's ALM platform, CloudCLM, which has been adopted by over 50 customer accounts.
Sakthivasan is a consultant with over four years of experience in analytics and machine learning. He has worked across various domains, including SDLC automation, BFSI, people supply chain, and social media analytics. He specializes in NLP, which he has used to automate and drive various business processes. One of his key strengths is statistics, which he combines with his analytical skills and subject-matter expertise to make the right business decisions in consulting.