An empirical comparison of commercial and open‐source web vulnerability scanners

Web vulnerability scanners (WVSs) are tools that can detect security vulnerabilities in web services. Although both commercial and open-source WVSs exist, their vulnerability detection capability and performance vary. In this article, we report on a comparative study to determine the vulnerability detection capabilities of eight WVSs (both open-source and commercial) using two vulnerable web applications: WebGoat and the Damn Vulnerable Web Application (DVWA). The eight WVSs studied were: Acunetix; HP WebInspect; IBM AppScan; OWASP ZAP; Skipfish; Arachni; Vega; and IronWASP. Performance was evaluated using multiple metrics: precision; recall; Youden index; the OWASP web benchmark evaluation; and the web application security scanner evaluation criteria. The experimental results show that, while the commercial scanners are effective in detecting security vulnerabilities, some open-source scanners (such as ZAP and Skipfish) can be equally effective. In summary, this study recommends improving the vulnerability detection capabilities of both the open-source and commercial scanners to enhance code coverage and the detection rate, and to reduce the number of false positives.


INTRODUCTION
The economic importance of web applications in multiple domains, including banking, 1 transportation, 2 manufacturing, 3 business, 4 and education, 5 has increased the need for a mechanism to control and improve their quality. The extensive, almost ubiquitous, use of web applications has also resulted in an equally dramatic increase in attacks. 6 These attacks normally target weaknesses, flaws, and errors (commonly referred to as security vulnerabilities) that may cause an explicit failure to protect the confidentiality, integrity, and availability of the application. 7 Examples of attacks include: command injection 8 ; buffer overflow 9,10 ; data or path manipulation 11 ; access control 12 ; session hijacking 13 ; and cookie poisoning. 6,14 When the attacks succeed, they can result in data breaches and have other serious security implications.
In an attempt to improve both vulnerability detection and the general quality of web applications, several web vulnerability scanners (WVSs) have been developed and studied, including: the web application attack and audit framework (W3af) 15 ; OWASP zed attack proxy (OWASP ZAP) 16 ; Skipfish 17 ; Arachni 18 ; Vega 19 ; Stalker 20 ; and IronWASP. 21 Seng et al 22 defined WVSs as tools used to test and detect common security breaches in web applications. A key question regarding both commercial and open-source WVSs is: which WVS is most suited for detecting a particular class of security vulnerability, doing so with a high detection rate and a low false-positive rate? Previous studies have attempted to answer this, with Fonseca et al 20 and Suto, 27 for example, performing comparative studies of various open-source and commercial WVSs. Antunes and Vieira 28 investigated the vulnerability detection capabilities of three WVSs (IPT-WS, SIGN-WS, and RAD-WS), assessing their effectiveness based on coverage and false positives, and finding that they could effectively detect the topmost web vulnerabilities, such as SQL injection and cross-site scripting (XSS). Makino and Kleve 25 examined the vulnerability detection capability of two open-source scanners, OWASP ZAP and Skipfish, using the Damn Vulnerable Web Application (DVWA) and the web application vulnerability scanner project 29,30 : their experimental results showed ZAP to be superior to Skipfish.
Although there are several comparative studies on WVSs, the focus has mainly been on commercial scanners, with few studies empirically examining the effectiveness of open-source tools. To address this, following a similar procedure to that of Makino and Kleve, 25 this study examines the vulnerability detection capabilities of both the commercial scanners Acunetix, 22 HP WebInspect, 19 IBM AppScan, 31 and the open-source scanners OWASP Zed Attack Proxy (OWASP ZAP), 16 Skipfish, 32 Arachni, Vega, 33 and IronWASP. 34 This choice of WVSs was partly motivated by software vendor interest in these specific tools (including reported skepticism over their detection capabilities, in terms of their false positives, false negatives, and coverage 35 ), but also by their apparent wide usage and regular updates. 36 In addition, vendors need to be well informed of the effectiveness of the tools (both open-source and commercial) to enable appropriate evaluation and informed choices. This comparative study of the detection capabilities of the tools will support vendors' selection of the most appropriate WVSs.
To the best of our knowledge, no other study has empirically analyzed these scanners against the DVWA 37 and WebGoat tools, 38 using our selected evaluation metrics (precision; recall; Youden index; OWASP web benchmark evaluation (WBE); and the web application security scanner evaluation criteria (WASSEC)). 39,40 This study makes the following contributions:

• An extensive experiment evaluating the vulnerability detection effectiveness of eight commercial and open-source WVSs is reported on.
• The functionality of the commercial and open-source WVSs is studied and compared.
• A number of possible measures to improve the commercial and open-source WVSs are suggested.
The rest of this article is structured as follows: Section 2 presents the background of the study and some previous related research. The methodology and experimental setup are given in Section 3. The experimental results are presented in Section 4. Section 5 presents a detailed discussion of the results. Section 6 examines the threats to the validity of the study, and, finally, the conclusion and recommendations are presented in Section 7.

BACKGROUND AND RELATED WORK
This section presents the background to the study and an overview of some related work. It includes a description of the evolution of web applications, WVSs, and the various security vulnerabilities in web applications. There is also a summary of recent research into the evaluation of web scanners. The web application security consortium (WASC) 41 defines a web application as "a software application executed by a web server, which responds to dynamic web page requests over HTTP." According to Paulson, 42 the turning point in web application development was the introduction of Asynchronous JavaScript and XML (AJAX), a technique for creating better, faster, and more interactive web applications, which helped transition the old concept of static web pages into a method for deploying interactive web applications. The common gateway interface (CGI) became the first standard environment used to generate dynamic web pages, with websites that used CGI for processing becoming known as web applications. 43 The introduction of CGI led to the appearance of other web application development tools such as PHP, Perl, Java Server Pages (JSP), JavaScript, and VBScript. 43 Figure 1 shows the evolution of web applications. A web application typically includes a client, a web server, an application server (sometimes several), and a persistent database server, often with a firewall placed between the client and the web server/application. Figure 2 depicts a simplified web application framework.
A WVS performs penetration testing by exercising the application's web pages, without access to the source code. Most WVSs have three main components: one for crawling, one for attacking, and one for analysis. 44 The crawling component identifies the inputs and related pages of the web application based on its uniform resource locator (URL). The attacking component generates attack content from the information discovered on the various web pages, for each input vector and vulnerability type, and then sends the content to the web server. The analysis component evaluates and interprets the responses from the server to determine whether the attacks were successful. Techniques for testing web applications for vulnerabilities can be categorized as either white-box or black-box testing. 45 White-box testing is often used to analyze the application's source code (manually or using a code analysis tool); black-box testing, also known as penetration testing, executes the application to detect and locate security vulnerabilities. 46 AppScan, 47 WebKing, 48 WebInspect, 49 and NTOSpider 50 are some of the most widely applied commercial web application scanners.
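The crawl-attack-analyze loop described above can be sketched as follows. This is a minimal, illustrative Python sketch: the toy application, the payload, and all function names are hypothetical, and a real scanner would issue HTTP requests to a server rather than call the application directly.

```python
# Minimal sketch of the three WVS components: a crawler that enumerates
# input points, an attacker that submits payloads, and an analyzer that
# inspects responses. All names here are hypothetical.

XSS_PAYLOAD = "<script>alert(1)</script>"

def crawl(app, start="/"):
    """Discover pages and their input parameters (simulated, no HTTP)."""
    seen, queue, inputs = set(), [start], []
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        page = app(url, {})
        queue.extend(link for link in page["links"] if link not in seen)
        inputs.extend((url, param) for param in page["params"])
    return inputs

def attack_and_analyze(app, inputs):
    """Send a payload to each input and flag reflected occurrences."""
    findings = []
    for url, param in inputs:
        body = app(url, {param: XSS_PAYLOAD})["body"]
        if XSS_PAYLOAD in body:  # naive reflected-XSS check
            findings.append((url, param, "reflected XSS"))
    return findings

# Toy "web application": one page echoes its query parameter unescaped.
def toy_app(url, params):
    pages = {
        "/":       {"links": ["/search"], "params": [], "body": "home"},
        "/search": {"links": [], "params": ["q"],
                    "body": "results for " + params.get("q", "")},
    }
    return pages[url]

findings = attack_and_analyze(toy_app, crawl(toy_app))
```

Running the loop against the toy application flags the echoed `q` parameter, which is the essence of what the analysis component of a real scanner does at much larger scale.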
Since its creation in 1997, the National Vulnerability Database 51 has published information about more than 43 000 software vulnerabilities affecting more than 17 000 software applications. 52 Our study also used vulnerabilities presented in this database, as well as the vulnerabilities in DVWA and WebGoat. Table 1 presents a summary of the studied vulnerability types. There has been growing interest in research evaluating WVSs. For example, Vieira et al 66 evaluated the flaw detection capability of four commercial WVSs (WebInspect, AppScan, WSDigger, and WSFuzzer): they conducted an experiment using 300 well-known web applications, finding that the selected scanners generated false positives between 35% and 40% of the time. Parvez et al 26 later conducted a comparative study of three other WVSs (Acunetix, AppScan, and ZAP), with results indicating an improved detection rate.
Alsaleh et al 67 examined four open-source scanners, finding similar detection rates for all four. More recently, Sagar et al 7 evaluated the vulnerability detection capability of three other open-source WVSs (W3af, Skipfish, and OWASP ZAP) on the DVWA, concluding that OWASP ZAP performed better than the other scanning tools. An examination of these related studies reveals that most evaluated the effectiveness of commercial scanners or open-source scanners, but not both. Most studies focused only on SQL injection and cross-site scripting. Finally, none of the studies examined and compared WVS performance based on both DVWA and OWASP WebGoat, using all the metrics used in our study.

TABLE 1 Summary of the studied vulnerability types

Denial of Service (DoS): An event or action that reduces or prevents the function of a user's target resource or application. 56
Code Execution (CMDExec): A situation where an attacker capitalizes on a weakness of a web application to inject and execute a malicious server script on the targeted application, to gain access to authorized resources. 57
Buffer Overflow (BO): When an attacker exploits a vulnerability to exceed the memory buffer size and copy data from the adjacent memory location to make changes to the application. 58
Authentication Flaws (AF): When an attacker gains access to a user's data through an exposed password. 59 These weaknesses can allow an attacker to either capture or bypass the authentication methods used by a web application.
Cross-Site Scripting (XSS): When an attacker gains access to a user's web application privileges by injecting malicious JavaScript code into the user's web browser. 44
Cross-Site Request Forgery (CSRF): When the attacker sends an unauthenticated HTTP request from a user's browser, intending to send information (such as the user's session cookie and other relevant information) to a web application. 60
SQL Injection, blind (BSQLi): When an attacker has access to security details (error details) that developers have hidden. The attack uses a sequence of SQL statements to extract the hidden details and perform malicious activities. 61
File Inclusion (FI): Caused when an application builds a path to executable code using an attacker-controlled variable, in a way that allows the attacker to control which file is executed at run time. 62
Reflected Cross-Site Scripting (RXSS): When an attacker supplies code (using dynamic programming languages such as ActiveX, Flash, JavaScript, or Java) to the web browser of a user through viewed pages. 63
SQL Injection (SQLi): When an attacker inserts unvalidated input into the database of the web application to compromise its expected use. 64
Access Control Flaws (ACF): An unintended access decision caused by misconfigured rules, policies, or algorithms within an access control system. 65

EMPIRICAL STUDY
Our empirical study first identified the most widely used and applied open-source and commercial WVSs, according to criteria from the WASC. 68 We scanned the two benchmark web applications (WebGoat and DVWA) for vulnerabilities by configuring the browser and the selected WVSs for vulnerability detection. The detection results for each scanner were analyzed, and the performances were compared using the target metrics (precision, recall, Youden index, WBE, and WASSEC).

Research questions
Most commercial WVSs have automated crawlers and scanners, simplifying the vulnerability detection process. Open-source scanners, in contrast, typically do not, and require human intervention, including to configure the tool as a proxy server. Because of this, it may be expected that commercial scanners would outperform the open-source scanners. Therefore, our first research question addresses the effectiveness of the commercial and open-source WVSs:
• RQ1-How do commercial WVSs compare with open-source WVSs, in terms of detection capability, for all vulnerability types in web applications?
Similar to the motivation behind RQ1, it may also seem likely that open-source WVSs would generate more false-positive results than commercial WVSs: automated crawlers in commercial WVSs can crawl all parts of a web application more efficiently than the manual crawling of open-source WVSs. This leads to the second research question:
• RQ2-How do commercial WVSs compare with open-source WVSs in terms of the number of false positives generated?
Penetration testing is an important issue in cybersecurity, which partly explains the large number of WVSs developed. Typical questions asked by stakeholders lead to the third research question:
• RQ3-Which WVS is the most effective for vulnerability detection?

Experimental setup
The experimental activity was divided into three stages: pre-experimental activities, experimental activities, and post-experimental activities. In the first stage, we conducted a detailed analysis of the eight WVSs to generate the workload (ie, an idea of the actual work to be performed). This was followed by the selection and detection of vulnerabilities in the respective vulnerable web applications. The last stage involved the analysis and performance evaluation of the WVSs against the target metrics. The experiment was conducted on a workstation with an Intel(R) Core(TM) i5-6500 CPU at 3.20 GHz and 4 GB of RAM, running Windows 7 Ultimate.

Vulnerable web applications
To test our approach, we used two vulnerable web application programs: DVWA and WebGoat. Both DVWA and WebGoat contain the OWASP Top 10 security vulnerabilities. DVWA has a friendly user interface that allows developers, teachers, and students to explore and analyze web service security. It contains multiple vulnerabilities, including command execution; cross-site request forgery; insecure CAPTCHA; file inclusion; SQL injection (standard and blind); reflected cross-site scripting (RXSS); and stored cross-site scripting (SXSS). 25 WebGoat is an open-source OWASP application created to help developers and experts examine the detection capability of WVS tools. The vulnerability types in WebGoat include: access control flaws; AJAX security issues; authentication flaws; buffer overflows; poor code quality; concurrency; cross-site scripting; error handling flaws; injection flaws; denial of service; insecure communication; insecure configuration; insecure storage; malicious execution; parameter tampering; and session management flaws. 69 The vulnerabilities we aim to detect are intentionally injected into DVWA and WebGoat, based on the OWASP Top 10. The main web application vulnerability types in DVWA and WebGoat are shown in Table 1.

WVSs under-study
Although there are several distributed network scanners with complex architectures, Makino and Kleve 25 reported that WVS architecture generally includes four modules: a scan engine; a scan database; a report module; and a user interface. The scan engine identifies security vulnerabilities with respect to its installed plug-ins and compares the outcome with known vulnerabilities. The scan database stores detailed information about the various vulnerabilities. The report module presents the scan results, with recommended solutions for developers and security administrators. The user interface provides a visual platform, graphical or command-driven (or both), for users to interact with the WVS. Our study examined eight WVSs, both commercial and open-source, all of which have a graphical user interface and run under Windows OS:
• Acunetix 70 is a commercial security web tool that scans web applications to detect exploitable vulnerabilities. It scans for cross-site scripting, SQL injection, and other types of vulnerabilities in web applications. In addition, the tool uses a multi-threaded approach to crawl quickly through a series of web pages without breaks, and produces various forms of compliance and technical reports.
• WebInspect 71 is an automated commercial web application security testing tool that identifies known and unknown vulnerabilities, including parameter injection; cross-site scripting; and directory traversal in web applications.
• AppScan 22 is a commercial web application security tool that finds and resolves known vulnerabilities in web applications.
• ZAP 25 is an open-source WVS with a user-friendly interface, used for penetration testing. It can be used by people with different levels of software security expertise.
• Skipfish 72 is an open-source web application security reconnaissance tool.It provides an interactive sitemap for the targeted site by carrying out a recursive crawl and dictionary-based probes.The resulting map is then annotated with the output from several active security checks.The final report generated by the tool is meant to serve as a foundation for professional web application security assessments.
• Arachni 21 is an effective and user-friendly open-source WVS, written in Ruby.It is very fast at scanning, and offers different user interfaces.It also provides a customized, command-driven input, and its output is in the form of HTML.
• IronWASP 73 (iron web application advanced security testing platform) is an advanced open-source web application security testing platform that ships with various external libraries, such as IronPython, IronRuby, JSON, and .NET.
• Vega 74 is an automated open-source WVS for detecting SQL and other vulnerability types.
These scanners were selected for the study based on the comparison criteria proposed by the WASC, Web Application Security Scanner Evaluation 75 and a study conducted by Suteva et al 34 on the most popular open-source vulnerability scanners.

Performance metrics
Similar to previous studies, 25,76 we compared the performance of the eight WVSs using five evaluation metrics: precision; recall; Youden index; OWASP WBE; and the WASSEC. Table 2 summarizes the notation and abbreviations. 77

Precision
OWASP 78 defines precision as the percentage of correctly detected vulnerabilities as a proportion of all reported vulnerabilities (including those incorrectly labeled). The formula for this metric is given in Equation (1): Precision = TP/(TP + FP). High precision values indicate a high detection accuracy of actual vulnerabilities.

Recall
Recall 79 is the number of correctly detected vulnerabilities represented as a proportion of all the known vulnerabilities (including those that should have been detected by the tool but were not). The formula for recall is given in Equation (2): Recall = TP/(TP + FN).
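As an illustration, the two metrics can be computed directly from the confusion-matrix counts of Table 2. This is a small sketch; the counts used in the example are hypothetical, not taken from our experiments.

```python
def precision(tp, fp):
    """Equation (1): correctly detected vulnerabilities over all reported ones (%)."""
    return 100.0 * tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation (2): correctly detected vulnerabilities over all known ones (%)."""
    return 100.0 * tp / (tp + fn) if tp + fn else 0.0

# Hypothetical scan: 30 findings reported, of which 6 are spurious (FP),
# and 4 real vulnerabilities were missed (FN).
p = precision(24, 6)   # 80.0
r = recall(24, 4)      # about 85.7
```

A tool that reports many spurious findings lowers precision without affecting recall, which is exactly the pattern discussed for several scanners in Section 5.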

TABLE 2 Notation and abbreviations

True Positive (TP): A real vulnerability correctly detected by the tool.
False Positive (FP): An issue incorrectly reported as a vulnerability.
True Negative (TN): No vulnerability is present, and the tool confirms this by not detecting any.
False Negative (FN): The tool does not identify a vulnerability that is actually present.

OWASP WBE
The OWASP benchmark project proposed a system for evaluating the effectiveness of static analysis tools, called the WBE results interpretation guide. 78 The guide is a visual representation of a tool's detection performance, based on its fall-out (false positive) and recall rates. As shown in Figure 3, the line extending from the point (0%, 0%) to (100%, 100%) is the "guessing line," where the true positive (TP) rate equals the false positive (FP) rate: performance on this line is equivalent to random selection. A tool whose FP-rate/TP-rate point is located in the top right corner reported everything as a vulnerability; a point in the bottom left corner means that the tool reported no vulnerabilities. The top left corner is the ideal location, indicating the best detection accuracy.

Youden index
The Youden index 80 was proposed to evaluate the performance of analytical (diagnostic) tests. It outputs values in the range [−1, 1]: a value of 1 (perfect detection) indicates detection of all vulnerabilities with no false positives; −1 indicates only false positives and no true positives (no actual vulnerabilities detected); and a value of 0 means the tool returns the same result for a web application with and without vulnerabilities, that is, an uninformative result. Equation (3) shows the formula for calculating the Youden index, J = TPR − FPR (equivalently, sensitivity + specificity − 1).
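A sketch of the computation, using the Table 2 counts, follows. The numbers are hypothetical; note that the TN count presumes knowledge of the non-vulnerable inputs, which a black-box evaluation must establish from the benchmark's ground truth.

```python
def youden(tp, fp, tn, fn):
    """Youden index J = TPR - FPR (Equation (3)), in the range [-1, 1]."""
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall / sensitivity
    fpr = fp / (fp + tn) if fp + tn else 0.0  # fall-out
    return tpr - fpr

# Hypothetical confusion-matrix counts for one scanner:
j = youden(tp=24, fp=6, tn=54, fn=4)  # about 0.757
```

A perfect scanner (fn = fp = 0) yields J = 1, while a scanner that flags everything yields J = 0, since its TPR and FPR are both 1.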

Web application security scanner evaluation criteria (WASSEC)
WASSEC 81 comprises six evaluation criteria/metrics that can help developers assess WVS detection capability.

EXPERIMENTAL RESULTS

Detection rates
Figure 4 presents the scanners' true-positive scores for the seven vulnerabilities in DVWA (BSQLi, CMDExec, CSRF, RXSS, SXSS, FI, and SQLi). As can be seen from the figure, while all scanners detected some CMDExec, RXSS, SXSS, and SQLi vulnerabilities, there was considerable variation in performance. For the RXSS vulnerabilities, for example, OWASP ZAP discovered 19; Acunetix and WebInspect detected five; Arachni detected four; Vega and AppScan detected three; and Skipfish and IronWASP detected one. (The remaining results can be obtained from Figure 4.) The variation in detection rates could be attributed to how individual scanners are developed for specific vulnerability classes, with licensing also appearing to have an influence: the free edition of Acunetix, for example, was only able to detect XSS vulnerabilities. Furthermore, the detection capabilities of the scanners also vary from one web application to another.
Figure 5 shows the true-positive scores for the nine vulnerabilities in WebGoat (DoS, CMDExec, BF, XSS, CQ, BP, ACF, AF, and BSQLi). Apart from IronWASP (which only detected two vulnerabilities), all tools were able to detect multiple vulnerabilities. Although no tool was able to detect all WebGoat vulnerabilities, the individual WVS performances are a clear indication that the tools were developed differently, leading to different strengths and weaknesses. The WebGoat vulnerabilities most detected were XSS and SQLi. Overall, the results give an indication of the commonalities and complementary strengths among the WVSs.

Scanning time
We also evaluated the efficiency of the scanners based on the time required to complete vulnerability detection in both DVWA and WebGoat. The processing time for each scanner was measured in seconds. We recorded the time for each scanner in both DVWA and WebGoat and present the results in Table 4, which shows that the running time for DVWA ranges from 30 to 360 seconds, and for WebGoat from 30 to 900 seconds. The performance differences between the scanners could be due to the number of URL injection points, with fewer injection points requiring less time. Variations in individual tool detection time could also be attributed to the internal security components of the applications: ZAP, for instance, took 360 seconds on DVWA, but only 60 seconds on WebGoat. The scan profile used by each tool for vulnerability detection could also impact the detection time.

Vulnerability severity
The vulnerabilities detected in DVWA and WebGoat were ranked according to their severity levels, 82 with high severity meaning the impact of the vulnerability is devastating; medium meaning that the impact is dangerous; low meaning that the impact is minor; and informational severity having a negligible impact. One hundred and forty-six vulnerabilities were found in DVWA, of which 28 were of high severity; 29 medium; 50 low; and 39 informational. Acunetix and AppScan found the highest number of high-severity vulnerabilities in DVWA (10 each), followed by WebInspect (8). Not all open-source scanners found high-severity vulnerabilities, which could be attributed to the licensing and profile settings of the tools. OWASP ZAP, for example, detected 30 vulnerabilities in DVWA (five medium, 20 low, and five informational).
IronWASP, an open-source web application security tool, detected the fewest vulnerabilities. One hundred and nine vulnerabilities were found in WebGoat, of which 23 were of high severity; 26 medium; 23 low; and 37 informational. Acunetix, WebInspect, Vega, and AppScan found 10, seven, one, and five high-severity vulnerabilities, respectively.
The different severity ratings of vulnerabilities detected by the scanners in DVWA and WebGoat could be ascribed to the internal security architecture of the two web applications. The results also indicate that most open-source scanners could not detect high-severity web vulnerabilities.

DISCUSSION
This section presents a detailed analysis and evaluation of the tools.

Precision and recall analysis of scanners
In this study, both precision and recall were measured in the range 0%-100%: an effective tool, with no false negatives or false positives, would have a value of 100% for both precision and recall. Figures 6 and 7 show the WVSs' precision and recall values for DVWA and WebGoat, respectively. While the figures show that all scanners achieved a 100% recall score, indicating their ability to detect real vulnerabilities, there is considerable variation in their precision scores. This variation in precision could be attributed to each tool's uniqueness in vulnerability detection. Skipfish, for example, had a precision score of 75% for both DVWA and WebGoat, but Acunetix scored 68% for DVWA and 64% for WebGoat. ZAP, Arachni, and Vega all had precision scores of 56% with DVWA. These scores of less than 100% reflect the scanners flagging as vulnerabilities some issues that were not actual vulnerabilities (false positives).
Answer to RQ1: Detection capability of scanners. Both the open-source and commercial scanners were effective at detecting vulnerabilities in web applications, with the main difference between the two groups being their different levels of precision (false positives). This finding suggests that stakeholders should consider assessing the tools based on the lowest numbers of false positives. Some tools were very effective for a specific type of vulnerability: while Acunetix, for example, was very effective at detecting RXSS vulnerabilities, OWASP ZAP was good at detecting command execution (CMDExec) vulnerabilities.

OWASP WBE
The OWASP WBE results interpretation guide (Section 3.5.3) provides a graphical representation of a tool's effectiveness, mapping its true-positive against its false-positive rates, as shown in Figure 3. In our experiments, as shown in Equations (4) and (5), we defined the total true-positive and total false-positive rates as the totals across both DVWA and WebGoat (TPt and FPt are the total true-positive and false-positive rates, respectively; TPd and FPd are the rates for DVWA; and TPw and FPw are the rates for WebGoat).
TPt = TPd + TPw (4)
FPt = FPd + FPw (5)
Figure 8 presents the WBE results for the WVSs under study. As explained in Section 3.5.3, each scanner's effectiveness is represented by its position. ZAP's position at the top right corner indicates that the tool detects and reports that "everything is vulnerable": both its true and false positive rates are high. IronWASP's position corresponds to the "nothing is vulnerable" category: both its true and false positive rates are low. The performance of IronWASP could be attributed to it having been designed for a specific type of vulnerability detection. The remaining scanners also fell into the "tool reports nothing is vulnerable" category, except Arachni, which was close to the "tool reports vulnerabilities randomly" category.
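The aggregation in Equations (4) and (5), and the quadrant reading of Figure 3, can be sketched as follows. This is illustrative only: the 0.5 cut-off and the example counts are our own assumptions, not part of the WBE guide.

```python
def totals(tp_d, fp_d, tp_w, fp_w):
    """Equations (4) and (5): sum the DVWA and WebGoat counts."""
    return tp_d + tp_w, fp_d + fp_w

def wbe_quadrant(tpr, fpr, cut=0.5):
    """Classify a (TPR, FPR) point into the four corners of Figure 3."""
    if tpr >= cut and fpr >= cut:
        return "everything is vulnerable"   # top right
    if tpr < cut and fpr < cut:
        return "nothing is vulnerable"      # bottom left
    if tpr >= cut:
        return "ideal"                      # top left
    return "worse than guessing"            # bottom right

# Hypothetical per-application counts for one scanner:
tp_t, fp_t = totals(tp_d=19, fp_d=12, tp_w=11, fp_w=9)
```

Dividing the aggregated counts by the respective ground-truth totals gives the rates that place each scanner on the Figure 3 plane.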
Answer to RQ2: False-positive analysis of scanners.
According to the experimental data, no single scanner offered ideal detection for all vulnerabilities. There were differences in the false-positive rates reported by both open-source and commercial scanners, with the rates being relatively higher for the open-source tools. This performance difference could be attributed to most commercial scanners having automated scanners and crawlers, which can be more efficient and effective than the manual configuration and intervention necessary for open-source scanners. The generally high false-positive rates reflected an almost random vulnerability detection.

Youden index
Figure 9 presents the Youden index (Section 3.5.4) of the scanners under study. IronWASP has the highest Youden index (0.83), which indicates its effectiveness in detecting known vulnerabilities with little or no false positives. The next highest scoring scanners were Skipfish, AppScan, WebInspect, and Acunetix, with 0.45, 0.31, 0.23, and 0.21, respectively. The results also indicate that several open-source scanners can function as effectively as some commercial web scanners. Thus, licensing alone should not be used as a standard metric for estimating the effectiveness of a tool.

Web application security scanner evaluation criteria (WASSEC)
Table 5 shows the WASSEC (Section 3.5.5) results for the scanners under test. The results in Table 5 indicate that Acunetix has the best protocol support, followed by AppScan and Skipfish. The differences for session management, however, were much more marginal. Although there are differences in the performance of the scanners, there are similarities in the areas of crawling, authentication, and testing. Figure 10 shows the average WASSEC results, according to which Acunetix has the best performance, followed by AppScan, with scores of 0.81 and 0.65, respectively. However, the third- and fourth-best performers, the open-source scanners Skipfish and ZAP (with scores of 0.43 and 0.40, respectively), also performed well.
Answer to RQ3: Effectiveness of the scanners for vulnerability detection. The experimental results show that there is no single WVS that can effectively detect all vulnerability classes. Although the results indicate that the commercial scanners Acunetix and AppScan may be the most effective, the open-source scanners Skipfish and ZAP also performed well, outperforming some commercial WVSs.

THREATS TO VALIDITY
A threat to internal validity relates to the number of vulnerabilities used in the experimental analysis, that is, the total vulnerabilities in DVWA and WebGoat. To mitigate this threat, we estimated the total number of vulnerabilities by aggregating each scanner's true positives to form a representative ground truth for our experiment. There were also challenges configuring the tools, because some of their functionality was not compatible with the (newer) Java platform employed in this study. We therefore used several versions with limited functionality to validate the effectiveness of the tools: this can affect the vulnerability detection rate compared to the full versions. A threat to external validity relates to the generalizability of our results, because we used vulnerability data from only two vulnerable web applications to verify the effectiveness of the eight WVSs studied. Our future work will address this threat by examining other vulnerabilities and implementation tools.

CONCLUSION AND FUTURE DIRECTIONS
This article has reported on a comparative study of the vulnerability detection capabilities of eight WVSs, using two vulnerable web applications (DVWA and WebGoat). Of the eight WVSs studied, three were commercial scanners (Acunetix, HP WebInspect, and IBM AppScan), and five were open-source scanners (OWASP ZAP, Skipfish, Arachni, Vega, and IronWASP). Their performance was examined using five metrics: precision; recall; Youden index; OWASP WBE; and the WASSEC. The experimental results show that the commercial scanners were effective at detecting security vulnerabilities, but that there were also open-source scanners (ZAP and Skipfish) that were equally effective at detecting some vulnerabilities (including command execution, cross-site scripting, and SQL injection). Based on the experimental analysis, we recommend improving the vulnerability detection capabilities of the commercial and open-source scanners, to enhance code coverage and detection rates, and to reduce false positives. The development of WVSs should be standardized, to improve the systems and promote the production of high-quality tools. Reports generated by scanners (such as the HTML and XML reports provided by ZAP) should not be difficult for users to interpret and understand. In our future work, we will extend this study to include more state-of-the-art tools, and to examine performance with different vulnerable web applications.

FIGURE 1 Web application evolution

FIGURE 2 Simplified view of a web application framework

FIGURE 4 Vulnerability detection capability (true positive count) in DVWA

FIGURE 5 Vulnerability detection capability (true positive count) in WebGoat

TABLE 4 Observed running time of the scanners

FIGURE 6 Precision and recall values of the scanners for DVWA

FIGURE 3 OWASP WBE interpretation guide

FIGURE 8 OWASP WBE results for the scanners under study