First Line Software is a premier provider of software engineering, software enablement, and digital transformation services. Headquartered in Cambridge, Massachusetts, the company has a global staff of 450 technical experts serving clients across North America, Europe, Asia, and Australia.
When Agile methods were first emerging, their proponents claimed that everything was going to be simple: we code units and the unit tests to test them, and the Customer writes acceptance tests. If the tests pass, it’s done.
Later came the understanding that the Customer is seldom able or willing to write any kind of formalized code. The testing was still necessary, so either the programmers or the testers had to do it.
At the same time, the traditional approach to testing, with its voluminous test plans requiring prior approval and their execution by extremely patient, stress-resistant people trained to click buttons, fits very poorly (or rather not at all) with Agile, which emphasizes automated unit testing and continuous integration. Systems for testing the final product through its GUI have been around for a while, but they traditionally suffer from the chasm between the testers’ needs (an emphasis on test scenarios, usage scenarios, or their equivalents) and the necessity of programming the interaction with the system. Approaches along the lines of “let’s have a tester click around, record the sequence, and call that a test scenario” produce results that break apart and turn to dust at the slightest contact with reality. And with Ajax web applications gaining ground, the testing problem was getting even more difficult.
From the programmers’ standpoint, the ideal way to describe a sequence of steps would be a programming language, preferably the same one used for coding the system itself. Analysts and testers prefer natural human language. An obvious approach is to separate the tests into a descriptive part easily understood by testers and analysts and the code that does the button-clicking, with some connection between the two. Such systems came about as “tester’s workbenches”: a test scenario in one window, the code in another, and a special UI bringing them together. They are really cumbersome and don’t play well with continuous integration.
Finally, the ideal: plain text describing a test scenario, plus code, plus a minimalistic mapping between the two, with no new artifacts (each new artifact means the cost of keeping it up to date), and with the possibility of running it automatically at build time. That’s how Cucumber came about, first for Ruby (the language it is written in), then integrated with many other programming languages. It fits really well into BDD (behavior-driven development), which is essentially an outgrowth of TDD (test-driven development) that addresses not just units but the development of the system as a whole: the system’s functionality is described through a set of usage scenarios, and developing the system is the way to achieve the described functionality. The success criterion is that those scenarios pass as tests. Technically, the solution is brilliantly simple: a relatively small engine uses regular expressions to find the pieces of code responsible for each step of a scenario, based on annotations (in Java) or their equivalents in other languages. It works with a scenario written in one of the supported natural languages following simple rules and simple structures, and with code capable of executing the desired steps. In principle, Cucumber doesn’t care what it automates, although its main area of application is describing the functionality of web applications.
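To illustrate the mechanism with a deliberately invented example (the scenario, class, and method names below are not from our project, and the package names reflect a recent Cucumber-JVM release rather than the version available at the time), a plain-text scenario might read:

    Feature: Shopping cart
      Scenario: Adding items to the cart
        Given an empty cart
        When I add 3 items priced 10 dollars each
        Then the cart total is 30 dollars

The engine matches each step against a Java method through the regular expression in its annotation, converting the captured groups into method parameters:

    import io.cucumber.java.en.Given;
    import io.cucumber.java.en.Then;
    import io.cucumber.java.en.When;

    import static org.junit.Assert.assertEquals;

    public class CartSteps {

        private int total;

        @Given("^an empty cart$")
        public void emptyCart() {
            total = 0;
        }

        @When("^I add (\\d+) items priced (\\d+) dollars each$")
        public void addItems(int count, int price) {
            total += count * price;
        }

        @Then("^the cart total is (\\d+) dollars$")
        public void checkTotal(int expected) {
            assertEquals(expected, total);
        }
    }

No other artifact ties the two together: if a step has no matching method, Cucumber simply reports it as undefined when the scenario runs.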
When our project was starting, it was apparent that we would need to automate testing – not just unit testing, but integration testing as well, for the following reasons:
The project was large enough, with functionality expanding constantly, that a few rounds of manual testing, however detailed, were not going to solve our problem. At the same time, maintaining a full-time testing team would have been too costly.
The project was a corporate web application, and as such it had lots of similar forms, with users performing fairly typical actions (a diverse range of UI approaches was not warranted, as it would only have confused the users). This gave us hope that we could build and maintain a library implementing those typical actions for use in various scenarios.
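To give an idea of what such a library might look like (the class and method names here are ours for this sketch, not the project’s actual code), a reusable helper for the typical “fill a field, pick a value, press a button” actions, built on a browser-automation API such as Selenium WebDriver, can be quite small:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.support.ui.Select;

    // A small helper for the form actions that recur on almost every
    // screen of a corporate web application.
    public class FormActions {

        private final WebDriver driver;

        public FormActions(WebDriver driver) {
            this.driver = driver;
        }

        // Type a value into the input whose name attribute matches the field name.
        public void fillField(String fieldName, String value) {
            driver.findElement(By.name(fieldName)).clear();
            driver.findElement(By.name(fieldName)).sendKeys(value);
        }

        // Choose an option in a drop-down list by its visible text.
        public void selectOption(String fieldName, String optionText) {
            new Select(driver.findElement(By.name(fieldName))).selectByVisibleText(optionText);
        }

        // Click the button with the given visible label.
        public void clickButton(String label) {
            driver.findElement(By.xpath("//button[normalize-space()='" + label + "']")).click();
        }
    }

Step definitions for many different scenarios can then delegate to a handful of helpers like this, which is what keeps the cost of maintaining the library manageable.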
Since this was a Java project, we used Java to describe the steps in the scenarios, even though the code would probably have been more compact in Ruby. For test automation we used Selenium 2.0, probably the most widely used such system, fresh off a new release after a long period of waiting. Execution of the test scenarios was integrated into the build process on the build server – Cucumber supports that, including through Maven.
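In Cucumber-JVM this integration typically amounts to a single JUnit runner class that Maven’s Surefire plugin picks up during the test phase. A minimal sketch (the feature path and glue package are assumed names, and the annotation packages again reflect a recent Cucumber-JVM release):

    import org.junit.runner.RunWith;

    import io.cucumber.junit.Cucumber;
    import io.cucumber.junit.CucumberOptions;

    // The class name ends in "Test" so the Maven Surefire plugin runs it
    // automatically during the test phase of the build.
    @RunWith(Cucumber.class)
    @CucumberOptions(
            features = "src/test/resources/features", // location of the .feature files (assumed layout)
            glue = "com.example.steps")                // package containing the step definitions (assumed name)
    public class RunScenariosTest {
        // Intentionally empty: the Cucumber runner discovers the scenarios
        // and the step-definition classes and executes them as JUnit tests.
    }

With this in place, a broken scenario fails the build just like a failing unit test.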
The world is not perfect, so our project does not adhere 100% faithfully to the BDD ideology: Cucumber is used only as a means of automating testing. Since we demo our system to various clients from time to time as a working prototype, we made the natural decision to cover the demo scenarios with tests. It is still too early to draw far-reaching conclusions, but some things are already clear at this point:
The main costs are incurred not in writing the plain-text tests, but in the Java automation code. One factor is that the system is being actively built and expanded; another is that modern dynamic web interfaces are objectively difficult to automate (a sketch of the explicit waiting this requires follows these observations).
A tester or an analyst can produce a lot of scenarios quickly, provided those scenarios pass on the first run. If the automation is not up to par and they fail, productivity declines significantly: too much time is spent investigating the reasons and waiting for fixes. Sometimes people even try to fix real system defects as if they were automation errors.
Keeping the plain-text scenarios current for the relatively stable parts of the system is comparatively cheap.
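To give a concrete flavor of why Ajax-heavy interfaces are hard to automate (the element ids below are invented, and the calls follow the Selenium 2.x API): every step that triggers an asynchronous update has to wait explicitly for the result before the scenario can move on, roughly like this:

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.support.ui.ExpectedConditions;
    import org.openqa.selenium.support.ui.WebDriverWait;

    public class AjaxSearchSteps {

        private WebDriver driver; // initialized elsewhere, e.g. in a before-scenario hook

        public void searchFor(String term) {
            driver.findElement(By.id("search-box")).sendKeys(term);
            driver.findElement(By.id("search-button")).click();
            // The results table is filled in by an Ajax call; without an explicit
            // wait, the next step would read the page before the response arrives.
            new WebDriverWait(driver, 10)
                    .until(ExpectedConditions.visibilityOfElementLocated(By.id("search-results")));
        }
    }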
An interesting question is who should write the text scenarios. We tried an approach where the analyst wrote “high-level” scenarios, one or two sentences per form, and discussed them with the architect, while the tester wrote down values for the various fields, button clicks, and so on. A drawback of this approach is that the tester does not fully understand the application’s logic and sometimes tries to adjust the scenario to fit the system. At the next stage we may try to have the analyst approve the scenarios – getting him to write them is highly unlikely.
We expect further returns on Cucumber in the future, once we start adding functionality with a solid and stable GUI already in place.