qttest-best-practices.qdoc
// Copyright (C) 2019 The Qt Company Ltd.
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only

/*!
    \page qttest-best-practices.qdoc

    \title Qt Test Best Practices

    \brief Guidelines for creating Qt tests.

    We recommend that you add Qt tests for bug fixes and new features. Before
    you try to fix a bug, add a \e {regression test} (ideally automatic) that
    fails before the fix, exhibiting the bug, and passes after the fix. While
    you're developing new features, add tests to verify that they work as
    intended.

    Conforming to a set of coding standards makes it more likely that Qt
    autotests will work reliably in all environments. For example, some tests
    need to read data from disk. If no standards are set for how this is
    done, some tests won't be portable: a test that assumes its test-data
    files are in the current working directory only works for an in-source
    build. In a shadow build (outside the source directory), such a test
    will fail to find its data.

    The following sections contain guidelines for writing Qt tests:

    \list
        \li \l {General Principles}
        \li \l {Writing Reliable Tests}
        \li \l {Improving Test Output}
        \li \l {Writing Testable Code}
        \li \l {Setting up Test Machines}
    \endlist

    \section1 General Principles

    The following sections provide general guidelines for writing unit tests:

    \list
        \li \l {Verify Tests}
        \li \l {Give Test Functions Descriptive Names}
        \li \l {Write Self-contained Test Functions}
        \li \l {Test the Full Stack}
        \li \l {Make Tests Complete Quickly}
        \li \l {Use Data-driven Testing}
        \li \l {Use Coverage Tools}
        \li \l {Select Appropriate Mechanisms to Exclude Tests}
        \li \l {Avoid Q_ASSERT}
    \endlist

    \section2 Verify Tests

    Write and commit your tests along with your fix or new feature on a new
    branch. Once you're done, check out the branch on which your work is
    based, and then check out only the test files from your new branch into
    it. This enables you to verify that the tests fail on the prior branch,
    and therefore actually catch a bug or exercise a new feature.

    For example, the workflow to fix a bug in the \c QDateTime class could be
    like this if you use the Git version control system:

    \list 1
        \li Create a branch for your fix and test:
            \c {git checkout -b fix-branch 5.14}
        \li Write a test and fix the bug.
        \li Build and test with both the fix and the new test, to verify that
            the new test passes with the fix.
        \li Add the fix and test to your branch:
            \c {git add tests/auto/corelib/time/qdatetime/tst_qdatetime.cpp src/corelib/time/qdatetime.cpp}
        \li Commit the fix and test to your branch:
            \c {git commit -m 'Fix bug in QDateTime'}
        \li To verify that the test actually catches something for which you
            needed the fix, check out the branch your own branch is based on:
            \c {git checkout 5.14}
        \li Check out only the test file from the fix branch:
            \c {git checkout fix-branch -- tests/auto/corelib/time/qdatetime/tst_qdatetime.cpp}

            Only the test file is now taken from the fix branch. The rest of
            the source tree is still at 5.14.
        \li Build and run the test to verify that it fails on 5.14, and
            therefore does indeed catch a bug.
        \li You can now return to the fix branch:
            \c {git checkout fix-branch}
        \li Alternatively, you can restore your work tree to a clean state on
            5.14:
            \c {git checkout HEAD -- tests/auto/corelib/time/qdatetime/tst_qdatetime.cpp}
    \endlist

    When you're reviewing a change, you can adapt this workflow to check that
    the change does indeed come with a test that catches the problem it fixes.

    \section2 Give Test Functions Descriptive Names

    Naming test cases is important. The test name appears in the failure
    report for a test run. For data-driven tests, the name of the data row
    also appears in the failure report. The names give those reading the
    report a first indication of what has gone wrong.

    Test function names should make it obvious what the function is trying to
    test. Do not simply use the bug-tracking identifier, because the
    identifiers become obsolete if the bug-tracker is replaced. Also, some
    bug-trackers may not be accessible to all users. When the bug report may
    be of interest to later readers of the test code, you can mention it in a
    comment alongside a relevant part of the test.

    Likewise, when writing data-driven tests, give the test-cases descriptive
    names that indicate which aspect of the functionality each focuses on. Do
    not simply number the test-cases or use bug-tracking identifiers; someone
    reading the test output will have no idea what the numbers or identifiers
    mean. You can add a comment to a test-row that mentions the bug-tracking
    identifier, when relevant. It's best to avoid spaces and characters that
    may be significant to the command-line shells on which you may want to
    run tests. This makes it easier to specify the test function and tag on
    \l{Qt Test Command Line Arguments}{the command-line} of your test
    program, for example to limit a test run to just one test-case.

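    For example, a data function along the following lines (a minimal sketch
    for a hypothetical date test) gives each row a short, shell-friendly name
    that describes the boundary condition it covers:

    \code
    void tst_QDate::addDays_data()
    {
        QTest::addColumn<QDate>("start");
        QTest::addColumn<int>("daysToAdd");
        QTest::addColumn<QDate>("expected");

        // Descriptive row names, not numbers or bug-tracking identifiers:
        QTest::newRow("within-month") << QDate(2000, 1, 10) << 5 << QDate(2000, 1, 15);
        QTest::newRow("leap-february") << QDate(2000, 2, 28) << 1 << QDate(2000, 2, 29);
        QTest::newRow("year-rollover") << QDate(1999, 12, 31) << 1 << QDate(2000, 1, 1);
    }
    \endcode
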
    \section2 Write Self-contained Test Functions

    Within a test program, test functions should be independent of each other
    and they should not rely upon previous test functions having been run. You
    can check this by running the test function on its own with
    \c {tst_foo testname}.

    Do not re-use instances of the class under test in several tests. Test
    instances (for example widgets) should not be member variables of the
    tests, but preferably be instantiated on the stack to ensure proper
    cleanup even if a test fails, so that tests do not interfere with
    each other.

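    A minimal sketch of this pattern, assuming a hypothetical \c MyWidget
    class under test:

    \code
    void tst_MyWidget::resizeUpdatesLayout()
    {
        // Created on the stack: the widget is destroyed at the end of this
        // function even if a check below fails, so later test functions
        // start from a clean state.
        MyWidget widget;
        widget.resize(200, 100);
        QCOMPARE(widget.layoutCount(), 1);
    }
    \endcode
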
    \section2 Test the Full Stack

    If an API is implemented in terms of pluggable or platform-specific
    backends that do the heavy-lifting, make sure to write tests that cover
    the code-paths all the way down into the backends. Testing the upper
    layer API parts using a mock backend is a nice way to isolate errors in
    the API layer from the backends, but it is complementary to tests that
    run the actual implementation with real-world data.

    \section2 Make Tests Complete Quickly

    Tests should not waste time by being unnecessarily repetitious, by using
    inappropriately large volumes of test data, or by introducing needless
    idle time.

    This is particularly true for unit testing, where every second of extra
    unit test execution time makes CI testing of a branch across multiple
    targets take longer. Remember that unit testing is separate from load and
    reliability testing, where larger volumes of test data and longer test
    runs are expected.

    Benchmark tests, which typically execute the same test multiple times,
    should be located in a separate \c tests/benchmarks directory and they
    should not be mixed with functional unit tests.

    \section2 Use Data-driven Testing

    \l{Chapter 2: Data Driven Testing}{Data-driven tests} make it easier to
    add new tests for boundary conditions found in later bug reports.

    Using a data-driven test rather than testing several items in sequence in
    a test saves repetition of very similar code and ensures later cases are
    tested even when earlier ones fail. It also encourages systematic and
    uniform testing, because the same tests are applied to each data sample.

    When a test is data-driven, you can specify its data-tag along with the
    test-function name, as \c{function:tag}, on the command-line of the test
    to run the test on just one specific test-case, rather than all test-cases
    of the function. This can be used for either a global data tag or a local
    tag, identifying a row from the function's own data; you can even combine
    them as \c{function:global:local}.

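    The test function that consumes the rows stays short, because the same
    checks run for every row. A minimal sketch, matching the hypothetical
    data function shown in an earlier section:

    \code
    void tst_QDate::addDays()
    {
        QFETCH(QDate, start);
        QFETCH(int, daysToAdd);
        QFETCH(QDate, expected);

        QCOMPARE(start.addDays(daysToAdd), expected);
    }
    \endcode

    On the command line, \c{./tst_qdate addDays:leap-february} would then run
    just that one row.
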
    \section2 Use Coverage Tools

    Use a coverage tool such as \l {Coco} or \l {gcov} to help write tests
    that cover as many statements, branches, and conditions as possible in
    the function or class being tested. The earlier this is done in the
    development cycle for a new feature, the easier it will be to catch
    regressions later when the code is refactored.

    \section2 Select Appropriate Mechanisms to Exclude Tests

    It is important to select the appropriate mechanism to exclude
    inapplicable tests.

    Use \l QSKIP() to handle cases where a whole test function is found at
    run-time to be inapplicable in the current test environment. When just a
    part of a test function is to be skipped, a conditional statement can be
    used, optionally with a \c qDebug() call to report the reason for skipping
    the inapplicable part.

    When there are known test failures that should eventually be fixed,
    \l QEXPECT_FAIL is recommended, as it supports running the rest of the
    test, when possible. It also verifies that the issue still exists, and
    lets the code's maintainer know if they unwittingly fix it, a benefit
    which is gained even when using the \l {QTest::}{Abort} flag.

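    As an illustrative sketch, with a hypothetical \c FileDialog class and a
    placeholder failure description, the two mechanisms are typically used
    like this:

    \code
    void tst_FileDialog::nativeDialogTitle()
    {
        // Skip the whole function when the environment cannot support it.
        if (QGuiApplication::platformName() == QLatin1String("offscreen"))
            QSKIP("Native dialogs are not available on the offscreen platform.");

        FileDialog dialog;
        dialog.open();

        // Known failure that is still being worked on; with Continue, the
        // remaining checks below still run, and the test reports an error
        // if the expected failure unexpectedly starts passing.
        QEXPECT_FAIL("", "Title is not yet localized, see the bug tracker", Continue);
        QCOMPARE(dialog.title(), QString("Open File"));

        QVERIFY(dialog.isVisible());
    }
    \endcode
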
    Test functions or data rows of a data-driven test can be limited to
    particular platforms, or to particular features being enabled, using
    \c{#if}. However, beware of \l moc limitations when using \c{#if} to
    skip test functions. The \c moc preprocessor does not have access to
    all the built-in macros of the compiler that are often used for
    feature detection. Therefore, \c moc might get a different result for
    a preprocessor condition from that seen by the rest of your code. This
    may result in \c moc generating meta-data for a test slot that the
    actual compiler skips, or omitting the meta-data for a test slot that
    is actually compiled into the class. In the first case, the test will
    attempt to run a slot that is not implemented. In the second case, the
    test will not attempt to run a test slot even though it should.

    If an entire test program is inapplicable for a specific platform or
    unless a particular feature is enabled, the best approach is to use the
    parent directory's build configuration to avoid building the test. For
    example, if the \c tests/auto/gui/someclass test is not valid for
    \macos, wrap its inclusion as a subdirectory in
    \c{tests/auto/gui/CMakeLists.txt} in a platform check:

    \badcode
    if(NOT APPLE)
        add_subdirectory(someclass)
    endif()
    \endcode

    or, if using \c qmake, add the following line to \c tests/auto/gui.pro:

    \badcode
    mac*: SUBDIRS -= someclass
    \endcode

    See also \l {Chapter 6: Skipping Tests with QSKIP}
    {Skipping Tests with QSKIP}.

    \section2 Avoid Q_ASSERT

    The \l Q_ASSERT macro causes a program to abort whenever the asserted
    condition is \c false, but only if the software was built in debug mode.
    In both release and debug-and-release builds, \c Q_ASSERT does nothing.

    \c Q_ASSERT should be avoided because it makes tests behave differently
    depending on whether a debug build is being tested, and because it causes
    a test to abort immediately, skipping all remaining test functions and
    returning incomplete or malformed test results.

    It also skips any tear-down or tidy-up that was supposed to happen at the
    end of the test, and might therefore leave the workspace in an untidy
    state, which might cause complications for further tests.

    Instead of \c Q_ASSERT, the \l QCOMPARE() or \l QVERIFY() macro variants
    should be used. They cause the current test to report a failure and
    terminate, but allow the remaining test functions to be executed and the
    entire test program to terminate normally. \l QVERIFY2() even allows a
    descriptive error message to be recorded in the test log.

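    For example, rather than asserting that a file opened, report the failure
    in the test log; a sketch, with a hypothetical \c MyDocument class:

    \code
    void tst_MyDocument::loadFromFile()
    {
        QFile file("document.xml");

        // Not Q_ASSERT(file.open(...)): on failure this records a message
        // in the log and moves on to the next test function.
        QVERIFY2(file.open(QIODevice::ReadOnly),
                 qPrintable("Cannot open document.xml: " + file.errorString()));

        MyDocument doc;
        QVERIFY(doc.load(&file));
        QCOMPARE(doc.title(), QString("Example"));
    }
    \endcode
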
    \section1 Writing Reliable Tests

    The following sections provide guidelines for writing reliable tests:

    \list
        \li \l {Avoid Side-effects in Verification Steps}
        \li \l {Avoid Fixed Timeouts}
        \li \l {Beware of Timing-dependent Behavior}
        \li \l {Avoid Bitmap Capture and Comparison}
    \endlist

    \section2 Avoid Side-effects in Verification Steps

    When performing verification steps in an autotest using \l QCOMPARE(),
    \l QVERIFY(), and so on, side-effects should be avoided. Side-effects
    in verification steps can make a test difficult to understand. Also,
    they can easily break a test in ways that are difficult to diagnose
    when the test is changed to use \l QTRY_VERIFY(), \l QTRY_COMPARE() or
    \l QBENCHMARK(). These can execute the passed expression multiple times,
    thus repeating any side-effects.

    When side-effects are unavoidable, ensure that the prior state is restored
    at the end of the test function, even if the test fails. This commonly
    requires use of an RAII (resource acquisition is initialization) class
    that restores state when the function returns, or a \c cleanup() method.
    Do not simply put the restoration code at the end of the test. If part of
    the test fails, such code will be skipped and the prior state will not be
    restored.

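    A small sketch of the RAII approach, here restoring the default locale
    even if a check in the middle of the test fails (the \c formatPrice()
    function is a hypothetical example of code under test):

    \code
    class LocaleRestorer
    {
    public:
        LocaleRestorer() : m_saved(QLocale()) {}
        ~LocaleRestorer() { QLocale::setDefault(m_saved); }
    private:
        QLocale m_saved;
    };

    void tst_Formatting::germanNumbers()
    {
        LocaleRestorer restorer; // default locale is restored on return
        QLocale::setDefault(QLocale(QLocale::German, QLocale::Germany));

        QCOMPARE(formatPrice(1234.5), QString("1.234,50"));
    }
    \endcode
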
    \section2 Avoid Fixed Timeouts

    Avoid using hard-coded timeouts, such as \c QTest::qWait(), to wait for
    some condition to become true. Consider using the \l QSignalSpy class,
    the \l QTRY_VERIFY() or \l QTRY_COMPARE() macros, or the \c QSignalSpy
    class in conjunction with the \c QTRY_ macro variants.

    The \c qWait() function can be used to set a delay for a fixed period
    between performing some action and waiting for some asynchronous behavior
    triggered by that action to be completed. For example, changing the state
    of a widget and then waiting for the widget to be repainted. However,
    such timeouts often cause failures when a test written on a workstation
    is executed on a device, where the expected behavior might take longer to
    complete. Increasing the fixed timeout to a value several times larger
    than needed on the slowest test platform is not a good solution, because
    it slows down the test run on all platforms, particularly for
    table-driven tests.

    If the code under test issues Qt signals on completion of the
    asynchronous behavior, a better approach is to use the \l QSignalSpy
    class to notify the test function that the verification step can now be
    performed.

    If there are no Qt signals, use the \c QTRY_COMPARE() and \c QTRY_VERIFY()
    macros, which periodically test a specified condition until it becomes
    true or some maximum timeout is reached. These macros prevent the test
    from taking longer than necessary, while avoiding breakages when tests
    are written on workstations and later executed on embedded platforms.

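    As a sketch of both approaches, assuming a hypothetical \c Downloader
    class that emits a \c finished() signal:

    \code
    void tst_Downloader::fetchesData()
    {
        Downloader downloader;
        QSignalSpy spy(&downloader, &Downloader::finished);

        downloader.start(QUrl("https://example.com/data"));

        // Waits until the signal arrives, or fails after the default
        // timeout, instead of pausing for a fixed qWait() period.
        QVERIFY(spy.wait());

        // Alternatively, when there is no signal to spy on, poll a condition:
        QTRY_VERIFY(downloader.isFinished());
        QCOMPARE(downloader.error(), Downloader::NoError);
    }
    \endcode
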
    If there are no Qt signals, and you are writing the test as part of
    developing a new API, consider whether the API could benefit from the
    addition of a signal that reports the completion of the asynchronous
    behavior.

    \section2 Beware of Timing-dependent Behavior

    Some test strategies are vulnerable to timing-dependent behavior of
    certain classes, which can lead to tests that fail only on certain
    platforms or that do not return consistent results.

    One example of this is text-entry widgets, which often have a blinking
    cursor that can make comparisons of captured bitmaps succeed or fail
    depending on the state of the cursor when the bitmap is captured. This,
    in turn, may depend on the speed of the machine executing the test.

    When testing classes that change their state based on timer events, the
    timer-based behavior needs to be taken into account when performing
    verification steps. Due to the variety of timing-dependent behavior,
    there is no single generic solution to this testing problem.

    For text-entry widgets, potential solutions include disabling the cursor
    blinking behavior (if the API provides that feature), waiting for the
    cursor to be in a known state before capturing a bitmap (for example, by
    subscribing to an appropriate signal if the API provides one), or
    excluding the area containing the cursor from the bitmap comparison.

    \section2 Avoid Bitmap Capture and Comparison

    While verifying test results by capturing and comparing bitmaps is
    sometimes necessary, it can be quite fragile and labor-intensive.

    For example, a particular widget may have different appearance on
    different platforms or with different widget styles, so reference bitmaps
    may need to be created multiple times and then maintained in the future
    as Qt's set of supported platforms evolves. Making changes that affect
    the bitmap thus means having to recreate the expected bitmaps on each
    supported platform, which would require access to each platform.

    Bitmap comparisons can also be influenced by factors such as the test
    machine's screen resolution, bit depth, active theme, color scheme,
    widget style, active locale (currency symbols, text direction, and so
    on), font size, transparency effects, and choice of window manager.

    Where possible, use programmatic means, such as verifying properties of
    objects and variables, instead of capturing and comparing bitmaps.

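    For instance, rather than grabbing a hypothetical \c checkbox widget and
    comparing the result against a stored reference image, verify the state
    that the pixels are derived from:

    \code
    // Fragile: depends on style, theme, fonts, resolution, and so on.
    // QCOMPARE(checkbox.grab().toImage(), QImage("checked-reference.png"));

    // More robust: check the properties that determine the rendering.
    QVERIFY(checkbox.isChecked());
    QCOMPARE(checkbox.text(), QString("Remember me"));
    \endcode
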
    \section1 Improving Test Output

    The following sections provide guidelines for producing readable and
    helpful test output:

    \list
        \li \l {Test for Warnings}
        \li \l {Avoid Printing Debug Messages from Autotests}
        \li \l {Write Well-structured Diagnostic Code}
    \endlist

    \section2 Test for Warnings

    Just as when building your software, if test output is cluttered with
    warnings you will find it harder to notice a warning that really is a
    clue to the emergence of a bug. It is thus prudent to regularly check
    your test logs for warnings, and other extraneous output, and investigate
    the causes. When they are signs of a bug, you can make warnings trigger
    test failure.

    When the code under test \e should produce messages, such as warnings
    about misguided use, it is also important to test that it \e does produce
    them when so used. You can test for expected messages from the code under
    test, produced by \l qWarning(), \l qDebug(), \l qInfo() and friends,
    using \l QTest::ignoreMessage(). This will verify that the message is
    produced and filter it out of the output of the test run. If the message
    is not produced, the test will fail.

    If an expected message is only output when Qt is built in debug mode, use
    \l QLibraryInfo::isDebugBuild() to determine whether the Qt libraries
    were built in debug mode. Using \c{#ifdef QT_DEBUG} is not enough, as it
    will only tell you whether \e{the test} was built in debug mode, and that
    does not guarantee that the \e{Qt libraries} were also built in debug
    mode.

    Your tests can (since Qt 6.3) verify that they do not trigger calls to
    \l qWarning() by calling \l QTest::failOnWarning(). This takes the
    warning message to test for or a \l QRegularExpression to match against
    warnings; if a matching warning is produced, it will be reported and
    cause the test to fail. For example, a test that should produce no
    warnings at all can call
    \c{QTest::failOnWarning(QRegularExpression(u".*"_s))}, which will match
    any warning at all.

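    A sketch combining the two mechanisms, assuming a hypothetical
    \c Thermostat class whose setter warns about out-of-range values:

    \code
    void tst_Thermostat::rejectsOutOfRangeTarget()
    {
        // Any unexpected warning should fail the test (Qt 6.3 or later);
        // u".*"_s assumes: using namespace Qt::StringLiterals;
        QTest::failOnWarning(QRegularExpression(u".*"_s));

        Thermostat thermostat;

        // This particular warning is expected; verify that it is emitted
        // and keep it out of the test log.
        QTest::ignoreMessage(QtWarningMsg,
                             "Thermostat::setTarget: 1000 is out of range");
        thermostat.setTarget(1000);

        QCOMPARE(thermostat.target(), thermostat.maximumTarget());
    }
    \endcode
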
    You can also set the environment variable \c QT_FATAL_WARNINGS to cause
    warnings to be treated as fatal errors. See \l qWarning() for details;
    this is not specific to autotests. If warnings would otherwise be lost in
    vast test logs, the occasional run with this environment variable set can
    help you to find and eliminate any that do arise.

    \section2 Avoid Printing Debug Messages from Autotests

    Autotests should not produce any unhandled warning or debug messages.
    This will allow the CI Gate to treat new warning or debug messages as
    test failures.

    Adding debug messages during development is fine, but these should be
    either disabled or removed before a test is checked in.

    \section2 Write Well-structured Diagnostic Code

    Any diagnostic output that would be useful if a test fails should be part
    of the regular test output rather than being commented-out, disabled by
    preprocessor directives, or enabled only in debug builds. If a test fails
    during continuous integration, having all of the relevant diagnostic
    output in the CI logs could save you a lot of time compared to enabling
    the diagnostic code and testing again, especially if the failure occurred
    on a platform that you don't have on your desktop.

    Diagnostic messages in tests should use Qt's output mechanisms, such as
    \c qDebug() and \c qWarning(), rather than \c stdio.h or \c iostream
    output mechanisms. The latter bypass Qt's message handling and prevent
    the \c -silent command-line option from suppressing the diagnostic
    messages. This could result in important failure messages being hidden
    in a large volume of debugging output.

    \section1 Writing Testable Code

    The following sections provide guidelines for writing code that is easy
    to test:

    \list
        \li \l {Break Dependencies}
        \li \l {Compile All Classes into Libraries}
    \endlist

    \section2 Break Dependencies

    The idea of unit testing is to exercise every class in isolation.
    However, many classes instantiate other classes directly, which makes it
    impossible to instantiate one class on its own. Therefore, you should use
    a technique called \e {dependency injection}, which separates object
    creation from object use: a factory is responsible for building object
    trees, while other objects manipulate these objects only through abstract
    interfaces.

    This technique works well for data-driven applications. For GUI
    applications, this approach can be difficult as objects are frequently
    created and destroyed. To verify the correct behavior of classes that
    depend on abstract interfaces, \e mocking can be used. For example, see
    \l {Googletest Mocking (gMock) Framework}.

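    A minimal sketch of dependency injection, using hypothetical classes: the
    class under test talks only to an abstract interface, so the test can
    substitute a fake implementation for the real backend:

    \code
    class PaymentGateway
    {
    public:
        virtual ~PaymentGateway() = default;
        virtual bool charge(int cents) = 0;
    };

    class Checkout
    {
    public:
        explicit Checkout(PaymentGateway *gateway) : m_gateway(gateway) {}
        bool completePurchase(int cents) { return m_gateway->charge(cents); }
    private:
        PaymentGateway *m_gateway;
    };

    // In the test, inject a fake gateway instead of the real backend:
    class FakeGateway : public PaymentGateway
    {
    public:
        bool charge(int cents) override { lastAmount = cents; return true; }
        int lastAmount = 0;
    };

    void tst_Checkout::chargesTheGateway()
    {
        FakeGateway gateway;
        Checkout checkout(&gateway);

        QVERIFY(checkout.completePurchase(499));
        QCOMPARE(gateway.lastAmount, 499);
    }
    \endcode
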
    \section2 Compile All Classes into Libraries

    In small to medium-sized projects, a build script typically lists all
    source files and then compiles the executable in one go. This means that
    the build scripts for the tests must list the needed source files again.

    It is easier to list the source files and the headers only once, in a
    script that builds a static library. Then the \c main() function is
    linked against the static library to build the executable, and the tests
    are linked against the static library as well.

    For projects where the same source files are used in building several
    programs, it may be more appropriate to build the shared classes into a
    dynamically-linked (or shared object) library that each program,
    including the test programs, can load at run-time. Again, having the
    compiled code in a library helps to avoid duplication in the description
    of which components to combine to make the various programs.

    \section1 Setting up Test Machines

    The following sections discuss common problems caused by test machine
    setup:

    \list
        \li \l {Screen Savers}
        \li \l {System Dialogs}
        \li \l {Display Usage}
        \li \l {Window Managers}
    \endlist

    All of these problems can typically be solved by the judicious use of
    virtualisation.

    \section2 Screen Savers

    Screen savers can interfere with some of the tests for GUI classes,
    causing unreliable test results. Screen savers should be disabled to
    ensure that test results are consistent and reliable.

    \section2 System Dialogs

    Dialogs displayed unexpectedly by the operating system or other running
    applications can steal input focus from widgets involved in an autotest,
    causing unreproducible failures.

    Examples of typical problems include online update notification dialogs
    on macOS, false alarms from virus scanners, scheduled tasks such as virus
    signature updates, software updates pushed out to workstations, and chat
    programs popping up windows on top of the stack.

    \section2 Display Usage

    Some tests use the test machine's display, mouse, and keyboard, and can
    thus fail if the machine is being used for something else at the same
    time or if multiple tests are run in parallel.

    The CI system uses dedicated test machines to avoid this problem, but if
    you don't have a dedicated test machine, you may be able to solve this
    problem by running the tests on a second display.

    On Unix, one can also run the tests on a nested or virtual X-server, such
    as Xephyr. For example, to run the entire set of tests on Xephyr, execute
    the following commands:

    \code
    Xephyr :1 -ac -screen 1920x1200 >/dev/null 2>&1 &
    sleep 5
    DISPLAY=:1 icewm >/dev/null 2>&1 &
    cd tests/auto
    make
    DISPLAY=:1 make -k -j1 check
    \endcode

    Users of NVIDIA binary drivers should note that Xephyr might not be able
    to provide GLX extensions. Forcing Mesa libGL might help:

    \code
    export LD_PRELOAD=/usr/lib/mesa-diverted/x86_64-linux-gnu/libGL.so.1
    \endcode

    However, when tests are run on Xephyr and the real X-server with
    different libGL versions, the QML disk cache can make the tests crash.
    To avoid this, use \c QML_DISABLE_DISK_CACHE=1.

    Alternatively, use the offscreen plugin:

    \code
    TESTARGS="-platform offscreen" make check -k -j1
    \endcode

    \section2 Window Managers

    On Unix, at least two autotests (\c tst_examples and \c tst_gestures)
    require a window manager to be running. Therefore, if running these
    tests under a nested X-server, you must also run a window manager
    in that X-server.

    Your window manager must be configured to position all windows on the
    display automatically. Some window managers, such as Tab Window Manager
    (twm), have a mode for manually positioning new windows, and this
    prevents the test suite from running without user interaction.

    \note Tab Window Manager is not suitable for running the full suite of
    Qt autotests, as the \c tst_gestures autotest causes it to forget its
    configuration and revert to manual window placement.
*/