// Copyright (C) 2014 Klarälvdalens Datakonsult AB, a KDAB Group company, info@kdab.com, author Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GFDL-1.3-no-invariants-only

\brief The QStringIterator class provides a Unicode-aware iterator over QString.

QStringIterator is a Java-like, bidirectional, const iterator over the contents of a
QString. Unlike QString's own iterators, which operate on individual UTF-16 code units,
QStringIterator is Unicode-aware: it transparently handles the \e{surrogate pairs}
that may be present in a QString, and returns the individual Unicode code points.

You can create a QStringIterator that iterates over a given
QStringView by passing the string view to the QStringIterator's constructor:

\snippet code/src_corelib_text_qstringiterator.cpp 0

A newly created QStringIterator points before the first position in the
string. You can check whether the iterator can be advanced by
calling hasNext(), and actually advance it (and obtain the next code point)
by calling next():

\snippet code/src_corelib_text_qstringiterator.cpp 1

Similarly, the hasPrevious() and previous() functions can be used to iterate backwards.

The peekNext() and peekPrevious() functions return the code point
immediately after and immediately before the iterator's current position,
respectively, but unlike next() and previous() they do not move the iterator.
Conversely, the advance() and recede() functions move the iterator
forward and backward by one code point, respectively, but they
do not return the code point that the iterator has moved over.
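
The difference between peeking, reading, and skipping can be sketched with a
small standalone cursor. This is only an illustration under stated assumptions:
\c CodePointCursor is a hypothetical class built on \c std::u16string, it is not
the Qt implementation, and it always reports \c{U+FFFD} for undecodable data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical cursor mirroring the Java-style interface described above.
// It sits between code units; it is NOT the Qt class.
class CodePointCursor
{
public:
    explicit CodePointCursor(std::u16string text) : m_text(std::move(text)) {}

    bool hasNext() const { return m_pos < m_text.size(); }
    std::size_t position() const { return m_pos; }

    // Decodes the code point after the cursor without moving the cursor.
    char32_t peekNext() const
    {
        std::size_t len = 0;
        return decodeAt(m_pos, len);
    }

    // Decodes the code point after the cursor and moves past it.
    char32_t next()
    {
        std::size_t len = 0;
        const char32_t cp = decodeAt(m_pos, len);
        m_pos += len;
        return cp;
    }

    // Moves past the next code point without returning it.
    void advance() { (void)next(); }

private:
    // Decodes the code point starting at index i; len receives the number
    // of code units it occupies (1 or 2).
    char32_t decodeAt(std::size_t i, std::size_t &len) const
    {
        assert(i < m_text.size()); // precondition: hasNext()
        const char16_t cur = m_text[i];
        len = 1;
        if (cur < 0xD800 || cur > 0xDFFF)
            return cur; // not a surrogate: one unit, one code point
        if (cur <= 0xDBFF && i + 1 < m_text.size()) {
            const char16_t low = m_text[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                len = 2; // well-formed surrogate pair
                return 0x10000 + ((char32_t(cur) - 0xD800) << 10)
                       + (char32_t(low) - 0xDC00);
            }
        }
        return 0xFFFD; // lone surrogate
    }

    std::u16string m_text;
    std::size_t m_pos = 0;
};
```

In this sketch, \c peekNext() leaves \c position() unchanged, while \c next()
and \c advance() both move the cursor by one or two code units depending on
what was decoded.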

\section1 Unicode Handling

QString and all of its functions work in terms of UTF-16 code units. Unicode code points
that fall outside the Basic Multilingual Plane (U+10000 to U+10FFFF) are therefore
represented by \e{surrogate pairs} in a QString, that is, sequences of two
UTF-16 code units that together encode a single code point.
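
The arithmetic behind surrogate pairs can be sketched as follows. The helper
names (\c requiresSurrogates, \c toSurrogatePair, \c fromSurrogatePair) are
hypothetical and exist only for this illustration; the bit layout itself is the
standard UTF-16 encoding form.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helpers, not part of Qt: standard UTF-16 surrogate arithmetic.

// Returns true if the code point needs two UTF-16 code units.
constexpr bool requiresSurrogates(char32_t cp) { return cp >= 0x10000; }

// Splits a supplementary code point (U+10000..U+10FFFF) into {high, low}.
inline std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000; // 20 significant bits remain
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // upper 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // lower 10 bits
}

// Recombines a high/low surrogate pair into a single code point.
constexpr char32_t fromSurrogatePair(char16_t high, char16_t low)
{
    return 0x10000 + ((char32_t(high) - 0xD800) << 10) + (char32_t(low) - 0xDC00);
}
```

For example, U+1D11E splits into the code units 0xD834 and 0xDD1E, and
recombining those two units yields U+1D11E again.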

QStringIterator automatically handles surrogate pairs inside a QString
and returns the correctly decoded code point, while also moving the iterator by
the right number of code units to match the decoded code points.

\snippet code/src_corelib_text_qstringiterator.cpp 2
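
To see why the step size matters, the following sketch counts code points in a
UTF-16 string by stepping two code units over each well-formed surrogate pair
and one code unit otherwise. The function \c countCodePoints is a hypothetical
helper over \c std::u16string, written for this illustration only; counting a
lone surrogate as one code point is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Qt: counts code points in UTF-16 data.
// A well-formed surrogate pair counts once; a lone surrogate also counts once.
inline std::size_t countCodePoints(const std::u16string &s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++count) {
        const bool high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool pair = high && i + 1 < s.size()
                          && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        i += pair ? 2 : 1; // step over the whole code point
    }
    return count;
}
```

A string made of four code units that contains one surrogate pair therefore
holds three code points, which is the count a code-point-aware iterator would
visit.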

If the iterator is not able to decode the next code point (or the previous
one, when iterating backwards), it returns \c{0xFFFD}, that is,
Unicode's replacement character (see QChar::ReplacementCharacter).
It is possible to make QStringIterator return another value when it encounters
a decoding problem; refer to each function's documentation for details.
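
A checked forward step with a configurable fallback value might look like the
sketch below. The function \c nextChecked and its parameters are hypothetical
and operate on \c std::u16string rather than QString; the sketch assumes that an
undecodable surrogate consumes exactly one code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a checked next(): decodes one code point from s
// starting at pos and moves pos past the code units it consumed.
inline char32_t nextChecked(const std::u16string &s, std::size_t &pos,
                            char32_t invalidAs = 0xFFFD)
{
    assert(pos < s.size()); // precondition: there must be a next code unit
    const char16_t cur = s[pos++];
    const bool isHigh = cur >= 0xD800 && cur <= 0xDBFF;
    const bool isLow = cur >= 0xDC00 && cur <= 0xDFFF;
    if (!isHigh && !isLow)
        return cur; // plain BMP code point
    if (isHigh && pos < s.size()) {
        const char16_t low = s[pos];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            ++pos; // consume the second half of the pair
            return 0x10000 + ((char32_t(cur) - 0xD800) << 10)
                   + (char32_t(low) - 0xDC00);
        }
    }
    return invalidAs; // lone surrogate: only one code unit was consumed
}
```

Passing a different \c invalidAs value, for example \c{U'?'}, lets the caller
distinguish or replace undecodable data without changing how far the position
moves.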

\section1 Unchecked Iteration

It is possible to optimize iteration over a QString's contents by skipping
some checks. This is in general not safe to do, because a QString is allowed
to contain malformed UTF-16 data; however, if a given QString can be trusted
to be well-formed, the optimized \e{unchecked} functions can be used.

QStringIterator provides \e{unchecked} counterparts for next(),
peekNext(), advance(), previous(), peekPrevious(), and recede().
They are called, respectively,
nextUnchecked(), peekNextUnchecked(), advanceUnchecked(),
previousUnchecked(), peekPreviousUnchecked(), and recedeUnchecked().
The counterparts work exactly like the original ones,
but they are faster because they are allowed to make certain assumptions about
the string's contents.

\note Be extremely careful when using QStringIterator's unchecked functions,
as using them on a string containing malformed data leads to undefined behavior.
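
The kind of check that an unchecked function can skip is illustrated by the
sketch below. The function \c nextAssumingValid is hypothetical and works on
\c std::u16string; it assumes the input is well-formed UTF-16 and that the
position is not at the end, and it verifies neither assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical unchecked decoder, not part of Qt. It assumes that s holds
// well-formed UTF-16 and that pos < s.size(); nothing here verifies that.
inline char32_t nextAssumingValid(const std::u16string &s, std::size_t &pos)
{
    const char16_t cur = s[pos++];
    if (cur >= 0xD800 && cur <= 0xDBFF) {
        // A high surrogate is assumed to be followed by a low surrogate,
        // so the next unit is read without any bounds or range check.
        const char16_t low = s[pos++];
        return 0x10000 + ((char32_t(cur) - 0xD800) << 10)
               + (char32_t(low) - 0xDC00);
    }
    return cur;
}
```

Compared with a checked decoder, this version has no branch for lone
surrogates and no end-of-range test before reading the second code unit, which
is exactly why it must only be fed trusted data.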

\fn QStringIterator::QStringIterator(QStringView string, qsizetype idx)

Constructs an iterator over the contents of \a string. The iterator will point
before position \a idx in the string.

The string view \a string must remain valid while the iterator is being used.

\fn QStringIterator::QStringIterator(const QChar *begin, const QChar *end)

Constructs an iterator which iterates over the range from \a begin to \a end.
The iterator will point before \a begin.

The range from \a begin to \a end must remain valid while the iterator is being used.

\fn QString::const_iterator QStringIterator::position() const

Returns the current position of the iterator.

\fn void QStringIterator::setPosition(QString::const_iterator position)

Sets the iterator's current position to \a position, which must lie within
the iterable range.

\fn bool QStringIterator::hasNext() const

Returns \c true if the iterator has not reached the end of the valid iterable range
and can therefore move forward; otherwise returns \c false.

\fn void QStringIterator::advance()

Advances the iterator by one Unicode code point.

\note Calling this function when the iterator is past the end of the iterable range
leads to undefined behavior.

\sa next(), hasNext()

\fn void QStringIterator::advanceUnchecked()

Advances the iterator by one Unicode code point.

\note Calling this function when the iterator is past the end of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa advance(), next(), hasNext()

\fn QStringIterator::peekNextUnchecked() const

Returns the Unicode code point that is immediately after the iterator's current
position. The current position is not changed.

\note Calling this function when the iterator is past the end of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa peekNext(), next(), hasNext()

\fn QStringIterator::peekNext(char32_t invalidAs = QChar::ReplacementCharacter) const

Returns the Unicode code point that is immediately after the iterator's current
position. The current position is not changed.

If the iterator is not able to decode the UTF-16 data after the iterator's current
position, this function returns \a invalidAs (by default, QChar::ReplacementCharacter,
which corresponds to \c{U+FFFD}).

\note Calling this function when the iterator is past the end of the iterable range
leads to undefined behavior.

\sa next(), hasNext()

\fn QStringIterator::nextUnchecked()

Advances the iterator's current position by one Unicode code point,
and returns the Unicode code point that the iterator moved over.

\note Calling this function when the iterator is past the end of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa next(), hasNext()

\fn QStringIterator::next(char32_t invalidAs = QChar::ReplacementCharacter)

Advances the iterator's current position by one Unicode code point,
and returns the Unicode code point that the iterator moved over.

If the iterator is not able to decode the UTF-16 data at the iterator's current
position, this function returns \a invalidAs (by default, QChar::ReplacementCharacter,
which corresponds to \c{U+FFFD}).

\note Calling this function when the iterator is past the end of the iterable range
leads to undefined behavior.

\sa peekNext(), hasNext()

\fn bool QStringIterator::hasPrevious() const

Returns \c true if the iterator is after the beginning of the valid iterable range
and can therefore move backwards; otherwise returns \c false.

\fn void QStringIterator::recede()

Moves the iterator back by one Unicode code point.

\note Calling this function when the iterator is before the beginning of the iterable range
leads to undefined behavior.

\sa previous(), hasPrevious()

\fn void QStringIterator::recedeUnchecked()

Moves the iterator back by one Unicode code point.

\note Calling this function when the iterator is before the beginning of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa recede(), previous(), hasPrevious()

\fn QStringIterator::peekPreviousUnchecked() const

Returns the Unicode code point that is immediately before the iterator's current
position. The current position is not changed.

\note Calling this function when the iterator is before the beginning of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa previous(), hasPrevious()

\fn QStringIterator::peekPrevious(char32_t invalidAs = QChar::ReplacementCharacter) const

Returns the Unicode code point that is immediately before the iterator's current
position. The current position is not changed.

If the iterator is not able to decode the UTF-16 data before the iterator's current
position, this function returns \a invalidAs (by default, QChar::ReplacementCharacter,
which corresponds to \c{U+FFFD}).

\note Calling this function when the iterator is before the beginning of the iterable range
leads to undefined behavior.

\sa previous(), hasPrevious()

\fn QStringIterator::previousUnchecked()

Moves the iterator's current position back by one Unicode code point,
and returns the Unicode code point that the iterator moved over.

\note Calling this function when the iterator is before the beginning of the iterable range
or on a QString containing malformed UTF-16 data leads to undefined behavior.

\sa previous(), hasPrevious()

\fn QStringIterator::previous(char32_t invalidAs = QChar::ReplacementCharacter)

Moves the iterator's current position back by one Unicode code point,
and returns the Unicode code point that the iterator moved over.

If the iterator is not able to decode the UTF-16 data before the iterator's current
position, this function returns \a invalidAs (by default, QChar::ReplacementCharacter,
which corresponds to \c{U+FFFD}).

\note Calling this function when the iterator is before the beginning of the iterable range
leads to undefined behavior.

\sa peekPrevious(), hasPrevious()
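
Backward decoding mirrors the forward case: the code unit before the position
is read first, and if it is a low surrogate, the unit before it must be a high
surrogate. The sketch below shows this under stated assumptions; the function
\c previousChecked is hypothetical, works on \c std::u16string, and assumes an
undecodable surrogate moves the position back by exactly one code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a checked previous(): decodes the code point that
// ends just before pos and moves pos back over the code units it consumed.
inline char32_t previousChecked(const std::u16string &s, std::size_t &pos,
                                char32_t invalidAs = 0xFFFD)
{
    assert(pos > 0 && pos <= s.size()); // precondition: something lies before pos
    const char16_t cur = s[--pos];
    const bool isHigh = cur >= 0xD800 && cur <= 0xDBFF;
    const bool isLow = cur >= 0xDC00 && cur <= 0xDFFF;
    if (!isHigh && !isLow)
        return cur; // plain BMP code point
    if (isLow && pos > 0) {
        const char16_t high = s[pos - 1];
        if (high >= 0xD800 && high <= 0xDBFF) {
            --pos; // consume the first half of the pair as well
            return 0x10000 + ((char32_t(high) - 0xD800) << 10)
                   + (char32_t(cur) - 0xDC00);
        }
    }
    return invalidAs; // lone surrogate: only one code unit was consumed
}
```

Starting from the end of a string and calling this function until the position
reaches zero visits the same code points as a forward pass over well-formed
data, in reverse order.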