Burp Sequencer is a tool for analysing the degree of randomness manifested by
a sample of data items. In the context of web applications, the sampled data
will typically consist of session tokens, anti-XSRF nonces, or other items on
whose unpredictability the application depends for its security.
Burp Sequencer can be run in two modes:
live capture - The sample of tokens is acquired in real-time from
the application. The analysis of randomness can be performed and updated
while the capture is in progress.
manual load - The sample of tokens has already been acquired and
is loaded into the tool. The analysis of randomness is then performed.
Burp Sequencer contains various options which can be configured to control
both the live capture of tokens and their subsequent analysis. Individual tests
for randomness can be turned on or off according to your requirements. In some
cases, it may be necessary to understand the nature of the tests performed, and
any unusual characteristics of your sample, in order to use Burp Sequencer most
effectively.
How the randomness tests work
Burp Sequencer employs standard statistical tests for randomness. These are
based on the principle of testing a hypothesis against a sample of evidence, and
calculating the probability of the observed data occurring, assuming that the
hypothesis is true:
The hypothesis to be tested is: that the tokens are randomly
generated.
Each test observes specific properties of the sample that are
likely to have certain characteristics if the tokens are randomly generated.
The probability of the observed characteristics occurring is
calculated, working on the assumption that the hypothesis is true.
If this probability falls below a certain level (the "significance
level") then the hypothesis is rejected and the tokens are deemed to be
non-random.
The significance level is a key parameter in this methodology. Using a lower
significance level means that stronger evidence is required to reject the
hypothesis that the tokens are randomly generated, and so increases the chance
that non-random data will be treated as random. There is no universally "right"
significance level to use for any particular purpose: scientific experiments
often use significance levels in the region of 1% to 5%; the standard FIPS tests
for randomness (which are implemented within Burp Sequencer) use significance
levels in the region of 0.002% to 0.03%. Burp Sequencer lets you choose what
significance level you wish to use to interpret its findings:
Each individual test reports the computed probability of the observed
data occurring, assuming that the hypothesis is true. This probability
represents the boundary significance level at which the hypothesis would be
rejected, based solely upon this test.
Aggregated results from multiple tests are presented in terms of the
number of bits of effective entropy within the token at various significance
levels, ranging from 0.001% to 10%. This summary enables you to see how your
choice of significance level affects the "quantity" of randomness deemed to
exist within the sample. In typical cases, this summary demonstrates that
the choice of significance level is a moot point because the tokens possess
either a clearly satisfactory or clearly unsatisfactory amount of randomness
for any of the significance levels that you may reasonably choose.
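As a rough illustration of the aggregated summary described above, the following Python sketch counts how many bits have a score at or above each significance level. The function name, the list of levels, and the example scores are all hypothetical; Burp Sequencer's own calculation may differ.

```python
def effective_entropy(bit_scores, levels=(0.00001, 0.0001, 0.001, 0.01, 0.1)):
    """Count the bits whose score is at or above each significance level.

    bit_scores: one probability per bit position (hypothetical input).
    levels: significance levels as fractions, so 0.00001 is 0.001% and 0.1 is 10%.
    """
    return {level: sum(1 for p in bit_scores if p >= level) for level in levels}


# Hypothetical scores for an 8-bit token: five strong bits and three weak ones.
scores = [0.62, 0.48, 0.91, 0.33, 0.27, 0.004, 0.00002, 0.0]
summary = effective_entropy(scores)
for level, bits in summary.items():
    print(f"{level:.3%}: {bits} bits")
```

If the counts barely change between the lowest and highest levels, the choice of significance level makes little practical difference for that sample.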
Some important caveats arise with any statistical test for randomness.
The results may contain false negatives and false positives for the following reasons:
Data that is generated in a completely deterministic way may be deemed
to be random by statistical tests. For example, a well-designed linear
congruential pseudo-random number generator, or an algorithm which computes
the hash of a sequential number, may produce seemingly random output even
though an attacker who knows the internal state of the generator can
extrapolate its output with complete reliability in both forwards and
reverse directions.
Data that is deemed to be non-random by statistical tests may not
actually be predictable in a practical situation because the patterns that
are discernible within the data do not sufficiently narrow down the range of
possible future outputs to a range that can be viably tested.
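The first caveat can be seen in a small Python sketch. It builds tokens by hashing a sequential counter with SHA-256 (an illustrative choice, not taken from any particular application) and measures how often each of the first 32 bit positions is set. The tokens are entirely predictable to anyone who knows the scheme, yet each bit looks balanced.

```python
import hashlib


def counter_tokens(count):
    """Deterministic tokens: the SHA-256 digest of 0, 1, 2, ..."""
    return [hashlib.sha256(str(i).encode()).digest() for i in range(count)]


def ones_fraction(tokens, bit_index):
    """Fraction of tokens with a one at bit_index (0 is the top bit of byte 0)."""
    byte, shift = divmod(bit_index, 8)
    ones = sum((t[byte] >> (7 - shift)) & 1 for t in tokens)
    return ones / len(tokens)


tokens = counter_tokens(2000)
fractions = [ones_fraction(tokens, i) for i in range(32)]
print(min(fractions), max(fractions))
```

A statistical test sees only the output, so it cannot distinguish this from a genuinely unpredictable source.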
Because of these caveats, the results of using Burp Sequencer should be
interpreted only as an indicative guide to the randomness of the sampled data.
The tests performed by Burp Sequencer divide into two levels of analysis:
character-level and bit-level.
Character-level analysis
The character-level tests operate on each character position of the token in
its raw form. First, the size of the character set at each position is counted -
this is the number of different characters that appear at each position within
the sample data. Then, the following tests are performed using this information:
Character count analysis. This test analyses the distribution of
characters used at each position within the token. If the sample is randomly
generated, the distribution of characters employed is likely to be
approximately uniform. At each position, the test computes the probability
of the observed distribution arising if the tokens are random.
Character transition analysis. This test analyses the transitions
between successive tokens in the sample. If the sample is randomly
generated, a character appearing at a given position is equally likely to be
followed in the next token by any one of the characters that is used at that
position. At each position, the test computes the probability of the
observed transitions arising if the tokens are random.
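The character count analysis can be approximated with a chi-square goodness-of-fit test at each position. The Python sketch below is one conventional way to do this, not necessarily the calculation Burp Sequencer performs. The helper chi2_sf, the function character_count_scores, and the rule that a constant position scores 0.0 are all assumptions made for the example, which also assumes that all tokens have the same length.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic x with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times the sum of (x/2)^i / i! for i < df/2.
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
    else:
        # Odd df: normal tail plus a finite series (Abramowitz and Stegun 26.4.4).
        total = math.erfc(math.sqrt(half))
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        for r in range(1, (df - 1) // 2 + 1):
            total += term
            term *= x / (2 * r + 1)
    return min(total, 1.0)


def character_count_scores(tokens):
    """Return one probability per character position for equal-length tokens."""
    scores = []
    for column in zip(*tokens):
        counts = Counter(column)
        k = len(counts)
        if k < 2:
            scores.append(0.0)  # assumption: a constant position is scored as non-random
            continue
        expected = len(column) / k
        stat = sum((c - expected) ** 2 / expected for c in counts.values())
        scores.append(chi2_sf(stat, k - 1))
    return scores


# Position 0 is perfectly uniform over "abcd"; position 1 is heavily skewed.
tokens = [c + s for c, s in zip("abcd" * 25, "x" * 95 + "y" * 5)]
scores = character_count_scores(tokens)
print(scores)
```

A transition test could reuse the same statistic by counting pairs of characters seen at the same position in successive tokens.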
Based on the above tests, the character-level analysis computes an overall
score for each character position - this is the lowest probability calculated at
each position by each of the character-level tests. The analysis then counts the
number of bits of effective entropy for various significance levels. Based on
the size of its character set, each position is assigned a number of bits of
entropy (2 bits if there are 4 characters, 3 bits if there are 8 characters,
etc.), and the total number of bits at or above each significance level is
calculated.
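The scoring described above can be sketched in a few lines of Python. The function names and the example scores are hypothetical, and the sketch leaves fractional bit counts unrounded for character sets whose size is not a power of two, which may not match Burp Sequencer's behaviour.

```python
import math


def charset_bits(tokens):
    """Bits of entropy assigned to each position from its character set size."""
    return [math.log2(len(set(column))) for column in zip(*tokens)]


def character_level_entropy(bits, count_scores, transition_scores, level):
    """Sum the bits of every position whose lowest test probability is >= level."""
    total = 0.0
    for b, count_p, transition_p in zip(bits, count_scores, transition_scores):
        if min(count_p, transition_p) >= level:
            total += b
    return total


tokens = ["a0x", "b1x", "c2x", "d3x", "a4x", "b5x", "c6x", "d7x"]
bits = charset_bits(tokens)  # 4, 8 and 1 distinct characters

# Hypothetical per-position probabilities from the two character-level tests.
count_scores = [0.4, 0.2, 1.0]
transition_scores = [0.3, 0.00005, 1.0]

strict = character_level_entropy(bits, count_scores, transition_scores, 0.01)
lenient = character_level_entropy(bits, count_scores, transition_scores, 0.00001)
print(bits, strict, lenient)
```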
Bit-level analysis
The bit-level tests are more powerful than the character-level tests. To
enable bit-level analysis, each token is converted into a set of bits, with the
total number of bits determined by the size of the character set at each
character position. If any positions employ a character set whose size is not a
round power of two, some information within the sample will be lost in the
conversion to a bit sequence. This loss is typically very small and does not
materially affect the accuracy of the bit-level results.
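The exact conversion used by Burp Sequencer is not described here, so the following Python sketch shows only one plausible scheme, and every detail of it is an assumption. Each character is replaced by its index in the sorted character set for its position, and the index is written using floor(log2(size)) bits. When the set size is not a power of two, the highest indices wrap around, which is where information is lost.

```python
def build_alphabets(tokens):
    """Sorted list of the distinct characters seen at each position."""
    return [sorted(set(column)) for column in zip(*tokens)]


def token_to_bits(token, alphabets):
    """Convert one token to a list of 0/1 values (assumed scheme, see above)."""
    bits = []
    for ch, alphabet in zip(token, alphabets):
        width = len(alphabet).bit_length() - 1      # floor(log2(size))
        index = alphabet.index(ch) % (1 << width)   # indices beyond 2**width wrap
        bits.extend((index >> shift) & 1 for shift in range(width - 1, -1, -1))
    return bits


sample = ["a0", "b1", "c0", "d1"]
alphabets = build_alphabets(sample)
print([token_to_bits(t, alphabets) for t in sample])
```

With this scheme a position that always holds the same character contributes no bits at all.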
When each token has been converted into a sequence of bits, the following
tests are performed at each bit position:
FIPS monobit test. This test analyses the distribution of ones
and zeroes at each bit position. If the sample is randomly generated, the
number of ones and zeroes is likely to be approximately equal. At each
position, the test computes the probability of the observed distribution
arising if the tokens are random. For each of the FIPS tests carried out, in
addition to reporting the probability of the observed data occurring, Burp
Sequencer also records whether each bit passed or failed the FIPS test. Note
that the FIPS pass criteria are recalibrated within Burp Sequencer to work
with arbitrary sample sizes. However, the formal specification for the FIPS
tests assumes a sample of precisely 20,000 tokens. Hence, if you wish to
obtain results that are strictly compliant with the FIPS specification, you
should ensure that you use a sample of 20,000 tokens.
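A monobit-style probability can be sketched in Python using the normal approximation to the binomial distribution. This is the textbook frequency test, and is not necessarily the exact calculation or pass threshold that Burp Sequencer applies. The input is assumed to be the values of a single bit position taken from successive tokens.

```python
import math


def monobit_p_value(bits):
    """Two-sided probability of seeing an imbalance at least this large by chance."""
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    return math.erfc(abs(ones - zeros) / math.sqrt(2 * n))


balanced = [0, 1] * 500            # 500 ones, 500 zeroes
skewed = [1] * 600 + [0] * 400     # 600 ones, 400 zeroes

p_balanced = monobit_p_value(balanced)
p_skewed = monobit_p_value(skewed)
print(p_balanced, p_skewed)
```

The balanced sequence is obviously not random, since it simply alternates, but this test looks only at the overall count and so cannot detect that. This is why several different tests are combined.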
FIPS poker test. This test divides the bit sequence at each
position into consecutive, non-overlapping groups of four, and derives a
four-bit number from each group. It then counts the number of occurrences of
each of the 16 possible numbers, and performs a chi-square calculation to
evaluate this distribution. If the sample is randomly generated, the
distribution of four-bit numbers is likely to be approximately uniform. At
each position, the test computes the probability of the observed
distribution arising if the tokens are random.
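The poker test statistic can be sketched in Python as follows. The chi-square formula is the standard one for 16 equally likely categories, generalised from the fixed FIPS sample size to any number of groups. The helper chi2_sf and the function poker_test are names invented for this example, and the probability they produce is not guaranteed to match the figure reported by Burp Sequencer.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic x with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
    else:
        total = math.erfc(math.sqrt(half))
        term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
        for r in range(1, (df - 1) // 2 + 1):
            total += term
            term *= x / (2 * r + 1)
    return min(total, 1.0)


def poker_test(bits):
    """Return (chi-square statistic, probability) for non-overlapping 4-bit groups."""
    k = len(bits) // 4                      # number of complete groups
    counts = [0] * 16
    for g in range(k):
        b = bits[4 * g:4 * g + 4]
        counts[b[0] * 8 + b[1] * 4 + b[2] * 2 + b[3]] += 1
    # Equivalent to (16 / k) * sum(f_i^2) - k, arranged to avoid rounding error.
    stat = (16 * sum(c * c for c in counts) - k * k) / k
    return stat, chi2_sf(stat, 15)          # 16 categories give 15 degrees of freedom


# Every four-bit value appears exactly ten times.
uniform = [(v >> s) & 1 for _ in range(10) for v in range(16) for s in (3, 2, 1, 0)]
constant = [0] * 640

print(poker_test(uniform))
print(poker_test(constant))
```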
FIPS runs tests. This test divides the bit sequence at each
position into runs of consecutive bits which have the same value. It then
counts the number of runs with a length of 1, 2, 3, 4, 5, and 6 and above.
If the sample is randomly generated, the number of runs with each of these
lengths is likely to be within a range determined by the size of the sample
set. At each position, the test computes the probability of the observed
runs occurring if the tokens are random.
FIPS long runs test. This test measures the longest run of bits
with the same value at each bit position. If the sample is randomly
generated, the longest run is likely to be within a range determined by the
size of the sample set. At each position, the test computes the probability
of the observed longest run arising if the tokens are random. Note that the
FIPS specification for this test only records a fail if the longest run of
bits is overly long. However, an overly short longest run of bits also
indicates that the sample is not random. Therefore some bits may record a
significance level that is below the FIPS pass level even though they do not
strictly fail the FIPS test.
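The counting that underlies both the runs test and the long runs test can be sketched in Python. This example only gathers the raw figures; the acceptable ranges for a given sample size are not reproduced here. Tallying runs of zeroes and runs of ones separately is an assumption of the sketch, and the function names are hypothetical.

```python
from itertools import groupby


def run_lengths(bits):
    """List of (value, length) for each run of identical consecutive bits."""
    return [(value, len(list(group))) for value, group in groupby(bits)]


def run_histogram(bits):
    """Count runs of length 1 to 5 and 6-or-more, separately for zeroes and ones."""
    hist = {0: [0] * 6, 1: [0] * 6}
    for value, length in run_lengths(bits):
        hist[value][min(length, 6) - 1] += 1
    return hist


def longest_run(bits):
    """Length of the longest run of identical bits, or 0 for an empty sequence."""
    return max((length for _, length in run_lengths(bits)), default=0)


bits = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
hist = run_histogram(bits)
longest = longest_run(bits)
print(hist, longest)
```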
Spectral tests. This test performs a sophisticated analysis of
the bit sequence at each position, and is capable of identifying evidence of
non-randomness in some samples which pass the other statistical tests. The
test works through the bit sequence and treats each series of consecutive
numbers as coordinates in a multi-dimensional space. It plots a point in
this space at each location determined by these co-ordinates. If the sample
is randomly generated, the distribution of points within this space is
likely to be approximately uniform; the appearance of clusters within the
space indicates that the data is likely to be non-random. At each position,
the test computes the probability of the observed distribution occurring if
the tokens are random. The test is repeated for multiple sizes of number
(between 1 and 8 bits) and for multiple numbers of dimensions (between 2 and
6).
Correlation test. Each of the other bit-level tests operates on
individual bit positions within the sampled tokens, and so the amount of
randomness at each bit position is calculated in isolation. Performing only
this type of test would prevent any meaningful assessment of the amount of
randomness in the token as a whole: a sample of longer tokens containing the
same bit value at each position may appear to contain more entropy than a
sample of shorter tokens containing different values at each position. Hence, it is
necessary to test for any statistically significant relationships between
the values at different bit positions within the tokens. If the sample is
randomly generated, a value at a given bit position is equally likely to be
accompanied by a one or a zero at any other bit position. At each position,
this test computes the probability of the relationships observed with bits
at other positions arising if the tokens are random. To prevent arbitrary
results, when a degree of correlation is observed between two bits, the test
adjusts the significance level of the bit whose significance level is lower
based on all of the other bit-level tests.
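One conventional way to test for a relationship between two bit positions is Pearson's chi-square test on a 2x2 table of their joint values. The Python sketch below uses that approach as an illustration. It is not claimed to be the calculation Burp Sequencer performs, and it does not attempt the significance adjustment described above. The function name and the handling of constant columns are assumptions.

```python
import math


def correlation_p_value(bits_a, bits_b):
    """Probability of the observed association between two bit columns arising by chance."""
    n = len(bits_a)
    table = [[0, 0], [0, 0]]
    for a, b in zip(bits_a, bits_b):
        table[a][b] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in row or 0 in col:
        return 1.0  # assumption: a constant column shows no measurable association
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-square upper tail with one degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


a = [0, 1] * 200
identical = correlation_p_value(a, a)                                    # second bit copies the first
independent = correlation_p_value([0, 0, 1, 1] * 100, [0, 1, 0, 1] * 100)
print(identical, independent)
```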
Compression test. This test does not use the statistical approach
employed by the other tests, but rather provides a simple intuitive
indication of the amount of entropy at each bit position. The test attempts
to compress the bit sequence at each position using standard ZLIB
compression. The results indicate the proportional reduction in the size of
the bit sequence when it was compressed. A higher degree of compression
indicates that the data is less likely to be randomly generated.
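The compression test is simple to approximate with Python's zlib module. How Burp Sequencer packs the bits before compressing them is not stated, so packing eight bits per byte is an assumption of this sketch, as are the function names.

```python
import random
import zlib


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, padding the final byte with zeroes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def compression_reduction(bits):
    """Proportional reduction in size after zlib compression (0.0 if none)."""
    raw = pack_bits(bits)
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return max(0.0, 1 - len(compressed) / len(raw))


rng = random.Random(0)  # fixed seed so the example is repeatable
random_bits = [rng.getrandbits(1) for _ in range(8000)]
constant_bits = [0] * 8000

print(compression_reduction(random_bits), compression_reduction(constant_bits))
```

Short sequences carry a fixed compression overhead, so the figure is only meaningful for reasonably large samples.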
Based on the above tests, the bit-level analysis computes an overall
score for each bit position - this is the lowest probability calculated at
each position by each of the bit-level tests. The analysis then counts the
number of bits of effective entropy for various significance levels.
Obtaining a sample of tokens
Before it is possible to analyse the randomness of the tokens generated by an
application, it is necessary to obtain a suitable sample of tokens. This can be
done in two ways: by performing a live capture of tokens directly from the
target, or by loading a sample of tokens that you have already acquired.
Performing a live capture
To perform a live capture, you need to locate a request within the target
application which returns somewhere in its response the session token or other
item that you want to analyse. You can do this using the "send to sequencer"
option within any of the other Burp tools.
Now switch to the "live capture" tab of Burp Sequencer. The tool maintains a
list of all the requests that have been sent to it. If the request you are
interested in is not already selected, click on it in the list of requests. The
response to the selected request is displayed within the "token location" panel.
The next step is to identify the location of the token you are interested in.
If the token appears as the value of a Set-Cookie directive, or the value of a
form field, you can select the relevant item from one of the drop-down lists.
Alternatively, you can select an arbitrary position within the response where
the token appears. If you do this, Burp Sequencer automatically identifies a
unique prefix and delimiter which encapsulate the portion of the response you
have selected. In most cases, the values automatically identified will work
correctly. In some situations, you may wish to tweak these by specifying your
own unique prefix or offset, or your own delimiter or token length.
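If you are collecting tokens with your own script, the prefix-and-delimiter approach is straightforward to reproduce. The Python sketch below is an illustration only: the function name, the sample response, and the cookie name are all made up for the example.

```python
def extract_token(response, prefix, delimiter):
    """Return the text between the first prefix and the next delimiter, or None."""
    start = response.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = response.find(delimiter, start)
    if end == -1:
        return None
    return response[start:end]


response = (
    "HTTP/1.1 200 OK\r\n"
    "Set-Cookie: SESSIONID=8f3a1c9e; Path=/\r\n"
    "\r\n"
    "<html><body>Welcome</body></html>"
)
token = extract_token(response, "SESSIONID=", ";")
print(token)
```

The prefix needs to be long enough to be unique within the response, otherwise the wrong item may be extracted.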
When you have identified the location of the token within the application's
response, you can configure various options affecting the live capture by
switching to the "capture options" tab. Here you can control the speed of token
acquisition, by specifying a number of request threads and a time throttle to
pause between requests. In general, you should try to obtain samples as quickly
as possible given the speed of your network connection and the target
application, to minimise the "loss" of tokens issued to other application users.
You can also instruct Burp Sequencer to ignore tokens whose length deviates
by a given threshold from the average token length. This can be useful if the
application occasionally returns an anomalous response containing a different item
in the location where the token normally appears.
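The length filter can be sketched in Python as follows. The function name is hypothetical, and measuring the threshold in characters is an assumption made for this example.

```python
def filter_by_length(tokens, threshold):
    """Keep tokens whose length is within threshold characters of the average length."""
    if not tokens:
        return []
    average = sum(len(t) for t in tokens) / len(tokens)
    return [t for t in tokens if abs(len(t) - average) <= threshold]


# Nine normal tokens and one anomalous response fragment.
captured = ["a1b2c3d4"] * 9 + ["<html>error page content</html>"]
kept = filter_by_length(captured, 3)
print(len(kept))
```

Note that a very long anomalous item pulls the average upwards, so the threshold should not be set too tightly.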
When you have configured any required live capture options, click the "start
capture" button to begin the live capture. Burp Sequencer will repeatedly issue
your request and extract the relevant token from the application's responses.
You can use the "pause" and "stop" buttons to control the progress of the
live capture. You can use the "copy" and "save" buttons to retrieve the current
sample of tokens, for use in any other tool.
As soon as 100 tokens have been captured, you can perform an analysis of the
tokens, to get an initial rough indication of the quality of their randomness.
Click the "analyse now" button to do this. If you check the "auto analyse" box,
Burp Sequencer will automatically perform an analysis and update the results
periodically during the live capture.
Obviously, a larger sample size enables a more reliable analysis. A sample of
5,000 tokens is sufficient to perform a reliable analysis for most purposes. The live
capture continues until 20,000 tokens have been captured, which is sufficient to
perform FIPS-compliant statistical tests.
Performing a manual load
To perform a manual load, you first need to obtain your own sample of tokens
from the target application through some means, such as your own script or the
output from an earlier live capture. The tokens need to be in a simple newline-delimited
text format.
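If your own script gathers the tokens, writing them out in this format takes only a few lines. The Python sketch below uses hypothetical function names, an example file name, and made-up token values, and assumes the tokens contain only ASCII characters.

```python
import tempfile
from pathlib import Path


def save_tokens(tokens, path):
    """Write one token per line."""
    Path(path).write_text("\n".join(tokens) + "\n", encoding="ascii")


def load_tokens(path):
    """Read tokens back, skipping any blank lines."""
    text = Path(path).read_text(encoding="ascii")
    return [line for line in text.splitlines() if line]


sample = ["8f3a1c9e", "0b77d2a4", "c41e90f3"]
with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "tokens.txt"
    save_tokens(sample, path)
    reloaded = load_tokens(path)
print(reloaded)
```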
Go to the "manual load" tab of Burp Sequencer and use the "load" or "paste"
button to load your tokens into the tool. The loaded tokens, together with
details of their size, are displayed for you to sense-check that the sample has
loaded correctly.
To perform the analysis of the loaded tokens, click the "analyse now" button.
Analysis results
The results window contains full details of all of the tests performed. The
summary tab is the first place to look to get an overall conclusion about the
degree of randomness in the sample. It includes a chart showing the number of
bits of effective entropy at or above each significance level. This provides an
intuitive verdict on the number of bits that pass the randomness tests for
different possible significance levels. In the example shown, a large number of
bits pass the tests even at the strictest significance level of 10%:
Within the "character-level" and "bit-level" tabs, you can drill down into
the detail of each type of test, to gain a deeper understanding of the
properties of the sample, to identify the causes of any anomalies, and to assess
the possibilities for token prediction. Within each group of tests, there is a
summary tab showing the overall score achieved by each position within the
token, and also a tab for each individual test, reporting the results of that
test and the details of any anomalies identified. For example, the following
shows the results of the FIPS monobit test:
Within the bit-level analysis, there is also a tab showing how the
character-level data was converted into a sequence of bits to enable the
bit-level tests. This will enable you to cross-reference individual bits within the
token back to the original character positions, if you need to.
Analysis options
Burp Sequencer lets you configure which individual tests are performed, and
how the raw token data should be interpreted, in the "options" tab. If the
tokens produced by the application have variable length, these will need to be
padded to enable the statistical tests to be performed. You can choose whether
the padding should be applied at the start or the end of each token, and you can
specify the character that will be used for padding. In most situations, padding the
start of tokens with the '0' character is the most appropriate option, but you should examine the tokens
produced by the application to determine whether a different setting is more
effective. You can also tell Burp Sequencer to Base64-decode the raw tokens before
analysing them, if that is necessary.
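The effect of these two options can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch pads every token to the length of the longest one, which is an assumption about how the padding is applied.

```python
import base64


def pad_tokens(tokens, pad_char="0", at_start=True):
    """Pad every token to the length of the longest one."""
    if not tokens:
        return []
    width = max(len(t) for t in tokens)
    if at_start:
        return [t.rjust(width, pad_char) for t in tokens]
    return [t.ljust(width, pad_char) for t in tokens]


def decode_base64(tokens):
    """Base64-decode each token into raw bytes."""
    return [base64.b64decode(t) for t in tokens]


padded_start = pad_tokens(["7", "42", "1000"])
padded_end = pad_tokens(["7", "42", "1000"], at_start=False)
decoded = decode_base64(["aGVsbG8="])
print(padded_start, padded_end, decoded)
```

For numeric tokens, padding at the start keeps the digits aligned by place value, whereas padding at the end shifts them and can obscure patterns in the low-order digits.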
The analysis results window also has an "options" tab which shows the
options that were used to generate the current analysis. You can modify these
within the results window and then click the "redo analysis" button to
re-perform the analysis with your new settings. For example, this enables you to
tweak the analysis options mid-way through a live capture, to reflect your
better understanding of the tokens' characteristics, or to isolate the effects
of any unusual characteristics manifested by your sample.
Copyright (c) 2010 PortSwigger Ltd. All rights reserved.