.\" Copyright 2019-2023 OARC, Inc.
.\" Copyright 2017-2018 Akamai Technologies
.\" Copyright 2006-2016 Nominum, Inc.
.\" All rights reserved.
.\"
.\" Licensed under the Apache License, Version 2.0 (the "License");
.\" you may not use this file except in compliance with the License.
.\" You may obtain a copy of the License at
.\"
.\" http://www.apache.org/licenses/LICENSE-2.0
.\"
.\" Unless required by applicable law or agreed to in writing, software
.\" distributed under the License is distributed on an "AS IS" BASIS,
.\" WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.\" See the License for the specific language governing permissions and
.\" limitations under the License.
.TH resperf 1 "@PACKAGE_VERSION@" "resperf"
.SH NAME
resperf \- test the resolution performance of a caching DNS server
.SH SYNOPSIS
.hy 0
.ad l
\fBresperf\-report\fR\ [\fB\-a\ \fIlocal_addr\fR]
[\fB\-d\ \fIdatafile\fR]
[\fB\-R\fR]
[\fB\-M\ \fImode\fR]
[\fB\-s\ \fIserver_addr\fR]
[\fB\-p\ \fIport\fR]
[\fB\-x\ \fIlocal_port\fR]
[\fB\-t\ \fItimeout\fR]
[\fB\-b\ \fIbufsize\fR]
[\fB\-f\ \fIfamily\fR]
[\fB\-e\fR]
[\fB\-D\fR]
[\fB\-y\ \fI[alg:]name:secret\fR]
[\fB\-h\fR]
[\fB\-i\ \fIinterval\fR]
[\fB\-m\ \fImax_qps\fR]
[\fB\-r\ \fIrampup_time\fR]
[\fB\-c\ \fIconstant_traffic_time\fR]
[\fB\-L\ \fImax_loss\fR]
[\fB\-C\ \fIclients\fR]
[\fB\-q\ \fImax_outstanding\fR]
[\fB\-F\ \fIfall_behind\fR]
[\fB\-v\fR]
[\fB\-W\fR]
[\fB\-O\ \fIoption=value\fR]
.ad
.hy
.hy 0
.ad l

\fBresperf\fR\ [\fB\-a\ \fIlocal_addr\fR]
[\fB\-d\ \fIdatafile\fR]
[\fB\-R\fR]
[\fB\-M\ \fImode\fR]
[\fB\-s\ \fIserver_addr\fR]
[\fB\-p\ \fIport\fR]
[\fB\-x\ \fIlocal_port\fR]
[\fB\-t\ \fItimeout\fR]
[\fB\-b\ \fIbufsize\fR]
[\fB\-f\ \fIfamily\fR]
[\fB\-e\fR]
[\fB\-D\fR]
[\fB\-y\ \fI[alg:]name:secret\fR]
[\fB\-h\fR]
[\fB\-i\ \fIinterval\fR]
[\fB\-m\ \fImax_qps\fR]
[\fB\-P\ \fIplot_data_file\fR]
[\fB\-r\ \fIrampup_time\fR]
[\fB\-c\ \fIconstant_traffic_time\fR]
[\fB\-L\ \fImax_loss\fR]
[\fB\-C\ \fIclients\fR]
[\fB\-q\ \fImax_outstanding\fR]
[\fB\-F\ \fIfall_behind\fR]
[\fB\-v\fR]
[\fB\-W\fR]
[\fB\-O\ \fIoption=value\fR]
.ad
.hy
.SH DESCRIPTION
\fBresperf\fR is a companion tool to \fBdnsperf\fR.
\fBdnsperf\fR was primarily designed for benchmarking authoritative
servers, and it does not work well with caching servers that are talking
to the live Internet.
One reason for this is that dnsperf uses a "self-pacing" approach, which is
based on the assumption that you can keep the server 100% busy simply by
sending it a small burst of back-to-back queries to fill up network buffers,
and then send a new query whenever you get a response back.
This approach works well for authoritative servers that process queries in
order and one at a time; it also works pretty well for a caching server in
a closed laboratory environment talking to a simulated Internet that's all
on the same LAN.
Unfortunately, it does not work well with a caching server talking
to the actual Internet, which may need to work on thousands of queries in
parallel to achieve its maximum throughput.
There have been numerous attempts to use dnsperf (or its predecessor,
queryperf) for benchmarking live caching servers, usually with poor results.
Therefore, a separate tool designed specifically for caching servers is
needed.
.SS "How resperf works"
Unlike the "self-pacing" approach of dnsperf, \fBresperf\fR works by sending
DNS queries at a controlled, steadily increasing rate.
By default, \fBresperf\fR will send traffic for 60 seconds, linearly
increasing the amount of traffic from zero to 100,000 queries per second (or
\fImax_qps\fR).

During the test, \fBresperf\fR listens for responses from the server and
keeps track of response rates, failure rates, and latencies.
It will also continue listening for responses for an additional 40 seconds
after it has stopped sending traffic, so that there is time for the server
to respond to the last queries sent.
This time period was chosen to be longer than the overall query timeout of
both Nominum CacheServe and current versions of BIND.

If the test is successful, the query rate will at some point exceed the
capacity of the server and queries will be dropped, causing the response
rate to stop growing or even decrease as the query rate increases.

The result of the test is a set of measurements of the query rate, response
rate, failure response rate, and average query latency as functions of time.
.SS "What you will need"
Benchmarking a live caching server is serious business.
A fast caching server like Nominum CacheServe, resolving a mix of cacheable
and non-cacheable queries typical of ISP customer traffic, is capable of
resolving well over 1,000,000 queries per second.
In the process, it will send more than 40,000 queries per second to
authoritative servers on the Internet, and receive responses to most of them.
Assuming an average request size of 50 bytes and a response size of 150
bytes, this amounts to some 1216 Mbps of outgoing and 448 Mbps of incoming
traffic.
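Roughly speaking, these figures combine the server's client-facing traffic
(the 1,000,000 queries per second and their responses) with its
Internet-facing traffic (the 40,000 queries per second and their responses);
a back-of-the-envelope check of the arithmetic:
.RS
.hy 0
.nf
out: (1,000,000 x 150 + 40,000 x 50) bytes/s x 8  = ~1216 Mbps
in:  (1,000,000 x 50 + 40,000 x 150) bytes/s x 8  =  ~448 Mbps
.fi
.hy
.RE
Of these, only the authoritative-side share (roughly 16 Mbps outgoing and
48 Mbps incoming) crosses the Internet connection.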
If your Internet connection can't handle the bandwidth, you will end up
measuring the speed of the connection, not the server, and may saturate the
connection causing a degradation in service for other users.

Make sure there is no stateful firewall between the server and the Internet,
because most of them can't handle the amount of UDP traffic the test will
generate and will end up dropping packets, skewing the test results.
Some will even lock up or crash.

You should run \fBresperf\fR on a machine separate from the server under test,
on the same LAN.
Preferably, this should be a Gigabit Ethernet network.
The machine running \fBresperf\fR should be at least as fast as the machine
being tested; otherwise, it may end up being the bottleneck.

There should be no other applications running on the machine running
\fBresperf\fR.
Performance testing at the traffic levels involved is essentially a
hard real-time application - consider the fact that at a query rate of
100,000 queries per second, if \fBresperf\fR gets delayed by just 1/100 of a
second, 1000 incoming UDP packets will arrive in the meantime.
This is more than most operating systems will buffer, which means packets
will be dropped.

Because the granularity of the timers provided by operating systems is
typically too coarse to accurately schedule packet transmissions at
sub-millisecond intervals, \fBresperf\fR will busy-wait between packet
transmissions, constantly polling for responses in the meantime.
Therefore, it is normal for \fBresperf\fR to consume 100% CPU during the
whole test run, even during periods where query rates are relatively low.

You will also need a set of test queries in the \fBdnsperf\fR file format.
See the \fBdnsperf\fR man page for instructions on how to construct this
query file.
To make the test as realistic as possible, the queries should be derived
from recorded production client DNS traffic, without removing duplicate
queries or other filtering.
With the default settings, \fBresperf\fR will use up to 3 million queries
in each test run.

If the caching server to be tested has a configurable limit on the number of
simultaneous resolutions, like the \fBmax\-recursive\-clients\fR statement
in Nominum CacheServe or the \fBrecursive\-clients\fR option in BIND 9, you
will probably have to increase it.
As a starting point, we recommend a value of 10000 for Nominum CacheServe
and 100000 for BIND 9.
Should the limit be reached, it will show up in the plots as an increase in
the number of failure responses.
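For BIND 9, for example, that starting point could be configured in
\fInamed.conf\fR along these lines (a sketch; merge it into your existing
options):
.RS
.hy 0
.nf
options {
    recursive\-clients 100000;
};
.fi
.hy
.RE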

The server being tested should be restarted at the beginning of each test to
make sure it is starting with an empty cache.
If the cache already contains data from a previous test run that used the
same set of queries, almost all queries will be answered from the cache,
yielding inflated performance numbers.

To use the \fBresperf\-report\fR script, you need to have \fBgnuplot\fR
installed.
Make sure your installed version of \fBgnuplot\fR supports the png terminal
driver.
If your \fBgnuplot\fR doesn't support png but does support gif, you can
change the line saying terminal=png in the \fBresperf\-report\fR script
to terminal=gif.
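For example, with GNU \fBsed\fR and assuming the script is writable and in
the current directory, the change could be made with:
.RS
.hy 0
.nf
sed \-i 's/terminal=png/terminal=gif/' resperf\-report
.fi
.hy
.RE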
.SS "Running the test"
\fBresperf\fR is typically invoked via the \fBresperf\-report\fR script,
which will run \fBresperf\fR with its output redirected to a file and then
automatically generate an illustrated report in HTML format.
Command line arguments given to \fBresperf\-report\fR will be passed on
unchanged to \fBresperf\fR.

When running \fBresperf\-report\fR, you will need to specify at least the
server IP address and the query data file.
A typical invocation will look like
.RS
.hy 0

.nf
resperf\-report \-s 10.0.0.2 \-d queryfile
.fi
.hy
.RE

With default settings, the test run will take at most 100 seconds (60
seconds of ramping up traffic and then 40 seconds of waiting for responses),
but in practice, the 60-second traffic phase will usually be cut short.
To be precise, \fBresperf\fR can transition from the traffic-sending phase
to the waiting-for-responses phase in three different ways:
.IP \(bu 2
Running for the full allotted time and successfully reaching the maximum
query rate (by default, 60 seconds and 100,000 qps, respectively).
Since this is a very high query rate, this will rarely happen (with today's
hardware); one of the other two conditions listed below will usually occur
first.
.IP \(bu 2
Exceeding 65,536 outstanding queries.
This often happens as a result of (successfully) exceeding the capacity of
the server being tested, causing the excess queries to be dropped.
The limit of 65,536 queries comes from the number of possible values for
the ID field in the DNS packet.
\fBresperf\fR needs to allocate a unique ID for each outstanding query, and is
therefore unable to send further queries if the set of possible IDs is
exhausted.
.IP \(bu 2
When \fBresperf\fR finds itself unable to send queries fast enough.
\fBresperf\fR will notice if it is falling behind in its scheduled query
transmissions, and if this backlog reaches 1000 queries, it will print
a message like "Fell behind by 1000 queries" (or whatever the actual number
is at the time) and stop sending traffic.
.PP
Regardless of which of the above conditions caused the traffic-sending phase
of the test to end, you should examine the resulting plots to make sure the
server's response rate is flattening out toward the end of the test.
If it is not, then you are not loading the server enough.
If you are getting the "Fell behind" message, make sure that the machine
running \fBresperf\fR is fast enough and has no other applications running.

You should also monitor the CPU usage of the server under test.
It should reach close to 100% CPU at the point of maximum traffic; if it does
not, you most likely have a bottleneck in some other part of your test setup,
for example, your external Internet connection.

The report generated by \fBresperf\-report\fR will be stored with a unique
file name based on the current date and time, e.g.,
\fI20060812-1550.html\fR.
The PNG images of the plots and other auxiliary files will be stored in
separate files beginning with the same date-time string.
To view the report, simply open the \fI.html\fR file in a web browser.

If you need to copy the report to a separate machine for viewing, make sure
to copy the .png files along with the .html file (or simply copy all the
files, e.g., using scp 20060812-1550.* host:directory/).
.SS "Interpreting the report"
The \fI.html\fR file produced by \fBresperf\-report\fR consists of two
sections.
The first section, "Resperf output", contains output from the \fBresperf\fR
program such as progress messages, a summary of the command line arguments,
and summary statistics.
The second section, "Plots", contains two plots generated by \fBgnuplot\fR:
"Query/response/failure rate" and "Latency".

The "Query/response/failure rate" plot contains three graphs.
The "Queries sent per second" graph shows the amount of traffic being sent to
the server; this should be very close to a straight diagonal line, reflecting
the linear ramp-up of traffic.

The "Total responses received per second" graph shows how many of the
queries received a response from the server.
All responses are counted, whether successful (NOERROR or NXDOMAIN) or not
(e.g., SERVFAIL).

The "Failure responses received per second" graph shows how many of the
queries received a failure response.
A response is considered to be a failure if its RCODE is neither NOERROR
nor NXDOMAIN.

By visually inspecting the graphs, you can get an idea of how the server
behaves under increasing load.
The "Total responses received per second" graph will initially closely
follow the "Queries sent per second" graph (often rendering it invisible in
the plot as the two graphs are plotted on top of one another), but when the
load exceeds the server's capacity, the "Total responses received per second"
graph may diverge from the "Queries sent per second" graph and flatten out,
indicating that some of the queries are being dropped.

The "Failure responses received per second" graph will normally show a
roughly linear ramp close to the bottom of the plot with some random
fluctuation, since typical query traffic will contain some small percentage
of failing queries randomly interspersed with the successful ones.
As the total traffic increases, the number of failures will increase
proportionally.

If the "Failure responses received per second" graph turns sharply upwards,
this can be another indication that the load has exceeded the server's
capacity.
This will happen if the server reacts to overload by sending SERVFAIL
responses rather than by dropping queries.
Since Nominum CacheServe and BIND 9 will both respond with SERVFAIL when
they exceed their \fBmax\-recursive\-clients\fR or \fBrecursive\-clients\fR
limit, respectively, a sudden increase in the number of failures could mean
that the limit needs to be increased.

The "Latency" plot contains a single graph marked "Average latency".
This shows how the latency varies during the course of the test.
Typically, the latency graph will exhibit a downwards trend because the
cache hit rate improves as ever more responses are cached during the test,
and the latency for a cache hit is much smaller than for a cache miss.
The latency graph is provided as an aid in determining the point where the
server gets overloaded, which can be seen as a sharp upwards turn in the
graph.
The latency graph is not intended for making absolute latency measurements
or comparisons between servers; the latencies shown in the graph are not
representative of production latencies due to the initially empty cache and
the deliberate overloading of the server towards the end of the test.

Note that all measurements are displayed on the plot at the horizontal
position corresponding to the point in time when the query was sent, not
when the response (if any) was received.
This makes it easy to compare the query and response rates; for example,
if no queries are dropped, the query and response graphs will be identical.
As another example, if the plot shows 10% failure responses at t=5 seconds,
this means that 10% of the queries sent at t=5 seconds eventually failed,
not that 10% of the responses received at t=5 seconds were failures.
.SS "Determining the server's maximum throughput"
|
|
Often, the goal of running \fBresperf\fR is to determine the server's
|
|
maximum throughput, in other words, the number of queries per second it is
|
|
capable of handling.
|
|
This is not always an easy task, because as a server is driven into overload,
|
|
the service it provides may deteriorate gradually, and this deterioration
|
|
can manifest itself either as queries being dropped, as an increase in the
|
|
number of SERVFAIL responses, or an increase in latency.
|
|
The maximum throughput may be defined as the highest level of traffic at
|
|
which the server still provides an acceptable level of service, but that
|
|
means you first need to decide what an acceptable level of service means in
|
|
terms of packet drop percentage, SERVFAIL percentage, and latency.
|
|

The summary statistics in the "Resperf output" section of the report
contain a "Maximum throughput" value which by default is determined from
the maximum rate at which the server was able to return responses, without
regard to the number of queries being dropped or failing at that point.
This method of throughput measurement has the advantage of simplicity, but
it may or may not be appropriate for your needs; the reported value should
always be validated by a visual inspection of the graphs to ensure that
service has not already deteriorated unacceptably before the maximum response
rate is reached.
It may also be helpful to look at the "Lost at that point" value in
the summary statistics; this indicates the percentage of the queries that
was being dropped at the point in the test when the maximum throughput was
reached.

Alternatively, you can make \fBresperf\fR report the throughput at the point
in the test where the percentage of queries dropped exceeds a given limit
(or the maximum as above if the limit is never exceeded).
This can be a more realistic indication of how much the server can be loaded
while still providing an acceptable level of service.
This is done using the \fB\-L\fR command line option; for example, specifying
\fB\-L 10\fR makes \fBresperf\fR
report the highest throughput reached before the server starts dropping more
than 10% of the queries.
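For example, building on the earlier invocation:
.RS
.hy 0
.nf
resperf\-report \-s 10.0.0.2 \-d queryfile \-L 10
.fi
.hy
.RE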

There is no corresponding way of automatically constraining results based on
the number of failed queries, because unlike dropped queries, resolution
failures will occur even when the server is not overloaded, and the
number of such failures is heavily dependent on the query data and network
conditions.
Therefore, the plots should be manually inspected to ensure that there is not
an abnormal number of failures.
.SH "GENERATING CONSTANT TRAFFIC"
In addition to ramping up traffic linearly, \fBresperf\fR also has the
capability to send a constant stream of traffic.
This can be useful when using \fBresperf\fR for tasks other than performance
measurement; for example, it can be used to "soak test" a server by
subjecting it to a sustained load for an extended period of time.

To generate a constant traffic load, use the \fB\-c\fR command line option,
together with the \fB\-m\fR option which specifies the desired constant
query rate.
For example, to send 10000 queries per second for an hour, use \fB\-m 10000
\-c 3600\fR.
This will include the usual gradual ramp-up of traffic at the beginning
(60 seconds by default), which may be useful to avoid initially overwhelming
a server that is starting with an empty cache.
To start the onslaught of traffic instantly, use \fB\-m 10000 \-c 3600
\-r 0\fR.

To be precise, \fBresperf\fR will do a linear ramp-up of traffic from 0 to
\fB\-m\fR queries per second over a period of \fB\-r\fR seconds, followed by
a plateau of steady traffic at \fB\-m\fR queries per second lasting for
\fB\-c\fR seconds, followed by waiting for responses for an extra 40
seconds.
Either the ramp-up or the plateau can be suppressed by supplying a duration
of zero seconds with \fB\-r 0\fR and \fB\-c 0\fR, respectively.
The latter is the default.

Sending traffic at high rates for hours on end will of course require very
large amounts of input data.
Also, a long-running test will generate a large amount of plot data, which is
kept in memory for the duration of the test.
To reduce the memory usage and the size of the plot file, consider
increasing the interval between measurements from the default of 0.5 seconds
using the \fB\-i\fR option in long-running tests.

When using \fBresperf\fR for long-running tests, it is important that the
traffic rate specified using the \fB\-m\fR option is one that both
\fBresperf\fR itself and the server under test can sustain.
Otherwise, the test is likely to be cut short as a result of either running
out of query IDs (because of large numbers of dropped queries) or of
\fBresperf\fR falling behind its transmission schedule.
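
Putting these pieces together, a one-hour soak test at 10000 queries per
second, reusing a small query file and taking measurements every 2 seconds,
might look like this (illustrative values; pick a rate your setup can
sustain):
.RS
.hy 0
.nf
resperf \-s 10.0.0.2 \-d queryfile \-R \-m 10000 \-c 3600 \-i 2 \-P soak.gnuplot
.fi
.hy
.RE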
.SS "Using DNS-over-HTTPS"
When using DNS-over-HTTPS, you must set the \fB\-O doh\-uri=...\fR option to
something that works with the server you're sending to.
Also note that the value for maximum outstanding queries will be used to
control the maximum concurrent streams within the HTTP/2 connection.
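For example (the URI is a placeholder; use one your server actually serves):
.RS
.hy 0
.nf
resperf \-s 10.0.0.2 \-d queryfile \-M doh \-O doh\-uri=https://dns.example.com/dns\-query
.fi
.hy
.RE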
.SH OPTIONS
Because the \fBresperf\-report\fR script passes its command line options
directly to the \fBresperf\fR program, they both accept the same set of
options, with one exception: \fBresperf\-report\fR automatically adds an
appropriate \fB\-P\fR to the \fBresperf\fR command line, and therefore does
not itself take a \fB\-P\fR option.

\fB-d \fIdatafile\fR
.br
.RS
Specifies the input data file.
If not specified, \fBresperf\fR will read from standard input.
.RE

\fB-R\fR
.br
.RS
Reopen the datafile if it runs out of data before the testing is completed.
This allows for long-running tests using a very small and simple query
datafile.
.RE

\fB-M \fImode\fR
.br
.RS
Specifies the transport mode to use, "udp", "tcp", "dot" or "doh".
Default is "udp".
.RE

\fB-s \fIserver_addr\fR
.br
.RS
Specifies the name or address of the server to which requests will be sent.
The default is the loopback address, 127.0.0.1.
.RE

\fB-p \fIport\fR
.br
.RS
Sets the port on which the DNS packets are sent.
If not specified, the standard DNS port (udp/tcp 53, DoT 853, DoH 443) is used.
.RE

\fB-a \fIlocal_addr\fR
.br
.RS
Specifies the local address from which to send requests.
The default is the wildcard address.
.RE

\fB-x \fIlocal_port\fR
.br
.RS
Specifies the local port from which to send requests.
The default is the wildcard port (0).

If acting as multiple clients and the wildcard port is used, each client
will use a different random port.
If a port is specified, the clients will use a range of ports starting
with the specified one.
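For example, with \fB\-C 4\fR and \fB\-x 2000\fR, the clients would use
ports 2000 through 2003.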
.RE

\fB-t \fItimeout\fR
.br
.RS
Specifies the request timeout value, in seconds.
\fBresperf\fR will no longer wait for a response to a particular request
after this many seconds have elapsed.
The default is 45 seconds.

\fBresperf\fR times out unanswered requests in order to reclaim query IDs so
that the query ID space will not be exhausted in a long-running test, such
as when "soak testing" a server for a day with \fB\-m 10000 \-c 86400\fR.
The timeouts and the ability to tune them are of little use in the more
typical use case of a performance test lasting only a minute or two.

The default timeout of 45 seconds was chosen to be longer than the query
timeout of current caching servers.
Note that this is longer than the corresponding default in \fBdnsperf\fR,
because caching servers can take many orders of magnitude longer to answer
a query than authoritative servers do.

If a short timeout is used, there is a possibility that \fBresperf\fR will
receive a response after the corresponding request has timed out; in this
case, a message like Warning: Received a response with an unexpected id: 141
will be printed.
.RE

\fB-b \fIbufsize\fR
.br
.RS
Sets the size of the socket's send and receive buffers, in kilobytes.
If not specified, the operating system's default is used.
.RE

\fB-f \fIfamily\fR
.br
.RS
Specifies the address family used for sending DNS packets.
The possible values are "inet", "inet6", or "any".
If "any" (the default value) is specified, \fBresperf\fR will use whichever
address family is appropriate for the server it is sending packets to.
.RE

\fB-e\fR
.br
.RS
Enables EDNS0 [RFC2671], by adding an OPT record to all packets sent.
.RE

\fB-D\fR
.br
.RS
Sets the DO (DNSSEC OK) bit [RFC3225] in all packets sent.
This also enables EDNS0, which is required for DNSSEC.
.RE

\fB-y \fI[alg:]name:secret\fR
.br
.RS
Add a TSIG record [RFC2845] to all packets sent, using the specified TSIG
key algorithm, name and secret, where the algorithm defaults to hmac-md5 and
the secret is expressed as a base-64 encoded string.
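For example (hypothetical key name and secret, using the default algorithm):
.RS
.hy 0
.nf
resperf \-s 10.0.0.2 \-d queryfile \-y test.key.example:c2VjcmV0LXNlY3JldA==
.fi
.hy
.RE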
.RE

\fB-h\fR
.br
.RS
Print a usage statement and exit.
.RE

\fB-i \fIinterval\fR
.br
.RS
Specifies the time interval between data points in the plot file.
The default is 0.5 seconds.
.RE

\fB-m \fImax_qps\fR
.br
.RS
Specifies the target maximum query rate (in queries per second).
This should be higher than the expected maximum throughput of the server
being tested.
Traffic will be ramped up at a linearly increasing rate until this value is
reached, or until one of the other conditions described in the section
"Running the test" occurs.
The default is 100000 queries per second.
.RE

\fB-P \fIplot_data_file\fR
.br
.RS
Specifies the name of the plot data file.
The default is \fIresperf.gnuplot\fR.
.RE

\fB-r \fIrampup_time\fR
.br
.RS
Specifies the length of time over which traffic will be ramped up.
The default is 60 seconds.
.RE

\fB-c \fIconstant_traffic_time\fR
.br
.RS
Specifies the length of time for which traffic will be sent at a constant
rate following the initial ramp-up.
The default is 0 seconds, meaning no sending of traffic at a constant rate
will be done.
.RE

\fB-L \fImax_loss\fR
.br
.RS
Specifies the maximum acceptable query loss percentage for purposes of
determining the maximum throughput value.
The default is 100%, meaning that \fBresperf\fR will measure the maximum
throughput without regard to query loss.
.RE

\fB-C \fIclients\fR
.br
.RS
Act as multiple clients.
Requests are sent from multiple sockets.
The default is to act as 1 client.
.RE

\fB-q \fImax_outstanding\fR
.br
.RS
Sets the maximum number of outstanding requests.
\fBresperf\fR will stop ramping up traffic when this many queries are
outstanding.
The default is 64k, and the limit is 64k per client.
.RE

\fB-F \fIfall_behind\fR
.br
.RS
Sets the maximum number of queries that can fall behind being sent.
\fBresperf\fR will stop when this many queries should have been sent, and
this limit can be relatively easy to hit if \fImax_qps\fR is set too high.
The default is 1000 and setting it to zero (0) disables the check.
.RE

\fB-v\fR
.br
.RS
Enables verbose mode to report about network readiness and congestion.
.RE

\fB-W\fR
.br
.RS
Log warnings and errors to standard output instead of standard error, making
it easier for scripts, tests and automation to capture all output.
.RE

\fB-O \fIoption=value\fR
.br
.RS
Set an extended long option for various things to control different aspects
of testing or protocol modules; see EXTENDED OPTIONS in \fBdnsperf\fR(1) for
a list of available options.
.RE
.SH "THE PLOT DATA FILE"
The plot data file is written by the \fBresperf\fR program and contains the
data to be plotted using \fBgnuplot\fR.
When running \fBresperf\fR via the \fBresperf\-report\fR script, there is
no need for the user to deal with this file directly, but its format and
contents are documented here for completeness and in case you wish to run
\fBresperf\fR directly and use its output for purposes other than viewing
it with \fBgnuplot\fR.

The first line of the file is a comment identifying the fields.
It may be recognized as a comment by its leading hash sign (#).

Subsequent lines contain the actual plot data.
For purposes of generating the plot data file, the test run is divided into
time intervals of 0.5 seconds (or some other length of time specified with
the \fB\-i\fR command line option).
Each line corresponds to one such interval, and contains the following values
as floating-point numbers:

\fBTime\fR
.br
.RS
The midpoint of this time interval, in seconds since the beginning of the
run
.RE

\fBTarget queries per second\fR
.br
.RS
The number of queries per second scheduled to be sent in this time interval
.RE

\fBActual queries per second\fR
.br
.RS
The number of queries per second actually sent in this time interval
.RE

\fBResponses per second\fR
.br
.RS
The number of responses received corresponding to queries sent in this time
interval, divided by the length of the interval
.RE

\fBFailures per second\fR
.br
.RS
The number of responses received corresponding to queries sent in this time
interval and having an RCODE other than NOERROR or NXDOMAIN, divided by the
length of the interval
.RE

\fBAverage latency\fR
.br
.RS
The average time between sending the query and receiving a response, for
queries sent in this time interval
.RE

\fBConnections\fR
.br
.RS
The number of connections made, including re-connections, during this time
interval.
This is only relevant to connection-oriented protocols, such as TCP and DoT.
.RE

\fBAverage connection latency\fR
.br
.RS
The average time between starting to connect and having the connection ready
for sending queries, for this time interval.
This is only relevant to connection-oriented protocols, such as TCP and DoT.
.RE
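
As an illustration only (the header text and the values here are made up,
and the exact set of columns may vary between versions), a plot data file
could look something like:
.RS
.hy 0
.nf
# time target_qps actual_qps responses_per_sec failures_per_sec avg_latency ...
2.750 4583.3 4583.3 4581.0 12.0 0.001232 0.0 0.000000
.fi
.hy
.RE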

.SH "SEE ALSO"
\fBdnsperf\fR(1)
.SH AUTHOR
Nominum, Inc.
.LP
Maintained by DNS-OARC
.LP
.RS
.I https://www.dns-oarc.net/
.RE
.LP
.SH BUGS
For issues and feature requests please use:
.LP
.RS
\fI@PACKAGE_URL@\fP
.RE
.LP
For questions and help please use:
.LP
.RS
\fI@PACKAGE_BUGREPORT@\fP
.RE
.LP