\documentclass{report}
\usepackage{epsfig}
\usepackage{path}
\usepackage{fancyvrb}

\def\dsc{{\sc dsc}}

\DefineVerbatimEnvironment%
{MyVerbatim}{Verbatim}
{frame=lines,framerule=0.8mm,fontsize=\small}

\renewcommand{\abstractname}{}

\begin{document}
\begin{titlepage}
\title{DSC Manual}
\author{Duane Wessels, Measurement Factory\\
Ken Keys, CAIDA\\
\\
http://dns.measurement-factory.com/tools/dsc/}
\date{\today}
\end{titlepage}

\maketitle

\begin{abstract}
\setlength{\parskip}{1ex}
\section{Copyright}

The DNS Statistics Collector (dsc)

Copyright 2003-2007 by The Measurement Factory, Inc., 2007-2008 by Internet
Systems Consortium, Inc., 2008-2019 by OARC, Inc.

{\em info@measurement-factory.com\/}, {\em info@isc.org\/}

\section{License}

{\dsc} is licensed under the terms of the BSD license:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
Neither the name of The Measurement Factory nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

\section{Contributors}
\begin{itemize}
\item Duane Wessels, Measurement Factory
\item Ken Keys, Cooperative Association for Internet Data Analysis
\item Sebastian Castro, New Zealand Registry Services
\end{itemize}
\end{abstract}

\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}

{\dsc} is a system for collecting and presenting statistics from
a busy DNS server.

\section{Components}

{\dsc} consists of the following components:
\begin{itemize}
\item A data collector
\item A data presenter, where data is archived and rendered
\item A method for securely transferring data from the collector
to the presenter
\item Utilities and scripts that parse XML and archive files from the collector
\item Utilities and scripts that generate graphs and HTML pages
\end{itemize}

\subsection{The Collector}

The collector is a binary program, named {\tt dsc\/}, which snoops
on DNS messages. It is written in C and uses {\em libpcap\/} for
packet capture.

{\tt dsc\/} uses a relatively simple configuration file called {\em
dsc.conf\/} to define certain parameters and options. The configuration
file also determines the {\em datasets\/} that {\tt dsc\/} collects.

A dataset is a 2-D array of counters of IP/DNS message properties.
You can define each dimension of the array independently. For
example, you might define a dataset categorized by DNS query type
along one dimension and TLD along the other.
{\tt dsc\/} dumps the datasets from memory to XML files every 60 seconds.
\subsection{XML Data Transfer}

You may run the {\dsc} collector on a remote machine. That
is, the collector may run on a different machine than where the
data is archived and displayed. {\dsc} includes some Perl and {\tt /bin/sh}
scripts to move XML files from collector to presenter. One
technique uses X.509 certificates and a secure HTTP server. The other
uses {\em rsync\/}, presumably over {\em ssh\/}.

\subsubsection{X.509/SSL}

To make this work, Apache/mod\_ssl should run on the machine where data
is archived and presented.
Data transfer is authenticated via SSL X.509 certificates. A Perl
CGI script handles all PUT requests on the server. If the client
certificate is allowed, XML files are stored in the appropriate
directory.

A shell script runs on the collector to upload the XML files. It
uses {\tt curl\/}\footnote{http://curl.haxx.se} to establish an
HTTPS connection. XML files are bundled together with {\tt tar\/}
before transfer to eliminate per-connection delays.
You could use {\tt scp\/} or {\tt rsync\/} instead of
{\tt curl\/} if you like.
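
The idea is roughly as follows; this hand-run sketch is only an
illustration (the hostname and file names are made up), since the
real work is done by the \path|upload-x509.sh| script described
later:

\begin{MyVerbatim}
% tar cf upload.tar *.xml
% curl --cacert cacert.pem --cert client.pem \
    --upload-file upload.tar https://presenter.example.com/
\end{MyVerbatim}
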
\path|put-file.pl| is the script that accepts PUT requests on the
HTTP server. The HTTP server validates the client's X.509 certificate.
If the certificate is invalid, the PUT request is denied. This
script reads environment variables to get X.509 parameters. The
uploaded data is stored in a directory based on the X.509 Organizational
Unit (server) and Common Name (node) fields.

\subsubsection{rsync/ssh}

This technique uses the {\em rsync\/} utility to transfer files.
You'll probably want to use {\em ssh\/} as the underlying transport,
although you can still use the less-secure {\em rsh\/} or native
rsync server transports if you like.

If you use {\em ssh\/} then you'll need to create passphrase-less
SSH keys so that the transfer can occur automatically. You may
want to create special {\em dsc\/} userids on both ends as well.
\subsection{The Extractor}

The XML extractor is a Perl script that reads the XML files from
{\tt dsc\/}. The extractor essentially converts the XML-structured
data to a format that is easier (faster) for the graphing tools to
parse. Currently the extracted data files are line-based ASCII
text files. Support for SQL databases is planned for the future.

\subsection{The Grapher}

{\dsc} uses {\em Ploticus\/}\footnote{http://ploticus.sourceforge.net/}
as the graphing engine. A Perl module and CGI script read extracted
data files and generate Ploticus scriptfiles that produce the plots.
Plots are always generated on demand via the CGI application.

\path|dsc-grapher.pl| is the script that displays graphs from the
archived data.

\section{Architecture}

Figure~\ref{fig-architecture} shows the {\dsc} architecture.

\begin{figure}
\centerline{\psfig{figure=dsc-arch.eps,width=3.5in}}
\caption{\label{fig-architecture}The {\dsc} architecture.}
\end{figure}

Note that {\dsc} utilizes the concept of {\em servers\/} and {\em
nodes\/}. A server is generally a logical service, which may
actually consist of multiple nodes. Figure~\ref{fig-architecture}
shows six collectors (the circles) and two servers (the rounded
rectangles). For a real-world example, consider a DNS root server.
IP Anycast allows a DNS root server to have geographically distributed
nodes that share a single IP address. We call each instance a
{\em node\/} and all nodes sharing the single IP address belong
to the same {\em server\/}.

The {\dsc} collector program runs on or near\footnote{by
``near'' we mean that packets may be sniffed remotely via Ethernet taps, switch
port mirroring, or a SPAN port.} the remote nodes. Its XML output
is transferred to the presentation machine via HTTPS PUTs (or something simpler
if you prefer).

The presentation machine includes an HTTP(S) server. The extractor looks
for XML files PUT there by the collectors. A CGI script also runs on
the HTTP server to display graphs and other information.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Installing the Presenter}

You'll probably want to get the Presenter working before the Collector.
If you're using the secure XML data transfer, you'll need to
generate both client- and server-side X.509 certificates.

Installing the Presenter involves the following steps:
\begin{itemize}
\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
\item
Install Perl dependencies
\item
Install {\dsc} software
\item
Create X.509 certificates (optional)
\item
Set up a secure HTTP server (e.g., Apache and mod\_ssl)
\item
Add some cron jobs
\end{itemize}

\section{Install Perl Dependencies}

{\dsc} uses Perl for the extractor and grapher components. Chances are
that you'll need Perl-5.8, although Perl-5.6 may be sufficient. You'll also need
these readily available third-party Perl modules, which you
can find via CPAN:

\begin{itemize}
\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
\item CGI-Untaint (CGI::Untaint)
\item CGI.pm (CGI)
\item Digest-MD5 (Digest::MD5)
\item File-Flock (File::Flock)
\item File-Spec (File::Spec)
\item File-Temp (File::Temp)
\item Geography-Countries (Geography::Countries)
\item Hash-Merge (Hash::Merge)
\item IP-Country (IP::Country)
\item MIME-Base64 (MIME::Base64)
\item Math-Calc-Units (Math::Calc::Units)
\item Scalar-List-Utils (List::Util)
\item Text-Template (Text::Template)
\item URI (URI::Escape)
\item XML-Simple (XML::Simple)
\item Net-DNS-Resolver (Net::DNS::Resolver)
\end{itemize}

\noindent
Also note that XML::Simple requires XML::Parser, which in
turn requires the {\em expat\/} package.
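
If your operating system does not package these modules, they can
usually be installed straight from CPAN. For example (the module
shown is just one of the list above):

\begin{MyVerbatim}
% perl -MCPAN -e 'install XML::Simple'
\end{MyVerbatim}
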
\section{Install Ploticus}

{\dsc} uses Ploticus to generate plots and graphs. You can find
this software at \verb|http://ploticus.sourceforge.net|. The {\em
Download\/} page has links to some pre-compiled binaries and packages.
FreeBSD and NetBSD users can find Ploticus in the ports/packages
collection.

\section{Install {\dsc} Software}

All of the extractor and grapher tools are Perl or {\tt /bin/sh}
scripts, so there is no need to compile anything. Still,
you should run {\tt make} first:

\begin{MyVerbatim}
% cd presenter
% make
\end{MyVerbatim}

If you see errors about missing Perl prerequisites, you may want
to correct those before continuing.

The next step is to install the files. Recall that
\path|/usr/local/dsc| is the hard-coded installation prefix.
You must create it manually:

\begin{MyVerbatim}
% mkdir /usr/local/dsc
% make install
\end{MyVerbatim}

Note that {\dsc}'s Perl modules are installed in the
``site\_perl'' directory. You'll probably need {\em root\/}
privileges to install files there.

\section{CGI Symbolic Links}

{\dsc} has a couple of CGI scripts that are installed
into \path|/usr/local/dsc/libexec|. You should add symbolic
links from your HTTP server's \path|cgi-bin| directory to
these scripts.

Both of these scripts have been designed to be mod\_perl-friendly.

\begin{MyVerbatim}
% cd /usr/local/apache/cgi-bin
% ln -s /usr/local/dsc/libexec/put-file.pl
% ln -s /usr/local/dsc/libexec/dsc-grapher.pl
\end{MyVerbatim}

You can skip the \path|put-file.pl| link if you plan to use
{\em rsync\/} to transfer XML files.
If you cannot create symbolic links, you'll need to manually
copy the scripts to the appropriate directory.
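
Another option, if symbolic links are inconvenient, is to point Apache
directly at the installed scripts with {\em ScriptAlias\/} directives.
This is only a sketch; the URL paths are up to you and the file paths
assume the default installation prefix:

\begin{MyVerbatim}
ScriptAlias /cgi-bin/dsc-grapher.pl /usr/local/dsc/libexec/dsc-grapher.pl
ScriptAlias /cgi-bin/put-file.pl /usr/local/dsc/libexec/put-file.pl
\end{MyVerbatim}
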
\section{/usr/local/dsc/data}

\subsection{X.509 method}

This directory is where \path|put-file.pl| writes incoming XML
files. It should have been created when you ran {\em make install\/} earlier.
XML files are actually placed in {\em server\/} and {\em
node\/} subdirectories based on the authorized client X.509 certificate
parameters. If you want \path|put-file.pl| to automatically create
the subdirectories, the \path|data| directory must be writable by
the process owner:

\begin{MyVerbatim}
% chgrp nobody /usr/local/dsc/data/
% chmod 2775 /usr/local/dsc/data/
\end{MyVerbatim}

Alternatively, you can create {\em server\/} and {\em node\/} directories
in advance and make those writable.

\begin{MyVerbatim}
% mkdir /usr/local/dsc/data/x-root/
% mkdir /usr/local/dsc/data/x-root/blah/
% mkdir /usr/local/dsc/data/x-root/blah/incoming/
% chgrp nobody /usr/local/dsc/data/x-root/blah/
% chmod 2775 /usr/local/dsc/data/x-root/blah/incoming/
\end{MyVerbatim}

Make sure that \path|/usr/local/dsc/data/| is on a large partition with
plenty of free space. You can make it a symbolic link to another
partition if necessary. Note that a typical {\dsc} installation
for a large DNS root server requires about 4GB to hold a year's worth
of data.

\subsection{rsync Method}

The directory structure is the same as above (for X.509). The only
differences are that:
\begin{itemize}
\item
The {\em server\/}, {\em node\/}, and {\em incoming\/}
directories must be made in advance.
\item
The directories should be writable by the userid associated
with the {\em rsync}/{\em ssh\/} connection. You may want
to create a dedicated {\em dsc\/} userid for this.
\end{itemize}
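
For example, to prepare for a node named {\em blah\/} of server
{\em x-root\/} that uploads as a dedicated {\em dsc\/} userid (all
of these names are illustrative):

\begin{MyVerbatim}
% mkdir -p /usr/local/dsc/data/x-root/blah/incoming
% chown -R dsc /usr/local/dsc/data/x-root/blah
\end{MyVerbatim}
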
\section{/usr/local/dsc/var/log}

The \path|put-file.pl| script logs its activity to
\path|put-file.log| in this directory. It should have been
created when you ran {\em make install\/} earlier. The directory
should be writable by the HTTP server userid (usually {\em nobody\/}
or {\em www\/}). Unfortunately the installation isn't fancy enough
to determine that userid yet, so you must change the ownership manually:

\begin{MyVerbatim}
% chgrp nobody /usr/local/dsc/var/log/
\end{MyVerbatim}

Furthermore, you probably want to make sure the log file does not
grow indefinitely. For example, on FreeBSD we add this line to \path|/etc/newsyslog.conf|:

\begin{MyVerbatim}
/usr/local/dsc/var/log/put-file.log nobody:wheel 644 10 * @T00 BN
\end{MyVerbatim}
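
On Linux systems, a {\em logrotate\/} configuration snippet can serve
the same purpose; a minimal sketch (the rotation policy shown is just
an illustration):

\begin{MyVerbatim}
/usr/local/dsc/var/log/put-file.log {
    weekly
    rotate 10
    compress
    missingok
}
\end{MyVerbatim}
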
You need not worry about this directory if you are using the
{\em rsync\/} upload method.

\section{/usr/local/dsc/cache}

This directory, also created by {\em make install\/} above, holds cached
plot images. It also must be writable by the HTTP userid:

\begin{MyVerbatim}
% chgrp nobody /usr/local/dsc/cache/
\end{MyVerbatim}

\section{Cron Jobs}

{\dsc} requires two cron jobs on the Presenter. The first
is the one that processes incoming XML files. It is called
\path|refile-and-grok.sh|. We recommend running it every
minute. You also may want to run the jobs at a lower priority
with {\tt nice\/}. Here is the cron job that we use:

\begin{MyVerbatim}
* * * * * /usr/bin/nice -10 /usr/local/dsc/libexec/refile-and-grok.sh
\end{MyVerbatim}

The other useful cron script is \path|remove-xmls.pl|. It removes
XML files older than a specified number of days. Since most of the
information in the XML files is archived into easier-to-parse
data files, you can remove the XML files after a few days. This is
the job that we use:

\begin{MyVerbatim}
@midnight find /usr/local/dsc/data/ | /usr/local/dsc/libexec/remove-xmls.pl 7
\end{MyVerbatim}

\section{Data URIs}

{\dsc} uses ``Data URIs'' by default. This is a URI where the
content is base-64 encoded into the URI string. It allows us
to include images directly in HTML output, such that the browser
does not have to make additional HTTP requests for the images.
Data URIs may not work with some browsers.

To disable Data URIs, edit {\em presenter/perllib/DSC/grapher.pm\/}
and change this line:

\begin{verbatim}
$use_data_uri = 1;
\end{verbatim}

to

\begin{verbatim}
$use_data_uri = 0;
\end{verbatim}

Also make this symbolic link from your HTTP server's ``htdocs'' directory:

\begin{verbatim}
# cd htdocs
# ln -s /usr/local/dsc/share/html dsc
\end{verbatim}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Configuring the {\dsc} Presenter}

This chapter describes how to create X.509 certificates and configure
Apache/mod\_ssl. If you plan on using a different upload
technique (such as scp or rsync) you can skip these instructions.

\section{Generating X.509 Certificates}

We use X.509 certificates to authenticate both sides
of an SSL connection when uploading XML data files from
the collector to the presenter.

Certificate generation is a tricky thing. We use three different
types of certificates:
\begin{enumerate}
\item A self-signed root CA certificate
\item A server certificate
\item Client certificates for each collector node
\end{enumerate}

In the client certificates
we use X.509 fields to store the collector's server and node name.
The Organizational Unit Name (OU) becomes the server name and
the Common Name (CN) becomes the node name.

The {\dsc} source code distribution includes some shell scripts
that we have
used to create X.509 certificates. You can find them in the
\path|presenter/certs| directory. Note these are not installed
into \path|/usr/local/dsc|. You should edit \path|openssl.conf|
and enter the relevant information for your organization.

\subsection{Certificate Authority}

You may need to create a self-signed certificate authority if you
don't already have one. The CA signs client and server certificates.
You will need to distribute the CA and client certificates to
collector sites. Here is how to use our \path|create-ca-cert.sh|
script:

\begin{MyVerbatim}
% sh create-ca-cert.sh
CREATING CA CERT
Generating a 2048 bit RSA private key
..............................................................................
............+++
......+++
writing new private key to './private/cakey.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
\end{MyVerbatim}

\subsection{Server Certificate}

The server certificate is used by the HTTP server (Apache/mod\_ssl).
The clients will have a copy of the CA certificate so they
can validate the server's certificate when uploading XML files.
Use the \path|create-srv-cert.sh| script to create a server
certificate:

\begin{MyVerbatim}
% sh create-srv-cert.sh
CREATING SERVER REQUEST
Generating a 1024 bit RSA private key
..........................++++++
.....................................++++++
writing new private key to 'server/server.key'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Colorado
Locality Name (eg, city) []:Boulder
Organization Name (eg, company) [Internet Widgits Pty Ltd]:The Measurement Factory, Inc
Organizational Unit Name (eg, section) []:DNS
Common Name (eg, YOUR name) []:dns.measurement-factory.com
Email Address []:wessels@measurement-factory.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
Enter pass phrase for server/server.key:
writing RSA key
CREATING SERVER CERT
Using configuration from ./openssl.conf
Enter pass phrase for ./private/cakey.pem:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
countryName :PRINTABLE:'US'
stateOrProvinceName :PRINTABLE:'Colorado'
localityName :PRINTABLE:'Boulder'
organizationName :PRINTABLE:'The Measurement Factory, Inc'
organizationalUnitName:PRINTABLE:'DNS'
commonName :PRINTABLE:'dns.measurement-factory.com'
emailAddress :IA5STRING:'wessels@measurement-factory.com'
Certificate is to be certified until Jun 3 20:06:17 2013 GMT (3000 days)
Sign the certificate? [y/n]:y


1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
\end{MyVerbatim}

Note that the Common Name must match the hostname of the HTTP
server that receives XML files.

Note that the \path|create-srv-cert.sh| script rewrites the
server key file without the RSA password. This allows your
HTTP server to start automatically without prompting for
the password.

The script leaves the server certificate and key in the \path|server|
directory. You'll need to copy these over to the HTTP server config
directory as described later in this chapter.

\section{Client Certificates}

Generating client certificates is similar. Remember that
the Organizational Unit Name and Common Name correspond to the
collector's {\em server\/} and {\em node\/} names. For example:

\begin{MyVerbatim}
% sh create-clt-cert.sh
CREATING CLIENT REQUEST
Generating a 1024 bit RSA private key
................................++++++
..............++++++
writing new private key to 'client/client.key'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:California
Locality Name (eg, city) []:Los Angeles
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Some DNS Server
Organizational Unit Name (eg, section) []:x-root
Common Name (eg, YOUR name) []:LAX
Email Address []:noc@example.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:
CREATING CLIENT CERT
Using configuration from ./openssl.conf
Enter pass phrase for ./private/cakey.pem:
Check that the request matches the signature
Signature ok
The Subject's Distinguished Name is as follows
countryName :PRINTABLE:'US'
stateOrProvinceName :PRINTABLE:'California'
localityName :PRINTABLE:'Los Angeles'
organizationName :PRINTABLE:'Some DNS Server'
organizationalUnitName:PRINTABLE:'x-root '
commonName :PRINTABLE:'LAX'
emailAddress :IA5STRING:'noc@example.com'
Certificate is to be certified until Jun 3 20:17:24 2013 GMT (3000 days)
Sign the certificate? [y/n]:y


1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated
Enter pass phrase for client/client.key:
writing RSA key
writing RSA key
\end{MyVerbatim}

The client's key and certificate will be placed in a directory
based on the server and node names. For example:

\begin{MyVerbatim}
% ls -l client/x-root/LAX
total 10
-rw-r--r-- 1 wessels wessels 3311 Mar 17 13:17 client.crt
-rw-r--r-- 1 wessels wessels 712 Mar 17 13:17 client.csr
-r-------- 1 wessels wessels 887 Mar 17 13:17 client.key
-rw-r--r-- 1 wessels wessels 1953 Mar 17 13:17 client.pem
\end{MyVerbatim}

The \path|client.pem| (and \path|cacert.pem|) files should be copied
to the collector machine.
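
Before copying them, you may want to confirm that the client
certificate really chains back to your CA. A quick check with
{\tt openssl\/} (adjust the paths to wherever your \path|cacert.pem|
and client files live):

\begin{MyVerbatim}
% openssl verify -CAfile cacert.pem client/x-root/LAX/client.pem
client/x-root/LAX/client.pem: OK
\end{MyVerbatim}
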
\section{Apache Configuration}

\noindent
You need to configure Apache for SSL. Here is what our configuration
looks like:

\begin{MyVerbatim}
SSLRandomSeed startup builtin
SSLRandomSeed startup file:/dev/random
SSLRandomSeed startup file:/dev/urandom 1024
SSLRandomSeed connect builtin
SSLRandomSeed connect file:/dev/random
SSLRandomSeed connect file:/dev/urandom 1024

<VirtualHost _default_:443>
DocumentRoot "/httpd/htdocs-ssl"
SSLEngine on
SSLCertificateFile /httpd/conf/SSL/server/server.crt
SSLCertificateKeyFile /httpd/conf/SSL/server/server.key
SSLCertificateChainFile /httpd/conf/SSL/cacert.pem

# For client-validation
SSLCACertificateFile /httpd/conf/SSL/cacert.pem
SSLVerifyClient require

SSLOptions +CompatEnvVars
Script PUT /cgi-bin/put-file.pl
</VirtualHost>
\end{MyVerbatim}

\noindent
Note the last line of the configuration specifies the CGI script
that accepts PUT requests. The {\em SSLOptions\/}
line is necessary so that the CGI script receives certain HTTP
headers as environment variables. Those headers/variables convey
the X.509 information to the script so it knows where to store
received XML files.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Collector Installation}

A collector machine needs only the {\em dsc\/} binary, a configuration
file, and a couple of cron job scripts.

At this point, {\dsc} lacks certain niceties such as a \path|./configure|
script. The installation prefix, \path|/usr/local/dsc|, is currently
hard-coded.

\section{Prerequisites}

You'll need a C/C++ compiler to compile the {\tt dsc\/} source code.

If the collector and archiver are different systems, you'll need a
way to transfer data files. We recommend that you use the {\tt
curl\/} HTTP/SSL client. You may use another technique, such as {\tt
scp\/} or {\tt rsync\/}, if you prefer.

\section{\tt Installation}

You can compile {\tt dsc\/} from the {\tt collector\/} directory:

\begin{MyVerbatim}
% cd collector
% make
\end{MyVerbatim}

Assuming there are no errors or problems during compilation, install
the {\tt dsc\/} binary and other scripts with:

\begin{MyVerbatim}
% make install
\end{MyVerbatim}

This installs five files:
\begin{Verbatim}
/usr/local/dsc/bin/dsc
/usr/local/dsc/etc/dsc.conf.sample
/usr/local/dsc/libexec/upload-prep.pl
/usr/local/dsc/libexec/upload-rsync.sh
/usr/local/dsc/libexec/upload-x509.sh
\end{Verbatim}

Of course, if you don't want to use the default installation
prefix, you can manually copy these files to a location
of your choosing. If you do that, you'll also need to
edit the cron scripts to match your choice of pathnames, etc.

\section{Uploading XML Files}
\label{sec-install-collector-cron}

This section describes how XML files are transferred from
the collector to one or more Presenter systems.

As we'll see in the next chapter, each {\tt dsc} process
has its own {\em run directory\/}. This is the directory
where {\tt dsc} leaves its XML files. It usually has a
name like \path|/usr/local/dsc/run/NODENAME|\@. XML files
are removed after they are successfully transferred. If the
Presenter is unreachable, XML files accumulate here until
they can be transferred. Make sure that you have
enough disk space to queue a lot of XML files in the
event of an outage.

In general we want to be able to upload XML files to multiple
presenters. This is the reason behind the {\tt upload-prep.pl}
script. This script runs every 60 seconds from cron:

\begin{MyVerbatim}
* * * * * /usr/local/dsc/libexec/upload-prep.pl
\end{MyVerbatim}

{\tt upload-prep.pl} looks for \path|dsc.conf| files in
\path|/usr/local/dsc/etc| by default. For each config file
found, it cd's to the {\em run\_dir\/} and links\footnote{as in
``hard link'' made with \path|/bin/ln|.}
XML files to one or more upload directories. The upload directories
are named \path|upload/dest1|, \path|upload/dest2|, and so on.

In order for all this to work, you must create the directories
in advance. For example, if you are collecting stats on
your nameserver named {\em ns0\/}, and want to send the XML files
to two presenters (named oarc and archive), the directory structure
might look like:

\begin{MyVerbatim}
% set prefix=/usr/local/dsc
% mkdir $prefix/run
% mkdir $prefix/run/ns0
% mkdir $prefix/run/ns0/upload
% mkdir $prefix/run/ns0/upload/oarc
% mkdir $prefix/run/ns0/upload/archive
\end{MyVerbatim}

With that directory structure, the {\tt upload-prep.pl} script moves
XML files from the \path|ns0| directory to the two
upload directories, \path|oarc| and \path|archive|.

To actually transfer files to the presenter, use either
\path|upload-x509.sh| or \path|upload-rsync.sh|.

\subsection{upload-x509.sh}

This cron script is responsible for
actually transferring XML files from the upload directories
to the remote server. It creates a {\em tar\/} archive
of XML files and then uploads it to the remote server with
{\tt curl}. The script takes three commandline arguments:

\begin{MyVerbatim}
% upload-x509.sh NODE DEST URI
\end{MyVerbatim}

{\em NODE\/} must match the name of a directory under
\path|/usr/local/dsc/run|. Similarly, {\em DEST\/} must match the
name of a directory under \path|/usr/local/dsc/run/NODE/upload|.
{\em URI\/} is the URL/URI that the data is uploaded to. Usually
it is just an HTTPS URL with the name of the destination server.
We also recommend running this from cron every 60 seconds. For
example:

\begin{MyVerbatim}
* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 oarc \
          https://collect.oarc.isc.org/
* * * * * /usr/local/dsc/libexec/upload-x509.sh ns0 archive \
          https://archive.example.com/
\end{MyVerbatim}

\path|upload-x509.sh| looks for X.509 certificates in
\path|/usr/local/dsc/certs|. The client certificate should be named
\path|/usr/local/dsc/certs/DEST/NODE.pem| and the CA certificate
should be named
\path|/usr/local/dsc/certs/DEST/cacert.pem|. Note that {\em DEST\/}
and {\em NODE\/} must match the \path|upload-x509.sh|
command line arguments.
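
In other words, for the {\em ns0\/}/{\em oarc\/} example above you
would install the certificates something like this (the source file
names are illustrative):

\begin{MyVerbatim}
% mkdir -p /usr/local/dsc/certs/oarc
% cp client.pem /usr/local/dsc/certs/oarc/ns0.pem
% cp cacert.pem /usr/local/dsc/certs/oarc/cacert.pem
\end{MyVerbatim}
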
\subsection{upload-rsync.sh}

This script can be used to transfer XML files from the upload
directories to the remote server. It uses {\em rsync\/} and
assumes that {\em rsync\/} will use {\em ssh\/} for transport.
This script also takes three arguments:

\begin{MyVerbatim}
% upload-rsync.sh NODE DEST RSYNC-DEST
\end{MyVerbatim}

Note that {\em DEST\/} is the name of the local ``upload'' directory
and {\em RSYNC-DEST\/} is an {\em rsync\/} destination (i.e., hostname and remote directory).
Here is how you might use it in a crontab:

\begin{MyVerbatim}
* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 oarc \
          dsc@collect.oarc.isc.org:/usr/local/dsc/data/Server/ns0
* * * * * /usr/local/dsc/libexec/upload-rsync.sh ns0 archive \
          dsc@archive.oarc.isc.org:/usr/local/dsc/data/Server/ns0
\end{MyVerbatim}

Also note that \path|upload-rsync.sh| will actually store the remote
XML files in \path|incoming/YYYY-MM-DD| subdirectories. That is,
if your {\em RSYNC-DEST\/} is \path|host:/usr/local/dsc/data/Server/ns0|
then files will actually be written to
\path|/usr/local/dsc/data/Server/ns0/incoming/YYYY-MM-DD| on {\em host},
where \path|YYYY-MM-DD| is replaced by the year, month, and date of the
XML files. These subdirectories reduce filesystem pressure in the event
of backlogs.

{\em rsync\/} over {\em ssh\/} requires you to use RSA or DSA public keys
that do not have a passphrase. If you do not want to use one of
{\em ssh\/}'s default identity files, you can create one specifically
for this script. It should be named \path|dsc_uploader_id| (and
\path|dsc_uploader_id.pub|) in the \$HOME/.ssh directory of the user
that will be running the script. For example, you can create it
with this command:

\begin{MyVerbatim}
% ssh-keygen -t dsa -C dsc-uploader -f $HOME/.ssh/dsc_uploader_id
\end{MyVerbatim}

Then add \path|dsc_uploader_id.pub| to the \path|authorized_keys|
file of the receiving userid on the presenter system.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Configuring and Running the {\dsc} Collector}

\section{dsc.conf}

Before running {\tt dsc\/} you need to create a configuration file.
Note that configuration directive lines are terminated with a semi-colon.
The configuration file currently understands the following directives:

\begin{description}

\item[local\_address]

Specifies the DNS server's local IP address. It is used
to determine the ``direction'' of an IP packet: sending,
receiving, or other. You may specify multiple local addresses
by repeating the {\em local\_address} line any number of times.

Example: {\tt local\_address 172.16.0.1;\/}
Example: {\tt local\_address 2001:4f8:0:2::13;\/}

\item[run\_dir]

A directory that should become {\tt dsc\/}'s current directory
after it starts. XML files will be written here, as will
any core dumps.

Example: {\tt run\_dir "/var/run/dsc";\/}

\item[minfree\_bytes]

If the filesystem where {\tt dsc\/} writes its XML files
does not have at least this much free space, then
{\tt dsc\/} will not write the XML files. This prevents
{\tt dsc\/} from filling up the filesystem. The XML
files that would have been written are simply lost and
cannot be recovered. {\tt dsc\/} will begin writing
XML files again when the filesystem has the necessary
free space.
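
Example (an illustrative value, in bytes): {\tt minfree\_bytes 100000000;\/}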

\item[bpf\_program]

A Berkeley Packet Filter program string. Normally you
should leave this unset. You may use this to further
restrict the traffic seen by {\tt dsc\/}. Note that {\tt
dsc\/} currently has one indexer that looks at all IP
packets. If you specify something like {\em udp port 53\/}
that indexer will not work.

However, if you want to monitor multiple DNS servers with
separate {\dsc} instances on one collector box, then you
may need to use {\em bpf\_program} to make sure that each
{\tt dsc} process sees only the traffic it should see.

Note that this directive must go before the {\em interface\/}
directive because {\tt dsc\/} makes only one pass through
the configuration file and the BPF filter is set when the
interface is initialized.

Example: {\tt bpf\_program "dst host 192.168.1.1";\/}

\item[interface]

The interface name to sniff packets from or a pcap file to
read packets from. You may specify multiple interfaces.

Example:
{\tt interface fxp0;\/}
{\tt interface /path/to/dump.pcap;\/}

\item[bpf\_vlan\_tag\_byte\_order]

{\tt dsc\/} knows about VLAN tags. Some operating systems
(FreeBSD-4.x) have a bug whereby the VLAN tag id is
byte-swapped. Valid values for this directive are {\tt
host\/} and {\tt net\/} (the default). Set this to {\tt
host\/} if you suspect your operating system has the VLAN
tag byte order bug.

Example: {\tt bpf\_vlan\_tag\_byte\_order host;\/}

\item[match\_vlan]

A list of VLAN identifiers (integers). If set, only the
packets belonging to these VLANs are counted.

Example: {\tt match\_vlan 101 102;\/}

\item[qname\_filter]

This directive allows you to define custom filters
to match query names in DNS messages. Please see
Section~\ref{sec-qname-filter} for more information.

\item[dataset]

This directive is the heart of {\dsc}. However, it is also
the most complex.
To save time we recommend that you copy interesting-looking
dataset definitions from \path|dsc.conf.sample|. Comment
out any that you feel are irrelevant or uninteresting.
Later, as you become more familiar with {\dsc}, you may
want to read the next chapter and add your own custom
datasets.

\item[output\_format]

Specifies the output format. It can be given multiple times to output
in more than one format. The default output format is XML.

Available formats are:
\begin{itemize}
\item XML
\item JSON
\end{itemize}

Example: {\tt output\_format JSON;\/}

\end{description}

\section{A Complete Sample dsc.conf}

Here's how your entire {\em dsc.conf\/} file might look:

\begin{MyVerbatim}
#bpf_program
interface em0;

local_address 192.5.5.241;

run_dir "/usr/local/dsc/run/foo";

dataset qtype dns All:null Qtype:qtype queries-only;
dataset rcode dns All:null Rcode:rcode replies-only;
dataset opcode dns All:null Opcode:opcode queries-only;
dataset rcode_vs_replylen dns Rcode:rcode ReplyLen:msglen replies-only;
dataset client_subnet dns All:null ClientSubnet:client_subnet queries-only
    max-cells=200;
dataset qtype_vs_qnamelen dns Qtype:qtype QnameLen:qnamelen queries-only;
dataset qtype_vs_tld dns Qtype:qtype TLD:tld queries-only,popular-qtypes
    max-cells=200;
dataset certain_qnames_vs_qtype dns CertainQnames:certain_qnames
    Qtype:qtype queries-only;
dataset client_subnet2 dns Class:query_classification
    ClientSubnet:client_subnet queries-only max-cells=200;
dataset client_addr_vs_rcode dns Rcode:rcode ClientAddr:client
    replies-only max-cells=50;
dataset chaos_types_and_names dns Qtype:qtype Qname:qname
    chaos-class,queries-only;
dataset idn_qname dns All:null IDNQname:idn_qname queries-only;
dataset edns_version dns All:null EDNSVersion:edns_version queries-only;
dataset do_bit dns All:null D0:do_bit queries-only;
dataset rd_bit dns All:null RD:rd_bit queries-only;
dataset tc_bit dns All:null TC:tc_bit replies-only;
dataset idn_vs_tld dns All:null TLD:tld queries-only,idn-only;
dataset ipv6_rsn_abusers dns All:null ClientAddr:client
    queries-only,aaaa-or-a6-only,root-servers-net-only max-cells=50;
dataset transport_vs_qtype dns Transport:transport Qtype:qtype queries-only;

dataset direction_vs_ipproto ip Direction:ip_direction IPProto:ip_proto
    any;
\end{MyVerbatim}

\section{Running {\tt dsc}}

{\tt dsc\/} accepts a single command line argument, which is
the name of the configuration file. For example:

\begin{MyVerbatim}
% cd /usr/local/dsc
% bin/dsc etc/foo.conf
\end{MyVerbatim}

If you run {\tt ps} when {\tt dsc} is running, you'll see two processes:

\begin{MyVerbatim}
60494 ?? S 0:00.36 bin/dsc etc/foo.conf
69453 ?? Ss 0:10.65 bin/dsc etc/foo.conf
\end{MyVerbatim}

The first process simply forks off child processes every
60 seconds. The child processes do the work of analyzing
and tabulating DNS messages.
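
After a minute or so you should also see XML files appearing in the
run directory (the directory name here follows the sample
configuration above):

\begin{MyVerbatim}
% ls /usr/local/dsc/run/foo/
\end{MyVerbatim}
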

Please use NTP or another technique to keep the collector's
clock synchronized to the correct time.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Viewing {\dsc} Graphs}

To view {\dsc} data in a web browser, simply enter the
URL to the \path|dsc-grapher.pl| CGI. But before you
do that, you'll need to create a grapher configuration file.
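
With the CGI symbolic link created during installation, the URL will
look something like this (the hostname is, of course, yours):

\begin{MyVerbatim}
http://dns-stats.example.com/cgi-bin/dsc-grapher.pl
\end{MyVerbatim}
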

\path|dsc-grapher.pl| uses a simple configuration file to set certain
menu options. This configuration file is
\path|/usr/local/dsc/etc/dsc-grapher.cfg|. You should find
a sample version in the same directory. For example:

\begin{MyVerbatim}
server f-root pao1 sfo2
server isc senna+piquet
server tmf hq sc lgh
trace_windows 1hour 4hour 1day 1week 1month
accum_windows 1day 2days 3days 1week
timezone Asia/Tokyo
domain_list isc_tlds br nl ca cz il pt cl
domain_list isc_tlds sk ph hr ae bg is si za
valid_domains isc isc_tlds
\end{MyVerbatim}

\begin{figure}
\centerline{\psfig{figure=screenshot1.eps,width=6.5in}}
\caption{\label{fig-screenshot1}A sample graph}
\end{figure}

Refer to Figure~\ref{fig-screenshot1} to see how
the directives affect the visual display.
The following three directives should always be set in
the configuration file:

\begin{description}
\item[server]
This directive tells \path|dsc-grapher.pl| to list
the given server and its associated nodes in the
``Servers/Nodes'' section of its navigation menu.
You can repeat this directive for each server that
the Presenter has.
\item[trace\_windows]
Specifies the ``Time Scale'' menu options for
trace-based plots.
\item[accum\_windows]
Specifies the ``Time Scale'' menu options for
``cumulative'' plots, such as the Classification plot.
\end{description}

Note that the \path|dsc-grapher.cfg| only affects what
may appear in the navigation window. It does NOT prevent users
from entering other values in the URL parameters. For example,
if you have data for a server/node in your
\path|/usr/local/dsc/data/| directory that is not listed in
\path|dsc-grapher.cfg|, a user may still be able to view that
data by manually setting the URL query parameters.

The configuration file accepts a number of optional directives
as well. You may set these if you like, but they are not
required:

\begin{description}
\item[timezone]
Sets the time zone for dates and times displayed in the
graphs.
You can use this if you want to override the system
time zone.
The value for this directive should be the name
of a timezone entry in your system database (usually found
in \path|/usr/share/zoneinfo|).
For example, if your system time zone is set
to UTC but you want the times displayed for the
London timezone, you can set this directive to
{\tt Europe/London\/}.
\item[domain\_list]
This directive, along with {\em valid\_domains\/}, tells the
presenter which domains a nameserver is authoritative for.
That information is used in the TLDs subgraphs to differentiate
requests for ``valid'' and ``invalid'' domains.

The {\em domain\_list\/} creates a named list of domains.
The first token is a name for the list, and the remaining
tokens are domain names. The directive may be repeated with
the same list name, as shown in the above example.
\item[valid\_domains]
This directive glues servers and domain\_lists together. The
first token is the name of a {\em server\/} and the second token is
the name of a {\em domain\_list\/}.
\item[embargo]
The {\em embargo\/} directive may be used to delay the
availability of data via the presenter. For example, you
may have one instance of {\em dsc-grapher.pl\/} for internal
use only (password protected, etc). You may also have a
second instance for third parties where data is delayed by
some amount of time, such as hours, days, or weeks. The value
of the {\em embargo\/} directive is the number of seconds by which
data availability should be delayed. For example, if you set
it to 604800, then viewers will not be able to see any data
less than one week old.
\item[anonymize\_ip]
When the {\em anonymize\_ip\/} directive is given, IP addresses
in the display will be anonymized. The anonymization algorithm
is currently hard-coded and designed only for IPv4 addresses.
It masks off the lower 24 bits and leaves only the first octet
in place.
\item[hide\_nodes]
When the {\em hide\_nodes\/} directive is given, the presenter
will not display the list of node names underneath the current
server. This might be useful if you have a number of nodes
but only want viewers to see the server as a whole, without
exposing the particular nodes in the cluster. Note, however,
that if someone already knows the name of a node they can
hand-craft query terms in the URL to display the data for
only that node. In other words, {\em hide\_nodes\/}
provides only ``security through obscurity.''
\end{description}

The first few times you try \path|dsc-grapher.pl|, be sure to run
{\tt tail -f} on the HTTP server error.log file.
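
Most configuration problems (missing Perl modules, unwritable
directories) show up there first. For example (the exact log location
depends on your Apache layout):

\begin{MyVerbatim}
% tail -f /usr/local/apache/logs/error_log
\end{MyVerbatim}
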

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{{\dsc} Datasets}

A {\em dataset\/} is a 2-D array of counters. For example, you
might have a dataset with ``Query Type'' along one dimension and
``Query Name Length'' on the other. The result is a table that
shows the distribution of query name lengths for each query type.
For example:

\vspace{1ex}
\begin{center}
\begin{tabular}{l|rrrrrr}
Len & A & AAAA & A6 & PTR & NS & SOA \\
\hline
$\cdots$ & & & & & & \\
11 & 14 & 8 & 7 & 11 & 2 & 0 \\
12 & 19 & 2 & 3 & 19 & 4 & 1 \\
$\cdots$ & & & & & & \\
255 & 0 & 0 & 0 & 0 & 0 & 0 \\
\hline
\end{tabular}
\end{center}
\vspace{1ex}

\noindent
A dataset is defined by the following parameters:
\begin{itemize}
\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
\item A name
\item A protocol layer (IP or DNS)
\item An indexer for the first dimension
\item An indexer for the second dimension
\item One or more filters
\item Zero or more options and parameters
\end{itemize}

\noindent
The {\em dataset\/} definition syntax in \path|dsc.conf| is:

{\tt dataset\/}
{\em name\/}
{\em protocol\/}
{\em Label1:Indexer1\/}
{\em Label2:Indexer2\/}
{\em filter\/}
{\em [parameters]\/};
\vspace{2ex}

\section{Dataset Name}

The dataset name is used in the filename for {\tt dsc\/}'s XML
files. Although this is an opaque string in theory, the Presenter's
XML extractor routines must recognize the dataset name to properly
parse it. The source code file
\path|presenter/perllib/DSC/extractor/config.pm| contains an entry
for each known dataset name.

\section{Protocol}

{\dsc} currently knows about two protocol layers: IP and DNS.
On the {\tt dataset\/} line they are written as {\tt ip\/} and {\tt dns\/}.

\section{Indexers}

An {\em indexer\/} is simply a function that transforms the attributes
of an IP/DNS message into an array index. For some attributes the
transformation is straightforward. For example, the ``Query Type''
indexer simply extracts the query type value from a DNS message and
uses this 16-bit value as the array index.

Other attributes are slightly more complicated. For example, the
``TLD'' indexer extracts the TLD of the QNAME field of a DNS message
and maps it to an integer. The indexer maintains a simple internal
table of TLD-to-integer mappings. The actual integer values are
unimportant because the TLD strings, not the integers, appear in
the resulting XML data.

When you specify an indexer on a {\tt dataset\/} line, you must
provide both the name of the indexer and a label. The label appears
as an attribute in the XML output. For example,
Figure~\ref{fig-sample-xml} shows the XML corresponding to this
{\em dataset\/} line:

\begin{MyVerbatim}
dataset the_dataset dns Foo:foo Bar:bar queries-only;
\end{MyVerbatim}

\begin{figure}
\begin{MyVerbatim}
<array name="the_dataset" dimensions="2" start_time="1091663940" ...
  <dimension number="1" type="Foo"/>
  <dimension number="2" type="Bar"/>
  <data>
    <Foo val="1">
      <Bar val="0" count="4"/>
      ...
      <Bar val="100" count="41"/>
    </Foo>
    <Foo val="2">
      ...
    </Foo>
  </data>
</array>
\end{MyVerbatim}
\caption{\label{fig-sample-xml}Sample XML output}
\end{figure}

In theory you are free to choose any label that you like; however,
the XML extractors look for specific labels. Please use the labels
given for the indexers in Tables~\ref{tbl-dns-indexers}
and~\ref{tbl-ip-indexers}.
\subsection{IP Indexers}

\begin{table}
\begin{center}
\begin{tabular}{|lll|}
\hline
Indexer & Label & Description \\
\hline
ip\_direction & Direction & one of sent, recv, or other \\
ip\_proto & IPProto & IP protocol (icmp, tcp, udp) \\
ip\_version & - & IP version number (4, 6) \\
\hline
\end{tabular}
\caption{\label{tbl-ip-indexers}IP packet indexers}
\end{center}
\end{table}

{\dsc} includes only minimal support for collecting IP-layer
stats. Mostly we are interested in finding out the mix of
IP protocols received by the DNS server. It can also show us
if/when the DNS server is the subject of a denial-of-service
attack.
Table~\ref{tbl-ip-indexers} shows the indexers for IP packets.
Here are their longer descriptions:

\begin{description}
\item[ip\_direction]
One of three values: sent, recv, or else. Direction is determined
based on the setting for {\em local\_address\/} in the configuration file.
\item[ip\_proto]
The IP protocol type, e.g.: tcp, udp, icmp.
Note that the {\em bpf\_program\/} setting affects all traffic
seen by {\dsc}. If the program contains the word ``udp''
then you won't see any counts for non-UDP traffic.
\item[ip\_version]
The IP version number, e.g.: 4 or 6. Can be used to compare how much
traffic comes in via IPv6 compared to IPv4.
\end{description}

\subsection{IP Filters}

Currently there is only one IP protocol filter: {\tt any\/}.
It includes all received packets.
\subsection{DNS Indexers}
|
|
|
|
\begin{table}
|
|
\begin{center}
|
|
\begin{tabular}{|lll|}
|
|
\hline
|
|
Indexer & Label & Description \\
|
|
\hline
|
|
certain\_qnames & CertainQnames & Popular query names seen at roots \\
|
|
client\_subnet & ClientSubnet & The client's IP subnet (/24 for IPv4, /96 for IPv6) \\
|
|
client & ClientAddr & The client's IP address \\
|
|
do\_bit & DO & Whether the DO bit is on \\
|
|
edns\_version & EDNSVersion & The EDNS version number \\
|
|
idn\_qname & IDNQname & If the QNAME is in IDN format \\
|
|
msglen & MsgLen & The DNS message length \\
|
|
null & All & A ``no-op'' indexer \\
|
|
opcode & Opcode & DNS message opcode \\
|
|
qclass & - & Query class \\
|
|
qname & Qname & Full query name \\
|
|
qnamelen & QnameLen & Length of the query name \\
|
|
qtype & Qtype & DNS query type \\
|
|
query\_classification & Class & A classification for bogus queries \\
|
|
rcode & Rcode & DNS response code \\
|
|
rd\_bit & RD & Check if Recursion Desired bit set \\
|
|
tc\_bit & TC & Check if Truncated bit set \\
|
|
tld & TLD & TLD of the query name \\
|
|
transport & Transport & Transport protocol for the DNS message (UDP or TCP) \\
|
|
dns\_ip\_version & IPVersion & IP version of the packet carrying the DNS message \\
|
|
\hline
|
|
\end{tabular}
|
|
\caption{\label{tbl-dns-indexers}DNS message indexers}
|
|
\end{center}
|
|
\end{table}
|
|
|
|
Table~\ref{tbl-dns-indexers} shows the currently-defined indexers
|
|
for DNS messages, and here are their descriptions:
|
|
|
|
\begin{description}
|
|
\item[certain\_qnames]
|
|
This indexer isolates the two most popular query names seen
|
|
by DNS root servers: {\em localhost\/} and {\em
|
|
[a--m].root-servers.net\/}.
|
|
\item[client\_subnet]
|
|
Groups DNS messages together by the subnet of the
|
|
client's IP address. The subnet is maked by /24 for IPv4
|
|
and by /96 for IPv6. We use this to make datasets with
|
|
large, diverse client populations more manageable and to
|
|
provide a small amount of privacy and anonymization.
|
|
\item[client]
|
|
The IP (v4 and v6) address of the DNS client.
\item[do\_bit]
This indexer has only two values: 0 or 1. It indicates
whether or not the ``DO'' bit is set in a DNS query. According to
RFC 3225: {\em Setting the DO bit to one in a query indicates
to the server that the resolver is able to accept DNSSEC
security RRs.}
\item[edns\_version]
The EDNS version number, if any, in a DNS query. EDNS
Version 0 is documented in RFC 2671.
\item[idn\_qname]
This indexer has only two values: 0 or 1. It returns 1
when the first QNAME in the DNS message question section
is an internationalized domain name (i.e., containing
non-ASCII characters). Such QNAMEs begin with the string
{\tt xn--\/}. This convention is documented in RFC 3490.
\item[msglen]
The overall length (size) of the DNS message.
\item[null]
A ``no-op'' indexer that always returns the same value.
This can be used to effectively turn the 2-D table into a
1-D array.
\item[opcode]
The DNS message opcode is a four-bit field. QUERY is the
most common opcode. Additional currently defined opcodes
include: IQUERY, STATUS, NOTIFY, and UPDATE.
\item[qclass]
The DNS message query class (QCLASS) is a 16-bit value. IN
is the most common query class. Additional currently defined
query class values include: CHAOS, HS, NONE, and ANY.
\item[qname]
The full QNAME string from the first (and usually only)
QNAME in the question section of a DNS message.
\item[qnamelen]
The length of the first (and usually only) QNAME in a DNS
message question section. Note this is the ``expanded''
length if the message happens to take advantage of DNS
message ``compression.''
\item[qtype]
The query type (QTYPE) for the first QNAME in the DNS message
question section. Well-known query types include: A, AAAA,
A6, CNAME, PTR, MX, NS, SOA, and ANY.
\item[query\_classification]
A stateless classification of ``bogus'' queries:
\begin{itemize}
\setlength{\itemsep}{0ex plus 0.5ex minus 0.0ex}
\item non-auth-tld: when the TLD is not one of the IANA-approved TLDs.
\item root-servers.net: a query for a root server IP address.
\item localhost: a query for the localhost IP address.
\item a-for-root: an A query for the DNS root (.).
\item a-for-a: an A query for an IPv4 address.
\item rfc1918-ptr: a PTR query for an RFC 1918 address.
\item funny-qclass: a query with an unknown/undefined query class.
\item funny-qtype: a query with an unknown/undefined query type.
\item src-port-zero: when the UDP message's source port equals zero.
\item malformed: a malformed DNS message that could not be entirely parsed.
\end{itemize}
\item[rcode]
The RCODE value in a DNS response. The most common response
codes are 0 (NO ERROR) and 3 (NXDOMAIN).
\item[rd\_bit]
This indexer returns 1 if the RD (recursion desired) bit is
set in the query. Usually only stub resolvers set the RD bit.
Authoritative servers usually do not offer recursion to their
clients.
\item[tc\_bit]
This indexer returns 1 if the TC (truncated) bit is
set (in a response). An authoritative server sets the TC bit
when the entire response won't fit into a UDP message.
\item[tld]
The TLD of the first QNAME in a DNS message's question section.
\item[transport]
Indicates whether the DNS message is carried via UDP or TCP\@.
\item[dns\_ip\_version]
The IP version number that carried the DNS message.
\end{description}
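
As an illustration (this line is hypothetical and not part of the default
configuration), a dataset that counts responses by RCODE and query type
combines two of the indexers above on a single {\tt dataset\/} line; the
filter field is described in the next subsection:

\begin{MyVerbatim}
dataset rcode_vs_qtype dns Rcode:rcode Qtype:qtype replies-only;
\end{MyVerbatim}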

\subsection{DNS Filters}

You must specify one or more of the following filters (separated by commas) on
the {\tt dataset\/} line:

\begin{description}
\item[any]
The no-op filter; counts all messages.
\item[queries-only]
Count only DNS query messages. A query is a DNS message
where the QR bit is set to 0.
\item[replies-only]
Count only DNS response messages. A response is a DNS message
where the QR bit is set to 1.
\item[popular-qtypes]
Count only DNS messages where the query type is one of:
A, NS, CNAME, SOA, PTR, MX, AAAA, A6, ANY.
\item[idn-only]
Count only DNS messages where the query name is in the
internationalized domain name format.
\item[aaaa-or-a6-only]
Count only DNS messages where the query type is AAAA or A6.
\item[root-servers-net-only]
Count only DNS messages where the query name is within
the {\em root-servers.net\/} domain.
\item[chaos-class]
Count only DNS messages where QCLASS is equal to
CHAOS (3). The CHAOS class is generally used
only for the special {\em hostname.bind\/} and
{\em version.bind\/} queries.
\end{description}

\noindent
Note that multiple filters are ANDed together. That is, they
narrow the input stream, rather than broaden it.
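
For example, combining the {\em queries-only\/} and {\em aaaa-or-a6-only\/}
filters on a dataset line (a hypothetical example) counts only queries whose
query type is AAAA or A6, grouped here by TLD:

\begin{MyVerbatim}
dataset aaaa_tld dns All:null TLD:tld queries-only,aaaa-or-a6-only;
\end{MyVerbatim}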

In addition to these pre-defined filters, you can add your own
custom filters.

\subsubsection{qname\_filter}
\label{sec-qname-filter}

The {\em qname\_filter} directive defines a new
filter that uses regular expression matching on the QNAME field of
a DNS message. This may be useful if you have a server that is
authoritative for a number of zones, but you want to limit
your measurements to a small subset. The {\em qname\_filter} directive
takes two arguments: a name for the filter and a regular expression.
For example:

\begin{MyVerbatim}
qname_filter MyFilterName example\.(com|net|org)$ ;
\end{MyVerbatim}

This filter matches queries (and responses) for names ending with
{\em example.com\/}, {\em example.net\/}, and {\em example.org\/}.
You can reference the named filter in the filters part of a {\em
dataset\/} line. For example:

\begin{MyVerbatim}
dataset qtype dns All:null Qtype:qtype queries-only,MyFilterName;
\end{MyVerbatim}

\subsection{Parameters}
\label{sec-dataset-params}

\noindent
{\tt dsc\/} currently supports the following optional parameters:

\begin{description}
\item[min-count={\em NN\/}]
Cells with counts less than {\em NN\/} are not included in
the output. Instead, they are aggregated into the special
values {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}.
This helps reduce the size of datasets with a large number
of small counts.
\item[max-cells={\em NN\/}]
A different, perhaps better, way of limiting the size
of a dataset. Instead of trying to determine an appropriate
{\em min-count\/} value in advance, {\em max-cells\/}
allows you to put a limit on the number of cells to
include for the second dataset dimension. If the dataset
has 9 possible first-dimension values, and you specify
a {\em max-cells\/} count of 100, then the dataset will not
have more than 900 total values. The cell values are sorted
and the top {\em max-cells\/} values are output. Values
that fall below the limit are aggregated into the special
{\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/} entries.
\end{description}
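
For example, a dataset whose second dimension (here the TLD) can take a
very large number of values might be capped as follows. This is an
illustrative line, assuming that parameters are appended after the
filter field of the dataset definition:

\begin{MyVerbatim}
dataset qtype_vs_tld dns Qtype:qtype TLD:tld queries-only,popular-qtypes max-cells=200;
\end{MyVerbatim}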

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Data Storage}

\section{XML Structure}

A dataset XML file has the following structure:

\begin{MyVerbatim}
<array name="dataset-name" dimensions="2" start_time="unix-seconds"
       stop_time="unix-seconds">
  <dimension number="1" type="Label1"/>
  <dimension number="2" type="Label2"/>
  <data>
    <Label1 val="D1-V1">
      <Label2 val="D2-V1" count="N1"/>
      <Label2 val="D2-V2" count="N2"/>
      <Label2 val="D2-V3" count="N3"/>
    </Label1>
    <Label1 val="D1-V2">
      <Label2 val="D2-V1" count="N1"/>
      <Label2 val="D2-V2" count="N2"/>
      <Label2 val="D2-V3" count="N3"/>
    </Label1>
  </data>
</array>
\end{MyVerbatim}

\noindent
{\em dataset-name\/},
{\em Label1\/}, and
{\em Label2\/} come from the dataset definition in {\em dsc.conf\/}.

The {\em start\_time\/} and {\em stop\_time\/} attributes
are given in Unix seconds. They are normally 60 seconds apart.
{\tt dsc} usually starts a new measurement interval on 60-second
boundaries. That is:

\begin{equation}
stop\_time \bmod 60 = 0
\end{equation}

The LABEL1 VAL attributes ({\em D1-V1\/}, {\em D1-V2\/}, etc.) are
values for the first dimension indexer.
Similarly, the LABEL2 VAL attributes ({\em D2-V1\/}, {\em D2-V2\/},
{\em D2-V3\/}) are values for the second dimension indexer.
For some indexers these
values are numeric; for others they are strings. If the value
contains certain non-printable characters, the string is base64-encoded
and the optional BASE64 attribute is set to 1.

There are two special VALs that help keep large datasets down
to a reasonable size: {\tt -:SKIPPED:-\/} and {\tt -:SKIPPED\_SUM:-\/}.
These may be present in datasets that use the {\em min-count\/}
and {\em max-cells\/} parameters (see Section~\ref{sec-dataset-params}).
{\tt -:SKIPPED:-\/} is the number of cells that were not included
in the XML output. {\tt -:SKIPPED\_SUM:-\/}, on the other hand, is the
sum of the counts for all the skipped cells.

Note that ``one-dimensional datasets'' still use two dimensions in
the XML file. The first dimension type and value will be ``All'',
as shown in the example below.

The {\em count\/} values are always integers. If the count for
a particular tuple is zero, it should not be included in the
XML file.

Note that the contents of the XML file do not indicate
where it came from. In particular, the server and node that
it came from are not present. Instead, {\dsc} relies on the
presenter to store XML files in a directory hierarchy
with the server and node as directory names.

\noindent
Here is a short sample XML file with real content:
\begin{MyVerbatim}
<array name="rcode" dimensions="2" start_time="1154649600"
       stop_time="1154649660">
  <dimension number="1" type="All"/>
  <dimension number="2" type="Rcode"/>
  <data>
    <All val="ALL">
      <Rcode val="0" count="70945"/>
      <Rcode val="3" count="50586"/>
      <Rcode val="4" count="121"/>
      <Rcode val="1" count="56"/>
      <Rcode val="5" count="44"/>
    </All>
  </data>
</array>
\end{MyVerbatim}

\noindent
Please see
\path|http://dns.measurement-factory.com/tools/dsc/sample-xml/|
for more sample XML files.
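
The files are easy to process with any ordinary XML library. The
following sketch is not part of {\dsc}; it assumes Python's standard
{\tt xml.etree.ElementTree\/} module and uses the file name from the
sample above to print each cell of the dataset:

\begin{MyVerbatim}
# Minimal sketch: read one dsc dataset XML file and print its cells.
import xml.etree.ElementTree as ET

def read_dsc_xml(path):
    array = ET.parse(path).getroot()      # the <array> element
    name = array.get("name")
    start = int(array.get("start_time"))
    stop = int(array.get("stop_time"))
    cells = []
    for d1 in array.find("data"):         # first-dimension elements
        for d2 in d1:                     # second-dimension elements
            cells.append((d1.get("val"), d2.get("val"),
                          int(d2.get("count"))))
    return name, start, stop, cells

name, start, stop, cells = read_dsc_xml("1154649660.dscdata.xml")
for v1, v2, count in cells:
    print(name, stop, v1, v2, count)
\end{MyVerbatim}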

The XML is not very strict and might cause XML purists to cringe.
{\tt dsc} writes the XML files the old-fashioned way (with printf())
and reads them with Perl's XML::Simple module.
Here is a possibly-valid DTD for the dataset XML format.
Note, however, that the {\em LABEL1\/}
and {\em LABEL2\/} strings are different
for each dataset:

\begin{MyVerbatim}
<!DOCTYPE ARRAY [

<!ELEMENT ARRAY (DIMENSION+, DATA)>
<!ELEMENT DIMENSION>
<!ELEMENT DATA (LABEL1+)>
<!ELEMENT LABEL1 (LABEL2+)>

<!ATTLIST ARRAY NAME CDATA #REQUIRED>
<!ATTLIST ARRAY DIMENSIONS CDATA #REQUIRED>
<!ATTLIST ARRAY START_TIME CDATA #REQUIRED>
<!ATTLIST ARRAY STOP_TIME CDATA #REQUIRED>
<!ATTLIST DIMENSION NUMBER CDATA #REQUIRED>
<!ATTLIST DIMENSION TYPE CDATA #REQUIRED>
<!ATTLIST LABEL1 VAL CDATA #REQUIRED>
<!ATTLIST LABEL2 VAL CDATA #REQUIRED>
<!ATTLIST LABEL2 COUNT CDATA #REQUIRED>

]>
\end{MyVerbatim}

\subsection{XML File Naming Conventions}

{\tt dsc\/} relies on certain file naming conventions for XML files.
The file name should be of the format:

\begin{quote}
{\em timestamp\/}.dscdata.xml
\end{quote}

\noindent
For example:

\begin{quote}
1154649660.dscdata.xml
\end{quote}

NOTE: Versions of DSC prior to 2008-01-30 used a different naming
convention. Instead of ``dscdata'' the XML file was named after
the dataset that generated the data. The current XML extraction
code still supports the older naming convention for backward compatibility.
If the second component of the XML file name is not ``dscdata'' then
the extractor assumes it is a dataset name.

\noindent
Dataset names come from {\em dsc.conf\/}, and should match the NAME
attribute of the ARRAY tag inside the XML file. The timestamp is in
Unix epoch seconds and is usually the same as the {\em stop\_time\/}
value.
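
A presenter-side script might interpret these file names as in the
sketch below (illustrative Python, not the actual extractor code),
accepting both the current and the pre-2008 conventions:

\begin{MyVerbatim}
# Minimal sketch: split a dsc XML file name into its parts.
import os

def parse_xml_name(path):
    parts = os.path.basename(path).split(".")
    timestamp = int(parts[0])             # Unix epoch seconds
    # Current convention: <timestamp>.dscdata.xml; with the older
    # convention the middle component is the dataset name instead.
    dataset = None if parts[1] == "dscdata" else parts[1]
    return timestamp, dataset

print(parse_xml_name("1154649660.dscdata.xml"))   # (1154649660, None)
print(parse_xml_name("1154649660.rcode.xml"))     # (1154649660, 'rcode')
\end{MyVerbatim}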

\section{JSON Structure}

The JSON structure mirrors the XML structure and uses the same element names.

\begin{MyVerbatim}
{
  "name": "dataset-name",
  "start_time": unix-seconds,
  "stop_time": unix-seconds,
  "dimensions": [ "Label1", "Label2" ],
  "data": [
    {
      "Label1": "D1-V1",
      "Label2": [
        { "val": "D2-V1", "count": N1 },
        { "val": "D2-V2", "count": N2 },
        { "val": "D2-V3", "count": N3 }
      ]
    },
    {
      "Label1": "D1-V1-base64",
      "base64": true,
      "Label2": [
        { "val": "D2-V1", "count": N1 },
        { "val": "D2-V2-base64", "base64": true, "count": N2 },
        { "val": "D2-V3", "count": N3 }
      ]
    }
  ]
}
\end{MyVerbatim}
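
As with the XML form, the JSON files can be read with standard tooling.
The sketch below is illustrative Python; the file name and the
{\tt .json\/} extension are assumptions, not a documented convention.
It flattens one file into rows and decodes any base64-flagged values:

\begin{MyVerbatim}
# Minimal sketch: flatten a dsc dataset JSON file into rows.
import base64
import json

def decode(value, flagged):
    if not flagged:
        return value
    return base64.b64decode(value).decode("utf-8", "replace")

with open("1154649660.dscdata.json") as f:    # hypothetical file name
    data = json.load(f)

label1, label2 = data["dimensions"]
for row in data["data"]:
    v1 = decode(row[label1], row.get("base64", False))
    for cell in row[label2]:
        v2 = decode(cell["val"], cell.get("base64", False))
        print(data["name"], data["stop_time"], v1, v2, cell["count"])
\end{MyVerbatim}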

\section{Archived Data Format}

{\dsc} actually uses four different file formats for archived
datasets. These are all text-based and designed to be quickly
read from, and written to, by Perl scripts.

\subsection{Format 1}

\noindent
\begin{tt}time $k1$ $N_{k1}$ $k2$ $N_{k2}$ $k3$ $N_{k3}$ ...
\end{tt}

\vspace{1ex}\noindent
This is a one-dimensional time-series format.\footnote{Which means
it can only be used for datasets where one of the indexers is set
to the Null indexer.} The first column is a timestamp (Unix seconds).
The remaining space-separated fields are key-value pairs. For
example:

\begin{MyVerbatim}
1093219980 root-servers.net 122 rfc1918-ptr 112 a-for-a 926 funny-qclass 16
1093220040 root-servers.net 121 rfc1918-ptr 104 a-for-a 905 funny-qclass 15
1093220100 root-servers.net 137 rfc1918-ptr 116 a-for-a 871 funny-qclass 12
\end{MyVerbatim}

\subsection{Format 2}

\noindent
\begin{tt}time $j1$ $k1$:$N_{j1,k1}$:$k2$:$N_{j1,k2}$:... $j2$ $k1$:$N_{j2,k1}$:$k2$:$N_{j2,k2}$:... ...
\end{tt}

\vspace{1ex}\noindent
This is a two-dimensional time-series format. In the above,
$j$ represents the first dimension indexer and $k$ represents
the second. Key-value pairs for the second dimension are
separated by colons, rather than spaces. For example:

\begin{MyVerbatim}
1093220160 recv icmp:2397:udp:136712:tcp:428 sent icmp:819:udp:119191:tcp:323
1093220220 recv icmp:2229:udp:124708:tcp:495 sent icmp:716:udp:107652:tcp:350
1093220280 recv udp:138212:icmp:2342:tcp:499 sent udp:120788:icmp:819:tcp:364
1093220340 recv icmp:2285:udp:137107:tcp:468 sent icmp:733:udp:118522:tcp:341
\end{MyVerbatim}
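
Because the second-dimension pairs are packed with colons, parsing this
format takes one extra step compared to Format 1. Here is a minimal
sketch (illustrative Python, not one of the {\dsc} Perl scripts):

\begin{MyVerbatim}
# Minimal sketch: parse one Format 2 line into (time, j, k, count) tuples.
def parse_format2(line):
    fields = line.split()
    t = int(fields[0])
    rows = []
    # After the timestamp, fields come in pairs: a first-dimension key
    # followed by colon-separated key:count pairs for the second dimension.
    for j, packed in zip(fields[1::2], fields[2::2]):
        parts = packed.split(":")
        for k, n in zip(parts[0::2], parts[1::2]):
            rows.append((t, j, k, int(n)))
    return rows

# Abbreviated from the first example line above.
line = "1093220160 recv icmp:2397:udp:136712:tcp:428"
print(parse_format2(line))
\end{MyVerbatim}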

\subsection{Format 3}

\noindent
\begin{tt}$k$ $N_{k}$
\end{tt}

\vspace{1ex}\noindent
This format is used for one-dimensional datasets where the key space
is (potentially) very large. That is, putting all the key-value pairs
on a single line would result in a very long line in the data file.
Furthermore, for these larger datasets, it is prohibitive to
store the data as a time series. Instead, the counters are incremented
over time. For example:

\begin{MyVerbatim}
10.0.160.0 3024
10.0.20.0 92
10.0.244.0 5934
\end{MyVerbatim}

\subsection{Format 4}

\noindent
\begin{tt}$j$ $k$ $N_{j,k}$
\end{tt}

\vspace{1ex}\noindent
This format is used for two-dimensional datasets where one or both
key spaces are very large. Again, counters are incremented over
time, rather than storing the data as a time series.
For example:

\begin{MyVerbatim}
10.0.0.0 non-auth-tld 105
10.0.0.0 ok 37383
10.0.0.0 rfc1918-ptr 5941
10.0.0.0 root-servers.net 1872
10.0.1.0 a-for-a 6
10.0.1.0 non-auth-tld 363
10.0.1.0 ok 144
\end{MyVerbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Bugs}

\begin{itemize}

\item
Seems too confusing to have an opaque name for indexers in
the dsc.conf dataset line. The names are pre-determined anyway
since they must match what the XML extractors look for.

\item
Also stupid to have indexer names and a separate ``Label'' for
the XML file.

\item
The {\dsc} Perl modules are installed in the ``site\_perl'' directory,
but they should probably be installed under /usr/local/dsc.

\item
The {\dsc} collector silently drops UDP fragments.

\end{itemize}

\end{document}