<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[NZRS Blog]]></title><description><![CDATA[Research, thoughts, stories and ideas from the team at NZRS Ltd.  


We are a provider of critical Internet infrastructure and authoritative data services.






]]></description><link>http://blog.nzrs.net.nz/</link><generator>Ghost 0.11</generator><lastBuildDate>Sun, 21 Jan 2018 11:44:19 GMT</lastBuildDate><atom:link href="http://blog.nzrs.net.nz/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Registrar Size Prediction]]></title><description><![CDATA[<p>An interesting request we received after the <a href="https://vimeopro.com/nzrs/2017-nz-registrar-conference/video/217935980">register size prediction</a> presentation at last year's Registrar Conference was a suggestion to apply a similar methodology to the data for each registrar. In this blog post, we share the methodology and the insights found during the registrar size prediction modelling.</p>

<h2 id="descriptiveanalysis">Descriptive analysis</h2>]]></description><link>http://blog.nzrs.net.nz/registrar-size-prediction/</link><guid isPermaLink="false">5c8b0cc9-d479-44ef-85f6-fc68b9449f17</guid><dc:creator><![CDATA[Huayi Jing]]></dc:creator><pubDate>Mon, 15 Jan 2018 03:16:09 GMT</pubDate><content:encoded><![CDATA[<p>An interesting request we received after the <a href="https://vimeopro.com/nzrs/2017-nz-registrar-conference/video/217935980">register size prediction</a> presentation at the Registrar Conference earlier last year, was a suggestion of applying similar methodology on the data per registrar. In this blog post, we will share the methodology and insights found during the registrar size prediction modelling.</p>

<h2 id="descriptiveanalysis">Descriptive analysis</h2>

<p>As of 1 August 2017, there were 89 active registrars in the .nz register. The prediction procedure is not feasible for all of them. Two features help us determine whether a registrar’s data is statistically ready for prediction: (1) its age, i.e. how long the registrar’s data has existed in the register; the more data points we have for modelling, the more accurate the prediction will be. (2) Its size; the larger a registrar is, the clearer the trend we can find underlying the historical change in its size. </p>

<div>  
    <a href="https://plot.ly/~linking/17/?share_key=wGGmec5443CUOM7xbAwHKI" target="_blank" title="Plot 17" style="display: block; text-align: center;"><img src="https://plot.ly/~linking/17.png?share_key=wGGmec5443CUOM7xbAwHKI" alt="Plot 17" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="linking:17" sharekey-plotly="wGGmec5443CUOM7xbAwHKI" src="https://plot.ly/embed.js" async></script>
</div>

<p>As shown in the figure above, the distribution of registrar age and size reveals some interesting points. Among these registrars, 40 have under 1000 active domains. The smallest registrar has only 7 domains, even though it has been recorded since June 2010. The age distribution skews the other way: 76 registrars are older than 5 years, of which 56 are older than 10 years (note that domains are transferred between registrars now and then for different reasons, so some of this information might not be precise). Registrars that are at least 60 months old and have at least 5000 domains are selected for prediction. That leaves us with 20 registrars.</p>
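
<p>As a rough illustration of that selection step, the snippet below filters a summary table by the two criteria. The table, column names and figures are placeholders, not our actual register data.</p>

<pre><code>import pandas as pd

# Hypothetical summary table: one row per registrar, with its age in months
# and its current number of active domains (column names are illustrative).
registrars = pd.DataFrame({
    "registrar":      ["A", "B", "C", "D"],
    "age_months":     [130, 48, 85, 200],
    "active_domains": [120000, 900, 7500, 3200],
})

# Keep registrars that are at least 60 months old and hold at least 5000 domains.
eligible = registrars.query("age_months >= 60 and active_domains >= 5000")
print(eligible["registrar"].tolist())   # ['A', 'C']
</code></pre>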

<h2 id="theprediction">The prediction</h2>

<p>The prediction procedure follows the one described in the previous <a href="http://blog.nzrs.net.nz/register-size-prediction/">blog</a>. Two assumptions are made so that the procedure can be applied: (1) when a domain is transferred to another registrar, it is treated as a new create for that registrar; (2) different SLDs are assumed to behave similarly, so that we have enough data points for prediction.</p>

<p>Let’s first have a look at the retention behaviour. Taking Registrar A as an example, the following two figures show the retention behaviour of <em>domains registered in different periods</em> (i.e. cohorts; the retention rate is estimated using multi-cohort data). The drop-out rate is high in the early years and then slows down as domains stay longer. From the heat map we can observe that, on average, the retention rate of relatively recent cohorts is higher, which is a good thing to know.</p>

<div>  
    <a href="https://plot.ly/~linking/18/?share_key=G2k4OKpGamsmyLvlHtgGea" target="_blank" title="Plot 18" style="display: block; text-align: center;"><img src="https://plot.ly/~linking/18.png?share_key=G2k4OKpGamsmyLvlHtgGea" alt="Plot 18" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="linking:18" sharekey-plotly="G2k4OKpGamsmyLvlHtgGea" src="https://plot.ly/embed.js" async></script>
</div>

<p><img src="http://blog.nzrs.net.nz/content/images/2018/01/reg128cohorts.png" alt="Drawing" style="width: 600px;"></p>

<p>The new creates forecast reveals some interesting findings as well. For the two registrars shown below, the historical new creates data shows a clear downward trend, which might indicate a change in the focus of the business. The forecast can therefore be negative for some periods; those values are replaced by zero.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/12/registrardecline-12.png" alt="Drawing" style="width: 600px;"></p>

<p>Some registrars have extremely stationary, low-volume new creates over time, as the figure below shows. For such registrars, there is barely any trend or cyclic fluctuation underlying the data points. </p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/12/registrarstable.png" alt="Drawing" style="width: 600px;"></p>

<p>In contrast, some registrars have new creates data that fluctuates greatly and shows no clear trend or seasonality. Accurate forecasts for such cases are hard. A closer look at the reasons behind those fluctuations would help produce more reasonable forecasts.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2018/01/fluctuation.png" alt="Drawing" style="width: 600px;"></p>

<p>To test the performance of the prediction, historical data up to May 2017 is used to make predictions for June, July and August 2017. The table below shows the MAPE (mean absolute percentage error) for each registrar, in descending order of size. As mentioned before, a smaller registrar size makes prediction harder, so it is not surprising that some of the bottom 10 registrars have a MAPE greater than 10%. In general, our procedure generates comparatively accurate predictions for the bigger registrars.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2018/01/MAPE-1.png" alt="Drawing" style="width: 350px;"></p>

<p>Finally, let’s look at the prediction results for the top 20 registrars. The total size of this group is increasing over time, and the larger registrars also show an increasing trend. Some registrars’ sizes decrease slightly each month; this is due to forecasted low new creates and/or a comparatively larger number of drop-outs in certain months.</p>

<div>  
    <a href="https://plot.ly/~linking/29/?share_key=6E2NYJbK7iMFM9pyMDUwGA" target="_blank" title="r1-r10" style="display: block; text-align: center;"><img src="https://plot.ly/~linking/29.png?share_key=6E2NYJbK7iMFM9pyMDUwGA" alt="r1-r10" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="linking:29" sharekey-plotly="6E2NYJbK7iMFM9pyMDUwGA" src="https://plot.ly/embed.js" async></script>
</div>

<div>  
    <a href="https://plot.ly/~linking/22/?share_key=ufWNoYuN38TwOrPHFU6Xz3" target="_blank" title="r11-r20_2018" style="display: block; text-align: center;"><img src="https://plot.ly/~linking/22.png?share_key=ufWNoYuN38TwOrPHFU6Xz3" alt="r11-r20_2018" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="linking:22" sharekey-plotly="ufWNoYuN38TwOrPHFU6Xz3" src="https://plot.ly/embed.js" async></script>
</div>

<p>Registrar size prediction is more challenging than register size prediction because of the data quality after segmentation. Nonetheless, some interesting findings surfaced along the way. Since bulk transfers of domains between registrars happen for various reasons (e.g., movement of re-sellers or large portfolio holders between registrars), further investigation of those cases would help improve the quality of the data and the predictions. For data that is reasonably stationary, a naive or moving-average forecasting technique might be a better choice. These could be directions for follow-on work. </p>]]></content:encoded></item><item><title><![CDATA[Scanning .nz for HTTPS support]]></title><description><![CDATA[<p>As part of our efforts to understand the .nz namespace better, we started at the beginning of 2017 to check domains for the presence of a secure website using HTTPS, and collect information about certificates, protocol features and other valuable information in the process.</p>

<p>The collection process is straightforward:</p>

<ul>
<li>Extract</li></ul>]]></description><link>http://blog.nzrs.net.nz/scanning-nz-for-https-support/</link><guid isPermaLink="false">a0895123-7d3a-4b7f-b8d8-567f61aff520</guid><dc:creator><![CDATA[Sebastian Castro]]></dc:creator><pubDate>Thu, 19 Oct 2017 22:32:25 GMT</pubDate><content:encoded><![CDATA[<p>As part of our efforts to understand the .nz namespace better, we started at the beginning of 2017 to check domains for the presence of a secure website using HTTPS, and collect information about certificates, protocol features and other valuable information in the process.</p>

<p>The collection process is straightforward:</p>

<ul>
<li>Extract the list of active .nz domains in the register</li>
<li>For each domain, verify if there is an A record for the host <em>www</em>.<strong>domain</strong>. If the resolution process fails, the domain won't be included.</li>
<li>Test each domain using <a href="https://github.com/nabla-c0d3/sslyze">sslyze</a>. We have a script that tests different versions of SSL and TLS and various protocol features, and collects information about the certificate chain on the site (a simplified sketch of such a check follows this list).</li>
<li>Once the collection is completed, produce aggregated counters and make them available on <a href="https://idp.nz/Domain-Names/-nz-SSL-scan-results/cmxt-74aq">IDP</a>, our Internet Data Portal.</li>
</ul>
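
<p>The sketch below is a heavily simplified version of that check using only the Python standard library (sslyze collects far more detail). The domain name is a placeholder, and the classification labels simply mirror the categories used in the plot further down.</p>

<pre><code>import socket, ssl

def probe_https(domain, timeout=5):
    """Check whether www.domain resolves and answers an HTTPS handshake."""
    host = "www." + domain
    try:
        socket.getaddrinfo(host, 443)          # does the name resolve at all?
    except socket.gaierror:
        return {"domain": domain, "status": "broken DNS"}
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, 443), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return {"domain": domain, "status": "HTTPS support",
                        "protocol": tls.version(),
                        "issuer": dict(x[0] for x in cert.get("issuer", ()))}
    except ssl.SSLCertVerificationError:
        return {"domain": domain, "status": "invalid certificate"}
    except (OSError, ssl.SSLError):
        return {"domain": domain, "status": "no HTTPS support"}

print(probe_https("example.nz"))   # placeholder domain
</code></pre>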

<p>We started in January 2017 and so far the process has completed five times, giving us some valuable datapoints.</p>

<h3 id="httpssupport">HTTPS support</h3>

<p>Let's start with the overall picture of how much of the .nz namespace has a secure website.</p>

<div id="https_support_vis"></div>

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>  
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>  
<script src="https://cdnjs.cloudflare.com/ajax/libs/moment.js/2.19.1/moment.min.js"></script>

<script>  
    var dataSource = "https://idp.nz/resource/8khv-hfif.json" + 
"?Classification='Total'";

    $.getJSON(dataSource, function(data, textstatus) {
          var points = {};
          $.each(data, function(i, entry) {
               var x_point = moment(entry.date.substr(0, 10)).format('MMM YYYY');
               var y_point = +(100*(entry.count/entry.domains)).toFixed(2);
               if (entry.metric in points) {
                    points[entry.metric]['x'].push(x_point);
                    points[entry.metric]['y'].push(y_point);
               }
               else {
                   points[entry.metric] = { x: [x_point], y: [y_point] };
               }
            });

            // Convert points into something suitable for plotly
            var data = [];
            var prefColor = {
                    'Broken DNS': 'rgb(215,25,28)',
                    'No HTTPS Support': 'rgb(253,174,97)',
                    'HTTPS Support': 'rgb(171,217,233)',
                    'Invalid Certificate': 'rgb(44,123,182)'
            };

            for (var m in points) {
                data.push({
                    type: 'bar',
                    x: points[m].x,
                    y: points[m].y,
                    marker: {
                        color: prefColor[m]
                    },
                    name: m });
            }
            var https_layout = {
                autosize: true,
                title: '.nz HTTPS Support',
                yaxis: { title: '% of domains' },
                xaxis: { zeroline: true,
                        showline: false,
                        type: 'category'
                },
                barmode: 'stack',
                margin: { l: 50, r: 50, b: 20, t: 40 }
            };

            Plotly.newPlot('https_support_vis', data, https_layout);
        });
</script>

<p>On our first collection, we didn't record how many domains failed the test due to an incorrect DNS response; that category was added afterwards. Despite that detail, there is a lot to notice here. Around 14% of the domains fail the DNS test, and around 0.7% have invalid certificates, for example where the name in the certificate doesn't match the website name. The positive news is that HTTPS support grew from 44% to 47% during this year.</p>

<h3 id="protocolsupport">Protocol Support</h3>

<p>HTTPS relies on cryptographic protocols, such as SSL and TLS, to ensure privacy and authenticity. A given webserver can support multiple crypto protocols at the same time. SSL v2.0 was released in February 1995 and SSL v3.0 in 1996; TLS v1.0 was defined in January 1999 as a replacement for SSL v3.0, TLS v1.1 was published in April 2006, and TLS v1.2 in August 2008. TLS v1.3 is currently being drafted in the IETF, so we don't test for it.</p>

<div id="proto_support_vis"></div>

<p>As SSL v2.0 was deprecated and prohibited in 2011 by <a href="https://tools.ietf.org/html/rfc6176">RFC 6176</a>, and SSL v3.0 was deprecated in June 2015 by <a href="https://tools.ietf.org/html/rfc7568">RFC 7568</a>, <strong>there SHOULD NOT be any domains supporting them.</strong> Sites with those crypto protocols enabled are a risk, given the number of known exploitable vulnerabilities against SSL. If you administer a secure website, you can use the excellent <a href="https://www.ssllabs.com/ssltest/">Qualys SSL Site tester</a> to check for weaknesses.</p>
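
<p>If you prefer to test from a script rather than the Qualys site, the sketch below attempts a handshake pinned to a single TLS version using Python's ssl module. It is an illustration only: the host is a placeholder, modern OpenSSL builds no longer offer SSL 2.0/3.0 at all (so they are omitted), and a failed handshake can reflect the client's own policy as much as the server's.</p>

<pre><code>import socket, ssl

def supports(host, version, timeout=5):
    """Try a handshake pinned to a single TLS version; returns True on success."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE      # we only care about the protocol here
    ctx.minimum_version = version
    ctx.maximum_version = version
    try:
        with socket.create_connection((host, 443), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except (OSError, ssl.SSLError):
        return False

for v in (ssl.TLSVersion.TLSv1, ssl.TLSVersion.TLSv1_1, ssl.TLSVersion.TLSv1_2):
    print(v.name, supports("www.example.nz", v))   # placeholder host
</code></pre>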

<h3 id="certificatepublickeys">Certificate Public Keys</h3>

<p>Crypto protocols use public key cryptography to provide privacy and authenticity. To achieve this, SSL and TLS rely on certificates issued by Certificate Authorities (CA), that authenticate the identity of the website you are visiting, and contain a cryptographic key to encrypt traffic.</p>

<p>Cryptographic keys have two main properties: the algorithm used to generate them, typically RSA or ECC, and the key size, measured in bits.  </p>

<p>As part of the testing, we collect what kind of cryptographic keys the sites with HTTPS enabled are using.</p>
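
<p>As an aside, the key algorithm and size are easy to read from a certificate. The sketch below, assuming a recent version of the pyca/cryptography package and a placeholder host name, prints the same attributes we aggregate in the plot below, plus the signature hash used in the next section.</p>

<pre><code>import ssl
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import ec, rsa

# Fetch the leaf certificate of a (placeholder) .nz website as PEM text.
pem = ssl.get_server_certificate(("www.example.nz", 443))
cert = x509.load_pem_x509_certificate(pem.encode())

key = cert.public_key()
if isinstance(key, rsa.RSAPublicKey):
    print("RSA", key.key_size)                    # e.g. RSA 2048
elif isinstance(key, ec.EllipticCurvePublicKey):
    print("id-ecPublicKey", key.curve.key_size)   # e.g. id-ecPublicKey 256

# The hash algorithm in the certificate's signature (used in the next section).
print(cert.signature_hash_algorithm.name)         # e.g. sha256
</code></pre>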

<div id="cert_key_vis"></div>

<p>You can see 3 different key sizes for RSA keys: 1024, 2048 and 4096. A 1024-bit RSA key is considered weak and should not be used. The fact that keys of that size were visible at the beginning of this year but have now disappeared shows a healthy attitude towards security maintenance. The transition from 2048-bit keys to 4096-bit keys observed since August 2017 is also a great indication of the hygiene of the .nz namespace.</p>

<p>What about the <strong>256 id-ecPublicKey</strong> cases? Those are sites using the newer Elliptic Curve DSA (ECDSA) cryptography, which uses smaller key sizes. There are also a few cases using 384-bit ECDSA keys.</p>

<p>To make the visualisation easy to read, we omit values covering less than 1% of domains, but we did find some oddities, such as sites with unusual key sizes. For example, there are still a few cases with weak 512-bit RSA keys, or 3000-bit keys, or 1536-bit keys (1536 is half-way between 1024 and 2048).</p>

<h3 id="certificatesignaturealgorithms">Certificate Signature Algorithms</h3>

<p>Each SSL certificate contains a digital signature: a hash of the certificate content, signed with the issuing Certificate Authority's private key. This digital signature allows the integrity of the certificate to be verified and enables browsers to check its validity.</p>

<p>A few hashing algorithms are used for signatures: MD5, SHA-1 and the SHA-2 family, which includes SHA-256, SHA-384 and SHA-512.</p>

<div id="sign_vis"></div>

<p>Great news, right? Most of the sites use strong SHA-256 hashes, with RSA or ECDSA. SHA-1, on the other hand, is subject to collision attacks, which make it feasible to generate fraudulent certificates and trick users into using the wrong website. Firefox <a href="https://blog.mozilla.org/security/2014/09/23/phasing-out-certificates-with-sha-1-based-signature-algorithms/">now</a> warns users visiting secure websites that use certificates signed with SHA-1, and NIST <a href="https://csrc.nist.gov/publications/detail/sp/800-57-part-1/rev-3/archive/2012-07-10">has recommended</a> moving away from SHA-1 since 2014. So despite the fraction of sites with weak hashes falling from 7.1% to 4.5%, we still have reasons to be concerned.</p>

<p>As with the previous plot, for clarity we didn't include signature algorithms covering less than 1% of the domains. There are a few domains using MD5 as the hashing algorithm in their certificates; considering MD5 was deprecated back in 2013, that's certainly not good news.</p>

<h3 id="certificateauthorities">Certificate Authorities</h3>

<p>As we mentioned before, certificates are issued by Certificate Authorities, and browsers are configured to trust a set of CAs. An administrator can relatively easily set up their own CA and create certificates, but those won't be verifiable by a browser.</p>

<p>The plot below shows the top 10 issuing Certificate Authorities observed in the .nz namespace. For simplicity, we group the long tail of CAs, including self-issued certificates, into the 'Other' category.</p>

<div id="ca_vis"></div>

<p>The figure shows the biggest players in the certificate industry as observed in New Zealand: UserTrust Network, GoDaddy, GeoTrust, Comodo and Let's Encrypt. The interesting bit is the evolution over time: Let's Encrypt grew from 16.29% to 22.84% of the domains in 9 months, and that growth came at the cost of more traditional CAs such as GoDaddy, GeoTrust and Comodo. The other CA with massive growth is UserTrust Network, which has nearly tripled its market share since January 2017.</p>
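
<p>For those curious how such a grouping might be produced, here is a sketch that extracts the issuer organisation from each certificate and folds everything outside the top 10 into 'Other'. It assumes a list of certificate objects from the pyca/cryptography package; the empty list here is just a placeholder.</p>

<pre><code>from collections import Counter
from cryptography.x509.oid import NameOID

def issuer_org(cert):
    """Return the issuer's organisation name, falling back to its common name."""
    for oid in (NameOID.ORGANIZATION_NAME, NameOID.COMMON_NAME):
        attrs = cert.issuer.get_attributes_for_oid(oid)
        if attrs:
            return attrs[0].value
    return "Unknown"

# `certs` would be the list of x509.Certificate objects collected in one scan
# (empty placeholder here).
certs = []
counts = Counter(issuer_org(c) for c in certs)
top10 = dict(counts.most_common(10))
other = sum(counts.values()) - sum(top10.values())
print(top10, "Other:", other)
</code></pre>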

<p>Let's Encrypt provides an interesting story: it started back in December 2015, issuing certificates for free, with a validity of 90 days and a fully automated process. Before that, most CAs were for-profit organisations, in some cases charging considerable fees to issue a certificate. <a href="https://letsencrypt.org/">Let's Encrypt</a> is supported by the Internet Security Research Group (ISRG), whose members include Facebook, the Mozilla Foundation, Google Chrome, the EFF and many others.</p>

<h3 id="wrapup">Wrap up</h3>

<p>We started this collection with the motivation of finding out more about trust and security in the .nz namespace and we've learned a lot in the process. The collection will carry on every other month and a future blog post could cover more details about the protocols, such as negotiation features, certificate chain length and intermediate CAs.</p>

<script>  
// Data about Protocol Support
var dataSource2 = "https://idp.nz/resource/8khv-hfif.json" + "?$where=Classification='Protocol Support'&$order=date,metric";  
var dataSource3 = "https://idp.nz/resource/8khv-hfif.json" + "?$where=Classification='Certificate Public Key'&$order=date,metric";  
var dataSource4 = "https://idp.nz/resource/8khv-hfif.json" + "?$where=Classification='Certificate Signature Algorithm'&$order=date,metric";  
var dataSource5 = "https://idp.nz/resource/8khv-hfif.json" + "?$where=Classification='CA issuer'&$order=date,metric";  
var totalSource = "https://idp.nz/resource/8khv-hfif.json" + "?Classification='Total'&Metric='HTTPS Support'";

// Get the totals first
$.getJSON(totalSource, function(total, jsonstatus) {
    var totals = {};
    $.each(total, function(i, entry) {
        var x = moment(entry.date.substr(0, 10)).format('MMM YYYY');
        totals[x] = entry.count;
    });

    $.getJSON(dataSource2, function(data, textstatus) {
        var points = {};

        $.each(data, function(i, entry) {
            var x_point = moment(entry.date.substr(0, 10)).format('MMM YYYY');
            var y_point = +(100*(entry.count/totals[x_point])).toFixed(2);
            if (x_point in points) {
                points[x_point]['x'].push(entry.metric);
                points[x_point]['y'].push(y_point);
            }
            else {
                points[x_point] = { x: [entry.metric], y: [y_point] };
            }
        });

        // Convert points into something suitable for plotly
        var data = [];
        for (var m in points) {
            data.push({ type: 'bar', x: points[m].x, y: points[m].y, name: m });
        }

        var proto_layout = {
            autosize: true,
            title: 'HTTPS Protocol Support',
            yaxis: { title: '% of domains' },
            xaxis: { zeroline: true,
                    showline: false,
                    type: 'category',
                    tickvals: ['SSL v2', 'SSL v3', 'TLS v1.0', 'TLS v1.1', 'TLS v1.2'],
                    ticktext: ['SSL 2', 'SSL 3', 'TLS 1.0', 'TLS 1.1', 'TLS 1.2']
            },
            margin: { l: 50, r: 50, b: 20, t: 40 }
        };

        Plotly.newPlot('proto_support_vis', data, proto_layout);
    });

    // Next plot
    $.getJSON(dataSource3, function(data, textstatus) {
        var points = {};

        $.each(data, function(i, entry) {
            var x_point = moment(entry.date.substr(0, 10)).format('MMM YYYY');
            var y_point = +(100*(entry.count/totals[x_point])).toFixed(2);
            if (y_point > 1.0) {
                if (x_point in points) {
                    points[x_point]['x'].push(entry.metric);
                    points[x_point]['y'].push(y_point);
                }
                else {
                    points[x_point] = {
                        x: [entry.metric],
                        y: [y_point] };
                }
            }
        });

        // Convert points into something suitable for plotly
        var data = [];
        for (var m in points) {
            data.push({ type: 'bar',
                x: points[m].y,
                y: points[m].x,
                orientation: 'h',
                name: m });
        }

        var key_layout = {
            autosize: true,
            title: 'Certificate Public Key Distribution',
            yaxis: { type: 'category' },
            xaxis: { zeroline: true,
                    showline: false,
                    title: '% of domains'
            },
            margin: { l: 150, r: 50, b: 30, t: 30 }
        };

        Plotly.newPlot('cert_key_vis', data, key_layout);
    });

    // Next Plot
    $.getJSON(dataSource4, function(data, textstatus) {
        var points = {};

        $.each(data, function(i, entry) {
            var x_point = moment(entry.date.substr(0, 10)).format('MMM YYYY');
            var y_point = +(100*(entry.count/totals[x_point])).toFixed(2);
            if (y_point > 1.0) {
                if (x_point in points) {
                    points[x_point]['x'].push(entry.metric);
                    points[x_point]['y'].push(y_point);
                }
                else {
                    points[x_point] = {
                        x: [entry.metric],
                        y: [y_point] };
                }
            }
        });

        // Convert points into something suitable for plotly
        var data = [];
        for (var m in points) {
            data.push({ type: 'bar',
                x: points[m].y,
                y: points[m].x,
                orientation: 'h',
                name: m });
        }

        var sign_layout = {
            autosize: true,
            title: 'Signature Algorithm Distribution',
            yaxis: { type: 'category' },
            xaxis: { zeroline: true,
                    showline: false,
                    title: '% of domains'
            },
            margin: { l: 180, r: 50, b: 30, t: 30 }
        };

        Plotly.newPlot('sign_vis', data, sign_layout);
    });

    // Next Plot
    $.getJSON(dataSource5, function(data, textstatus) {
        var points = {};

        $.each(data, function(i, entry) {
            var x_point = moment(entry.date.substr(0, 10)).format('MMM YYYY');
            var y_point = +(100*(entry.count/totals[x_point])).toFixed(2);
            if (y_point > 1.0) {
                if (x_point in points) {
                    points[x_point]['x'].push(entry.metric);
                    points[x_point]['y'].push(y_point);
                }
                else {
                    points[x_point] = {
                        x: [entry.metric],
                        y: [y_point] };
                }
            }
        });

        // Convert points into something suitable for plotly
        var data = [];
        for (var m in points) {
            data.push({ type: 'bar',
                x: points[m].y,
                y: points[m].x,
                orientation: 'h',
                name: m });
        }

        var ca_layout = {
            autosize: true,
            title: 'CA Issuer Distribution',
            yaxis: { type: 'category' },
            xaxis: { zeroline: true,
                    showline: false,
                    title: '% of domains'
            },
            margin: { l: 180, r: 50, b: 30, t: 30 }
        };

        Plotly.newPlot('ca_vis', data, ca_layout);
    });

});
</script>]]></content:encoded></item><item><title><![CDATA[.nz DNS traffic: Trend and Anomalies]]></title><description><![CDATA[<p>As we run the .nz ccTLD (Country Code Top Level Domain) authoritative nameservers, we receive lots of DNS queries and answer each query with a DNS response. We capture these queries and responses, and store them in a Hadoop cluster for further analysis. Based on this data, we generate daily</p>]]></description><link>http://blog.nzrs.net.nz/nz-dns-traffic-trend-and-anomalies/</link><guid isPermaLink="false">6263b388-624a-426f-aebe-c77db73242a5</guid><dc:creator><![CDATA[Jing Qiao]]></dc:creator><pubDate>Wed, 14 Jun 2017 04:56:52 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2017/06/magento-statistics-1.png" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2017/06/magento-statistics-1.png" alt=".nz DNS traffic: Trend and Anomalies"><p>As we run the .nz ccTLD (Country Code Top Level Domain) authoritative nameservers, we receive lots of DNS queries and answer each query with a DNS response. We capture these queries and responses, and store them in a Hadoop cluster for further analysis. Based on this data, we generate daily statistics about our DNS traffic and publish it in IDP (Internet Data Portal) as <a href="https://idp.nz/Domain-Names/-nz-DNS-Statistics/5a6u-t52b">.nz DNS Statistics</a>.</p>

<p>We have a clean, continuous dataset dating back to 2015. We are now able to apply some time series analysis to explore trends. This post will show some interesting results from that analysis.</p>

<h2 id="querytrend">Query trend</h2>

<p>A DNS query contains several attributes such as the domain being queried, the query type (the type of resource related to the domain) and the set of DNS header flags. Based on the aggregated counts of each attribute, we can explore the data across a variety of dimensions.</p>

<h3 id="registereddomainsqueried">Registered Domains Queried</h3>

<p><strong>IDP Dataset:</strong> <a href="https://idp.nz/Domain-Names/Unique-registered-domains-queried/u5i8-kxxg/data">Unique registered domains queried</a></p>

<p>We see lots of domains being queried in our traffic. Not all of them are registered domains; many do not exist in our register. By selecting on the response code in the DNS response message, we can extract the registered domains that were queried. As 'the number of unique registered domains queried' depends on the register size, we normalize it by dividing by the register size for each day, which can be obtained from <a href="https://idp.nz/Domain-Names/-nz-Activity/mm2r-3dj9">.nz registration statistics</a>. </p>
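
<p>The normalisation itself is a one-liner once the two daily series are joined; a sketch with placeholder figures:</p>

<pre><code>import pandas as pd

# Hypothetical daily series: unique registered domains queried, and register size.
queried = pd.DataFrame({"ds": ["2017-06-01", "2017-06-02"],
                        "unique_queried": [301000, 305500]})
register = pd.DataFrame({"ds": ["2017-06-01", "2017-06-02"],
                         "register_size": [691000, 691300]})

df = queried.merge(register, on="ds")
df["y"] = df["unique_queried"] / df["register_size"]   # fraction of the register queried
print(df[["ds", "y"]])
</code></pre>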

<p>We then applied the Facebook forecasting library <a href="https://facebookincubator.github.io/prophet/">Prophet</a> to our data. Using the logistic growth trend model with a carrying capacity of 1, we obtained the following result.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/06/reg_dn_trend.png" alt=".nz DNS traffic: Trend and Anomalies">
From the plot, we can see the activity of the .nz namespace fits an upward trend over the past two years and is predicted to keep growing in the next year. </p>
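
<p>A minimal sketch of that fit is shown below. It uses the fbprophet package (published as prophet in newer releases) on a synthetic daily series standing in for our real data; the cap column sets the carrying capacity of the logistic trend to 1.</p>

<pre><code>import numpy as np
import pandas as pd
from fbprophet import Prophet   # published as "prophet" in newer releases

# Synthetic stand-in for the real series: fraction of the register queried per day.
days = pd.date_range("2015-07-01", "2017-05-31", freq="D")
y = 0.4 + 0.05 * np.linspace(0, 1, len(days)) + 0.01 * np.random.randn(len(days))
df = pd.DataFrame({"ds": days, "y": y, "cap": 1.0})   # cap = carrying capacity

m = Prophet(growth="logistic", yearly_seasonality=True, weekly_seasonality=True)
m.fit(df)

future = m.make_future_dataframe(periods=365)
future["cap"] = 1.0
forecast = m.predict(future)

m.plot(forecast)              # trend plus one-year prediction
m.plot_components(forecast)   # trend, weekly and yearly components
</code></pre>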

<p>Prophet is based on an additive model where non-linear trends are fit with yearly and weekly seasonality. From the components plot below, we can see the trend, weekly variation and yearly seasonality.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/06/componetplot.png" alt=".nz DNS traffic: Trend and Anomalies"></p>

<p>The weekly and yearly seasonality are quite interesting. As our data is in UTC time, shifted 12 hours compared to NZ time, the weekly activity actually ramps up around Thursday and then stays high until Saturday. We presume the increased activity is partly due to the business queries on Thursday/Friday, and partly due to the weekend leisure queries on Friday/Saturday. </p>

<p>In the yearly subplot, we see a peak in March, which could relate to financial planning for the year (the financial year commonly finishes in March in NZ). A decrease in the number of registered domains queried occurs across July, August and September, which could correlate with lower business activity during the winter months. Finally, the low point at Christmas time can be explained by the holiday effect.</p>

<h3 id="querytypes">Query Types</h3>

<p><strong>IDP Dataset:</strong> <a href="https://idp.nz/Domain-Names/Query-Types/sgtp-vrup/data">Query types</a></p>

<p>The query type (the type of resource being requested for a domain) indicates how the domain is used. Please refer to <a href="http://blog.nzrs.net.nz/characterization-of-popular-resolvers-from-our-point-of-view-2/">DNS in a Glimpse</a> for the definitions of query types. We explore the query volume for each major query type to see how the usage of .nz domains evolves through the years. DS and DNSKEY are two major query types related to DNSSEC, so we show them in a separate plot as an indicator of DNSSEC deployment progress.</p>

<p>We use <a href="https://plot.ly/">Plotly</a> for interactive plotting. <br>
 <div>
    <a href="https://plot.ly/~QiaoJing/7/?share_key=uFMlM3NVULZIB3KZsNfkfK" target="_blank" title="Plot 7" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/7.png?share_key=uFMlM3NVULZIB3KZsNfkfK" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 687px;" width="687" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:7" sharekey-plotly="uFMlM3NVULZIB3KZsNfkfK" src="https://plot.ly/embed.js" async></script>
</div></p>

<p>We can see that A and AAAA remain the top two query types asked for .nz domains. Specifically, </p>

<ul>
<li><strong>Queries for A</strong> (mapping the IPv4 address for a domain) show steady growth with occasional spikes. </li>
<li><strong>Queries for AAAA</strong> (mapping the IPv6 address for a domain) experienced strong growth across 2015, dropped steeply in July 2016 and then caught up gradually. The drop in July 2016 is probably related to fixing an issue that generated a lot of AAAA queries for two of the .nz nameservers, as explained in this <a href="https://nzrs.net.nz/sites/default/files/The%20hunger%20for%20AAAA.pdf">presentation</a>.</li>
<li><strong>Queries for NS</strong> (locating the name servers for a domain) were very low, jumped up in Feb 2016 and remained steady at that higher level. Extremely high volumes were seen in early 2017. These anomalies will be explored later in this blog.</li>
<li><strong>Queries for MX</strong> (locating the mail server for a domain) should reflect the activity of sending emails to addresses within the .nz namespace, including spamming. These volumes are steady, with strong seasonality at a weekly and monthly level.</li>
</ul>

<div>  
    <a href="https://plot.ly/~QiaoJing/20/?share_key=uFMlM3NVULZIB3KZsNfkfK" target="_blank" title="Plot 7 copy" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/20.png?share_key=uFMlM3NVULZIB3KZsNfkfK" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 687px;" width="687" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:20" sharekey-plotly="uFMlM3NVULZIB3KZsNfkfK" src="https://plot.ly/embed.js" async></script>
</div>

<ul>
<li><strong>Queries for DS</strong> (used by validating resolvers to validate delegations) show a rising trend, which reflects the deployment progress of DNSSEC. </li>
<li><strong>Queries for DNSKEY</strong> (used to validate signed records) show a slower rising trend. This type of query should normally go to the delegated zone's name servers; as we are authoritative mainly for top and second level domains, we only see a small number of DNSKEY queries.</li>
</ul>

<h3 id="rdbit">RD bit</h3>

<p><strong>IDP Dataset:</strong> <a href="https://idp.nz/Domain-Names/RD-bit/ek7h-ijay/data">RD bit</a></p>

<p>The DNS message header contains an RD (Recursion Desired) bit. It is usually set in the DNS query sent by the end user to the resolver. As we are authoritative for the .nz namespace, most of the queries we receive should not have that bit set, so we don't expect to see many queries with the RD bit set, as shown in the plot below.</p>

<div>  
    <a href="https://plot.ly/~QiaoJing/19/?share_key=cY8OExR4RhYClU5UFRL3Ep" target="_blank" title="Plot 19" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/19.png?share_key=cY8OExR4RhYClU5UFRL3Ep" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:19" sharekey-plotly="cY8OExR4RhYClU5UFRL3Ep" src="https://plot.ly/embed.js" async></script>
</div>

<p>We can see a big jump in Feb 2016, similar to the NS queries mentioned in the previous section. We will explore this anomaly later in this blog. </p>

<h2 id="networktrend">Network trend</h2>

<p>From our traffic, we can also see the source IP addresses and the network protocols they use to communicate with us, such as UDP or TCP, IPv4 or IPv6. This lets us explore trends in network protocol usage across our clients' infrastructure. In this section, we draw the comparison plots on a log scale, as the compared quantities differ greatly in magnitude.</p>

<h3 id="udpvstcp">UDP vs. TCP</h3>

<p><strong>IDP Dataset:</strong> <a href="https://idp.nz/Domain-Names/UDP-and-TCP/nqfr-qpez/data">UDP and TCP</a></p>

<p>The use of UDP and TCP in DNS is driven by message size and other factors as described in <a href="https://tools.ietf.org/html/rfc7766">RFC7766</a>:  </p>

<blockquote>
  <p>Most DNS [RFC1034] transactions take place over UDP [RFC768].  TCP
     [RFC793] is always used for full zone transfers (using AXFR) and is often used for messages whose sizes exceed the DNS protocol's
     original 512-byte limit.  The growing deployment of DNS Security
     (DNSSEC) and IPv6 has increased response sizes and therefore the use of TCP.  The need for increased TCP use has also been driven by the
     protection it provides against address spoofing and therefore
     exploitation of DNS in reflection/amplification attacks.  It is now
     widely used in Response Rate Limiting [RRL1] [RRL2].  Additionally,
     recent work on DNS privacy solutions such as [DNS-over-TLS] is
     another motivation to revisit DNS-over-TCP requirements.</p>
</blockquote>

<p>We compare the UDP and TCP trends in our traffic in two ways:</p>

<ul>
<li><strong>UDP vs. TCP query volume</strong></li>
<li><strong>The number of unique source addresses through UDP vs. TCP</strong></li>
</ul>

<div>  
    <a href="https://plot.ly/~QiaoJing/9/?share_key=RNqbaCMNzLDT3K62Hkh4Px" target="_blank" title="Plot 9" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/9.png?share_key=RNqbaCMNzLDT3K62Hkh4Px" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:9" sharekey-plotly="RNqbaCMNzLDT3K62Hkh4Px" src="https://plot.ly/embed.js" async></script>
</div>

<div>  
    <a href="https://plot.ly/~QiaoJing/11/?share_key=sBTK311SlyFKypPaJAmRER" target="_blank" title="Plot 11" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/11.png?share_key=sBTK311SlyFKypPaJAmRER" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:11" sharekey-plotly="sBTK311SlyFKypPaJAmRER" src="https://plot.ly/embed.js" async></script>
</div>

<p>From the two plots, we can see that both the query volume and unique source addresses through TCP increased significantly in the first half of 2016, and then stabilized. In contrast, the query volume over UDP showed slow growth through the years, but the number of unique source addresses through UDP decreased slightly. </p>

<p>In general, we found that the total number of unique source addresses has been decreasing since late 2015. As we have three name servers hosted by offshore providers for which we don't capture data, we speculate that this reduction could be related to traffic moving to those other name servers offshore.</p>

<h3 id="ipv4vsipv6">IPv4 vs. IPv6</h3>

<p><strong>IDP Dataset:</strong> <a href="https://idp.nz/Domain-Names/IPv4-and-IPv6/eiys-uj9p/data">IPv4 and IPv6</a></p>

<p>We can do a similar comparison between the IPv4 and IPv6 trends, as below.</p>

<ul>
<li><strong>Query volume from IPv4 vs. IPv6 source addresses</strong></li>
<li><strong>The number of unique IPv4 vs. IPv6 source addresses</strong></li>
</ul>

<div>  
    <a href="https://plot.ly/~QiaoJing/13/?share_key=bEmOkax0AhmcfB86DMs2Xr" target="_blank" title="Plot 13" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/13.png?share_key=bEmOkax0AhmcfB86DMs2Xr" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:13" sharekey-plotly="bEmOkax0AhmcfB86DMs2Xr" src="https://plot.ly/embed.js" async></script>
</div>

<div>  
    <a href="https://plot.ly/~QiaoJing/15/?share_key=apXMCikXIfWwX96ESP6PIq" target="_blank" title="Plot 15" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/15.png?share_key=apXMCikXIfWwX96ESP6PIq" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:15" sharekey-plotly="apXMCikXIfWwX96ESP6PIq" src="https://plot.ly/embed.js" async></script>
</div>

<p>From 2016, the query volume from IPv6 addresses has grown, as has the number of IPv6 source addresses. IPv4 query volume has grown more slowly, while the number of IPv4 source addresses has decreased since 2016. The reason for this decrease may be similar to the one mentioned in the analysis of UDP/TCP queries. </p>

<p>We have also investigated the weekly and yearly seasonality for each metric. As there are many different patterns, here we just show two typical examples related to IPv4 queries.</p>

<ul>
<li>Weekend off-peak is typically seen in some metrics of the query volumes and the number of unique source addresses. This reduction is probably due to lower business activity during the weekend.</li>
</ul>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/06/v4_week.png" alt=".nz DNS traffic: Trend and Anomalies"></p>

<ul>
<li>The annual seasonality in the query volume from IPv4 addresses shows low points during winter and Christmas holiday.</li>
</ul>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/06/v4_que_sea.png" alt=".nz DNS traffic: Trend and Anomalies"></p>

<h2 id="oneincreaseacrossmultiplemetrics">One increase across multiple metrics</h2>

<p>During the time series analysis, we found simultaneous abrupt increases in several different metrics, shown below.  </p>

<div>  
    <a href="https://plot.ly/~QiaoJing/17/?share_key=oC1NhtI1NoeSB7hSkmG0FQ" target="_blank" title="Plot 17" style="display: block; text-align: center;"><img src="https://plot.ly/~QiaoJing/17.png?share_key=oC1NhtI1NoeSB7hSkmG0FQ" alt=".nz DNS traffic: Trend and Anomalies" style="max-width: 100%;width: 600px;" width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';"></a>
    <script data-plotly="QiaoJing:17" sharekey-plotly="oC1NhtI1NoeSB7hSkmG0FQ" src="https://plot.ly/embed.js" async></script>
</div>

<p>This appears very anomalous, so we analysed our raw data to find out what was happening. We located a group of source IP addresses that generated these increases. From 2016-02-04, each of these IP addresses began to send about 63k NS queries for non-existent domains every day, with the RD bit set in each query. The volume later increased to 91k and has remained at that level. Because of these queries, a great number of unique non-existent domains are queried each day, and continue to be queried. </p>

<p>We have checked these IP addresses in our query log and found that none of them showed up until Feb 2016. Their sudden appearance, with such specific behavior, has attracted our attention. We will monitor these source addresses and do further research to find out the reason, so as to suggest the best solution.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Using the daily .nz DNS statistics, we undertook some time series analysis to show trends and anomalies in our DNS traffic. Interesting patterns are shown. Some of them are quite easy to explain, while others require further research. By sharing the daily DNS statistics as open data, we hope anyone who's interested can make use of it to better understand the Internet in New Zealand.</p>]]></content:encoded></item><item><title><![CDATA[Register Size Prediction]]></title><description><![CDATA[<p>In my previous two posts, we've seen how to model <a href="http://blog.nzrs.net.nz/domain-retention-prediction/">domain retention prediction</a> and <a href="http://blog.nzrs.net.nz/time-series-analysis-of-nz-activity-data/">new creates forecasting</a>. Those are essential model components required for a register size prediction model. In this post, I'll illustrate how the prediction procedure is constructed and some key results.</p>

<h3 id="predictionprocedure">Prediction Procedure</h3>

<p>Like any population size</p>]]></description><link>http://blog.nzrs.net.nz/register-size-prediction/</link><guid isPermaLink="false">a8db8c61-7509-41c0-b89c-fe41b2848f02</guid><dc:creator><![CDATA[Huayi Jing]]></dc:creator><pubDate>Wed, 24 May 2017 23:27:22 GMT</pubDate><content:encoded><![CDATA[<p>In my previous two posts, we've seen how to model <a href="http://blog.nzrs.net.nz/domain-retention-prediction/">domain retention prediction</a> and <a href="http://blog.nzrs.net.nz/time-series-analysis-of-nz-activity-data/">new creates forecasting</a>. Those are essential model components required for a register size prediction model. In this post, I'll illustrate how the prediction procedure is constructed and some key results.</p>

<h3 id="predictionprocedure">Prediction Procedure</h3>

<p>Like any population size prediction problem, the key to register size prediction is to understand the "births" (flows in) and "deaths" (flows out). The figure below conceptualizes the register size change for each month.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/prediction_procedure-1.png" alt="Drawing" style="width: 750px;"></p>

<p>Each month, some of the existing domains stop being active and may then leave the register. Meanwhile, some new domains are created in the register. Hence, the calculation of the register size for month t+1 can be summarized as:</p>

<p><em>Register size (t+1) = Register size (t) + New creates (t+1) - Drop-outs (t+1)</em></p>

<p>where the drop-outs are modelled by the <a href="http://blog.nzrs.net.nz/domain-retention-prediction/">domain retention prediction</a>, and the number of new creates is modelled by the <a href="http://blog.nzrs.net.nz/time-series-analysis-of-nz-activity-data/">new creates forecasting</a>. The shifted Beta Geometric model (as used in retention modelling) gives us year-to-year retention rates. The retention rates for the remaining 11 months of a year are assumed to be 100%; this assumption is based on our observation that most domains are registered or renewed for 1 year. Although this means our prediction procedure overestimates the register size, we will see in the results that the errors are reasonably small.</p>
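
<p>Put as code, the month-by-month bookkeeping is a simple recurrence. The figures below are placeholders; in the real procedure the drop-outs come from the retention model and the new creates from the forecasting model.</p>

<pre><code># Illustrative projection of the recurrence
#   size(t+1) = size(t) + new_creates(t+1) - drop_outs(t+1)
# using made-up monthly figures.
size = 700000                                            # register size at the start month
new_creates = [9500, 9800, 10200, 9700, 9600, 9900]      # forecast, per month
drop_outs   = [8200, 8100, 8600, 8300, 8400, 8200]       # from the retention model

trajectory = [size]
for created, dropped in zip(new_creates, drop_outs):
    size = size + created - dropped
    trajectory.append(size)
print(trajectory)
</code></pre>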

<h3 id="inputsresults">Inputs &amp; Results</h3>

<h4 id="retentionprediction">Retention Prediction</h4>

<p>The input for this step is simple. For instance, to make predictions for Jan 2017 onwards, all we need is the list of domains that were active in Dec 2016 and their ages (i.e., how long they have been active). A sample of the input is shown below:</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/input-sample-3.png" alt="Drawing" style="width: 450px;"></p>

<p>A finding from the <a href="http://blog.nzrs.net.nz/domain-retention-prediction/">domain retention prediction</a> results is that different SLDs have different retention behaviour, so retention prediction is done separately for each group: co.nz, org.nz, net.nz, and other SLDs. We fit the model with 12 years' historical data points for 12 cohorts (representing the domains created in each month of 2004), from which we calculate the 95% confidence intervals for the retention rates shown below. As the remaining SLDs are combined into one group, the variation in their retention behaviour is bigger, which can be seen from the larger spread of the 95% confidence interval.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/retention-rate-2.png" alt="Drawing" style="width: 750px;"></p>

<p>To test the accuracy of the retention prediction, the predicted retention rates are applied to the domains that were active on 1 Dec 2016, and we predict how many of them stay active in the register from Jan to Apr 2017. The 95% confidence interval, compared with the actual values, is shown in the following figures. Although the predictions are slightly overestimated, the errors are around 1% on average, which shows that the predictions are reasonably accurate.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/co.nz.no.png" alt="Drawing" style="width: 750px;"> <br>
<img src="http://blog.nzrs.net.nz/content/images/2017/05/org.no.png" alt="Drawing" style="width: 750px;"> <br>
<img src="http://blog.nzrs.net.nz/content/images/2017/05/nz.no.png" alt="Drawing" style="width: 750px;"> <br>
<img src="http://blog.nzrs.net.nz/content/images/2017/05/net.no.png" alt="Drawing" style="width: 750px;"> <br>
<img src="http://blog.nzrs.net.nz/content/images/2017/05/other.no.png" alt="Drawing" style="width: 750px;"></p>

<h4 id="newcreates">New Creates</h4>

<p>In <a href="http://blog.nzrs.net.nz/time-series-analysis-of-nz-activity-data/">new creates forecasting</a>, I introduced how to do it using SARIMA model which needs parameter tuning. In Feb 2017, Facebook open sourced a Python/R package called <a href="https://research.fb.com/prophet-forecasting-at-scale/">Prophet </a>to automate the time forecasting process. It used the additive model which makes it computationally efficient. For those interested in finding more about Prophet, I recommend reading Facebook’s <a href="https://facebookincubator.github.io/prophet/static/prophet_paper_20170113.pdf">white paper</a>. It is used here to do new creates forecasting for different groups. </p>

<p>The following figure shows the new creates prediction for co.nz. The prediction captures the trend nicely. Looking at the test period from Jan 2016 to Apr 2017, we see most of the actual values (represented by red dots) fall within the 95% confidence interval. A spike occurred in Apr 2017, caused by the end of the reservation period for registrations at the second level, so it is understandable that the prediction didn’t capture that special event.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/co-forecast.png" alt="Drawing" style="width: 750px;"></p>

<h4 id="registersize">Register size</h4>

<p>Now we are finally ready to predict the total number of active domains in the register! This is done by adding the number of new creates to the number of domains that stay on from the previous month. The figure below shows the predicted register size from Jan to Apr 2017 (at the beginning of each month) compared with the actual values. The results are fairly satisfying: the 95% confidence interval successfully covers the actual register size in Feb and Apr, and the absolute errors are all less than 1%. The underestimation in Apr was caused by the end of the reservation period for registrations at the second level.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/total.no-1.png" alt="Drawing" style="width: 750px;"></p>

<p>Knowing that the procedure is working, let’s check out the predictions from May 2017 up to the end of this financial year (the figure below shows the register size prediction at the beginning of each month):</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/05/register.no.png" alt="Drawing" style="width: 850px;"></p>

<h4 id="finalthoughts">Final thoughts</h4>

<p>Register size prediction is my first project since joining NZRS and I've learned a lot from it, e.g. working with Python and understanding the register's behaviour. The prediction procedure can be further improved by investigating the minority of domains that are registered or renewed on a monthly basis, and/or by revisiting the prediction if special events occur in the future. For practitioners who intend to implement this procedure, it is important to check the availability of input data, since the models require specific details about the domains in the register. </p>

<p>Finally, I'd like to quote Niels Bohr, who said "Prediction is very difficult, especially if it's about the future". Although I've been working to make the prediction accurate, I will be pleased to see faster growth in the register size. So, let's work hard to prove me wrong!</p>]]></content:encoded></item><item><title><![CDATA[Resolver centricity experiment]]></title><description><![CDATA[<p>The Domain Name System is a distributed database with a hierarchical structure. A complete DNS resolution is a time-consuming process composed of iterations of network communications between the DNS resolver and the name server chain. Hence, caching has been introduced to speed up the process and reduce Internet traffic. As</p>]]></description><link>http://blog.nzrs.net.nz/resolvers-centricity-detection/</link><guid isPermaLink="false">d977a2d8-59ef-41a9-a3ac-efe0fd2a8db4</guid><dc:creator><![CDATA[Jing Qiao]]></dc:creator><pubDate>Wed, 11 Jan 2017 20:26:02 GMT</pubDate><content:encoded><![CDATA[<p>The Domain Name System is a distributed database with a hierarchical structure. A complete DNS resolution is a time-consuming process composed of iterations of network communications between the DNS resolver and the name server chain. Hence, caching has been introduced to speed up the process and reduce Internet traffic. As resolvers are our main traffic source, their caching behaviour directly affects the traffic we receive at the .nz DNS infrastructure.</p>

<p>Resolver implementations differ in how they prioritise the NS RRset for a domain depending on where it came from: the parent or the child zone. These differences result in different traffic volumes to the authoritative name servers when the NS RRset has different TTL (Time To Live) values configured in the parent zone and the child zone. DNS standards do not prescribe this aspect of resolver behaviour (its centricity). </p>

<p>In this post we outline an experiment created to detect centricity patterns of major local ISP resolvers in New Zealand and public DNS resolvers. The work is inspired by a presentation from Ólafur Guðmundsson available <a href="http://archive.icann.org/en/meetings/siliconvalley2011/bitcache/Conclusions%20from%20DNS%20Traces%20-%20Olafur%20Gudmunsson,%20Shinkuro-vid=23075&amp;disposition=attachment&amp;op=download.pdf">here</a>.</p>

<h2 id="dnsresolution">DNS Resolution</h2>

<p>The figure below shows the non-cached resolution process for the address of the domain name <strong>string.sub.experiment.nz</strong>.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2017/01/resolution.png" alt=""></p>

<p>The process involves the following steps (for a clean cache):</p>

<ol>
<li>The client (end-user) sends a DNS query to the resolver configured in their network settings (normally the ISP's DNS resolver or a public DNS resolver).  </li>
<li>The resolver doesn't know the address for string.sub.experiment.nz, but knows the IP addresses for the root name servers, so it asks one of them.  </li>
<li>The root server also doesn't have the information about string.sub.experiment.nz, but it holds the locations of the name servers for each top-level domain (such as .nz, .au, .com, etc.), so it answers with the referral (NS RRset) to the .nz name servers, who might know the address of string.sub.experiment.nz.  </li>
<li>The resolver then starts another iterative query against one of .nz name servers.  </li>
<li>The .nz name server holds a list of name servers for each domain delegated in its zone rather than the address of string.sub.experiment.nz, so it responds with the referral to experiment.nz name servers.  </li>
<li>The resolver does its third iterative query against one of experiment.nz name servers.  </li>
<li>As per the preceding servers, it replies with the referral to sub.experiment.nz name servers, which are the name servers that hold the information about string.sub.experiment.nz.  </li>
<li>The resolver sends the query to sub.experiment.nz name server.  </li>
<li>The name server replies with the A or AAAA records (depending on the query type) containing IPv4 or IPv6 addresses for string.sub.experiment.nz.  </li>
<li>The resolver sends back the final answer to client.</li>
</ol>

<p>During this process, the resolver handles the recursive resolution task for the client by repeating the query and following successive referrals until it gets to the final answer. </p>
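
<p>To make the iteration concrete, here is a simplified sketch of steps 2 to 9 using the dnspython library, starting from a root server and following referrals via the glue addresses in each response. It assumes glue records are always present (as in the setup described later) and skips the error handling a real resolver needs.</p>

<pre><code>import dns.message
import dns.query
import dns.rdatatype

ROOT_SERVER = "198.41.0.4"   # a.root-servers.net

def iterate(qname, server, depth=0, max_depth=10):
    """Follow referrals from a starting name server until an answer arrives."""
    if depth >= max_depth:
        raise RuntimeError("too many referrals")
    response = dns.query.udp(dns.message.make_query(qname, dns.rdatatype.A),
                             server, timeout=5)
    if response.answer:                       # final answer (steps 9 and 10)
        return response.answer
    # Otherwise this is a referral: pick a glue address from the additional
    # section and repeat the query against the next name server (steps 3 to 8).
    for rrset in response.additional:
        if rrset.rdtype == dns.rdatatype.A:
            return iterate(qname, rrset[0].address, depth + 1, max_depth)
    raise RuntimeError("referral without usable glue")

print(iterate("string.sub.experiment.nz", ROOT_SERVER))
</code></pre>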

<h2 id="resolvercachingandtimetolivettl">Resolver caching and time-to-live (TTL)</h2>

<p>In normal operations, a resolver caches the answers it gets from name servers for future reference. This allows it to speed up DNS resolution and reduce the traffic to name servers. To keep updated with changes in zone data, every DNS record has a TTL. This TTL controls how long the record can be cached before it is no longer considered valid. </p>

<p>Besides caching the answers to clients' queries, a resolver also caches the referrals (NS RRsets) to name servers that it receives in the process. These will later be used when an answer is not in its cache. Depending on the implementation, some resolvers only keep the NS RRsets from the parent zone; these are called parent-centric resolvers. Other resolvers overwrite the parent NS RRsets with the child zone's NS RRsets when receiving answers from the child zone; these are called child-centric resolvers.</p>

<p>In real life, it is very common for the NS RRsets in the parent and child zones to have different TTL values, so for the same query the caching time differs between parent-centric and child-centric resolvers.</p>

<h2 id="parentandchildzonesetup">Parent and Child Zone Setup</h2>

<p>To detect a resolver's centricity pattern, we set up a child zone, a parent zone and the delegation chain between them, as shown in the figure below. <br>
<img src="http://blog.nzrs.net.nz/content/images/2017/01/zonesetup_m.png" alt=""></p>

<p>The steps are outlined below:</p>

<ol>
<li><p>We registered experiment.nz with two name servers and, for simplicity, pointed both name servers to the same IP address. The following records were added to the .nz zone:  </p>

<blockquote>
  <p>experiment.nz.  86400  IN  NS  ns1.experiment.nz.<br>
  experiment.nz.  86400  IN  NS  ns2.experiment.nz.<br>ns1.experiment.nz.  86400  IN  A  150.242.40.246<br>ns2.experiment.nz.  86400  IN  A  150.242.40.246<br> <br>
  (Note: 86400 is the default TTL for all delegations and cannot be altered)</p>
</blockquote></li>
<li><p>We set up the experiment.nz zone:  </p>

<blockquote>
  <p>experiment.nz.  86400  IN  NS  ns1.experiment.nz.<br>
  experiment.nz.  86400  IN  NS  ns2.experiment.nz. <br>
  ns1.experiment.nz.  86400  IN  A  150.242.40.246 <br>
  ns2.experiment.nz.  86400  IN  A  150.242.40.246 <br>
  sub.experiment.nz. 10 IN  NS  ns.sub.experiment.nz. <br>
  ns.sub.experiment.nz.  10  IN  A  150.242.41.248</p>
</blockquote></li>
<li><p>We set up sub.experiment.nz zone:  </p>

<blockquote>
  <p>sub.experiment.nz. 120 IN  NS ns.sub.experiment.nz.<br>
  ns.sub.experiment.nz.  120  IN  A  150.242.41.248 <br>
  *  0  IN  A  127.0.53.53</p>
</blockquote>

<p>We use the wildcard to match any query name in our experiment and set its TTL to 0 so the answer will not be cached, which lets us focus on the caching time of the referral. </p></li>
</ol>

<h2 id="ripeatlas">RIPE Atlas</h2>

<p>We used <a href="https://atlas.ripe.net/about/">RIPE Atlas</a>, an Internet measurement platform that provides thousands of active probes around the world and a REST API for conducting measurements, to simulate DNS queries from end users. We selected 80 active probes located in New Zealand, all of which can be used to query public resolvers such as GoogleDNS and OpenDNS, and compiled a list of ISP resolvers' addresses for the providers hosting the probes.</p>
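<p>For illustration, a measurement like ours can be created through the REST API roughly as follows. The API key and target resolver below are placeholders, and the field names follow the public v2 API as we understand it:</p>

<pre><code># Hedged sketch: create a one-off DNS measurement against a resolver from NZ probes.
import requests

ATLAS_KEY = "YOUR-API-KEY"   # placeholder

measurement = {
    "definitions": [{
        "type": "dns",
        "af": 4,
        "description": "centricity probe",
        "target": "208.67.222.222",          # resolver under test (an OpenDNS service address)
        "query_class": "IN",
        "query_type": "A",
        "query_argument": "opendns208.67.222.222-10269-1480038557.sub.experiment.nz",
    }],
    "probes": [{"requested": 80, "type": "country", "value": "NZ"}],
    "is_oneoff": True,
}

resp = requests.post("https://atlas.ripe.net/api/v2/measurements/",
                     json=measurement, params={"key": ATLAS_KEY})
print(resp.status_code, resp.json())
</code></pre>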

<h2 id="experiment">Experiment</h2>

<p>Each probe sent DNS queries to its resolver IP addresses at a fixed interval for a period of time. Measurements for different resolvers were run in parallel to improve efficiency, while measurements against the same resolver (multiple IPs) were run one after another so that they did not interfere with each other's cached data.</p>

<p>To help identify queries from different measurements, we generated a unique query name for each run by composing it from "resolver name" + "resolver IP" + "probe ID" + the timestamp of the run. For example, a query for the name opendns208.67.222.222-10269-1480038557.sub.experiment.nz was sent by probe #10269 to OpenDNS's 208.67.222.222 address, and 1480038557 was the start timestamp of the program.</p>

<h2 id="resultanalysis">Result Analysis</h2>

<p>From the query log we extract four fields to build query sequences at both parent and child name servers: </p>

<ul>
<li>timestamp (ts)</li>
<li>resolver's service IP (probing target)</li>
<li>probe ID</li>
<li>resolver's source IP querying the name server</li>
</ul>

<p>As the TTL of the queried record is set to 0, every query should reach the child name server. A query reaches the parent server as well only when the NS record is not found in the cache (the first query, or after the TTL expires) and the resolver has to ask the parent name server for the referral. Because each query name is unique, by checking the parent and child query logs we can mark a query with 'P' if it was sent to both servers, or 'C' if it was sent only to the child server. </p>
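<p>A hedged sketch of that matching step, assuming the logs have already been reduced to (timestamp, query name) tuples:</p>

<pre><code># Sketch: every probe query reaches the child server (the wildcard record has TTL 0);
# a query is marked 'P' if the parent server saw the same query name within a small
# time window around it, otherwise 'C'.
def label_queries(parent_log, child_log, window=5):
    sequence = []
    for ts, qname in sorted(child_log):
        seen_at_parent = any(p_qname == qname and abs(p_ts - ts) &lt;= window
                             for p_ts, p_qname in parent_log)
        sequence.append("P" if seen_at_parent else "C")
    return "".join(sequence)

# e.g. one run probing a resolver every 60 seconds with the same unique query name
child = [(0, "q.sub.experiment.nz"), (60, "q.sub.experiment.nz"), (120, "q.sub.experiment.nz")]
parent = [(0, "q.sub.experiment.nz"), (120, "q.sub.experiment.nz")]
print(label_queries(parent, child))   # -&gt; "PCP"
</code></pre>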

<p>With a fixed query interval of 60s, we tested three different combinations of parent TTL and child TTL, and verified the expected patterns as below:</p>

<ul>
<li><p>parent-ttl=10s, child-ttl=120s</p>

<p>PPPPPPPPPPPPPPPPP   &lt;- Parent Centric <br>
PCPCPCPCPCPCPCPCP   &lt;- Child Centric</p></li>
<li><p>parent-ttl=5m, child-ttl=30s </p>

<p>PCCCCPCCCCPCCCCPC   &lt;- Parent Centric <br>
PPPPPPPPPPPPPPPPP   &lt;- Child Centric</p></li>
<li><p>parent-ttl=5m, child-ttl=10m</p>

<p>PCCCCPCCCCPCCCCPC   &lt;- Parent Centric <br>
PCCCCCCCCCPCCCCCC   &lt;- Child Centric</p></li>
</ul>

<p>There could be other patterns indicating minimum TTL enforcement, TTL stretching, etc.</p>
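<p>The expected sequences above can be reproduced with a small simulation of the caching behaviour. This is a simplified, illustrative model (it ignores TTL stretching and anything beyond a basic minimum-TTL rule):</p>

<pre><code># Sketch: generate the expected P/C sequence for a centricity policy, NS TTLs (seconds)
# and probing interval. 'P' = the referral had to be re-fetched from the parent;
# 'C' = the cached NS RRset was still valid, so only the child server saw the query.
def expected_pattern(parent_ttl, child_ttl, centricity, interval=60, queries=17):
    pattern, expiry = "", -1
    for i in range(queries):
        now = i * interval
        if now &gt;= expiry:                      # referral not cached: query the parent too
            ttl = {"parent": parent_ttl,
                   "child": child_ttl,
                   "minimum": min(parent_ttl, child_ttl)}[centricity]
            expiry = now + ttl
            pattern += "P"
        else:                                  # referral still cached: child server only
            pattern += "C"
    return pattern

print(expected_pattern(10, 120, "parent"))     # PPPPPPPPPPPPPPPPP
print(expected_pattern(10, 120, "child"))      # PCPCPCPCPCPCPCPCP
print(expected_pattern(300, 600, "parent"))    # PCCCCPCCCCPCCCCPC
</code></pre>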

<p>We found that for a single measurement (same target and same probe), the source IP address in the query log changes from query to query. This may be due to the architecture of a high-performance resolver, where a load balancer announces the service IP address and two or more DNS servers behind it do the recursive resolution. The choice of server is unpredictable and could be affected by many factors such as policy, server load, network performance or probe location. </p>

<p>We could also tell whether the load-balanced servers were sharing a cache (behaving as one server) by analysing consecutive queries to the same service IP. If not, each source IP was treated as an independent caching server and analysed separately.</p>

<p>For local ISP resolvers, we observed 33 servers (source IP addresses) in the query log with different patterns as below:</p>

<ul>
<li>parent-centric: 5 servers</li>
<li>child-centric: 13 servers</li>
<li>minimum (parent-ttl, child-ttl): 11 servers</li>
<li>mixed pattern: 2 servers from the same resolver show a pattern mixing child-centric and NS non-cached behaviour.</li>
<li>NS non-cached: 2 servers from the same resolver send queries to both parent and child at the probing interval, regardless of the NS TTL value.</li>
</ul>

<p>Among all the servers above, two from the same resolver sent queries to the child server at a larger interval than the probing interval, possibly because the 0 TTL of the queried record was replaced by a minimum-TTL-enforcement value.</p>

<p>For public DNS resolvers, we got the following results:</p>

<ul>
<li>6 addresses from OpenDNS, all of which behave as child-centric</li>
<li>73 addresses in four prefixes 173.194.171/24, 173.194.93/24, 74.125.41/24, 103.9.106/24 from GoogleDNS, which show quite distinctive behaviour: most queries use two different IPs to query the parent and child name servers. As GoogleDNS's addresses 8.8.8.8 and 8.8.4.4 are very popular probing targets on RIPE Atlas and prone to cache conflicts, we only probed them with parent-ttl=10s and child-ttl=120s, which showed a steady parent-centric pattern; other TTL combinations could be tested the same way in the future.</li>
</ul>

<h2 id="futurework">Future Work</h2>

<p>In this post we detailed an experiment to detect the centricity of local ISP resolvers and public resolvers using RIPE Atlas with probes located in New Zealand. Interesting patterns and behaviours were found indicating different implementations and architectures of resolvers. Besides centricity, other factors such as TTL values and query intervals for different domain names also play a critical role in how much traffic volume to .nz name servers is reduced by caching, which we will explore in the future.</p>]]></content:encoded></item><item><title><![CDATA[Domain Retention Prediction]]></title><description><![CDATA[<p>Customer-base analysis has become increasingly important. A critical part of the analysis is the prediction of retention rate, which is defined as the proportion of customers active at the end of period $t-1$ who are still active at the end of period $t$. Retention prediction is commonly used for customer</p>]]></description><link>http://blog.nzrs.net.nz/domain-retention-prediction/</link><guid isPermaLink="false">69dc0b68-29ec-41de-be46-0fa5f2aecaca</guid><dc:creator><![CDATA[Huayi Jing]]></dc:creator><pubDate>Mon, 03 Oct 2016 02:08:10 GMT</pubDate><content:encoded><![CDATA[<p>Customer-base analysis has become increasingly important. A critical part of the analysis is the prediction of retention rate, which is defined as the proportion of customers active at the end of period $t-1$ who are still active at the end of period $t$. Retention prediction is commonly used for customer lifetime value calculation. For the .nz registry, the retention rate of domains is a crucial metric used for financial planning. </p>

<p>Several probabilistic models have been developed for retention prediction. In a <a href="https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=341">survey paper</a> by Peter Fader and Bruce Hardie, a customer’s relationship with a company is classified into four types (see Figure 1). The domain name business has two relevant characteristics: </p>

<ul>
<li>Whether a domain name is renewed or not can be observed from the registry's database; </li>
<li>A domain name is usually renewed for a fixed period, which can vary from 1 month to 120 months; 1 year is the most common.</li>
</ul>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/19-09-2016-09-25-16.png" alt="Drawing" style="width: 550px;"></p>

<p>Hence, the domain business belongs to the contractual and discrete type according to this two-dimensional classification. One commonly used retention prediction model for this type is the <a href="https://marketing.wharton.upenn.edu/files/?whdmsaction=public:main.file&amp;fileID=327">shifted-Beta-Geometric (sBG) model</a> developed by Fader and Hardie in 2007. In this post, the objective is to predict the domain retention probability for the .nz registry using the sBG model. </p>

<h3 id="thesbgmodel">The sBG Model</h3>

<p>The sBG model is based on two assumptions, which I describe here in the domain name setting:</p>

<ul>
<li><p>A domain name remains active with constant retention probability $1-\theta$. From the definition of <a href="https://en.wikipedia.org/wiki/Survival_function">survivor function</a>, we have:</p>

<p>a.  The probability of churn (the domain will not be renewed at $t$): 
$$P(T=t|\theta)=\theta(1-\theta)^{t-1}$$</p>

<p>b. The survivor function (probability that the domain is still active at $t$): 
$$S(t|\theta)=(1-\theta)^t$$</p></li>
<li><p><em>Heterogeneity</em> in $\theta$  is modeled by Beta distribution with the pdf:
$$f(\theta|\alpha, \beta)=\frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}$$
where $B(\alpha,\beta)$ is the Beta function.</p></li>
</ul>

<p>An individual domain's value of $\theta$ is <em>unobserved</em> (not measurable from the dataset), so the expectation over the Beta distribution (e.g., $E[P(T=t|\Theta=\theta)]$) is used to obtain a randomly chosen domain's probability of churn and survivor function.</p>

<p>Compared with models that assume a constant churn rate for all customers, the advantage of the sBG model is that it takes customer heterogeneity into account.  After some transformation, the retention rate is expressed in the following concise form: <br>
$$r_t=\frac{\beta+t-1}{\alpha+\beta+t-1}$$</p>
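<p>As a minimal sketch of how these quantities can be estimated (the cohort numbers below are invented for illustration and are not .nz figures), the log-likelihood built from $P(T=t)$ and $S(t)$ can be maximised numerically with scipy:</p>

<pre><code># Hedged sketch (not the production code linked below): maximum-likelihood
# estimation of the sBG parameters with scipy, using a made-up cohort.
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln

def sbg_neg_log_likelihood(params, lost, survived):
    """lost[t-1] = domains lost in period t; survived = domains still active at the end."""
    alpha, beta = params
    if alpha &lt;= 0 or beta &lt;= 0:
        return np.inf
    t = np.arange(1, len(lost) + 1)
    log_p_churn = betaln(alpha + 1, beta + t - 1) - betaln(alpha, beta)   # ln P(T=t)
    log_survivor = betaln(alpha, beta + len(lost)) - betaln(alpha, beta)  # ln S(T)
    return -(np.sum(lost * log_p_churn) + survived * log_survivor)

# hypothetical cohort: 1000 domains, losses observed over 7 renewal periods
lost = np.array([131, 126, 90, 60, 42, 34, 26])
survived = 1000 - lost.sum()

fit = minimize(sbg_neg_log_likelihood, x0=[1.0, 1.0],
               args=(lost, survived), method="Nelder-Mead")
alpha_hat, beta_hat = fit.x

t = np.arange(1, 13)
retention = (beta_hat + t - 1) / (alpha_hat + beta_hat + t - 1)   # r_t from the formula above
print(alpha_hat, beta_hat, retention)
</code></pre>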

<h3 id="domainretentionprediction">Domain Retention Prediction</h3>

<p>The implementation of the sBG model comes down to estimating the two parameters $\alpha$ and $\beta$. In their paper (Appendix B), Fader and Hardie showed how to implement the model and compute the maximum likelihood estimates in Excel. We follow the same procedure in Python (code can be found <a href="https://github.com/NZRS/domain-retention-prediction">here</a>) to find $\hat{\alpha}$ and $\hat{\beta}$. The survival data shown in Table 1 are for the domains registered in April 2004 at three parent levels: <strong>co.nz</strong>, <strong>org.nz</strong> and <strong>net.nz</strong>. </p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/19-09-2016-10-17-57.png" alt="Drawing" style="width: 500px;"></p>

<p>We fit the sBG model to the first 6, 7 and 8 years of the data for each parent level respectively to compare the accuracy of the estimation. The parameter estimates are summarized in Table 2. Using these estimates, the survivor function for each parent level is extrapolated out to year 12. The model-based results along with the actual numbers are plotted in Figure 2. The resulting predictions of the survival probability are quite accurate, especially when more data points are used for estimation. It can be seen that the survival probability is decreasing and the rate of decrease shrinks over time, meaning that fewer and fewer domains stop being renewed. Another observation is that the survival probability of org.nz is higher than that of co.nz and net.nz. Hence, this model can help us identify groups of domains with different retention behaviour so that we can analyse the characteristics shared within each group.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/19-09-2016-10-21-52.png" alt="Drawing" style="width: 500px;"></p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/45345.png" alt="Drawing" style="width: 650px;"></p>

<p>Another interesting plot is the retention rate. The model-based retention rates and the actual numbers are plotted in Figure 3. Although the model does not track the actual data perfectly, it fits the data on average and captures the trend. There are two reasons: firstly, our data points fluctuate a lot, possibly due to special events, which makes estimation harder; secondly, although the survival probability and the retention rate are closely related, the survival probability is easier to predict since its cumulative form makes it less sensitive to period-to-period variations. It can be seen that the retention rates are increasing and the retention rate of org.nz is higher. In fact, this is similar to what we observe directly from the data. Figure 4 shows the retention rates (the proportion of domains still active in the next year) for domains created in different periods. The mean retention rates increase with time, and domains older than 6 years have average retention rates above 0.90. On the other hand, retention rates of newly created domains have a much larger spread than those of older domains, indicating that domains are less stable early in their life.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/23432432.png" alt="Drawing" style="width: 650px;"></p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/16-09-2016-12-09-08.png" alt="Drawing" style="width: 600px;"></p>

<h3 id="conclusion">Conclusion</h3>

<p>In this post, the sBG model is implemented and fitted to the data of domains at different parent levels. The sBG model is simple and powerful, and it is interesting to see the different retention behaviour among parent levels. This model will be very helpful in identifying domains with different behaviours so that we can investigate the factors and drivers behind them. </p>

<script type="text/x-mathjax-config">  
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>]]></content:encoded></item><item><title><![CDATA[Time Series Analysis of .nz Activity Data]]></title><description><![CDATA[<p>The number of new domain creations is an important indicator of the registry activity. Using an appropriate method to predict the number of creations in the future helps us to better understand registry behaviour and plan wisely. One of the most popular forecasting methods is Time Series Modelling, which is</p>]]></description><link>http://blog.nzrs.net.nz/time-series-analysis-of-nz-activity-data/</link><guid isPermaLink="false">e6ec0c5f-28e7-42a8-8366-6e3e02e54984</guid><dc:creator><![CDATA[Huayi Jing]]></dc:creator><pubDate>Thu, 08 Sep 2016 23:57:35 GMT</pubDate><content:encoded><![CDATA[<p>The number of new domain creations is an important indicator of the registry activity. Using an appropriate method to predict the number of creations in the future helps us to better understand registry behaviour and plan wisely. One of the most popular forecasting methods is Time Series Modelling, which is used to find meaningful statistics and useful characteristics from time-based (years, months, days) data.</p>

<p>The <a href="https://idp.nz/Domain-Names/-nz-Activity/mm2r-3dj9">.nz Activity data set on the Internet Data Portal</a> keeps daily aggregated registration statistics for .nz domains. This article presents a time series analysis of .nz Activity data since 2012 using the <a href="http://www.statsmodels.org/dev/statespace.html">SARIMAX model from the statsmodels state space methods</a> in Python. While the .nz Activity data set contains several kinds of measures, this article focuses on the monthly count of newly created domains. A few transformations are made to the data so that it can be read as a time series object. The Python code can be found <a href="https://github.com/NZRS/domain-create-sarima-model">here</a>.</p>

<h2 id="visualizethedata">Visualize the data</h2>

<p>The first step is to visualize the data to get a general understanding. It is important to discover the overall trend and/or seasonality so that we know which type of model to use. Note that we use data from January 2012 to December 2015 to build the model so that we can evaluate the performance of out-of-sample forecasting on the 2016 data.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/123-3.png" alt="Drawing" style="width: 780px;"></p>

<p>As Figure 1 indicates, there are some spikes in our data. In particular, the creations in September/October 2014 and March 2015 are higher than in other months. A closer look at the data shows that the jump in September 2012 was due to the launch of kiwi.nz; the spikes in September/October 2014 were due to registrations being allowed directly at the second level; and the spike at the end of March 2015 coincided with the end of the preferential registration and reservation period for second level .nz domains. Since these are all special events, they should be treated as outliers. Our observations are also supported by the boxplot of the data, see Figure 2. </p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/1234-1.png" alt="Drawing" style="width: 500px;"></p>

<p>The five outliers here are caused by special events. Using a technique for dealing with outliers described <a href="https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/#three">here</a>, we replace these outliers with the mean, which is 11396. The data after this treatment is shown in Figure 3.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/12345-3.png" alt="Drawing" style="width: 780px;"></p>

<p>As we visualize the number of creations per month, we can see that there is an upward trend and there is a seasonality to it. Figure 4 shows plots generated from the seasonal decompose function which deconstructs a time series into several components. The upward trend starts from March 2014 and the seasonality is every 12 months. </p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/123456.png" alt="Drawing" style="width: 780px;"></p>

<h2 id="makeitstationary">Make it stationary</h2>

<p>The next step is to make the time series stationary, i.e., the mean and variance of the data should be constant over time. Why is stationarity important? Most statistical models (e.g. the linear regression model) are based on the assumption that the underlying data can be stationarized, so that statistics (e.g. mean, variance, covariance) from the past can be used to predict the future. Otherwise, consider a data series with an upward trend: its future mean will always be underestimated if the current mean is used as a statistical descriptor.</p>

<p>From the decomposition results, we already know there are a trend and a seasonal component. Another way to test this is to use rolling statistics and the <em>Dickey-Fuller test</em>. It can be seen from Figure 5 that, since the trend is not very strong, the rolling mean and standard deviation do not vary much over time. The p-value is $2.99\times 10^{-3}$, which is not high. </p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/1234567.png" alt="Drawing" style="width: 780px;"></p>

<p>Now let’s see if any improvement in the stationarity can be achieved. Discussions of several data transformation methods can be found <a href="http://people.duke.edu/~rnau/whatuse.htm">here </a>. </p>

<p>One commonly used method is differencing. The <em>first difference</em> of a time series is the series of changes from one period to the next. Hence, differencing can help us eliminate the impact of the trend. By changing the number of periods, we can obtain the <em>seasonal difference</em> as well. Sometimes both transformations are applied, giving <em>the first difference of the seasonal difference</em>. For example, the first difference of the seasonal difference in March 2014 is equal to the March-to-February difference in 2014 minus the March-to-February difference in 2013, given that the length of a season is 12 months. We take the first difference and the first difference of the seasonal difference of the time series and compare their results.</p>
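<p>A minimal sketch of these transforms with pandas, assuming the cleaned monthly counts live in a Series named <code>creations</code> (a placeholder name), together with the Dickey-Fuller test used to compare stationarity:</p>

<pre><code># Sketch only: `creations` is assumed to be a monthly pandas Series with a date index.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

first_diff = creations.diff(1).dropna()                  # change from one month to the next
seasonal_diff = creations.diff(12).dropna()              # change from the same month one year earlier
first_of_seasonal = creations.diff(12).diff(1).dropna()  # first difference of the seasonal difference

for name, series in [("first", first_diff), ("first of seasonal", first_of_seasonal)]:
    print(name, adfuller(series)[1])   # Dickey-Fuller p-value; smaller suggests more stationary
</code></pre>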

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/12345678.png" alt="Drawing" style="width: 780px;"></p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/123456789.png" alt="Drawing" style="width: 780px;"></p>

<p>It can be seen from Figure 6 that there are still some obvious signs of seasonality (see the data points around November 2012 and November 2014 as an example). The plot for the first difference of the seasonal difference is more stationary. Supporting evidence is the smaller p-value, which is $1.48\times 10^{-9}$. We now have a rough idea about how to set the parameters for the model. Next, we will plot some charts to help us find the optimal parameters.</p>

<h2 id="theseasonalarimamodel">The Seasonal ARIMA model</h2>

<p>The <a href="https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average">Autoregressive Integrated Moving Average (ARIMA)</a> models are the most general class of models for forecasting a time series that can be stationarized. More information about ARIMA is available <a href="http://people.duke.edu/~rnau/arimrule.htm">here</a>. The forecasts are based on three parameters:</p>

<ul>
<li>$p$: the number of autoregressive (AR) terms, i.e., the number of lagged dependent variables.</li>
<li>$d$: the number of non-seasonal differences needed to make the time series stationary. </li>
<li>$q$: the number of lagged forecast errors for the moving average (MA) term.</li>
</ul>

<p>A Seasonal ARIMA model is denoted by $(p,d,q)\times(P,D,Q)_s$, where $s$ is the number of periods per season and $(P,D,Q)$ are seasonal terms. As discussed above, we know $d=1$ since the impact of the upward trend is reduced after taking the first difference. Also, $D=1$ since the time series becomes further stationarized after taking the first difference of seasonal difference.  <em>Autocorrelation</em> and <em>partial autocorrelation</em> graphs can help us with other parameters. Figure 8 shows the two plots for stationarized data.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/11.png" alt="Drawing" style="width: 780px;"></p>

<p>Now we can identify our seasonal ARIMA model by observing the behaviour of the two plots, according to the <a href="http://people.duke.edu/~rnau/arimrule.htm">rules</a> for identifying ARIMA models. In the PACF, there is a negative spike at lag 12, which may suggest an AR term for the seasonal part of the model. In the non-seasonal lags of the PACF, there is a spike at lag 13, which suggests an AR term for the non-seasonal part of the model. Different combinations of these parameters were tried out and the model with the lowest AIC value was selected: a $(1,1,0)\times(2,1,0)_{12}$ model with AIC = 593.</p>
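<p>For reference, a rough sketch of fitting the selected model with statsmodels (again assuming the cleaned series is named <code>creations</code>):</p>

<pre><code># Sketch: fit the chosen seasonal ARIMA specification and inspect its AIC.
import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(creations,
                                  order=(1, 1, 0),
                                  seasonal_order=(2, 1, 0, 12))
results = model.fit()
print(results.aic)   # compare across candidate (p,d,q)x(P,D,Q,12) combinations
</code></pre>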

<h2 id="predictionsandforecasting">Predictions and Forecasting</h2>

<p>With the model built, we can make predictions and forecasts from our data. Data observed from January to August 2016 is used to test the accuracy of the model. The <em>predict</em> command can generate <em>one-step-ahead</em> and <em>dynamic</em> predictions. One-step-ahead prediction always uses the observed values to predict the next in-sample value, whereas dynamic prediction equals one-step-ahead prediction only up to a certain data point, after which previously predicted values are used to make the subsequent predictions.</p>
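<p>As a hedged sketch, reusing the fitted <code>results</code> object from the previous section, the two prediction modes and the later out-of-sample forecast might be produced like this:</p>

<pre><code># Sketch only: the dates follow the ones used in this post.
pred = results.get_prediction(start='2015-01-01', dynamic='2015-10-01')  # dynamic from October 2015
ci = pred.conf_int(alpha=0.05)                 # 95% confidence interval around the predicted mean
forecast = results.get_forecast(steps=11)      # out-of-sample: September 2016 - July 2017
print(pred.predicted_mean.tail(), forecast.predicted_mean)
</code></pre>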

<p>Here the dynamic prediction is performed starting in October 2015. The mean value of the predictions and their 95% confidence interval are shown in Figure 9. It can be seen that the one-step-ahead prediction is slightly better. However, since it uses the actual observed value for each subsequent prediction, the one-step-ahead prediction can only be used for in-sample testing. When we want to do out-of-sample forecasting, the dynamic prediction must be used since the actual values are not available. As shown in Figure 9, our model predicts quite well. The prediction error is shown in Figure 10. The Mean Squared Error of the one-step-ahead and dynamic prediction is 58112 and 143150 respectively.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/122.png" alt="Drawing" style="width: 780px;"></p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/1222.png" alt="Drawing" style="width: 780px;"></p>

<p>Finally, we can make the out-of-sample forecast for September 2016 to July 2017, which is shown in Figure 11. It can be seen that the forecast values capture both the seasonal pattern and the upward trend.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/09/133.png" alt="Drawing" style="width: 780px;"></p>

<h2 id="conclusion">Conclusion</h2>

<p>Through this blog we have explored the ARIMA model and the statsmodels package in Python by creating a time series forecast for .nz activity data. The principal advantage of ARIMA is the comprehensiveness of the family of models: the features behind the data can be captured nicely by identifying the model appropriately. This procedure can easily be applied to other time series, e.g., another ccTLD's data, where similar data is available.</p>

<script type="text/x-mathjax-config">  
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>]]></content:encoded></item><item><title><![CDATA[Observations from 4 years in the domain name industry.]]></title><description><![CDATA[<p>So today marks my last day in the office as CMO for the .nz registry, <a href="https://nzrs.net.nz/index">NZRS</a>. A bitter sweet day as a I reflect on my time here and prepare for the next adventure.</p>

<p>Prior to this role I had no experience with this diverse and vibrant sector of the</p>]]></description><link>http://blog.nzrs.net.nz/observations-from-4-years-in-the-domain-name-industry/</link><guid isPermaLink="false">fc0520e4-aca1-453a-9a59-5a1492870888</guid><dc:creator><![CDATA[David Morrison]]></dc:creator><pubDate>Fri, 05 Aug 2016 02:45:00 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/08/profile-pic.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/08/profile-pic.jpg" alt="Observations from 4 years in the domain name industry."><p>So today marks my last day in the office as CMO for the .nz registry, <a href="https://nzrs.net.nz/index">NZRS</a>. A bittersweet day as I reflect on my time here and prepare for the next adventure.</p>

<p>Prior to this role I had no experience with this diverse and vibrant sector of the Internet. Hiding in plain sight, it is a sector most of us interact with every single day and yet give little thought to. The world of domain names and the critical Domain Name System (DNS) that operates it is much like electricity to me. We (the public) constantly use domain names when we browse the Internet and send and receive our emails, but give little thought to the infrastructure required to make our experience on the Internet work.</p>

<p>A large part of my role at NZRS has been working with Registrars from around the globe. Registrars are the retail channel through which domain names are sold, and NZRS functions as the wholesaler of .nz domain names. The relationship between registrar and registry is crucial to the success of both. To be successful, registries need to provide a product (domain names) that has appeal in the market, sound policies, and low friction for on-boarding and transacting domain names. <br>
The registrar sector has undergone significant change in the past four years with the release of hundreds of new top level domains (nTLDs). Some of the key changes I have observed are:</p>

<ul>
<li>Registrars have had to evolve. The growth in domain name options has changed registrars from operating like a ‘corner store’ selling a few products to a supermarket selling a wide range. This has forced registrars to contend with more complex supply chains, differentiated pricing, merchandising (how and what names to promote to customers), and customer communications.</li>
<li>Registrars have more power over which domains to promote. When the range of products was small registrars could easily determine what domain names were relevant to the market they were servicing. As the range of options grew, the power to choose what domain names to offer sat with the registrar. Registries of new TLDs needed to encourage registrars to not only stock their product but also promote it. This has resulted in a constant flow of marketing opportunities being offered to registrars.</li>
<li>It is becoming more about the customer. A lot of the change to date has been around the supply chain and selection of which domain names to offer. This change has forced registrars to develop clarity on the market they serve and, as a result, a better understanding of what their customers actually want from them. Whilst there is still a long way to go, a growing number of registrars are focussing on the user experience in order to make it easier to register and manage domain names. In what is essentially a commodity market, customer loyalty and retention will be key to success for registrars and registries.</li>
</ul>

<p>As well as working with registrars I have had the privilege to collaborate and share ideas with other registries around the globe. In particular the country code domain space (ccTLD) is a highly collaborative community, largely operating for the public good as not for profit or government run entities. This community is welcoming and open to sharing ideas, lessons and successes. I learnt a lot from other ccTLDs and was lucky enough to be able to share some of my own stories and lessons, in particular around the launch of shorter .nz domain names.</p>

<p>Three times a year <a href="https://www.icann.org/">ICANN</a> meetings are held which bring together a diverse group of stakeholders committed to the stable operation of a global Internet. These meetings are a fascinating insight into how policy that impacts global users of the Internet can be and is developed. The consensus-driven multi-stakeholder approach is intriguing to observe and contribute to, and the commitment (largely voluntary) is astounding.</p>

<p>At a more local level the organisations behind the .nz domain name, <a href="https://internetnz.nz/">InternetNZ</a> (delegated manager), <a href="https://nzrs.net.nz/index">NZRS</a> (registry) and <a href="https://dnc.org.nz/">DNC</a> (regulator) are full of passionate people committed to ensuring a better Internet for all New Zealanders. The work they do is unseen by many but essential to ensuring we have an Internet that is open, accessible and free from excessive control and oversight. I have been privileged to work with such a great group. I will continue as a member of InternetNZ and encourage anyone else with an interest in the Internet to <a href="https://internetnz.nz/join-us">become a member as well</a>.</p>

<p>So, onwards to the new adventure working at <a href="http://www.silverstripe.com/nz/">Silverstripe</a>. There have been a great many achievements and lessons I will take to my new role. </p>

<p>A sincere thanks to all those who have been part of this stage in my journey.</p>]]></content:encoded></item><item><title><![CDATA[Characterization of popular resolvers from .nz authoritative name servers' point of view]]></title><description><![CDATA[<p>As part of our role as .nz registry, we manage 4 out of 7 of the .nz authoritative name servers. We receive hundreds of millions of DNS queries from the Internet every day. Many of them are sent from recursive DNS servers commonly located at Internet Service Providers (ISPs) or</p>]]></description><link>http://blog.nzrs.net.nz/characterization-of-popular-resolvers-from-our-point-of-view-2/</link><guid isPermaLink="false">7f7018cc-4e80-49e9-850d-28cecc6d7005</guid><dc:creator><![CDATA[Jing Qiao]]></dc:creator><pubDate>Fri, 13 May 2016 03:20:57 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/05/http.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/05/http.jpg" alt="Characterization of popular resolvers from .nz authoritative name servers' point of view"><p>As part of our role as .nz registry, we manage 4 out of 7 of the .nz authoritative name servers. We receive hundreds of millions of DNS queries from the Internet every day. Many of them are sent from recursive DNS servers commonly located at Internet Service Providers (ISPs) or institutional networks that are used to respond to user requests to resolve a domain name. We store this traffic in a Hadoop Cluster to support research and operations. </p>

<p>In this post we are going to explore certain patterns in the traffic from some popular resolvers including local ISPs such as Spark, Vodafone, Inspire, Actrix, as well as popular public DNS resolvers such as Google DNS and OpenDNS. </p>

<p>We also receive some traffic for domains other than .nz: for example .as, .gg, .je and .aq, which we are authoritative for, as well as domains we are not authoritative for but whose queries reach us due to misconfigurations. By extracting the .nz traffic, we focus our analysis exclusively on the .nz domain space. </p>

<p>We analysed traffic for 1st March 2016.  </p>

<h2 id="dnsinaglimpse">DNS in a Glimpse</h2>

<p>A query sent to a DNS name server is a message which specifies a target domain name, query type, and query class asking for matching resource records. The query type indicates the requested record type, for example:</p>

<ul>
<li>A - IPv4 address</li>
<li>MX - the mail exchanger of a domain</li>
<li>AAAA - IPv6 address</li>
<li>TXT - text records commonly used with Sender Policy Framework (SPF)</li>
<li>SPF - special data used in the Sender Policy Framework protocol as an anti-spam technique (OBSOLETE - use TXT)</li>
<li>DS - the record used to identify the DNSSEC signing key of a delegated zone</li>
<li>DNSKEY - the key record used in DNSSEC</li>
<li>NS - the authoritative name servers used for delegating a DNS zone</li>
<li>SRV - records defining service location of servers for specified services</li>
<li>SOA - authoritative information about a DNS zone, including the primary name server, the email of the domain administrator, the domain serial number, and several timers related to refreshing the zone. </li>
</ul>

<p>There are many other query types we do not explore here.   </p>

<p>The response from the DNS name server either answers the question posed in the query, refers the requester to another set of name servers, or signals some error condition. The response code is set in the header of a response message to report the status of the query. For example, NOERROR is returned when the query completes successfully, while the NXDOMAIN response code indicates the domain name does not exist. There are also other response codes such as REFUSED, which means the DNS server refused to answer the query that was sent to it. For an exhaustive list of these codes, please refer to the DNS RCODEs section <a href="http://www.iana.org/assignments/dns-parameters/dns-parameters.xhtml">here</a>.</p>

<h2 id="resolverssourceaddresses">Resolvers' Source Addresses</h2>

<p>Recursive DNS servers are also called resolvers, and they are normally deployed with multiple IP addresses for load balancing and service redundancy. Using the ISP addresses we know and the subnets published on the websites of <a href="https://developers.google.com/speed/public-dns/faq#technical">Google</a> and <a href="https://www.opendns.com/data-center-locations/">OpenDNS</a>, we searched the source addresses in our query data to filter the traffic from these resolvers. From the dataset used, we identified 737 addresses from Google DNS and 119 addresses from OpenDNS. </p>
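<p>For illustration, matching source addresses against such prefix lists can be done with Python's ipaddress module. The prefixes below are placeholders, not the real published lists:</p>

<pre><code># Illustration only: classify a query-log source address by resolver prefix.
import ipaddress

resolver_prefixes = {
    "GoogleDNS": [ipaddress.ip_network("192.0.2.0/24")],      # placeholder prefix
    "OpenDNS":   [ipaddress.ip_network("198.51.100.0/25")],   # placeholder prefix
}

def classify_source(src_ip):
    """Return the resolver a source address belongs to, if any."""
    addr = ipaddress.ip_address(src_ip)
    for resolver, prefixes in resolver_prefixes.items():
        if any(addr in net for net in prefixes):
            return resolver
    return None

print(classify_source("192.0.2.53"))   # -&gt; "GoogleDNS" with these placeholder prefixes
</code></pre>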

<h2 id="exploringthequerytraffic">Exploring the Query Traffic</h2>

<p>On each resolver's plot, a bar represents one source IP address (we will call them servers from now on). Compared with ISPs, which have just a few servers each, public DNS resolvers have hundreds of them, plotted as dense bars.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/05/qtype-1.png" alt="Characterization of popular resolvers from .nz authoritative name servers' point of view"></p>

<p>From the plots, we can observe some interesting patterns:</p>

<ul>
<li>Though the distribution of query types varies from server to server, even within the same resolver, they still share some common patterns. We can see several popular query types such as A, AAAA, MX and DS, with A still the most popular.</li>
<li>Regarding DNSSEC, there is some amount of DS and DNSKEY queries, from which we could assume that these servers are doing DNSSEC validation. In the traffic, DS queries significantly outnumber DNSKEY queries, which is reasonable: as authoritative name servers for mainly top and second level domains, we are queried for DS records used to validate the Chain of Trust, while DNSKEY queries are used to validate signed records within the delegated zones.</li>
<li>Additionally, there is a big fraction (~ 50%) of NS queries sent from some of Google's servers. This is counter-intuitive, as directly requesting NS records is not required in a common DNS resolution process. We don't exclude the possibility that these servers are not common DNS resolvers but are running specific tasks like network probing.</li>
</ul>

<p>More explorations could be done in the future, for example, the characteristics of IPv6 traffic, DNS flags in the header of a DNS message, etc. </p>

<h2 id="exploringtheresponses">Exploring the Responses</h2>

<p>Responses are interesting because we can explore the quality of queries against the zone data on the authoritative name servers.</p>

<p><img src="http://blog.nzrs.net.nz/content/images/2016/05/rcode.png" alt="Characterization of popular resolvers from .nz authoritative name servers' point of view"></p>

<p>From the Response Codes in the plots above, we are not surprised that the majority of queries are answered with NOERROR, meaning the requested domain names are found in our zone, while the rest receive NXDOMAIN (the requested domain names do not exist in our zone). There are also other response codes such as REFUSED, FORMERR, SERVFAIL, etc., which are not observed in our sample of .nz traffic.</p>

<p>Unexpectedly, four servers from Spark and some from OpenDNS have a high fraction of NXDOMAIN responses, which is somewhat suspicious. Possible causes include infected hosts, spam sources, or searches for non-registered domains using DNS instead of WHOIS.</p>

<h2 id="exploringtheregistereddomainsqueried">Exploring the Registered Domains Queried</h2>

<p>We explored the percentage of registered domain names queried at least once per day, which is very interesting because it shows how active the .nz namespace is.</p>

<p>To get the queried domain names that are registered, we extract the unique domain names from the responses with NOERROR response code. The total number of registered domain names on that day is obtained from the <a href="https://idp.nz/Domain-Names/register-number-1-Mar-2016/xdes-3x42">Internet Data Portal (IDP)</a>.</p>
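<p>A rough sketch of that calculation with pandas; the DataFrame, column names and register size below are hypothetical, and query names are assumed to have already been reduced to their registered .nz domain:</p>

<pre><code># Illustration only: a toy day of response logs; real data comes from the Hadoop cluster.
import pandas as pd

responses = pd.DataFrame({
    "resolver": ["GoogleDNS", "GoogleDNS", "Spark", "Spark"],
    "domain":   ["example.co.nz", "example.co.nz", "example.co.nz", "other.nz"],
    "rcode":    ["NOERROR", "NOERROR", "NOERROR", "NXDOMAIN"],
})
registered_total = 660000   # hypothetical register size taken from the IDP

noerror = responses[responses["rcode"] == "NOERROR"]
unique_per_resolver = noerror.groupby("resolver")["domain"].nunique()
coverage = unique_per_resolver / registered_total   # share of the register queried that day
print(coverage)
</code></pre>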

<p><img src="http://blog.nzrs.net.nz/content/images/2016/05/domain.png" alt="Characterization of popular resolvers from .nz authoritative name servers' point of view"></p>

<p>From the plots, we observe that public DNS resolvers show lower percentages than local ISPs, possibly due to different client populations. Clients using public DNS resolvers are more likely to be international, so their requests are distributed widely across different TLDs (Top Level Domains) and .nz domain names represent only a small part, while clients of local ISPs are mostly .nz centric.</p>

<p>We explored two different days and they show patterns similar to the above.</p>

<h2 id="conclusion">Conclusion</h2>

<p>We explored one day of .nz traffic coming from popular resolvers including local ISPs and public DNS resolvers. Interesting patterns were observed, which can be used to identify well behaved resolvers and further assist classification of the traffic by different types of source addresses. More exploration will be carried on to characterize all the traffic based on patterns from sampled traffic.</p>]]></content:encoded></item><item><title><![CDATA[The list of ALL 882 top level domain options]]></title><description><![CDATA[<p>Recently the European Domain Centre published a fantastic infographic that shows the explosion of new top level domain options that have and are being made available.  You can <a href="http://blog.europeandomaincentre.com/list-of-domain-extensions/">view the original post here</a></p>

<p>We liked the graphic so much we requested a large format printable version that will soon hang</p>]]></description><link>http://blog.nzrs.net.nz/the-list-of-all-882-top-level-domain-options/</link><guid isPermaLink="false">b7920017-1de8-45e5-b011-850ce4fc5d9d</guid><category><![CDATA[.nz]]></category><category><![CDATA[Domain Names]]></category><category><![CDATA[nTLDs]]></category><dc:creator><![CDATA[David Morrison]]></dc:creator><pubDate>Tue, 10 May 2016 02:36:09 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/05/Screen-Shot-2016-05-10-at-2-34-22-PM.png" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/05/Screen-Shot-2016-05-10-at-2-34-22-PM.png" alt="The list of ALL 882 top level domain options"><p>Recently the European Domain Centre published a fantastic infographic that shows the explosion of new top level domain options that have and are being made available.  You can <a href="http://blog.europeandomaincentre.com/list-of-domain-extensions/">view the original post here</a></p>

<p>We liked the graphic so much we requested a large format printable version that will soon hang on our wall.  It is an excellent reminder of just how much the domain name market is changing.  .nz launched in 1987 when there were only a handful of domain options globally.  How things have changed and will continue to do so for the foreseeable future.</p>

<p>'<div style="clear:both"><a href="http://blog.europeandomaincentre.com/list-of-domain-extensions/"><img src="http://blog.europeandomaincentre.com/wp-content/uploads/2016/03/timeline-infographic-011.jpg" title="List of ALL 882 domain extensions [2016 INFOGRAPHIC]" alt="The list of ALL 882 top level domain options" border="0"></a></div><div>Courtesy of: <a href="http://www.europeandomaincentre.com">European Domain Centre</a></div>'</p>]]></content:encoded></item><item><title><![CDATA[The Impact of .nz and .uk Registrations at the Second Level]]></title><description><![CDATA[<p>This interview was conducted with Tony Kirsch (Domain Name Association), Oliver Hope (Nominet) and David Morrison (NZRS).  This interview was originally <a href="http://www.thedna.org/interview-impact-2nd-level-cctld/">posted by the Domain Name Association</a></p>

<p>With the recent rollout of new TLDs, the opportunity to create new, meaningful domains has presented a range of opportunities for the global</p>]]></description><link>http://blog.nzrs.net.nz/the-impact-of-nz-and-uk-registrations-at-the-second-level/</link><guid isPermaLink="false">279887ce-7926-4c20-b94c-f9c63d4a1181</guid><dc:creator><![CDATA[David Morrison]]></dc:creator><pubDate>Tue, 10 May 2016 02:04:32 GMT</pubDate><content:encoded><![CDATA[<p>This interview was conducted with Tony Kirsch (Domain Name Association), Oliver Hope (Nominet) and David Morrison (NZRS).  This interview was originally <a href="http://www.thedna.org/interview-impact-2nd-level-cctld/">posted by the Domain Name Association</a></p>

<p>With the recent rollout of new TLDs, the opportunity to create new, meaningful domains has presented a range of opportunities for the global internet community.</p>

<p>Almost all new TLDs have provided opportunities for registrations of domains exclusively at the second-level (i.e. ‘anything.something’), as having domains at the third level (i.e. ‘anything.anything.something’) is seen as cumbersome and unnecessary in newly created TLDs.</p>

<p>However this is not the case globally, as many TLDs (predominantly ccTLDs, or “country code TLDs”) offer domain registration only at the third level.</p>

<p>Forward thinking ccTLDs such as the .uk (United Kingdom) and .nz (New Zealand) extensions have recently undertaken extensive reviews, which ultimately led to the decision to change policy framework and allow registrations at the second-level. <br>
I recently sat down for a frank discussion on the merits of and experiences in introducing ccTLD registrations at the second-level with Oliver Hope, Director of Registry Services for Nominet and David Morrison, Chief Marketing Officer at NZRS (the .nz Registry) to understand the methods used to gain grass roots approval of the change, the impact the change has had today and what may be expected as we move into the future.</p>

<p><strong>TONY KIRSCH:</strong> Let me start off with a slightly historical perspective. How did opening up at the second-level even come about? What were the drivers behind it?  I think it’s important for people to understand the philosophy, because it’s a big shift and I’m sure that for many you’ve spoken to about this, it could at face value appear like a bit of a scam.</p>

<p><strong>OLIVER HOPE:</strong> A big part for us was just that the internet and the industry are changing.  More and more people and other domain alternatives are direct.  For the UK, one of the biggest ccTLDs there is, everyone knows us as predominately .co.uk, and that’s great.  But if you’re a business trying to attract a customer in France, New Zealand or Australia, it might not necessarily be their natural instinct to go to .co.uk.  They’re used to .fr in France, or .de in Germany, you know the list is endless and weighted that way.  We have to accept how the industry is changing and what people are getting used to.  There are a few other things as well- it appeals to new groups and opens some space up, because we’re quite saturated with 10.5 million domains, even though in comparison to .com, this isn’t many.</p>

<p><strong>TONY KIRSCH:</strong> What about in terms of meaningful domains?</p>

<p><strong>OLIVER HOPE:</strong> In terms of meaningful domains, .uk is saturated.  The change also responds a bit to the challenges of new gTLD introduction.  You know everything out there only really has one dot and we had two which is a little bit of a blocker.  Additionally, one of the key things is it’s shorter.  Certainly Twitter has proved that in recent years, shorter and sharper equals better.  If you say to people “Do you want me to make your email or your website shorter by three characters?”, someone who knows what they’re talking about is all for that.</p>

<p><strong>DAVID MORRISON:</strong> On the New Zealand side, our reasons are very similar.  The evolution of domain names particularly in the country code space has gone from a ‘something.something.something’ to a two-level structure ‘something.something’.  So for us, it’s very much about keeping it fresh and relevant.  The other point also is that within the .nz space we have 14 second levels such as .org.nz, .net.nz, .kiwi.nz and .geek.nz.  So we had this categorisation in place, and if you wanted to get a domain name you had to decide which category you were in.  “Am I a company, an organisation? Well, no not really, I’m a sports club.  So I don’t really fit into an .org.nz or a .net.nz or a .co.nz”.  So it’s about giving people true choice and not forcing them to categorise themselves in a way that wasn’t appropriate for their organisation.</p>

<p><strong>TONY KIRSCH:</strong> So the categories became a limiting factor?</p>

<p><strong>DAVID MORRISON:</strong> Correct, and less meaningful over time as the internet has evolved and become more relevant for more people.</p>

<p><strong>TONY KIRSCH:</strong> That’s interesting.  So let’s play the role of the current registrant for a moment. Is the introduction at the second-level a challenge for the current registrant? For example, if I’ve registered my .co.nz or my .co.uk, wouldn’t I think that you’ve introduced a competitor for me?</p>

<p><strong>DAVID MORRISON:</strong> I think both .uk and .nz have tried to treat that in similar ways.  For New Zealand, we allowed domains that were registered prior to a certain date to have priority on second-level registrations. So if you had a third-level domain name that was unique across the zone, prior to a particular date, you had the right to reserve the shorter name at no cost for two years. Then at the end of those two years, there’s an expectation that you would register it.</p>

<p><strong>TONY KIRSCH:</strong> And if you haven’t, it would go back into the pool?</p>

<p><strong>DAVID MORRISON:</strong> That’s still to be decided. There’s policy review occurring later this year. But in terms of initial introduction people had a choice to say “I do want to protect it, but I don’t want to have to outlay out the money for it”.  We had about 20,000 domain names take up that option and that has resulted in about 18,000 reservations currently in place which are expected to fall due around the 30th September 2016, so we might see some activity around that. Additionally, where there were multiple domains which matched in differing second levels we had a different process to .uk where we had deemed those domains to be in conflict.  In fact, many of those names are still in conflict. We have about 17,000 or 18,000 conflict sets or approximately 40,000 domain names.  Most conflicts were people who conflicted with themselves where they had for example a matching .co.nz and a .org.nz registration.  Those two policy decisions helped alleviate a lot of the negative outcomes.
</p>

<p><strong>TONY KIRSCH:</strong> What happened with the conflicts? Were they unavailable?</p>

<p><strong>DAVID MORRISON:</strong> The conflicts stay locked until the parties reach a resolution and no one’s been forced to come to a resolution.</p>

<p><strong>TONY KIRSCH:</strong> That’s interesting. Did you facilitate those introductions between conflicting registrants?</p>

<p><strong>DAVID MORRISON:</strong> The Domain Name Commission, the regulator for .nz, set up a website to facilitate people being able to lodge their preference and encourage the use of the WHOIS to look up who the other parties were and get in contact directly themselves.</p>

<p><strong>OLIVER HOPE:</strong> We’re similar at .uk but probably not as complex because we don’t have as many second-level zones. The main ones that we have are .org.uk, .me.uk and .co.uk.  There are some others but they’re a bit more specialised and very low volume. We have a five-year right of refusal which ends in 2019. The pricing is not free but rather the standard registration fee.  In terms of conflict, originally we were going to give priority to the earliest-registered name across the main three levels, but after consultation we changed that to .co.uk having preference if it’s before certain dates.  Our difficulty is we have approximately 2,700 members, a huge channel and you’re never going to please everyone so you put it to consultation. The danger is ending up with a horse designed by a committee – in other words, a camel. We ended up with this five year right of refusal, which, while far from perfect, was less problematic than many of the alternatives.</p>

<p><strong>TONY KIRSCH:</strong> Policy changes are never perfect and wonderful for everyone. So what happened?</p>

<p><strong>OLIVER HOPE:</strong> There are some people who absolutely love it and said we should just give everyone their .co.uk match for free. But then there’s other people who are more commercial who would absolutely hate that.  Then there are people who think we should just open it up as a new launch and sell to anyone. Finally, then there are people who think you should only ever be able to buy if you have prior use, so you try to weave your way through a very complex minefield and it’s good to see both .uk and .nz end up in similar places.  We’ve both done the same exercise with very similar products, and ended up in roughly the same place.</p>

<p><strong>DAVID MORRISON:</strong> We’ve seen some really interesting current activity in the registry.  Registration at the second-level accounts for just under 16 percent of active registrations.  That’s in the space of less than 18 months. Nearly a third to a quarter of all creates are at the second-level.  So, we are going to see some change in registrations at the third-level.
</p>

<p><strong>TONY KIRSCH:</strong> Over what time period do you expect this to occur?</p>

<p><strong>DAVID MORRISON:</strong> It’ll be a long time. Some of our second levels such as .org.nz and .net.nz are declining whilst direct registrations under .nz are growing.</p>

<p><strong>TONY KIRSCH:</strong> Were they declining already?</p>

<p><strong>DAVID MORRISON:</strong> They were flat and growth in .co.nz was also starting to flatten. We’ve got approximately 480,000 .co.nz domains and 104,000 .nz domains.</p>

<p><strong>TONY KIRSCH:</strong> That’s really impressive!</p>

<p><strong>DAVID MORRISON:</strong> If you were to compare us to a gTLD launch for example, at the same time we would have been about number 10 or 11 on the new gTLDs table.  No one’s heard of us or the results, so we just go about our business and do our thing at the bottom of the world.  But I think we’re going to see a transition to .nz over time.  We did some deeper analysis and found that in the last year about 60,000 .co.nz domains that were newly registered did not register the .nz version.  And 35,000 of the .nz registrants did not register the corresponding .co.nz. So the behaviour we are seeing from the registrants is that they’re not registering other alternatives, possibly because they aren’t aware or not seeing the need to register both.  They’re either making a conscious decision to go for .co.nz and not care about the shorter .nz version or vice versa.
Within the general public it’s about 37 percent that have awareness of the shorter version of the name. And within registrants, we directly surveyed some of our Registrars’ customers last year with their permission, and 77 percent of people had awareness of the second-level.</p>

<p><strong>TONY KIRSCH:</strong> Oliver, is that symptomatic of what you were saying in regards to .uk?</p>

<p><strong>OLIVER HOPE:</strong> We’re doing things slightly differently and engaging in a lot of promotions.  We’ve actually just finished one where we gave away right of refusal domains.  You go to a Registrar and if you had the right of refusal to your .uk, you could register it for free.  We’re doing things like that where you buy one and get one free.  We’re really trying to market and push .uk because that’s the new product that we want everyone to be aware of.  However, with regards the future, we’re flexible and open. We’re not 100 percent sure what we will do in 2019.  We might have 8 million rights holders still or we might have only half a million.  We see about a third of all the new direct .uk domains are not pre-existing rights holders, they’re brand new domains.</p>

<p><strong>DAVID MORRISON:</strong> Yes, there’s been a similar experience for us.</p>

<p><strong>OLIVER HOPE:</strong> And obviously if someone owns the .co.uk already, or has rights to it, you cannot register the .uk.  So that’s what’s really quite interesting: quite a high percentage of registrations are new customers buying new domains that don’t currently exist.</p>

<p><strong>TONY KIRSCH:</strong> And they’re not trying to steal other people’s brands or digital identities?</p>

<p><strong>OLIVER HOPE:</strong> No.</p>

<p><strong>DAVID MORRISON:</strong> No.  We really haven’t seen a lot of behaviour of people trying to steal brands.  They’re coming up with the name and then going on to register it; the support of registrars is really important with this.  About 26 to 30 percent of creates are at .nz, so I can look at the Registrars where the proportion is either the same or greater than this. If Registrars are at that level or above, it indicates they’re doing a reasonable job of promotion and making people aware of the shorter version.  Where Registrars are not, it’s an indicator that they may not have a great awareness of the New Zealand market, or that they’ve got deep resale channels, so they’ve enabled their systems to support .nz, but their resellers downstream have not turned on similar functionality.  We’re seeing some registrars with 97 percent of creates still at .co.nz.</p>

<p><strong>TONY KIRSCH:</strong> Which lets you go and engage them on that then?</p>

<p><strong>DAVID MORRISON:</strong> That’s right, but we aren’t about pushing. At the end of the day we’re all about the registrant being able to make a choice, so we’re not going to force someone to choose .co.nz or .nz.  We’re a not-for-profit entity so we’re not doing this for profit-driven reasons. It’s about ensuring that the registrant has the option of choice, and that at point of sale those options are available to them.</p>

<p><strong>TONY KIRSCH:</strong> That’s a really important point, because I think there is a public perception of opening at the second-level as a bit of a money-grabbing exercise, in particular from the larger corporates who are contacted every day with offers to register their name under different TLDs. Not only are you a not-for-profit, it almost sounds to me like you have a philosophy that there is a need to change, but with adequate protections. You can’t please everyone, but it sounds to me like both of you have done the best that you can to protect the people that are your current customers, while at the same time positioning yourselves for the future.</p>

<p><strong>OLIVER HOPE:</strong> Yes, you have to protect your business for your own good and the good of your members. Of course, some people aren’t going to like the way that is; that’s not ideal, but we will always do what we think is the absolute best thing for the whole organisation and the UK internet.</p>

<p><strong>DAVID MORRISON:</strong> InternetNZ does a wide range of things that we make quite transparent in terms of where money is spent, so when people call this a money-grabbing exercise we can talk quite transparently about what we do as a group of organisations, why we made this decision and what would happen if we just sat back and didn’t evolve with the domain industry. Over time there’s a possibility that growth and revenues may stagnate, which means the society can do less.</p>

<p><strong>TONY KIRSCH:</strong> I think Oliver had a good point of it being about consistency.  The truth is that over the next three to five years, if not already today, there are a lot of people in a lot of countries that think of that second-level structure ‘something.something’.</p>

<p><strong>DAVID MORRISON:</strong> Yeah, that is where it’s going.</p>

<p><strong>OLIVER HOPE:</strong> It needs to join up.</p>

<p><strong>TONY KIRSCH:</strong> Five years from now, how do you see it working? Do you see them working in parallel, where each zone has its niche, or does one start to take over from the other?</p>

<p><strong>OLIVER HOPE:</strong> I don’t actually have a clear answer because it really depends on adoption, and part of that will take time, because if you’re a local building firm and you’ve got your web address on your van, you’re not going to get your van resprayed because you want to take three characters off it.  When you buy your next van, it’s probably more likely, because it’s shorter.  But that’s the slow burn of growing adoption.  It’s only 20-odd years really since the internet kicked off. Five years down the line in our industry, the sphere of what it could be is so wide that it’s hard to predict.</p>

<p><strong>DAVID MORRISON:</strong> Within the New Zealand space, .co.nz is so ingrained in the psyche that when you mention a website, the default is .co.nz, so I think we’re talking generations here.  Five years from now, I think .nz will have a greater share of the pie and it will grow from 16 percent to maybe 20, 30 or 40 percent, but .co.nz will still be a dominant player for a long time to come. There are a lot of established brands and a lot of small organisations with limited funds that aren’t going to rebrand because of a domain name change.  They may not see the value of it.  With new businesses coming on board, I think they will adopt it, but it’s going to be a long, slow change.  Then again, if it doesn’t happen, it’s actually not the end of the world, because we’re there to provide choice to our customers. If they choose .co.nz as their preferred place to play, while I don’t think that will be the case, that’s okay; the choice is there.</p>

<p><strong>TONY KIRSCH:</strong> So the idea is not necessarily to pit them against each other?</p>

<p><strong>DAVID MORRISON:</strong> No, not at all.</p>

<p><strong>OLIVER HOPE:</strong> No.</p>

<p><strong>TONY KIRSCH:</strong> It’s all about choice and ensuring consistency with global standards and mainstream behaviours then?</p>

<p><strong>DAVID MORRISON:</strong> Yes very much so. We have our own ‘micro-brands’ or options within our own space, which lets us say “there’s a range of options here; each holds its own separate values and identity, and you choose the one that is right for you.”</p>

<p><strong>TONY KIRSCH:</strong> Thank you both so much for your time. Opening up at the second-level for ccTLDs like yours sounds remarkably similar to the principles of new TLDs.  I think it’s great that we realise that the change is probably not for our own immediate benefit, but rather it’s for our kids so when they establish their startups or any other online activity requiring a domain name, they have real choice available to them.  </p>

<p>When kids grow up and go to a Registrar, they won’t have the hang-ups we have. We think a direct registration at the second-level is innovative, but they won’t even know any different; it will just feel like it’s been this way forever, as colour TV does for us, for example.</p>

<p>All the best with your future endeavours in this space – this is really great information for both the DNA members and the wider community. We really appreciate you sharing your time and insights with us on this important topic.</p>]]></content:encoded></item><item><title><![CDATA[Domain retention prediction using DTMC]]></title><description><![CDATA[<p>Being able to predict domain name retention / renewal is an important tool for registries to better understand the behaviour of domain names in the register. The prediction model can also be used to forecast the future value of a domain name and the income  from domain renewal. In this post,</p>]]></description><link>http://blog.nzrs.net.nz/domain-retention-prediction-using-dtmc/</link><guid isPermaLink="false">fcbe97ae-584b-4ef5-a265-ef2d0035b06d</guid><dc:creator><![CDATA[Huayi Jing]]></dc:creator><pubDate>Sun, 01 May 2016 22:50:31 GMT</pubDate><content:encoded><![CDATA[<p>Being able to predict domain name retention / renewal is an important tool for registries to better understand the behaviour of domain names in the register. The prediction model can also be used to forecast the future value of a domain name and the income  from domain renewal. In this post, we will see how the behaviour of a domain name can be modelled using a Markov Chain model.</p>

<h2 id="memorylessbeautymarkovchain">Memoryless Beauty: Markov Chain</h2>

<p><a href="https://en.wikipedia.org/wiki/Markov_chain">Markov process</a>, named for Andrei Markov, is a stochastic process that has <em>Markov property</em> which states that the future state of the process is independent of  the past, given the present. It has been regarded as a simple and  classic way to analyse complicated stochastic processes because of its 'memoryless' property.</p>

<p>To better understand how it works, let us first look at a very interesting example which can be naturally modelled by a <em>Discrete Time Markov Chain</em> (DTMC).</p>

<h3 id="gamblerruinproblem">Gambler ruin problem</h3>

<p>Let's consider a gambler who has \$100. He bets \$1 each round of the game, and wins with probability 0.5. He will stop playing when he either goes broke or wins \$1000. You may ask: what's the probability that he goes broke? Or, in the long run, how many games will he play before he stops? Such questions can be answered by modelling the game with a DTMC. </p>
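<p>To make the questions concrete, here is a minimal Monte Carlo sketch of the setup above (our own illustration, not part of the retention model): it simulates the gambler repeatedly and compares the estimates against the well-known closed-form answers for a fair game.</p>

<pre><code># Gambler's ruin: start with $100, bet $1 per game with a fair coin,
# stop at $0 (broke) or $1000 (target). Parameters match the example above.
import random

START, TARGET, RUNS = 100, 1000, 200  # RUNS kept small: each run is ~90k games

def play_once():
    """Play until absorbed; return (went_broke, games_played)."""
    money, games = START, 0
    while money not in (0, TARGET):
        money += random.choice((1, -1))  # fair game: win or lose $1
        games += 1
    return money == 0, games

results = [play_once() for _ in range(RUNS)]
p_broke = sum(broke for broke, _ in results) / RUNS
avg_games = sum(games for _, games in results) / RUNS

# Closed-form answers for a fair game: P(broke) = 1 - START/TARGET = 0.9
# and the expected number of games = START * (TARGET - START) = 90,000.
print(f"P(broke) ~= {p_broke:.2f}, average games ~= {avg_games:,.0f}")
</code></pre>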

<p>The amount of money he has at the end of each game is called the <em>state</em> of the process. Here we have 1001 possible states, which form the <em>state space</em>. The transition from one state to another can be represented by a <em>state transition diagram</em>, shown below. The nodes represent the states and the numbers over the arrows stand for the <em>transition probabilities</em>.</p>

<p><img style="width: 600px;" src="http://blog.nzrs.net.nz/content/images/2016/04/transition-diagram-1.png"></p>

<p>It can be seen that the states 1, 2, ..., 999 <em>communicate</em> with each other, since each is <em>reachable</em> from the others. On the other hand, state 0 (broke) and state 1000 are absorbing, since there is no escape from those states; each forms a <em>closed class</em>. A Markov chain is said to be <em>irreducible</em> if all the states communicate with each other, so the gambler's ruin chain is not irreducible, since it has two closed classes.</p>

<p>We can also list the transition probabilities in a <em>transition probability matrix</em>. Assuming we have a state space with states 1, 2, ..., <em>r</em>, the transition matrix, denoted by $P$, is given by the following.</p>

<p>$$
P=
\begin{bmatrix}
        P_{11} &amp; P_{12} &amp; \cdots &amp; P_{1r} \\
        P_{21} &amp; P_{22} &amp; \cdots &amp; P_{2r} \\
        \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
        P_{r1} &amp; P_{r2} &amp; \cdots &amp; P_{rr} \\
  \end{bmatrix}
$$</p>

<p>Given that we are in state $i$, the next step must be in one of the possible states. Hence, each row in the transition matrix must sum to one. That is: </p>

<p>$$\sum_{k=1}^{r} P_{ik} = \sum_{k=1}^{r}P(X_{m+1}=k|X_{m}=i)=1, \quad P_{ik}\geq 0, \quad \forall i,k$$</p>
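<p>As a quick illustration of what $P$ looks like for the gambler's ruin chain (scaled down to a toy target of \$5 so the matrix stays small; this is our own sketch, not part of the study), the matrix can be built and the row-sum property checked numerically:</p>

<pre><code># Toy gambler's ruin transition matrix with states 0..5 and p = 0.5.
import numpy as np

target, p = 5, 0.5
P = np.zeros((target + 1, target + 1))
P[0, 0] = 1.0            # state 0 (broke) is absorbing
P[target, target] = 1.0  # state 5 (target reached) is absorbing
for i in range(1, target):
    P[i, i + 1] = p      # win $1
    P[i, i - 1] = 1 - p  # lose $1

assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to one

# Powering P approximates the long-run absorption probabilities;
# starting with $2, the probability of ruin is 1 - 2/5 = 0.6.
print(np.linalg.matrix_power(P, 10_000)[2, 0])
</code></pre>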

<h2 id="domainretentionprediction">Domain Retention Prediction</h2>

<p>Now we are ready to investigate the domain retention prediction problem using a DTMC. There are four possible states of a domain name: <strong>Active, Pending Release, Expired</strong> and do-not-exist. We are interested in the probability of a domain staying in the register. Since the do-not-exist state is not reachable from the other states, and transitions out of do-not-exist are beyond the scope of this study, it is not considered here. The transition diagram is shown below. A simplified lifecycle diagram can be found on <a href="http://www.getyourselfonline.nz/sites/default/files/DotNZ_LifecycleDiagram_v3.5_WithURL.pdf">GetYourselfOnline</a>. </p>

<p><img style="width: 650px;" src="http://blog.nzrs.net.nz/content/images/2016/04/29-04-2016-02-38-39.png"></p>

<p>From the diagram it can be seen that all the states communicate with each other, so the chain is irreducible. Moreover, note that each state has a positive probability of remaining in itself, which makes the chain aperiodic. A beautiful thing about a finite, irreducible Markov chain is that it has a unique <em>stationary distribution</em>, which describes the long-term behaviour of the chain, and aperiodicity guarantees that the chain converges to it. The stationary distribution $\pi$ is a vector obtained by solving the set of linear equations $\pi = \pi P$, together with the constraint that the entries of $\pi$ sum to one.</p>

<p>Given the sequence data set containing states of domain names at different time points, we can estimate the transition probability matrix by using the following equation:</p>

<p>$$
\hat{P} _{ij} = \frac{n_{ij}}{\sum_{k=1}^{r}n_{ik}}
$$</p>

<p>where $n_{ij}$ denotes the number of times the chain moved from state $i$ to state $j$. The denominator, $\sum_{k=1}^{r}n_{ik}$, is the total number of transitions out of state $i$. Estimating the entries like this also corresponds to the <a href="http://www.stat.cmu.edu/~cshalizi/462/lectures/06/markov-mle.pdf">maximum likelihood estimator</a> of the transition probability matrix. </p>

<p>With these in mind, we can do predictive analysis to answer questions like: <br>
1. Given that a domain name is active today, what is the probability that it is active / pending release / expired after 3 months, 6 months, or 9 months? <br>
2. If a new domain name is registered today, what fraction of its life will it spend in each of the three states?</p>
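<p>A minimal sketch of the whole pipeline, assuming toy monthly state sequences rather than the real register data (the sample sequences and the helper names below are ours, purely for illustration):</p>

<pre><code># Estimate the transition matrix from observed state sequences, then use it
# to answer the two questions above.
import numpy as np

states = ["Active", "Pending Release", "Expired"]
idx = {s: i for i, s in enumerate(states)}

# Each sequence: the state of one domain at consecutive (here: monthly) points.
sequences = [
    ["Active", "Active", "Active", "Pending Release", "Active"],
    ["Active", "Pending Release", "Expired", "Expired"],
    ["Active", "Active", "Expired", "Active", "Active"],
]

# n[i, j] counts observed transitions from state i to state j.
n = np.zeros((3, 3))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        n[idx[a], idx[b]] += 1

P_hat = n / n.sum(axis=1, keepdims=True)   # maximum likelihood estimate of P

# Question 1: given an Active domain today, its state probabilities 3 months on.
three_months = np.linalg.matrix_power(P_hat, 3)[idx["Active"]]
print(dict(zip(states, three_months.round(3))))

# Question 2: long-run fraction of time in each state, i.e. the stationary
# distribution pi solving pi = pi P with the entries summing to one.
eigvals, eigvecs = np.linalg.eig(P_hat.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(dict(zip(states, pi.round(3))))
</code></pre>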

<h2 id="nextsteps">Next Steps</h2>

<p>In this post, we have explored the DTMC and its application to domain retention prediction. Some preliminary results have been obtained from a simulation using Python and a data sample from the .nz registry. Another interesting idea for predictive modelling is to use a Markov chain classifier. Some of the factors considered include:</p>

<ul>
<li>Registration Information</li>
<li>Status, purpose, and information of the website for the domain name</li>
<li>Historical information, e.g., length of time being active in the register, website traffic, total number of renewals</li>
<li>Setup indicators, e.g. the domain has a nameserver, web server or mail server in use</li>
</ul>

<p>One main objective of the domain name retention study is to find the factors that most influence the probability of renewal, i.e., the KPIs associated with domain names. With that knowledge, registrars can better understand their customers and provide better service to attract high-value domain registrations and increase income. A follow-up blog post will discuss the results and insights in detail.</p>

<script type="text/x-mathjax-config">  
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

<script type="text/javascript" async src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-MML-AM_CHTML">
</script>]]></content:encoded></item><item><title><![CDATA[10 things every board should be doing about cyber security]]></title><description><![CDATA[ten tips that boards need for cyber security]]></description><link>http://blog.nzrs.net.nz/10-things-every-board-should-be-doing-about-cyber-security/</link><guid isPermaLink="false">748481ea-564e-4c81-881a-a55b684354f1</guid><dc:creator><![CDATA[Jay Daley]]></dc:creator><pubDate>Wed, 17 Feb 2016 21:00:00 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/02/iStock_000064428261_Double-1.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/02/iStock_000064428261_Double-1.jpg" alt="10 things every board should be doing about cyber security"><p>After many years with cyber security as a fringe issue, more and more boards are becoming aware of just how important it is that they have a clear oversight of the company’s cyber security posture and the risks to the company if they don’t.  For boards still grappling with this complex area I share these 10 hard learnt lessons to help them develop that clarity.</p>

<h3 id="1nbspnbsppayexpertstotryandbreakinitstheonlyauthoritativewaytotellifyourcompanyissecure">1.&nbsp;&nbsp;Pay experts to try and break in – it’s the only authoritative way to tell if your company is secure.</h3>

<p>Risk analysis, well designed processes, regular process audits and intrusion detection tools are all necessary parts of good cyber security but it’s not good enough to rely on those alone.  The one method that absolutely has to be utilised if you want peace of mind is paying experts to try and break into your company, the service known as penetration testing.  If experts using the same tools and techniques that cyber criminals have at their disposal can’t break in, then you have the strongest assurance it’s possible to get that your company is safe.  </p>

<p>What you’re quite likely to find, though, is that they can actually break in, and possibly quite easily.  Reading a report from penetration testers for the first time that explains just how easily your company defences were breached is sobering to say the least.  That first shock is slightly tempered when the executive explains how quickly they have acted to bolt the stable door, as they inevitably will have, given the wake-up call a report like this delivers.  But when the second or third penetration test fails to find a way into the company, that quiet panic is finally replaced with a reassuring sense of security based on the strong evidence that penetration testing delivers.  No amount of passive audit can deliver this surety.</p>

<h3 id="2nbspnbsprotatesecurityauditorseveryfewyearsandlookforskillsnotbadges">2.&nbsp;&nbsp;Rotate security auditors every few years and look for skills not badges.</h3>

<p>There’s a natural progression in every industry that sees pioneers in the industry organise to develop a capability maturity model and issue accreditation based on assessment against that model.  By the time an industry has been around for a hundred years or so the whole industry is involved and the process of accreditation is comprehensive, consistent and robust, and so can be taken as a trusted sign of quality. </p>

<p>The cyber security industry isn’t that mature yet and while the service provided by an accredited auditor can be excellent it is not going to be comprehensive as the field is just too broad for one firm to cover it all.  In financial audit, rotating the lead audit partner is sufficient as it ensures a fresh set of eyes, but for cyber security a fresh set of eyes is not enough.  Instead it needs a whole new audit company with different methodology, people and tools to tackle things in a different way and thereby build up that comprehensive coverage.</p>

<h3 id="3nbspnbspitsnotjustlonersintheirbedroomsthatyouneedtoprotectagainstitsorganisedcriminalswithautomatedhackingtools">3.&nbsp;&nbsp;It's not just loners in their bedrooms that you need to protect against, it's organised criminals with automated hacking tools.</h3>

<p>The loners in their bedrooms are still there and still a threat but things have developed a long way since a kid hacking networks just for the thrill of it was the biggest threat you faced.  The criminals have moved in and there are organised gangs out there who hack into companies for a variety of reasons.  Some steal money, some steal company information they can sell, some are paid to disrupt a company by a competitor, some disrupt companies for a blackmail payoff and some are a mix of hackers and activists - <a href="http://ncjolt.org/ashley-madison-breach-hacktivists-or-criminals/">hacktivists</a>. </p>

<p>These organised gangs don’t have to be particularly technical or innovative as there’s plenty of software out there that automates almost every step in the hacking process, from target identification to vulnerability assessment through to the actual process of taking control of a company’s computers.  Some of these tools are highly sophisticated and criminals can pay a high price for them; a commonplace investment given the potential for profit these tools provide.</p>

<p>To add to the complexity there are a few countries with state sponsored hacking teams that act indistinguishably from criminal gangs.  They have the same modus operandi but economic or political espionage as their primary objective.  Unlike criminals motivated by profit, these teams are protected by legal impunity, have far greater resources at their disposal and are much more technical.</p>

<h3 id="4nbspnbspifthegovernmentofferstohelpthentaketheofferseriously">4.&nbsp;&nbsp;If the government offers to help, then take the offer seriously.</h3>

<p>The number of clever hackers out there looking for new vulnerabilities and developing new means of illicit entry is far smaller than the volume of hacking attempts would suggest.  As I noted earlier, the reason there is so much active hacking taking place globally is that the tools have become so powerful and so simple to use that it doesn’t take much knowledge to use them.  </p>

<p>Increasingly, and particularly in certain countries, the clever hackers who invent the techniques used in these tools are being recruited into military cyberwarfare teams as the militarisation of cyberspace accelerates.  The net result is more and more successful intrusions being <a href="http://www.cnbc.com/2015/10/19/china-hacking-us-companies-for-secrets-despite-cyber-pact-.html">pinned on foreign state-sponsored hackers</a> using their own home grown and initially undetectable tools.  Worryingly, the list of probable victims covers a wide range of companies and industries and there is no guarantee that a smaller company is not important enough to be a target.</p>

<p>To counter these threats, governments have developed expertise to defend themselves and built networks for information exchange with allies and trusted third parties.  If your government offers that expertise to you then at the very least it is wise to run a trial to see what they can identify that your vendors have missed. </p>

<h3 id="5nbspnbspaskforthesamevisibilityofcyberattacksonyouremployeesasyouhaveforyouritsystems">5.&nbsp;&nbsp;Ask for the same visibility of cyber attacks on your employees as you have for your IT systems.</h3>

<p>There’s an entire branch of hacking, called social engineering, which deals with tricking people into breaching their own cyber security.  The latest development to make the headlines is the <a href="http://www.stuff.co.nz/national/75271261/Finance-boss-at-Te-Wananga-o-Aotearoa-falls-for-whaling-scam">whaling email</a> – an email that pretends to come from the CEO or other senior manager, instructing the finance team to make a payment, which uses subtle psychology to bypass the normal authorisation processes.  </p>

<p>Boards expect to see documented risks and mitigations of attacks against IT systems and this expectation should be the same for social engineering attacks as they have a different threat profile and require different mitigations.</p>

<h3 id="6nbspnbspallofthestaffinthecompanyfromtoptobottomneedtobetrainedtodetectcyberattacks">6.&nbsp;&nbsp;All of the staff in the company, from top to bottom, need to be trained to detect cyber attacks.</h3>

<p>The safest approach to cyber security training is to mirror the approach you take to health and safety and treat it as a company wide problem that everyone has to understand and play their part in.  Any staff that are not trained are oblivious to the risk and that leaves the company vulnerable to a step-by-step attack getting its first foothold with an untrained member of staff.</p>

<p>Staff training in this area, which could be as simple as a two-hour workshop, aims to do two things.  The first is to help staff identify potential attacks by showing them a wide variety of historic and current attacks.  This is vital because most employees who don’t work in IT simply have no idea what to look out for.  </p>

<p>The second, even more important aim, is to help employees understand how these attacks work and how even someone junior in a company may be carefully and individually targeted as a stepping stone to a bigger target.  Once that understanding is embedded throughout the company the chances of falling victim to even a new and innovative attack are significantly diminished.</p>

<h3 id="7nbspnbspensurethatthirdpartieswhodiscovervulnerabilitiesintocompanysystemshaveasafewaytoreportthemtothecompany">7.&nbsp;&nbsp;Ensure that third parties who discover vulnerabilities into company systems have a safe way to report them to the company.</h3>

<p>Vulnerabilities in company systems are going to be discovered by third parties on a regular basis and boards have a choice on the strategic response to that.  Some companies operate on the belief that if information on vulnerabilities can be suppressed from getting into the public domain then that will keep them safe.  As a result, they respond with legal action against anyone who even discloses they’ve found a vulnerability let alone shares the details.</p>

<p>While this can provide short-term protection for a company’s reputation it doesn’t stop people looking for vulnerabilities (or finding them), it doesn’t stop that information being sold to criminals and then exploited, nor does it protect the company’s reputation in the longer term when one of those vulnerabilities leads to a hack too large to keep secret.</p>

<p>The forward thinking approach is to publish a <a href="https://nzrs.net.nz/about/vulnerability-disclosure-policy">Vulnerability Disclosure Policy</a> that tells third parties exactly how to tell the company about vulnerabilities they find without the risk of legal action and ensures the company has time to fix them before any public disclosure.  It’s not uncommon for the discoverer of a vulnerability to <a href="http://motherboard.vice.com/read/vtech-hacker-explains-why-he-hacked-the-toy-company">use it against the company out of sheer frustration</a> if they feel they can’t report it safely or are not listened to.</p>

<p>A particularly enlightened approach, pioneered by the likes of Google and Facebook, is to pay people who discover serious vulnerabilities and who follow the rules when reporting them.  Even a small company can do this, paying out tens or hundreds of dollars at most.</p>

<h3 id="8nbspnbspensurethatthecompanyhasaclearpolicytoguidehowitrespondstoamajorcustomerdatabreach">8.&nbsp;&nbsp;Ensure that the company has a clear policy to guide how it responds to a major customer data breach.</h3>

<p>There’s a <a href="http://www.independent.co.uk/news/uk/home-news/talktalk-cyber-attack-company-accused-of-cover-up-following-reports-customers-targeted-a-week-before-a6707091.html">familiar pattern emerging</a> where companies that are hacked are accused of taking too long to admit the hack (appearing only to do so when they have no choice), seeming to underplay the scale of the hack, and apparently refusing to acknowledge that they could have done things better.  The overall impression is of companies that abandon their stated values when things get hard.</p>

<p>The problem starts when a company does not accept that, once breached, it is no longer in control of events.  There are some types of hackers, namely hacktivists and hackers for hire, who want the full extent of the breach known and will deliberately embarrass any company that tries to bury the news.  Then there are the affected customers, who will know soon enough if their data is stolen and exploited, plenty of whom will make a public fuss.</p>

<p>The risk of falling into this trap can be minimised by a clear policy that sets out what’s important for the company and what customers can expect the company to do.  In other words, a reminder to uphold the company values however bad the breach.</p>

<h3 id="9nbspnbspupskillingtheboardincybersecurityshouldbeatoplevelboardpriority">9.&nbsp;&nbsp;Upskilling the board in cyber security should be a top level board priority.</h3>

<p>The jargon can be baffling and many of the concepts are new and unusual but boards have to push through that and upskill as the risk from cyber security is too high to be left to the executive.   At a minimum your board (or a committee) should be comfortable in receiving a specific cyber security risk analysis and specific cyber security audit reports.  Ideally, around the board table there is sufficient knowledge of the concepts to scrutinise the company approach to cyber security as deeply as you scrutinise other areas of high risk.</p>

<p>Take encryption for example.  Many companies that have been breached could have prevented the loss of critical data by encrypting that data.  That would have meant that the hackers that broke into the systems would only have been able to steal a file of garbled nonsense.  Those companies needed a board that scrutinised their cyber security enough to have asked the question “but what if they do break in?”.</p>

<h3 id="10nbspnbspasboarddirectorstakeyourpersonalitsecurityveryseriously">10.&nbsp;&nbsp;As board directors, take your personal IT security very seriously.</h3>

<p>Managing your own IT as a board director, sitting on multiple boards each with its own IT systems and processes, can be complex.  In particular, using multiple email systems and multiple devices reduces your productivity, so there’s a tendency to try and short-circuit some of that by using a personal email address and personal laptop/tablet.  Being a board director of course, you’re more likely to ensure that the policies allow this and are less rigid for the board than for staff.  </p>

<p>What should be obvious to every board director by now is that you are some of the highest value targets in the company.  While it’s unlikely that as a director you’ll have any privileged access to IT systems, what you do have is authority, both explicit and implicit, and that’s what the hackers will target and try to hijack.  A hacked email account can be used to great effect by a skilful hacker.  They know what psychological techniques to use to pressure someone receiving an email they think comes from a board director into bypassing the ordinary internal controls. </p>

<p>It’s therefore vital that board directors take the security of their personal IT very seriously, which means including it in the company’s security audit plans as a recognised high risk target. </p>

<p>That’s the top 10 and I hope that’s inspired you, or even scared you, to implement them.  If your board implements just half of them, you’ll probably be in the top 5% of cyber security aware companies.</p>

<p>For further reading try the excellent <a href="https://www.iod.org.nz/Portals/0/Governance%20resources/Cyber-Risk%20Practice%20Guide.pdf">IoD Cyber-risk practice guide</a></p>]]></content:encoded></item><item><title><![CDATA[Two years of .nz zone scans]]></title><description><![CDATA[<p>At <a href="https://nzrs.net.nz">NZRS</a> we run Critical Internet Infrastructure and we are a source of Authoritative Internet Data. Because we run the DNS for the .nz country code Top Level Domain (ccTLD), we get access to a great amount and variety of data, but we also seek interesting ways to collect more</p>]]></description><link>http://blog.nzrs.net.nz/two-years-of-nz-zone-scans/</link><guid isPermaLink="false">2b1b9530-b097-42e0-b8cc-f4d4bc96a457</guid><dc:creator><![CDATA[Sebastian Castro]]></dc:creator><pubDate>Sun, 14 Feb 2016 21:41:59 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/02/zone-scan-word-cloud-6.png" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/02/zone-scan-word-cloud-6.png" alt="Two years of .nz zone scans"><p>At <a href="https://nzrs.net.nz">NZRS</a> we run Critical Internet Infrastructure and we are a source of Authoritative Internet Data. Because we run the DNS for the .nz country code Top Level Domain (ccTLD), we get access to a great amount and variety of data, but we also seek interesting ways to collect more data to better understand how .nz is used.</p>

<p>One of our initiatives is the .nz zone scan, a monthly collection started in August 2013 in which all active .nz domain names are checked for DNS correctness, such as broken or lame delegations and nameservers open for recursion, and for new uses of the DNS, such as whether domains are DNSSEC signed or announce IPv6 addresses for various services. <br>
To run the zone scan, we use our own <a href="https://github.com/NZRS/dnscheck">fork</a> of the IIS <a href="https://github.com/dotse/dnscheck">dnscheck</a> tool.</p>

<p>Summarized data of the zone scan is available in our <a href="https://idp.nz">Internet Data Portal</a> for download and exploration. To provide you with a glimpse of the things we found, we'll show you some highlights.</p>

<p>Because we are geeks, we will be using the newly available <a href="https://plot.ly">plot.ly</a> JavaScript library for plotting, and querying the data directly from the portal. Every time you check this post, the data will be fresh and new, directly consumed from the available dataset.</p>
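<p>If you would rather pull the same data into your own environment instead of reading it off the charts, a rough Python sketch against the JSON endpoint used by the first chart below (the resource id and field names are the ones from the embedded script; the <code>requests</code> library is assumed to be installed) could look like this:</p>

<pre><code># Fetch the Domain Errors dataset from the Internet Data Portal and print
# the broken-delegation counts as a percentage of the register.
import requests

url = "https://idp.nz/resource/2cqk-jxpt.json"
params = {
    "$select": "metric,count,domains,date_started",
    "$order": "date_started",
    "$where": "metric='Domains with broken delegation'",
}
rows = requests.get(url, params=params, timeout=30).json()
for row in rows:
    pct = 100 * float(row["count"]) / float(row["domains"])
    print(row["date_started"][:10], f"{pct:.2f}%")
</code></pre>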

<h3 id="domainerrors">Domain Errors</h3>

<p><strong>IDP Dataset</strong>: <a href="https://idp.nz/Domain-Names/-nz-Zone-Scan-Domain-Errors/2cqk-jxpt">Domain Errors</a></p>

<p>There are a variety of checks run at the domain level, the most relevant are shown below.</p>

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>  
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>

<style>  
  .plot_ph {
            width: 600px;
            height: 450px;
            border: 1px;
  }
</style>

<div id="domain_plt" class="plot_ph"></div>

<script type="text/javascript">  
var domain_url = "https:" + "//idp.nz/resource/2cqk-jxpt.json?" +  
    "$select=metric,count,domains,date_started" +
    "&$order=date_started" +
    "&$where=metric='Domains with broken delegation' or " +
    "metric='Domains with lame delegations' or " +
    "metric='Domains with no MX record'";
$.getJSON(domain_url, function(data, textstatus) {
    var points = {};
    $.each(data, function(i, entry) {
        var x_point = entry.date_started.substr(0, 10);
        var y_point = +(100*(entry.count/entry.domains)).toFixed(2);
        if (entry.metric in points) {
            points[entry.metric]['x'].push(x_point);
            points[entry.metric]['y'].push(y_point);
        } else {
            points[entry.metric] = {
                x: [x_point],
                y: [y_point]
            };
        }
    });
    // Convert points into something suitable for plotly
    var data = [];
    for (var m in points) {
        data.push({
            type: 'scatter',
            mode: 'lines',
            x: points[m].x,
            y: points[m].y,
            name: m.substr(13)
        });
    }

    var domain_layout = {
        autosize: true,
        title: '.nz Domain Errors',
        yaxis: {
            title: 'Percentage of register'
        },
        xaxis: {
            zeroline: true,
            showline: true
        },
        margin: {
            l: 50,
            r: 50,
            b: 80,
            t: 50
        },
        legend: {
            x: 0.75,
            y: 1.0
        },
        annotations: [{
            x: '2014-05-01',
            y: 13,
            xref: 'x',
            yref: 'y',
            text: 'Collection error',
            showarrow: true
        }]

    };

    Plotly.newPlot('domain_plt', data, domain_layout);
});
</script>

<p>There are three high level categories:</p>

<ul>
<li><strong>Broken Delegation</strong>, where a domain is utterly broken and no other tests can be run. This is usually caused when an active domain has nameservers that don't respond for the name. We consider these domains inactive; they can't be used.</li>
<li><strong>Lame Delegations</strong>, where a domain has one or more listed nameservers that don't respond for the domain. Lame delegations can lead to non-deterministic failures of the domain, so it's a good thing to see a slow decrease in those cases.</li>
<li><strong>No MX Record</strong>, domains that don't have an MX record, so it's not practically possible to use them for email. We use this as an indication of activity for the domain.</li>
</ul>

<h3 id="nameserversinfoanderrors">Nameservers info and errors</h3>

<p><strong>IDP Dataset</strong>: <a href="https://idp.nz/Domain-Names/-nz-Zone-Scan-Nameserver-Errors/g8c6-rp3v">Nameserver Errors</a></p>

<p>As part of the correctness tests, we send each nameserver a set of queries to discover whether it provides recursion (a bad thing!), whether it leaves open access to the zone (undesirable), whether it supports newer features such as NSID, and more.</p>

<div id="ns_plt" class="plot_ph"></div>

<script type="text/javascript">  
var ns_url = "https" + "://idp.nz/resource/g8c6-rp3v.json?" +  
    "$select=metric,count,nameservers,date_started" +
    "&$order=date_started";
$.getJSON(ns_url, function(data, textstatus) {
    var points = {};
    $.each(data, function(i, entry) {
        var x_point = entry.date_started.substr(0, 10);
        var y_point = +(100*(entry.count/entry.nameservers)).toFixed(2);
        if (entry.metric in points) {
            points[entry.metric]['x'].push(x_point);
            points[entry.metric]['y'].push(y_point);
        } else {
            points[entry.metric] = {
                x: [x_point],
                y: [y_point]
            };
        }
    });

    // Convert points into something suitable for plotly
    var data = [];
    for (var m in points) {
        data.push({
            type: 'scatter',
            mode: 'lines',
            x: points[m].x,
            y: points[m].y,
            name: m.substr(12)
        });
    }

    var ns_layout = {
        autosize: true,
        title: '.nz Nameservers info and errors',
        yaxis: { title: 'Percentage of nameservers' },
        xaxis: { zeroline: true, showline: false },
        margin: { l: 50, r: 50, b: 100, t: 40 },
        legend: { x: 0.0, y: -0.5, },
        annotations: [{
            x: '2015-09-01',
            y: 15,
            xref: 'x',
            yref: 'y',
            text: 'Collection error',
            showarrow: true
        }]
    };

    Plotly.newPlot('ns_plt', data, ns_layout);
});
</script>

<p>Nameservers with open recursion are a menace for all Internet users; they can be used for DNS amplification attacks and should be completely eradicated, so it's good to see the downward trend. Nameservers with open transfer make the zone contents available to anyone via zone transfer. In the InfoSec world, such a situation is seen as an open door for reconnaissance and target identification.</p>

<p>NSID is a relatively new feature of the DNS, defined in <a href="https://tools.ietf.org/html/rfc5001">RFC 5001</a>, to identify the precise server behind a nameserver. With the expansion of anycast, where several hosts distributed in different places can represent a single nameserver, NSID is a useful tool to track a specific instance. If you feel brave, you can test it against our nameservers ;)</p>

<p><code>dig soa nz @ns2.dns.net.nz +nsid
</code></p>

<p>The response will include something like this</p>

<p><code>;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; NSID: 6e 73 32 61 2e 69 6e 74 2e 64 6e 73 2e 6e 65 74 2e 6e 7a  (n) (s) (2) (a) (.) (i) (n) (t) (.) (d) (n) (s) (.) (n) (e) (t) (.) (n) (z)
;; QUESTION SECTION:
;nz.                IN  SOA</code></p>

<p>In this case, the server <strong>ns2a.dns.net.nz</strong> answered the query. Depending on your location, you may get ns2a or ns2b.</p>

<h3 id="dnssec">DNSSEC</h3>

<p><strong>IDP Dataset</strong>: <a href="https://idp.nz/Domain-Names/-nz-Zone-Scan-DNSSEC/jd96-epec">DNSSEC</a></p>

<p>DNSSEC is an extension to the original DNS protocol that provides data integrity and origin authentication using cryptographic signatures. The .nz zone has been signed for some years, and we are interested in seeing how deployment has progressed. Adoption is currently quite low, but from time to time there is a disruption that speeds things up.</p>

<div id="dnssec_plt" class="plot_ph"></div>

<script type="text/javascript">  
var dnssec_url = "https" + "://idp.nz/resource/jd96-epec.json?" +  
"$limit=200&$select=metric,count,domains,classification,date_started" +
"&$order=date_started&$where=metric='Signed domains' or metric='Domains with DS records'";
$.getJSON(dnssec_url, function(data, textstatus) {
    var points = {};
    $.each(data, function(i, entry) {
        if (entry.metric in points) {
            points[entry.metric]['x'].push(entry.date_started.substr(0, 10));
            points[entry.metric]['y'].push(entry.count);
        } else {
            points[entry.metric] = {
                x: [entry.date_started.substr(0, 10)],
                y: [entry.count]
            };
        }
    });

    // Convert points into something suitable for plotly
    var data = [];
    for (var m in points) {
        data.push({
            type: 'scatter',
            mode: 'lines',
            x: points[m].x,
            y: points[m].y,
            name: m
        });
    }

    var dnssec_layout = {
        autosize: true,
        title: '.nz DNSSEC adoption',
        yaxis: { title: 'Domains', rangemode: 'tozero' },
        xaxis: { zeroline: false, showline: false },
        margin: { l: 50, r: 50, b: 80, t: 50 },
        legend: { x: 0.65, y: 0.01 },
        annotations: [ {
            x: '2015-11-06',
            y: 370,
            xref: 'x',
            yref: 'y',
            text: 'Cloudflare',
            showarrow: true
        } ]
    };

    Plotly.newPlot('dnssec_plt', data, dnssec_layout);
});
</script>

<p>There are a couple of things to observe in this plot. The number of signed domains grows faster than the number of secure delegations (DS records in the registry). We think this is mainly because registrants test signing without committing to completing the chain of trust (which may cause operational problems), and also because few registrars are able to handle DS records for their registrants.</p>
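<p>To see on which side of that gap a single domain sits, a quick check with the <code>dnspython</code> library (assuming dnspython 2.x is installed; this is our own illustration, not part of the zone scan tooling) might look like the following. A domain that answers for DNSKEY but has no DS record at the parent is "signed" without a secure delegation.</p>

<pre><code># Rough DNSSEC status check for one domain: DNSKEY published vs DS in parent.
# Requires dnspython 2.x (pip install dnspython).
import dns.resolver

def dnssec_status(domain):
    status = {}
    for rtype in ("DNSKEY", "DS"):
        try:
            dns.resolver.resolve(domain, rtype)
            status[rtype] = True
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            status[rtype] = False
    return status

# Try it on any .nz name you are curious about.
print(dnssec_status("dns.net.nz"))
</code></pre>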

<p>The disruption came from Cloudflare, who announced their <a href="https://blog.cloudflare.com/introducing-universal-dnssec/">Universal DNSSEC</a> service to make DNSSEC signing easier. The uptake was phenomenal, and Cloudflare decided to use the relatively new ECDSA algorithm, which encouraged us to swiftly enable it in the registry.</p>

<h3 id="ipv6">IPv6</h3>

<p><strong>IDP Dataset</strong>: <a href="https://idp.nz/Domain-Names/-nz-Zone-Scan-IPv6/rypa-4eiq">IPv6</a></p>

<p>As part of the data gathering effort, we check if domain names have IPv6 addresses for their nameservers, mail servers and web servers. We use this as an indication of IPv6 adoption.</p>

<div id="ipv6_plt" class="plot_ph"></div>

<script>  
    var ipv6_url = "https" + "://idp.nz/resource/rypa-4eiq.json" +
        "?$select=metric,count,domains,date_started" +
        "&$order=date_started";
    $.getJSON(ipv6_url, function(data, textstatus) {
        var points = {};
        $.each(data, function(i, entry) {
            var x_point = entry.date_started.substr(0, 10);
            var y_point = +(100*(entry.count/entry.domains)).toFixed(2);
            if (entry.metric in points) {
                points[entry.metric]['x'].push(x_point);
                points[entry.metric]['y'].push(y_point);
            } else {
                points[entry.metric] = {
                    x: [x_point],
                    y: [y_point]
                };
            }
        });

        // Convert points into something suitable for plotly
        var data = [];
        for (var m in points) {
            data.push({
                type: 'scatter',
                mode: 'lines',
                x: points[m].x,
                y: points[m].y,
                name: m.substr(13)
            });
        }

        var ipv6_layout = {
            autosize: true,
            title: '.nz IPv6 adoption',
            yaxis: { title: 'Percentage of register' },
            margin: { l: 50, r: 50, b: 50, t: 50 },
            legend: { x: 0.05, y: 1 },
            annotations: [ {
                x: '2015-11-29',
                y: 25,
                xref: 'x',
                yref: 'y',
                text: 'Registrar adds v6',
                showarrow: true
            }]
        };

        Plotly.newPlot('ipv6_plt', data, ipv6_layout);
    });
</script>

<p>As we described the IPv6 situation to a policy person recently: it's flat as a pancake. There has been little growth over the years, with the single disruption of an 11% increase in December 2015, due to a registrar adding IPv6 addresses to their nameservers. That change was not driven by registrants, but by the operational rules of that particular registrar. Judging from other data sources as well, IPv6 adoption in NZ is basically stalled (we'll leave that for a different post).</p>

<h3 id="conclusions">Conclusions</h3>

<p>The results presented above are just a little sample of the many metrics we collect monthly for the .nz namespace. We are interested in seeing other people exploring those metrics using the open access the Internet Data Portal provides. We are also planning to make other datasets available, like aggregated counts from our DNS traffic.</p>]]></content:encoded></item><item><title><![CDATA[Namescon 2016, Las Vegas]]></title><description><![CDATA[Namescon (http://namescon.com/) is a domain name industry event held annually over the past three years and this was the first time NZRS attended representing as the registry for the .nz ccTLD.]]></description><link>http://blog.nzrs.net.nz/namescon-2016/</link><guid isPermaLink="false">f9d9234e-81f7-4818-9e8e-a08392603ad8</guid><category><![CDATA[Namescon]]></category><category><![CDATA[Domain Names]]></category><category><![CDATA[ccTLD]]></category><category><![CDATA[gTLD]]></category><category><![CDATA[.nz]]></category><dc:creator><![CDATA[David Morrison]]></dc:creator><pubDate>Wed, 20 Jan 2016 19:12:09 GMT</pubDate><media:content url="http://blog.nzrs.net.nz/content/images/2016/01/2016-01-11-12-02-32.jpg" medium="image"/><content:encoded><![CDATA[<img src="http://blog.nzrs.net.nz/content/images/2016/01/2016-01-11-12-02-32.jpg" alt="Namescon 2016, Las Vegas"><p>Namescon (<a href="http://namescon.com/">http://namescon.com/</a>) is a domain name industry event held annually over the past three years. This was the first time I attended representing NZRS (the registry for the .nz ccTLD).  </p>

<p>Attendance was very diverse, encompassing registrars, registries, web hosting companies, attorneys, brand managers, domain name investors, start-ups, affiliate marketing companies, parking companies, financial service providers, and individual end-users. This year the conference attracted over 1,200 participants. </p>

<p>The <a href="http://namescon.com/agenda">agenda</a> was jam packed with a wide variety of keynote presentations, educational and informative sessions.  The conference also had a very strong focus on networking and it was awesome seeing how open and friendly people were.  </p>

<p>The domain investment (aka domaining) market looks to be very strong on the back of recent massive growth in the acquisition of domains from the Chinese market.  Several sessions were dedicated to understanding the Chinese market and learning what types of names are attractive. </p>

<p>Monetisation of names looks to be big business, judging from a range of interesting conversations with domainers and those providing monetisation platforms for parked domains. The conference had a very active expo with a large number of organisations showcasing their services. Of note were product launches by Uniregistry and GoDaddy delivering more seamless services to large portfolio holders and those active in the secondary market.</p>

<p>I attended with the primary goal of gaining a better understanding of the domain investment market and domain after-market on the international stage.  It was also an excellent opportunity to raise awareness of .nz in the investor space and with international registrars.  </p>

<p>The <a href="http://www.thedna.org/">Domain Name Association</a> (of which .nz is a member) was in attendance and it was excellent seeing increasing dialogue about how top level domains, registrars and the wider industry can work together to create a strong, self regulated market. </p>

<p>One of the highlights for me was the charity event for <a href="http://waterschool.com">WaterSchool</a>, the organisation that helps transform families in Uganda by teaching them how to have clean water for life.  In fact, Namescon was initially created as a vehicle to raise money for WaterSchool.  It was heartwarming to see over US$100,000 raised for this charity.</p>

<p>So was it worthwhile?  In short, yes! The event has grown year on year and based on what I saw I predict it will be bigger and more jam packed with value next year.  </p>]]></content:encoded></item></channel></rss>