Selected Outputs

Examples of what I do.

Score calibration is the process of empirically determining the relationship between a score and an outcome on some population of interest, and scaling is the process of expressing that relationship in agreed units. Calibration is often treated as a simple matter and attacked with simple tools – typically, either assuming the relationship between score and log-odds is linear and fitting a logistic regression with the score as the only covariate, or dividing the score range into bands and plotting the empirical log-odds as a function of score band.
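
As a minimal sketch of the two simple approaches just described (this is not the scorecal implementation, and the synthetic data, function names, and parameter values are assumptions made only for illustration), the linear fit and the banded estimate might look like this in Python:

```python
import numpy as np

def fit_logistic(score, outcome, n_iter=25):
    """Fit log-odds = intercept + slope * score by Newton-Raphson."""
    X = np.column_stack([np.ones_like(score, dtype=float), score])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (outcome - p)                    # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta  # (intercept, slope)

def banded_log_odds(score, outcome, n_bands=10):
    """Empirical log-odds of the outcome within equal-count score bands."""
    edges = np.quantile(score, np.linspace(0.0, 1.0, n_bands + 1))
    band = np.clip(np.searchsorted(edges, score, side="right") - 1,
                   0, n_bands - 1)
    mids, log_odds = [], []
    for b in range(n_bands):
        y = outcome[band == b]
        pos, neg = y.sum(), len(y) - y.sum()
        if pos > 0 and neg > 0:  # log-odds is undefined otherwise
            mids.append(score[band == b].mean())
            log_odds.append(np.log(pos / neg))
    return np.array(mids), np.array(log_odds)

# Synthetic data (assumed): true log-odds = -3 + 0.02 * score
rng = np.random.default_rng(0)
score = rng.normal(150.0, 50.0, size=20000)
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.02 * score)))
outcome = (rng.random(score.size) < p_true).astype(float)

intercept, slope = fit_logistic(score, outcome)
mids, log_odds = banded_log_odds(score, outcome)
print(f"fitted log-odds = {intercept:.2f} + {slope:.4f} * score")
```

The fitted line can only ever describe a straight-line relationship, while the banded estimate treats every score within a band as interchangeable.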

Both approaches ignore some information in the data. The assumption of a linear score to log-odds relationship is too restrictive and score banding ignores the continuity of the scores. While a linear score to log-odds relationship is often an adequate approximation, the reality can be much more interesting, with noticeable deviations from the linear trend. These deviations include large-scale non-linearity, small-scale non-monotonicity, discrete discontinuities, and complete breakdown of the linear trend at extreme scores.

Detecting these effects requires a more sophisticated approach to empirically determining the score to outcome relationship. Taking a more sophisticated approach can be surprisingly tricky: the typically strong linear trend can obscure smaller deviations from linearity; detecting subtle trends requires exploiting the continuity of the scores, which can obscure discrete deviations; trends at extreme scores (out in the data-sparse tails of the distribution of scores) can be obscured by trends at less extreme scores (where there is more data); score distributions with some specific values that are relatively common can disrupt methods relying on continuity; and any modelling technique can introduce its own biases.

Over the years I have developed a personal approach to these issues in score calibration and implemented it as an open-source, publicly accessible R package. I discuss these technical issues in empirical score calibration and show how they are addressed in the scorecal package.

CSCC 2019, 2019

In this talk I describe Vector Symbolic Architectures (VSAs), a family of mathematical techniques for analog computation in hyperdimensional vector spaces that map naturally onto neural network implementations. VSAs naturally support computation on discrete compositional data structures and a form of virtualisation that breaks the nexus between the items to be represented and the hardware that supports the representation. This means that computations on evolving data structures do not require physical rewiring of the implementing hardware. I illustrate this approach with a VSA system that finds isomorphisms between graphs, in which different problems to be solved are represented by different initial states of the fixed hardware rather than by rewiring the hardware.
Redwood Center, UC Berkeley, 2013
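
To give a flavour of how a compositional structure can live in a fixed-size vector, here is a minimal sketch of one common VSA variant: random bipolar vectors, binding by elementwise multiplication, and bundling by majority vote. It is not the graph isomorphism system from the talk, and the role and filler names, the dimensionality, and the helper functions are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10000  # dimensionality of the vector space (assumed)

def random_vector():
    """A random bipolar (+1/-1) vector representing an atomic item."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Elementwise product; its own inverse for bipolar vectors."""
    return a * b

def bundle(*vectors):
    """Superpose by elementwise majority (use an odd count to avoid ties)."""
    return np.sign(np.sum(vectors, axis=0))

def similarity(a, b):
    """Normalised dot product; near 0 for unrelated random vectors."""
    return float(a @ b) / D

roles = {name: random_vector() for name in ("colour", "shape", "size")}
fillers = {name: random_vector() for name in ("red", "square", "small", "blue")}

# One vector holding three role-filler pairs at once.
record = bundle(
    bind(roles["colour"], fillers["red"]),
    bind(roles["shape"], fillers["square"]),
    bind(roles["size"], fillers["small"]),
)

# Query the record: unbinding with a role gives a noisy copy of its filler.
noisy = bind(record, roles["colour"])
recovered = max(fillers, key=lambda name: similarity(noisy, fillers[name]))
print(recovered)
```

The record has the same dimensionality as its parts, so storing a different structure means computing a different vector rather than changing the machinery that holds it.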

These comments support Hand’s argument for the lack of practical progress in classifier technology by pursuing it a little deeper in the specific context of credit scoring. Academic development of modeling techniques tends to ignore the role of the practitioner and the impact of business objectives. In credit scoring it can be seen that the nature of the task forces practitioners to adopt modeling strategies that positively favor simple techniques or, at least, limit the possible advantage of sophisticated techniques. The strategies adopted by credit scorers can be viewed as a heuristic approach to inference of the unobserved (and unobservable) distribution of possible data sets. The technical progress examined by Hand has been aimed toward better goodness of fit. However, technical progress toward a more principled basis for inferring the distribution of future problem data would be more likely to be adopted in practice.
Statistical Science, vol. 21, no. 1, pp. 19–23. doi:10.1214/088342306000000051, 2006

Jackendoff (2002) posed four challenges that linguistic combinatoriality and rules of language present to theories of brain function. The essence of these problems is the question of how to neurally instantiate the rapid construction and transformation of the compositional structures that are typically taken to be the domain of symbolic processing. He contended that typical connectionist approaches fail to meet these challenges and that the dialogue between linguistic theory and cognitive neuroscience will be relatively unproductive until the importance of these problems is widely recognised and the challenges answered by some technical innovation in connectionist modelling. This paper claims that a little-known family of connectionist models (Vector Symbolic Architectures) is able to meet Jackendoff’s challenges.
Proceedings of the ICCS/ASCS Joint International Conference on Cognitive Science (ICCS/ASCS 2003), 2003

The ROC curve is useful for assessing the predictive power of risk models and is relatively well known for this purpose in the credit scoring community. The ROC curve is a component of the Theory of Signal Detection (TSD), a theory which has pervasive links to many issues in model building. However, these conceptual links and their associated insights and techniques are less well known than they deserve to be among credit scoring practitioners.

The purpose of this paper is to alert credit risk modelers to the relationships between TSD and common scorecard development concepts and to provide a toolbox of simple techniques and interpretations.

Presentation at Credit Scoring and Credit Control VI, Edinburgh, Scotland, 1999
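
One example of the kind of link between the ROC curve and TSD mentioned above: under the textbook equal-variance Gaussian signal detection model, the area under the ROC curve (AUC) and the sensitivity index d′ are related by AUC = Φ(d′/√2). The sketch below is not taken from the presentation. The synthetic scores, function names, and sample sizes are assumptions, and the ROC construction assumes there are no tied scores.

```python
import numpy as np
from statistics import NormalDist

def roc_curve(score, label):
    """Hit rate vs false alarm rate as the cutoff sweeps from high to low."""
    order = np.argsort(-score, kind="stable")
    lab = label[order]
    tpr = np.concatenate([[0.0], np.cumsum(lab) / lab.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - lab) / (1 - lab).sum()])
    return fpr, tpr

def area_under(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Synthetic equal-variance Gaussian scores (assumed): true d' = 1
rng = np.random.default_rng(2)
n = 10000
score = np.concatenate([rng.normal(1.0, 1.0, n),   # "signal" cases
                        rng.normal(0.0, 1.0, n)])  # "noise" cases
label = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

fpr, tpr = roc_curve(score, label)
area = area_under(fpr, tpr)

# Invert AUC = Phi(d' / sqrt(2)) to estimate d' from the empirical AUC.
d_prime = np.sqrt(2.0) * NormalDist().inv_cdf(area)
print(f"AUC = {area:.3f}, implied d' = {d_prime:.2f}")
```

The conversion to d′ is only meaningful to the extent that the equal-variance Gaussian assumption holds for the scores in question.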


Compositional Memory

An overview of my approach to compositional memory.

Credit Scoring

Place-holder for any work related to credit scoring that is not allocated to a more specific project.

Score Calibration

Development of an R package for score calibration.

Recent Posts

One of the side effects of working in credit scoring, and especially of working for a credit bureau, is that you tend to become very concerned about privacy issues. This website is implemented in blogdown using the Hugo static website generator and the Academic theme. When I launched the website, just before the EU General Data Protection Regulation (GDPR) came into force, the implementation tools provided some basic GDPR support. Since then, more privacy-related features have been added, so I feel obliged to add them to my site.


I created my first personal website in 2006 using Google Sites. This served me well enough for relatively static content but didn’t really suit me for more involved content like blog posts of technical analyses. Eventually I got rather behind in updating it, so now that site is quite out of date. Web technology has moved on since 2006. In particular, the R blogdown package has been released to simplify construction of websites, especially technical blogging based on R analyses.



Future Events

I will be attending these events.
Feel free to arrange to catch up with me for a chat.

  • VSA workshop 2020 [SPEAKER]
    First Workshop on Developments in Hyperdimensional Computing and Vector Symbolic Architectures
    16 March 2020
    Heidelberg, Germany
    My presentation: VSA, Analogy, and Dynamic Similarity

Past Events

Recent & Upcoming Talks

scorecal - Empirical score calibration under the microscope



I don’t work as an academic, so I don’t have career incentives for traditional publications. Consequently, my outputs are in whatever format was most convenient for me at the time. Most of my conference presentations are exactly that, presentations with no accompanying paper. My traditional format publications tend to mostly arise from collaborations with academic colleagues.

I have not yet transferred all the outputs from my old, outdated website. Until I do, the best sources are:

(2003). Vector Symbolic Architectures answer Jackendoff's challenges for cognitive neuroscience. Proceedings of the ICCS/ASCS Joint International Conference on Cognitive Science (ICCS/ASCS 2003).


(1999). Signal Detection for Credit Scoring Practitioners. Presentation at Credit Scoring and Credit Control VI, Edinburgh, Scotland.



Creative Commons Licence
All content, unless explicitly noted otherwise, is licensed under a Creative Commons Attribution 4.0 International License.

Privacy Policy

All the following points should be read as “to the best of my knowledge”. I am not a website expert, so I can’t vouch for how this website is actually implemented. I can only tell you about my intentions.

Collection of your personal information

Nothing on this website requires you to identify yourself. The only personal information collected while you visit this site is non-identifying information, such as browser type and operating system. This information is collected by Google Analytics for measuring visitor traffic to this site.

I do not collect this information and have no access to it other than as aggregated reports. Here is the Google Analytics privacy page.

This information is collected via cookies. Most web browsers allow you to control handling of cookies. To the best of my knowledge, you can disable all cookies for this website without in any way reducing the functionality for you.

I have set the Hugo GDPR options so that your IP address is anonymised within Google Analytics and the “Do Not Track” request is respected.

Sharing of your personal information

I don’t collect your personal information, so there is nothing I can share.

Google Analytics does collect some information about you. See the Google Analytics privacy page.

Use of your personal information

For each visitor who reaches the site, Google Analytics collects non-personally identifiable information, including but not limited to browser type, version and language, operating system, pages viewed while browsing the site, page access times, and referring website address. This information is presented to me as aggregated reports for the purpose of gauging visitor traffic and trends.