An alternative to the OFQUAL algorithm

As anyone in the UK knows, A-level exams, taken at age 18 and mostly used to determine admission to university, were cancelled this year due to the pandemic. The UK government asked OFQUAL (a semi-independent government body) to determine the A-level grades. OFQUAL developed an algorithm, explained here, based on past school performance and teacher rankings of students. When these results were released there was an uproar because large numbers of students did not receive the grades they were expecting and were rejected from their chosen universities. After days of pressure the UK government backtracked on these predicted results and also allowed teacher-predicted grades. For many universities, this has resulted in more students qualifying to enter than they can accommodate. Hence chaos.

Naturally, statisticians were very interested in this algorithm. Back at the beginning of the process, the Royal Statistical Society had volunteered professorial help in constructing the algorithm, but OFQUAL insisted on a highly restrictive NDA to which the RSS declined to agree. After details of the algorithm emerged, Guy Nason described several deficiencies in it. The algorithm had biases that favoured small independent schools (which usually have wealthier students) and tended to mark down larger state schools (which usually have less privileged students). My colleague in Computer Science at Bath, Tom Haines, describes other problems with the construction of the algorithm (although I do take umbrage at him blaming statisticians for this – we did try to help).

Statisticians have suggested improvements to the algorithm that would avoid some of the bias problems, but given the information available, students would still have been upset at the result. Since the school a student attended was one of the few useful predictors available, one could not avoid using this information, and yet many found its very use highly objectionable. Why should the school you attend determine your university outcome? Furthermore, due to a natural phenomenon known as “regression to the mean”, predictions for students who had good reason to expect to do well would be shifted downwards. Even had OFQUAL taken more professional advice, many students would still have been angry about the results and the media uproar would have been much the same.

I propose that an entirely different approach should have been used. Few really care what A-level grades they get – they care which university will accept them. We can issue a pandemic certificate of completion for the A-levels and deal with the university admission problem directly. Here’s my proposal:

  1. Wait until all universities have made their offers and students have made their firm (first) choice and insurance (second) choice.
  2. Oxbridge is at the top of the tree. They randomly choose students from among those holding their offers. They would want to control numbers on different courses and in different colleges, but they must make a random selection. They pass their rejected students on to the next tier of universities.
  3. As in a normal year, the next tier of universities would wait until they receive their insurance students from Oxbridge. This year they will accept all of these insurance students and randomly fill their remaining places with students who held them as first choice. They pass their rejected students on to the next tier as in a normal year.
  4. The process repeats until all students have been (randomly) allocated. The sequence of universities in the decision process is determined by the entry tariff for the given subject, as in a normal year. The only differences are that the selection is random and all insurance students are accepted. A sketch of the allocation loop follows this list.
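
Here is a rough sketch in Python of how the allocation loop might work. The data structures (tiers, capacity, firm, insurance) are invented inputs for illustration, not anything UCAS or the universities actually provide in this form, and a real implementation would also need to track courses and colleges within each university.

    import random

    def allocate(tiers, capacity, firm, insurance):
        """Randomly allocate students down the tiers of universities.

        tiers:     universities ordered from highest to lowest entry tariff
        capacity:  dict of university -> number of places
        firm:      dict of university -> students holding it as firm (first) choice
        insurance: dict of university -> students holding it as insurance (second) choice
        All of these are hypothetical inputs for illustration.
        """
        placed = {}          # student -> allocated university
        passed_down = set()  # students rejected by a higher tier
        for uni in tiers:
            # every insurance student rejected higher up is accepted
            accepted = [s for s in insurance[uni] if s in passed_down]
            # randomly fill the remaining places from the firm-choice holders
            pool = [s for s in firm[uni] if s not in placed]
            random.shuffle(pool)
            remaining = max(capacity[uni] - len(accepted), 0)
            accepted += pool[:remaining]
            for s in accepted:
                placed[s] = uni
            # the unlucky firm-choice students are passed down to the next tier
            passed_down.update(pool[remaining:])
        return placed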

Some modifications would occasionally be needed if capacity constraints are hit or in other uncommon circumstances. Given that there is about enough capacity in the university system as a whole for all students, my proposed algorithm would ensure that almost all students receive their first or second choice of university. Doubtless there would be some sad-faced photos of students whose hopes and dreams had been crushed by not going to their first choice of university and having to suffer through the horrors of their second choice. But the important difference is that their misfortune would be just that – bad luck – and could not be attributed to some perceived bias against them. You can’t get angry at bad luck.

Efficiency is the other consideration, since we want to allocate students to universities commensurate with their ability. Under my scheme, students chose universities where they believed they could gain admission, and those universities accepted them. Given that we have no A-level exam information, this is the best we can do.

Unfortunately, it’s too late to execute this scheme as it would be politically unacceptable that already accepted students could now be rejected.

Experience from running R workshops in developing countries

Having run a couple of workshops introducing R in developing countries, I offer four pieces of advice:

  1. Don’t rely on the internet. The internet reaches everywhere, but what may be adequate for light use by a single user quickly crumbles when large numbers try to make downloads simultaneously. We anticipated this would be a problem and came equipped with R and RStudio on many cheap memory sticks, but this is not sufficient. For example, try installing the tidyverse and knitting an R Markdown document. You will find this requires installing several packages with multiple dependencies. You’ve probably forgotten the many packages, pieces of software and configuration steps needed to make things work on your own machine. It was all so easy when you had a good internet connection. Find an old laptop with a fresh OS install, disconnect it from the internet, and discover what it takes to get your R teaching materials running on it; then you’ll be prepared. Purchase a local data SIM for that country so you can create a local WiFi hotspot. At least you’ll have a chance to access the internet in case of need (quite likely!). Also plan on a period of chaos at the beginning of the workshop to take care of the many install problems.
  2. Tidyverse. R is difficult, particularly for users only familiar with GUI-based statistics software. A tidyverse-only approach greatly simplifies the range of syntax and commands that the workshop participants will need to understand. You can create exercises that they can successfully complete. The tidyverse is sufficiently powerful that you can perform a wide range of practically useful tasks. The participants will gain a feeling of accomplishment and some ability to use R for their own work. Soon enough they will also need to learn to use base R. But if you start with base R, the entry cost is much higher and some of your students will not make it.
  3. Simplicity. This comes in two forms. Firstly, in countries where English is not the native language, you will find that most professional people (who are attending your workshop) know at least some English but that does not mean all of them are entirely fluent. In your presentation, speak slowly and enunciate your words clearly. Avoid colloquial expressions and figures of speech. Do not use complex words. As a native speaker of English, you will find this difficult. Use written documentation to supplement your spoken presentation. Consider using a translator. Secondly, consider simplicity in your R presentation. It is tempting to include some cool R tricks but beginners won’t enjoy this. Stick to the basics and reduce complexity where possible.
  4. Local Data. All too many expositions of R use overworn example datasets such as mtcars. Find some datasets from the country of the workshop. The participants will find this far more interesting and will suggest different ways to analyze the data. This will demonstrate how R can be used to turn data into knowledge.

Linear Models with R translated to Python

I have translated the R code in Linear Models with R into Python. The code is available as Jupyter Notebooks.

I was able to translate most of the content into Python. Sometimes the output is similar but not the same. Python has far less statistics functionality than R but it seems most of the functionality in base R can be found in Python. R now has over 10,000 packages. Python has about ten times as many but most of these are unrelated to Statistics. My book does not depend heavily on additional packages so this was not so much of an obstacle for me. In a few cases, I rely on R packages that do not exist in Python. Doubtless a Python equivalent could be created but that will take some effort.

After this experience, I can say that R is a better choice for Statistics than Python. Nevertheless, there are good reasons why one might choose to do Statistics with Python. One obvious reason is that if you already know Python, you will be reluctant to also learn R. In the UK, Python is now being taught in schools and we will soon have a wave of students who come to university knowing Python. Python usage is also more common in several areas such as Computer Science and Engineering. Another reason to use Python is the huge range of packages, covering everything from text, image and signal processing to machine learning and optimisation. These go far beyond what can be found in R. The Python userbase is much larger than R’s and this has translated into greater functionality as a programming language.

I started using S in 1984 and moved on to R when it was first released. It’s hard to go from 34 years of experience with S and R to no prior experience with Python. Here are a few impressions that may help other R users who start learning Python:

  1. Base R is quite functional without loading any packages. In Python, you will always need to load some packages even to do basic computations. You will probably need to load numpy, scipy, pandas, statsmodels and matplotlib just to get something similar to the base R environment.
  2. Python is very fussy about namespaces. You will find yourself having to prefix every loaded function. For example, you cannot write log(x); you need to write np.log(x), indicating that log comes from the numpy package. I understand the reason for this, but it alone makes Python code longer than R code.
  3. Python array indices start from zero. Again, I know why this is but it’s something the R user has to continually adjust to.
  4. matplotlib is the Python equivalent of the base R plotting functionality. It does more than R, but the range of options is daunting for the new user. I had a better time with seaborn, which is more like ggplot in producing attractive plots. (There’s a partial translation of ggplot into Python but I preferred seaborn.)
  5. statsmodels provides the linear modelling functionality found in R, but you will find some differences that will trip you up. In particular, no intercept term is included by default, and the handling of saturated models is different: the Moore-Penrose inverse is used rather than dropping offending columns as R does. The output from the linear model is far too verbose for my tastes. Of course, you can work around all these issues.
  6. Python code commonly chains operations in a pipe-like way. It helps if you have already started using pipes in R via the ‘%>%’ operator to get you into that frame of mind. The short sketch after this list illustrates points 2, 5 and 6.
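
To make points 2, 5 and 6 concrete, here is a minimal sketch. The data are simulated and the particular calls are just one way of doing things, not the only one:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    # Point 2: every function carries its package prefix
    x = np.linspace(1, 10, 50)
    y = 2 + 3 * np.log(x) + rng.normal(size=50)     # np.log, not plain log

    # Point 5: statsmodels does not add an intercept unless you ask for one
    X = sm.add_constant(np.log(x))                  # explicitly add the intercept column
    fit = sm.OLS(y, X).fit()
    print(fit.params)                               # intercept and slope estimates

    # Point 6: pandas encourages pipe-like chains of methods, much as %>% does in R
    means = (pd.DataFrame({"x": x, "y": y})
               .assign(logx=lambda d: np.log(d["x"]))
               .query("logx > 1")
               .agg({"y": "mean", "logx": "mean"}))
    print(means)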

New users inevitably encounter some frustrations, but I did find Python enjoyable. I have nightmares about having to do Statistics in Excel, but the Python world is a pleasant land even if it is still a bit unfamiliar to me.

When small data beats big data

More data is always better? No – not always. Nicole Augustin and I have just published a paper entitled When Small Data beats Big Data (preprint). The main points are:

  1. Quality beats quantity. A high-quality small dataset is often more informative than a biased larger dataset. Performance is a tradeoff between bias and variance. As the sample size increases, the variance decreases but the bias remains. You don’t need a huge amount of data to achieve an acceptably small variance, so a small dataset with no bias, obtained by careful sampling or a controlled experiment, will beat the big garbage dump of a dataset. David beats Goliath with a well-aimed shot. (A small simulation after this list illustrates the point.)
  2. Cost. There is no free lunch. More data costs money – what did you think those power studies were for? But it’s not just the acquisition costs of data. Some procedures are computationally expensive and the cost increases at a faster than linear rate with data size. If you need your results now, you might do better with less data. Other costs of data are not financial. People value privacy – we should assign a cost to invading that privacy. We should avoid using more data than necessary to protect privacy.
  3. Inference works better on small data. Statisticians have spent years developing methods for inference on relatively small datasets. Unfortunately, most of these methods don’t work well with big data because the inference becomes unbelievably sharp. The reason for this is that most statistical methods do not allow for model uncertainty or unknown sampling biases. Machine learners do no better as their methods often fail to tackle the uncertainty problem at all. Until we learn how to express uncertainty in big data models, we might be better off sticking with small data.
  4. Aggregation. Sometimes we have the option of reducing a big individual level dataset to a smaller grouped dataset. Information may be lost by this aggregation but sometimes it can be beneficial. It reduces variation, needs simpler models and reduces privacy concerns.
  5. Teaching. Students now need to learn about big and small data methods. But where to start? Small data is easier to work with. It’s much simpler for students to understand both the principles and details of the computation without the technical overhead of big data. Students will need to learn big data methods sooner rather than later but it’s best to come into this with a good understanding of the ideas of uncertainty.
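
As a small illustration of point 1, here is a simulation sketch of my own (not taken from the paper); the bias of 0.5 and the sample sizes are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 10.0, 2.0                  # true mean and standard deviation
    reps = 1000                            # number of repeated samples

    # small but carefully collected sample: unbiased, n = 100
    small_means = rng.normal(mu, sigma, size=(reps, 100)).mean(axis=1)

    # big but sloppily collected sample: selection bias of +0.5, n = 10,000
    big_means = rng.normal(mu + 0.5, sigma, size=(reps, 10_000)).mean(axis=1)

    mse_small = np.mean((small_means - mu) ** 2)   # about sigma^2/100 = 0.04
    mse_big = np.mean((big_means - mu) ** 2)       # about 0.5^2 + sigma^2/10000 = 0.25
    print(mse_small, mse_big)                      # the small unbiased sample wins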

Statisticians should not meddle in sports

There’s always been statistics in sports. People have kept records of achievements such as the most goals in a season or home runs in a career for many years. These statistics add extra flavour to the mere winning and losing of games and championships. I’ve no complaint about such descriptive statistics. My objection is to the use of more advanced statistical methods to improve the chances of a team to win games. The book, and later the movie, Moneyball, was about recruiting underrated players and changing the strategy of a baseball team to win more games. The underlying methods are statistical and have been very successful. The ideas have been developed and spread throughout the sporting world. Many regard this as a success story for statistics but I beg to differ.

I understand the attraction. As a graduate student at Berkeley, I came across Bill James’ Baseball Abstract and was fascinated. James was not a professional statistician, but he was great at assembling the right data to answer a question and backing up his claims with sensible statistical summaries. It was a model for how applied statistics should be done. Nonetheless, I realised that baseball would not take much notice of a nerdy guy, and so it proved for many years until Moneyball. So I’m not one of those people who object to sports sessions at Statistics conferences on the grounds that sports is a trivial matter undeserving of attention at a serious conference.

We all like to feel that our work is improving the world in some way, such as improving medical procedures or building more reliable products. The American football coach Vince Lombardi said “Winning isn’t everything, it’s the only thing”, so you might think the purpose of professional sports is winning. If statisticians can help teams win, surely that’s a good thing? But for every winner there is a loser. The statistician can only help one team win at the expense of other teams losing. It’s a zero-sum game.

The true purpose of professional sports is not winning but entertainment. Lombardi had it all wrong. We watch sports because we find it enjoyable. The winning and losing are all part of the enjoyment. Do statisticians improve the enjoyment of sports? At best the answer is neutral and there is some evidence to suggest that they make it less enjoyable. In baseball, statisticians have discovered that players who foul off a lot of balls improve the chances of winning while stealing bases does not. Yet watching balls being fouled off is boring while base stealing is exciting. In football, statistical advice has sometimes led to boring, park-the-bus, defensive play.

It’s probably too late to turn back the clock as sports teams will not forgo the chance to gain an advantage. Nevertheless, statisticians should realise that, while they may derive some satisfaction and employment in applying statistics to sports, the overall effect of statistics on the professional sporting world has been negative.

Bayesian Regression with INLA

I am writing a book entitled Bayesian Regression with INLA with Xiaofeng Wang and Ryan Yue. INLA stands for integrated nested Laplace approximations. Bayesian computation is not straightforward. In a few simple cases, explicit solutions exist, but in most statistical applications one typically uses simulation – usually based on MCMC (Markov chain Monte Carlo) methods. In some cases, this simulation can take a long time so it would be nice if you could do it faster. INLA is an approximation-based method that can do some Bayesian model fitting computation very quickly compared to simulation-based methods. You can learn more at the R-INLA website. You can also see some preview examples from the book.

Putting mathematical lecture notes on a mobile phone

Almost every student has a smartphone, so it makes sense to format lecture notes so that they can be read on these small-screen devices. But this can be difficult to achieve if you use LaTeX to produce your lecture notes or other mathematical/statistical handouts. We also want to maintain a full-size version, so the smaller version needs to be produced with minimal changes. Here are some tips, in order of increasing effort to implement:

  1. Use A6 size paper by opening your LaTeX document with:

    \documentclass[a6paper]{article}

    This is one quarter the size of A4 so it will shrink the page size greatly. I use the default 10pt font since students tend to have good eyesight. I can just about read this if I hold my phone in landscape orientation.

  2. Use the geometry package to specify minimal margins:

    \usepackage[margin=1mm]{geometry}

    After all, who needs margins if you are not printing this out?

  3. I use the graphicx package for including graphics. When including a plot or diagram use something like this:

    \includegraphics*[width=0.75\textwidth]{resfit.pdf}

    The plot will take up 75% of the width of the page which works for both the small and the large screen versions.

  4. Some mathematical expressions can be quite long and will exceed the page width, particularly in the small-screen version. This is not a problem with text, since LaTeX knows how to set the line breaks, but it is much harder to do for mathematical expressions. This is where the breqn package is very useful. At a minimum, you can easily replace all your equation environments with dmath and your displaymath environments with dmath*. This will get you automated line breaking in your equations; a small example is shown below. The breqn package has a lot more functionality if you want to make more of an effort.
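
    For example, a long expression that would overflow an A6 line can be wrapped in dmath* and breqn will choose the break points. The log-likelihood below is just a generic placeholder, not taken from any particular set of notes:

    \usepackage{breqn}   % in the preamble

    \begin{dmath*}
    \ell(\mu,\sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2
    \end{dmath*}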

It would be nice if LaTeX could produce documents that could dynamically reflow depending on screen size like the epub format used on e-readers. But that’s unlikely to happen so a version formatted for the small screen is the next best thing.

What’s wrong with University League tables

Today, the Guardian published its University Guide 2017. Let’s take a closer look at the components of these league tables. I don’t mean to pick on the Guardian table as the other tables have similar characteristics.

Three of the measures are based on student satisfaction and are drawn from the National Student Survey. When the NSS started out, it received fairly honest responses and had some value in provoking genuine improvements in education. But its value has deteriorated over time as Universities and students have reacted to it. Most students realise that the future value of their degree depends on the esteem in which their university is held. It is rational to rate your own university highly even if you don’t really feel that way. Furthermore, students find it difficult to rate their experience since most have only been to one university. It’s like rating the only restaurant you’ve ever eaten at. The Guardian makes things worse by using three measures from the NSS in its rankings.

Student to staff ratio is a slippery statistic. Many academics have divided responsibilities between teaching and research. It’s difficult to measure how much teaching they do and how much they interact with students. Class sizes can vary a lot according to type of programme and year in the degree – it’s not like primary education. Spend per student is another problematic measure. Expenditure on facilities can vary substantially from year to year and unpicking university budgets is difficult.

Average entry tariff is a solid measure and reflects student preferences. If you reorder the Guardian table on this measure alone, you’ll get a ranking that is closer to something knowledgeable raters would construct. This measure is sensitive to the selection of courses offered by the university since grades vary among A-level subjects.

Value added scores are highly dubious. They measure the difference between degree output and A-level input. A-levels are national tests and are a reasonable measure of entry ability, but degree classifications are not comparable across universities. A 2:1 at a top university is not the same as a 2:1 from a lower-level university. If you compare the exams in Mathematics set by top universities with those set by lower-level universities, you will see huge differences in difficulty and content. A student obtaining a 2:2 in Mathematics at a top university will likely know far more Maths than a student with a first from a post-92 university. This means it is foolish to take the proportion of good degrees (firsts and 2:1s) as a measure of output performance.

The final component is the percentage holding career jobs after six months. This is a useful statistic but is hard to measure accurately. This will also be affected by the mixture of courses offered by the university.

All these measures are then combined into a single “Guardian score”. There is no one right way to do this. If you consider the set of all convex combinations of the measures, you would generate a huge number of possible rankings, all of them just as valid (or invalid) as the Guardian score; the toy sketch below illustrates this. It’s a cake baked from mostly dodgy ingredients using an arbitrary recipe.
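
As a toy illustration of how much the choice of weights matters, here is a short sketch; the universities and scores are entirely made up:

    import numpy as np
    import pandas as pd

    # entirely hypothetical scores on three standardised measures
    scores = pd.DataFrame(
        {"satisfaction": [0.9, 0.6, 0.8],
         "entry_tariff": [0.5, 0.9, 0.7],
         "value_added":  [0.7, 0.5, 0.9]},
        index=["Uni A", "Uni B", "Uni C"])

    rng = np.random.default_rng(0)
    rankings = set()
    for _ in range(1000):
        w = rng.dirichlet(np.ones(3))            # a random convex combination of weights
        total = scores.to_numpy() @ w            # combined score for each university
        rankings.add(tuple(np.argsort(-total)))  # league-table order under these weights
    print(len(rankings), "distinct league tables from the same underlying scores")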

We might laugh this off as a bit of harmless fun. Unfortunately, some prospective students and members of the public take it seriously. Universities boast of their performance in press releases and on their websites. University administrators set their policies to increase their league table standing. Some of these policies are harmful to education and rarely are they beneficial. Meanwhile, the league tables are not actually useful to a 17 year old deciding on a degree course. The choice is constrained by expected A-level grades, course and location preferences. The statistics in the tables fluctuate from year to year and are an unreliable predictor of the individual student experience.