The most fundamental unsolved problem in complexity theory is undoubtedly the P=NP problem, which asks (roughly speaking) whether a problem which can be solved by a non-deterministic polynomial-time (NP) algorithm, can also be solved by a deterministic polynomial-time (P) algorithm. The general belief is that ${P \neq NP}$, i.e. there exist problems which can be solved by non-deterministic polynomial-time algorithms but not by deterministic polynomial-time algorithms.

One reason why the ${P \neq NP}$ question is so difficult to resolve is that a certain generalisation of this question has an affirmative answer in some cases, and a negative answer in other cases. More precisely, if we give all the algorithms access to an oracle, then for one choice ${A}$ of this oracle, all the problems that are solvable by non-deterministic polynomial-time algorithms that calls ${A}$ (${NP^A}$), can also be solved by a deterministic polynomial-time algorithm algorithm that calls ${A}$ (${P^A}$), thus ${P^A = NP^A}$; but for another choice ${B}$ of this oracle, there exist problems solvable by non-deterministic polynomial-time algorithms that call ${B}$, which cannot be solved by a deterministic polynomial-time algorithm that calls ${B}$, thus ${P^B \neq NP^B}$. One particular consequence of this result (which is due to Baker, Gill, and Solovay) is that there cannot be any relativisable proof of either ${P=NP}$ or ${P \neq NP}$, where “relativisable” means that the proof would also work without any changes in the presence of an oracle.

The Baker-Gill-Solovay result was quite surprising, but the idea of the proof turns out to be rather simple. To get an oracle ${A}$ such that ${P^A=NP^A}$, one basically sets ${A}$ to be a powerful simulator that can simulate non-deterministic machines (and, furthermore, can also simulate itself); it turns out that any PSPACE-complete oracle would suffice for this task. To get an oracle ${B}$ for which ${P^B \neq NP^B}$, one has to be a bit sneakier, setting ${B}$ to be a query device for a sparse set of random (or high-complexity) strings, which are too complex to be guessed at by any deterministic polynomial-time algorithm.

Unfortunately, the simple idea of the proof can be obscured by various technical details (e.g. using Turing machines to define ${P}$ and ${NP}$ precisely), which require a certain amount of time to properly absorb. To help myself try to understand this result better, I have decided to give a sort of “allegory” of the proof, based around a (rather contrived) story about various students trying to pass a multiple choice test, which avoids all the technical details but still conveys the basic ideas of the argument. This allegory was primarily for my own benefit, but I thought it might also be of interest to some readers here (and also has some tangential relation to the proto-polymath project of determinstically finding primes), so I reproduce it below.

— 1. ${P}$ and ${NP}$ students —

In this story, two students, named ${P}$ and ${NP}$ (and which for sake of grammar, I will arbitrarily assume to be male), are preparing for their final exam in a maths course, which will consist of a long, tedious sequence of multiple-choice questions, or more precisely true-false questions. The exam has a reasonable but fixed time limit (e.g. three hours), and unlimited scratch paper is available during the exam. Students are allowed to bring one small index card into the exam. Other than scratch paper, an index card, and a pencil, no other materials are allowed. Students cannot leave questions blank; they must answer each question true or false. The professor for this course is dull and predictable; everyone knows in advance the type of questions that will be on the final, the only issue being the precise numerical values that will be used in the actual questions.

For each student response to a question, there are three possible outcomes:

• False negative. The student answers “false”, but the actual answer is “true”.
• False positive. The student answers “true”, but the actual answer is “false”.

We will assume a certain asymmetry in the grading: a few points are deducted for false negatives, but a large number of points are deducted for false positives. (There are many real-life situations in which one type of error is considered less desirable than another; for instance, when deciding on guilt in a capital crime, a false positive is generally considered a much worse mistake than a false negative.) So, while students would naturally like to ace the exam by answering all questions correctly, they would tend to err on the side of caution and put down “false” when in doubt.

Student ${P}$ is hard working and careful, but unimaginative and with a poor memory. His exam strategy is to put all the techniques needed to solve the exam problems on the index card, so that they can be applied by rote during the exam. If the nature of the exam is such that ${P}$ can be guaranteed to ace it by this method, we say that the exam is in class ${P}$. For instance, if the exam will consist of verifying various multiplication problems (e.g. “Is ${231*136=31516}$?”), then this exam is in class ${P}$, since ${P}$ can put the algorithm for long multiplication, together with a multiplication table, on the index card, and perform these computations during the exam. A more non-trivial example of an exam in class ${P}$ would be an exam consisting solely of determining whether various large numbers are prime; here ${P}$ could be guaranteed to ace the test by writing down on his index card the details of the AKS primality test.

Student ${NP}$ is similar to ${P}$, but is substantially less scrupulous; he has bribed the proctor of the exam to supply him with a full solution key, containing not only the answers, but also the worked computations that lead to that answer (when the answer is “true”). The reason he has asked (and paid) for the latter is that he does not fully trust the proctor to give reliable answers, and is terrified of the impact to his grades if he makes a false positive. Thus, if the answer key asserts that the answer to a question is “true”, he plans to check the computations given to the proctor himself before putting down “true”; if he cannot follow these computations, and cannot work out the problem himself, he will play it safe and put down “false” instead.

We will say that the exam is in class ${NP}$ if

• ${NP}$ is guaranteed to ace the exam if the information given to him by the proctor is reliable;
• ${NP}$ is guaranteed not to make a false positive, even if the proctor has given him unreliable information.

For instance, imagine an exam consisting of questions such as “Is Fermat’s last theorem provable in ten pages or less?”. Such an exam is in the class ${NP}$, as the student can bribe the proctor to ask for a ten-page proof of FLT, if such exists, and then would check that proof carefully before putting down “True”. This way, the student is guaranteed not to make a false positive (which, in this context, would be a severe embarrassment to any reputable mathematician), and will ace the exam if the proctor actually does happen to have all the relevant proofs available.

It is clear that ${NP}$ is always going to do at least as well as ${P}$, since ${NP}$ always has the option of ignoring whatever the proctor gives him, and copying ${P}$‘s strategy instead. But how much of an advantage does ${NP}$ have over ${P}$? In particular, if we give ${P}$ a little bit more time (and a somewhat larger index card), could every exam that is in class ${NP}$, also be in class ${P}$? This, roughly speaking, is the ${P=NP}$ problem. It is believed that ${P \neq NP}$, thus there are exams which ${NP}$ will ace (with reliable information) and will at least not make a false positive (even with unreliable information), but for which ${P}$ is not guaranteed to ace, even with a little extra time and space.

— 2. Oracles —

Now let’s modify the exams a bit by allowing a limited amount of computer equipment in the exam. In addition to the scratch paper, pencil, and index card, every student in the exam is now also given access to a computer ${A}$ which can perform a carefully limited set of tasks that are intended to assist the student. Examples of tasks permitted by ${A}$ could include a scientific calculator, a mathematics package such as Matlab or SAGE, or access to Wikipedia or Google. We say that an exam is in class ${P^A}$ if it can be guaranteed to be aced by ${P}$ if he has access to ${A}$, and similarly the exam is in class ${NP^A}$ if it can be guaranteed to be aced by ${NP}$ if he has access to ${A}$ and the information obtained from the proctor was reliable, and if he is at least guaranteed not to make a false positive with access to ${A}$ if the information from the proctor turned out to be unreliable. Again, it is clear that ${NP}$ will have the advantage over ${P}$, in the sense that every exam in class ${P^A}$ will also be in class ${NP^A}$. (In other words, the proof that ${P \subset NP}$ relativises.) But what about the converse – is every exam in class ${NP^A}$, also in class ${P^A}$ (if we give ${P}$ a little more time and space, and perhaps also a slightly larger and faster version of ${A}$)?

We now give an example of a computer ${A}$ with the property that ${P^A=NP^A}$, i.e. that every exam in class ${NP^A}$, is also in class ${P^A}$. Here, ${A}$ is an extremely fast computer with reasonable amount of memory and a compiler for a general-purpose programming language, but with no additional capabilities. (More precisely, ${A}$ should be a PSPACE-complete language, but let me gloss over the precise definition of this term here.)

Suppose that an exam is in class ${NP^A}$, thus ${NP}$ will ace the exam if he can access ${A}$ and has reliable information, and will not give any false positive if he can access ${A}$ and has unreliable information. We now claim that ${P}$ can also ace this exam, if given a little bit more time and a slightly larger version of ${A}$. The way he does it is to program his version of ${A}$ to simulate ${NP}$‘s strategy, by looping through all possible values of the solution key that ${NP}$ might be given, and also simulating ${NP}$‘s copy of ${A}$ as well. (The latter task is possible as long as ${P}$‘s version of ${A}$ is slightly larger and faster than ${NP}$‘s version.) There are of course an extremely large number of combinations of solution key to loop over (for instance, consider how many possible proofs of Fermat’s last theorem under ten pages there could be), but we assume that the computer is so fast that it can handle all these combinations without difficulty. If at least one of the possible choices for a solution key causes the simulation of ${NP}$ to answer “true”, then ${P}$ will answer “true” also; if instead none of the solution keys cause ${NP}$ to answer “true”, then ${P}$ will answer “false” instead. If the exam is in class ${NP^A}$, it is then clear that ${P}$ will ace the exam.

Now we give an example of a computer ${B}$ with the property that ${P^B \neq NP^B}$, i.e. there exists an exam which is in class ${NP^B}$, but for which ${P}$ is not guaranteed to ace even with the assistance of ${B}$. The only software loaded on ${B}$ is a web browser, which can fetch any web page desired after typing in the correct URL. However, rather than being connected to the internet, the browser can only access a local file system of pages. Furthermore, there is no directory or search feature in this file system; the only way to find a page is to type in its URL, and if you can’t guess the URL correctly, there is no way to access that page. (In particular, there are no links between pages.)

Furthermore, to make matters worse, the URLs are not designed according to any simple scheme, but have in fact been generated randomly, by the following procedure. For each positive integer ${n}$, flip a coin. If the coin is heads, then create a URL of ${n}$ random characters and place a web page at that URL. Otherwise, if the coin is tails, do nothing. Thus, for each ${n}$, there will either be one web page with a URL of length ${n}$, or there will be no web pages of this length; but in the former case, the web page will have an address consisting of complete gibberish, and there will be no means to obtain this address other than by guessing.

The exam will consist of a long series of questions such as “Is there a web page on ${B}$ with a URL of ${1254}$ characters in length?”.

It is clear that this exam is in class ${NP^B}$. Indeed, for ${NP}$ to ace this exam, he just needs to bribe the proctor for the URLs of all the relevant web pages (if they exist). He can then confirm their existence by typing them into ${B}$, and then answer “true” if he finds the page, and “false” otherwise. It is clear that ${NP}$ will ace the exam if the proctor information is reliable, and will avoid false positives otherwise.

On the other hand, poor ${P}$ will have no chance to ace this exam if the length of the URLs are long enough, for two reasons. Firstly, the browser ${B}$ is useless to him: any URL he can guess will have almost no chance of being the correct one, and so the only thing he can generate on the browser is an endless stream of “404 Not Found” messages. (Indeed, these URLs are very likely to have a high Kolmogorov complexity, and thus cannot be guessed by ${P}$. Admittedly, ${P}$ does have ${B}$ available, but one can show by induction on the number of queries that ${B}$ is useless to ${P}$. We also make the idealised assumption that side-channel attacks are not available.) As ${B}$ is useless, the only hope ${P}$ has is to guess the sequence of coin flips that were used to determine the set of ${n}$ for which URLs exist of that length. But the random sequence of coin flips is also likely to have high Kolmogorov complexity, and thus cannot be guaranteed to be guessed by ${P}$ either. Thus ${P^B \neq NP^B}$.

Remark 1 Note how the existence of long random strings could be used to make an oracle that separates ${P}$ from ${NP}$. In the absence of oracles, it appears that separation of ${P}$ from ${NP}$ is closely connected to the existence of long pseudorandom strings – strings of numbers which can be deterministically generated (perhaps from a given seed) in a reasonable amount of time, but are difficult to distinguish from genuinely random strings by any quick tests. See my writeup of this lecture by Avi Wigderson for more discussion.