I’ve just uploaded to the arXiv my paper “On product representations of squares“. This short paper answers (in the negative) a (somewhat obscure) question of Erdös. Namely, for any {k \geq 1}, let {F_k(N)} be the size of the largest subset {A} of {\{1,\dots,N\}} with the property that no {k} distinct elements of {A} multiply to a square. In a paper by Erdös, Sárközy, and Sós, the following asymptotics were shown for fixed {k}:

  • {F_1(N) = (1+o(1)) N}.
  • {F_2(N) = (\frac{6}{\pi^2} + o(1)) N}.
  • {F_3(N) = (1+o(1)) N}.
  • {F_{4k}(N) = (1+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {F_{4k+2}(N) = (\frac{3}{2}+o(1)) \frac{N}{\log N}} for {k \geq 1}.
  • {(\log 2 + o(1)) N \leq F_{2k+1}(N) \leq N} for {k \geq 2}.
Thus the asymptotics for {F_k(N)} for odd {k \geq 5} were not completely settled. Erdös asked if one had {F_k(N) = (1-o(1)) N} for odd {k \geq 5}. The main result of this paper is that this is not the case; that is to say, there exists {c_k>0} such that any subset {A} of {\{1,\dots,N\}} of cardinality at least {(1-c_k) N} will contain {k} distinct elements that multiply to a square, if {N} is large enough. In fact, the argument works for all {k \geq 4}, although it is not new in the even case. I will also note that there are now quite sharp upper and lower bounds on {F_k} for even {k \geq 4}, using methods from graph theory: see this recent paper of Pach and Vizer for the latest results in this direction. Thanks to the results of Granville and Soundararajan, we know that the constant {c_k} cannot exceed the Hall-Montgomery constant

\displaystyle  1 - \log(1+\sqrt{e}) + 2 \int_1^{\sqrt{e}} \frac{\log t}{t+1}\ dt = 0.171500\dots

and I (very tentatively) conjecture that this is in fact the optimal value for this constant. This looks somewhat difficult, but a more feasible conjecture would be that the {c_k} asymptotically approach the Hall-Montgomery constant as {k \rightarrow \infty}, since the aforementioned result of Granville and Soundararajan morally corresponds to the {k=\infty} case.

In the end, the argument turned out to be relatively simple; no advanced results from additive combinatorics, graph theory, or analytic number theory were required. I found it convenient to proceed via the probabilistic method (although the more combinatorial technique of double counting would also suffice here). The main idea is to generate a tuple {(\mathbf{n}_1,\dots,\mathbf{n}_k)} of distinct random natural numbers in {\{1,\dots,N\}} which multiply to a square, and which are reasonably uniformly distributed throughout {\{1,\dots,N\}}, in that each individual number {1 \leq n \leq N} is attained by one of the random variables {\mathbf{n}_i} with a probability of {O(1/N)}. If one can find such a distribution, then if the density of {A} is sufficienly close to {1}, it will happen with positive probability that each of the {\mathbf{n}_i} will lie in {A}, giving the claim.

When {k=3}, this strategy cannot work, as it contradicts the arguments of Erdös, Särközy, and Sós. The reason can be explained as follows. The most natural way to generate a triple {(\mathbf{n}_1,\mathbf{n}_2,\mathbf{n}_3)} of random natural numbers in {\{1,\dots,N\}} which multiply to a square is to set

\displaystyle  \mathbf{n}_1 := \mathbf{d}_{12} \mathbf{d}_{13}, \mathbf{n}_2 := \mathbf{d}_{12} \mathbf{d}_{23}, \mathbf{n}_3 := \mathbf{d}_{13} \mathbf{d}_{23}

for some random natural numbers {\mathbf{d}_{12} \mathbf{d}_{13}, \mathbf{d}_{23}}. But if one wants all these numbers to have magnitude {\asymp N}, one sees on taking logarithms that one would need

\displaystyle  \log \mathbf{d}_{12} + \log \mathbf{d}_{13}, \log \mathbf{d}_{12} + \log \mathbf{d}_{23}, \log \mathbf{d}_{13} + \log \mathbf{d}_{23} = \log N + O(1)

which by elementary linear algebra forces

\displaystyle  \log \mathbf{d}_{12}, \log \mathbf{d}_{13}, \log \mathbf{d}_{23} = \frac{1}{2} \log N + O(1),

so in particular each of the {\mathbf{n}_i} would have a factor comparable to {\sqrt{N}}. However, it follows from known results on the “multiplication table problem” (how many distinct integers are there in the {n \times n} multiplication table?) that most numbers up to {N} do not have a factor comparable to {\sqrt{N}}. (Quick proof: by the Hardy–Ramanujan law, a typical number of size {N} or of size {\sqrt{N}} has {(1+o(1)) \log\log N} factors, hence typically a number of size {N} will not factor into two factors of size {\sqrt{N}}.) So the above strategy cannot work for {k=3}.

However, the situation changes for larger {k}. For instance, for {k=4}, we can try the same strategy with the ansatz

\displaystyle \mathbf{n}_1 = \mathbf{d}_{12} \mathbf{d}_{13} \mathbf{d}_{14}; \quad \mathbf{n}_2 = \mathbf{d}_{12} \mathbf{d}_{23} \mathbf{d}_{24}; \quad \mathbf{n}_3 = \mathbf{d}_{13} \mathbf{d}_{23} \mathbf{d}_{34}; \quad \mathbf{n}_4 = \mathbf{d}_{14} \mathbf{d}_{24} \mathbf{d}_{34}.

Whereas before there were three (approximate) equations constraining three unknowns, now we would have four equations and six unknowns, and so we no longer have strong constraints on any of the {\mathbf{d}_{ij}}. So in principle we now have a chance to find a suitable random choice of the {\mathbf{d}_{ij}}. The most significant remaining obstacle is the Hardy–Ramanujan law: since the {\mathbf{n}_i} typically have {(1+o(1))\log\log N} prime factors, it is natural in this {k=4} case to choose each {\mathbf{d}_{ij}} to have {(\frac{1}{3}+o(1)) \log\log N} prime factors. As it turns out, if one does this (basically by requiring each prime {p \leq N^{\varepsilon^2}} to divide {\mathbf{d}_{ij}} with an independent probability of about {\frac{1}{3p}}, for some small {\varepsilon>0}, and then also adding in one large prime to bring the magnitude of the {\mathbf{n}_i} to be comparable to {N}), the calculations all work out, and one obtains the claimed result.