N2D3P9: Difference between revisions
Dave Keenan (talk | contribs) No edit summary |
Dave Keenan (talk | contribs) Changed "minimize" to "maximize" when describing correlation. Changed "this weighted rank correlation" to "this weighted rank error". |
Estimation of pitch ratio popularity is possible because it correlates with numeric simplicity. <math>\text{N2D3P9}</math> is most useful when comparing ranks of more complex ratios, because usage data about such ratios is sparse. By fitting a function to the statistical usage data which is available for simpler ratios, <math>\text{N2D3P9}</math> enables the extension of the patterns found in these simpler ratios.
Rather than attempting to fit functions to the exact vote counts for each ratio, the functions were fit to the rank index of each ratio; in other words, a function only needed to sort ratios in the same order as the actual data, and it was unimportant how close its estimate of the vote count was at each rank position. In technical parlance, the goal was to maximize [https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient Spearman’s rank correlation coefficient] between the estimated ranks and the actual ranks. For purposes of comparing competing functions, maximizing Spearman’s coefficient could be simplified to minimizing the sum of squared differences between the ranks. But because fitting to the simpler ratios, which had more votes, is more important, a Zipf's-law weighting was applied to the ranks by taking their reciprocals before calculating their squared differences. A [https://en.wikipedia.org/wiki/Ranking#Fractional_ranking_(%221_2.5_2.5_4%22_ranking) fractional ranking] strategy was used to ensure that stretches of the data with tied vote counts did not distort the measurement.
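The error measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the original investigation: the function names <code>fractional_ranks</code> and <code>weighted_rank_error</code> are hypothetical, the most-voted ratio is assumed to take rank 1, and a lower function value is assumed to mean a higher estimated popularity.

```python
def fractional_ranks(values, descending=False):
    """Return 1-based fractional ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over all entries tied with the one at start.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def weighted_rank_error(votes, estimates):
    """Sum of squared differences between reciprocal (Zipf-weighted) ranks.

    Assumptions for this sketch: more votes means a better (lower) actual rank,
    and a lower estimate means a better (lower) estimated rank.
    """
    actual = fractional_ranks(votes, descending=True)
    estimated = fractional_ranks(estimates)
    return sum((1 / e - 1 / a) ** 2 for e, a in zip(estimated, actual))
```

Because the ranks are inverted before differencing, a disagreement near the top of the list (say, rank 1 versus rank 2) contributes far more to the total than the same disagreement near the bottom (rank 100 versus rank 101), which is the intended effect of the weighting.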
The overall strategy, then, was to minimize this weighted rank error while also minimizing the complexity of the function, to avoid overfitting. An earlier notational popularity ranking function for 2,3-removed-ratios, which had been used by the creators of Sagittal, was <math>\text{sopfr}</math> ([https://mathworld.wolfram.com/SumofPrimeFactors.html sum of prime factors with repetition]). It does a remarkably good job of estimating the rank of pitch ratios given how simple it is. However, the weighted sum of squared errors that <math>\text{sopfr}</math> gives for the Scala stats is about 0.026, while <math>\text{N2D3P9}</math> reduces that to about 0.010. Functions giving sums of squares as low as 0.008 were found; however, these functions were so complex that they were probably fitting to noise in the Scala stats instead of to the true nature of musical pitch. An informal “chunk” metric was devised to compare function complexity in terms of ability to fit to the data, with considered functions ranging from one chunk (<math>\text{sopfr}</math>) to eight chunks; the winning function <math>\text{N2D3P9}</math> has five chunks.
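For readers unfamiliar with sopfr, the following Python sketch computes it by trial division. The helper <code>ratio_sopfr</code> is a hypothetical name, and it assumes that sopfr of a ratio is taken as the sum of sopfr of its numerator and sopfr of its denominator; that convention is an assumption of this sketch rather than something stated in this section.

```python
def sopfr(n):
    """Sum of the prime factors of n, counted with repetition (sopfr(1) is 0)."""
    total = 0
    p = 2
    while p * p <= n:
        # Divide out every copy of p, adding p once per copy.
        while n % p == 0:
            total += p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is a single prime factor larger than sqrt of the original n.
        total += n
    return total


def ratio_sopfr(numerator, denominator):
    """Assumed convention: sopfr of a ratio is sopfr(numerator) + sopfr(denominator)."""
    return sopfr(numerator) + sopfr(denominator)
```

Under that assumption, <code>ratio_sopfr(11, 5)</code> and <code>ratio_sopfr(55, 1)</code> both evaluate to 16, so sopfr alone cannot tell such ratios apart.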
Several techniques were used to find and decide on <math>\text{N2D3P9}</math> as the best 2,3-removed-ratio notational-popularity rank-estimation function.
Initial observations about shortcomings of <math>\text{sopfr}</math> formed the basis of the investigation: it fails to differentiate balanced ratios from their imbalanced equivalents (such as <math>\frac{11}{5}</math> versus <math>\frac{55}{1}</math>), and it fails to differentiate ratios with different prime limits (such as <math>\frac{13}{5}</math> and <math>\frac{11}{7}</math>), even though those pairs of ratios exhibit remarkably different actual ranks in the Scala stats. Psychoacoustic plausibility of functions was used as a top-down guide for experimentation. [https://en.wikipedia.org/wiki/Mathematical_optimization Optimization] tools such as [https://www.microsoft.com/en-us/microsoft-365/blog/2009/09/21/new-and-improved-solver/ Excel's Evolutionary Solver] were used to navigate toward ideal values for each parameter. The approach that finally succeeded was a brute-force search implemented by Douglas Blumeyer, whereby nearly 2 billion functions combined out of constituent "submetrics" were checked automatically. In the end, one of the functions on the short-list generated by the brute-force checker was recognized as being re-writable in a much simpler form, with parameter values rounded to whole numbers, without doing much damage to its sum-of-squares, and thus <math>\text{N2D3P9}</math> was born.