Harmonic entropy: Difference between revisions

Line 94:

Lastly, the set of rationals is often chosen to be only those "reduced" rationals within the cutoff, such that <math>n/d</math> is in the set only if <math>n</math> and <math>d</math> are coprime. HE can also be formulated with unreduced rationals, and both methods tend to give similar results. In Paul's work, reduced rationals are most common, although the use of unreduced rationals may be useful in extending HE to the case where <math>N=\infty</math>.
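The difference between the reduced and unreduced basis sets can be sketched in a few lines of Python. This is only an illustration: the helper name <code>basis_rationals</code> and the cutoff convention <math>n \cdot d \leq</math> <code>max_nd</code> are assumptions made for the example, not something fixed by the text above.

```python
from math import gcd

def basis_rationals(max_nd, reduced=True):
    """Return (n, d) pairs with n * d <= max_nd.

    max_nd is a hypothetical cutoff on the product n * d; with
    reduced=True only coprime pairs (fractions in lowest terms) are kept.
    """
    pairs = []
    for n in range(1, max_nd + 1):
        for d in range(1, max_nd // n + 1):
            if not reduced or gcd(n, d) == 1:
                pairs.append((n, d))
    return pairs

reduced_set = basis_rationals(12, reduced=True)
unreduced_set = basis_rationals(12, reduced=False)
```

With this cutoff, a pair such as (2, 4) appears only in the unreduced set, where it adds a second entry at the same pitch as (1, 2).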

Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the '''domain-integral approach''', works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the '''~~complexity-normalization~~ approach''', has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.

Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the '''domain-integral approach''', works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the '''simple weighted approach''', has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.

===Domain-Integral Probabilities===

Line 111:

In the case where the set of basis rationals consists of a finite set bounded by Tenney or Weil height, the resulting set of widths is conjectured to have interesting mathematical properties, leading to mathematically nice conceptual simplifications of the model. These simplifications are explained below.

===~~Complexity-Normalization~~ Probabilities===

===Simple Weighted Probabilities===

It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff <math>N</math> as a basis set, the domain widths for rationals sufficiently far from the cutoff seem to be proportional to <math>\frac{1}{\sqrt{nd}}</math>.

Line 128:

where this time the set of basis rationals is assumed to be all of those of Weil Height ≤ <math>N</math> for some <math>N</math>.

In both cases, the general approach is the same: the value of the spreading function, taken at the value of <math>\cent(j)</math>, is divided by some sort of "complexity" function representing how much weight is given to that rational number. While the two ~~complexity~~ functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary ~~complexities~~ as follows:

In both cases, the general approach is the same: the value of the spreading function, taken at the value of <math>\cent(j)</math>, is divided by some sort of "weighting" (or sometimes, "complexity") function representing how much weight is given to that rational number. While the two weighting functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary weights as follows:

<math>\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\|j\|}</math>

where <math>\|j\|</math> denotes a ~~complexity~~ function ~~mapping~~ from rational numbers to non-negative reals.

where <math>\|j\|</math> denotes a weighting function that maps from rational numbers to non-negative reals.
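As a minimal sketch of the formula for <math>Q(j|c)</math>, assuming a Gaussian spreading function with <math>s = 17</math> cents and taking <math>\cent(j)</math> to be <math>1200 \log_2(n/d)</math>, the unnormalized value can be computed as follows. The function names and the <code>weight</code> parameter are hypothetical and chosen only for this example.

```python
import math

def cents(n, d):
    # Size of the ratio n/d in cents.
    return 1200.0 * math.log2(n / d)

def gaussian(x, s):
    # Gaussian spreading function S with standard deviation s (in cents).
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def unnormalized_q(n, d, c, s=17.0, weight=lambda n, d: math.sqrt(n * d)):
    # Q(j|c) = S(cent(j) - c) / ||j||, with sqrt(n*d) as the default weighting.
    return gaussian(cents(n, d) - c, s) / weight(n, d)

# Near 700 cents, 3/2 receives far more unnormalized mass than 40/27.
q_fifth = unnormalized_q(3, 2, 700.0)
q_complex = unnormalized_q(40, 27, 700.0)

# The same function with max(n, d) as the weighting instead.
q_fifth_weil = unnormalized_q(3, 2, 700.0, weight=lambda n, d: max(n, d))
```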

As these "probabilities" don't sum to 1, the result is not a probability distribution at all, invalidating the use of the Shannon Entropy. To rectify this, the distribution is normalized so that the probabilities do sum to 1:

Line 147:

=== s=17, N<10000, sqrt(n*d) weights ===

This uses as a spreading function the Gaussian distribution with <math>s=~17\cent</math> (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the ~~complexity-normalization~~ approach, and the ~~complexity~~ function is <math>\sqrt{nd}</math>:

This uses as a spreading function the Gaussian distribution with <math>s=~17\cent</math> (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the simple weighted approach, and the weighting function is <math>\sqrt{nd}</math>:

[[File:HE_Tenney_N_10000_s_17cents.png]]

=== s=17, N<100, max(n,d) weights ===

This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The ~~complexity~~ function here is <math>\max(n,d)</math>:

This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The weighting function here is <math>\max(n,d)</math>:

[[File:HE_Weil_N_100_s_17cents.png]]

=== s=17, N<10000, sqrt(n*d) vs mediant-to-mediant weights ===

The following image (from Paul Erlich) compares the domain-integral and ~~complexity-normalization~~ approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 cents, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:

The following image (from Paul Erlich) compares the domain-integral and simple weighted approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 cents, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:

[[File:HE_Tenney_mediant_vs_sqrt_nd_Paul.png|800px]]

Line 200:

[[File:HE_Tenney_N_10000_s_17cents.png]]

''Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> ~~complexity~~.''

''Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> weighting.''

===a=2: Harmonic Collision Entropy===

Line 209:

[[File:HE_Tenney_N_10000_s_17cents_a=2.png]]

''Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> ~~complexity~~.''

''Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> weighting.''

===a=∞: Harmonic Min-Entropy===

Line 218:

[[File:HE_Tenney_N_10000_s_17cents_a=7.png]]

''Harmonic Rényi Entropy with a=7, with the high value of a being chosen to approximate min-entropy (a=''∞''). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 cents), and the ~~complexity~~ function <math>\sqrt{nd}</math>.''

''Harmonic Rényi Entropy with a=7, with the high value of a being chosen to approximate min-entropy (a=''∞''). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 cents), and the weighting function <math>\sqrt{nd}</math>.''

==Convolution-Based Expression For Quickly Computing Rényi Entropy==

Below is a derivation that expresses Harmonic Rényi Entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the Fast Fourier Transform.

The below derivation depends on the use of ~~complexity-normalization~~ probabilities, although it may be possible to extend to domain-integral probabilities instead.

The below derivation depends on the use of simple weighted probabilities, although it may be possible to extend to domain-integral probabilities instead.

===Preliminaries===

Line 349:

=Extending HE to <math>N=\infty</math>: zeta-HE=

All of the models described above involve a finite set of rational numbers, bounded by some ~~complexity~~ function, and where the ~~complexity~~ is less than some max value <math>N</math>.

All of the models described above involve a finite set of rational numbers, namely those whose weighting is less than some maximum value <math>N</math>.

It so happens that we are more or less able to analytically continue this definition to the situation where <math>N=\infty</math>. More precisely, we are able to analytically continue the exponential of HE, which yields the same relative interval rankings as standard HE.

Line 392:

===Definition of the Unnormalized Harmonic Rényi Entropy===

Let's start by recalling the original definition for Harmonic Rényi Entropy, using ~~complexity-normalization~~ probabilities:

Let's start by recalling the original definition for Harmonic Rényi Entropy, using simple weighted probabilities:

<math>\displaystyle \text{HE}_a(c) = \frac{1}{1-a} \log \sum_{j \in J} P(j|c)^a</math>

Line 400:

<math>\displaystyle P(j|c) = \frac{Q(j|c)}{\sum_{j \in J} Q(j|c)}</math>

where the <math>Q(j|c)</math> is the "unnormalized probability" - the raw value of the spreading function, evaluated at the ratio in question, divided by the ratio's ~~complexity~~. The above equation tells us that the normalized probability is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities.

where the <math>Q(j|c)</math> is the "unnormalized probability" - the raw value of the spreading function, evaluated at the ratio in question, divided by the ratio's weighting. The above equation tells us that the normalized probability is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities.
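The two definitions above can be put together in a short sketch. It assumes a Gaussian spreading function with <math>s = 17</math> cents, <math>\sqrt{nd}</math> weighting, and a small reduced basis set with <math>n \cdot d \leq 1000</math>; the function names and the cutoff are illustrative assumptions only. The Gaussian's constant factor is dropped because it cancels in the normalization.

```python
import math
from math import gcd

def basis_rationals(max_nd):
    # Reduced n/d with n * d <= max_nd (an assumed cutoff convention).
    return [(n, d) for n in range(1, max_nd + 1)
            for d in range(1, max_nd // n + 1) if gcd(n, d) == 1]

def harmonic_renyi_entropy(c, basis, a=1.0, s=17.0):
    # Unnormalized Q(j|c): Gaussian at cent(j) - c, divided by sqrt(n*d).
    q = [math.exp(-0.5 * ((1200.0 * math.log2(n / d) - c) / s) ** 2)
         / math.sqrt(n * d) for n, d in basis]
    total = sum(q)
    p = [x / total for x in q]          # P(j|c) = Q(j|c) / sum of Q
    if a == 1.0:
        # Shannon entropy, the a -> 1 limit of the Renyi formula.
        return -sum(x * math.log(x) for x in p if x > 0.0)
    return math.log(sum(x ** a for x in p)) / (1.0 - a)

BASIS = basis_rationals(1000)
he_fifth = harmonic_renyi_entropy(701.955, BASIS)   # at 3/2
he_off = harmonic_renyi_entropy(650.0, BASIS)       # away from simple ratios
```

Under these assumptions the value at 3/2 comes out lower than the value at 650 cents, and for a fixed <math>c</math> the entropy does not increase as <math>a</math> grows.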

Line 434:

<math>\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}</math>

where <math>\|j\|</math> represents the ~~"complexity"~~ of the JI basis ratio <math>j</math>. In the particular case of Tenney weighting, we get:

where <math>\|j\|</math> represents the weighting of the JI basis ratio <math>j</math>. In the particular case of Tenney weighting, we get:

<math>\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{(j_n \cdot j_d)^{0.5}}</math>
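The idea of treating the basis set as a train of weighted impulses and convolving it with the spreading function can be sketched as follows. This is a sketch under stated assumptions, not the derivation itself: it snaps each ratio to a 1-cent grid, uses a reduced basis with <math>n \cdot d \leq 1000</math>, and relies on the identity <math>\text{HE}_a = \frac{1}{1-a}\left(\log \sum_j Q^a - a \log \sum_j Q\right)</math>, which follows from the definitions of <math>P</math> and <math>Q</math> for <math>a \neq 1</math>. Impulses are placed at <math>+\cent(j)</math> so that the output is indexed directly by <math>c</math>; for a symmetric Gaussian the sign convention makes no difference. The convolution is written as a plain loop to stay within the standard library, and an FFT-based routine could replace it.

```python
import math
from math import gcd

LO, HI = -100, 1300   # grid span in cents (assumed), padded beyond one octave

def basis_rationals(max_nd):
    # Reduced n/d with n * d <= max_nd (an assumed cutoff convention).
    return [(n, d) for n in range(1, max_nd + 1)
            for d in range(1, max_nd // n + 1) if gcd(n, d) == 1]

def impulse_train(basis, a):
    # K_a on the grid: an impulse of height 1 / sqrt(n*d)**a at each ratio.
    k = [0.0] * (HI - LO + 1)
    for n, d in basis:
        idx = round(1200.0 * math.log2(n / d)) - LO
        if 0 <= idx < len(k):
            k[idx] += 1.0 / math.sqrt(n * d) ** a
    return k

def convolve_same(k, taps):
    # Direct discrete convolution; output has the same length as k.
    half = len(taps) // 2
    out = [0.0] * len(k)
    for i, ki in enumerate(k):
        if ki == 0.0:
            continue
        for t, tv in enumerate(taps):
            j = i + t - half
            if 0 <= j < len(out):
                out[j] += ki * tv
    return out

def he_curve(basis, a=2.0, s=17.0):
    # Gaussian taps truncated at 6 standard deviations; constant factor omitted.
    radius = int(6 * s)
    g = [math.exp(-0.5 * (x / s) ** 2) for x in range(-radius, radius + 1)]
    num = convolve_same(impulse_train(basis, a), [v ** a for v in g])
    den = convolve_same(impulse_train(basis, 1.0), g)
    return [(math.log(nu) - a * math.log(de)) / (1.0 - a)
            if nu > 0.0 and de > 0.0 else float("nan")
            for nu, de in zip(num, den)]

basis = basis_rationals(1000)
curve = he_curve(basis, a=2.0)      # curve[c - LO] is HE_2 at c cents
```

Both convolutions are computed once for the whole grid, so the cost no longer scales with the number of evaluation points times the number of basis ratios.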

Line 464:

Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height ~~complexity~~, we typically bound by <math>\sqrt{nd} < N</math> for some <math>N</math>. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for ~~complexity~~. This will not end up being much of a problem, as the two will converge on the same result anyway.

Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height weighting, we typically bound by <math>\sqrt{nd} < N</math> for some <math>N</math>. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for our weighting. This will not end up being much of a problem, as the two will converge on the same result anyway.

Bounding by <math>\max(n,d) < N</math> is the same as specifying that <math>j_n < N</math> and <math>j_d < N</math>. Doing so, we get

Line 492:

And we have now obtained a very interesting result: if we had instead gone with something like <math>(nd)^2</math> ~~complexity~~ on rationals, rather than <math>\sqrt{nd}</math>, that our HE setup ''would'' have converged as <math>N \to \infty</math>, and our original HE convolution kernel would have been the Fourier transform of a particular vertical "slice" of the Riemann zeta function where <math>\Re(z) = 2</math>.

And we have now obtained a very interesting result: if we had instead gone with something like <math>(nd)^2</math> weighting on rationals, rather than <math>\sqrt{nd}</math>, our HE setup ''would'' have converged as <math>N \to \infty</math>, and our original HE convolution kernel would have been the Fourier transform of a particular vertical "slice" of the Riemann zeta function where <math>\Re(z) = 2</math>.
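The contrast between the two exponents can be checked numerically. The sketch below assumes unreduced pairs bounded by <math>n < N</math> and <math>d < N</math>, and sums only the weights <math>(nd)^{-w}</math>, leaving out the spreading function; the function name is hypothetical. For <math>w = 2</math> the total approaches <math>\zeta(2)^2 = (\pi^2/6)^2</math>, while for <math>w = 0.5</math> it keeps growing with <math>N</math>.

```python
import math

def partial_weight_sum(N, w):
    """Sum of (n * d) ** (-w) over all unreduced pairs with 1 <= n, d < N."""
    # The double sum factors into the square of a single sum over n.
    one_sided = sum(n ** -w for n in range(1, N))
    return one_sided * one_sided

limit_w2 = (math.pi ** 2 / 6.0) ** 2          # zeta(2) squared
sum_w2 = partial_weight_sum(10000, 2.0)       # already close to the limit
growth_w05 = partial_weight_sum(10000, 0.5) / partial_weight_sum(1000, 0.5)
```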

Furthermore, although the above series doesn't converge for <math>w = 0.5</math>, we can simply use the analytic continuation of the Riemann zeta function to obtain a meaningful function at that point, so that our original convolution kernel can be written as
