Harmonic entropy

'''Harmonic entropy''' ('''HE''') is a simple model to quantify the extent to which musical chords align with the harmonic series, and thus tend to partly "fuse" into the perception of a single sound with a complex timbre and '''virtual fundamental''' pitch. It was invented by Paul Erlich and developed extensively on the Yahoo! tuning and harmonic_entropy lists, and draws from prior research by Parncutt and Terhardt. Various later contributions to the model have been made by Steve Martin, Mike Battaglia, Keenan Pepper, and others.

Note: the terms dyad, triad and tetrad usually refer to chords with 2, 3 or 4 [[Pitch class|pitch classes]]. In this discussion, however, they refer to chords with 2, 3, or 4 pitches. Thus {{dash|C, E, G, C}} is a tetrad rather than a triad.

== Background ==

For dyads, the basic harmonic entropy model is fairly simple: it places the dyad we are trying to measure amidst a backdrop of JI candidates. Then, it uses a point-spread function to determine the relative strengths of the match to each, which are then normalized and treated as probabilities. The "entropy" of the resulting probability distribution is a way to measure how closely this distribution tends to focus on one possibility, rather than being spread out among a set of equally likely possibilities. If there is only one clear choice of dyad which far exceeds all others in probability, the entropy will be lower. If, on the other hand, there are many equally likely possibilities, the entropy will be higher. The basic harmonic entropy model can also be extended to modeling triads, tetrads, and so on; the standard way to do so is to simply look at the incoming triad's match to a set of candidate JI triads, and likewise for tetrads and larger chords.

=== Additional interpretations ===

In recent years, it has become clearer that the model can also be very useful in modeling other types of concordance as well, particularly for dyads, where the same model does a very good job in also predicting beatlessness, periodicity buzz, and so on. In particular, Erlich has often suggested the same model, perhaps with slightly different parameters, can also be useful to measure how easy it is to tune a dyad by ear on an instrument such as a guitar, or how much of a sense of being "locked-in" the dyad gives as it is tuned more closely to JI. This may be less related to the perception of virtual fundamentals than it is to beatlessness and so on.

However, the various aspects of psychoacoustic concordance tend to diverge quite strongly in their behavior for larger chords, and thus, when modeling different aspects of psychoacoustic concordance, different ways of generalizing the dyadic model to higher-cardinality chords may be appropriate. In particular, when modeling beatlessness, Erlich has suggested instead looking only at the entropies of the pairwise dyadic subsets of the chord, so that the major and minor chords would be ranked equal in beatlessness, whereas they would not be ranked equal in their ability to produce a clear virtual fundamental (the major chord would be much stronger and lower in entropy).

=== Concordance vs. actual consonance ===

Concordance has often been confused with actual musical consonance, a confusion made more common by the psychoacoustics literature, which often discusses concordance under the unfortunate name '''sensory consonance''', most often used to refer to phenomena related to roughness and beatlessness specifically. This is not to be confused with the more familiar construct of tonal stability, typically just called "consonance" in Western common practice music theory and sometimes clarified as "musical consonance" in the music cognition literature. To make matters worse, the literature has also at times referred to concordance (and not tonal stability) as '''tonal consonance''', often referring to phenomena related to virtual pitch integration, creating a complete terminological mess. As a result, the term "consonance" has been completely avoided in this article.

While psychoacoustic concordance is not a feature universal to all styles of music, it has been utilized significantly in Western music in the study of intonation. For instance, flexible-pitch ensembles operating within 12-EDO, such as barbershop quartets and string ensembles, will often adjust intonationally from the underlying 12-EDO reference to maximize the concordance of individual chords. Indeed, the entire history of Western tuning theory (from meantone temperament, to the various Baroque well-temperaments, to 12-EDO itself, to the modern [[Regular_Temperaments|theory of regular temperament]]) can be seen as an attempt to reason mathematically about how to generate manageable tuning systems that will maximize concordance and minimize discordance. Consonance and dissonance, on the other hand, are much more general phenomena which can exist even in music which is predominantly monophonic and uses no chords at all.

== Basic Model: Shannon Entropy ==

The general idea of harmonic entropy is to first develop a discrete probability distribution quantifying how strongly an arbitrary incoming dyad "matches" every element in a set of basis rational intervals, and then to see how evenly distributed the resulting probabilities are. If the distribution for some dyad is spread out very evenly, such that there is no clear "victor" basis interval that dominates the distribution, the dyad is considered to be more discordant; at the other extreme, if the distribution concentrates on one basis interval or a small set of them, the dyad is considered to be more concordant.

A clear mathematical way of quantifying this "dispersion" is via the {{w|Entropy (information theory)|Shannon entropy}} of the probability distribution, which can be thought of as describing the "uncertainty" in the distribution. A distribution which has a very high probability of picking one outcome has low entropy and is not very uncertain, whereas a distribution which has the probability spread out on many outcomes is highly uncertain and has a high entropy.

=== Definitions ===

To formalize our notion of Shannon entropy, we will first describe the random variable ''J'', representing the set of JI "basis" intervals that our incoming interval is being "matched" to, and the parameter ''C'', representing the "cents" of the incoming interval being played. For example, the parameter ''C'' would take values such as "400 [[cent]]s", and the variable ''J'' would take values in the set of basis ratios, such as "5/4" or "9/7".

So for example, if we want to express the probability that the incoming dyad "400{{cent}}" is perceived as the JI basis interval "5/4", we would write that as the conditional probability

$$\displaystyle \newcommand{\cent}{\text{¢}}$$

$$\displaystyle P(J=5/4\,|\, C=400\cent)$$

Or, in general, if we want to write the conditional probability that some incoming dyad of ''c'' cents is perceived as the JI basis interval ''j'', we would write that as

$$\displaystyle P(J=j\,|\, C=c)$$

which notationally, we will often abbreviate as

$$\displaystyle P(j|c)$$

Note that at this point, we haven't yet specified what the particular probability distribution is. There are different ways to do this, which are described in more detail below. Generally, most approaches involve each JI interval's probability being assigned based on how close it is to ''c'' (closer dyads are given a larger probability), and how simple it is (simple dyads are given a higher probability, if distance is the same).

A noteworthy point is that we generally do not assume any probability distribution on ''C''. This reflects that we do not make any assumptions at all about which notes or intervals are likely to be played to begin with. In other words, we are treating ''C'' more as a "parameter" rather than as a random variable.

Once we have decided on a probability distribution, we can finally evaluate the Shannon entropy. For a random variable ''X'', the Shannon entropy is defined as:

$$\displaystyle H(X) = -\sum_{x \in X} P(x) \log_b P(x)$$

where the different ''x'' are taken from the sample space of ''X'', and ''b'' is the base of the log. Different choices of ''b'' simply change the units in which entropy is given, the most common values being 2 and e, denoting "bits" and "nats". We will omit the base going forward, for simplicity.
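As a minimal sketch of this definition, the following Python computes the entropy of a sharply peaked distribution and of a uniform one. The function name <code>shannon_entropy</code> and the two example distributions are illustrative assumptions, not part of the harmonic entropy model itself.

```python
import math

def shannon_entropy(probs, base=math.e):
    """Return -sum(p * log_b(p)); zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical distributions over four outcomes.
peaked = [0.97, 0.01, 0.01, 0.01]    # one outcome dominates: low entropy
uniform = [0.25, 0.25, 0.25, 0.25]   # no outcome dominates: high entropy

print("peaked  (nats):", shannon_entropy(peaked))
print("uniform (nats):", shannon_entropy(uniform))           # equals log(4)
print("uniform (bits):", shannon_entropy(uniform, base=2))
```

Changing <code>base</code> only rescales the result, which is why the base can be left unspecified when comparing intervals.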

In our case, we want to find the entropy of the random variable ''J'' of JI intervals, given a particular choice of incoming dyad in cents. The corresponding quantity that we want is:

$$\displaystyle H(J|c) = -\sum_{j \in J} P(j|c) \log P(j|c)$$

Note that above, the summation is only taken on the ''j'' from the sample space of ''J'' (i.e. the set of JI basis intervals), whereas the parameter ''c'' is treated as constant within the summation (and is taken as the free parameter to the function).

Since the parameter ''c'' is the free parameter, sometimes the above is notated as

$$\displaystyle \text{HE}(c) = H(J|c)$$

which makes more explicit that ''c'' is the argument to the harmonic entropy function, which is equal to the entropy of ''J'', conditioned on the incoming dyad of ''c'' cents.

=== Probability distributions ===

In order to systematically assign a probability distribution to this dyad, we first start by defining a '''spreading function''', denoted by ''S''(''x''), that dictates how the dyad is "smeared" out in log-frequency space, representing how the auditory system allows for some tolerance for mistuning. The typical choice that we will assume here for a spreading function is a Gaussian distribution, with mean centered around the incoming dyad, and standard deviation typically taken as a free parameter in the system and denoted as ''s''.

A fairly typical choice of settings for a basic dyadic HE model would be:

* The basis set is all those rationals bounded by some maximum Tenney height, with the bound typically notated as ''N'' and set to at least 10,000.

* The spreading function is typically a Gaussian distribution with a frequency deviation of 1% either way, or about {{nowrap|''s'' ≈ 17{{c}}}}.

Other spreading functions have also been explored, such as the use of the heavy-tailed [https://en.wikipedia.org/wiki/Laplace_distribution Laplace distribution], sometimes described as the "Vos function" in Paul's writings. These two functions are part of the [https://en.wikipedia.org/wiki/Generalized_normal_distribution Generalized normal distribution] family, which has a parameter not only for the variance but for the kurtosis. However, for simplicity, we will assume the Gaussian distribution as the spreading function for the remainder of this article, so that the spreading function for an incoming dyad ''c'' can be written as follows:

$$\displaystyle S(x-c) = \frac{1}{s\sqrt{2\pi}} e^{-\frac{(x-c)^2}{2s^2}}$$

where the notation {{nowrap|''S''(''x'' − ''c'')}} is chosen to make clear that we are translating ''S''(''x'') to be centered around the incoming dyad ''c'', which is now the mean of the Gaussian.
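The Gaussian spreading function can be sketched in Python as follows. The function name <code>spreading</code> and the default of 17 cents are assumptions chosen to match the typical settings listed above.

```python
import math

def spreading(x, c, s=17.0):
    """Gaussian S(x - c): density at x cents for an incoming dyad of c cents,
    with standard deviation s (also in cents)."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

# The density peaks at the incoming dyad and falls off symmetrically.
print(spreading(400.0, 400.0))
print(spreading(417.0, 400.0), spreading(383.0, 400.0))
```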

We assume here that the variable ''x'' is a dummy variable representing cents, and will adopt this convention for the remainder of the article.

In this notation, ''s'' becomes the standard deviation of the Gaussian, being an ASCII-friendly version of the more familiar symbol σ. Note that in previous expositions on harmonic entropy, ''s'' was sometimes given in units representing a percentage of linear-frequency deviation; we allow ''s'' to stand for cents here to simplify the notation. To convert from a percentage to cents, the formula {{nowrap|¢ {{=}} 1200 log<sub>2</sub>(1 + percentage)}} can be used, with the percentage written as a fraction (0.01 for 1%).
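As a quick check of the percentage-to-cents conversion, cents = 1200 log<sub>2</sub>(1 + percentage), the sketch below converts a 1% frequency deviation. The function name <code>percent_to_cents</code> is a hypothetical helper, and the percentage is passed as a fraction.

```python
import math

def percent_to_cents(fraction):
    """Convert a linear-frequency deviation (0.01 means 1%) to cents."""
    return 1200 * math.log2(1 + fraction)

print(round(percent_to_cents(0.01), 2))  # 17.23, the "s of about 17 cents" setting
```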

It is also common to use as a basis set all those rationals bounded by some maximum Weil height, with a typical cutoff for ''N'' set to at least 100. This has sometimes been referred to as seeding HE with the "Farey sequence of order ''N''" and its reciprocals, so references in Paul's work to "Farey series HE" vs "Tenney series HE" are sometimes seen.

Lastly, the set of rationals is often chosen to be only those "reduced" rationals within the cutoff, such that ''n''/''d'' is in the set only if ''n'' and ''d'' are coprime. HE can also be formulated with unreduced rationals. Both methods tend to give similar results. In Paul's work, reduced rationals are most common, although the use of unreduced rationals may be useful in extending HE to the case where {{nowrap|''N'' {{=}} ∞}}.
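The two kinds of basis set can be enumerated with a short Python sketch. It assumes that the Tenney height of ''n''/''d'' is taken as the product ''n''·''d'' and the Weil height as max(''n'', ''d''), which matches the weights and cutoffs used in this article; the function names are hypothetical.

```python
from math import gcd

def tenney_basis(N):
    """Reduced ratios (n, d) with n * d <= N, sorted by size."""
    pairs = [(n, d) for n in range(1, N + 1)
             for d in range(1, N // n + 1) if gcd(n, d) == 1]
    return sorted(pairs, key=lambda r: r[0] / r[1])

def weil_basis(N):
    """Reduced ratios (n, d) with max(n, d) <= N, sorted by size."""
    pairs = [(n, d) for n in range(1, N + 1)
             for d in range(1, N + 1) if gcd(n, d) == 1]
    return sorted(pairs, key=lambda r: r[0] / r[1])

print(tenney_basis(6))
print(weil_basis(3))
```

Dropping the <code>gcd</code> test would give the unreduced variant mentioned above.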

Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the '''domain-integral approach''', works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the '''simple weighted approach''', has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.

==== Domain-integral probabilities ====

For discrete sets of JI basis ratios, the log-frequency spectrum can be divided up into '''domains''' assigned to each ratio. Each ratio is assigned a domain with lower bound equal to the mediant of itself and its nearest lower neighbor, and likewise with upper bound equal to the mediant of itself and its nearest upper neighbor. If no such neighbor exists, <math>\pm \infty</math> is used instead. Mathematically, this can be represented via the following expression:

$$\displaystyle P(j|c) = \int_{\cent\left(j_l\right)}^{\cent\left(j_u\right)} S(x-c) dx$$

where {{nowrap|''S''(''x'' − ''c'')}} is the spreading function associated with ''c'', ''j''<sub>''l''</sub> and ''j''<sub>''u''</sub> are the domain lower and upper bounds associated with JI basis ratio ''j'', and <math>\cent(f) = 1200\log_2(f)</math>, or the "cents" function converting frequency ratios to cents. Typically, ''j''<sub>''l''</sub> is set equal to the mediant of ''j'' and its nearest lower neighbor (if it exists), or −∞ if not; likewise, ''j''<sub>''u''</sub> is set equal to the mediant of ''j'' and its nearest upper neighbor, or +∞ if there is none.
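The domain-integral procedure can be sketched in Python as below. Because the spreading function is Gaussian, each integral is a difference of two Gaussian CDF values, computed here with <code>math.erf</code>. The small basis (reduced ratios with numerator and denominator at most 10) and all function names are assumptions made for illustration; a realistic model would use a much larger basis.

```python
import math
from math import gcd

def cents(n, d):
    return 1200 * math.log2(n / d)

def gaussian_cdf(x, c, s):
    """Integral of S(t - c) from -infinity to x."""
    return 0.5 * (1 + math.erf((x - c) / (s * math.sqrt(2))))

def domain_integral_probs(basis, c, s=17.0):
    """basis: list of reduced (n, d) pairs sorted by size.
    Returns {(n, d): P(j|c)} using mediant-to-mediant domains."""
    probs = {}
    last = len(basis) - 1
    for i, (n, d) in enumerate(basis):
        lower, upper = 0.0, 1.0          # CDF values at -infinity and +infinity
        if i > 0:
            pn, pd = basis[i - 1]        # mediant with the nearest lower neighbor
            lower = gaussian_cdf(cents(n + pn, d + pd), c, s)
        if i < last:
            qn, qd = basis[i + 1]        # mediant with the nearest upper neighbor
            upper = gaussian_cdf(cents(n + qn, d + qd), c, s)
        probs[(n, d)] = upper - lower
    return probs

# Illustrative basis only: reduced ratios with n, d <= 10.
basis = sorted(((n, d) for n in range(1, 11) for d in range(1, 11)
                if gcd(n, d) == 1), key=lambda r: r[0] / r[1])
probs = domain_integral_probs(basis, c=702.0)
best = max(probs, key=probs.get)
print(best, probs[best])
```

Since the domains tile the whole log-frequency axis, these probabilities already sum to 1 and need no further normalization.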

This process can be summarized by the following picture, taken from [http://sethares.engr.wisc.edu/paperspdf/HarmonicEntropy.pdf William Sethares' paper on Harmonic Entropy]:

[[File:HarmonicEntropySethares.png]]

Note the difference in terminology here: in this example, the ''f''<sub>''j'' + ''n''</sub> are the basis ratios, the ''r''<sub>''j'' + ''n''</sub> are the domains for each basis ratio, and the bounds for each domain are the mediants between each ''f''<sub>''j'' + ''n''</sub> and its nearest neighbor. The probability assigned to each basis ratio is then the area under the spreading function curve for each ratio's domain. The entropy of this probability distribution is then the harmonic entropy for that dyad.

In the case where the set of basis rationals consists of a finite set bounded by Tenney or Weil height, the resulting set of widths is conjectured to have interesting mathematical properties, leading to mathematically nice conceptual simplifications of the model. These simplifications are explained below.

==== Simple Weighted Probabilities ====

It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff ''N'' as a basis set, the domain widths for rationals sufficiently far from the cutoff seem to be proportional to <math>\frac{1}{\sqrt{nd}}</math>.

While it's still an open conjecture that this pattern holds for arbitrarily large ''N'', the assumption is sometimes made that this is the case, and hence that for these basis ratio sets, <math>\frac{1}{\sqrt{nd}}</math> "approximations" to the width are sufficient to estimate domain-integral Harmonic Entropy.
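The empirical observation can be explored numerically with the sketch below, which measures mediant-to-mediant domain widths in cents and multiplies each by <math>\sqrt{nd}</math>. If the widths are roughly proportional to <math>\frac{1}{\sqrt{nd}}</math>, the products should come out roughly equal for simple ratios. It assumes Tenney height is the product ''n''·''d'' with {{nowrap|''N'' {{=}} 10000}}; the function names and the choice of test ratios are illustrative.

```python
import math
from math import gcd

def tenney_basis(N):
    """Reduced ratios (n, d) with n * d <= N, sorted by size."""
    pairs = [(n, d) for n in range(1, N + 1)
             for d in range(1, N // n + 1) if gcd(n, d) == 1]
    return sorted(pairs, key=lambda r: r[0] / r[1])

def domain_width_cents(basis, ratio):
    """Width in cents of the mediant-to-mediant domain of an interior ratio."""
    i = basis.index(ratio)
    n, d = ratio
    (ln, ld), (un, ud) = basis[i - 1], basis[i + 1]
    low = 1200 * math.log2((n + ln) / (d + ld))
    high = 1200 * math.log2((n + un) / (d + ud))
    return high - low

basis = tenney_basis(10000)
for ratio in [(1, 1), (3, 2), (5, 4), (7, 4)]:
    width = domain_width_cents(basis, ratio)
    # Second number: width times sqrt(n*d), expected to be roughly constant.
    print(ratio, round(width, 2), round(width * math.sqrt(ratio[0] * ratio[1]), 2))
```

This is only a spot check on a handful of ratios, not evidence for the conjecture at arbitrarily large ''N''.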

This modifies the expression for the probabilities ''P''(''j''{{!}}''c'') as follows, noting that for now the "probabilities" won't sum to 1:

$$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\sqrt{j_n \cdot j_d}}$$

where the ''Q'' notation now represents that these "probabilities" are unnormalized, and ''j''<sub>''n''</sub> and ''j''<sub>''d''</sub> are the numerator and denominator, respectively, of JI basis ratio ''j''. Again, the set of basis rationals here is assumed to be all of those rationals of Tenney height ≤ ''N'' for some ''N''.

A similar observation for the use of Weil-bounded subsets of the rationals suggests domain widths of {{sfrac|1|max(''n'', ''d'')}}, yielding instead the following formula:

$$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\max(j_n, j_d)}$$

where this time the set of basis rationals is assumed to be all of those of Weil height ≤ ''N'' for some ''N''.

In both cases, the general approach is the same: the value of the spreading function, taken at the value of ¢(''j''), is divided by some sort of "weighting" (or sometimes, "complexity") function representing how much weight is given to that rational number. While the two weighting functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary weights as follows:

$$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\|j\|}$$

where ‖''j''‖ denotes a weighting function that maps from rational numbers to non-negative reals.

As these "probabilities" don't sum to 1, the result is not a probability distribution at all, invalidating the use of the Shannon Entropy. To rectify this, the distribution is normalized so that the probabilities do sum to 1:

$$\displaystyle P(j|c) = \frac{Q(j|c)}{\sum_{j \in J} Q(j|c)}$$

which is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities. This definition of ''P''(''j''{{!}}''c'') is then used directly to compute the entropy.
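Putting the pieces together, the simple weighted approach can be sketched end to end in Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the basis is all reduced ratios with ''n''·''d'' ≤ ''N'', the weighting function is <math>\sqrt{nd}</math>, the spreading function is Gaussian, and the name <code>harmonic_entropy</code> is hypothetical.

```python
import math
from math import gcd

def harmonic_entropy(c, N=10000, s=17.0):
    """HE(c) in nats for an incoming dyad of c cents (simple weighted approach)."""
    q = []
    for n in range(1, N + 1):
        for d in range(1, N // n + 1):
            if gcd(n, d) != 1:
                continue                      # keep reduced ratios only
            x = 1200 * math.log2(n / d)       # cents(j)
            # Unnormalized Q(j|c). The Gaussian's constant factor
            # 1/(s*sqrt(2*pi)) is omitted because it cancels on normalization.
            q.append(math.exp(-((x - c) ** 2) / (2 * s * s)) / math.sqrt(n * d))
    total = sum(q)
    probs = [v / total for v in q]            # normalized P(j|c)
    return -sum(p * math.log(p) for p in probs if p > 0)

for c in (650, 702):
    print(c, harmonic_entropy(c))
```

With these settings, the value at 702 cents (close to 3/2) comes out lower than the value at 650 cents, in line with 3/2 being a point of high concordance.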

This approach to assigning probabilities to basis rationals is useful because it hypothetically makes it possible to consider the HE of sets of rationals which are dense in the reals, or even the entire set of positive rationals, although the best way to do this is a subject of ongoing research.

=== Examples ===

In all of these examples, the ''x''-axis represents the width in cents of the dyad, and the ''y''-axis represents ''discordance'' rather than concordance, measured in nats of Shannon entropy.

~~In all of these examples, the x-axis represents the width in cents of the dyad, and the y-axis represents ''discordance'' rather than concordance, measured in nats of Shannon entropy.~~

==== ''s'' {{=}} 17, ''N'' < 10000, <math>\sqrt{nd}</math> weights ====

This uses as a spreading function the Gaussian distribution with {{nowrap|''s'' {{=}} ~17{{c}}}} (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the simple weighted approach, and the weighting function is <math>\sqrt{nd}</math>:

==== s=17, N<10000, sqrt~~(n*d)~~ weights ====

This uses as a spreading function the Gaussian distribution with ~~<math>~~s=~17~~\cent</math>~~ (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the simple weighted approach, and the weighting function is <math>\sqrt{nd}</math>:

[[File:HE_Tenney_N_10000_s_17cents.png]]

==== s=17, N<100, max(n,d) weights ====

==== ''s'' {{=}} 17, ''N'' < 100, max(''n'', ''d'') weights ====

This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The weighting function here is ~~<math>\~~max(n,d)~~</math>~~:

This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The weighting function here is max(''n'', ''d''):

[[File:HE_Weil_N_100_s_17cents.png]]

==== s=17, N<10000, sqrt~~(n*d)~~ vs mediant-to-mediant weights ====

==== ''s'' {{=}} 17, ''N'' < 10000, <math>\sqrt{nd}</math> vs. mediant-to-mediant weights ====

The following image (from Paul Erlich) compares the domain-integral and simple weighted approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 ~~cents~~, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:

The following image (from Paul Erlich) compares the domain-integral and simple weighted approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with {{nowrap|''s'' {{=}} ~17{{c}}}}, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:

[[File:HE_Tenney_mediant_vs_sqrt_nd_Paul.png|800px]]

== Harmonic Rényi ~~Entropy~~ ==

== Harmonic Rényi entropy ==

An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of {{w|Entropy (information theory)|Shannon entropy}} by replacing it with {{w|Rényi entropy}}, a {{w|q-analog|''q''-analog}} of Shannon's original entropy. This can be thought of as adding a second parameter, called ''a'', to the model, reflecting how "intelligent" the brain's "decoding" process is when determining the most likely JI interpretation of an ambiguous interval.

An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of ~~[https://en.wikipedia.org/wiki/Entropy_~~(~~information_theory~~) Shannon entropy] by replacing it instead with ~~[https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy~~ Rényi entropy], a ~~[https://en.wikipedia.org/wiki/Q~~-analog q-analog] of Shannon's original entropy. This can be thought of as adding a second parameter, called ~~<math>~~a~~</math>~~, to the model, reflecting how "intelligent" the brain's "decoding" process is when determining the most likely JI interpretation of an ambiguous interval.

=== Definitions and Background ===

The '''Harmonic Rényi entropy of order ''a''''' of an incoming dyad can be defined as follows:

The '''Harmonic Rényi ~~Entropy~~ of order a''' of an incoming dyad can be defined as follows:

$$\displaystyle \text{HE}_a(c) = H_a(J|c) = \frac{1}{1-a} \log \sum_{j \in J} P(j|c)^a$$

Being a q-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as ~~<math>~~a ~~\to~~ 1~~</math>~~, a fact which can be verified using L'Hôpital's rule as found [http://www.sonycsl.co.jp/person/nielsen/Note-HopitalRuleShannonRenyiTsallis.pdf here].

Being a ''q''-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as {{nowrap|''a'' → 1}}, a fact which can be verified using {{w|L'Hôpital's rule}} as found [http://www.sonycsl.co.jp/person/nielsen/Note-HopitalRuleShannonRenyiTsallis.pdf here].
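As a brief sketch of that limit (using only the definition above and the fact that the probabilities sum to 1): at {{nowrap|''a'' {{=}} 1}} both the numerator and the denominator vanish, so differentiating each with respect to ''a'' gives

$$\displaystyle \lim_{a \to 1} \frac{\log \sum_{j \in J} P(j|c)^a}{1-a} = \lim_{a \to 1} \frac{\left. \sum_{j \in J} P(j|c)^a \log P(j|c) \,\middle/\, \sum_{j \in J} P(j|c)^a \right.}{-1} = -\sum_{j \in J} P(j|c) \log P(j|c)$$

which is the Shannon entropy.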

The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in [http://users.cis.fiu.edu/~smithg/papers/qest11.pdf this paper] and [http://www.ietf.org/rfc/rfc4086.txt this RFC]. More precisely, the Rényi entropy of order ~~<math>\infty</math>~~, also called the '''min-entropy''', is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn.

The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in [http://users.cis.fiu.edu/~smithg/papers/qest11.pdf this paper] and [http://www.ietf.org/rfc/rfc4086.txt this RFC]. More precisely, the Rényi entropy of order ∞, also called the '''min-entropy''', is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn.

In a musical context, by considering the incoming dyad as analogous to a cryptographic code which is attempting to be "cracked" by an intelligent auditory system, we can consider that the analogous "worst-case attacker" would be a "best-case auditory system" which has complete awareness of the probability distribution for any incoming dyad. This analogy would view such an auditory system as actively attempting to choose the most probable rational, rather than drawing a rational at random weighted by the distribution.

The use of ~~<math>~~a=∞~~</math>~~ min-entropy would reflect this view. In contrast, the use of ~~<math>~~a=1~~</math>~~ Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter a interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.

The use of {{nowrap|''a'' {{=}} ∞}} min-entropy would reflect this view. In contrast, the use of {{nowrap|''a'' {{=}} 1}} Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter ''a'' interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.

Some psychoacoustic effects naturally fit into this paradigm, such as the virtual pitch integration process, which actually does attempt to find a single victor when matching incoming chords with chunks of the harmonic series. Other psychoacoustic effects, such as that of beatlessness, may instead be better viewed as "dumb" processes whereby nothing in particular is being "chosen," but where a more uniform distribution of matching rational numbers for a dyad simply generates a more discordant sonic effect. Different values of ''a'' can differentiate between the predominance given to these two types of effect in the overall construct of psychoacoustic concordance.

Certain values of ~~<math>~~a~~</math>~~ reduce to simpler expressions and have special names, as given in the examples below.

Certain values of ''a'' reduce to simpler expressions and have special names, as given in the examples below.

=== Examples ===

==== a=0: Harmonic Hartley ~~Entropy~~ ====

==== ''a'' {{=}} 0: Harmonic Hartley entropy ====

$$\displaystyle H_0(J|c) = \log |J|$$

where ~~<math>|~~J~~|</math>~~ is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same Harmonic Hartley Entropy. The Hartley Entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: all Rényi Entropies are always guaranteed to be less than the Hartley Entropy.

where {{!}}''J''{{!}} is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same Harmonic Hartley Entropy. The Hartley Entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: all Rényi Entropies are always guaranteed to be less than or equal to the Hartley Entropy.

[[File:HRE_a=0.png]]

''Harmonic Hartley Entropy (a=0) with the basis set all rationals with Tenney height ≤ 10000. Note that the choice of spreading function makes no difference in the end result at all.''

''Harmonic Hartley Entropy ({{nowrap|a {{=}} 0}}) with the basis set all rationals with Tenney height ≤ 10000. Note that the choice of spreading function makes no difference in the end result at all.''

==== a=1: Harmonic Shannon ~~Entropy (Harmonic Entropy)~~ ====

==== ''a'' {{=}} 1: Harmonic Shannon entropy ====

$$\displaystyle H_1(J|c) = -\sum_{j \in J} P(j|c) \log P(j|c)$$

This is Paul's original ~~Harmonic Entropy~~. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.

This is Paul's original harmonic entropy. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.

[[File:HE_Tenney_N_10000_s_17cents.png]]

''Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 ~~cents~~), and <math>\sqrt{nd}</math> weighting.''

''Harmonic Shannon Entropy ({{nowrap|a {{=}} 1}}) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with {{nowrap|s {{=}} 1%}} (~17{{c}}), and <math>\sqrt{nd}</math> weighting.''

==== a=2: Harmonic ~~Collision Entropy~~ ====

==== ''a'' {{=}} 2: Harmonic collision entropy ====

$$\displaystyle H_2(J|c) = -\log \sum_{j \in J} P(j|c)^2 = -\log (J_1 = J_2|c)$$

$$\displaystyle H_2(J|c) = -\log \sum_{j \in J} P(j|c)^2 = -\log P\left(J_1 = J_2\,\vert\,c\right)$$

where <~~math~~>~~J_1~~</~~math~~> and <~~math~~>~~J_2~~</~~math~~> are two independent and identically distributed random variables of JI basis ratios, conditioned on the same incoming dyad ~~<math>~~c~~</math>~~, and the collision entropy is the same as the negative log of the probability that the two JI variables produce the same outcome.

where ''J''<sub>1</sub> and ''J''<sub>2</sub> are two independent and identically distributed random variables of JI basis ratios, conditioned on the same incoming dyad ''c'', and the collision entropy is the same as the negative log of the probability that the two JI variables produce the same outcome.

[[File:HE_Tenney_N_10000_s_17cents_a=2.png]]

''Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 ~~cents~~), and <math>\sqrt{nd}</math> weighting.''

''Harmonic Collision Entropy ({{nowrap|a {{=}} 2}}) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with {{nowrap|''s'' {{=}} 1%}} (~17{{c}}), and <math>\sqrt{nd}</math> weighting.''

==== a=∞: Harmonic ~~Min~~-~~Entropy~~ ====

==== ''a'' {{=}} ∞: Harmonic min-entropy ====

$$\displaystyle H_\infty(J|c) = -\log \max_{j \in J} P(j|c)$$

This ~~is the~~ min-entropy~~, which~~ simply takes the negative log of the largest probability in the distribution. This can be thought of as representing the "strength" of the incoming dyad from being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the ~~<math>~~a=~~\infty</math>~~ case is guaranteed to be a lower bound among all Rényi entropies.

This min-entropy simply takes the negative log of the largest probability in the distribution. This can be thought of as representing how strongly the incoming dyad resists being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the {{nowrap|''a'' {{=}} ∞}} case is guaranteed to be a lower bound among all Rényi entropies.

[[File:HE_Tenney_N_10000_s_17cents_a=7.png]]

''Harmonic Rényi Entropy with a=7, with the high value of a being chosen to approximate min-entropy (a=''∞''). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 ~~cents~~), and the weighting function <math>\sqrt{nd}</math>.''

''Harmonic Rényi Entropy with {{nowrap|a {{=}} 7}}, with the high value of a being chosen to approximate min-entropy ({{nowrap|a {{=}} ''∞''}}). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with {{nowrap|''s'' {{=}} 1%}} (~17{{c}}), and the weighting function <math>\sqrt{nd}</math>.''
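The special cases above can be collected into one small Python function. This is a hedged sketch rather than a reference implementation: the function name and the toy distribution <code>p</code> are invented for illustration, and <code>p</code> stands in for the probabilities ''P''(''j''|''c'') of some dyad.

```python
import math

def renyi_entropy(p, a):
    """Rényi entropy (in nats) of a discrete distribution p, for order a >= 0."""
    if a == 0:
        return math.log(len(p))                 # Hartley: log |J|
    support = [x for x in p if x > 0.0]         # drop terms that underflowed to zero
    if a == 1:
        return -sum(x * math.log(x) for x in support)   # Shannon
    if a == math.inf:
        return -math.log(max(support))          # min-entropy
    return math.log(sum(x ** a for x in support)) / (1.0 - a)

# Toy distribution standing in for P(j|c); not derived from any real dyad.
p = [0.5, 0.25, 0.125, 0.125]
h0, h1, h2, h7, h_inf = (renyi_entropy(p, a) for a in (0, 1, 2, 7, math.inf))
print(h0, h1, h2, h7, h_inf)
```

For this toy distribution the values decrease as ''a'' grows, with the Hartley entropy as the upper bound and the min-entropy as the lower bound, and {{nowrap|''a'' {{=}} 7}} already lies between the collision entropy and the min-entropy.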

=== Convolution-~~Based Expression For Quickly Computing~~ Rényi ~~Entropy~~ ===

=== Convolution-based expression for quickly computing Rényi entropy ===

Below is given an derivation that expresses ~~Harmonic~~ Rényi ~~Entropy~~ in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the ~~Fast~~ Fourier ~~Transform~~.

Below is a derivation that expresses harmonic Rényi entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the fast Fourier transform (FFT).

The below derivation depends on the use of simple weighted probabilities, although it may be possible to extend to domain-integral probabilities instead.

==== Preliminaries ====

~~The~~ Harmonic Rényi ~~Entropy~~ is defined as

Harmonic Rényi entropy is defined as

$$\displaystyle \text{HE}_a(c) = H_a(J|c) = \frac{1}{1-a} \log \sum_{j \in J} P(j|c)^a$$

Line 253:

Line 245:

Since ~~<math>\psi~~(c)~~</math>~~ is the same for each basis ratio, we can pull it out of the summation to obtain:

Since ψ(''c'') is the same for each basis ratio, we can pull it out of the summation to obtain:

$$\displaystyle H_a(J|c) = \frac{1}{1-a} \log \left( \frac{\sum_{j \in J} Q(j|c)^a}{\psi(c)^a} \right)$$

Line 268:

Line 260:

We thus reduce the term inside the logarithm to the quotient of the functions <~~math~~>~~\rho_a~~(c)~~</math>~~ and ~~<math>\psi~~(c)~~</math>~~. Our aim is now to express each of these two functions in terms of a convolution product.

We thus reduce the term inside the logarithm to the quotient of the functions ρ<sub>''a''</sub>(''c'') and ψ(''c''). Our aim is now to express each of these two functions in terms of a convolution product.

==== Convolution product for ~~<math>\psi~~(c)~~</math>~~ ====

==== Convolution product for ψ(''c'') ====

~~<math>\displaystyle \psi~~(c)~~</math>~~, the normalization function, is written as follows:

ψ(''c''), the normalization function, is written as follows:

$$\displaystyle \psi(c) = \sum_{j \in J} Q(j|c)$$

Again, ~~<math>~~Q(j|c)~~</math>~~ is defined as follows:

Again, Q(''j''|''c'') is defined as follows:

$$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\|j\|}$$

Line 291:

Line 283:

We note that the left factor in the convolution product is always the same ~~<math>~~S(-c)~~</math>~~, which is not dependent on ~~<math>~~j~~</math>~~ in any way. Since convolution distributes over addition, we can factor the ~~<math>~~S~~</math>~~ out of the summation to obtain

We note that the left factor in the convolution product is always the same ''S''(−''c''), which is not dependent on ''j'' in any way. Since convolution distributes over addition, we can factor the ''S'' out of the summation to obtain

$$\displaystyle \psi(c) = \left[S \ast \left(\sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}\right)\right](-c)$$

We can clean up this notation by defining the auxiliary distribution K:

We can clean up this notation by defining the auxiliary distribution ''K'':

$$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}$$

Line 305:

Line 297:

$$\displaystyle \psi(c) = \left[S \ast K\right](-c)$$

==== Convolution product for <~~math~~>~~\rho_a~~(c)~~</math>~~ ====

==== Convolution product for ρ<sub>''a''</sub>(''c'') ====

The derivation for <~~math~~>~~\rho_a~~(c)~~</math>~~ proceeds similarly. Recall the function is written as follows:

The derivation for ρ<sub>''a''</sub>(''c'') proceeds similarly. Recall the function is written as follows:

$$\displaystyle \rho_a(c) = \sum_{j \in J} Q(j|c)^a$$

The expression for each ~~<math>~~Q(j|c)^a</~~math~~> is:

The expression for each ''Q''(''j''|''c'')<sup>''a''</sup> is:

$$\displaystyle Q(j|c)^a = \frac{S(\cent(j)-c)^a}{\|j\|^a}$$

We can again express this as a convolution, this time of the function <~~math~~>S^a(-c)</~~math~~>, meaning the spreading function S taken to the a'th power, and a delta distribution:

We can again express this as a convolution, this time of the function ''S''<sup>''a''</sup>(−''c''), meaning the spreading function ''S'' taken to the ''a''th power, and a delta distribution:

$$\displaystyle Q(j|c)^a = \left(S^a \ast \frac{\delta_{-\cent(j)}}{\|j\|^a}\right)(-c)$$

Line 334:

Line 326:

$$\displaystyle \rho_a(c) = \left[S^a \ast K^a\right](-c)$$

We have now succeeded in representing <~~math~~>~~\rho_a~~(c)~~</math>~~ as a convolution.

We have now succeeded in representing ρ<sub>''a''</sub>(''c'') as a convolution.

Note that the function <~~math~~>K^a(c)~~</math>~~ involves a slight abuse of notation, as it is not literally ~~<math>~~K(c)~~</math>~~ taken to the ~~<math>~~a~~</math>~~'th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the ~~<math>~~a~~</math>~~'th power.

Note that the function ''K''<sup>''a''</sup>(''c'') involves a slight abuse of notation, as it is not literally ''K''(''c'') taken to the ''a''th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the ''a''th power.

==== Round-up ====

Line 348:

Line 340:

$$\displaystyle \left[S \ast K\right]^a(-c)$$

represents the convolution of ~~<math>~~S~~</math>~~ and ~~<math>~~K~~</math>~~, taken to the ~~<math>~~a~~</math>~~'th power, and flipped backwards. Note that if ~~<math>~~S(x)~~</math>~~ is a symmetrical (even) spreading function, and if for each ratio ~~<math>~~n/d~~</math>~~ in ~~<math>~~J~~</math>~~, if the inverse ~~<math>~~d/n~~</math>~~ is also in ~~<math>~~J~~</math>~~, then the above convolution will also be symmetrical, and we also have

represents the convolution of ''S'' and ''K'', taken to the ''a''th power, and flipped backwards. Note that if ''S''(''x'') is a symmetrical (even) spreading function, and if for each ratio ''n''/''d'' in ''J'' the inverse ''d''/''n'' is also in ''J'', then the above convolution will also be symmetrical, and we also have

$$\displaystyle \left[S \ast K\right]^a(-c) = \left[S \ast K\right]^a(c)$$

We have succeeded in representing ~~Harmonic~~ Rényi ~~Entropy~~ in simple terms of two convolution products, each of which can be computed in ~~<math>~~O(N log N)~~</math>~~ time.

We have succeeded in representing harmonic Rényi entropy in simple terms of two convolution products, each of which can be computed in {{nowrap|''O''(''N'' log ''N'')}} time.
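As a rough numerical check of this result, the sketch below evaluates the two convolution products on a 1-cent grid and compares the outcome with the direct definition. Several things here are assumptions made for illustration: the basis set ({{nowrap|''nd'' < 100}}, reduced ratios), the snapping of each ratio to the nearest whole cent, the truncation of the Gaussian at 200 cents, and all function names. The convolution sums are evaluated directly rather than with an FFT, which keeps the example dependency-free; an FFT-based routine would be a drop-in replacement for the inner loop. Because ''S'' is even, the sign flip in the argument does not affect the values.

```python
import math

S_DEV = 17.0  # standard deviation of the Gaussian spreading function, in cents

def spread(x):
    return math.exp(-0.5 * (x / S_DEV) ** 2)

def basis(limit):
    """Hypothetical basis set: reduced ratios n/d with n*d < limit."""
    return [(n, d) for n in range(1, limit) for d in range(1, limit)
            if n * d < limit and math.gcd(n, d) == 1]

def snapped_cents(n, d):
    """Cents of n/d rounded to the 1-cent grid (an approximation for this sketch)."""
    return round(1200.0 * math.log2(n / d))

def kernel(ratios, a, lo, hi):
    """K^a on the grid lo..hi: only the delta weights are raised to the a-th power."""
    k = [0.0] * (hi - lo + 1)
    for n, d in ratios:
        pos = snapped_cents(n, d)
        if lo <= pos <= hi:
            k[pos - lo] += 1.0 / math.sqrt(n * d) ** a
    return k

def convolve_at(c, k, lo, a, radius):
    """Value at c of S^a convolved with the gridded kernel, truncated to +/- radius."""
    first = max(lo, c - radius)
    last = min(lo + len(k) - 1, c + radius)
    return sum(spread(u - c) ** a * k[u - lo] for u in range(first, last + 1))

def renyi_via_convolution(c, ratios, a, lo=-200, hi=1400, radius=200):
    psi = convolve_at(c, kernel(ratios, 1, lo, hi), lo, 1, radius)
    rho = convolve_at(c, kernel(ratios, a, lo, hi), lo, a, radius)
    return math.log(rho / psi ** a) / (1.0 - a)

def renyi_direct(c, ratios, a):
    q = [spread(snapped_cents(n, d) - c) / math.sqrt(n * d) for n, d in ratios]
    total = sum(q)
    return math.log(sum((x / total) ** a for x in q)) / (1.0 - a)

ratios = basis(100)
for c in (0, 386, 650, 702, 1200):
    print(c, renyi_via_convolution(c, ratios, 2), renyi_direct(c, ratios, 2))
```

The two columns should agree to within floating-point error, since the only difference is the truncated Gaussian tail, which is negligible at 200 cents.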

== Extending HE to ~~<math>~~N=~~\infty</math>~~: zeta-HE ==

== Extending HE to ''N'' {{=}} ∞: zeta-HE ==

All of the models described above involve a finite set of rational numbers, bounded by some weighting function, and where the weighting is less than some max value ~~<math>~~N~~</math>~~.

All of the models described above involve a finite set of rational numbers, bounded by some weighting function, and where the weighting is less than some max value ''N''.

It so happens that we are more or less able to analytically continue this definition to the situation where <math>N=\infty</math>. More precisely, we are able to analytically continue the exponential of HE, which yields the same relative interval rankings as standard HE.

The only technical caveat is that we use the HE of the "unnormalized" probability distribution. However, in the large limit of ~~<math>~~N~~</math>~~, this appears to agree closely with the usual HE. We go into more detail below about this.

The only technical caveat is that we use the HE of the "unnormalized" probability distribution. However, in the large limit of ''N'', this appears to agree closely with the usual HE. We go into more detail below about this.

Our basic approach is: rather than weighting intervals by <math>(nd)^{0.5}</math>, we choose a different exponent, such as <math>(nd)^2</math>. For an exponent which is large enough (we will show that it must be greater than 1), HE does indeed converge as <math>N \to \infty</math>, and we show that this yields an expression related to the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]]. We can then use the analytic continuation of the zeta function to obtain an analytically continued curve for the <math>(nd)^{0.5}</math> weighting, which we then show empirically does indeed appear to be what HE converges on for large values of ~~<math>~~N~~</math>~~.

Our basic approach is: rather than weighting intervals by <math>(nd)^{0.5}</math>, we choose a different exponent, such as <math>(nd)^2</math>. For an exponent which is large enough (we will show that it must be greater than 1), HE does indeed converge as <math>N \to \infty</math>, and we show that this yields an expression related to the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]]. We can then use the analytic continuation of the zeta function to obtain an analytically continued curve for the <math>(nd)^{0.5}</math> weighting, which we then show empirically does indeed appear to be what HE converges on for large values of ''N''.

In short, what we will show is that the Fourier Transform of this unnormalized Shannon Harmonic Entropy is given by

Line 367:

Line 359:

$$|\zeta(0.5+it)|^2 \cdot \overline {\phi(t)}$$

where <math>\phi(t)</math> is the characteristic function of the spreading distribution and <math>\overline {\phi(t)}</math> is complex conjugation. Below we also give an expression for the Renyi entropy for arbitrary choice of the parameter ~~<math>~~a~~</math>~~.

where <math>\phi(t)</math> is the characteristic function of the spreading distribution and <math>\overline {\phi(t)}</math> is its complex conjugate. Below we also give an expression for the Rényi entropy for arbitrary choice of the parameter ''a''.

This enables us to speak cognizantly of the harmonic entropy of an interval as measured against ''all'' rational numbers.

Line 374:

Line 366:

Our derivation only analytically continues the entropy function for the "unnormalized" set of probabilities, which we previously wrote as <math>Q(j|c)</math>. For this definition to be philosophically perfect, we would want to analytically continue the entropy function for the normalized sense of probabilities, previously written as <math>P(j|c)</math>.

However, in practice, the "unnormalized entropy" appears to be an extremely good approximation to the normalized entropy for large values of ~~<math>~~N~~</math>~~. The resulting curve has approximately the same minima and maxima as HE, the same general shape, and for all intents and purposes looks exactly like HE, just shifted on the y-axis.

However, in practice, the "unnormalized entropy" appears to be an extremely good approximation to the normalized entropy for large values of ''N''. The resulting curve has approximately the same minima and maxima as HE, the same general shape, and for all intents and purposes looks exactly like HE, just shifted on the ''y''-axis.

Here are some examples for different values of ~~<math>~~s~~</math>~~. All of these are Shannon HE (<math>a=1</math>), using <math>\sqrt{nd}</math> weights, with unreduced rationals (more on this below), with the bound that <math>nd < 1000000</math>, just with different values of ~~<math>~~s~~</math>~~. All have been scaled so that the minimum entropy is 0, and the maximum entropy is 1:

Here are some examples for different values of ''s''. All of these are Shannon HE (<math>a=1</math>), using <math>\sqrt{nd}</math> weights, with unreduced rationals (more on this below), with the bound that <math>nd < 1000000</math>, just with different values of ''s''. All have been scaled so that the minimum entropy is 0, and the maximum entropy is 1:

[[File:HE vs UHE s=0.5%.png|800px]]

Line 384:

Line 376:

[[File:HE vs UHE s=1.5%.png|800px]]

As you can see, the unnormalized version is extremely close to a linear function of the normalized one. A similar situation holds for larger values of ~~<math>~~a~~</math>~~. The Pearson correlation coefficient of "rho" is also given, and is typically very close to 1 - for example, for <math>s=1%</math>, it's equal to 0.99922. The correlation also seems to get better with increasing values of ~~<math>~~N~~</math>~~, such that the correlation for N=1,000,000 (shown above) is much better than the one for N=10,000 (not pictured).

As you can see, the unnormalized version is extremely close to a linear function of the normalized one. A similar situation holds for larger values of ''a''. The Pearson correlation coefficient ("rho") is also given, and is typically very close to 1; for example, for <math>s=1\%</math>, it is equal to 0.99922. The correlation also seems to improve with increasing values of ''N'', such that the correlation for ''N'' = 1,000,000 (shown above) is much better than the one for ''N'' = 10,000 (not pictured).

In the above examples, note that there are slightly adjusted values of ~~<math>~~s~~</math>~~ (usually by less than a cent) between the normalized and unnormalized comparisons for each plot. For example, in the plot for <math>s=1%</math>, corresponding to 17.2264 cents, we compare to a slightly adjusted UHE of 16.4764 cents. This is because, empirically, sometimes a very slight adjustment corresponds to a better correlation coefficient, suggesting that the UHE may be equivalent to the HE with a miniscule adjustment in the value of ~~<math>~~s~~</math>~~.

In the above examples, note that there are slightly adjusted values of ''s'' (usually by less than a cent) between the normalized and unnormalized comparisons for each plot. For example, in the plot for <math>s=1\%</math>, corresponding to 17.2264 cents, we compare to a slightly adjusted UHE of 16.4764 cents. This is because, empirically, sometimes a very slight adjustment corresponds to a better correlation coefficient, suggesting that the UHE may be equivalent to the HE with a minuscule adjustment in the value of ''s''.
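The figure of 17.2264 cents quoted for a 1% deviation follows from the usual conversion between frequency ratios and cents. A minimal sketch (the helper name is made up for this example):

```python
import math

def percent_to_cents(percent):
    """Convert a relative frequency deviation in percent to cents."""
    return 1200.0 * math.log2(1.0 + percent / 100.0)

s_cents = percent_to_cents(1.0)
print(round(s_cents, 4))  # 17.2264
```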

It would be nice to show the exact relationship of unnormalized entropy to the normalized entropy in the limit of large ~~<math>~~N~~</math>~~, and whether the two converge to be exactly equal (perhaps given some miniscule adjustment in ~~<math>~~s~~</math>~~ or ~~<math>~~a~~</math>~~). However, we will leave this for future research, as well as the question of how to do an exact derivation of normalized HE.

It would be nice to show the exact relationship of unnormalized entropy to the normalized entropy in the limit of large ''N'', and whether the two converge to be exactly equal (perhaps given some minuscule adjustment in ''s'' or ''a''). However, we will leave this for future research, as well as the question of how to do an exact derivation of normalized HE.

For now, we will start with a derivation of the unnormalized entropy for <math>N=\infty</math>, as an interesting function worthy of study in its own right - not only because it looks exactly like HE, but because it leads to an expression for unnormalized HE in terms of the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]].

Line 423:

Line 415:

$$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left( S^a \ast K^a \right)(-c)$$

where, as before, <math>S^a</math> is our spreading function, taken to the ~~<math>~~a~~</math>~~'th power, and <math>K^a</math> is our convolution kernel, with the weights on the delta functions taken to the ~~<math>~~a~~</math>~~'th power as described previously.

where, as before, <math>S^a</math> is our spreading function, taken to the ''a''th power, and <math>K^a</math> is our convolution kernel, with the weights on the delta functions taken to the ''a''th power as described previously.

Note that if ~~<math>~~S~~</math>~~ is symmetric, as in the case of the Gaussian or Laplace distributions, then the inverted argument of <math>(-c)</math> on the end is redundant, and can be replaced by <math>(c)</math>.

Note that if ''S'' is symmetric, as in the case of the Gaussian or Laplace distributions, then the inverted argument of <math>(-c)</math> on the end is redundant, and can be replaced by <math>(c)</math>.

Line 436:

Line 428:

==== Analytic Continuation of the Convolution Kernel ====

The definition for ~~<math>~~K~~</math>~~ is:

The definition for ''K'' is:

$$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}$$

where <math>\|j\|</math> represents the weighting of the JI basis ratio ~~<math>~~j~~</math>~~. In the particular case of Tenney weighting, we get:

where <math>\|j\|</math> represents the weighting of the JI basis ratio ''j''. In the particular case of Tenney weighting, we get:

$$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{(j_n \cdot j_d)^{0.5}}$$

where ~~<math>~~j_n~~</math>~~ and ~~<math>~~j_d~~</math>~~ are the numerator and denominator of ~~<math>~~j~~</math>~~, respectively.

where ''j''<sub>''n''</sub> and ''j''<sub>''d''</sub> are the numerator and denominator of ''j'', respectively.

Line 470:

Line 462:

Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height weighting, we typically bound by <math>\sqrt{nd} < N</math> for some ~~<math>~~N~~</math>~~. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for our weighting. This will not end up being much of a problem, as the two will converge on the same result anyway.

Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height weighting, we typically bound by <math>\sqrt{nd} < N</math> for some ''N''. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for our weighting. This will not end up being much of a problem, as the two will converge on the same result anyway.

Bounding by <math>\max(n,d) < N</math> is the same as specifying that <math>j_n < N</math> and <math>j_d < N</math>. Doing so, we get

Line 513:

Line 505:

$$\displaystyle K^a(n) = \mathcal{F}^{-1}\left\{|\zeta(0.5a + i t)|^2\right\}(n)$$

so that the choice of ~~<math>~~a~~</math>~~ simply changes our choice of vertical slice of the Riemann zeta function, as well as the shape of our spreading function (because it is also being raised to a power). If our spreading function is a Gaussian, then we simply get another Gaussian with a different standard deviation.

so that the choice of ''a'' simply changes our choice of vertical slice of the Riemann zeta function, as well as the shape of our spreading function (because it is also being raised to a power). If our spreading function is a Gaussian, then we simply get another Gaussian with a different standard deviation.

==== Analytic Continuation of Unnormalized Harmonic Rényi Entropy ====

We can put this back into our equation for the Unnormalized Harmonic Rényi Entropy. To do so, we will continue with our change of units from cents to nepers, corresponding to a change of our variable from ~~<math>~~c~~</math>~~ to ~~<math>~~n~~</math>~~. We will likewise assume the spreading probability distribution ~~<math>~~S~~</math>~~ has been scaled to reflect the new choice of units.

We can put this back into our equation for the Unnormalized Harmonic Rényi Entropy. To do so, we will continue with our change of units from cents to nepers, corresponding to a change of our variable from ''c'' to ''n''. We will likewise assume the spreading probability distribution ''S'' has been scaled to reflect the new choice of units.

Line 539:

Line 531:

We can simplify the expression of the above if we likewise take the Fourier transform of ~~<math>~~S~~</math>~~. If we do, we obtain the [https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory) characteristic function] of the distribution, which is typically denoted by <math>\phi(t)</math>. We will use the following definitions:

We can simplify the expression of the above if we likewise take the Fourier transform of ''S''. If we do, we obtain the [https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory) characteristic function] of the distribution, which is typically denoted by <math>\phi(t)</math>. We will use the following definitions:

$$\displaystyle \phi(t) = \mathcal{F}\left\{S(n)\right\}(t)$$

Line 576:

Line 568:

[[File:ExpUHE vs zeta s=1.5%.png|800px]]

Note that in all these plots, the value of ~~<math>~~a~~</math>~~ is chosen to be <math>1.00001</math> rather than exactly ~~<math>~~1~~</math>~~, so as to avoid that <math>(1-a)</math> term becoming 0. Similar results are seen for other choices of ~~<math>~~a~~</math>~~:

Note that in all these plots, the value of ''a'' is chosen to be <math>1.00001</math> rather than exactly 1, so as to avoid the <math>(1-a)</math> term becoming 0. Similar results are seen for other choices of ''a'':

==== s=1%, a=2.2 ====

Line 587:

Line 579:

''Note: this section is for future research; some of it needs to be put on more rigorous footing, but we've left it as it's certainly interesting.''

Let's go back to our original convolution expression for finite-~~<math>~~N~~</math>~~ UHE:

Let's go back to our original convolution expression for finite-''N'' UHE:

$$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left(\left( S^a \ast K^a \right)(-c)\right)$$

Line 603:

Line 595:

$$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left(U(0) + \tilde{U}(c) \right)$$

Lastly, suppose we only care about the entropy function up to a vertical shift and scaling: in other words, we want to declare two functions <math>f(x), g(x)</math> to be '''linearly equivalent''', and write <math>f(x) \approx g(x)</math>, if for some <math>a, b</math> that don't depend on ~~<math>~~x~~</math>~~, we have <math>f(x) = a\cdot g(x) + b</math>. This means we want to view two entropy functions as equivalent if one is just a scaled and shifted version of the other, so that when "normalizing" them (so that the entropy goes from 0 to 1), we get identical functions. Then we have all of the following relationships:

Lastly, suppose we only care about the entropy function up to a vertical shift and scaling: in other words, we want to declare two functions <math>f(x), g(x)</math> to be '''linearly equivalent''', and write <math>f(x) \approx g(x)</math>, if for some <math>a, b</math> that don't depend on ''x'', we have <math>f(x) = a\cdot g(x) + b</math>. This means we want to view two entropy functions as equivalent if one is just a scaled and shifted version of the other, so that when "normalizing" them (so that the entropy goes from 0 to 1), we get identical functions. Then we have all of the following relationships:

$$\displaystyle \text{UHE}_a(c) \approx \log U(c) \approx \log \left( U(c)^{\frac{1}{1-a}} \right)$$

$$U(c) \approx \tilde{U}(c)$$

where we have just dropped the constants of <math>\frac{1}{1-a}</math> and the constant vertical shift of <math>U(0)</math> which doesn't depend on ~~<math>~~c~~</math>~~.

where we have just dropped the constants of <math>\frac{1}{1-a}</math> and the constant vertical shift of <math>U(0)</math> which doesn't depend on ''c''.
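The notion of linear equivalence can be checked numerically. The following is a minimal sketch with made-up sample values; <code>normalize</code>, <code>scale</code>, and <code>shift</code> are illustrative names, and a positive scale factor is assumed.

```python
def normalize(values):
    """Rescale a list of samples so that it runs from 0 to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# g is an arbitrary sampled function; f is a scaled and shifted copy of it.
g = [0.3, 1.7, 0.9, 2.4]
scale, shift = 5.0, 11.0          # assumed positive scale
f = [scale * v + shift for v in g]

same = all(abs(x - y) < 1e-12 for x, y in zip(normalize(f), normalize(g)))
```

With a negative scale factor, as happens for the <math>\frac{1}{1-a}</math> term when <math>a > 1</math>, the normalized curve comes out flipped upside down rather than identical.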

Now, the main thing is that, if we are in the region where <math>a ≤ 2</math>, then this is also the region where the <math>U(0)</math> term goes to infinity as ~~<math>~~N~~</math>~~ increases: the entropy doesn't converge. And in general, we have the asymptotic expansion

Now, the main point is that the region where <math>a \leq 2</math> is also the region where the <math>U(0)</math> term goes to infinity as ''N'' increases: the entropy doesn't converge. And in general, we have the asymptotic expansion

$$

Line 617:

Line 609:

$$

and, for large ~~<math>~~k~~</math>~~, '''as long as''' <math>x \ll k</math>, the higher-order terms become negligible. This means, for all ~~<math>~~c~~</math>~~, we would need to show that <math>\tilde{U}(c) \ll U(0)</math> as <math>N \to \infty</math>. We would then be able to rewrite the above as

and, for large ''k'', '''as long as''' <math>x \ll k</math>, the higher-order terms become negligible. This means, for all ''c'', we would need to show that <math>\tilde{U}(c) \ll U(0)</math> as <math>N \to \infty</math>. We would then be able to rewrite the above as

$$\displaystyle \text{UHE}_a(c) \sim \frac{1}{1-a} \left(\log (U(0)) + \frac{\tilde{U}(c)}{U(0)} \right)$$
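The first-order step relied on here is <math>\log(k + x) \approx \log k + \frac{x}{k}</math> for <math>x \ll k</math>. A small numerical sketch (the values of <code>k</code> and <code>x</code> are arbitrary, chosen only for illustration) shows how good the approximation is in that regime and how it breaks down otherwise:

```python
import math

def log_first_order(k, x):
    """First-order approximation of log(k + x), valid when x is much smaller than k."""
    return math.log(k) + x / k

# Regime x << k: the neglected terms are of order (x/k)^2.
k, x = 1.0e6, 3.0
error_small = abs(math.log(k + x) - log_first_order(k, x))

# Regime x comparable to k: the approximation is poor.
error_large = abs(math.log(1.0 + 1.0) - log_first_order(1.0, 1.0))
```

This is why the condition <math>\tilde{U}(c) \ll U(0)</math> matters: without it, the higher-order terms cannot be dropped.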

Line 657:

Line 649:

$$

Lastly, we note that for any particular choice of ~~<math>~~a~~</math>~~ and ~~<math>~~N~~</math>~~, the above is simply linearly equivalent to

Lastly, we note that for any particular choice of ''a'' and ''N'', the above is simply linearly equivalent to

$$

Line 680:

Line 672:

Now, the only missing piece needed for all of this is to show that we really do have <math>\tilde{U}(c) \ll U(0)</math> in the region of interest. For now, absent mathematical proof, we will simply plot the behavior for the Shannon entropy as <math>N \to \infty</math>.

What we see is that, while the function diverges, it diverges in a certain "uniform" sense. That is, as ~~<math>~~N~~</math>~~ increases, a constant vertical offset is added to <math>U(c)</math>, so that the function blows up to infinity. However, if this vertical offset is corrected for, for example by subtracting U(0), the resulting curve doesn't seem to grow at all, but rather shrinks in height slightly until it seems to converge. We would like to prove this formally, but for now, we can at least see this from the following plot:

What we see is that, while the function diverges, it diverges in a certain "uniform" sense. That is, as ''N'' increases, a constant vertical offset is added to <math>U(c)</math>, so that the function blows up to infinity. However, if this vertical offset is corrected for, for example by subtracting U(0), the resulting curve doesn't seem to grow at all, but rather shrinks in height slightly until it seems to converge. We would like to prove this formally, but for now, we can at least see this from the following plot:

[[File:ExpUHE-asymptotic-growth.png|800px]]

In other words, we can see that as ~~<math>~~N~~</math>~~ increases, the growth rate of <math>U(0)</math> dwarfs that of <math>\tilde{U}(c)</math>, which does not seem to grow at all.

In other words, we can see that as ''N'' increases, the growth rate of <math>U(0)</math> dwarfs that of <math>\tilde{U}(c)</math>, which does not seem to grow at all.

So, this is a fairly weak conjecture to make, given that empirical evidence suggests something much stronger: not only does <math>\tilde{U}(c)</math> grow more slowly than <math>U(0)</math>, it seems not to grow at all, but to converge. In particular, it seems to converge on our analytic continuation from before. However, a strict proof of any of these things would be nice.

Line 710:

Line 702:

[[File:HE_normalization_terms.png|800px]]

This picture shows how the denominator changes as ~~<math>~~N~~</math>~~ increases: you can see that in general, the function is shifted upward, increasing without bound. The thin plots reflect this for N=1000, 5000, 10000, 50000, and 100000, where you can see them increasing.

This picture shows how the denominator changes as ''N'' increases: you can see that in general, the function is shifted upward, increasing without bound. The thin plots reflect this for N=1000, 5000, 10000, 50000, and 100000, where you can see them increasing.

You will note that the denominator also looks exactly like unnormalized HE, just upside down. Normalized HE is the quotient of two functions that both look like this, which are slightly different. This quotient produces the usual HE curve, which is flipped upside down relative to the denominator, and which also increases without bound. That all these functions increase without bound is just another way to state that these things generally don't converge as <math>N \to \infty</math>.

However, look at what happens with our analytic continuation, which is given by the thicker blue line at the bottom. Despite our sequence of finite-~~<math>~~N~~</math>~~ denominator terms increasing on the y-axis, the analytically continued version suddenly "snaps" back to zero. Although the curve shape is roughly the same, the vertical offset is almost completely eliminated when the analytic continuation is done.

However, look at what happens with our analytic continuation, which is given by the thicker blue line at the bottom. Despite our sequence of finite-''N'' denominator terms increasing on the y-axis, the analytically continued version suddenly "snaps" back to zero. Although the curve shape is roughly the same, the vertical offset is almost completely eliminated when the analytic continuation is done.

The problem here is that the original HE function was the quotient of two very large, strictly positive functions - the numerator and denominator. However, performing the analytic continuation on each separately has caused both to "snap" back to zero, so that the denominator, while retaining the same shape, now has points where it touches the x-axis. As a result, the quotient of the two will have poles where the denominator is zero.

Line 724:

Line 716:

Those "spikes" are poles where the denominator is zero.

The problem is that we're really stretching the boundaries of complex analysis with this. With unnormalized HE, we were able to analytically continue the Fourier transform of exp-UHE to obtain a concrete expression in terms of the Riemann zeta function. While complex analysis makes no guarantees on the behavior of the Fourier transform of the analytic continuation of a holomorphic function, we did see the result seemed to converge on exp-UHE in the limit of large ~~<math>~~N~~</math>~~ when transforming back from the Fourier domain, confirming empirically that our analytically continued expression seemed to make sense.

The problem is that we're really stretching the boundaries of complex analysis with this. With unnormalized HE, we were able to analytically continue the Fourier transform of exp-UHE to obtain a concrete expression in terms of the Riemann zeta function. While complex analysis makes no guarantees on the behavior of the Fourier transform of the analytic continuation of a holomorphic function, we did see the result seemed to converge on exp-UHE in the limit of large ''N'' when transforming back from the Fourier domain, confirming empirically that our analytically continued expression seemed to make sense.

But in the case of "normalized HE," we analytically continued the Fourier transforms of the numerator and denominator, separately, transformed both out of the Fourier domain, and then took the quotient. Complex analysis ''really'' makes no guarantee on the behavior of the quotient of two Fourier transforms of the analytic continuations of holomorphic functions, and in this case the behavior is very strange. A different approach to analytically continuing the expression would be required.

Line 730:

Line 722:

This same principle explains why we plotted the exp of UHE, rather than UHE itself. Were we to take the log of finite-''N'' exp-UHE, we would be taking the log of a strictly positive function. However, the analytically continued exp-UHE snaps back to the x-axis, so that there are points where the function is zero or even negative. Taking the log of the analytically continued exp-UHE would yield a complex-valued function where it is negative, due to this snapping effect. However, looking at exp-UHE directly has no such problem.

Finally, it is noteworthy that for <math>a>2</math>, we end up looking at slices of the zeta function for which <math>\Re(z)>1</math>. This is where our original unnormalized HE function should converge as <math>N \to \infty</math>, corresponding to the region where the Riemann zeta function Dirichlet series converges. For these values of ~~<math>~~a~~</math>~~, the exp-UHE ''is'' positive. So, we can take the log again and look at the usual UHE. This can be useful for plotting, since exp-UHE tends to "flatten" out the curve for high values of ~~<math>~~a~~</math>~~, whereas taking the log accentuates the minima and maxima (and more closely resembles the usual HRE).

Finally, it is noteworthy that for <math>a>2</math>, we end up looking at slices of the zeta function for which <math>\Re(z)>1</math>. This is where our original unnormalized HE function should converge as <math>N \to \infty</math>, corresponding to the region where the Riemann zeta function Dirichlet series converges. For these values of ''a'', the exp-UHE ''is'' positive. So, we can take the log again and look at the usual UHE. This can be useful for plotting, since exp-UHE tends to "flatten" out the curve for high values of ''a'', whereas taking the log accentuates the minima and maxima (and more closely resembles the usual HRE).

=== Interpretation as a New Free Parameter: the Weighting Exponent ===

In our original derivation of the analytic continuation, we temporarily changed the weighting for rationals from <math>(nd)^{0.5}</math> to some other <math>(nd)^w</math>, with <math>w > 1</math>, for the sake of obtaining a series that converges. We then changed the exponent back to <math>0.5</math>.

This can be thought of as giving us another free parameter to HE, in addition to ~~<math>~~s~~</math>~~ and ~~<math>~~a~~</math>~~: the exponent for the weighting for each rational. That is, although Paul originally derived the <math>(nd)^{0.5}</math> exponent empirically by studying the behavior of mediant-to-mediant HE for Tenney-bounded rationals, there is no reason we can't simply that exponent to something else. As shown before, so long as that exponent is greater than 1, unnormalized HE will converge in the limit as <math>N -> \infty</math>, and will converge to the same thing whether we are bounding <math>nd < N</math>, <math>\max(n,d) < N</math>, or anything else (see again [https://math.stackexchange.com/questions/2593993/convergence-of-product-of-series-to-zeta-function here]). We can then analytically continue to the case where <math>w < 1</math>.

This can be thought of as giving us another free parameter to HE, in addition to ''s'' and ''a'': the exponent for the weighting for each rational. That is, although Paul originally derived the <math>(nd)^{0.5}</math> exponent empirically by studying the behavior of mediant-to-mediant HE for Tenney-bounded rationals, there is no reason we can't simply change that exponent to something else. As shown before, so long as that exponent is greater than 1, unnormalized HE will converge in the limit as <math>N \to \infty</math>, and will converge to the same thing whether we are bounding <math>nd < N</math>, <math>\max(n,d) < N</math>, or anything else (see again [https://math.stackexchange.com/questions/2593993/convergence-of-product-of-series-to-zeta-function here]). We can then analytically continue to the case where <math>w < 1</math>.

If we add this as a third parameter, called ~~<math>~~w~~</math>~~ we can modify our definition of exp-UHE as follows:

If we add this as a third parameter, called ''w'', we can modify our definition of exp-UHE as follows:

$$\displaystyle \exp((1-a) \text{UHE}_{a,w}(n)) = \mathcal{F}^{-1}\left\{\overline \phi_a \cdot |\zeta_{w a}|^2\right\}$$

Line 743:

Line 735:

So that our vertical slice of the zeta function is given by <math>\Re(z) = w \cdot a</math>.

=== Equivalence of the Weighting Exponent and ~~<math>~~a~~</math>~~ for Generalized Normal Distributions ===

=== Equivalence of the Weighting Exponent and ''a'' for Generalized Normal Distributions ===

We get a very interesting result if our spreading distribution is a [https://en.wikipedia.org/wiki/Generalized_normal_distribution generalized normal distribution], which is a family that encompasses both the Gaussian and the Laplace distributions (sometimes referred to as the "Vos curve" in Paul's work).

Line 751:

Line 743:

$$\displaystyle \exp((1-a) \text{UHE}_{a,w}(n)) = \mathcal{F}^{-1}\left\{\overline \phi_a \cdot |\zeta_{w a}|^2\right\}$$

We can see that, in a sense, the need for both ~~<math>~~a~~</math>~~ and ~~<math>~~w~~</math>~~ is almost redundant. Their product specifies the vertical slice of the zeta function. If you set <math>w=0.5</math> and <math>a=1</math>, corresponding to the Shannon entropy with <math>\sqrt{nd}</math> weighting, you get the same vertical slice as if you set <math>w=0.25</math> and <math>a=2</math>, corresponding to the collision entropy with <math>^4\sqrt{nd}</math> weighting: in both cases this is the critical line of the zeta function.

We can see that, in a sense, having both ''a'' and ''w'' is almost redundant. Their product specifies the vertical slice of the zeta function. If you set <math>w=0.5</math> and <math>a=1</math>, corresponding to the Shannon entropy with <math>\sqrt{nd}</math> weighting, you get the same vertical slice as if you set <math>w=0.25</math> and <math>a=2</math>, corresponding to the collision entropy with <math>\sqrt[4]{nd}</math> weighting: in both cases this is the critical line of the zeta function.

The only reason that these expressions are different is due to the <math>\phi_a</math> above. We had previously defined that as:

Line 757:

Line 749:

$$\displaystyle \phi_a(t) = \mathcal{F}\left\{S(n)^a\right\}(t)$$

or, the Fourier transform of the spreading distribution, raised to the power of ~~<math>~~a~~</math>~~. So if you hold the product <math>w a</math> as constant, but change the balance of ~~<math>~~w~~</math>~~ and ~~<math>~~a~~</math>~~, you will indeed get different results, simply because only the choice of ~~<math>~~a~~</math>~~ changes the <math>\phi_a</math>.

or, the Fourier transform of the spreading distribution, raised to the power of ''a''. So if you hold the product <math>w a</math> as constant, but change the balance of ''w'' and ''a'', you will indeed get different results, simply because only the choice of ''a'' changes the <math>\phi_a</math>.

However, we get a very neat result if we are using the generalized normal distribution. In that case, if we take the generalized normal distribution to a power ~~<math>~~a~~</math>~~, we get another instance of the same generalized normal distribution. The difference is, the variance will be divided by <math>a^{\frac{1}{\beta}}</math>, where <math>\beta</math> is the shape parameter for the distribution (a value of 1 is the Laplace distribution, a value of 2 is the Gaussian distribution, etc). The whole distribution will also no longer have an integral of 1, since we have also raised the scaling coefficient to a power, but this won't change anything, as it just corresponds to a uniform scaling of the end result.

However, we get a very neat result if we are using the generalized normal distribution. In that case, if we raise the generalized normal distribution to a power ''a'', we get another instance of the same generalized normal distribution. The difference is that the scale parameter (and with it the standard deviation) is divided by <math>a^{\frac{1}{\beta}}</math>, where <math>\beta</math> is the shape parameter for the distribution (a value of 1 is the Laplace distribution, a value of 2 is the Gaussian distribution, etc). The resulting function also no longer has an integral of 1, since we have also raised the scaling coefficient to a power, but this won't change anything, as it just corresponds to a uniform scaling of the end result.
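This rescaling property is easy to verify numerically. The sketch below assumes the usual parametrization <math>\exp(-(|x|/\alpha)^\beta)</math> with the normalizing constant dropped; the function names are illustrative only. Raising the curve to the power ''a'' gives the same values as dividing the scale parameter <math>\alpha</math> by <math>a^{\frac{1}{\beta}}</math>.

```python
import math

def gen_normal_shape(x, alpha, beta):
    """Unnormalized generalized normal curve exp(-(|x| / alpha) ** beta)."""
    return math.exp(-((abs(x) / alpha) ** beta))

def rescaled_alpha(alpha, a, beta):
    """Scale parameter of the curve obtained by raising the original to the power a."""
    return alpha / a ** (1.0 / beta)

def max_mismatch(alpha, a, beta, xs):
    """Largest difference between S(x)**a and the rescaled curve over the sample points."""
    new_alpha = rescaled_alpha(alpha, a, beta)
    return max(
        abs(gen_normal_shape(x, alpha, beta) ** a - gen_normal_shape(x, new_alpha, beta))
        for x in xs
    )

xs = [-30.0, -5.0, 0.0, 2.5, 17.0]
laplace_mismatch = max_mismatch(alpha=17.0, a=2.0, beta=1.0, xs=xs)   # beta = 1: Laplace
gaussian_mismatch = max_mismatch(alpha=17.0, a=2.0, beta=2.0, xs=xs)  # beta = 2: Gaussian
```

For the Gaussian case with <math>a = 2</math>, for instance, the scale shrinks by a factor of <math>\sqrt{2}</math>.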

In practice, what this means is that if you are using one of the above distributions, and you change ~~<math>~~a~~</math>~~, this is ''equivalent'' to changing the weighting exponent ~~<math>~~w~~</math>~~, and tweaking the standard deviation ~~<math>~~s~~</math>~~ according to the above equation.

In practice, what this means is that if you are using one of the above distributions, and you change ''a'', this is ''equivalent'' to changing the weighting exponent ''w'', and tweaking the standard deviation ''s'' according to the above equation.

This gives us a very nice interpretation of our ~~<math>~~a~~</math>~~ coefficient from HRE: it basically represents the weighting exponent on the rationals, with a corresponding adjustment to the standard deviation. The collision entropy <math>a=2</math> with the standard weighting <math>\sqrt{nd}</math> is totally equivalent to the Shannon entropy <math>a=1</math> with the weighting ~~<math>~~nd~~</math>~~ on the rationals, so long as the value of ~~<math>~~s~~</math>~~ is adjusted according to the equation above. However, it should be noted that this definition only holds for the "unnormalized HRE" given above.

This gives us a very nice interpretation of our ''a'' coefficient from HRE: it basically represents the weighting exponent on the rationals, with a corresponding adjustment to the standard deviation. The collision entropy <math>a=2</math> with the standard weighting <math>\sqrt{nd}</math> is totally equivalent to the Shannon entropy <math>a=1</math> with the weighting ''nd'' on the rationals, so long as the value of ''s'' is adjusted according to the equation above. However, it should be noted that this equivalence only holds for the "unnormalized HRE" given above.

=== Reduced Rationals Only ===

Line 771:

Line 763:

$$\displaystyle \mathcal{F}\left\{K(n)\right\}(t) = \sum_{j \in J} \frac{e^{i t \log (j_n/j_d)}}{(j_n \cdot j_d)^{w}}$$

Now, suppose we want to analytically continue this so that the set ~~<math>~~J~~</math>~~ is the set of all reduced rational numbers. We can first do so by starting again with unreduced rationals, but expressing each rational not as <math>\frac{n}{d}</math>, but rather as <math>\frac{n}{d} \cdot \frac{c}{c}</math>, where <math>n'</math> and <math>d'</math> are coprime, and ~~<math>~~c~~</math>~~ is the gcd of both. For example, we would express <math>\frac{6}{4}</math> as <math>\frac{3}{2} \cdot \frac{2}{2}</math>. Doing so, and assuming that we denote the set of unreduced rationals by <math>\mathbb{U}</math>, we get the following equivalent expression of the same convolution kernel above:

Now, suppose we want to analytically continue this so that the set ''J'' is the set of all reduced rational numbers. We can do so by starting again with unreduced rationals, but expressing each rational not as <math>\frac{n}{d}</math>, but rather as <math>\frac{n'}{d'} \cdot \frac{c}{c}</math>, where <math>n'</math> and <math>d'</math> are coprime, and ''c'' is the gcd of ''n'' and ''d''. For example, we would express <math>\frac{6}{4}</math> as <math>\frac{3}{2} \cdot \frac{2}{2}</math>. Doing so, and assuming that we denote the set of unreduced rationals by <math>\mathbb{U}</math>, we get the following equivalent expression of the same convolution kernel above:

$$\displaystyle \mathcal{F}\left\{K(n)\right\}(t) = \sum_{j \in \mathbb{U}} \frac{e^{i t \log (\frac{j_c j_{n'}}{j_c j_{d'}})}}{(j_c j_{n'} \cdot j_c j_{d'})^{w}} = |\zeta(w+i t)|^2$$
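The link between the double sum and <math>|\zeta(w+it)|^2</math> can be seen already at finite ''N''. In the sketch below (illustrative names, and assuming the <math>\max(n,d) < N</math> bound), the sum over unreduced rationals factors exactly into the squared magnitude of the partial Dirichlet sum <math>\sum_{n<N} n^{-(w+it)}</math>, which tends to <math>\zeta(w+it)</math> when <math>w > 1</math>.

```python
import cmath
import math

def kernel_transform(w, t, N):
    """Sum of exp(i t log(n/d)) / (n d)^w over all unreduced n/d with max(n, d) < N."""
    total = 0j
    for n in range(1, N):
        for d in range(1, N):
            total += cmath.exp(1j * t * math.log(n / d)) / (n * d) ** w
    return total

def partial_zeta(s, N):
    """Partial Dirichlet sum of the Riemann zeta function, using terms n < N."""
    return sum(n ** (-s) for n in range(1, N))

w, t, N = 2.0, 3.0, 40
lhs = kernel_transform(w, t, N)
rhs = abs(partial_zeta(complex(w, t), N)) ** 2
```

The imaginary part of <code>lhs</code> vanishes up to rounding, because each ratio ''n''/''d'' is paired with its inverse ''d''/''n''.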

Line 785:

Line 777:

$$\displaystyle |\zeta(w+i t)|^2 = \left[ \sum_{j_c \in \mathbb{N}^+} \frac{1}{{j_c}^{2w}} \right] \cdot \left[ \sum_{j \in \mathbb{Q}} \frac{e^{i t \log (\frac{j_{n'}}{j_{d'}})}}{(j_{n'} j_{d'})^{w}} \right]$$

where the left summation now has <math>j_c \in \mathbb{N}^+</math>, the set of strictly positive rational numbers, and the right summation now has <math>j \in \mathbb{Q}</math> the set of reduced rationals. Note again that the product above yields all unreduced rationals, thanks to the ~~<math>~~j_c~~</math>~~.

where the left summation now has <math>j_c \in \mathbb{N}^+</math>, the set of strictly positive natural numbers, and the right summation now has <math>j \in \mathbb{Q}</math>, the set of reduced rationals. Note again that the product above yields all unreduced rationals, thanks to the ''j''<sub>''c''</sub>.

Now, note that that left series is, itself, just another Dirichlet series that converges to the zeta function. We have

Line 797:

Line 789:

This function then becomes our new <math>\mathcal{F}\left\{K(n)\right\}</math>.

However, you will note that <math>\zeta(2w)</math> is a constant not depending at all on ~~<math>~~t~~</math>~~. As a result, the reduced rational kernel is exactly equal to the unreduced rational kernel, times a constant depending only on ~~<math>~~w~~</math>~~. This means that when we take the inverse Fourier transform and convolve, the result for exp-UHE will likewise be identical, scaled only by a constant.

However, you will note that <math>\zeta(2w)</math> is a constant not depending at all on ''t''. As a result, the reduced rational kernel is exactly equal to the unreduced rational kernel, times a constant depending only on ''w''. This means that when we take the inverse Fourier transform and convolve, the result for exp-UHE will likewise be identical, scaled only by a constant.

As a result, we have shown that we get the same exact results for reduced and unreduced rationals, differing only by a multiplicative scaling.
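A numerical sanity check of this scaling, in the convergent case <math>w = 2</math>, might look as follows. It is only a sketch under stated assumptions: the names are hypothetical, the sums are truncated at <math>\max(n,d) < N</math>, and the closed form <math>\zeta(4) = \pi^4/90</math> is used for <math>\zeta(2w)</math>. At finite ''N'' the two sides agree only approximately, because the truncation cuts the two sums off slightly differently.

```python
import cmath
import math

def kernel_sum(w, t, N, reduced):
    """Kernel transform over n/d with max(n, d) < N; optionally keep only coprime pairs."""
    total = 0j
    for n in range(1, N):
        for d in range(1, N):
            if reduced and math.gcd(n, d) != 1:
                continue
            total += cmath.exp(1j * t * math.log(n / d)) / (n * d) ** w
    return total.real  # the imaginary parts cancel between n/d and d/n

w, t, N = 2.0, 3.0, 300
zeta_2w = math.pi ** 4 / 90           # zeta(2w) for w = 2
unreduced = kernel_sum(w, t, N, reduced=False)
reduced = kernel_sum(w, t, N, reduced=True)
gap = abs(unreduced - zeta_2w * reduced)
```

The remaining <code>gap</code> comes entirely from ratios whose unreduced form exceeds the bound, and it shrinks as ''N'' grows.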

@@ Line 2: / Line 2: @@
 '''Harmonic entropy''' ('''HE''') is a simple model to quantify the extent to which musical chords align with the harmonic series, and thus tend to partly "fuse" into the perception of a single sound with a complex timbre and '''virtual fundamental''' pitch. It was invented by Paul Erlich and developed extensively on the Yahoo! tuning and harmonic_entropy lists, and draws from prior research by Parncutt and Terhardt. Various later contributions to the model have been made by Steve Martin, Mike Battaglia, Keenan Pepper, and others.
-Note: the terms dyad, triad and tetrad usually refer to chord with 2, 3 or 4 [[Pitch class|pitch classes]]. But in this discussion they refer to chords with 2, 3 or 4 <u>pitches</u>. Thus C-E-G-C is a tetrad not a triad.
+Note: the terms dyad, triad and tetrad usually refer to chord with 2, 3 or 4 [[Pitch class|pitch classes]]. But in this discussion they refer to chords with 2, 3, or 4 <u>pitches</u>. Thus {{dash|C, E, G, C}} is a tetrad instead of a triad.
 == Background ==
@@ Line 20: / Line 20: @@
 For dyads, the basic harmonic entropy model is fairly simple: it places the dyad we are trying to measure amidst a backdrop of JI candidates. Then, it uses a point-spread function to determine the relative strengths of the match to each, which are then normalized and treated as probabilities. The "entropy" of the resulting probability distribution is a way to measure how closely this distribution tends to focus on one possibility, rather than being spread out among a set of equally-likely possibilities. If there is only one clear choice of dyad which far exceeds all others in probability, the entropy will be lower. If, on the other hand, there are many equally-likely probabilities, the entropy will be higher. The basic harmonic entropy model can also be extended to modeling triads, tetrads, and so on; the standard way to do so is to simply look at the incoming triad's match to a set of candidate JI triads, and likewise with tetrads, and etc.
-=== Additional Interpretations ===
+=== Additional interpretations ===
 In recent years, it has become clearer that the model can also be very useful in modeling other types of concordance as well, particularly for dyads, where the same model does a very good job in also predicting beatlessness, periodicity buzz, and so on. In particular, Erlich has often suggested the same model, perhaps with slightly different parameters, can also be useful to measure how easy it is to tune a dyad by ear on an instrument such as a guitar, or how much of a sense of being "locked-in" the dyad gives as it is tuned more closely to JI. This may be less related to the perception of virtual fundamentals than it is to beatlessness and so on.
-However, it should be noted that the various aspects of psychoacoustic concordance tend to diverge quite strongly in their behavior for larger chords, and thus, when modeling different aspects of psychoacoustic concordance, different ways of generalizing the dyadic model to higher-cardinality chords may be appropriate. In particular, when modeling beatlessness, Erlich has suggested instead looking only at the entropies of the pairwise dyadic subsets of the chord, so that the major and minor chords would be ranked equal in beatlessness, whereas they would not be ranked equal in their ability to produce a clear virtual fundamental (the major chord would be much stronger and lower in entropy).
+However, it should be noted that the various aspects of psychoacoustic concordance tend to diverge quite strongly in their behavior for larger chords, and thus, when modeling different aspects of psychoacoustic concordance, different ways of generalizing the dyadic model to higher-cardinality chords may be appropriate. In particular, when modelling beatlessness, Erlich has suggested instead looking only at the entropies of the pairwise dyadic subsets of the chord, so that the major and minor chords would be ranked equal in beatlessness, whereas they would not be ranked equal in their ability to produce a clear virtual fundamental (the major chord would be much stronger and lower in entropy).
-=== Concordance vs Actual Consonance ===
-Concordance has often been confused with actual musical consonance, an unfortunate fact made more common by the psychoacoustics literature under the unfortunate name '''sensory consonance''', most often used to refer to phenomena related to roughness and beatlessness specifically. This is not to be confused with the more familiar construct of tonal stability, typically just called "consonance" in Western common practice music theory and sometimes clarified as "musical consonance" in the music cognition literature. To make matters worse, the literature has also at times referred to concordance -- and not tonal stability -- as '''tonal consonance''', often referring to phenomena related to virtual pitch integration, creating a complete terminological mess. As a result, the term "consonance" has been completely avoided in this article
-While psychoacoustic concordance is not a feature universal to all styles of music, it has been utilized significantly in Western music in the study of intonation. For instance, flexible-pitch ensembles operating within 12-EDO, such as barbershop quartets and string ensembles, will often adjust intonationally from the underlying 12-EDO reference to maximize the concordance of individual chords. Indeed, the entire history of Western tuning theory -- from meantone temperament, to the various Baroque well-temperaments, to 12-EDO itself, to the modern [[Regular_Temperaments|theory of regular temperament]] -- can be seen as an attempt to reason mathematically about how to generate manageable tuning systems that will maximize concordance and minimize discordance. Consonance and dissonance, on the other hand, is a much more general phenomenon which can even exist in music which is predominantly monophonic and uses no chords at all.
+=== Concordance vs. actual consonance ===
+Concordance has often been confused with actual musical consonance, a confusion encouraged by the psychoacoustics literature, where concordance goes under the unfortunate name '''sensory consonance''', most often used to refer to phenomena related to roughness and beatlessness specifically. This is not to be confused with the more familiar construct of tonal stability, typically just called "consonance" in Western common practice music theory and sometimes clarified as "musical consonance" in the music cognition literature. To make matters worse, the literature has also at times referred to concordance (and not tonal stability) as '''tonal consonance''', often referring to phenomena related to virtual pitch integration, creating a complete terminological mess. As a result, the term "consonance" has been completely avoided in this article.
+While psychoacoustic concordance is not a feature universal to all styles of music, it has been utilized significantly in Western music in the study of intonation. For instance, flexible-pitch ensembles operating within 12-EDO, such as barbershop quartets and string ensembles, will often adjust intonationally from the underlying 12-EDO reference to maximize the concordance of individual chords. Indeed, the entire history of Western tuning theory (from meantone temperament, to the various Baroque well-temperaments, to 12-EDO itself, to the modern [[Regular_Temperaments|theory of regular temperament]]) can be seen as an attempt to reason mathematically about how to generate manageable tuning systems that will maximize concordance and minimize discordance. Consonance and dissonance, on the other hand, are much more general phenomena which can even exist in music which is predominantly monophonic and uses no chords at all.
 == Basic Model: Shannon Entropy ==
@@ Line 38: / Line 35: @@
 The general idea of Harmonic Entropy is to first develop a discrete probability distribution quantifying how strongly an arbitrary incoming dyad "matches" every element in a set of basis rational intervals, and then seeing how evenly distributed the resulting probabilities are. If the distribution for some dyad is spread out very evenly, such that there is no clear "victor" basis interval that dominates the distribution, the dyad is considered to be more discordant; on the other extreme, if the distribution tends to concentrate on one or a small set of dyads, the dyad is considered to be more concordant.
-A clear mathematical way of quantifying this "dispersion" is via the [https://en.wikipedia.org/wiki/Entropy_(information_theory) Shannon entropy] of the probability distribution, which can be thought of as describing the "uncertainty" in the distribution. A distribution which has a very high probability of picking one outcome has low entropy and is not very uncertain, whereas a distribution which has the probability spread out on many outcomes is highly uncertain and has a high entropy.
+A clear mathematical way of quantifying this "dispersion" is via the {{w|Entropy (information theory)|Shannon entropy}} of the probability distribution, which can be thought of as describing the "uncertainty" in the distribution. A distribution which has a very high probability of picking one outcome has low entropy and is not very uncertain, whereas a distribution which has the probability spread out on many outcomes is highly uncertain and has a high entropy.
 === Definitions ===
-To formalize our notion of Shannon entropy, we will first describe the random variable <math>J</math>, representing the set of JI "basis" intervals that our incoming interval is being "matched" to, and the parameter <math>C</math>, representing the "cents" of the incoming interval being played. For example, the interval <math>C</math> would take values such as "400 cents," and the interval <math>J</math> would take values in the set of basis ratios, such as "5/4" or "9/7."
+To formalize our notion of Shannon entropy, we will first describe the random variable ''J'', representing the set of JI "basis" intervals that our incoming interval is being "matched" to, and the parameter ''C'', representing the "cents" of the incoming interval being played. For example, the interval ''C'' would take values such as "400[[cent]]s", and the interval ''J'' would take values in the set of basis ratios, such as "5/4" or "9/7."
-So for example, if we want to express the probability that the incoming dyad "400 cents" is perceived as the JI basis interval "5/4," we would write that as the conditional probability
+So for example, if we want to express the probability that the incoming dyad "400{{cent}}" is perceived as the JI basis interval "5/4," we would write that as the conditional probability
 $$\displaystyle \newcommand{\cent}{\text{¢}}$$
-$$\displaystyle P(J=5/4|C=400\cent)$$
+$$\displaystyle P(J=5/4\,|\, C=400\cent)$$
-Or, in general, if we want to write the conditional probability that some incoming dyad of <math>c</math> cents is perceived as the JI basis interval <math>j</math>, we would write that as
+Or, in general, if we want to write the conditional probability that some incoming dyad of ''c'' cents is perceived as the JI basis interval ''j'', we would write that as
-$$\displaystyle P(J=j|C=c)$$
+$$\displaystyle P(J=j\,|\, C=c)$$
 which notationally, we will often abbreviate as
@@ Line 55: / Line 52: @@
 $$\displaystyle P(j|c)$$
-Note that at this point, we haven't yet specified what the particular probability distribution is. There are different ways to do this, which are described in more detail below. Generally, most approaches involve each JI interval's probability being assigned based on how close it is to <math>c</math> (closer dyads are given a larger probability), and how simple it is (simple dyads are given a higher probability, if distance is the same).
+Note that at this point, we haven't yet specified what the particular probability distribution is. There are different ways to do this, which are described in more detail below. Generally, most approaches involve each JI interval's probability being assigned based on how close it is to ''c'' (closer dyads are given a larger probability), and how simple it is (simple dyads are given a higher probability, if distance is the same).
-A noteworthy point is that we generally do not assume any probability distribution on <math>C</math>. This reflects that we do not make any assumptions at all about which notes or intervals are likely to be played to begin with. In other words, we are treating <math>C</math> more as a "parameter" rather than as a random variable.
+A noteworthy point is that we generally do not assume any probability distribution on ''C''. This reflects that we do not make any assumptions at all about which notes or intervals are likely to be played to begin with. In other words, we are treating ''C'' more as a "parameter" rather than as a random variable.
-Once we have decided on a probability distribution, we can finally evaluate the Shannon entropy. For a random variable <math>X</math>, the Shannon entropy is defined as:
+Once we have decided on a probability distribution, we can finally evaluate the Shannon entropy. For a random variable ''X'', the Shannon entropy is defined as:
 $$\displaystyle H(X) = -\sum_{x \in X} P(x) \log_b P(x)$$
-where the different <math>x</math> are taken from the sample space of <math>X</math>, and <math>b</math> is the base of the log. Different choices of <math>b</math> simply change the units in which entropy is given, the most common values being 2 and e, denoting "bits" and "nats". We will omit the base going forward, for simplicity.
+where the different ''x'' are taken from the sample space of ''X'', and ''b'' is the base of the log. Different choices of ''b'' simply change the units in which entropy is given, the most common values being 2 and e, denoting "bits" and "nats". We will omit the base going forward, for simplicity.
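As a rough illustration of the definition above, the following Python sketch computes the Shannon entropy of a discrete distribution. The function name <code>shannon_entropy</code> and the two sample distributions are assumptions made for this example; they do not come from the HE literature.

```python
import math

def shannon_entropy(probs, base=math.e):
    """Shannon entropy of a discrete distribution, in units set by `base`."""
    # Outcomes with p == 0 contribute nothing, since p*log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical distributions: one dominated by a single outcome, one uniform.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]

print(shannon_entropy(peaked))       # low entropy: one clear "victor"
print(shannon_entropy(uniform))      # high entropy: no outcome stands out
print(shannon_entropy(uniform, 2))   # the same distribution measured in bits
```

The peaked distribution yields the smaller value, matching the description of a low-entropy distribution as one that is "not very uncertain".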
-In our case, we want to find the entropy of the random variable <math>J</math> of JI intervals, given a particular choice of incoming dyad in cents. The corresponding quantity that we want is:
+In our case, we want to find the entropy of the random variable ''J'' of JI intervals, given a particular choice of incoming dyad in cents. The corresponding quantity that we want is:
 $$\displaystyle H(J|c) = -\sum_{j \in J} P(j|c) \log P(j|c)$$
-Note that above, the summation is only taken on the <math>j</math> from the sample space of <math>J</math> (i.e. the set of JI basis intervals), whereas the parameter <math>c</math> is treated as constant within the summation (and is taken as the free parameter to the function).
+Note that above, the summation is only taken on the ''j'' from the sample space of ''J'' (i.e. the set of JI basis intervals), whereas the parameter ''c'' is treated as constant within the summation (and is taken as the free parameter to the function).
-Since the parameter <math>c</math> is the free parameter, sometimes the above is notated as
+Since the parameter ''c'' is the free parameter, sometimes the above is notated as
 $$\displaystyle \text{HE}(c) = H(J|c)$$
-which makes more explicit that <math>c</math> is the argument to the harmonic entropy function, which is equal to the entropy of <math>J</math>, conditioned on the incoming dyad of <math>c</math> cents.
+which makes more explicit that ''c'' is the argument to the harmonic entropy function, which is equal to the entropy of ''J'', conditioned on the incoming dyad of ''c'' cents.
-=== Probability Distributions ===
-In order to systematically assign a probability distribution to this dyad, we first start by defining a '''spreading function''', denoted by <math>S(x)</math>, that dictates how the dyad is "smeared" out in log-frequency space, representing how the auditory system allows for some tolerance for mistuning. The typical choice that we will assume here for a spreading function is a Gaussian distribution, with mean centered around the incoming dyad, and standard deviation typically taken as a free parameter in the system and denoted as <math>s</math>.
+=== Probability distributions ===
+In order to systematically assign a probability distribution to this dyad, we first start by defining a '''spreading function''', denoted by ''S''(''x''), that dictates how the dyad is "smeared" out in log-frequency space, representing how the auditory system allows for some tolerance for mistuning. The typical choice that we will assume here for a spreading function is a Gaussian distribution, with mean centered around the incoming dyad, and standard deviation typically taken as a free parameter in the system and denoted as ''s''.
 A fairly typical choice of settings for a basic dyadic HE model would be:
+* The basis set is all those rationals bounded by some maximum Tenney height, with the bound typically notated as ''N'' and set to at least 10,000.
+* The spreading function is typically a Gaussian distribution with a frequency deviation of 1% either way, or about {{nowrap|''s'' ≈ 17{{c}}}}.
-* The basis set is all those rationals bounded by some maximum Tenney height, with the bound typically notated as <math>N</math> and set to at least 10,000.
+Other spreading functions have also been explored, such as the use of the heavy-tailed [https://en.wikipedia.org/wiki/Laplace_distribution Laplace distribution], sometimes described as the "Vos function" in Paul's writings. These two functions are part of the [https://en.wikipedia.org/wiki/Generalized_normal_distribution Generalized normal distribution] family, which has a parameter not only for the variance but for the kurtosis. However, for simplicity, we will assume the Gaussian distribution as the spreading function for the remainder of this article, so that the spreading function for an incoming dyad ''c'' can be written as follows:
-* The spreading function is typically a Gaussian distribution with a frequency deviation of 1% either way, or about s=~17 cents.
-Other spreading functions have also been explored, such as the use of the heavy-tailed [https://en.wikipedia.org/wiki/Laplace_distribution Laplace distribution], sometimes described as the "Vos function" in Paul's writings. These two functions are part of the [https://en.wikipedia.org/wiki/Generalized_normal_distribution Generalized normal distribution] family, which has a parameter not only for the variance but for the kurtosis. However, for simplicity, we will assume the Gaussian distribution as the spreading function for the remainder of this article, so that the spreading function for an incoming dyad <math>c</math> can be written as follows:
 $$\displaystyle S(x-c) = \frac{1}{s\sqrt{2\pi}} e^{-\frac{(x-c)^2}{2s^2}}$$
-where the notation <math>S(x-c)</math> is chosen to make clear that we are translating <math>S(x)</math> to be centered around the incoming dyad <math>c</math>, which is now the mean of the Gaussian.
+where the notation {{nowrap|''S''(''x'' − ''c'')}} is chosen to make clear that we are translating ''S''(''x'') to be centered around the incoming dyad ''c'', which is now the mean of the Gaussian.
-We assume here that the variable <math>x</math> is a dummy variable representing cents, and will adopt this convention for the remainder of the article.
+We assume here that the variable ''x'' is a dummy variable representing cents, and will adopt this convention for the remainder of the article.
-In this notation, <math>s</math> becomes the standard deviation of the Gaussian, being an ASCII-friendly version of the more familiar symbol <math>\sigma</math> for representing the standard deviation. Note that in previous expositions on Harmonic Entropy, <math>s</math> was sometimes given in units representing a percentage of linear-frequency deviation; we allow <math>s</math> to stand for cents here to simplify the notation. To convert from a percentage to cents, the formula <math>\text{cents} = 1200\log_2(1+\text{percentage})</math> can be used.
+In this notation, ''s'' becomes the standard deviation of the Gaussian, being an ASCII-friendly version of the more familiar symbol σ for representing the standard deviation. Note that in previous expositions on Harmonic Entropy, ''s'' was sometimes given in units representing a percentage of linear-frequency deviation; we allow ''s'' to stand for cents here to simplify the notation. To convert from a percentage to cents, the formula {{nowrap|¢ {{=}} 1200 log<sub>2</sub>(1 + percentage)}} can be used.
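The conversion from a percentage of linear-frequency deviation to cents can be checked with a few lines of Python. This is only a sketch; the function name <code>percent_to_cents</code> is an assumption, and the percentage is passed as a fraction (0.01 for 1%).

```python
import math

def percent_to_cents(percentage):
    """Convert a linear-frequency deviation (0.01 means 1%) to cents."""
    return 1200 * math.log2(1 + percentage)

# A 1% deviation comes out a little above 17 cents, the typical value of s.
print(percent_to_cents(0.01))
```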
-It is also common to use as a basis set all those rationals bounded by some maximum Weil height, with a typical cutoff for <math>N</math> set to at least 100. This has sometimes been referred to as seeding HE with the "Farey sequence of order <math>N</math>" and its reciprocals, so references in Paul's work to "Farey series HE" vs "Tenney series HE" are sometimes seen.
+It is also common to use as a basis set all those rationals bounded by some maximum Weil height, with a typical cutoff for ''N'' set to at least 100. This has sometimes been referred to as seeding HE with the "Farey sequence of order ''N''" and its reciprocals, so references in Paul's work to "Farey series HE" vs "Tenney series HE" are sometimes seen.
-Lastly, the set of rationals is often chosen to be only those "reduced" rationals within the cutoff, such that <math>n/d</math> is in the set only if <math>n</math> and <math>d</math> are coprime. HE can also be formulated with unreduced rationals as well. Both methods tend to give similar results. In Paul's work, reduced rationals are most common, although the use of unreduced rationals may be useful in extending HE to the case where <math>N=\infty</math>.
+Lastly, the set of rationals is often chosen to be only those "reduced" rationals within the cutoff, such that ''n''/''d'' is in the set only if ''n'' and ''d'' are coprime. HE can also be formulated with unreduced rationals as well. Both methods tend to give similar results. In Paul's work, reduced rationals are most common, although the use of unreduced rationals may be useful in extending HE to the case where {{nowrap|''N'' {{=}} ∞}}.
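The two kinds of basis set described above can be enumerated directly. The sketch below assumes that the Tenney height of ''n''/''d'' is taken as the product ''n''·''d'' and the Weil height as max(''n'', ''d''), and it keeps only reduced ratios; the function names are hypothetical.

```python
from math import gcd

def tenney_basis(N):
    """Reduced ratios (n, d) with n*d <= N (Tenney height taken as n*d)."""
    return [(n, d)
            for n in range(1, N + 1)
            for d in range(1, N // n + 1)
            if gcd(n, d) == 1]

def weil_basis(N):
    """Reduced ratios (n, d) with max(n, d) <= N (Weil height taken as max(n, d))."""
    return [(n, d)
            for n in range(1, N + 1)
            for d in range(1, N + 1)
            if gcd(n, d) == 1]

print(tenney_basis(6))
print(weil_basis(3))
```

Because both ''n'' and ''d'' range freely, each ratio appears together with its reciprocal, which corresponds to the "Farey sequence of order ''N'' and its reciprocals" in the Weil case.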
 Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the '''domain-integral approach''', works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the '''simple weighted approach''', has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.
-==== Domain-Integral Probabilities ====
+==== Domain-integral probabilities ====
 For discrete sets of JI basis ratios, the log-frequency spectrum can be divided up into '''domains''' assigned to each ratio. Each ratio is assigned a domain with lower bound equal to the mediant of itself and its nearest lower neighbor, and likewise with upper bound equal to the mediant of itself and its nearest upper neighbor. If no such neighbor exists, <math>\pm \infty</math> is used instead. Mathematically, this can be represented via the following expression:
-$$\displaystyle P(j|c) = \int_{\cent(j_l)}^{\cent(j_u)} S(x-c) dx$$
+$$\displaystyle P(j|c) = \int_{\cent\left(j_l\right)}^{\cent\left(j_u\right)} S(x-c) dx$$
-where <math>S(x-c)</math> is the spreading function associated with c, <math>j_l</math> and <math>j_u</math> are the domain lower and upper bounds associated with JI basis ratio <math>j</math>, and <math>\cent(f) = 1200\log_2(f)</math>, or the "cents" function converting frequency ratios to cents. Typically, <math>j_l</math> is set equal to the mediant of <math>j</math> and its nearest lower neighbor (if it exists), or <math>-\infty</math> if not; likewise with <math>j_u</math> and its nearest upper neighbor.
+where {{nowrap|''S''(''x'' − ''c'')}} is the spreading function associated with ''c'', ''j''<sub>''l''</sub> and ''j''<sub>''u''</sub> are the domain lower and upper bounds associated with JI basis ratio ''j'', and <math>\cent(f) = 1200\log_2(f)</math>, or the "cents" function converting frequency ratios to cents. Typically, ''j''<sub>''l''</sub> is set equal to the mediant of ''j'' and its nearest lower neighbor (if it exists), or −∞ if not; likewise with ''j''<sub>''u''</sub> and its nearest upper neighbor.
 This process can be summarized by the following picture, taken from [http://sethares.engr.wisc.edu/paperspdf/HarmonicEntropy.pdf William Sethares' paper on Harmonic Entropy]:
@@ Line 113: / Line 108: @@
 [[File:HarmonicEntropySethares.png]]
-Note the difference in terminology here - in this example, the <math>f_{j+n}</math> are the basis ratios, the <math>r_{j+n}</math> are the domains for each basis ratio, and the bounds for each domain are the mediants between each <math>f_{j+n}</math> and its nearest neighbor. The probability assigned to each basis ratio is then the area under the spreading function curve for each ratio's domain. The entropy of this probability distribution is then the Harmonic Entropy for that dyad.
+Note the difference in terminology here—in this example, the {{nowrap|''f''<sub>''j'' + ''n''</sub>}} are the basis ratios, the {{nowrap|''r''<sub>''j'' + ''n''</sub>}} are the domains for each basis ratio, and the bounds for each domain are the mediants between each {{nowrap|''f''<sub>''j'' + ''n''</sub>}} and its nearest neighbor. The probability assigned to each basis ratio is then the area under the spreading function curve for each ratio's domain. The entropy of this probability distribution is then the Harmonic Entropy for that dyad.
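The domain-integral procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the spreading function is the Gaussian described earlier, its integral is evaluated through the error function, and the helper names and the tiny five-ratio basis set are invented for the example.

```python
import math
from fractions import Fraction

def cents(f):
    """Convert a frequency ratio to cents."""
    return 1200 * math.log2(f)

def gaussian_cdf(x, mean, s):
    """Cumulative distribution of a Gaussian with the given mean and deviation s."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return 0.0
    return 0.5 * (1 + math.erf((x - mean) / (s * math.sqrt(2))))

def mediant(a, b):
    """Mediant of two fractions: sum of numerators over sum of denominators."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)

def domain_integral_probs(basis, c, s=17.0):
    """Return {ratio: P(j|c)} using mediant-to-mediant domains."""
    ratios = sorted(set(basis))
    probs = {}
    for i, j in enumerate(ratios):
        # Domain bounds: mediants with the nearest neighbors, or +/- infinity.
        lo = -math.inf if i == 0 else cents(mediant(ratios[i - 1], j))
        hi = math.inf if i == len(ratios) - 1 else cents(mediant(j, ratios[i + 1]))
        # Area under the spreading function over this ratio's domain.
        probs[j] = gaussian_cdf(hi, c, s) - gaussian_cdf(lo, c, s)
    return probs

# Hypothetical toy basis set, far smaller than anything used in practice.
basis = [Fraction(1, 1), Fraction(5, 4), Fraction(4, 3), Fraction(3, 2), Fraction(2, 1)]
p = domain_integral_probs(basis, 400.0)
print(max(p, key=p.get))  # the ratio that captures most of the area
```

Since the domains tile the whole real line, the probabilities sum to 1 without any separate normalization step.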
 In the case where the set of basis rationals consists of a finite set bounded by Tenney or Weil height, the resulting set of widths is conjectured to have interesting mathematical properties, leading to mathematically nice conceptual simplifications of the model. These simplifications are explained below.
 ==== Simple Weighted Probabilities ====
-It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff <math>N</math> as a basis set, that the domain widths for rationals sufficiently far from the cutoff seem to be proportional to <math>\frac{1}{\sqrt{nd}}</math>.
+It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff ''N'' as a basis set, the domain widths for rationals sufficiently far from the cutoff seem to be proportional to <math>\frac{1}{\sqrt{nd}}</math>.
-While it's still an open conjecture that this pattern holds for arbitrarily large <math>N</math>, the assumption is sometimes made that this is the case, and hence that for these basis ratio sets, <math>\frac{1}{\sqrt{nd}}</math> "approximations" to the width are sufficient to estimate domain-integral Harmonic Entropy.
+While it's still an open conjecture that this pattern holds for arbitrarily large ''N'', the assumption is sometimes made that this is the case, and hence that for these basis ratio sets, <math>\frac{1}{\sqrt{nd}}</math> "approximations" to the width are sufficient to estimate domain-integral Harmonic Entropy.
-This modifies the expression for the probabilities <math>P(j|c)</math> as follows, noting that for now the "probabilities" won't sum to 1:
+This modifies the expression for the probabilities ''P''(''j''{{!}}''c'') as follows, noting that for now the "probabilities" won't sum to 1:
 $$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\sqrt{j_n \cdot j_d}}$$
-where the <math>Q</math> notation now represents that these "probabilities" are unnormalized, and <math>j_n</math> and <math>j_d</math> are the numerator and denominator, respectively, of JI basis ratio <math>j</math>. Again, the set of basis rationals here is assumed to be all of those rationals of Tenney Height ≤ <math>N</math> for some <math>N</math>.
+where the ''Q'' notation now represents that these "probabilities" are unnormalized, and ''j''<sub>''n''</sub> and ''j''<sub>''d''</sub> are the numerator and denominator, respectively, of JI basis ratio ''j''. Again, the set of basis rationals here is assumed to be all of those rationals of Tenney height ≤&nbsp;''N'' for some ''N''.
-A similar observation for the use of Weil-bounded subsets of the rationals suggests domain widths of <math>\frac{1}{\max(n,d)}</math>, yielding instead the following formula:
+A similar observation for the use of Weil-bounded subsets of the rationals suggests domain widths of {{sfrac|1|max(''n'', ''d'')}}, yielding instead the following formula:
 $$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\max(j_n, j_d)}$$
-where this time the set of basis rationals is assumed to be all of those of Weil Height ≤ <math>N</math> for some <math>N</math>.
+where this time the set of basis rationals is assumed to be all of those of Weil height ≤&nbsp;''N'' for some ''N''.
-In both cases, the general approach is the same: the value of the spreading function, taken at the value of <math>\cent(j)</math>, is divided by some sort of "weighting" (or sometimes, "complexity") function representing how much weight is given to that rational number. While the two weighting functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary weights as follows:
+In both cases, the general approach is the same: the value of the spreading function, taken at the value of ¢(j), is divided by some sort of "weighting" (or sometimes, "complexity") function representing how much weight is given to that rational number. While the two weighting functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary weights as follows:
 $$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\|j\|}$$
-where <math>\|j\|</math> denotes a weighting function that maps from rational numbers to non-negative reals.
+where ‖''j''‖ denotes a weighting function that maps from rational numbers to non-negative reals.
 As these "probabilities" don't sum to 1, the result is not a probability distribution at all, invalidating the use of the Shannon Entropy. To rectify this, the distribution is normalized so that the probabilities do sum to 1:
@@ Line 144: / Line 139: @@
 $$\displaystyle P(j|c) = \frac{Q(j|c)}{\sum_{j \in J} Q(j|c)}$$
-which is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities. This definition of <math>P(j|c)</math> is then used directly to compute the entropy.
+which is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities. This definition of ''P''(''j''{{!}}''c'') is then used directly to compute the entropy.
 This approach to assigning probabilities to basis rationals is useful because it hypothetically makes it possible to consider the HE of sets of rationals which are dense in the reals, or even the entire set of positive rationals, although the best way to do this is a subject of ongoing research.
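Putting the pieces of the simple weighted approach together gives a short routine for dyadic HE. The sketch below is one possible reading of the formulas above rather than anyone's published code: it assumes Tenney height is the product ''n''·''d'', uses reduced ratios only, takes <math>\sqrt{nd}</math> as the weight, and omits the Gaussian's constant factor because it cancels during normalization. The name <code>harmonic_entropy</code> and its default arguments are assumptions.

```python
import math
from math import gcd

def harmonic_entropy(c, N=10000, s=17.0):
    """Simple weighted harmonic entropy, in nats, of a dyad of c cents."""
    q = []
    for n in range(1, N + 1):
        for d in range(1, N // n + 1):       # enforce n*d <= N
            if gcd(n, d) != 1:
                continue                      # keep only reduced ratios
            x = 1200 * math.log2(n / d)       # cents of the basis ratio
            # Unnormalized Q(j|c): Gaussian value divided by the weight sqrt(n*d).
            q.append(math.exp(-((x - c) ** 2) / (2 * s * s)) / math.sqrt(n * d))
    total = sum(q)  # assumed nonzero, i.e. c is within reach of the basis set
    entropy = 0.0
    for v in q:
        p = v / total                         # normalized P(j|c)
        if p > 0:                             # skip terms that underflow to zero
            entropy -= p * math.log(p)
    return entropy

# A dyad near 3/2 should come out less discordant than one near 650 cents.
print(harmonic_entropy(702.0), harmonic_entropy(650.0))
```

Evaluating this function over a grid of cents values would produce a curve of the same general kind as the plots in the examples section, though exact values depend on the choices assumed here.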
 === Examples ===
+In all of these examples, the ''x''-axis represents the width in cents of the dyad, and the ''y''-axis represents ''discordance'' rather than concordance, measured in nats of Shannon entropy.
-In all of these examples, the x-axis represents the width in cents of the dyad, and the y-axis represents ''discordance'' rather than concordance, measured in nats of Shannon entropy.
+==== ''s'' {{=}} 17, ''N'' &lt; 10000, <math>\sqrt{nd}</math> weights ====
+This uses as a spreading function the Gaussian distribution with {{nowrap|''s'' {{=}} ~17{{c}}}} (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the simple weighted approach, and the weighting function is <math>\sqrt{nd}</math>:
-==== s=17, N<10000, sqrt(n*d) weights ====
-This uses as a spreading function the Gaussian distribution with <math>s=~17\cent</math> (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the simple weighted approach, and the weighting function is <math>\sqrt{nd}</math>:
 [[File:HE_Tenney_N_10000_s_17cents.png]]
-==== s=17, N<100, max(n,d) weights ====
+==== ''s'' {{=}} 17, ''N'' &lt; 100, max(''n'', ''d'') weights ====
-This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The weighting function here is <math>\max(n,d)</math>:
+This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The weighting function here is max(''n'',&nbsp;''d''):
 [[File:HE_Weil_N_100_s_17cents.png]]
-==== s=17, N<10000, sqrt(n*d) vs mediant-to-mediant weights ====
+==== ''s'' {{=}} 17, ''N'' &lt; 10000, <math>\sqrt{nd}</math> vs. mediant-to-mediant weights ====
-The following image (from Paul Erlich) compares the domain-integral and simple weighted approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 cents, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:
+The following image (from Paul Erlich) compares the domain-integral and simple weighted approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with {{nowrap|''s'' {{=}} ~17{{c}}}}, and the basis set is all those rationals with Tenney height ≤&nbsp;10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:
 [[File:HE_Tenney_mediant_vs_sqrt_nd_Paul.png|800px]]
-== Harmonic Rényi Entropy ==
+== Harmonic Rényi entropy ==
+An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of {{w|Entropy (information theory)|Shannon entropy}} by replacing it instead with {{w|Rényi entropy}}, a {{w|q-analog|''q''-analog}} of Shannon's original entropy. This can be thought of as adding a second parameter, called ''a'', to the model, reflecting how "intelligent" the brain's "decoding" process is when determining the most likely JI interpretation of an ambiguous interval.
-An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of [https://en.wikipedia.org/wiki/Entropy_(information_theory) Shannon entropy] by replacing it instead with [https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy Rényi entropy], a [https://en.wikipedia.org/wiki/Q-analog q-analog] of Shannon's original entropy. This can be thought of as adding a second parameter, called <math>a</math>, to the model, reflecting how "intelligent" the brain's "decoding" process is when determining the most likely JI interpretation of an ambiguous interval.
 === Definitions and Background ===
+The '''Harmonic Rényi entropy of order ''a''''' of an incoming dyad can be defined as follows:
-The '''Harmonic Rényi Entropy of order a''' of an incoming dyad can be defined as follows:
 $$\displaystyle \text{HE}_a(c) = H_a(J|c) = \frac{1}{1-a} \log \sum_{j \in J} P(j|c)^a$$
-Being a q-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as <math>a \to 1</math>, a fact which can be verified using L'Hôpital's rule as found [http://www.sonycsl.co.jp/person/nielsen/Note-HopitalRuleShannonRenyiTsallis.pdf here].
+Being a ''q''-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as {{nowrap|''a'' → 1}}, a fact which can be verified using {{w|L'Hôpital's rule}} as found [http://www.sonycsl.co.jp/person/nielsen/Note-HopitalRuleShannonRenyiTsallis.pdf here].
-The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in [http://users.cis.fiu.edu/~smithg/papers/qest11.pdf this paper] and [http://www.ietf.org/rfc/rfc4086.txt this RFC]. More precisely, the Rényi entropy of order <math>\infty</math>, also called the '''min-entropy''', is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn.
+The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in [http://users.cis.fiu.edu/~smithg/papers/qest11.pdf this paper] and [http://www.ietf.org/rfc/rfc4086.txt this RFC]. More precisely, the Rényi entropy of order ∞, also called the '''min-entropy''', is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn.
 In a musical context, by considering the incoming dyad as analogous to a cryptographic code which is attempting to be "cracked" by an intelligent auditory system, we can consider that the analogous "worst-case attacker" would be a "best-case auditory system" which has complete awareness of the probability distribution for any incoming dyad. This analogy would view such an auditory system as actively attempting to choose the most probable rational, rather than drawing a rational at random weighted by the distribution.
-The use of <math>a=∞</math> min-entropy would reflect this view. In contrast, the use of <math>a=1</math> Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter a interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.
+The use of {{nowrap|''a'' {{=}} ∞}} min-entropy would reflect this view. In contrast, the use of {{nowrap|''a'' {{=}} 1}} Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter ''a'' interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.
 Some psychoacoustic effects naturally fit into this paradigm, such as the virtual pitch integration process, which actually does attempt to find a single victor when matching incoming chords with chunks of the harmonic series. Other psychoacoustic effects, such as that of beatlessness, may instead be better viewed as "dumb" processes whereby nothing in particular is being "chosen," but where a more uniform distribution of matching rational numbers for a dyad simply generates a more discordant sonic effect. Different values of ''a'' can differentiate between the predominance given to these two types of effect in the overall construct of psychoacoustic concordance.
-Certain values of <math>a</math> reduce to simpler expressions and have special names, as given in the examples below.
+Certain values of ''a'' reduce to simpler expressions and have special names, as given in the examples below.
 === Examples ===
-==== a=0: Harmonic Hartley Entropy ====
+==== ''a'' {{=}} 0: Harmonic Hartley entropy ====
 $$\displaystyle H_0(J|c) = \log |J|$$
-where <math>|J|</math> is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same Harmonic Hartley Entropy. The Hartley Entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: all Rényi Entropies are always guaranteed to be less than the Hartley Entropy.
+where {{!}}''J''{{!}} is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same harmonic Hartley entropy. The Hartley entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: every Rényi entropy is guaranteed to be less than or equal to the Hartley entropy.
 [[File:HRE_a=0.png]]
-''Harmonic Hartley Entropy (a=0) with the basis set all rationals with Tenney height ≤ 10000. Note that the choice of spreading function makes no difference in the end result at all.''
+''Harmonic Hartley Entropy ({{nowrap|a {{=}} 0}}) with the basis set all rationals with Tenney height ≤&nbsp;10000. Note that the choice of spreading function makes no difference in the end result at all.''
-==== a=1: Harmonic Shannon Entropy (Harmonic Entropy) ====
+==== ''a'' {{=}} 1: Harmonic Shannon entropy ====
 $$\displaystyle H_1(J|c) = -\sum_{j \in J} P(j|c) \log P(j|c)$$
-This is Paul's original Harmonic Entropy. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.
+This is Paul's original harmonic entropy. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.
 [[File:HE_Tenney_N_10000_s_17cents.png]]
-''Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> weighting.''
+''Harmonic Shannon Entropy ({{nowrap|a {{=}} 1}}) with the basis set all rationals with Tenney height ≤&nbsp;10000, spreading function a Gaussian distribution with {{nowrap|s {{=}} 1%}} (~17{{c}}), and <math>\sqrt{nd}</math> weighting.''
-==== a=2: Harmonic Collision Entropy ====
+==== ''a'' {{=}} 2: Harmonic collision entropy ====
-$$\displaystyle H_2(J|c) = -\log \sum_{j \in J} P(j|c)^2 = -\log (J_1 = J_2|c)$$
+$$\displaystyle H_2(J|c) = -\log \sum_{j \in J} P(j|c)^2 = -\log P\left(J_1 = J_2\,\vert\,c\right)$$
-where <math>J_1</math> and <math>J_2</math> are two independent and identically distributed random variables of JI basis ratios, conditioned on the same incoming dyad <math>c</math>, and the collision entropy is the same as the negative log of the probability that the two JI variables produce the same outcome.
+where ''J''<sub>1</sub> and ''J''<sub>2</sub> are two independent and identically distributed random variables of JI basis ratios, conditioned on the same incoming dyad ''c'', and the collision entropy is the same as the negative log of the probability that the two JI variables produce the same outcome.
 [[File:HE_Tenney_N_10000_s_17cents_a=2.png]]
-''Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and <math>\sqrt{nd}</math> weighting.''
+''Harmonic Collision Entropy ({{nowrap|a {{=}} 2}}) with the basis set all rationals with Tenney height ≤&nbsp;10000, spreading function a Gaussian distribution with {{nowrap|''s'' {{=}} 1%}} (~17{{c}}), and <math>\sqrt{nd}</math> weighting.''
-==== a=∞: Harmonic Min-Entropy ====
+==== ''a'' {{=}} ∞: Harmonic min-entropy ====
 $$\displaystyle H_\infty(J|c) = -\log \max_{j \in J} P(j|c)$$
-This is the min-entropy, which simply takes the negative log of the largest probability in the distribution. This can be thought of as representing the "strength" of the incoming dyad from being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the <math>a=\infty</math> case is guaranteed to be a lower bound among all Rényi entropies.
+This min-entropy simply takes the negative log of the largest probability in the distribution. This can be thought of as representing the "strength" of the incoming dyad against being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the {{nowrap|''a'' {{=}} ∞}} case is guaranteed to be a lower bound among all Rényi entropies.
 [[File:HE_Tenney_N_10000_s_17cents_a=7.png]]
-''Harmonic Rényi Entropy with a=7, with the high value of a being chosen to approximate min-entropy (a=''∞''). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 cents), and the weighting function <math>\sqrt{nd}</math>.''
+''Harmonic Rényi Entropy with {{nowrap|a {{=}} 7}}, with the high value of a being chosen to approximate min-entropy ({{nowrap|a {{=}} ''∞''}}). The basis set is still all rationals with Tenney height ≤&nbsp;10000, the spreading function a Gaussian distribution with {{nowrap|''s'' {{=}} 1%}} (~17{{c}}), and the weighting function <math>\sqrt{nd}</math>.''
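The special cases above can be checked numerically on a small example. The following is a minimal sketch, not part of the original model code: the function name <code>renyi_entropy</code> and the four-element probability vector are hypothetical, chosen only for illustration, and natural logarithms are assumed.

```python
import math

def renyi_entropy(p, a):
    """Rényi entropy (natural log) of a probability vector p for order a >= 0."""
    p = [x for x in p if x > 0]          # zero-probability outcomes contribute nothing
    if a == 1:                           # Shannon entropy (the a -> 1 limit)
        return -sum(x * math.log(x) for x in p)
    if math.isinf(a):                    # min-entropy (the a -> infinity limit)
        return -math.log(max(p))
    return math.log(sum(x ** a for x in p)) / (1 - a)

# Hypothetical normalized match probabilities P(j|c) for four candidate ratios.
p = [0.6, 0.25, 0.1, 0.05]
orders = [0, 1, 2, math.inf]
values = [renyi_entropy(p, a) for a in orders]
for a, h in zip(orders, values):
    print(f"a = {a}: H = {h:.4f}")
```

In this example the printed values decrease as ''a'' grows, with the Hartley entropy on top and the min-entropy at the bottom, which matches the bounds stated above.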
-=== Convolution-Based Expression For Quickly Computing Rényi Entropy ===
+=== Convolution-based expression for quickly computing Rényi entropy ===
-Below is given an derivation that expresses Harmonic Rényi Entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the Fast Fourier Transform.
+Below is given a derivation that expresses harmonic Rényi entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the fast Fourier transform (FFT).
 The below derivation depends on the use of simple weighted probabilities, although it may be possible to extend to domain-integral probabilities instead.
 ==== Preliminaries ====
-The Harmonic Rényi Entropy is defined as
+Harmonic Rényi entropy is defined as
 $$\displaystyle \text{HE}_a(c) = H_a(J|c) = \frac{1}{1-a} \log \sum_{j \in J} P(j|c)^a$$
@@ Line 253: / Line 245: @@
-Since <math>\psi(c)</math> is the same for each basis ratio, we can pull it out of the summation to obtain:
+Since ψ(''c'') is the same for each basis ratio, we can pull it out of the summation to obtain:
 $$\displaystyle H_a(J|c) = \frac{1}{1-a} \log \left( \frac{\sum_{j \in J} Q(j|c)^a}{\psi(c)^a} \right)$$
@@ Line 268: / Line 260: @@
-We thus reduce the term inside the logarithm to the quotient of the functions <math>\rho_a(c)</math> and <math>\psi(c)</math>. Our aim is now to express each of these two functions in terms of a convolution product.
+We thus reduce the term inside the logarithm to the quotient of the functions ρ<sub>a</sub>(''c'') and ψ(''c''). Our aim is now to express each of these two functions in terms of a convolution product.
-==== Convolution product for <math>\psi(c)</math> ====
+==== Convolution product for ψ(''c'') ====
-<math>\displaystyle \psi(c)</math>, the normalization function, is written as follows:
+ψ(''c''), the normalization function, is written as follows:
 $$\displaystyle \psi(c) = \sum_{j \in J} Q(j|c)$$
-Again, <math>Q(j|c)</math> is defined as follows:
+Again, Q(''j''|''c'') is defined as follows:
 $$\displaystyle Q(j|c) = \frac{S(\cent(j)-c)}{\|j\|}$$
@@ Line 291: / Line 283: @@
-We note that the left factor in the convolution product is always the same <math>S(-c)</math>, which is not dependent on <math>j</math> in any way. Since convolution distributes over addition, we can factor the <math>S</math> out of the summation to obtain
+We note that the left factor in the convolution product is always the same ''S''(−''c''), which is not dependent on ''j'' in any way. Since convolution distributes over addition, we can factor the ''S'' out of the summation to obtain
 $$\displaystyle \psi(c) = \left[S \ast \left(\sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}\right)\right](-c)$$
-We can clean up this notation by defining the auxiliary distribution K:
+We can clean up this notation by defining the auxiliary distribution ''K'':
 $$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}$$
@@ Line 305: / Line 297: @@
 $$\displaystyle \psi(c) = \left[S \ast K\right](-c)$$
-==== Convolution product for <math>\rho_a(c)</math> ====
+==== Convolution product for ρ<sub>a</sub>(''c'') ====
-The derivation for <math>\rho_a(c)</math> proceeds similarly. Recall the function is written as follows:
+The derivation for ρ<sub>a</sub>(''c'') proceeds similarly. Recall the function is written as follows:
 $$\displaystyle \rho_a(c) = \sum_{j \in J} Q(j|c)^a$$
-The expression for each <math>Q(j|c)^a</math> is:
+The expression for each ''Q''(''j''|''c'')<sup>''a''</sup> is:
 $$\displaystyle Q(j|c)^a = \frac{S(\cent(j)-c)^a}{\|j\|^a}$$
-We can again express this as a convolution, this time of the function <math>S^a(-c)</math>, meaning the spreading function S taken to the a'th power, and a delta distribution:
+We can again express this as a convolution, this time of the function ''S''<sup>''a''</sup>(−''c''), meaning the spreading function ''S'' taken to the ''a''th power, and a delta distribution:
 $$\displaystyle Q(j|c)^a = \left(S^a \ast \frac{\delta_{-\cent(j)}}{\|j\|^a}\right)(-c)$$
@@ Line 334: / Line 326: @@
 $$\displaystyle \rho_a(c) = \left[S^a \ast K^a\right](-c)$$
-We have now succeeded in representing <math>\rho_a(c)</math> as a convolution.
+We have now succeeded in representing ρ<sub>a</sub>(''c'') as a convolution.
-Note that the function <math>K^a(c)</math> involves a slight abuse of notation, as it is not literally <math>K(c)</math> taken to the <math>a</math>'th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the <math>a</math>'th power.
+Note that the function ''K''<sup>''a''</sup>(''c'') involves a slight abuse of notation, as it is not literally ''K''(''c'') taken to the ''a''th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the ''a''th power.
 ==== Round-up ====
@@ Line 348: / Line 340: @@
 $$\displaystyle \left[S \ast K\right]^a(-c)$$
-represents the convolution of <math>S</math> and <math>K</math>, taken to the <math>a</math>'th power, and flipped backwards. Note that if <math>S(x)</math> is a symmetrical (even) spreading function, and if for each ratio <math>n/d</math> in <math>J</math>, if the inverse <math>d/n</math> is also in <math>J</math>, then the above convolution will also be symmetrical, and we also have
+represents the convolution of ''S'' and ''K'', taken to the ''a''th power, and flipped backwards. Note that if ''S''(''x'') is a symmetrical (even) spreading function, and if for each ratio ''n''/''d'' in ''J'' the inverse ''d''/''n'' is also in ''J'', then the above convolution will also be symmetrical, and we also have
 $$\displaystyle \left[S \ast K\right]^a(-c) = \left[S \ast K\right]^a(c)$$
-We have succeeded in representing Harmonic Rényi Entropy in simple terms of two convolution products, each of which can be computed in <math>O(N log N)</math> time.
+We have succeeded in representing harmonic Rényi entropy in simple terms of two convolution products, each of which can be computed in {{nowrap|''O''(''N'' log ''N'')}} time.
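The following is a minimal sketch of this computation, under several assumptions that are not from the derivation itself: a 1-cent grid from 0 to 1200 cents, basis ratios rounded to the nearest cent so that the delta distributions land on grid points, a Gaussian spreading function with ''s'' of 17 cents, reduced ratios bounded by {{nowrap|''nd'' ≤ 2000}}, and {{nowrap|''a'' ≠ 1}}. All function names are hypothetical. Because the Gaussian is even, the flipped argument (−''c'') can be ignored. The FFT result is compared against the direct summation over basis ratios.

```python
import math
import numpy as np

def tenney_basis(max_nd, hi_cents=1200):
    """Reduced ratios n/d >= 1 with n*d <= max_nd: (cents rounded to grid, sqrt(n*d))."""
    cents, weights = [], []
    for d in range(1, math.isqrt(max_nd) + 1):
        for n in range(d, max_nd // d + 1):
            c = round(1200 * math.log2(n / d))
            if math.gcd(n, d) == 1 and c <= hi_cents:
                cents.append(c)
                weights.append(math.sqrt(n * d))
    return np.array(cents), np.array(weights)

def fft_convolve(signal, kernel):
    """Linear convolution with an odd-length centered kernel, via zero-padded FFT."""
    n = len(signal) + len(kernel) - 1
    full = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)
    r = (len(kernel) - 1) // 2
    return full[r:r + len(signal)]       # keep the part aligned with the signal grid

def he_fft(cents, weights, s, a, size=1201):
    """Harmonic Rényi entropy on the grid 0..size-1 cents from two convolutions."""
    offsets = np.arange(-(size - 1), size)
    S = np.exp(-0.5 * (offsets / s) ** 2)        # unnormalized Gaussian; scale cancels
    K = np.zeros(size)
    Ka = np.zeros(size)
    np.add.at(K, cents, 1 / weights)             # delta weights 1/||j||
    np.add.at(Ka, cents, 1 / weights ** a)       # delta weights 1/||j||^a
    psi = fft_convolve(K, S)                     # normalization function
    rho = fft_convolve(Ka, S ** a)               # sum of Q(j|c)^a
    return np.log(rho / psi ** a) / (1 - a)

def he_direct(cents, weights, s, a, size=1201):
    """Same quantity by summing over every basis ratio at every grid point."""
    grid = np.arange(size)
    q = np.exp(-0.5 * ((cents[None, :] - grid[:, None]) / s) ** 2) / weights[None, :]
    p = q / q.sum(axis=1, keepdims=True)
    return np.log((p ** a).sum(axis=1)) / (1 - a)

cents, weights = tenney_basis(2000)
fast = he_fft(cents, weights, s=17.0, a=2.0)
slow = he_direct(cents, weights, s=17.0, a=2.0)
print(np.max(np.abs(fast - slow)))               # difference is at rounding-error level
```

The direct version costs one Gaussian evaluation per pair of grid point and basis ratio, whereas the FFT version does not depend on the number of basis ratios once ''K'' has been built, which is where the speedup for large basis sets comes from.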
-== Extending HE to <math>N=\infty</math>: zeta-HE ==
+== Extending HE to ''N'' {{=}} ∞: zeta-HE ==
-All of the models described above involve a finite set of rational numbers, bounded by some weighting function, and where the weighting is less than some max value <math>N</math>.
+All of the models described above involve a finite set of rational numbers, bounded by some weighting function, and where the weighting is less than some max value ''N''.
 It so happens that we are more or less able to analytically continue this definition to the situation where <math>N=\infty</math>. More precisely, we are able to analytically continue the exponential of HE, which yields the same relative interval rankings as standard HE.
-The only technical caveat is that we use the HE of the "unnormalized" probability distribution. However, in the large limit of <math>N</math>, this appears to agree closely with the usual HE. We go into more detail below about this.
+The only technical caveat is that we use the HE of the "unnormalized" probability distribution. However, in the large limit of ''N'', this appears to agree closely with the usual HE. We go into more detail below about this.
-Our basic approach is: rather than weighting intervals by <math>(nd)^{0.5}</math>, we choose a different exponent, such as <math>(nd)^2</math>. For an exponent which is large enough (we will show that it must be greater than 1), HE does indeed converge as <math>N \to \infty</math>, and we show that this yields an expression related to the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]]. We can then use the analytic continuation of the zeta function to obtain an analytically continued curve for the <math>(nd)^{0.5}</math> weighting, which we then show empirically does indeed appear to be what HE converges on for large values of <math>N</math>.
+Our basic approach is: rather than weighting intervals by <math>(nd)^{0.5}</math>, we choose a different exponent, such as <math>(nd)^2</math>. For an exponent which is large enough (we will show that it must be greater than 1), HE does indeed converge as <math>N \to \infty</math>, and we show that this yields an expression related to the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]]. We can then use the analytic continuation of the zeta function to obtain an analytically continued curve for the <math>(nd)^{0.5}</math> weighting, which we then show empirically does indeed appear to be what HE converges on for large values of ''N''.
 In short, what we will show is that the Fourier Transform of this unnormalized Shannon Harmonic Entropy is given by
@@ Line 367: / Line 359: @@
 $$|\zeta(0.5+it)|^2 \cdot \overline {\phi(t)}$$
-where <math>\phi(t)</math> is the characteristic function of the spreading distribution and <math>\overline {\phi(t)}</math> is complex conjugation. Below we also give an expression for the Renyi entropy for arbitrary choice of the parameter <math>a</math>.
+where <math>\phi(t)</math> is the characteristic function of the spreading distribution and <math>\overline {\phi(t)}</math> is its complex conjugate. Below we also give an expression for the Rényi entropy for arbitrary choice of the parameter ''a''.
 This enables us to speak cognizantly of the harmonic entropy of an interval as measured against ''all'' rational numbers.
@@ Line 374: / Line 366: @@
 Our derivation only analytically continues the entropy function for the "unnormalized" set of probabilities, which we previously wrote as <math>Q(j|c)</math>. For this definition to be philosophically perfect, we would want to analytically continue the entropy function for the normalized sense of probabilities, previously written as <math>P(j|c)</math>.
-However, in practice, the "unnormalized entropy" appears to be an extremely good approximation to the normalized entropy for large values of <math>N</math>. The resulting curve has approximately the same minima and maxima as HE, the same general shape, and for all intents and purposes looks exactly like HE, just shifted on the y-axis.
+However, in practice, the "unnormalized entropy" appears to be an extremely good approximation to the normalized entropy for large values of ''N''. The resulting curve has approximately the same minima and maxima as HE, the same general shape, and for all intents and purposes looks exactly like HE, just shifted on the y-axis.
-Here are some examples for different values of <math>s</math>. All of these are Shannon HE (<math>a=1</math>), using <math>\sqrt{nd}</math> weights, with unreduced rationals (more on this below), with the bound that <math>nd < 1000000</math>, just with different values of <math>s</math>. All have been scaled so that the minimum entropy is 0, and the maximum entropy is 1:
+Here are some examples for different values of ''s''. All of these are Shannon HE (<math>a=1</math>), using <math>\sqrt{nd}</math> weights, with unreduced rationals (more on this below), with the bound that <math>nd < 1000000</math>, just with different values of ''s''. All have been scaled so that the minimum entropy is 0, and the maximum entropy is 1:
 [[File:HE vs UHE s=0.5%.png|800px]]
@@ Line 384: / Line 376: @@
 [[File:HE vs UHE s=1.5%.png|800px]]
-As you can see, the unnormalized version is extremely close to a linear function of the normalized one. A similar situation holds for larger values of <math>a</math>. The Pearson correlation coefficient of "rho" is also given, and is typically very close to 1 - for example, for <math>s=1%</math>, it's equal to 0.99922. The correlation also seems to get better with increasing values of <math>N</math>, such that the correlation for N=1,000,000 (shown above) is much better than the one for N=10,000 (not pictured).
+As you can see, the unnormalized version is extremely close to a linear function of the normalized one. A similar situation holds for larger values of ''a''. The Pearson correlation coefficient ("rho") is also given, and is typically very close to 1; for example, for <math>s=1\%</math>, it is equal to 0.99922. The correlation also seems to get better with increasing values of ''N'', such that the correlation for {{nowrap|''N'' {{=}} 1,000,000}} (shown above) is much better than the one for {{nowrap|''N'' {{=}} 10,000}} (not pictured).
-In the above examples, note that there are slightly adjusted values of <math>s</math> (usually by less than a cent) between the normalized and unnormalized comparisons for each plot. For example, in the plot for <math>s=1%</math>, corresponding to 17.2264 cents, we compare to a slightly adjusted UHE of 16.4764 cents. This is because, empirically, sometimes a very slight adjustment corresponds to a better correlation coefficient, suggesting that the UHE may be equivalent to the HE with a miniscule adjustment in the value of <math>s</math>.
+In the above examples, note that there are slightly adjusted values of ''s'' (usually by less than a cent) between the normalized and unnormalized comparisons for each plot. For example, in the plot for <math>s=1\%</math>, corresponding to 17.2264 cents, we compare to a slightly adjusted UHE of 16.4764 cents. This is because, empirically, sometimes a very slight adjustment corresponds to a better correlation coefficient, suggesting that the UHE may be equivalent to the HE with a minuscule adjustment in the value of ''s''.
-It would be nice to show the exact relationship of unnormalized entropy to the normalized entropy in the limit of large <math>N</math>, and whether the two converge to be exactly equal (perhaps given some miniscule adjustment in <math>s</math> or <math>a</math>). However, we will leave this for future research, as well as the question of how to do an exact derivation of normalized HE.
+It would be nice to show the exact relationship of unnormalized entropy to the normalized entropy in the limit of large ''N'', and whether the two converge to be exactly equal (perhaps given some minuscule adjustment in ''s'' or ''a''). However, we will leave this for future research, as well as the question of how to do an exact derivation of normalized HE.
 For now, we will start with a derivation of the unnormalized entropy for <math>N=\infty</math>, as an interesting function worthy of study in its own right - not only because it looks exactly like HE, but because it leads to an expression for unnormalized HE in terms of the [[The_Riemann_Zeta_Function_and_Tuning|Riemann Zeta function]].
@@ Line 423: / Line 415: @@
 $$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left( S^a \ast K^a \right)(-c)$$
-where, as before, <math>S^a</math> is our spreading function, taken to the <math>a</math>'th power, and <math>K^a</math> is our convolution kernel, with the weights on the delta functions taken to the <math>a</math>'th power as described previously.
+where, as before, <math>S^a</math> is our spreading function, taken to the ''a''th power, and <math>K^a</math> is our convolution kernel, with the weights on the delta functions taken to the ''a''th power as described previously.
-Note that if <math>S</math> is symmetric, as in the case of the Gaussian or Laplace distributions, then the inverted argument of <math>(-c)</math> on the end is redundant, and can be replaced by <math>(c)</math>.
+Note that if ''S'' is symmetric, as in the case of the Gaussian or Laplace distributions, then the inverted argument of <math>(-c)</math> on the end is redundant, and can be replaced by <math>(c)</math>.
@@ Line 436: / Line 428: @@
 ==== Analytic Continuation of the Convolution Kernel ====
-The definition for <math>K</math> is:
+The definition for ''K'' is:
 $$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}$$
-where <math>\|j\|</math> represents the weighting of the JI basis ratio <math>j</math>. In the particular case of Tenney weighting, we get:
+where <math>\|j\|</math> represents the weighting of the JI basis ratio ''j''. In the particular case of Tenney weighting, we get:
 $$\displaystyle K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{(j_n \cdot j_d)^{0.5}}$$
-where <math>j_n</math> and <math>j_d</math> are the numerator and denominator of <math>j</math>, respectively.
+where ''j''<sub>''n''</sub> and ''j''<sub>''d''</sub> are the numerator and denominator of ''j'', respectively.
@@ Line 470: / Line 462: @@
-Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height weighting, we typically bound by <math>\sqrt{nd} < N</math> for some <math>N</math>. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for our weighting. This will not end up being much of a problem, as the two will converge on the same result anyway.
+Now, we note our summation is currently written simply as <math>\sum_{j \in J}</math>. For a Tenney height weighting, we typically bound by <math>\sqrt{nd} < N</math> for some ''N''. However, although it is unusual, for the sake of simplifying the derivation, we will bound by <math>\max(n,d) < N</math> instead, despite the use of Tenney height for our weighting. This will not end up being much of a problem, as the two will converge on the same result anyway.
 Bounding by <math>\max(n,d) < N</math> is the same as specifying that <math>j_n < N</math> and <math>j_d < N</math>. Doing so, we get
@@ Line 513: / Line 505: @@
 $$\displaystyle K^a(n) = \mathcal{F}^{-1}\left\{|\zeta(0.5a + it)|^2\right\}(n)$$
-so that the choice of <math>a</math> simply changes our choice of vertical slice of the Riemann zeta function, as well as the shape of our spreading function (because it is also being raised to a power). If our spreading function is a Gaussian, then we simply get another Gaussian with a different standard deviation.
+so that the choice of ''a'' simply changes our choice of vertical slice of the Riemann zeta function, as well as the shape of our spreading function (because it is also being raised to a power). If our spreading function is a Gaussian, then we simply get another Gaussian with a different standard deviation.
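To spell out the Gaussian case as a short worked step (assuming a zero-mean Gaussian with standard deviation ''s'', and ignoring the constant normalization factor, which only contributes a constant to the entropy):

$$\displaystyle S(x) \propto \exp\left(-\frac{x^2}{2s^2}\right) \quad\Longrightarrow\quad S(x)^a \propto \exp\left(-\frac{a x^2}{2s^2}\right) = \exp\left(-\frac{x^2}{2\left(s/\sqrt{a}\right)^2}\right)$$

so that <math>S^a</math> has the shape of a Gaussian with standard deviation <math>s/\sqrt{a}</math>, which is narrower than ''S'' for {{nowrap|''a'' > 1}} and wider for {{nowrap|''a'' < 1}}.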
 ==== Analytic Continuation of Unnormalized Harmonic Rényi Entropy ====
-We can put this back into our equation for the Unnormalized Harmonic Rényi Entropy. To do so, we will continue with our change of units from cents to nepers, corresponding to a change of our variable from <math>c</math> to <math>n</math>. We will likewise assume the spreading probability distribution <math>S</math> has been scaled to reflect the new choice of units.
+We can put this back into our equation for the Unnormalized Harmonic Rényi Entropy. To do so, we will continue with our change of units from cents to nepers, corresponding to a change of our variable from ''c'' to ''n''. We will likewise assume the spreading probability distribution ''S'' has been scaled to reflect the new choice of units.
@@ Line 539: / Line 531: @@
-We can simplify the expression of the above if we likewise take the Fourier transform of <math>S</math>. If we do, we obtain the [https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory) characteristic function] of the distribution, which is typically denoted by <math>\phi(t)</math>. We will use the following definitions:
+We can simplify the expression of the above if we likewise take the Fourier transform of ''S''. If we do, we obtain the [https://en.wikipedia.org/wiki/Characteristic_function_(probability_theory) characteristic function] of the distribution, which is typically denoted by <math>\phi(t)</math>. We will use the following definitions:
 $$\displaystyle \phi(t) = \mathcal{F}\left\{S(n)\right\}(t)$$
@@ Line 576: / Line 568: @@
 [[File:ExpUHE vs zeta s=1.5%.png|800px]]
-Note that in all these plots, the value of <math>a</math> is chosen to be <math>1.00001</math> rather than exactly <math>1</math>, so as to avoid that <math>(1-a)</math> term becoming 0. Similar results are seen for other choices of <math>a</math>:
+Note that in all these plots, the value of ''a'' is chosen to be <math>1.00001</math> rather than exactly 1, so as to avoid the <math>(1-a)</math> term becoming 0. Similar results are seen for other choices of ''a'':
 ==== s=1%, a=2.2 ====
@@ Line 587: / Line 579: @@
 ''Note: this section is for future research; some of it needs to be put on more rigorous footing, but we've left it as it's certainly interesting.''
-Let's go back to our original convolution expression for finite-<math>N</math> UHE:
+Let's go back to our original convolution expression for finite-''N'' UHE:
 $$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left(\left( S^a \ast K^a \right)(-c)\right)$$
@@ Line 603: / Line 595: @@
 $$\displaystyle \text{UHE}_a(c) = \frac{1}{1-a} \log \left(U(0) + \tilde{U}(c) \right)$$
-Lastly, suppose we only care about the entropy function up to a vertical shift and scaling: in other words, we want to declare two functions <math>f(x), g(x)</math> to be '''linearly equivalent''', and write <math>f(x) \approx g(x)</math>, if for some <math>a, b</math> that don't depend on <math>x</math>, we have <math>f(x) = a\cdot g(x) + b</math>. This means we want to view two entropy functions as equivalent if one is just a scaled and shifted version of the other, so that when "normalizing" them (so that the entropy goes from 0 to 1), we get identical functions. Then we have all of the following relationships:
+Lastly, suppose we only care about the entropy function up to a vertical shift and scaling: in other words, we want to declare two functions <math>f(x), g(x)</math> to be '''linearly equivalent''', and write <math>f(x) \approx g(x)</math>, if for some <math>a, b</math> that don't depend on ''x'', we have <math>f(x) = a\cdot g(x) + b</math>. This means we want to view two entropy functions as equivalent if one is just a scaled and shifted version of the other, so that when "normalizing" them (so that the entropy goes from 0 to 1), we get identical functions. Then we have all of the following relationships:
 $$\displaystyle \text{UHE}_a(c) \approx \log U(c) \approx \log \left( U(c)^{\frac{1}{1-a}} \right)$$
 $$U(c) \approx \tilde{U}(c)$$
-where we have just dropped the constants of <math>\frac{1}{1-a}</math> and the constant vertical shift of <math>U(0)</math> which doesn't depend on <math>c</math>.
+where we have just dropped the constants of <math>\frac{1}{1-a}</math> and the constant vertical shift of <math>U(0)</math> which doesn't depend on ''c''.
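As a small illustration of linear equivalence, the sketch below rescales two sampled curves to the range from 0 to 1 and compares them. The helper name <code>normalize</code> and the sample values are hypothetical. One assumption is worth stating explicitly: the rescaled curves coincide when the scale factor is positive, while a negative scale factor (as <math>\frac{1}{1-a}</math> is for {{nowrap|''a'' > 1}}) yields the same curve flipped upside down.

```python
def normalize(values):
    """Rescale a sampled curve so that its minimum is 0 and its maximum is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical samples of some curve g, and two scaled-and-shifted copies of it.
g = [0.3, 1.7, 0.9, 2.4, 1.1]
f_pos = [2.5 * v + 4.0 for v in g]       # positive scale factor
f_neg = [-2.5 * v + 4.0 for v in g]      # negative scale factor

same = normalize(f_pos)                  # coincides with normalize(g)
flipped = normalize(f_neg)               # coincides with 1 - normalize(g)
print(same)
print(flipped)
```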
-Now, the main thing is that, if we are in the region where <math>a ≤ 2</math>, then this is also the region where the <math>U(0)</math> term goes to infinity as <math>N</math> increases: the entropy doesn't converge. And in general, we have the asymptotic expansion
+Now, the main thing is that, if we are in the region where <math>a \leq 2</math>, then this is also the region where the <math>U(0)</math> term goes to infinity as ''N'' increases: the entropy doesn't converge. And in general, we have the asymptotic expansion
 $$
@@ Line 617: / Line 609: @@
 $$
-and, for large <math>k</math>, '''as long as''' <math>x \ll k</math>, the higher-order terms become negligible. This means, for all <math>c</math>, we would need to show that <math>\tilde{U}(c) \ll U(0)</math> as <math>N \to \infty</math>. We would then be able to rewrite the above as
+and, for large ''k'', '''as long as''' <math>x \ll k</math>, the higher-order terms become negligible. This means, for all ''c'', we would need to show that <math>\tilde{U}(c) \ll U(0)</math> as <math>N \to \infty</math>. We would then be able to rewrite the above as
 $$\displaystyle \log U(c) \sim \frac{1}{1-a} \left (\log (U(0)) + \frac{\tilde{U}(c)}{U(0)} \right)$$
@@ Line 657: / Line 649: @@
 $$
-Lastly, we note that for any particular choice of <math>a</math> and <math>N</math>, the above is simply linearly equivalent to
+Lastly, we note that for any particular choice of ''a'' and ''N'', the above is simply linearly equivalent to
 $$
@@ Line 680: / Line 672: @@
 Now, the only missing piece needed for all of this is to show that we really do have <math>\tilde{U}(c) \ll U(0)</math> in the region of interest. For now, absent mathematical proof, we will simply plot the behavior for the Shannon entropy as <math>N \to \infty</math>.
-What we see is that, while the function diverges, it diverges in a certain "uniform" sense. That is, as <math>N</math> increases, a constant vertical offset is added to <math>U(c)</math>, so that the function blows up to infinity. However, if this vertical offset is corrected for, for example by subtracting <math>U(0)</math>, the resulting curve doesn't seem to grow at all, but rather shrinks in height slightly until it seems to converge. We would like to prove this formally, but for now, we can at least see this from the following plot:
+What we see is that, while the function diverges, it diverges in a certain "uniform" sense. That is, as ''N'' increases, a constant vertical offset is added to <math>U(c)</math>, so that the function blows up to infinity. However, if this vertical offset is corrected for, for example by subtracting <math>U(0)</math>, the resulting curve doesn't seem to grow at all, but rather shrinks in height slightly until it seems to converge. We would like to prove this formally, but for now, we can at least see this from the following plot:
 [[File:ExpUHE-asymptotic-growth.png|800px]]
-In other words, we can see that as <math>N</math> increases, the growth rate of <math>U(0)</math> dwarfs that of <math>\tilde{U}(c)</math>, which does not seem to grow at all.
+In other words, we can see that as ''N'' increases, the growth rate of <math>U(0)</math> dwarfs that of <math>\tilde{U}(c)</math>, which does not seem to grow at all.
 So, this is a fairly weak conjecture to make, given that empirical evidence suggests something much stronger - that not only does it grow more slowly, but that it seems to not grow at all - it converges! In particular, it seems to converge on our analytic continuation from before. However, a strict proof of any of these things would be nice.
@@ Line 710: / Line 702: @@
 [[File:HE_normalization_terms.png|800px]]
-This picture shows how the denominator changes as <math>N</math> increases: you can see that in general, the function is shifted upward, increasing without bound. The thin plots reflect this for N=1000, 5000, 10000, 50000, and 100000, where you can see them increasing.
+This picture shows how the denominator changes as ''N'' increases: you can see that in general, the function is shifted upward, increasing without bound. The thin plots reflect this for N=1000, 5000, 10000, 50000, and 100000, where you can see them increasing.
 You will note that the denominator also looks exactly like unnormalized HE, just upside down. Normalized HE is the quotient of two functions that both look like this, which are slightly different. This quotient produces the usual HE curve, which is flipped upside down relative to the denominator, and which also increases without bound. That all these functions increase without bound is just another way to state that these things generally don't converge as <math>N \to \infty</math>.
-However, look at what happens with our analytic continuation, which is given by the thicker blue line at the bottom. Despite our sequence of finite-<math>N</math> denominator terms increasing on the y-axis, the analytically continued version suddenly "snaps" back to zero. Although the curve shape is roughly the same, the vertical offset is almost completely eliminated when the analytic continuation is done.
+However, look at what happens with our analytic continuation, which is given by the thicker blue line at the bottom. Despite our sequence of finite-''N'' denominator terms increasing on the y-axis, the analytically continued version suddenly "snaps" back to zero. Although the curve shape is roughly the same, the vertical offset is almost completely eliminated when the analytic continuation is done.
 The problem here is that the original HE function was the quotient of two very large, strictly positive functions - the numerator and denominator. However, performing the analytic continuation on each separately has caused both to "snap" back to zero, so that the denominator, while retaining the same shape, now has points where it touches the x-axis. As a result, the quotient of the two will have poles where the denominator is zero.
@@ Line 724: / Line 716: @@
 Those "spikes" are poles where the denominator is zero.
-The problem is that we're really stretching the boundaries of complex analysis with this. With unnormalized HE, we were able to analytically continue the Fourier transform of exp-UHE to obtain a concrete expression in terms of the Riemann zeta function. While complex analysis makes no guarantees on the behavior of the Fourier transform of the analytic continuation of a holomorphic function, we did see the result seemed to converge on exp-UHE in the limit of large <math>N</math> when transforming back from the Fourier domain, confirming empirically that our analytically continued expression seemed to make sense.
+The problem is that we're really stretching the boundaries of complex analysis with this. With unnormalized HE, we were able to analytically continue the Fourier transform of exp-UHE to obtain a concrete expression in terms of the Riemann zeta function. While complex analysis makes no guarantees on the behavior of the Fourier transform of the analytic continuation of a holomorphic function, we did see the result seemed to converge on exp-UHE in the limit of large ''N'' when transforming back from the Fourier domain, confirming empirically that our analytically continued expression seemed to make sense.
 But in the case of "normalized HE," we analytically continued the Fourier transforms of the numerator and denominator, separately, transformed both out of the Fourier domain, and then took the quotient. Complex analysis ''really'' makes no guarantee on the behavior of the quotient of two Fourier transforms of the analytic continuations of holomorphic functions, and in this case the behavior is very strange. A different approach to analytically continuing the expression would be required.
@@ Line 730: / Line 722: @@
 This same principle explains why we plotted the exp of UHE, rather than UHE itself. Were we to take the log of finite UHE, we would be taking the log of a strictly positive function. However, the analytically continued exp-UHE snaps back to the x-axis, so that there are points where the function is zero or even negative. Taking the log of the analytically continued exp-UHE would yield a complex-valued function where it is negative, due to this snapping effect. However, looking at exp-UHE directly has no such problem.
-Finally, it is noteworthy that for <math>a>2</math>, we end up looking at slices of the zeta function for which <math>\Re(z)>1</math>. This is where our original unnormalized HE function should converge as <math>N \to \infty</math>, corresponding to the region where the Riemann zeta function Dirichlet series converges. For these values of <math>a</math>, the exp-UHE ''is'' positive. So, we can take the log again and look at the usual UHE. This can be useful for plotting, since exp-UHE tends to "flatten" out the curve for high values of <math>a</math>, whereas taking the log accentuates the minima and maxima (and more closely resembles the usual HRE).
+Finally, it is noteworthy that for <math>a>2</math>, we end up looking at slices of the zeta function for which <math>\Re(z)>1</math>. This is where our original unnormalized HE function should converge as <math>N \to \infty</math>, corresponding to the region where the Riemann zeta function Dirichlet series converges. For these values of ''a'', the exp-UHE ''is'' positive. So, we can take the log again and look at the usual UHE. This can be useful for plotting, since exp-UHE tends to "flatten" out the curve for high values of ''a'', whereas taking the log accentuates the minima and maxima (and more closely resembles the usual HRE).
 === Interpretation as a New Free Parameter: the Weighting Exponent ===
 In our original derivation of the analytic continuation, we temporarily changed the weighting for rationals from <math>(nd)^{0.5}</math> to some other <math>(nd)^w</math>, with <math>w > 1</math>, for the sake of obtaining a series that converges. We then changed the exponent back to <math>0.5</math>.
-This can be thought of as giving us another free parameter to HE, in addition to <math>s</math> and <math>a</math>: the exponent for the weighting for each rational. That is, although Paul originally derived the <math>(nd)^{0.5}</math> exponent empirically by studying the behavior of mediant-to-mediant HE for Tenney-bounded rationals, there is no reason we can't simply change that exponent to something else. As shown before, so long as that exponent is greater than 1, unnormalized HE will converge in the limit as <math>N \to \infty</math>, and will converge to the same thing whether we are bounding <math>nd < N</math>, <math>\max(n,d) < N</math>, or anything else (see again [https://math.stackexchange.com/questions/2593993/convergence-of-product-of-series-to-zeta-function here]). We can then analytically continue to the case where <math>w < 1</math>.
+This can be thought of as giving us another free parameter to HE, in addition to ''s'' and ''a'': the exponent for the weighting for each rational. That is, although Paul originally derived the <math>(nd)^{0.5}</math> exponent empirically by studying the behavior of mediant-to-mediant HE for Tenney-bounded rationals, there is no reason we can't simply change that exponent to something else. As shown before, so long as that exponent is greater than 1, unnormalized HE will converge in the limit as <math>N \to \infty</math>, and will converge to the same thing whether we are bounding <math>nd < N</math>, <math>\max(n,d) < N</math>, or anything else (see again [https://math.stackexchange.com/questions/2593993/convergence-of-product-of-series-to-zeta-function here]). We can then analytically continue to the case where <math>w < 1</math>.
-If we add this as a third parameter, called <math>w</math>, we can modify our definition of exp-UHE as follows:
+If we add this as a third parameter, called ''w'', we can modify our definition of exp-UHE as follows:
 $$\displaystyle \exp((1-a) \text{UHE}_{a,w}(n)) = \mathcal{F}^{-1}\left\{\overline \phi_a \cdot |\zeta_{w a}|^2\right\}$$
@@ Line 743: / Line 735: @@
 so that our vertical slice of the zeta function is given by $\Re(z) = w \cdot a$.
-=== Equivalence of the Weighting Exponent and <math>a</math> for Generalized Normal Distributions ===
+=== Equivalence of the Weighting Exponent and ''a'' for Generalized Normal Distributions ===
 We get a very interesting result if our spreading distribution is a [https://en.wikipedia.org/wiki/Generalized_normal_distribution generalized normal distribution], which is a family that encompasses both the Gaussian and the Laplace distributions (sometimes referred to as the "Vos curve" in Paul's work).
@@ Line 751: / Line 743: @@
 $$\displaystyle \exp((1-a) \text{UHE}_{a,w}(n)) = \mathcal{F}^{-1}\left\{\overline \phi_a \cdot |\zeta_{w a}|^2\right\}$$
-We can see that, in a sense, the need for both <math>a</math> and <math>w</math> is almost redundant. Their product specifies the vertical slice of the zeta function. If you set <math>w=0.5</math> and <math>a=1</math>, corresponding to the Shannon entropy with <math>\sqrt{nd}</math> weighting, you get the same vertical slice as if you set <math>w=0.25</math> and <math>a=2</math>, corresponding to the collision entropy with <math>\sqrt[4]{nd}</math> weighting: in both cases this is the critical line of the zeta function.
+We can see that, in a sense, the need for both ''a'' and ''w'' is almost redundant. Their product specifies the vertical slice of the zeta function. If you set <math>w=0.5</math> and <math>a=1</math>, corresponding to the Shannon entropy with <math>\sqrt{nd}</math> weighting, you get the same vertical slice as if you set <math>w=0.25</math> and <math>a=2</math>, corresponding to the collision entropy with <math>\sqrt[4]{nd}</math> weighting: in both cases this is the critical line of the zeta function.
 The only reason that these expressions are different is due to the <math>\phi_a</math> above. We had previously defined that as:
@@ Line 757: / Line 749: @@
 $$\displaystyle \phi_a(t) = \mathcal{F}\left\{S(n)^a\right\}(t)$$
-or, the Fourier transform of the spreading distribution, raised to the power of <math>a</math>. So if you hold the product <math>w a</math> as constant, but change the balance of <math>w</math> and <math>a</math>, you will indeed get different results, simply because only the choice of <math>a</math> changes the <math>\phi_a</math>.
+or, the Fourier transform of the spreading distribution, raised to the power of ''a''. So if you hold the product <math>w a</math> as constant, but change the balance of ''w'' and ''a'', you will indeed get different results, simply because only the choice of ''a'' changes the <math>\phi_a</math>.
-However, we get a very neat result if we are using the generalized normal distribution. In that case, if we take the generalized normal distribution to a power <math>a</math>, we get another instance of the same generalized normal distribution. The difference is that the scale parameter (and hence the standard deviation) will be divided by <math>a^{\frac{1}{\beta}}</math>, where <math>\beta</math> is the shape parameter for the distribution (a value of 1 is the Laplace distribution, a value of 2 is the Gaussian distribution, etc.). The whole distribution will also no longer have an integral of 1, since we have also raised the scaling coefficient to a power, but this won't change anything, as it just corresponds to a uniform scaling of the end result.
+However, we get a very neat result if we are using the generalized normal distribution. In that case, if we take the generalized normal distribution to a power ''a'', we get another instance of the same generalized normal distribution. The difference is that the scale parameter (and hence the standard deviation) will be divided by <math>a^{\frac{1}{\beta}}</math>, where <math>\beta</math> is the shape parameter for the distribution (a value of 1 is the Laplace distribution, a value of 2 is the Gaussian distribution, etc.). The whole distribution will also no longer have an integral of 1, since we have also raised the scaling coefficient to a power, but this won't change anything, as it just corresponds to a uniform scaling of the end result.
-In practice, what this means is that if you are using one of the above distributions, and you change <math>a</math>, this is ''equivalent'' to changing the weighting exponent <math>w</math>, and tweaking the standard deviation <math>s</math> according to the above equation.
+In practice, what this means is that if you are using one of the above distributions, and you change ''a'', this is ''equivalent'' to changing the weighting exponent ''w'', and tweaking the standard deviation ''s'' according to the above equation.
-This gives us a very nice interpretation of our <math>a</math> coefficient from HRE: it basically represents the weighting exponent on the rationals, with a corresponding adjustment to the standard deviation. The collision entropy <math>a=2</math> with the standard weighting <math>\sqrt{nd}</math> is totally equivalent to the Shannon entropy <math>a=1</math> with the weighting <math>nd</math> on the rationals, so long as the value of <math>s</math> is adjusted according to the equation above. However, it should be noted that this equivalence only holds for the "unnormalized HRE" given above.
+This gives us a very nice interpretation of our ''a'' coefficient from HRE: it basically represents the weighting exponent on the rationals, with a corresponding adjustment to the standard deviation. The collision entropy <math>a=2</math> with the standard weighting <math>\sqrt{nd}</math> is totally equivalent to the Shannon entropy <math>a=1</math> with the weighting ''nd'' on the rationals, so long as the value of ''s'' is adjusted according to the equation above. However, it should be noted that this equivalence only holds for the "unnormalized HRE" given above.
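The closure property used above can be checked numerically. The following is a minimal sketch, assuming the parametrization from the linked Wikipedia article (zero mean, scale <math>\alpha</math>, shape <math>\beta</math>); the function names are hypothetical and not part of any HE implementation. It raises the density to the power ''a'' and compares it with the density whose scale is <math>\alpha / a^{\frac{1}{\beta}}</math>:

```python
import math


def gnd_pdf(x: float, alpha: float, beta: float) -> float:
    """Zero-mean generalized normal density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def powered_over_rescaled(x: float, alpha: float, beta: float, a: float) -> float:
    """Ratio of pdf(x)**a to the pdf with scale alpha / a**(1/beta).

    If raising to the power a only rescales the distribution, this ratio
    is the same constant for every x.
    """
    powered = gnd_pdf(x, alpha, beta) ** a
    rescaled = gnd_pdf(x, alpha / a ** (1.0 / beta), beta)
    return powered / rescaled


# beta = 1 is the Laplace case, beta = 2 the Gaussian case.
for beta in (1.0, 2.0):
    print(beta, [powered_over_rescaled(x, 1.0, beta, 2.0) for x in (0.1, 0.7, 2.0)])
```

For each shape the printed ratios agree across ''x'', which is the "uniform scaling of the end result" mentioned above: only the width and an overall constant change.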
 === Reduced Rationals Only ===
@@ Line 771: / Line 763: @@
 $$\displaystyle \mathcal{F}\left\{K(n)\right\}(t) = \sum_{j \in J} \frac{e^{i  t \log (j_n/j_d)}}{(j_n \cdot j_d)^{w}}$$
-Now, suppose we want to analytically continue this so that the set <math>J</math> is the set of all reduced rational numbers. We can first do so by starting again with unreduced rationals, but expressing each rational not as <math>\frac{n}{d}</math>, but rather as <math>\frac{n'}{d'} \cdot \frac{c}{c}</math>, where <math>n'</math> and <math>d'</math> are coprime, and <math>c</math> is the gcd of <math>n</math> and <math>d</math>. For example, we would express <math>\frac{6}{4}</math> as <math>\frac{3}{2} \cdot \frac{2}{2}</math>. Doing so, and assuming that we denote the set of unreduced rationals by <math>\mathbb{U}</math>, we get the following equivalent expression of the same convolution kernel above:
+Now, suppose we want to analytically continue this so that the set ''J'' is the set of all reduced rational numbers. We can first do so by starting again with unreduced rationals, but expressing each rational not as <math>\frac{n}{d}</math>, but rather as <math>\frac{n'}{d'} \cdot \frac{c}{c}</math>, where <math>n'</math> and <math>d'</math> are coprime, and ''c'' is the gcd of ''n'' and ''d''. For example, we would express <math>\frac{6}{4}</math> as <math>\frac{3}{2} \cdot \frac{2}{2}</math>. Doing so, and assuming that we denote the set of unreduced rationals by <math>\mathbb{U}</math>, we get the following equivalent expression of the same convolution kernel above:
 $$\displaystyle \mathcal{F}\left\{K(n)\right\}(t) = \sum_{j \in \mathbb{U}} \frac{e^{i  t \log (\frac{j_c j_{n'}}{j_c j_{d'}})}}{(j_c j_{n'} \cdot j_c j_{d'})^{w}} = |\zeta(w+i t)|^2$$
@@ Line 785: / Line 777: @@
 $$\displaystyle |\zeta(w+i t)|^2 = \left[ \sum_{j_c \in \mathbb{N}^+} \frac{1}{{j_c}^{2w}} \right] \cdot \left[ \sum_{j \in \mathbb{Q}} \frac{e^{i  t \log (\frac{j_{n'}}{j_{d'}})}}{(j_{n'} j_{d'})^{w}} \right]$$
-where the left summation now has <math>j_c \in \mathbb{N}^+</math>, the set of strictly positive natural numbers, and the right summation now has <math>j \in \mathbb{Q}</math>, the set of reduced rationals. Note again that the product above yields all unreduced rationals, thanks to the <math>j_c</math>.
+where the left summation now has <math>j_c \in \mathbb{N}^+</math>, the set of strictly positive natural numbers, and the right summation now has <math>j \in \mathbb{Q}</math>, the set of reduced rationals. Note again that the product above yields all unreduced rationals, thanks to the ''j''<sub>''c''</sub>.
 Now, note that that left series is, itself, just another Dirichlet series that converges to the zeta function. We have
@@ Line 797: / Line 789: @@
 This function then becomes our new <math>\mathcal{F}\left\{K(n)\right\}</math>.
-However, you will note that <math>\zeta(2w)</math> is a constant not depending at all on <math>t</math>. As a result, the reduced rational kernel is exactly equal to the unreduced rational kernel, times a constant depending only on <math>w</math>. This means that when we take the inverse Fourier transform and convolve, the result for exp-UHE will likewise be identical, scaled only by a constant.
+However, you will note that <math>\zeta(2w)</math> is a constant not depending at all on ''t''. As a result, the reduced rational kernel is exactly equal to the unreduced rational kernel, times a constant depending only on ''w''. This means that when we take the inverse Fourier transform and convolve, the result for exp-UHE will likewise be identical, scaled only by a constant.
 As a result, we have shown that we get the same exact results for reduced and unreduced rationals, differing only by a multiplicative scaling.
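This factorization can be sanity-checked numerically in the region where the series converges. The sketch below is only an illustration under stated assumptions: it takes <math>w = 2</math> (so that everything converges absolutely), truncates both sums at <math>\max(n,d) \le 300</math>, and uses hypothetical helper names. It compares the unreduced kernel with <math>\zeta(2w)</math> times the reduced kernel at a few values of ''t'':

```python
import math
from math import gcd


def kernel_at(t: float, w: float, limit: int, reduced: bool) -> float:
    """Truncated kernel sum over pairs (n, d) with max(n, d) <= limit.

    The terms for n/d and d/n are complex conjugates, so the full sum is
    real and only the cosine part needs to be accumulated.
    """
    total = 0.0
    for n in range(1, limit + 1):
        for d in range(1, limit + 1):
            if reduced and gcd(n, d) != 1:
                continue  # keep only fractions in lowest terms
            total += math.cos(t * math.log(n / d)) / (n * d) ** w
    return total


def zeta_partial(s: float, terms: int = 100_000) -> float:
    """Partial sum of the Dirichlet series for zeta(s); assumes s > 1."""
    return sum(k ** -s for k in range(1, terms + 1))


w, limit = 2.0, 300
for t in (0.0, 1.0, 5.0):
    unreduced = kernel_at(t, w, limit, reduced=False)
    reduced = kernel_at(t, w, limit, reduced=True)
    # The two printed values should agree up to truncation error.
    print(t, unreduced, zeta_partial(2 * w) * reduced)
```

At <math>t = 0</math> the reduced sum approaches <math>\zeta(2)^2 / \zeta(4) = 5/2</math> as the truncation limit grows. The check only covers <math>w > 1</math>; for smaller ''w'' the identity is meant in the analytically continued sense described above, which a truncated sum cannot verify.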