Harmonic entropy
Harmonic Entropy, sometimes abbreviated as "HE", is a simple model to quantify the extent to which musical chords exhibit various psychoacoustic effects, lumped together in a single construct called psychoacoustic concordance. It was invented by Paul Erlich and developed extensively on the Yahoo! tuning and harmonic_entropy lists. Various later contributions to the model have been made by Steve Martin, Mike Battaglia, Keenan Pepper, and others.
Background
The general workings of the human auditory system lead to a plethora of well-documented and sonically interesting phenomena that can occur when a musical chord is played:
- The perception of partial timbral fusion of the chord into one complex sound
- The appearance of a virtual fundamental pitch in the bass
- Critical band effects, such as timbral beatlessness, compared to mistunings of the chord in the surrounding area
- The appearance of a quick fluttering effect sometimes known as periodicity buzz
There has been much research specifically on the musical implications critical band effects in the literature (e.g. Sethares's work), which are perhaps the psychoacoustic phenomena that readers are most familiar with. However, the modern xenharmonic community has displayed immense interest in exploring the other effects mentioned above as well, which have proven extremely important to the development of modern xenharmonic music.
These effects sometimes behave differently, and do not always appear strictly in tandem with one another. For instance, Paul Erlich has noted that most models for beatlessness measure 10:12:15 and 4:5:6 as being identical, whereas the latter exhibits more timbral fusion and a more salient virtual fundamental than the former.
However, suppose we want to come up with a combined measure for how often effects such as the above tend to occur. It is then useful to note that:
- In general, effects such as these tend to appear most strongly for those chords with large subsets that correspond to simple chunks of the harmonic series
- In general, the effects produced exhibit some degree of tolerance for mistuning
This enables us to speak of a general notion of the psychoacoustic concordance of an interval - the degree to which effects such as the above will appear when an arbitrary musical chord is played. Additionally, chords which are very inharmonic often exhibit a quality known as psychoacoustic discordance.
While psychoacoustic concordance is not a feature universal to all styles of music, it has been utilized significantly in Western music in the study of intonation. For instance, flexible-pitch ensembles operating within 12-EDO, such as barbershop quartets and string ensembles, will often adjust intonationally from the underlying 12-EDO reference to maximize the concordance of individual chords. Indeed, the entire history of Western tuning theory -- from meantone temperament, to the various Baroque well-temperaments, to 12-EDO itself, to the modern theory of regular temperament -- can be seen as an attempt to reason mathematically about how to generate manageable tuning systems that will maximize concordance and minimize discordance.
The Harmonic Entropy model is a simple way of quantifying how much an arbitrary chord will exhibit psychoacoustic concordance.
Concordance has often been confused with actual musical consonance, an unfortunate fact made more common by the psychoacoustics literature under the unfortunate name sensory consonance, most often used to refer to phenomena related to roughness and beatlessness specifically. This is not to be confused with the more familiar construct of tonal stability, typically just called "consonance" in Western common practice music theory and sometimes clarified as "musical consonance" in the music cognition literature. To make matters worse, the literature has also at times referred to concordance -- and not tonal stability -- as tonal consonance, often referring to phenomena related to virtual pitch integration, creating a complete terminological mess. As a result, the term "consonance" has been completely avoided in this article.
Basic Model: Shannon Entropy
The original Harmonic Entropy model limited itself to working with dyads. More recently, work by Steve Martin and others has extended this basic idea to higher-cardinality chords. This article will concern itself with dyads, as the dyadic case is still the most well-developed, and many of the ideas extend naturally to larger chords without need for much exposition.
The general idea of Harmonic Entropy is to first develop a discrete probability distribution quantifying how strongly an arbitrary incoming dyad "matches" every element in a set of basis rational intervals, and then seeing how evenly distributed the resulting probabilities are. If the distribution for some dyad is spread out very evenly, such that there is no clear "victor" basis interval that dominates the distribution, the dyad is considered to be more discordant; on the other extreme, if the distribution tends to concentrate on one or a small set of dyads, the dyad is considered to be more concordant.
A clear mathematical way of quantifying this "dispersion" is via the Shannon entropy of the probability distribution, which can be thought of as describing the "uncertainty" in the distribution. A distribution which has a very high probability of picking one outcome has low entropy and is not very uncertain, whereas a distribution which has the probability spread out on many outcomes is highly uncertain and has a high entropy.
Definitions
To formalize our notion of Shannon entropy, we will first describe the random variable [math]\displaystyle{ J }[/math], representing the set of JI "basis" intervals that our incoming interval is being "matched" to, and the parameter [math]\displaystyle{ C }[/math], representing the "cents" of the incoming interval being played. For example, the interval [math]\displaystyle{ C }[/math] would take values such as "400 cents," and the interval [math]\displaystyle{ J }[/math] would take values in the set of basis ratios, such as "5/4" or "9/7."
So for example, if we want to express the probability that the incoming dyad "400 cents" is perceived as the JI basis interval "5/4," we would write that as the conditional probability [math]\displaystyle{ \newcommand{\cent}{\text{¢}} }[/math] [math]\displaystyle{ P(J=5/4|C=400\cent) }[/math]
Or, in general, if we want to write the conditional probability that some incoming dyad of [math]\displaystyle{ c }[/math] cents is perceived as the JI basis interval [math]\displaystyle{ j }[/math], we would write that as
[math]\displaystyle{ P(J=j|C=c) }[/math]
Note that at this point, we haven't yet specified what the particular probability distribution is. There are different ways to do this, which are described in more detail below. Generally, most approaches involve each JI interval's probability being assigned based on how close it is to [math]\displaystyle{ c }[/math] (closer dyads are given a larger probability), and how simple it is (simple dyads are given a higher probability, if distance is the same).
A noteworthy point is that we generally do not assume any probability distribution on [math]\displaystyle{ C }[/math]. This reflects that we do not make any assumptions at all about which notes or intervals are likely to be played to begin with. In other words, we are treating [math]\displaystyle{ C }[/math] more as a "parameter" rather than as a random variable.
Once we have decided on a probability distribution, we can finally evaluate the Shannon entropy. For a random variable [math]\displaystyle{ X }[/math], the Shannon entropy is defined as:
[math]\displaystyle{ H(X) = -\sum_{x \in X} P(X=x) \log_b P(X=x) }[/math]
where the different [math]\displaystyle{ x }[/math] are taken from the sample space of [math]\displaystyle{ X }[/math], and [math]\displaystyle{ b }[/math] is the base of the log. Different choices of [math]\displaystyle{ b }[/math] simply change the units in which entropy is given, the most common values being 2 and e, denoting "bits" and "nats". We will omit the base going forward, for simplicity.
In our case, we want to find the entropy of the random variable [math]\displaystyle{ J }[/math] of JI intervals, given a particular choice of incoming dyad in cents. The corresponding quantity that we want is:
[math]\displaystyle{ H(J|C=c) = -\sum_{j \in J} P(J=j|C=c) \log P(J=j|C=c) }[/math]
Note that above, the summation is only taken on the [math]\displaystyle{ j }[/math] from the sample space of [math]\displaystyle{ JI }[/math] (i.e. the set of JI basis intervals), whereas the parameter [math]\displaystyle{ c }[/math] is treated as constant within the summation (and is taken as the free parameter to the function).
Since the parameter [math]\displaystyle{ c }[/math] is the free parameter, sometimes the above is notated as
[math]\displaystyle{ \text{HE}(c) = H(J|C=c) }[/math]
which makes more explicit that [math]\displaystyle{ c }[/math] is the argument to the harmonic entropy function, which is equal to the entropy of [math]\displaystyle{ J }[/math], conditioned on [math]\displaystyle{ C=c }[/math].
Probability Distributions
In order to systematically assign a probability distribution to this dyad, we first start by defining a spreading function, denoted by [math]\displaystyle{ S(x) }[/math], that dictates how the dyad is "smeared" out in log-frequency space, representing how the auditory system allows for some tolerance for mistuning. The typical choice that we will assume here for a spreading function is a Gaussian distribution, with mean centered around the incoming dyad, and standard deviation typically taken as a free parameter in the system and denoted as [math]\displaystyle{ s }[/math].
A fairly typical choice of settings for a basic dyadic HE model would be:
- The basis set is all those rationals bounded by some maximum Tenney height, with the bound typically notated as [math]\displaystyle{ N }[/math] and set to at least 10,000.
- The spreading function is typically a Gaussian distribution with a frequency deviation of 1% either way, or about s=~17 cents.
Other spreading functions have also been explored, such as the use of the heavy-tailed Laplace distribution, sometimes described as the "Vos function" in Paul's writings. These two functions are part of the Generalized normal distribution family, which has a parameter not only for the variance but for the kurtosis. However, for simplicity, we will assume the Gaussian distribution as the spreading function for the remainder of this article, so that the spreading function for an incoming dyad [math]\displaystyle{ c }[/math] can be written as follows:
[math]\displaystyle{ S(x-c) = \frac{1}{s\sqrt{2\pi}} e^{-\frac{(x-c)^2}{2s^2}} }[/math]
where the notation [math]\displaystyle{ S(x-c) }[/math] is chosen to make clear that we are translating [math]\displaystyle{ S(x) }[/math] to be centered around the incoming dyad [math]\displaystyle{ c }[/math], which is now the mean of the Gaussian.
We assume here that the variable [math]\displaystyle{ x }[/math] is a dummy variable representing cents, and will adopt this convention for the remainder of the article.
In this notation, [math]\displaystyle{ s }[/math] becomes the standard deviation of the Gaussian, being an ASCII-friendly version of the more familiar symbol [math]\displaystyle{ \sigma }[/math] for representing the standard deviation. Note that in previous expositions on Harmonic Entropy, [math]\displaystyle{ s }[/math] was sometimes given in units representing a percentage of linear-frequency deviation; we allow [math]\displaystyle{ s }[/math] to stand for cents here to simplify the notation. To convert from a percentage to cents, the formula [math]\displaystyle{ \text{cents} = 1200\log_2(1+\text{percentage}) }[/math] can be used.
It is also common to use as a basis set all those rationals bounded by some maximum Weil height, with a typical cutoff for [math]\displaystyle{ N }[/math] set to at least 100. This has sometimes been referred to as seeding HE with the "Farey sequence of order [math]\displaystyle{ N }[/math]" and its reciprocals, so references in Paul's work to "Farey series HE" vs "Tenney series HE" are sometimes seen.
Lastly, the set of rationals is often chosen to be only those "reduced" rationals within the cutoff, such that [math]\displaystyle{ n/d }[/math] is in the set only if [math]\displaystyle{ n }[/math] and [math]\displaystyle{ d }[/math] are coprime. HE can also be formulated with unreduced rationals as well. Both methods tend to give similar results. In Paul's work, reduced rationals are most common, although the use of unreduced rationals may be useful in extending HE to the case where [math]\displaystyle{ N=\infty }[/math].
Given a spreading function and set of basis rationals, there are two different procedures commonly used to assign probabilities to each rational. The first, the domain-integral approach, works for arbitrary nowhere dense sets of rationals without any further free parameters. The second, the complexity-normalization approach, has nice mathematical properties which sometimes make it easier to compute and which may lead to generalizations to infinite sets of rationals which are sometimes dense in the reals. It is conjectured that there are certain important limiting situations where the two converge; both are described in detail below.
Domain-Integral Probabilities
For discrete sets of JI basis ratios, the log-frequency spectrum can be divided up into domains assigned to each ratio. Each ratio is assigned a domain with lower bound equal to the mediant of itself and its nearest lower neighbor, and likewise with upper bound equal to the mediant of itself and its nearest upper neighbor. If no such neighbor exists, [math]\displaystyle{ \pm \infty }[/math] is used instead. Mathematically, this can be represented via the following expression:
[math]\displaystyle{ P(J=j|C=c) = \int_{\cent(j_l)}^{\cent(j_u)} S(x-c) dx }[/math]
where [math]\displaystyle{ S(x-c) }[/math] is the spreading function associated with c, [math]\displaystyle{ j_l }[/math] and [math]\displaystyle{ j_u }[/math] are the domain lower and upper bounds associated with JI basis ratio [math]\displaystyle{ j }[/math], and [math]\displaystyle{ \cent(f) = 1200\log_2(f) }[/math], or the "cents" function converting frequency ratios to cents. Typically, [math]\displaystyle{ j_l }[/math] is set equal to the mediant of [math]\displaystyle{ j }[/math] and its nearest lower neighbor (if it exists), or [math]\displaystyle{ -\infty }[/math] if not; likewise with [math]\displaystyle{ j_u }[/math] and its nearest upper neighbor.
This process can be summarized by the following picture, taken from William Sethares' paper on Harmonic Entropy:
Note the difference in terminology here - in this example, the [math]\displaystyle{ f_{j+n} }[/math] are the basis ratios, the [math]\displaystyle{ r_{j+n} }[/math] are the domains for each basis ratio, and the bounds for each domain are the mediants between each [math]\displaystyle{ f_{j+n} }[/math] and its nearest neighbor. The probability assigned to each basis ratio is then the area under the spreading function curve for each ratio's domain. The entropy of this probability distribution is then the Harmonic Entropy for that dyad.
In the case where the set of basis rationals consists of a finite set bounded by Tenney or Weil height, the resulting set of widths is conjectured to have interesting mathematical properties, leading to mathematically nice conceptual simplifications of the model. These simplifications are explained below.
Complexity-Normalization Probabilities
It has been noted empirically by Paul Erlich that, given all those rationals with Tenney height under some cutoff [math]\displaystyle{ N }[/math] as a basis set, that the domain widths for rationals sufficiently far from the cutoff seem to be proportional to [math]\displaystyle{ \frac{1}{\sqrt{nd}} }[/math].
While it's still an open conjecture that this pattern holds for arbitrarily large [math]\displaystyle{ N }[/math], the assumption is sometimes made that this is the case, and hence that for these basis ratio sets, [math]\displaystyle{ \frac{1}{\sqrt{nd}} }[/math] "approximations" to the width are sufficient to estimate domain-integral Harmonic Entropy.
This modifies the expression for the probabilities [math]\displaystyle{ P(J=j|C=c) }[/math] as follows, noting that for now the "probabilities" won't sum to 1:
[math]\displaystyle{ \hat P(J=j|C=c) = \frac{S(\cent(j)-c)}{\sqrt{j_n \cdot j_d}} }[/math]
where the [math]\displaystyle{ \hat P }[/math] notation now represents that these "probabilities" are unnormalized, and [math]\displaystyle{ j_n }[/math] and [math]\displaystyle{ j_d }[/math] are the numerator and denominator, respectively, of JI basis ratio [math]\displaystyle{ j }[/math]. Again, the set of basis rationals here is assumed to be all of those rationals of Tenney Height ≤ [math]\displaystyle{ N }[/math] for some [math]\displaystyle{ N }[/math].
A similar observation for the use of Weil-bounded subsets of the rationals suggests domain widths of [math]\displaystyle{ \frac{1}{\max(n,d)} }[/math], yielding instead the following formula:
[math]\displaystyle{ \hat P(J=j|C=c) = \frac{S(\cent(j)-c)}{\max(j_n, j_d)} }[/math]
where this time the set of basis rationals is assumed to be all of those of Weil Height ≤ [math]\displaystyle{ N }[/math] for some [math]\displaystyle{ N }[/math].
In both cases, the general approach is the same: the value of the spreading function, taken at the value of [math]\displaystyle{ \cent(j) }[/math], is divided by some sort of "complexity" function representing how much weight is given to that rational number. While the two complexity functions considered thus far were derived empirically by observing the asymptotic behavior of various height-bounded subsets of the rationals, we can generalize this for arbitrary basis sets of rationals and arbitrary complexities as follows:
[math]\displaystyle{ \hat P(J=j|C=c) = \frac{S(\cent(j)-c)}{\|j\|} }[/math]
where [math]\displaystyle{ \|j\| }[/math] denotes a complexity function mapping from rational numbers to non-negative reals.
As these "probabilities" don't sum to 1, the result is not a probability distribution at all, invalidating the use of the Shannon Entropy. To rectify this, the distribution is normalized so that the probabilities do sum to 1:
[math]\displaystyle{ P(J=j|C=c) = \frac{\hat P(J=j|C=c)}{\sum_{j \in J} \hat P(J=j|C=c)} }[/math]
which is equal to the unnormalized probability, divided by the sum of all unnormalized probabilities. This definition of [math]\displaystyle{ P(J=j|C=c) }[/math] is then used directly to compute the entropy.
This approach to assigning probabilities to basis rationals is useful because it hypothetically makes it possible to consider the HE of sets of rationals which are dense in the reals, or even the entire set of positive rationals, although the best way to do this is a subject of ongoing research.
Examples
In all of these examples, the x-axis represents the width in cents of the dyad, and the y-axis represents discordance rather than concordance, measured in nats of Shannon entropy.
s=17, N<10000, sqrt(n*d) weights
This uses as a spreading function the Gaussian distribution with [math]\displaystyle{ s=~17\cent }[/math] (or a lin-frequency deviation of 1%). The basis set is all rationals of Tenney height less than 10,000. This uses the complexity-normalization approach, and the complexity function is [math]\displaystyle{ \sqrt{nd} }[/math]:
s=17, N<100, max(n,d) weights
This example uses the same spreading function and standard deviation, but this time the basis set is all rationals of Weil height less than 100. The complexity function here is [math]\displaystyle{ \max(n,d) }[/math]:
s=17, N<10000, sqrt(n*d) vs mediant-to-mediant weights
The following image (from Paul Erlich) compares the domain-integral and complexity-normalization approaches by overlaying the two curves on top of each other. In both cases, the spreading function is again a Gaussian with s=~17 cents, and the basis set is all those rationals with Tenney height ≤ 10000. It can be seen that the curves are extremely similar, and that the locations of the minima and maxima are largely preserved:
Harmonic Rényi Entropy
An extension to the base Harmonic Entropy model, proposed by Mike Battaglia, is to generalize the use of Shannon entropy by replacing it instead with Rényi entropy, a q-analog of Shannon's original entropy. This can be thought of as adding a second parameter, called [math]\displaystyle{ a }[/math], to the model, reflecting how "intelligent" the brain's "decoding" process is when determining the most likely JI interpretation of an ambiguous interval.
Definitions and Background
The Harmonic Rényi Entropy of order a of an incoming dyad can be defined as follows:
[math]\displaystyle{ \text{HE}_a(c) = H_a(J=j|C=c) = \frac{1}{1-a} \log \sum_{j \in J} P(J=j|C=c)^a }[/math]
Being a q-analog, it is noteworthy that Rényi entropy converges to Shannon entropy in the limit as [math]\displaystyle{ a \to 1 }[/math], a fact which can be verified using L'Hôpital's rule as found here.
The Rényi entropy has found use in cryptography as a measure of the strength of a cryptographic code in the face of an intelligent attacker, an application for which Shannon entropy has long been known to be insufficient as described in this paper and this RFC. More precisely, the Rényi entropy of order [math]\displaystyle{ \infty }[/math], also called the min-entropy, is used to measure the strength of the randomness used to define a cryptographic secret against a "worst-case" attacker who has complete knowledge of the probability distribution from which cryptographic secrets are drawn.
In a musical context, by considering the incoming dyad as analogous to a cryptographic code which is attempting to be "cracked" by an intelligent auditory system, we can consider that the analogous "worst-case attacker" would be a "best-case auditory system" which has complete awareness of the probability distribution for any incoming dyad. This analogy would view such an auditory system as actively attempting to choose the most probable rational, rather than drawing a rational at random weighted by the distribution.
The use of \math>a=∞</math> min-entropy would reflect this view. In contrast, the use of [math]\displaystyle{ a=1 }[/math] Shannon entropy reflects a much "dumber" process which performs no such analysis and perhaps doesn't even seek to "choose" any sort of "victor" rational at all. As the parameter a interpolates between these two options, it can be interpreted as the extent to which the rational-matching process for incoming dyads is considered to be "intelligent" and "active" in this way.
Some psychoacoustic effects naturally fit into this paradigm, such as the virtual pitch integration process, which actually does attempt to find a single victor when matching incoming chords with chunks of the harmonic series. Other psychoacoustic effects, such as that of beatlessness, may instead be better viewed as "dumb" processes whereby nothing in particular is being "chosen," but where a more uniform distribution of matching rational numbers for a dyad simply generates a more discordant sonic effect. Different values of a can differentiate between the predominance given to these two types of effect in the overall construct of psychoacoustic concordance.
Certain values of [math]\displaystyle{ a }[/math] reduce to simpler expressions and have special names, as given in the examples below.
Examples
a=0: Harmonic Hartley Entropy
[math]\displaystyle{ H_0(J|C=c) = \log |J| }[/math]
where [math]\displaystyle{ |J| }[/math] is the cardinality of the set of basis rationals. This assumes, in essence, an "infinitely dumb" auditory system which can do no better than picking a rational number from a uniform distribution completely at random. All dyads have the same Harmonic Hartley Entropy. The Hartley Entropy is sometimes called the "max-entropy," and is useful mainly as an upper bound on the other forms of entropy: all Rényi Entropies are always guaranteed to be less than the Hartley Entropy.
Harmonic Hartley Entropy (a=0) with the basis set all rationals with Tenney height ≤ 10000. Note that the choice of spreading function makes no difference in the end result at all.
a=1: Harmonic Shannon Entropy (Harmonic Entropy)
[math]\displaystyle{ H_1(J|C=c) = -\sum_{j \in J} P(J=j|C=c) \log P(J=j|C=c) }[/math]
This is Paul's original Harmonic Entropy. Within the cryptographic analogy, this can be thought of as an auditory system which simply selects a rational at random from the incoming distribution, weighted via the distribution itself.
Harmonic Shannon Entropy (a=1) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and [math]\displaystyle{ \sqrt{nd} }[/math] complexity.
a=2: Harmonic Collision Entropy
[math]\displaystyle{ H_2(J=j|C=c) = -\log \sum_{j \in J} P(J=j|C=c)^2 = -\log (J_1 = J_2|C=c) }[/math]
where [math]\displaystyle{ J_1 }[/math] and [math]\displaystyle{ J_2 }[/math] are two independent and identically distributed random variables of JI basis ratios, conditioned on the same incoming dyad [math]\displaystyle{ C=c }[/math], and the collision entropy is the same as the negative log of the probability that the two JI variables produce the same outcome.
Harmonic Collision Entropy (a=2) with the basis set all rationals with Tenney height ≤ 10000, spreading function a Gaussian distribution with s=1% (~17 cents), and [math]\displaystyle{ \sqrt{nd} }[/math] complexity.
a=∞: Harmonic Min-Entropy
[math]\displaystyle{ H_\infty(J=j|C=c) = -\log \max_{j \in J} P(J=j|C=c) }[/math]
This is the min-entropy, which simply takes the negative log of the largest probability in the distribution. This can be thought of as representing the "strength" of the incoming dyad from being "deciphered" by a "best-case" auditory system. The name "min-entropy" reflects that the [math]\displaystyle{ a=\infty }[/math] case is guaranteed to be a lower bound among all Rényi entropies.
Harmonic Rényi Entropy with a=7, with the high value of a being chosen to approximate min-entropy (a=∞). The basis set is still all rationals with Tenney height ≤ 10000, the spreading function a Gaussian distribution with s=1% (~17 cents), and the complexity function [math]\displaystyle{ \sqrt{nd} }[/math].
Convolution-Based Expression For Quickly Computing Rényi Entropy
Below is given an derivation that expresses Harmonic Rényi Entropy in terms of two simpler functions, each of which is a convolution product and hence can be computed quickly using the Fast Fourier Transform.
The below derivation depends on the use of complexity-normalization probabilities, although it may be possible to extend to domain-integral probabilities instead.
Preliminaries
The Harmonic Rényi Entropy is defined as
[math]\displaystyle{ \text{HE}_a(c) = H_a(J=j|C=c) = \frac{1}{1-a} \log \sum_{j \in J} P(J=j|C=c)^a }[/math]
As before, we can write [math]\displaystyle{ P(J=j|C=c) }[/math] as follows:
[math]\displaystyle{ P(J=j|C=c) = \frac{\hat P(J=j|C=c)}{\sum_{j \in J} \hat P(J=j|C=c)} }[/math]
where [math]\displaystyle{ \hat P(J=j|C=c) }[/math] is the "unnormalized" probability, and the denominator above is the sum of these unnormalized probabilities, so that all of the [math]\displaystyle{ P(J=j|C=c) }[/math] sum to 1.
To simplify notation, we first rewrite the denominator as a "normalization" function:
[math]\displaystyle{ \psi(c) = \sum_{j \in J} \hat P(J=j|C=c) }[/math]
and putting back into the original equation, we get
[math]\displaystyle{ H_a(J=j|C=c) = \frac{1}{1-a} \log \left( \sum_{j \in J} \left( \frac{\hat P(J=j|C=c)}{\psi(c)} \right)^a \right) }[/math]
Since [math]\displaystyle{ \psi(c) }[/math] is the same for each basis ratio, we can pull it out of the summation to obtain:
[math]\displaystyle{ H_a(J=j|C=c) = \frac{1}{1-a} \log \left( \frac{\sum_{j \in J} \hat P(J=j|C=c)^a}{\psi(c)^a} \right) }[/math]
To simplify notation further, we can also rewrite the numerator, the sum of "raw" (unnormalized) pseudo-probabilities, as a function:
[math]\displaystyle{ \rho_a(c) = \sum_{j \in J} \hat P(J=j|C=c)^a }[/math]
Finally, we put this all together to obtain a simplified version of the Harmonic Rényi Entropy equation:
[math]\displaystyle{ \text{HE}_a(c) = H_a(J=j|C=c) = \frac{1}{1-a} \log \left( \frac{\rho_a(c)}{\psi(c)^a} \right) }[/math]
We thus reduce the term inside the logarithm to the quotient of the functions [math]\displaystyle{ \rho_a(c) }[/math] and [math]\displaystyle{ \psi(c) }[/math]. Our aim is now to express each of these two functions in terms of a convolution product.
Convolution product for [math]\displaystyle{ \psi(c) }[/math]
[math]\displaystyle{ \psi(c) }[/math], the normalization function, is written as follows:
[math]\displaystyle{ \psi(c) = \sum_{j \in J} \hat P(J=j|C=c) }[/math]
Again, [math]\displaystyle{ \hat P(J=j|C=c) }[/math] is defined as follows:
[math]\displaystyle{ \hat P(J=j|C=c) = \frac{S(\cent(j)-c)}{\|j\|} }[/math]
We can rewrite the above equation as a convolution with a delta distribution:
[math]\displaystyle{ \hat P(J=j|C=c) = \left(S \ast \frac{\delta_{-\cent(j)}}{\|j\|}\right)(-c) }[/math]
Putting this back into the original summation, we obtain
[math]\displaystyle{ \psi(c) = \sum_{j \in J} \left(S \ast \frac{\delta_{-\cent(j)}}{\|j\|}\right)(-c) }[/math]
We note that the left factor in the convolution product is always the same [math]\displaystyle{ S(-c) }[/math], which is not dependent on [math]\displaystyle{ j }[/math] in any way. Since convolution distributes over addition, we can factor the [math]\displaystyle{ S }[/math] out of the summation to obtain
[math]\displaystyle{ \psi(c) = \left[S \ast \left(\sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|}\right)\right](-c) }[/math]
We can clean up this notation by defining the auxiliary distribution K:
[math]\displaystyle{ K(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|} }[/math]
Which leaves us with the final expression:
[math]\displaystyle{ \psi(c) = \left[S \ast K\right](-c) }[/math]
Convolution product for [math]\displaystyle{ \rho_a(c) }[/math]
The derivation for [math]\displaystyle{ \rho_a(c) }[/math] proceeds similarly. Recall the function is written as follows:
[math]\displaystyle{ \rho_a(c) = \sum_{j \in J} \hat P(J=j|C=c)^a }[/math]
The expression for each [math]\displaystyle{ \hat P(J=j|C=c)^a }[/math] is:
[math]\displaystyle{ \hat P(J=j|C=c)^a = \frac{S(\cent(j)-c)^a}{\|j\|^a} }[/math]
We can again express this as a convolution, this time of the function [math]\displaystyle{ S^a(-c) }[/math], meaning the spreading function S taken to the a'th power, and a delta distribution:
[math]\displaystyle{ \hat P(J=j|C=c)^a = \left(S^a \ast \frac{\delta_{-\cent(j)}}{\|j\|^a}\right)(-c) }[/math]
Putting this back into the original summation and factoring as before, we obtain
[math]\displaystyle{ \rho_a(c) = \left[S^a \ast \left(\sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|^a}\right)\right](-c) }[/math]
And again we clean up notation by defining the auxiliary distribution
[math]\displaystyle{ K^a(c) = \sum_{j \in J} \frac{\delta_{-\cent(j)}}{\|j\|^a} }[/math]
so that
[math]\displaystyle{ \rho_a(c) = \left[S^a \ast K^a\right](-c) }[/math]
We have now succeeded in representing [math]\displaystyle{ \rho_a(c) }[/math] as a convolution.
Note that the function [math]\displaystyle{ K^a(c) }[/math] involves a slight abuse of notation, as it is not literally [math]\displaystyle{ K(c) }[/math] taken to the [math]\displaystyle{ a }[/math]'th power (as the square of the delta distribution is undefined). Rather, we are simply taking the weights of each delta distribution in the summation to the [math]\displaystyle{ a }[/math]'th power.
Round-up
Taking all of this, we can rewrite the original expression for Harmonic Rényi Entropy as follows:
[math]\displaystyle{ \text{HE}_a(c) = H_a(J=j|C=c) = \frac{1}{1-a} \log \left( \frac{\left[S^a \ast K^a\right](-c)}{\left[S \ast K\right]^a(-c)} \right) }[/math]
where the expression
[math]\displaystyle{ \left[S \ast K\right]^a(-c) }[/math]
represents the convolution of [math]\displaystyle{ S }[/math] and </math>K, taken to the [math]\displaystyle{ a }[/math]'th power, and flipped backwards. Note that if [math]\displaystyle{ S(x) }[/math] is a symmetrical (even) spreading function, and if for each ratio [math]\displaystyle{ n/d }[/math] in [math]\displaystyle{ J }[/math], if the inverse [math]\displaystyle{ d/n }[/math] is also in [math]\displaystyle{ J }[/math], then the above convolution will also be symmetrical, and we also have
[math]\displaystyle{ \left[S \ast K\right]^a(-c) = \left[S \ast K\right]^a(c) }[/math]
We have succeeded in representing Harmonic Rényi Entropy in simple terms of two convolution products, each of which can be computed in [math]\displaystyle{ O(N log N) }[/math] time.
Extending HE to N=∞
All of the models described above involve a finite set of rational numbers, bounded by some complexity function, and where the complexity is less than some max value [math]\displaystyle{ N }[/math].
It so happens that, with one technical caveat, we are generally able to analytically continue this definition to the situation where [math]\displaystyle{ N=\infty }[/math]. More precisely, we are able to analytically continue the exponential of HE, which yields the same relative interval rankings as standard HE.
This enables us to speak cognizantly of the harmonic entropy of an interval as measured against all rational numbers.
Background: Unnormalized Entropy
The caveat is that our derivation only analytically continues the entropy function for the "unnormalized" set of probabilities, which we previously wrote as [math]\displaystyle{ \hat P(J=j|C=c) }[/math]. For this definition to be philosophically perfect, we would want to analytically continue the entropy function for the normalized sense of probabilities, previously written as [math]\displaystyle{ P(J=j|C=c) }[/math].
However, in practice, the "unnormalized entropy" appears to be an extremely good approximation to the normalized entropy for large values of [math]\displaystyle{ N }[/math]. The resulting curve has the same minima and maxima as HE, the same general shape, and for all intents and purposes looks exactly like HE, just shifted on the y-axis.
Here are some examples for different values of [math]\displaystyle{ s }[/math]. All of these are Shannon HE ([math]\displaystyle{ a=1 }[/math]), using [math]\displaystyle{ \sqrt{nd} }[/math] weights, with unreduced rationals (more on this below), with the bound that [math]\displaystyle{ nd < 1000000 }[/math], just with different values of [math]\displaystyle{ s }[/math]. All have been scaled so that the minimum entropy is 0, and the maximum entropy is 1:
As you can see, the unnormalized version is extremely close to a linear function of the normalized one. A similar situation holds for larger values of [math]\displaystyle{ a }[/math]. The Pearson correlation coefficient of "rho" is also given, and is typically very close to 1 - for example, for [math]\displaystyle{ s=1% }[/math], it's equal to 0.99922. The correlation also seems to get better with increasing values of [math]\displaystyle{ N }[/math], such that the correlation for N=1,000,000 (shown above) is much better than the one for N=10,000 (not pictured).
In the above examples, note that there are slightly adjusted values of [math]\displaystyle{ s }[/math] (usually by less than a cent) between the normalized and unnormalized comparisons for each plot. For example, in the plot for [math]\displaystyle{ s=1% }[/math], corresponding to 17.2264 cents, we compare to a slightly adjusted UHE of 16.4764 cents. This is because, empirically, sometimes a very slight adjustment corresponds to a better correlation coefficient, suggesting that the UHE may be equivalent to the HE with a miniscule adjustment in the value of [math]\displaystyle{ s }[/math].
It would be nice to show the exact relationship of unnormalized entropy to the normalized entropy in the limit of large [math]\displaystyle{ N }[/math], and whether the two converge to be exactly equal (perhaps given some miniscule adjustment in [math]\displaystyle{ s }[/math] or [math]\displaystyle{ a }[/math]). However, we will leave this for future research, as well as the question of how to do an exact derivation of normalized HE.
For now, we will start with a derivation of the unnormalized entropy for [math]\displaystyle{ N=\infty }[/math], as an interesting function worthy of study in its own right - not only because it looks exactly like HE, but because it leads to an expression for unnormalized HE in terms of the Riemann Zeta function.
Derivation
For now, our derivation is limited to the case of [math]\displaystyle{ \sqrt{nd} }[/math] Tenney-weighted rationals, although it may be possible to derive a similar result for [math]\displaystyle{ \max(n,d) }[/math] weighting as well.
Analytic Continuation of the Convolution Kernel
Let's start by recalling the convolution-based expression for Harmonic Rényi Entropy, using complexity-normalization probabilities:
[math]\displaystyle{ \text{HE}_a(c) = H_a(J=j|C=c) = \frac{1}{1-a} \log \left( \frac{\left[S^a \ast K^a\right](-c)}{\left[S \ast K\right]^a(-c)} \right) }[/math]
To recall again, [math]\displaystyle{ S }[/math] is our spreading function, and [math]\displaystyle{ K }[/math] is our convolution kernel as defined above.
The definition for [math]\displaystyle{ K }[/math] is:
[math]\displaystyle{ \displaystyle K(c) = \sum_{j \in J} \dfrac{\delta_{-\cent(j)}}{\|j\|} }[/math]
where [math]\displaystyle{ \|j\| }[/math] represents the "complexity" of the JI basis ratio [math]\displaystyle{ j }[/math]. In the particular case of Tenney weighting, we get:
[math]\displaystyle{ \displaystyle K(c) = \sum_{j \in J} \dfrac{\delta_{-\cent(j)}}{\sqrt{j_n \cdot j_d}} }[/math]
where [math]\displaystyle{ j_n }[/math] and [math]\displaystyle{ j_d }[/math] are the numerator and denominator of [math]\displaystyle{ j }[/math], respectively.
Although it may seem odd, we can take the Fourier transform of the above to obtain the following expression:
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(c)\right\}(\omega) = \sum_{j \in J} \dfrac{e^{i \omega \cent(j)}}{\sqrt{j_n \cdot j_d}} }[/math]
Furthermore, for simplicity, we can change the units, so that rather than the argument being given in cents, it is given in "natural" units of "nepers", a technique often used by Martin Gough in his work on Logarithmic approximants. The representation of any interval in nepers is given by simply taking is natural logarithm. Doing so, by defining the change of variables [math]\displaystyle{ c = \frac{1200}{\log(2)}n }[/math], we obtain
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \sum_{j \in J} \dfrac{e^{i \omega \log (j_n/j_d)}}{\sqrt{j_n \cdot j_d}} }[/math]
We can treat the presence of the logarithm within the exponential function as changing the base of the exponential, so that we get
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \sum_{j \in J} \dfrac{(j_n/j_d)^{i \omega}}{\sqrt{j_n \cdot j_d}} }[/math]
We can also factor each term in the summation to obtain
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \sum_{j \in J} \left[ \dfrac{{j_n}^{i \omega}}{\sqrt{j_n}} \cdot \dfrac{{j_d}^{-i \omega}}{\sqrt{j_d}} \right] }[/math]
which we can rewrite as
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \sum_{j \in J} \left[ \dfrac{1}{{j_n}^{0.5 -i \omega}} \cdot \dfrac{1}{{j_d}^{0.5 + i \omega}} \right] }[/math]
Now, we note our summation is currently written simply as [math]\displaystyle{ \sum_{j \in J} }[/math]. For a Tenney height complexity, we typically bound by [math]\displaystyle{ \sqrt{nd} < N }[/math] for some [math]\displaystyle{ N }[/math]. However, although it is unusual, for the sake of simplifying the derivation, we will bound by [math]\displaystyle{ \max(n,d) < N }[/math] instead, despite the use of Tenney height for complexity. This will not end up being much of a problem, as the two will converge on the same result anyway.
Bounding by [math]\displaystyle{ \max(n,d) < N }[/math] is the same as specifying that [math]\displaystyle{ j_n < N }[/math] and [math]\displaystyle{ j_d < N }[/math]. Doing so, we get
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \sum_{1<j_n, j_d<N} \left[ \dfrac{1}{{j_n}^{0.5 -i \omega}} \cdot \dfrac{1}{{j_d}^{0.5 + i \omega}} \right] }[/math]
We can now factor the above product to obtain:
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \left[ \sum_{j_n=1}^N \dfrac{1}{{j_n}^{0.5 -i \omega}} \right] \cdot \left[ \sum_{j_d=1}^N\dfrac{1}{{j_d}^{0.5 + i \omega}} \right] }[/math]
Now, we can see that as [math]\displaystyle{ N \to \infty }[/math] above, the summations do not converge. However, incredibly enough, each of the above expressions has a very well-known analytic continuation, which is the Riemann zeta function.
To perform the analytic continuation, we temporarily change the [math]\displaystyle{ 0.5 }[/math] in the denominator to some value [math]\displaystyle{ \sigma> 1 }[/math]. This is equivalent to changing our original [math]\displaystyle{ \sqrt{nd} }[/math] weighting to some other exponent, such as [math]\displaystyle{ (nd)^2 }[/math] or [math]\displaystyle{ (nd)^{1.5} }[/math]. Doing this causes both of the summations above to converge, so that we obtain
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \left[ \sum_{j_n=1}^\infty \dfrac{1}{{j_n}^{\sigma -i \omega}} \right] \cdot \left[ \sum_{j_d=1}^\infty\dfrac{1}{{j_d}^{\sigma + i \omega}} \right] }[/math]
It has been well known for more than a century that both of these summations converge to the Riemann zeta function, so that we get
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \zeta(\sigma-i\omega) \cdot \zeta(\sigma+i\omega) }[/math]
Rewriting as a function of a complex variable [math]\displaystyle{ s = \sigma + i\omega }[/math], and noting that the zeta function obeys the property that [math]\displaystyle{ \zeta(\overline s)=\overline{\zeta(s)} }[/math], where [math]\displaystyle{ \overline s }[/math] represents complex conjugation, we get
[math]\displaystyle{ \displaystyle \mathcal{F}\left\{K(n)\right\}(\omega) = \overline{\zeta(s)} \cdot \zeta(s) = |\zeta(s)|^2 }[/math]
And we have now obtained a very interesting result: if we had instead gone with something like [math]\displaystyle{ (nd)^2 }[/math] complexity on rationals, rather than [math]\displaystyle{ \sqrt{nd} }[/math], that our HE setup would have converged as [math]\displaystyle{ N \to \infty }[/math], and our original HE convolution kernel would have been the Fourier transform of a particular vertical "slice" of the Riemann zeta function where [math]\displaystyle{ \Re(s) = 2 }[/math].
Furthermore, although the above series doesn't converge for [math]\displaystyle{ \sigma = 0.5 }[/math], we can simply use the analytic continuation of the Riemann zeta function to obtain a meaningful function at that point, so that our original convolution kernel can be written as
[math]\displaystyle{ K(n) = \mathcal{F}^{-1}\left\{| \zeta(0.5+\omega) |^2\right\}(n) }[/math]
which is the inverse Fourier transform of the squared absolute value of the Riemann zeta function, taken at the critical line.
Lastly, to do some cleanup, we previously went with [math]\displaystyle{ \max(n,d)<N }[/math] bounds, rather than [math]\displaystyle{ \sqrt{nd}<N }[/math] bounds, despite using [math]\displaystyle{ \sqrt{nd} }[/math] weighting on ratios. However, it is easy to show above that regardless of which bounds you use, both series converge to the same function when [math]\displaystyle{ \sigma > 1 }[/math] in the limit as [math]\displaystyle{ N > \infty }[/math]. Since these series agree on this right half-plane of the Riemann zeta function, they share the same analytic continuation, so that we get the same result, despite using our technique to simplify the derivation.
It is likewise easy to show that the function [math]\displaystyle{ K^a(n) }[/math], taken from the numerator of our original Harmonic Rényi Entropy convolution expression, can be expressed as
[math]\displaystyle{ K^a(n) = \mathcal{F}^{-1}\left\{|\zeta(0.5a+\omega) |^2\right\}(n) }[/math]
so that the choice of [math]\displaystyle{ a }[/math] simply changes our choice of vertical slice of the Riemann zeta function.
Analytic Continuation of Harmonic Rényi Entropy
We can put this back into the original equation for Harmonic Rényi Entropy. To do so, we will continue with our change of units from cents to nepers, corresponding to a change of our variable from [math]\displaystyle{ c }[/math] to [math]\displaystyle{ n }[/math]. Doing so, we obtain
[math]\displaystyle{ \displaystyle \text{HE}_a(n) = \frac{1}{1-a} \log \left( \frac{\left[S^a \ast \mathcal{F}^{-1}\left\{|\zeta(0.5a+\omega) |^2\right\}\right](-n)}{\left[S \ast \mathcal{F}^{-1}\left\{|\zeta(0.5+\omega) |^2\right\}\right]^a(-n)} \right) }[/math]
Note that above, we assume the spreading probability distribution [math]\displaystyle{ S }[/math] has likewise been scaled to reflect the new choice of units. If [math]\displaystyle{ S }[/math] is a Gaussian, this means we assume the standard deviation has been scaled accordingly.
We can simplify the expression of the above if we likewise take the Fourier transform of [math]\displaystyle{ S }[/math]. If we do, we obtain the characteristic function of the distribution, which is typically denoted by [math]\displaystyle{ \phi(\omega) }[/math]. We will use the following definitions:
[math]\displaystyle{ \phi(\omega) = \mathcal{F}\left\{S(n)\right\}(\omega) }[/math]
[math]\displaystyle{ \phi_a(\omega) = \mathcal{F}\left\{S(n)^a\right\}(\omega) }[/math]
Doing so, and noting that convolution becomes multiplication in the Fourier domain, we get
[math]\displaystyle{ \displaystyle \text{HE}_a(n) = \frac{1}{1-a} \log \left( \frac{\mathcal{F}^{-1}\left\{\phi_a(\omega) \cdot |\zeta(0.5a+\omega) |^2\right\}(-n)}{\mathcal{F}^{-1}\left\{\phi(\omega) \cdot |\zeta(0.5+\omega) |^2\right\}^a(-n)} \right) }[/math]
We can write this more neatly by using the notation [math]\displaystyle{ \zeta_\sigma(\omega) = \zeta(\sigma+i\omega) }[/math], and omitting the argument of [math]\displaystyle{ \omega }[/math] from the inside of the Fourier transform notation. Doing so, we get
[math]\displaystyle{ \displaystyle \text{HE}_a(n) = \frac{1}{1-a} \log \left( \frac{\mathcal{F}^{-1}\left\{\phi_a \cdot |\zeta_{0.5a} |^2\right\}(-n)}{\mathcal{F}^{-1}\left\{\phi \cdot |\zeta_{0.5} |^2\right\}^a(-n)} \right) }[/math]
And lastly, we note that for any function [math]\displaystyle{ f(x) }[/math], we have [math]\displaystyle{ \mathcal{F}\left\{f(-x)\right\} = \mathcal{F}\left\{\overline f(x) \right\} = \mathcal{F}\left\{\overline f\right\} }[/math]. Putting that all together, we get
[math]\displaystyle{ \displaystyle \text{HE}_a(n) = \frac{1}{1-a} \log \left( \dfrac{\mathcal{F}^{-1}\left\{\overline {\phi_a} \cdot |\zeta_{0.5a} |^2\right\}}{\mathcal{F}^{-1}\left\{\overline {\phi} \cdot |\zeta_{0.5} |^2\right\}^a} \right) }[/math]
Unnormalized Entropy
Suppose that rather than looking at the Harmonic Rényi Entropy directly, we instead look at the exponential of [math]\displaystyle{ \frac{a}{1-a} }[/math] times the HRE. If we do so, since [math]\displaystyle{ a }[/math] is a constant, and the exponential function is strictly monotonic, we will obtain a new function that has the same exact minima, maxima, and relative ranking of intervals as the original HRE, and hence which is completely equivalent in characterising their relative discordance.
Doing so, we get
[math]\displaystyle{ \exp(\frac{a}{1-a} \text{HE}_a(n)) = \dfrac{\mathcal{F}^{-1}\left\{\overline {\phi_a} \cdot |\zeta_{0.5a} |^2\right\}^a}{\mathcal{F}^{-1}\left\{\overline {\phi} \cdot |\zeta_{0.5} |^2\right\}} }[/math]
so that we now have a much simpler expression which is simply the quotient of two convolutions, or equivalently the quotient of two products in the Fourier domain, followed by an inverse Fourier transform.
Now, if we use the usual definition of HRE, and naively increase the value of [math]\displaystyle{ N }[/math], the curve gets higher and higher. Since the above function is just the exponential of HRE times a constant, the same will happen to it. The same phenomenon happens to both the numerator and denominator of the above expression, which simply increase without bound.
However, surprisingly, once we replace either of these expressions with their analytically continued counterpart, the curves suddenly "jump" so that they are roughly centered around the x-axis. This is a problem for the above expression, because the denominator -- which was supposed to be a "normalization" term -- now takes positive and negative values, and in particular has zeros. As a result, the zeros in the denominator will create poles in the resulting function. Hence, this no longer serves as a useful normalization term.
TO DO: add a picture of this
While a deeper exploration of this will be left to future research, for now we will simply drop the normalization term and look at the unnormalized entropy, or in this case, the exponentiated version from before. We will denote this by [math]\displaystyle{ \text{UHE}_a(n) }[/math].
Doing so, we get
[math]\displaystyle{ \exp(\frac{a}{1-a} \text{UHE}_a(n)) = \mathcal{F}^{-1}\left\{\overline {\phi_a} \cdot |\zeta_{0.5a} |^2\right\}^a }[/math]
Taking both sides to the power of [math]\displaystyle{ 1/a }[/math], we obtain the following particularly simple expression:
[math]\displaystyle{ \exp((1-a) \text{UHE}_a(n)) = \mathcal{F}^{-1}\left\{\overline {\phi_a} \cdot |\zeta_{0.5a} |^2\right\} }[/math]
so that we can see our exponentiated UHE is simply the inverse Fourier transform of the complex conjugate of our spreading distribution's characteristic function, times the squared absolute value of the Riemann zeta function. (Note that if the spreading distribution is symmetric, as in the case of a Gaussian, the characteristic function is purely real, so that the complex conjugate is redundant.)
As a last touch, we note that our dropping of the normalization term has flipped the function upside down, so that higher values show concordance and lower values show discordance!
TO DO: show picture
We can restore the usual orientation by simply taking the negative of the above function, so that we get:
[math]\displaystyle{ -\exp((1-a) \text{UHE}_a(n)) = -\mathcal{F}^{-1}\left\{\overline {\phi_a} \cdot |\zeta_{0.5a} |^2\right\} }[/math]
TO DO: show picture
and now we have a function which, despite our last-minute fudging of unnormalized probabilities above, would appear to resemble the usual Harmonic Entropy curve in every sense, with the same minima, maxima, general contour, and so on, just shifted so that it is roughly centered on the x-axis.
Round-up
As a result of this, we were able to show that Harmonic Entropy is closely related to the Riemann zeta function.
Furthermore, if unnormalized probabilities are used, the exponential of HE times a constant is the Fourier transform of the zeta function, times the characteristic function of the spreading distribution.
This is valid to the extent that the unnormalized entropy behaves similarly to the normalized entropy in the limit of large [math]\displaystyle{ N }[/math], up to an additive shift and a multiplicative scaling (in this case, flipping the curve upside down). Empirically, this would certainly appear to be the case, as the resulting curve appears to resemble the usual HE curve in every possible way, having roughly the same minima, maxima, and general shape. However, it would be nice to show this formally, which we leave open for future work.
Examples
TO DO: add examples for different values of s
Special Properties for Generalized Normal Distributions
TO DO: this
Fast Computation
TO DO: this
References
Harmonic entropy (TonalSoft encyclopedia)








