User:Frostburn/Theory From First Principles

Just using Xen Wiki as a notepad, don't mind me.

I'm currently working on the grammar for Scale Workshop 3. It will naturally include monzos so I'm writing stuff down as a thinking aid. I already implemented vals because of these notes.

Time Domain

We begin our journey in the time domain where one second (1 s) passes for every 9192631770 oscillations of the radiation emitted by caesium 133 during the unperturbed ground-state hyperfine transition.

Frequency Domain

We invert time to arrive in the frequency domain, where oscillations are measured in repetitions per second, i.e. hertz (Hz = s^-1).

Scalar Domain

Frequencies are scalar multiples of each other, and positive rational scalars are of particular interest in music, e.g. the ratio 3/2 between 300 Hz and 200 Hz is the just perfect fifth.

Pitch Domain

When we take the logarithm of a positive rational scalar its factors separate into a sum e.g. [math]\displaystyle{ \log(15/8) = \log(3) + \log(5) - 3\log(2) }[/math].

Absolute Pitch Domain

The way the logarithm converts multiplicative quantities into a linear representation motivates the designation of an intermediary domain where the units are kept intact inside the logarithm e.g.

[math]\displaystyle{ \log(440 Hz) = \log(Hz \cdot 2^3 \cdot 5 \cdot 11) = \log(Hz) + 3 \log(2) + \log(5) + \log(11) . }[/math]

To make use of this representation we would like to add quantities to each other so let's see how that works:

[math]\displaystyle{ \log(361 Hz) + \log(529 Hz) = \log(Hz) + \log(19^2) + \log(Hz) + \log(23^2) = 2 \log(Hz) + 2 \log(19) + 2 \log(23) = 2 \log(437 Hz) , }[/math]

which looks perfectly reasonable besides that extra factor of 2 in the front. We choose to ignore it and interpret the sum of absolute pitch quantities as their geometric mean.

In the non-logarithmic domain this simply means that we have to keep track of the exponent of Hz and take the appropriate root. This has the added benefit of unifying the time and frequency domains, e.g. 1 millisecond is simply identified as the period of oscillation and corresponds to a frequency of 1 kilohertz.
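A minimal Python sketch of the "keep track of the exponent of Hz and take the root" idea. The helper name `to_hz` is made up for these notes and is not anything from Scale Workshop.

```python
def to_hz(value, hz_exponent):
    """Collapse a quantity carrying Hz**hz_exponent to plain Hz
    by taking the matching root (hypothetical helper)."""
    return value ** (1.0 / hz_exponent)

# log(361 Hz) + log(529 Hz) carries Hz^2, so we take the square root,
# which is the geometric mean of the two frequencies.
mean = to_hz(361 * 529, 2)

# 1 ms is 0.001 Hz^-1; the exponent -1 turns the period into a frequency.
khz = to_hz(0.001, -1)

print(mean, khz)  # roughly 437 Hz and 1000 Hz
```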

Adding Geometry

By the fundamental theorem of arithmetic, logarithms of primes are linearly independent over [math]\displaystyle{ \mathbb{Q} }[/math], so we can interpret [math]\displaystyle{ \log(2), \log(3), \ldots }[/math] as basis vectors. We write [math]\displaystyle{ e_p }[/math] in place of [math]\displaystyle{ \log(p) }[/math] and enforce orthogonality

[math]\displaystyle{ e_p \cdot e_q = 0, p \ne q, p, q \in \mathbb{P} }[/math]

To make things slightly more formal we define the right-facing arrow function

[math]\displaystyle{ \overrightarrow{2^x 3^y 5^z \ldots} \mapsto x e_2 + y e_3 + z e_5 + \ldots, x, y, z, \ldots \in \mathbb{Q} }[/math]

which takes objects from the scalar domain to the geometric pitch domain.

We denote the inverse of the arrow function with [math]\displaystyle{ \mathrm{ratio} }[/math] i.e. it turns prime count vectors into ratios:

[math]\displaystyle{ \mathrm{ratio}(\overrightarrow{p/q}) = p/q }[/math]
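The arrow function and its inverse can be sketched in Python roughly like this. The names `arrow`, `ratio` and `PRIMES` are my own for illustration, the prime limit is an assumption, and the sketch only handles integer exponents even though the definition above allows rational ones.

```python
from fractions import Fraction

PRIMES = (2, 3, 5, 7, 11, 13)  # assumption: a finite prime limit

def arrow(value):
    """Map a positive rational to its prime exponent vector (monzo)."""
    value = Fraction(value)
    n, d = value.numerator, value.denominator
    monzo = []
    for p in PRIMES:
        exponent = 0
        while n % p == 0:   # count factors of p in the numerator
            n //= p
            exponent += 1
        while d % p == 0:   # factors in the denominator count negatively
            d //= p
            exponent -= 1
        monzo.append(exponent)
    if n != 1 or d != 1:
        raise ValueError("value exceeds the assumed prime limit")
    return monzo

def ratio(monzo):
    """Inverse of arrow: multiply the primes back together."""
    result = Fraction(1)
    for p, exponent in zip(PRIMES, monzo):
        result *= Fraction(p) ** exponent
    return result

print(arrow(Fraction(15, 8)))         # [-3, 1, 1, 0, 0, 0]
print(ratio(arrow(Fraction(15, 8))))  # 15/8
```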

Pitch is measured in cents (¢) which we define to be the vector quantity [math]\displaystyle{ ¢ := e_2 / 1200 }[/math] i.e. [math]\displaystyle{ \mathrm{ratio}(¢) = 2^{\frac{1}{1200}} \approx 1.0005777895 }[/math] .

We also define the backslash function [math]\displaystyle{ \backslash d \mapsto e_2 / d }[/math] .

In combination with implicit scalar multiplication the similarity with Scale Workshop's N-of-EDO notation is unmistakable e.g. [math]\displaystyle{ 7 \backslash 12 = 700 ¢ }[/math] .
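A quick numeric check of the cent and backslash definitions, as a sketch. Here a pitch along [math]\displaystyle{ e_2 }[/math] is stored simply as a number of cents, which is an assumption of this snippet rather than how the vectors are defined above.

```python
def backslash(steps, edo):
    """steps\\edo measured in cents: steps * (e_2 / edo) with e_2 = 1200 cents."""
    return steps * 1200 / edo

# ratio(cent) = 2^(1/1200)
cent_ratio = 2 ** (1 / 1200)

fifth = backslash(7, 12)

print(cent_ratio)  # about 1.0005777895
print(fifth)       # 700.0
```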

Absolute pitch can be incorporated when we introduce a basis vector [math]\displaystyle{ e_0 }[/math] and associate it with logarithmic frequency

[math]\displaystyle{ \overrightarrow{Hz} \mapsto e_0 }[/math]

It represents a single point of origin in projective geometry (remember how we chose to ignore that extra scalar in front when summing absolute pitches).

In practice it's handy to have a reference frequency that humans can hear so we might use the shifted origin

[math]\displaystyle{ \tilde{e_0} := \overrightarrow{440Hz} = e_0 + 3 e_2 + e_5 + e_{11} }[/math].

When working with absolute pitch, the inverse of the arrow function is called [math]\displaystyle{ \mathrm{freq} }[/math] instead of [math]\displaystyle{ \mathrm{ratio} }[/math]

[math]\displaystyle{ \mathrm{freq}(\overrightarrow{\frac{p}{q} Hz}) = \frac{p}{q} Hz }[/math] .

Care must be taken when the multiplier of the projective origin is not 1. For example, consider 1.5 Hz, which is a perfect fifth above 1 Hz. It's represented as [math]\displaystyle{ e_0 - e_2 + e_3 }[/math].

If some other calculation gave us the result [math]\displaystyle{ 2 e_0 - e_2 + e_3 }[/math], it wouldn't represent 1.5 Hz; it's

[math]\displaystyle{ \mathrm{freq}(2 e_0 - e_2 + e_3) = 1.5 Hz^2 \sim \sqrt{\frac{3}{2}} Hz \approx 1.22 Hz }[/math]

instead.
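The same caveat as a small Python sketch. The function `freq` here is a stand-in written for these notes: it takes the multiplier of [math]\displaystyle{ e_0 }[/math] separately from the prime exponents, and the 5-limit basis is an assumption.

```python
def freq(hz_exponent, monzo, primes=(2, 3, 5)):
    """Evaluate hz_exponent * e_0 + sum(monzo[i] * e_p) to a frequency in Hz.

    The product of primes carries Hz**hz_exponent, so we take the
    matching root to get back to plain Hz.
    """
    value = 1.0
    for p, exponent in zip(primes, monzo):
        value *= p ** float(exponent)
    return value ** (1.0 / hz_exponent)

plain = freq(1, [-1, 1, 0])    # e_0 - e_2 + e_3
doubled = freq(2, [-1, 1, 0])  # 2 e_0 - e_2 + e_3

print(plain)    # 1.5
print(doubled)  # about 1.22
```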

Expanding geometry

It's instructive to construct [math]\displaystyle{ \mathrm{ratio} }[/math] more explicitly by considering the inverses of the basis vectors

[math]\displaystyle{ \begin{align} e^p :&= e_p ^ {-1},\\ e_p \cdot e^p &= 1 \end{align} }[/math]

Note that we're leaving the metric unspecified i.e. what [math]\displaystyle{ e_p \cdot e_p }[/math] equals. For the structure of the space we're constructing a positive metric suffices i.e. [math]\displaystyle{ e_p \cdot e_p = w_p^2 > 0, w_p \in \mathbb{R} }[/math]. We now have:

[math]\displaystyle{ \mathrm{ratio}(x) = \prod_{p \in \mathbb{P}} p^{x \cdot e^p} }[/math]

i.e. the superscript vectors act as measuring sticks telling us how much of each prime there is in an arbitrary vector.

Let's coin a new unit called jorp (think Europe) [math]\displaystyle{ € = ¢^{-1} = 1200 \cdot \overrightarrow{2}^{-1} }[/math] which measures intervals with a resolution of 1200 ticks per octave.

A conceptually important measuring stick is the one constructed from the logarithms of the primes:

[math]\displaystyle{ \mathsf{JIP} = \sum_{p \in \mathbb{P}} e^p \log p }[/math]

I say conceptually because this just intonation point violates linear independence over [math]\displaystyle{ \mathbb{Q} }[/math] and doesn't strictly belong in the space we're constructing. Anyway:

[math]\displaystyle{ \begin{align} \mathsf{JIP} \cdot x &= \log(\mathrm{ratio}(x))\\ \mathrm{ratio}(x) &= \exp(\mathsf{JIP} \cdot x). \end{align} }[/math]
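Numerically the JIP is just a list of floats, which is exactly why it doesn't strictly belong in the rational space. A sketch, assuming a 5-limit basis and made-up names:

```python
import math

PRIMES = (2, 3, 5)                   # assumption: 5-limit
JIP = [math.log(p) for p in PRIMES]  # components log(p) along e^p

def jip_dot(monzo):
    """JIP . x for a prime exponent vector x."""
    return sum(j * m for j, m in zip(JIP, monzo))

x = [-3, 1, 1]                   # the monzo of 15/8
log_ratio = jip_dot(x)           # equals log(15/8) up to rounding
recovered = math.exp(log_ratio)  # equals 15/8 up to rounding

print(log_ratio, recovered)
```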

However rational approximations to the [math]\displaystyle{ \mathsf{JIP} }[/math] do belong in our space. Equal temperaments can be represented by such approximations called generalized patent vals which we define as

[math]\displaystyle{ \mathrm{gpv}(n; a, b, \ldots, z) := n \overrightarrow{a}^{-1} + \lfloor n \log_a(b) + \frac{1}{2}\rfloor \overrightarrow{b}^{-1} + \ldots + \lfloor n \log_a(z) + \frac{1}{2}\rfloor \overrightarrow{z}^{-1} }[/math] ,

which belong to a class of vals complementary to monzos.

Usually the basis is obvious from context e.g. [math]\displaystyle{ a = 2, b = 3, c = 5 }[/math]. In these cases we use a left-facing arrow e.g.

[math]\displaystyle{ \overleftarrow{12} := \mathrm{gpv}(12; 2, 3, 5) = 12 e^2 + 19 e^3 + 28 e^5 =: \langle 12, 19, 28 \rbrack }[/math]

We can use these new objects to calculate how many steps of 12edo a tempered interval spans e.g.

[math]\displaystyle{ \overleftarrow{12} \cdot \overrightarrow{15/8} = \langle 12, 19, 28 \vert -3, 1, 1 \rangle = 11 }[/math]

The actual pitch is obtained by sandwiching the interval between the val and the step size:

[math]\displaystyle{ \overleftarrow{12} \cdot \overrightarrow{15/8} \backslash 12 = 1100 ¢ }[/math] .
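Here's a minimal Python sketch of the generalized patent val and the val-monzo sandwich. The function names are mine, and vals and monzos are stored as plain coefficient lists, which quietly assumes the convenient unit metric.

```python
import math

def gpv(n, basis):
    """Generalized patent val: floor(n * log_a(b) + 1/2) for each basis
    element b, where a = basis[0] is the equave."""
    a = basis[0]
    return [math.floor(n * math.log2(b) / math.log2(a) + 0.5) for b in basis]

def dot(val, monzo):
    """Pair a val with a monzo: the number of steps the interval spans."""
    return sum(v * m for v, m in zip(val, monzo))

val12 = gpv(12, (2, 3, 5))
steps = dot(val12, [-3, 1, 1])   # 15/8 as a monzo
size_in_cents = steps * 1200 / 12

print(val12)          # [12, 19, 28]
print(steps)          # 11
print(size_in_cents)  # 1100.0
```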

The geometric inverses are mainly relevant for subgroup temperaments. Consider Barbados:

[math]\displaystyle{ \overleftarrow{5} := \mathrm{gpv}(5; 2, 3, 13/5) = 5 \cdot \overrightarrow{2}^{-1} + 8 \cdot \overrightarrow{3}^{-1} + 7 \cdot \overrightarrow{13/5}^{-1} = 5 e^2 + 8 e^3 - \frac{7}{2}e^5 + \frac{7}{2}e^{13} }[/math]

We can verify that the comma 676/675 indeed vanishes using this val:

[math]\displaystyle{ \overleftarrow{5} \cdot \overrightarrow{676/675} = \langle 5, 8, -\frac{7}{2}, 0, 0, \frac{7}{2} \vert 2, -3, -2, 0, 0, 2 \rangle = 0 }[/math]

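Note that above I've implicitly used a convenient metric to carry out the calculations, which is fine due to the new basis still being orthogonal. Explicitly we'd have [math]\displaystyle{ \overrightarrow{13/5}^{-1} = (w_{13} \hat{m} - w_5 \hat{k}) / (w_5^2 + w_{13}^2) }[/math], with [math]\displaystyle{ \hat{m} }[/math] and [math]\displaystyle{ \hat{k} }[/math] the unit directions of [math]\displaystyle{ e_{13} }[/math] and [math]\displaystyle{ e_5 }[/math], where the weights decide how much 13/5 "leans" towards 5/1 or 13/1.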
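The Barbados calculation can be reproduced with exact fractions. This is a sketch that assumes the unit metric ([math]\displaystyle{ w_p = 1 }[/math] for every prime) and a 13-limit coefficient order of 2, 3, 5, 7, 11, 13; the helper names are made up.

```python
import math
from fractions import Fraction

def inverse(vec):
    """Geometric inverse v / (v . v), assuming the unit metric."""
    norm_squared = sum(x * x for x in vec)
    return [Fraction(x) / norm_squared for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Subgroup basis 2.3.13/5 as (size, monzo) pairs over primes 2, 3, 5, 7, 11, 13.
basis = [
    (2, [1, 0, 0, 0, 0, 0]),
    (3, [0, 1, 0, 0, 0, 0]),
    (13 / 5, [0, 0, -1, 0, 0, 1]),
]

n = 5
val = [Fraction(0)] * 6
for size, monzo in basis:
    count = math.floor(n * math.log2(size) + 0.5)  # steps mapped to this basis element
    val = [v + count * i for v, i in zip(val, inverse(monzo))]

comma = [2, -3, -2, 0, 0, 2]  # 676/675

print(val)              # 5, 8, -7/2, 0, 0, 7/2 as Fractions
print(dot(val, comma))  # 0
```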

We make the projective origin [math]\displaystyle{ e_0 }[/math] non-invertible by enforcing a null metric weight [math]\displaystyle{ e_0 \cdot e_0 = 0 }[/math] which can be handy in some calculations and often does the right thing.

Most of the theory developed here deals with relative pitch. It's always possible to ground a result on an origin e.g.

[math]\displaystyle{ \mathrm{freq}(\tilde{e_0} + \overleftarrow{12} \cdot \overrightarrow{15/8} \backslash 12) = \mathrm{freq}(e_0 + \frac{47}{12} e_2 + e_5 + e_{11}) \approx 830.6 Hz }[/math]
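Grounding the 12edo major seventh on the 440 Hz origin, as a numeric sketch. The function name and the default origin are assumptions for illustration.

```python
def ground(steps, edo, origin_hz=440.0):
    """Frequency of steps\\edo above the (shifted) origin."""
    return origin_hz * 2.0 ** (steps / edo)

val12 = [12, 19, 28]
monzo = [-3, 1, 1]  # 15/8
steps = sum(v * m for v, m in zip(val12, monzo))

major_seventh_hz = ground(steps, 12)
print(major_seventh_hz)  # about 830.6
```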

On units

Scalars do not have units. That's what makes them scalars. Do relative pitches have units? Maybe they're like radians, unitless but it makes no sense to add them to other kinds of objects. Whatever the case may be, prime count vectors (i.e. monzos) have inverse units to vals. This is enough to distinguish them during SW3 runtime and prevent vals from being interpreted as pitch or turned into frequencies.

Taking these considerations more seriously and remembering that cents are a vector quantity we can try to figure out what units vals such as the jorp (€) have: One cent is one hundredth of a semitone and one octave consists of twelve of these semitones. All vector quantities. Let's call the dimensionless version of a semitone a demitone. To re-iterate: A cent is 1/100 demitones in the direction of [math]\displaystyle{ e_2 }[/math]. Let's call [math]\displaystyle{ \hat{i} }[/math] the direction of [math]\displaystyle{ e_2 }[/math] i.e. [math]\displaystyle{ e_2 = w_2 \hat{i} = 12 d \hat{i} }[/math], where [math]\displaystyle{ d }[/math] is the metric weight of a demitone. The basis vector itself has unit metric [math]\displaystyle{ \hat{i} \cdot \hat{i} = 1 }[/math].

A reciprocal cent satisfies [math]\displaystyle{ ¢^{-1} \cdot ¢ = 1 }[/math], so as per the usual definition of the geometric inverse of a vector we have [math]\displaystyle{ ¢^{-1} = ¢ / (¢ \cdot ¢) = \frac{1}{1200}e_2 / \left(\left(\frac{1}{1200}\right)^2 e_2 \cdot e_2\right) = 1200 w_2 \hat{i} / (w_2^2 \hat{i} \cdot \hat{i}) = \frac{1200}{w_2}\hat{i} }[/math].

As we have [math]\displaystyle{ e^2 = e_2 / w_2^2 = \hat{i} / w_2 }[/math], a reciprocal cent can now be expressed as [math]\displaystyle{ ¢^{-1} = 1200 e^2 = 100 d^{-1} \hat{i} }[/math] or 100 reciprocal demitones in the [math]\displaystyle{ \hat{i} }[/math] direction.

To be perfectly clear about the nature of these objects: There is no "cent" associated with primes besides 2. The closest we can get is:

[math]\displaystyle{ \overrightarrow{3} \cdot \mathsf{JIP} / \log(2) \cdot 1200 ¢ \approx 1901.955 ¢ }[/math]

where we had to a) violate linear independence over [math]\displaystyle{ \mathbb{Q} }[/math], b) remove "twoness" by dividing by the logarithm, and c) explicitly add the units to "twist" the answer into the requisitioned direction.
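The three steps spelled out numerically, as a sketch with a 5-limit basis assumed. The result is a bare float, so the "twist" back into the [math]\displaystyle{ e_2 }[/math] direction only exists in the variable name.

```python
import math

PRIMES = (2, 3, 5)                   # assumption: 5-limit
JIP = [math.log(p) for p in PRIMES]  # a) irrational components

three = [0, 1, 0]                    # the monzo of 3/1
log_size = sum(j * m for j, m in zip(JIP, three))

octaves = log_size / math.log(2)     # b) divide out the "twoness"
cents_along_e2 = octaves * 1200      # c) re-attach the unit by hand

print(cents_along_e2)  # about 1901.955
```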

Clifford algebra nonsense

Both 12edo and 7edo temper out the syntonic comma: [math]\displaystyle{ \overleftarrow{12} \cdot \overrightarrow{81/80} = 0 = \overleftarrow{7} \cdot \overrightarrow{81/80} }[/math] .

Therefore so does any linear combination of them, e.g. [math]\displaystyle{ 2 \cdot \overleftarrow{12} + \overleftarrow{7} = \overleftarrow{31} \implies \overleftarrow{31} \cdot \overrightarrow{81/80} = 0 }[/math]
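Checking the linear combination with plain integer lists (a sketch, with the 5-limit patent vals typed in by hand):

```python
def dot(val, monzo):
    return sum(v * m for v, m in zip(val, monzo))

val12 = [12, 19, 28]
val7 = [7, 11, 16]
syntonic = [-4, 4, -1]  # 81/80

# 2 * <12, 19, 28] + <7, 11, 16]
val31 = [2 * a + b for a, b in zip(val12, val7)]

print(val31)                 # [31, 49, 72]
print(dot(val12, syntonic))  # 0
print(dot(val7, syntonic))   # 0
print(dot(val31, syntonic))  # 0
```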

We can identify the plane spanned by [math]\displaystyle{ \overleftarrow{12} }[/math] and [math]\displaystyle{ \overleftarrow{7} }[/math] as the (5-limit) Meantone temperament. We can use wedges to represent it symbolically:

[math]\displaystyle{ \overleftarrow{12} \wedge \overleftarrow{7} = -4 e^3 \wedge e^5 + 4 e^5 \wedge e^2 - e^2 \wedge e^3 }[/math] ,

where the components are basis planes. E.g. [math]\displaystyle{ e^3 \wedge e^5 }[/math] is the plane where octaves are tempered out. The wedge of any vector with itself is zero i.e. you can't span a plane with only one direction. The wedge product is also antisymmetric and the planes come with signed weights but we mostly care about the orientation they represent. Note that [math]\displaystyle{ e^2 }[/math] represents the "line" where tritaves and pentaves are tempered out.
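In three dimensions the wedge of two vals has the same coefficients as the ordinary cross product, as long as the basis planes are ordered as [math]\displaystyle{ e^3 \wedge e^5, e^5 \wedge e^2, e^2 \wedge e^3 }[/math]. A sketch with a made-up function name:

```python
def wedge3(a, b):
    """Wedge of two 3-component vectors, returned as coefficients of
    (e3^e5, e5^e2, e2^e3). Numerically identical to the cross product."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

val12 = [12, 19, 28]
val7 = [7, 11, 16]

meantone = wedge3(val12, val7)
print(meantone)              # [-4, 4, -1]
print(wedge3(val12, val12))  # [0, 0, 0], no plane from a single direction
print(wedge3(val7, val12))   # [4, -4, 1], antisymmetry
```

The coefficients match the monzo of 81/80, which is the duality used further down.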

The largest possible wedge combines all of the basis vectors and represents just intonation i.e. no tempering whatsoever: [math]\displaystyle{ e^2 \wedge e^3 \wedge e^5 }[/math].

We can define the dual operator:

[math]\displaystyle{ \begin{align} \overline{e_2} &= e^3 \wedge e^5 \\ \overline{e_3} &= e^5 \wedge e^2 \\ \overline{e_5} &= e^2 \wedge e^3 , \\ \end{align} }[/math]

which is extended linearly to all vectors. It is obvious from the definitions that the Meantone temperament is represented by [math]\displaystyle{ \overline{\overrightarrow{81/80}} }[/math].

We define the vee operator [math]\displaystyle{ a \vee b := \overline{\overline{b} \wedge \overline{a}} }[/math], where the inner overlines are the inverses of the dual (always obvious from context).

Wedges combine vals into temperaments that are closer to just intonation while vees do progressive damage to just intonation. As an exercise let's calculate the damage done by combining Meantone with Augmented:

[math]\displaystyle{ \begin{align} & \overline{\overrightarrow{81/80}} \vee \overline{\overrightarrow{128/125}} \\ & = \overline{-4 e_2 + 4 e_3 - e_5 } \vee \overline{7 e_2 - 3 e_5} \\ & = \overline{(7 e_2 - 3 e_5) \wedge (-4 e_2 + 4 e_3 - e_5)} \\ & = \overline{28 e_2 \wedge e_3 - 7 e_2 \wedge e_5 + 12 e_5 \wedge e_2 - 12 e_5 \wedge e_3} \\ & = \overline{12 e_3 \wedge e_5 + 19 e_5 \wedge e_2 + 28 e_2 \wedge e_3} \\ & = 12 e^2 + 19 e^3 + 28 e^5 \\ & = \overleftarrow{12} \end{align} }[/math]

Neat!
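The exercise boils down to one cross product once everything is stored as coefficient lists. This sketch assumes the 5-limit, the plane ordering [math]\displaystyle{ e_3 \wedge e_5, e_5 \wedge e_2, e_2 \wedge e_3 }[/math], and treats the dual and its inverse as a relabeling of coefficients.

```python
def wedge3(a, b):
    """Wedge of two 3-component vectors as coefficients of
    (e3^e5, e5^e2, e2^e3), i.e. the cross product."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

def vee_of_duals(comma_a, comma_b):
    """dual(comma_a) vee dual(comma_b) = dual(comma_b ^ comma_a).
    The outer dual only relabels the bivector coefficients as a val."""
    return wedge3(comma_b, comma_a)

syntonic = [-4, 4, -1]  # 81/80, Meantone
diesis = [7, 0, -3]     # 128/125, Augmented

print(vee_of_duals(syntonic, diesis))  # [12, 19, 28]
```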

Exterior algebras do have a sense of lying-in-the-plane-of but we need a metric to do projection and tuning which we already touched upon in the units section. As far as data structures go, full Clifford algebras are memory-hungry. Not worth the complication in a general purpose tool like Scale Workshop.

To explore our space a bit longer we can consider [math]\displaystyle{ e_2 }[/math] as an interval class consisting of all multiples of the octave. After all we have [math]\displaystyle{ \overrightarrow{8} = 3 \cdot \overrightarrow{2} = 3 e_2 }[/math]. Wedging gives us larger classes. [math]\displaystyle{ e_2 \wedge e_3 }[/math] can be seen as representing Pythagorean just intonation. Unfortunately the wedges are not accurate enough to distinguish subflavors [math]\displaystyle{ \overrightarrow{2} \wedge \overrightarrow{9} = 2 \cdot \overrightarrow{2} \wedge \overrightarrow{3} = \overrightarrow{4} \wedge \overrightarrow{3} }[/math] but at least the "orientation" is correct. Previously we used the term "just intonation" to refer to lack of tempering but it's also used to refer to an interval class such as 5-limit just intonation representable by [math]\displaystyle{ e_2 \wedge e_3 \wedge e_5 }[/math].

One interesting pseudo-object in our space (let's call it [math]\displaystyle{ \mathbb{S} }[/math]) is the unison plane:

[math]\displaystyle{ \{x \in \mathbb{S} : \mathsf{JIP} \cdot x = 0\} = \{\overrightarrow{1}\} }[/math]

which strictly speaking consists of only a single element, namely the unison 1/1. However, all commas lie close to this plane, so in 3D subspaces of [math]\displaystyle{ \mathbb{S} }[/math] it's informative to face the unison plane head-on to get a 2D view of all commas of interest. The null-spaces of all reasonable temperaments (i.e. the sets of all commas and their products that vanish in the temperament) are also closely aligned with this plane.