User:Sintel/Generator optimization: Difference between revisions

Line 1:

Some derivations on optimal generators for 2-norm (Euclidean) tunings.

In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup. <math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).

In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup.

<math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).

The goal is now to find <math>g</math> such that the temperament approximates JI:

:<math>

$$

gM \approx j

</math>

$$

== Solving the system ==

The simplest case involves solving a linear system. Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.

The simplest case involves solving a linear system.

Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.

Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve:

$$

:<math>

gMV = jV

$$

</math>

We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals. Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:

We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals.

$$

Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:

g = jV(MV)^{-1}

:<math>

$$

g = jV(MV)^{-1}

</math>

== Least squares solution ==

Line 25:

Line 27:

However, we can instead obtain an approximate solution that minimizes the squared error.

If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is:

$$

:<math>

gMV - jV

$$

</math>

Using <math>~~\left~~\| \cdot ~~\right~~\|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:

$$

Using <math>\| \cdot \|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:

\underset{g}{\text{minimize}} \ \| gMV - jV \|^2

:<math>

$$

\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|^2

Note that since we are working with row vectors, the norm is defined such that <math>~~\left~~\| x ~~\right~~\|^2 = xx^{\mathsf T}</math>.

</math>

Note that since we are working with row vectors, the norm is defined such that <math>\| x \|^2 = xx^{\mathsf T}</math>.

Differentiating with respect to <math>g</math>:

$$

:<math>\displaystyle

\begin{~~gather~~}

\begin{aligned}

& \frac{\partial }{\partial g} \| gMV - jV \|^2 \\

& \, \frac{\partial }{\partial g} \| gMV - jV \|^2 \\

= & \frac{\partial }{\partial g} (gMV - jV)(gMV - jV)^{\mathsf T} \\

= & \, \frac{\partial }{\partial g} (gMV - jV)(gMV - jV)^{\mathsf T} \\

= & \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\

= & \, \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\

= & 2gMVV^{\mathsf T}M^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T} = 0

= & \, 2gMVV^{\mathsf T}M^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T} = 0

\end{~~gather~~}

\end{aligned}

$$

</math>

Now we solve for <math>g</math>:

$$

~~\begin{gather}~~

:<math>

~~& gMVV~~^{\mathsf T}M^{\mathsf T} = ~~MVV~~^{\mathsf T}j^{\mathsf T} \\

g M V V^{\mathsf T} M^{\mathsf T} = M V V^{\mathsf T} j^{\mathsf T}

~~\Rightarrow &~~ g = ~~jVV~~^{\mathsf T}M^{\mathsf T} (~~MVV~~^{\mathsf T}M^{\mathsf T})^{-1}

</math>

~~\end{gather}~~

$$

:<math>

~~Now if~~ we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup. This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:

g = j V V^{\mathsf T} M^{\mathsf T} \left(M V V^{\mathsf T}M^{\mathsf T} \right)^{-1}

$$

</math>

g = jM^{\mathsf T} (MM^{\mathsf T})^{-1}

$$

If we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup.

This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:

:<math>

g = j M^{\mathsf T} (M M^{\mathsf T})^{-1}

</math>

== Weighted norm ==

It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. We can introduce a (usually diagonal) weighting matrix <math>W</math>. For technical reasons, the norm is taken for <math>W^2</math> so that <math>~~\left~~\| x ~~\right~~\|_{W^2}^2 = xW^2x^{\mathsf T}</math>.

It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly.

We can introduce a (usually diagonal) weighting matrix <math>W</math>.

For technical reasons, the norm is taken for <math>W^2</math> so that <math>\| x \|_{W^2}^2 = xW^2x^{\mathsf T}</math>.

This results in a weighted least squares problem:

$$

:<math>

\underset{g}{\text{minimize}} \ \| gMV - jV \|_{W^2}^2

\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|_{W^2}^2

$$

</math>

We can write this as an ordinary least squares problem by setting:

$$

:<math>

\begin{gather}

j' = jW \\

M' = MW

\end{gather}

$$

</math>

Note that in Tenney-Euclidean tuning <math> j' = \begin{bmatrix}

Note that in Tenney-Euclidean tuning <math>

1, & \dots, & 1

j' =

\end{bmatrix}</math>.

\begin{bmatrix}

1, & \dots, & 1

\end{bmatrix}

</math>.

We then solve

$$

:<math>

\underset{g}{\text{minimize}} \ \| gM'V - j'V \|^2

\underset{g}{\text{minimize}} \ \left\| gM'V - j'V \right\|^2

$$

</math>

as above.

Line 82:

Line 94:

If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem:

$$

:<math>

\begin{align}

\underset{g}{\text{minimize}} & \ \| gMW - jW \|^2 \\

\underset{g}{\text{minimize}} & \ \left\| gMW - jW \right\|^2 \\

\text{subject to} & \quad gMV - jV = 0 \\

\end{align}

$$

</math>

This problem is convex and always has a single solution, we can use the method of Lagrange multipliers.

Line 93:

Line 105:

Let's first simplify a bit:

$$

:<math>

\begin{align}

A &= (MW)^{\mathsf T}

&b &= (jW)^{\mathsf T} \\

C &= (MV)^{\mathsf T}

&d &= (jV)^{\mathsf T} \\

\end{align}

$$

</math>

The problem then becomes:

$$

:<math>

\begin{align}

\underset{g}{\text{minimize}} & \quad \left\| Ag^{\mathsf T} - b \right\|^2 \\

\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d = 0 \\

\end{align}

$$

</math>

We can write the Lagrangian as:

$$

:<math>

~~\begin{gather}~~

\mathcal{L}(g,\lambda) = \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2 + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)

\mathcal{L}(g,\lambda) =& \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2 + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)

</math>

~~\end{gather}~~

$$

where we introduced a vector of Lagrange multipliers <math>\lambda</math>.

Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system:

$$

:<math>

\begin{bmatrix}

A^{\mathsf T}A & C^{\mathsf T} \\

C & 0

\end{bmatrix}

\begin{bmatrix}

g^{\mathsf T} \\

\lambda^{\mathsf T}

\end{bmatrix}

=

\begin{bmatrix}

A^{\mathsf T} b\\

d

\end{bmatrix}

$$

</math>

The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>:

$$

:<math>

\begin{bmatrix}

g^{\mathsf T} \\

\lambda^{\mathsf T}

\end{bmatrix}

=

\begin{bmatrix}

A^{\mathsf T}A & C^{\mathsf T} \\

C & 0

\end{bmatrix}^{-1}

\begin{bmatrix}

A^{\mathsf T} b\\

d

\end{bmatrix}

$$

</math>

The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored.

@@ Line 1: / Line 1: @@
 Some derivations on optimal generators for 2-norm (Euclidean) tunings.
-In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup. <math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).
+In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup.
+<math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).
 The goal is now to find <math>g</math> such that the temperament approximates JI:
+:<math>
-$$
+	gM \approx j
-gM \approx j
+</math>
-$$
 == Solving the system ==
-The simplest case involves solving a linear system. Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.
+The simplest case involves solving a linear system.
+Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.
 Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve:
-$$
+:<math>
-gMV = jV
+	gMV = jV
-$$
+</math>
-We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals. Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:
+We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals.
-$$
+Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:
-g = jV(MV)^{-1}
+:<math>
-$$
+	g = jV(MV)^{-1}
+</math>
 == Least squares solution ==
@@ Line 25: / Line 27: @@
 However, we can instead obtain an approximate solution that minimizes the squared error.
 If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is:
-$$
+:<math>
-gMV - jV
+	gMV - jV
-$$
+</math>
-Using <math>\left\| \cdot  \right\|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:
-$$
+Using <math>\| \cdot \|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:
-\underset{g}{\text{minimize}} \ \|   gMV - jV   \|^2
+:<math>
-$$
+	\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|^2
-Note that since we are working with row vectors, the norm is defined such that <math>\left\| x  \right\|^2 = xx^{\mathsf T}</math>.
+</math>
+Note that since we are working with row vectors, the norm is defined such that <math>\| x \|^2 = xx^{\mathsf T}</math>.
 Differentiating with respect to <math>g</math>:
-$$
+:<math>\displaystyle
-\begin{gather}
+\begin{aligned}
-& \frac{\partial }{\partial g}  \|   gMV - jV   \|^2 \\
+	  & \, \frac{\partial }{\partial g}  \|   gMV - jV   \|^2 \\
-= & \frac{\partial }{\partial g}  (gMV - jV)(gMV - jV)^{\mathsf T}  \\
+	= & \, \frac{\partial }{\partial g}  (gMV - jV)(gMV - jV)^{\mathsf T}  \\
-= & \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T}  - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\
+	= & \, \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\
-= & 2gMVV^{\mathsf T}M^{\mathsf T}  - 2jVV^{\mathsf T}M^{\mathsf T} = 0
+	= & \, 2gMVV^{\mathsf T}M^{\mathsf T}  - 2jVV^{\mathsf T}M^{\mathsf T} = 0
-\end{gather}
+\end{aligned}
-$$
+</math>
 Now we solve for <math>g</math>:
-$$
-\begin{gather}
+:<math>
-& gMVV^{\mathsf T}M^{\mathsf T}  = MVV^{\mathsf T}j^{\mathsf T} \\
+	g M V V^{\mathsf T} M^{\mathsf T} = M V V^{\mathsf T} j^{\mathsf T}
-\Rightarrow & g  = jVV^{\mathsf T}M^{\mathsf T} (MVV^{\mathsf T}M^{\mathsf T})^{-1}
+</math>
-\end{gather}
-$$
+:<math>
-Now if we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup. This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:
+	g  = j V V^{\mathsf T} M^{\mathsf T} \left(M V V^{\mathsf T}M^{\mathsf T} \right)^{-1}
-$$
+</math>
-g  = jM^{\mathsf T} (MM^{\mathsf T})^{-1}
-$$
+If we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup.
+This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:
+:<math>
+	g  = j M^{\mathsf T} (M M^{\mathsf T})^{-1}
+</math>
 == Weighted norm ==
-It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. We can introduce a (usually diagonal) weighting matrix <math>W</math>. For technical reasons, the norm is taken for <math>W^2</math> so that <math>\left\| x  \right\|_{W^2}^2 = xW^2x^{\mathsf T}</math>.
+It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly.
+We can introduce a (usually diagonal) weighting matrix <math>W</math>.
+For technical reasons, the norm is taken for <math>W^2</math> so that <math>\| x  \|_{W^2}^2 = xW^2x^{\mathsf T}</math>.
 This results in a weighted least squares problem:
-$$
+:<math>
-\underset{g}{\text{minimize}} \ \|   gMV - jV   \|_{W^2}^2
+	\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|_{W^2}^2
-$$
+</math>
 We can write this as an ordinary least squares problem by setting:
-$$
+:<math>
 \begin{gather}
-j' = jW \\
+	j' = jW \\
-M' = MW
+	M' = MW
 \end{gather}
-$$
+</math>
-Note that in Tenney-Euclidean tuning <math> j' = \begin{bmatrix}
+Note that in Tenney-Euclidean tuning <math>
-, & \dots, & 1
+	j' =
-\end{bmatrix}</math>.
+	\begin{bmatrix}
+, & \dots, & 1
+	\end{bmatrix}
+</math>.
 We then solve
-$$
+:<math>
-\underset{g}{\text{minimize}} \ \|   gM'V - j'V   \|^2
+\underset{g}{\text{minimize}} \ \left\| gM'V - j'V \right\|^2
-$$
+</math>
 as above.
@@ Line 82: / Line 94: @@
 If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem:
-$$
+:<math>
 \begin{align}
-\underset{g}{\text{minimize}}   & \ \|   gMW - jW   \|^2   \\
+\underset{g}{\text{minimize}} & \ \left\| gMW - jW \right\|^2   \\
 \text{subject to} & \quad gMV - jV = 0 \\
 \end{align}
-$$
+</math>
 This problem is convex and always has a single solution, we can use the method of Lagrange multipliers.
@@ Line 93: / Line 105: @@
 Let's first simplify a bit:
-$$
+:<math>
 \begin{align}
-A &= (MW)^{\mathsf T}
+	A  &= (MW)^{\mathsf T}
-&b &= (jW)^{\mathsf T} \\
+	&b &= (jW)^{\mathsf T} \\
-C &= (MV)^{\mathsf T}
+	C  &= (MV)^{\mathsf T}
-&d &= (jV)^{\mathsf T} \\
+	&d &= (jV)^{\mathsf T} \\
 \end{align}
-$$
+</math>
 The problem then becomes:
-$$
+:<math>
 \begin{align}
-\underset{g}{\text{minimize}}   & \quad  \left\|   Ag^{\mathsf T} - b   \right\|^2   \\
+	\underset{g}{\text{minimize}}   & \quad  \left\|   Ag^{\mathsf T} - b   \right\|^2   \\
-\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d  = 0 \\
+	\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d  = 0 \\
 \end{align}
-$$
+</math>
 We can write the Lagrangian as:
-$$
+:<math>
-\begin{gather}
+	\mathcal{L}(g,\lambda) = \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2  + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)
-\mathcal{L}(g,\lambda) =& \frac{1}{2} \left\|   Ag^{\mathsf T} - b   \right\|^2  + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)
+</math>
-\end{gather}
-$$
 where we introduced a vector of Lagrange multipliers <math>\lambda</math>.
 Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system:
-$$
+:<math>
-\begin{bmatrix}
+	\begin{bmatrix}
-A^{\mathsf T}A & C^{\mathsf T} \\
+		A^{\mathsf T}A & C^{\mathsf T} \\
-C & 0
+		C & 0
-\end{bmatrix}
+	\end{bmatrix}
-\begin{bmatrix}
+	\begin{bmatrix}
-g^{\mathsf T}  \\
+		g^{\mathsf T}  \\
-\lambda^{\mathsf T}
+		\lambda^{\mathsf T}
-\end{bmatrix}
+	\end{bmatrix}
-=
+	=
-\begin{bmatrix}
+	\begin{bmatrix}
-A^{\mathsf T} b\\
+		A^{\mathsf T} b\\
-d
+		d
-\end{bmatrix}
+	\end{bmatrix}
-$$
+</math>
 The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>:
-$$
+:<math>
-\begin{bmatrix}
+	\begin{bmatrix}
-g^{\mathsf T}  \\
+		g^{\mathsf T}  \\
-\lambda^{\mathsf T}
+		\lambda^{\mathsf T}
-\end{bmatrix}
+	\end{bmatrix}
-=
+	=
-\begin{bmatrix}
+	\begin{bmatrix}
-A^{\mathsf T}A & C^{\mathsf T} \\
+		A^{\mathsf T}A & C^{\mathsf T} \\
-C & 0
+		C & 0
-\end{bmatrix}^{-1}
+	\end{bmatrix}^{-1}
-\begin{bmatrix}
+	\begin{bmatrix}
-A^{\mathsf T} b\\
+		A^{\mathsf T} b\\
-d
+	d
-\end{bmatrix}
+	\end{bmatrix}
-$$
+</math>
 The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored.