User:Sintel/Generator optimization: Difference between revisions

Sintel (talk | contribs)
m grammar
Sintel (talk | contribs)
math formatting
 
Line 1: Line 1:
Some derivations on optimal generators for 2-norm (Euclidean) tunings.
Some derivations on optimal generators for 2-norm (Euclidean) tunings.


In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup. <math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).
In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup.
<math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP).


The goal is now to find <math>g</math> such that the temperament approximates JI:
The goal is now to find <math>g</math> such that the temperament approximates JI:
 
:<math>
$$
gM \approx j
gM \approx j
</math>
$$


== Solving the system ==
== Solving the system ==
The simplest case involves solving a linear system. Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.
The simplest case involves solving a linear system.
Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just.


Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve:
Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve:
$$
:<math>
gMV = jV
gMV = jV
$$
</math>
We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals. Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:
We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals.
$$
Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution:
g = jV(MV)^{-1}
:<math>
$$
g = jV(MV)^{-1}
</math>


== Least squares solution ==
== Least squares solution ==
Line 25: Line 27:
However, we can instead obtain an approximate solution that minimizes the squared error.  
However, we can instead obtain an approximate solution that minimizes the squared error.  
If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is:
If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is:
$$
:<math>
gMV - jV
gMV - jV
$$
</math>
Using <math>\left\| \cdot \right\|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:
 
$$
Using <math>\| \cdot \|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:
\underset{g}{\text{minimize}} \ \|   gMV - jV   \|^2
:<math>
$$
\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|^2
Note that since we are working with row vectors, the norm is defined such that <math>\left\| x \right\|^2 = xx^{\mathsf T}</math>.
</math>
Note that since we are working with row vectors, the norm is defined such that <math>\| x \|^2 = xx^{\mathsf T}</math>.


Differentiating with respect to <math>g</math>:
Differentiating with respect to <math>g</math>:
$$
:<math>\displaystyle
\begin{gather}
\begin{aligned}
& \frac{\partial }{\partial g}  \|  gMV - jV  \|^2 \\
  & \, \frac{\partial }{\partial g}  \|  gMV - jV  \|^2 \\
= & \frac{\partial }{\partial g}  (gMV - jV)(gMV - jV)^{\mathsf T}  \\
= & \, \frac{\partial }{\partial g}  (gMV - jV)(gMV - jV)^{\mathsf T}  \\
= & \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\
= & \, \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\
= & 2gMVV^{\mathsf T}M^{\mathsf T}  - 2jVV^{\mathsf T}M^{\mathsf T} = 0
= & \, 2gMVV^{\mathsf T}M^{\mathsf T}  - 2jVV^{\mathsf T}M^{\mathsf T} = 0
\end{gather}
\end{aligned}
$$
</math>
Now we solve for <math>g</math>:
Now we solve for <math>g</math>:
$$
 
\begin{gather}
:<math>
& gMVV^{\mathsf T}M^{\mathsf T} = MVV^{\mathsf T}j^{\mathsf T} \\
g M V V^{\mathsf T} M^{\mathsf T} = M V V^{\mathsf T} j^{\mathsf T}
\Rightarrow & g  = jVV^{\mathsf T}M^{\mathsf T} (MVV^{\mathsf T}M^{\mathsf T})^{-1}
</math>
\end{gather}
 
$$
:<math>
Now if we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup. This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:
g  = j V V^{\mathsf T} M^{\mathsf T} \left(M V V^{\mathsf T}M^{\mathsf T} \right)^{-1}
$$
</math>
g  = jM^{\mathsf T} (MM^{\mathsf T})^{-1}
 
$$
If we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup.
This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to:
:<math>
g  = j M^{\mathsf T} (M M^{\mathsf T})^{-1}
</math>


== Weighted norm ==
== Weighted norm ==
It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. We can introduce a (usually diagonal) weighting matrix <math>W</math>. For technical reasons, the norm is taken for <math>W^2</math> so that <math>\left\| x  \right\|_{W^2}^2 = xW^2x^{\mathsf T}</math>.
It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly.
We can introduce a (usually diagonal) weighting matrix <math>W</math>.
For technical reasons, the norm is taken for <math>W^2</math> so that <math>\| x  \|_{W^2}^2 = xW^2x^{\mathsf T}</math>.


This results in a weighted least squares problem:
This results in a weighted least squares problem:
$$
:<math>
\underset{g}{\text{minimize}} \ \|   gMV - jV   \|_{W^2}^2
\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|_{W^2}^2
$$
</math>
We can write this as an ordinary least squares problem by setting:
We can write this as an ordinary least squares problem by setting:
$$
:<math>
\begin{gather}
\begin{gather}
j' = jW \\
j' = jW \\
M' = MW
M' = MW
\end{gather}
\end{gather}
$$
</math>
Note that in Tenney-Euclidean tuning <math> j' = \begin{bmatrix}
Note that in Tenney-Euclidean tuning <math>
1, & \dots, & 1
j' =  
\end{bmatrix}</math>.
\begin{bmatrix}
1, & \dots, & 1
\end{bmatrix}
</math>.


We then solve  
We then solve  
$$
:<math>
\underset{g}{\text{minimize}} \ \|   gM'V - j'V   \|^2
\underset{g}{\text{minimize}} \ \left\| gM'V - j'V \right\|^2
$$
</math>
as above.
as above.


Line 82: Line 94:


If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem:
If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem:
$$
:<math>
\begin{align}
\begin{align}
\underset{g}{\text{minimize}}   & \ \|   gMW - jW   \|^2  \\
\underset{g}{\text{minimize}} & \ \left\| gMW - jW \right\|^2  \\
\text{subject to} & \quad gMV - jV = 0 \\
\text{subject to} & \quad gMV - jV = 0 \\
\end{align}
\end{align}
$$
</math>


This problem is convex and always has a single solution, we can use the method of Lagrange multipliers.  
This problem is convex and always has a single solution, we can use the method of Lagrange multipliers.  
Line 93: Line 105:
Let's first simplify a bit:
Let's first simplify a bit:


$$
:<math>
\begin{align}
\begin{align}
A &= (MW)^{\mathsf T}  
A &= (MW)^{\mathsf T}  
&b &= (jW)^{\mathsf T} \\
&b &= (jW)^{\mathsf T} \\
C &= (MV)^{\mathsf T}  
C &= (MV)^{\mathsf T}  
&d &= (jV)^{\mathsf T} \\
&d &= (jV)^{\mathsf T} \\
\end{align}
\end{align}
$$
</math>


The problem then becomes:
The problem then becomes:


$$
:<math>
\begin{align}
\begin{align}
\underset{g}{\text{minimize}}  & \quad  \left\|  Ag^{\mathsf T} - b  \right\|^2  \\
\underset{g}{\text{minimize}}  & \quad  \left\|  Ag^{\mathsf T} - b  \right\|^2  \\
\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d  = 0 \\
\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d  = 0 \\
\end{align}
\end{align}
$$
</math>


We can write the Lagrangian as:
We can write the Lagrangian as:
$$
:<math>
\begin{gather}
\mathcal{L}(g,\lambda) = \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2  + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)
\mathcal{L}(g,\lambda) =& \frac{1}{2} \left\|   Ag^{\mathsf T} - b   \right\|^2  + \lambda^{\mathsf T}(Cg^{\mathsf T} - d)
</math>
\end{gather}
$$
where we introduced a vector of Lagrange multipliers <math>\lambda</math>.
where we introduced a vector of Lagrange multipliers <math>\lambda</math>.


Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system:
Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system:
$$
:<math>
\begin{bmatrix}
\begin{bmatrix}
A^{\mathsf T}A & C^{\mathsf T} \\
A^{\mathsf T}A & C^{\mathsf T} \\
C & 0
C & 0
\end{bmatrix}
\end{bmatrix}
\begin{bmatrix}
\begin{bmatrix}
g^{\mathsf T}  \\
g^{\mathsf T}  \\
\lambda^{\mathsf T}
\lambda^{\mathsf T}
\end{bmatrix}
\end{bmatrix}
=
=
\begin{bmatrix}
\begin{bmatrix}
A^{\mathsf T} b\\
A^{\mathsf T} b\\
d
d
\end{bmatrix}
\end{bmatrix}
$$
</math>
The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>:
The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>:
$$
:<math>
\begin{bmatrix}
\begin{bmatrix}
g^{\mathsf T}  \\
g^{\mathsf T}  \\
\lambda^{\mathsf T}
\lambda^{\mathsf T}
\end{bmatrix}
\end{bmatrix}
=
=
\begin{bmatrix}
\begin{bmatrix}
A^{\mathsf T}A & C^{\mathsf T} \\
A^{\mathsf T}A & C^{\mathsf T} \\
C & 0
C & 0
\end{bmatrix}^{-1}
\end{bmatrix}^{-1}
\begin{bmatrix}
\begin{bmatrix}
A^{\mathsf T} b\\
A^{\mathsf T} b\\
d
d
\end{bmatrix}
\end{bmatrix}
$$
</math>
The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored.
The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored.