User:Sintel/Generator optimization: Difference between revisions
No edit summary |
math formatting |
||
| (2 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
Some derivations on optimal generators for 2-norm (Euclidean) tunings. | Some derivations on optimal generators for 2-norm (Euclidean) tunings. | ||
In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup. <math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP). | In what follows, <math>M</math> is an <math>n \times k</math> integer matrix that represents the temperament, where <math>n</math> is the rank of the temperament and <math>k</math> is the rank of the just intonation subgroup. | ||
<math>g</math> is the (unknown) row vector of generators with length <math>n</math>, and <math>j</math> is a row vector of length <math>k</math> containing the log-primes (aka the JIP). | |||
The goal is now to find <math>g</math> such that the temperament approximates JI: | The goal is now to find <math>g</math> such that the temperament approximates JI: | ||
:<math> | |||
gM \approx j | |||
gM \approx j | </math> | ||
== Solving the system == | == Solving the system == | ||
The simplest case involves solving a linear system. Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just. | The simplest case involves solving a linear system. | ||
Since there are <math>n</math> unknowns, we can introduce <math>n</math> constraints, each forcing an interval of choice to be just. | |||
Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve: | Let <math>V</math> be a <math>k \times n</math> matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve: | ||
:<math> | |||
gMV = jV | gMV = jV | ||
</math> | |||
We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals. Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution: | We can recognize the LHS as giving the size of the tempered intervals in <math>V</math>, and the RHS as the size of the just intervals. | ||
Since <math>MV</math> is square (<math>n \times n</math>), we can simply invert it to obtain a solution: | |||
g = jV(MV)^{-1} | :<math> | ||
g = jV(MV)^{-1} | |||
</math> | |||
== Least squares solution == | == Least squares solution == | ||
| Line 25: | Line 27: | ||
However, we can instead obtain an approximate solution that minimizes the squared error. | However, we can instead obtain an approximate solution that minimizes the squared error. | ||
If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is: | If <math>V</math> is now a <math>k \times m</math> matrix (where <math>m \gt n</math>), then the error for these intervals is: | ||
:<math> | |||
gMV - jV | gMV - jV | ||
</math> | |||
Using <math> | |||
Using <math>\| \cdot \|</math> to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem: | |||
\underset{g}{\text{minimize}} | :<math> | ||
\underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|^2 | |||
Note that since we are working with row vectors, the norm is defined such that <math> | </math> | ||
Note that since we are working with row vectors, the norm is defined such that <math>\| x \|^2 = xx^{\mathsf T}</math>. | |||
Differentiating with respect to <math>g</math>: | Differentiating with respect to <math>g</math>: | ||
:<math>\displaystyle | |||
\begin{ | \begin{aligned} | ||
& \frac{\partial }{\partial g} \| gMV - jV \|^2 \\ | & \, \frac{\partial }{\partial g} \| gMV - jV \|^2 \\ | ||
= & \frac{\partial }{\partial g} (gMV - jV)(gMV - jV)^{\mathsf T} \\ | = & \, \frac{\partial }{\partial g} (gMV - jV)(gMV - jV)^{\mathsf T} \\ | ||
= & \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} | = & \, \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\ | ||
= & 2gMVV^{\mathsf T}M^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T} = 0 | = & \, 2gMVV^{\mathsf T}M^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T} = 0 | ||
\end{ | \end{aligned} | ||
</math> | |||
Now we solve for <math>g</math>: | Now we solve for <math>g</math>: | ||
:<math> | |||
g M V V^{\mathsf T} M^{\mathsf T} = M V V^{\mathsf T} j^{\mathsf T} | |||
</math> | |||
:<math> | |||
g = j V V^{\mathsf T} M^{\mathsf T} \left(M V V^{\mathsf T}M^{\mathsf T} \right)^{-1} | |||
</math> | |||
g = | |||
If we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup. | |||
This is equivalent to picking the identity matrix <math>V = I_{k \times k}</math>, so that the above expression simplifies to: | |||
:<math> | |||
g = j M^{\mathsf T} (M M^{\mathsf T})^{-1} | |||
</math> | |||
== Weighted norm == | == Weighted norm == | ||
It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. We can introduce a diagonal weighting matrix <math>W</math>. For technical reasons, the norm is taken for <math>W^2</math> so that <math> | It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. | ||
We can introduce a (usually diagonal) weighting matrix <math>W</math>. | |||
For technical reasons, the norm is taken for <math>W^2</math> so that <math>\| x \|_{W^2}^2 = xW^2x^{\mathsf T}</math>. | |||
This results in a weighted least squares problem: | This results in a weighted least squares problem: | ||
:<math> | |||
\underset{g}{\text{minimize}} | \underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|_{W^2}^2 | ||
</math> | |||
We can write this | We can write this as an ordinary least squares problem by setting: | ||
:<math> | |||
\begin{gather} | \begin{gather} | ||
j' = jW \\ | j' = jW \\ | ||
M' = MW | M' = MW | ||
\end{gather} | \end{gather} | ||
</math> | |||
Note that in Tenney-Euclidean tuning <math> j' = \begin{bmatrix} | Note that in Tenney-Euclidean tuning <math> | ||
1, & \dots, & 1 | j' = | ||
\end{bmatrix}</math>. | \begin{bmatrix} | ||
1, & \dots, & 1 | |||
\end{bmatrix} | |||
</math>. | |||
We then solve | We then solve | ||
:<math> | |||
\underset{g}{\text{minimize}} | \underset{g}{\text{minimize}} \ \left\| gM'V - j'V \right\|^2 | ||
</math> | |||
as above. | as above. | ||
== Constraints == | == Constraints == | ||
If we want some intervals to be exactly just, while optimizing all the | If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem: | ||
:<math> | |||
\begin{align} | \begin{align} | ||
\underset{g}{\text{minimize}} | \underset{g}{\text{minimize}} & \ \left\| gMW - jW \right\|^2 \\ | ||
\text{subject to} & \quad gMV - jV = 0 \\ | \text{subject to} & \quad gMV - jV = 0 \\ | ||
\end{align} | \end{align} | ||
</math> | |||
This problem is convex and always has a single solution, we can use the method of Lagrange multipliers. | This problem is convex and always has a single solution, we can use the method of Lagrange multipliers. | ||
| Line 93: | Line 105: | ||
Let's first simplify a bit: | Let's first simplify a bit: | ||
:<math> | |||
\begin{align} | \begin{align} | ||
A &= (MW)^{\mathsf T} | A &= (MW)^{\mathsf T} | ||
&b &= (jW)^{\mathsf T} \\ | &b &= (jW)^{\mathsf T} \\ | ||
C &= (MV)^{\mathsf T} | C &= (MV)^{\mathsf T} | ||
&d &= (jV)^{\mathsf T} \\ | &d &= (jV)^{\mathsf T} \\ | ||
\end{align} | \end{align} | ||
</math> | |||
The problem then becomes: | The problem then becomes: | ||
:<math> | |||
\begin{align} | \begin{align} | ||
\underset{g}{\text{minimize}} & \quad \left\| Ag^{\mathsf T} - b \right\|^2 \\ | \underset{g}{\text{minimize}} & \quad \left\| Ag^{\mathsf T} - b \right\|^2 \\ | ||
\text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d = 0 \\ | \text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d = 0 \\ | ||
\end{align} | \end{align} | ||
</math> | |||
We can write the Lagrangian as: | We can write the Lagrangian as: | ||
:<math> | |||
\mathcal{L}(g,\lambda) = \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2 + \lambda^{\mathsf T}(Cg^{\mathsf T} - d) | |||
\mathcal{L}(g,\lambda) = | </math> | ||
where we introduced a vector of Lagrange multipliers <math>\lambda</math>. | where we introduced a vector of Lagrange multipliers <math>\lambda</math>. | ||
Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system: | Taking derivatives with respect to <math>g</math> and <math>\lambda</math> we get a linear system: | ||
:<math> | |||
\begin{bmatrix} | \begin{bmatrix} | ||
A^{\mathsf T}A & C^{\mathsf T} \\ | A^{\mathsf T}A & C^{\mathsf T} \\ | ||
C & 0 | C & 0 | ||
\end{bmatrix} | \end{bmatrix} | ||
\begin{bmatrix} | \begin{bmatrix} | ||
g^{\mathsf T} \\ | g^{\mathsf T} \\ | ||
\lambda^{\mathsf T} | \lambda^{\mathsf T} | ||
\end{bmatrix} | \end{bmatrix} | ||
= | = | ||
\begin{bmatrix} | \begin{bmatrix} | ||
A^{\mathsf T} b\\ | A^{\mathsf T} b\\ | ||
d | d | ||
\end{bmatrix} | \end{bmatrix} | ||
</math> | |||
The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>: | The left block matrix is square and invertible, so we can solve for <math>(g,\lambda)</math>: | ||
:<math> | |||
\begin{bmatrix} | \begin{bmatrix} | ||
g^{\mathsf T} \\ | g^{\mathsf T} \\ | ||
\lambda^{\mathsf T} | \lambda^{\mathsf T} | ||
\end{bmatrix} | \end{bmatrix} | ||
= | = | ||
\begin{bmatrix} | \begin{bmatrix} | ||
A^{\mathsf T}A & C^{\mathsf T} \\ | A^{\mathsf T}A & C^{\mathsf T} \\ | ||
C & 0 | C & 0 | ||
\end{bmatrix}^{-1} | \end{bmatrix}^{-1} | ||
\begin{bmatrix} | \begin{bmatrix} | ||
A^{\mathsf T} b\\ | A^{\mathsf T} b\\ | ||
d | d | ||
\end{bmatrix} | \end{bmatrix} | ||
</math> | |||
The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored. | The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored. | ||
Latest revision as of 17:01, 26 April 2025
Some derivations on optimal generators for 2-norm (Euclidean) tunings.
In what follows, [math]\displaystyle{ M }[/math] is an [math]\displaystyle{ n \times k }[/math] integer matrix that represents the temperament, where [math]\displaystyle{ n }[/math] is the rank of the temperament and [math]\displaystyle{ k }[/math] is the rank of the just intonation subgroup. [math]\displaystyle{ g }[/math] is the (unknown) row vector of generators with length [math]\displaystyle{ n }[/math], and [math]\displaystyle{ j }[/math] is a row vector of length [math]\displaystyle{ k }[/math] containing the log-primes (aka the JIP).
The goal is now to find [math]\displaystyle{ g }[/math] such that the temperament approximates JI:
- [math]\displaystyle{ gM \approx j }[/math]
Solving the system
The simplest case involves solving a linear system. Since there are [math]\displaystyle{ n }[/math] unknowns, we can introduce [math]\displaystyle{ n }[/math] constraints, each forcing an interval of choice to be just.
Let [math]\displaystyle{ V }[/math] be a [math]\displaystyle{ k \times n }[/math] matrix, whose columns are the prime count vectors of the intervals we want to be just. We now want to solve:
- [math]\displaystyle{ gMV = jV }[/math]
We can recognize the LHS as giving the size of the tempered intervals in [math]\displaystyle{ V }[/math], and the RHS as the size of the just intervals. Since [math]\displaystyle{ MV }[/math] is square ([math]\displaystyle{ n \times n }[/math]), we can simply invert it to obtain a solution:
- [math]\displaystyle{ g = jV(MV)^{-1} }[/math]
Least squares solution
If we want more than [math]\displaystyle{ n }[/math] intervals to be just, the result is an overconstrained system, and no solutions can be found. However, we can instead obtain an approximate solution that minimizes the squared error. If [math]\displaystyle{ V }[/math] is now a [math]\displaystyle{ k \times m }[/math] matrix (where [math]\displaystyle{ m \gt n }[/math]), then the error for these intervals is:
- [math]\displaystyle{ gMV - jV }[/math]
Using [math]\displaystyle{ \| \cdot \| }[/math] to denote the usual Euclidean norm, this becomes a standard ordinary least squares problem:
- [math]\displaystyle{ \underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|^2 }[/math]
Note that since we are working with row vectors, the norm is defined such that [math]\displaystyle{ \| x \|^2 = xx^{\mathsf T} }[/math].
Differentiating with respect to [math]\displaystyle{ g }[/math]:
- [math]\displaystyle{ \displaystyle \begin{aligned} & \, \frac{\partial }{\partial g} \| gMV - jV \|^2 \\ = & \, \frac{\partial }{\partial g} (gMV - jV)(gMV - jV)^{\mathsf T} \\ = & \, \frac{\partial }{\partial g} (gMVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T}g^{\mathsf T} + jVV^{\mathsf T}j^{\mathsf T}) \\ = & \, 2gMVV^{\mathsf T}M^{\mathsf T} - 2jVV^{\mathsf T}M^{\mathsf T} = 0 \end{aligned} }[/math]
Now we solve for [math]\displaystyle{ g }[/math]:
- [math]\displaystyle{ g M V V^{\mathsf T} M^{\mathsf T} = M V V^{\mathsf T} j^{\mathsf T} }[/math]
- [math]\displaystyle{ g = j V V^{\mathsf T} M^{\mathsf T} \left(M V V^{\mathsf T}M^{\mathsf T} \right)^{-1} }[/math]
If we don't know which intervals to optimize for, a simple choice is just to optimize for each prime in the subgroup. This is equivalent to picking the identity matrix [math]\displaystyle{ V = I_{k \times k} }[/math], so that the above expression simplifies to:
- [math]\displaystyle{ g = j M^{\mathsf T} (M M^{\mathsf T})^{-1} }[/math]
Weighted norm
It can be beneficial to introduce weights for each prime. For example, in Tenney-Euclidean tuning, lower primes are prioritized slightly. We can introduce a (usually diagonal) weighting matrix [math]\displaystyle{ W }[/math]. For technical reasons, the norm is taken for [math]\displaystyle{ W^2 }[/math] so that [math]\displaystyle{ \| x \|_{W^2}^2 = xW^2x^{\mathsf T} }[/math].
This results in a weighted least squares problem:
- [math]\displaystyle{ \underset{g}{\text{minimize}} \ \left\| gMV - jV \right\|_{W^2}^2 }[/math]
We can write this as an ordinary least squares problem by setting:
- [math]\displaystyle{ \begin{gather} j' = jW \\ M' = MW \end{gather} }[/math]
Note that in Tenney-Euclidean tuning [math]\displaystyle{ j' = \begin{bmatrix} 1, & \dots, & 1 \end{bmatrix} }[/math].
We then solve
- [math]\displaystyle{ \underset{g}{\text{minimize}} \ \left\| gM'V - j'V \right\|^2 }[/math]
as above.
Constraints
If we want some intervals to be exactly just, while optimizing all the others, we can introduce linear constraints. A good example is CTE tuning, where octaves are always just. This gives the following optimization problem:
- [math]\displaystyle{ \begin{align} \underset{g}{\text{minimize}} & \ \left\| gMW - jW \right\|^2 \\ \text{subject to} & \quad gMV - jV = 0 \\ \end{align} }[/math]
This problem is convex and always has a single solution, we can use the method of Lagrange multipliers.
Let's first simplify a bit:
- [math]\displaystyle{ \begin{align} A &= (MW)^{\mathsf T} &b &= (jW)^{\mathsf T} \\ C &= (MV)^{\mathsf T} &d &= (jV)^{\mathsf T} \\ \end{align} }[/math]
The problem then becomes:
- [math]\displaystyle{ \begin{align} \underset{g}{\text{minimize}} & \quad \left\| Ag^{\mathsf T} - b \right\|^2 \\ \text{s.t.} & \quad \phantom{\|} Cg^{\mathsf T} - d = 0 \\ \end{align} }[/math]
We can write the Lagrangian as:
- [math]\displaystyle{ \mathcal{L}(g,\lambda) = \frac{1}{2} \left\| Ag^{\mathsf T} - b \right\|^2 + \lambda^{\mathsf T}(Cg^{\mathsf T} - d) }[/math]
where we introduced a vector of Lagrange multipliers [math]\displaystyle{ \lambda }[/math].
Taking derivatives with respect to [math]\displaystyle{ g }[/math] and [math]\displaystyle{ \lambda }[/math] we get a linear system:
- [math]\displaystyle{ \begin{bmatrix} A^{\mathsf T}A & C^{\mathsf T} \\ C & 0 \end{bmatrix} \begin{bmatrix} g^{\mathsf T} \\ \lambda^{\mathsf T} \end{bmatrix} = \begin{bmatrix} A^{\mathsf T} b\\ d \end{bmatrix} }[/math]
The left block matrix is square and invertible, so we can solve for [math]\displaystyle{ (g,\lambda) }[/math]:
- [math]\displaystyle{ \begin{bmatrix} g^{\mathsf T} \\ \lambda^{\mathsf T} \end{bmatrix} = \begin{bmatrix} A^{\mathsf T}A & C^{\mathsf T} \\ C & 0 \end{bmatrix}^{-1} \begin{bmatrix} A^{\mathsf T} b\\ d \end{bmatrix} }[/math]
The lagrange multipliers have no concrete meaning for the resulting tuning, so they can be ignored.