Lils using left inverse
This page is a work in progress. It is working towards being a replacement for the Log-integer-limit-squared section in D&D's guide, adding a "left-inverse" approach.
Log-integer-limit-squared
Here's where things start to get pretty weird.
In the cases of the minimax-lils-S and minimax-E-lils-S tuning schemes, our complexity pretransformer is the size-sensitizing matrix composed with the log prime matrix, [math]ZL[/math], and thus its inverse is [math](ZL)^{-1}[/math]. Unfortunately, even though it looks like we should be able to say the dual norm [math]\text{lils-C}^{*}()[/math] is [math]\|\textbf{r}(ZL)^{-1}\|_\infty[/math], it's not as simple as the typical case, where we can just take the dual power and the inverse pretransformer. The problem is that [math]ZL[/math] is a rectangular matrix, and so it can only have some sort of generalized inverse, not a true inverse. The Moore-Penrose inverse, AKA the pseudoinverse, is a classic example of a generalized inverse; however, it only works for some temperaments. What we really seek is a complete solution to the root problem.
We have three possible approaches here (which are closely related, mathematically):
- The max-minus-min approach
- The tipweil.py approach
- The left-inverse approach
The first approach only works when [math]\text{dual}(q) = \infty[/math], such as with minimax-lils-S (but not with minimax-E-lils-S). The other two approaches work for any retuning magnitude norm power, [math]\infty[/math] or otherwise.
The max-minus-min approach
Mike Battaglia worked out a proof that the dual norm to the log-integer-limit complexity is this:
[math]
\begin{align}
\text{lils-C}^{*}(\textbf{r}) &= \max( \frac{r_1}{\log_2{\!p_1}}, \frac{r_2}{\log_2{\!p_2}}, \ldots, \frac{r_d}{\log_2{\!p_d}}, 0) \; - \\
& \quad\quad\quad \min(\frac{r_1}{\log_2{\!p_1}}, \frac{r_2}{\log_2{\!p_2}}, \ldots, \frac{r_d}{\log_2{\!p_d}}, 0)
\end{align}
[/math]
This can be used straight away in a numerical optimization problem solver, to find an approximate solution.
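For instance, here's a minimal sketch of doing exactly that in Python (assuming numpy and scipy are available; the helper name and starting guess are just for illustration), using the meantone mapping from the examples later on this page:

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

primes = np.array([2, 3, 5])
j = 1200 * np.log2(primes)            # just tuning map, in cents
M = np.array([[1, 1, 0],              # meantone mapping, [⟨1 1 0] ⟨0 1 4]}
              [0, 1, 4]])

def lils_dual_norm(r):
    """Mike's max-minus-min dual norm of a retuning map r (k = 1 case)."""
    e = np.append(r / np.log2(primes), 0.0)   # scale by log2(p), append a 0
    return e.max() - e.min()

result = minimize(lambda g: lils_dual_norm(g @ M - j),
                  x0=j @ np.linalg.pinv(M),   # a reasonable starting guess
                  method="Nelder-Mead")
print(result.x)        # optimized generator tuning map (cents)
print(result.x @ M)    # corresponding prime tuning map; per the examples
                       # below, only prime 3 should end up mistuned
</syntaxhighlight>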
The tipweil.py approach
This approach is also credited to Mike[1]. It integrates with the coinciding-damage method originally developed by Keenan Pepper for use with TOP tuning, in tiptop.py. Unlike the max-minus-min approach, this one can give exact solutions. It achieves the dual norm through custom augmentations to the relevant matrices as used in that method.
To see these tunings worked out with exact solutions in the form of generator embeddings, using Mike's technique, see Generator embedding optimization#Minimax-lils-S and Generator embedding optimization#Minimax-E-lils-S.
The left-inverse approach
Both of the previous two methods require custom code paths to handle. We'd prefer a completely generic procedure, if possible. Is there a way to define a generalized inverse such that we still find the correct true optimum tunings we seek, but structure our approach in the same way as we do for all the other complexities? After a lengthy struggle, we have determined the answer to that question is "yes".
Generalized inverses
When [math]X[/math] is a rectangular complexity pretransformer, such as the [math]ZL[/math] that is used for [math]\text{lils-C}()[/math], there are alternative inverses available which will still effectively leverage the dual norm inequality in order to cap the maximum damage across all intervals. Specifically, any matrix [math]X^{-}[/math] (that's [math]X[/math] but with a superscript minus, as opposed to the superscript plus that we use for the Moore-Penrose inverse) which satisfies [math]X^{-}X=I[/math] will suffice. We call such a matrix a "left-inverse", because it cancels the original matrix out when it is left-multiplied by it.
There's also a thing called a right-inverse, which is the opposite. Tall matrices (more rows than columns) like [math]ZL[/math] have left-inverses (when they are full-rank). Wide matrices (more columns than rows) like mappings [math]M[/math] have right-inverses (again, when they are full-rank). Square matrices (same count of rows and columns) have matrices that are both left-inverses and right-inverses, or in other words, true inverses (again, when they are full-rank).
Fortunately for us, a left-inverse is the kind we need! That's because left-multiplication is the direction in which we need the canceling out of [math]X[/math] to occur, given the way it figures in the dual norm inequality, as shown here:
[math]
\dfrac{|\textbf{r}\cancel{X^{-1}}\cancel{X}\textbf{i}|}{\|X\textbf{i}\|_1} \leq \|\textbf{r}X^{-1}\|_\infty
[/math]
(You may wish to review the relevant background info here.)
Now it turns out that tall matrices have not just one left-inverse but a range of possible left-inverses (same goes for wide matrices and right-inverses, FWIW). These left-inverses follow a parameterized pattern, row-by-row, which you can learn about on this page about a type of generalized inverse called a 1-inverse: https://mathworld.wolfram.com/Matrix1-Inverse.html. Left- and right-inverses are specific types of 1-inverse. Generalized inverses can be categorized by the subset of pseudoinverse conditions 1 through 4 that they satisfy. All generalized inverses must at least satisfy condition 1; a generalized inverse that satisfies conditions 1, 2, and 4 would be a (1,2,4)-inverse. So a 1-inverse is the most general of all types of generalized inverses. Condition 1 is that [math]AA^{-}A = A[/math]. This condition could be satisfied by either a left-inverse or a right-inverse. A left-inverse would satisfy it because a left-inverse means [math]A^{-}A = I[/math] and so
[math]
\begin{align}
A(A^{-}A) &= A \\
AI &= A \\
A &= A
\end{align}
[/math]
and a right-inverse would satisfy it because a right-inverse means [math]AA^{-} = I[/math] and so
[math]
\begin{align}
(AA^{-})A &= A \\
IA &= A \\
A &= A
\end{align}
[/math]
As a consequence of this, there is no more general way of generating all possible left inverses of [math]Z[/math] than the method given at the MathWorld link above[2], so our proof that follows below is sufficient. Any algorithm that generates all possible 1-inverses will generate only left-inverses in the case where [math]Z[/math] is tall and full rank, as it is in the case of minimax-lils-S. And since there can be no left inverses [math]A^{-}A = I[/math] that are not also 1-inverses [math]AA^{-}A = A[/math], the algorithm must generate all possible left-inverses.[3]
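To make the left-versus-right distinction concrete, here's a small numpy sketch (the [math]Z[/math] built here is the [math]k = 1[/math] size-sensitizing matrix used later on this page): the pseudoinverse of a tall, full-rank matrix is a left-inverse of it, but not a right-inverse.

<syntaxhighlight lang="python">
import numpy as np

k = 1
Z = np.vstack([np.eye(3), np.full((1, 3), k)])   # tall: 4 rows, 3 columns
Z_pinv = np.linalg.pinv(Z)                       # Moore-Penrose inverse, shape (3, 4)

print(np.allclose(Z_pinv @ Z, np.eye(3)))   # True:  it is a left-inverse (Z⁻Z = I)
print(np.allclose(Z @ Z_pinv, np.eye(4)))   # False: it is not a right-inverse
</syntaxhighlight>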
Optimizable simplicity pretransformers
Now here's where things get really weird. The parameters of these left-inverses can themselves be optimized. In other words, when optimizing for a minimax-lils-S tuning, and attempting to use a left-inverse of our complexity pretransformer to do so, then we may find ourselves doing a nested optimization problem! (At least, it seems this way at first. We'll figure a way out of this problem soon enough.)
Said another way, when we're attempting to minimize [math]\|(\textbf{t} - \textbf{j})(ZL)^{-}\|_\infty[/math], we're not just trying different values for [math]t_1[/math], [math]t_2[/math], etc. (the tunings of the primes); we can also potentially use different values for the entries of [math](ZL)^{-}[/math]. We can still use the dual norm inequality to prove we've minimaxed lils-C-weight damage across all intervals (i.e. we can still validly cancel out [math]ZL[/math]), but we can do it with whichever values of [math](ZL)^{-}[/math] let us get [math]\|(\textbf{t} - \textbf{j})(ZL)^{-}\|_\infty[/math] as low as possible.
Understandably, since normal optimization problems can already be computationally intensive, doing an entire optimization problem within each one of the potentially thousands of steps of an outer optimization problem may be so demanding that most mathematical tools available to the public could not handle it. It would be much better if instead we had a simple closed-form expression for the parameters of this left-inverse.
Well, without further ado, here it is:
[math]
Z^{-} =
\left[ \begin{array} {r}
1-kx & -kx & -kx & \cdots & -kx & x \\
-kx & 1-kx & -kx & \cdots & -kx & x \\
-kx & -kx & 1-kx & \cdots & -kx & x \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
-kx & -kx & -kx & \cdots & 1-kx & x \\
\end{array} \right]
[/math]
where
- [math]k[/math] is our size factor (so this works for hybrids, as well),
- [math]x = \dfrac{\max(\textbf{r}L^{-1}|0) + \min(\textbf{r}L^{-1}|0)}{\text{sum}(\textbf{r}L^{-1})}[/math],
- [math]\max()[/math], [math]\min()[/math], and [math]\text{sum}()[/math] are applied entry-wise to their covector arguments, and
- [math]\textbf{r}L^{-1}|0[/math] is the covector [math]\textbf{r}L^{-1}[/math] with a zero appended to its right-hand end.
Notice that there are no absolute values in that formula, and notice that it's [math]\max + \min[/math], not [math]\max - \min[/math] as appears in Mike's max-minus-min approach (which of course still works, but in a different, closely related way).
So, given [math]\textbf{r}[/math], we can calculate [math]\textbf{r}L^{-1}[/math]. From that we calculate [math]x[/math]. From that we calculate [math]Z^{-}[/math]. From that we calculate [math](\textbf{r}L^{-1})Z^{-}[/math]. Essentially, we've defined [math]Z^{-}[/math] as a function of [math]\textbf{r}[/math]. So when the solver (this assumes a numeric minimization, such as is described here) is told to minimize [math]\|(\textbf{r}L^{-1})Z^{-}\|_{\infty}[/math] by changing [math]\textbf{t}[/math], it works a treat. There's no nested optimization anymore; it's just a single optimization, but the values of [math]\textbf{t}[/math] plug in a few more places than before.
This left-inverse matrix always gives the same results as the other two approaches from Mike, but it can be written in the form of a pretransformed power-norm, like all the other complexities, as we wished. The main difference is that [math]Z^{-}[/math] is not a constant matrix throughout the minimization, as the simplicity pretransformation typically tends to be.
We know that any [math]x[/math] we could possibly come up with still satisfies the requirement that [math]Z^{-}Z = I[/math], but only one [math]x[/math] will be optimal for our particular temperament's tuning.
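Here's a small numpy sketch of that claim: it builds [math]Z[/math] and the [math]Z^{-}[/math] from the closed form above, and checks that [math]Z^{-}Z = I[/math] regardless of the value of [math]x[/math] (the function names are just for illustration).

<syntaxhighlight lang="python">
import numpy as np

def size_sensitizer(d, k):
    """Z: a (d, d) identity with a row of k's appended at the bottom."""
    return np.vstack([np.eye(d), np.full((1, d), k)])

def left_inverse(d, k, x):
    """Z⁻ per the closed form above: each row is [1-kx, -kx, ..., -kx, x]."""
    return np.hstack([np.eye(d) - k * x * np.ones((d, d)),
                      np.full((d, 1), x)])

d, k = 3, 1
for x in [0.0, 0.25, 0.5, 1.0, -2.7]:    # any x at all passes this check
    assert np.allclose(left_inverse(d, k, x) @ size_sensitizer(d, k), np.eye(d))
</syntaxhighlight>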
Examples
For meantone temperament, with mapping [⟨1 1 0] ⟨0 1 4]}, with [math]k=1[/math] (minimax-lils-S), we find the optimal [math]x = 1[/math] with an optimal [math]\textbf{r}L^{-1}[/math] of ⟨0.000 -3.393 0.000]. So the max is 0, the min is -3.393, and the sum is -3.393, so (max + min)/sum = (0 - 3.393)/-3.393 = 1. We can see that any tuning which has only a single prime with nonzero error is going to result in [math]x = 1[/math]. When [math]x = 1[/math] and [math]k = 1[/math] like this, [math]Z^{-} = Z^{+}[/math], that is, the pseudoinverse of [math]Z[/math] is the optimal left-inverse.
For porcupine temperament, with mapping [⟨1 2 3] ⟨0 -3 -5]}, we find the optimal [math]x = 0.5[/math] with an optimal [math]\textbf{r}L^{-1}[/math] of ⟨-6.172 0.000 -6.172]. So (max + min)/sum = -6.172/-12.344 = 0.5.
For 12-ET, with mapping [⟨12 19 28]}, we find the optimal [math]x = 0.547[/math] with an optimal [math]\textbf{r}L^{-1}[/math] of ⟨-5.866 -7.093 0.000]. So (max + min)/sum = -7.093/-12.959 = 0.547.
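As a quick check, here's a short numpy sketch that recomputes the [math]x[/math] values for these three examples directly from the formula above (the [math]\textbf{r}L^{-1}[/math] values are the ones just quoted):

<syntaxhighlight lang="python">
import numpy as np

def optimal_x(rL_inv):
    """x = (max(rL⁻¹|0) + min(rL⁻¹|0)) / sum(rL⁻¹), per the formula above."""
    with_zero = np.append(rL_inv, 0.0)
    return (with_zero.max() + with_zero.min()) / rL_inv.sum()

print(optimal_x(np.array([0.000, -3.393, 0.000])))    # meantone:  1.0
print(optimal_x(np.array([-6.172, 0.000, -6.172])))   # porcupine: 0.5
print(optimal_x(np.array([-5.866, -7.093, 0.000])))   # 12-ET:     ≈ 0.547
</syntaxhighlight>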
Derivation
To derive the above [math]Z^{-}[/math] matrix (which we've given in terms of [math]k[/math] and [math]\textbf{r}[/math]) from MathWorld's take on the topic, we have a little work to do. So MathWorld says we're looking for matrices [math]P[/math] and [math]Q[/math] such that [math]J[/math] =
[math]
\left[ \begin{array} {c}
I & 0 \\
0 & 0 \\
\end{array} \right]
[/math]
And [math]PAQ = J[/math]. The reason why more than one possible generalized inverse exists arises from the fact that the [math]X[/math], [math]Y[/math] and [math]Z[/math] in MathWorld's final expression [math]A^{-}[/math] =
[math]
Q
\left[ \begin{array} {c}
I & X \\
Y & Z \\
\end{array} \right]
P
[/math]
have completely arbitrary entries; the only requirement is that their shape is correct. (Note: MathWorld's [math]Z[/math] is different than our [math]Z[/math]; thankfully we won't need theirs much.) In our situation, we only have [math]X[/math] (no [math]Y[/math] or [math]Z[/math] needed), because the rank of our [math]A[/math] is equal to its width. In other words, if we take [math]Z[/math] and put it into something like Hermite Normal Form, for example, it will still have nonzero entries in every column; it will just have a row of all zeroes at the bottom. So our [math]J[/math] looks like
[math]
\left[ \begin{array} {c}
I \\
0 \\
\end{array} \right]
[/math]
and our [math]A^{-}[/math] therefore looks like
[math]
Q
\left[ \begin{array} {c}
I & X \\
\end{array} \right]
P
[/math]
except that there's a further simplification due to this same fact: we won't require any column operations (only row operations) to get our [math]A[/math] into the form [math]J[/math] =
[math]
\left[ \begin{array} {c}
I \\
0 \\
\end{array} \right]
[/math]
so [math]Q[/math] will be an identity matrix, and can therefore be omitted, giving
[math]
A^{-} =
\left[ \begin{array} {c}
I & X \\
\end{array} \right]
P
[/math]
where [math]P[/math] represents the row operations that are needed to convert [math]A[/math] to an upper identity matrix with lower zeros, i.e. [math]P[/math] such that
[math]
PA =
\left[ \begin{array} {c}
I \\
0 \\
\end{array} \right]
[/math]
When we use our [math]Z[/math] for [math]A[/math], that is, for a 5-limit temperament we'd have
[math]
\left[ \begin{array} {c}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
k & k & k \\
\end{array} \right]
[/math]
In order to clear the bottom row, we need to subtract each of the other rows from it [math]k[/math] times. So we start with [math]P[/math] as a (4,4)-shaped identity matrix and do the same to it.
So we have [math]P[/math] =
[math]
\left[ \begin{array} {c}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-k & -k & -k & 1 \\
\end{array} \right]
[/math]
Since [math]A^{-} = [I X]P[/math], and [math]P[/math] is square, then [math][I X][/math] must have the same shape as [math]A^{-}[/math], which must be the transpose of the shape of [math]A[/math]. So [math][I X][/math] must be (3,4)-shaped and [math]X[/math] must be (3,1)-shaped, as follows. [math]X[/math] =
[math]
\left[ \begin{array} {c}
x_1 \\
x_2 \\
x_3 \\
\end{array} \right]
[/math]
And [math][I X][/math] =
[math]
\left[ \begin{array} {c}
1 & 0 & 0 & x_1 \\
0 & 1 & 0 & x_2 \\
0 & 0 & 1 & x_3 \\
\end{array} \right]
[/math]
And therefore [math]A^{-} = [I X]P[/math] =
[math]
\left[ \begin{array} {c}
1 & 0 & 0 & x_1 \\
0 & 1 & 0 & x_2 \\
0 & 0 & 1 & x_3 \\
\end{array} \right]
\left[ \begin{array} {c}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
-k & -k & -k & 1 \\
\end{array} \right]
=
\left[ \begin{array} {c}
1-kx_1 & -kx_1 & -kx_1 & x_1 \\
-kx_2 & 1-kx_2 & -kx_2 & x_2 \\
-kx_3 & -kx_3 & 1-kx_3 & x_3 \\
\end{array} \right]
[/math]
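Here's a short sympy sketch of this derivation: it multiplies [math][I \; X][/math] by [math]P[/math] symbolically, and checks that the result is indeed a left-inverse of our [math]Z[/math] (i.e. the product with [math]Z[/math] simplifies to the identity).

<syntaxhighlight lang="python">
import sympy as sp

k, x1, x2, x3 = sp.symbols('k x1 x2 x3')

IX = sp.Matrix([[1, 0, 0, x1],
                [0, 1, 0, x2],
                [0, 0, 1, x3]])
P = sp.Matrix([[ 1,  0,  0, 0],
               [ 0,  1,  0, 0],
               [ 0,  0,  1, 0],
               [-k, -k, -k, 1]])
Z = sp.Matrix([[1, 0, 0],
               [0, 1, 0],
               [0, 0, 1],
               [k, k, k]])           # our Z, playing the role of MathWorld's A

A_minus = IX * P
sp.pprint(A_minus)                   # matches the matrix displayed just above
sp.pprint(sp.simplify(A_minus * Z))  # the 3×3 identity: a left-inverse
</syntaxhighlight>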
A visual demonstration
Bringing it back to our goal of optimizing tuning, in general we want to find [math]\min(\|\textbf{r}L^{-1}Z^{-}\|_{\text{dual}(q)})[/math], so we want to find the [math]x[/math] parameter of [math]Z^{-}[/math] that allows us to do this.
For convenience we'd like a way to refer to the [math]\textbf{r}L^{-1}[/math] part of the expression. Let's make that [math]\textbf{e}[/math]. For a 5-limit example, we have [math]\textbf{e}[/math] = ⟨[math]e_1[/math] [math]e_2[/math] [math]e_3[/math]]. So the inside of our dual norm, [math]\textbf{r}L^{-1}Z^{-} = \textbf{e}Z^{-}[/math] = [math] \left[ \begin{array} {c} e_1 & e_2 & e_3 \\ \end{array} \right] \left[ \begin{array} {c} 1-kx_1 & -kx_1 & -kx_1 & x_1 \\ -kx_2 & 1-kx_2 & -kx_2 & x_2 \\ -kx_3 & -kx_3 & 1-kx_3 & x_3 \\ \end{array} \right] = \\[30pt] \left[ \begin{array} {c} e_1-k(e_1x_1+e_2x_2+e_3x_3) & e_2-k(e_1x_1+e_2x_2+e_3x_3) & e_3-k(e_1x_1+e_2x_2+e_3x_3) & e_1x_1+e_2x_2+e_3x_3 \\ \end{array} \right] [/math]
Note that for our minimax-lils-S example, we know that all [math]x_i[/math] will be the same, which is why we showed it simply as [math]x[/math] above, but here we'll show it in a more general form, with a vector [math]\textbf{x}[/math] of different values for each row.
To remove some clutter here, let's define [math]c = e_1x_1+e_2x_2+e_3x_3 = \langle e_1 \; e_2 \; e_3][x_1 \; x_2 \; x_3\rangle = \textbf{e}\textbf{x}[/math]. So now we can simplify the above to:
[math]
\left[ \begin{array} {c}
e_1-kc & & e_2-kc & & e_3-kc & & c \\
\end{array} \right]
[/math]
Note that the last term does not include a [math]k[/math]. That's a [math]c[/math] by itself. (By the way, this [math]c[/math] is equal to [math]-r[/math] as it is used in Mike's proof here: https://en.xen.wiki/w/Dual_of_the_Weil_norm_proof)
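A quick sympy sketch confirming this simplification (it's just the algebra above, with [math]c = e_1x_1+e_2x_2+e_3x_3[/math] substituted in):

<syntaxhighlight lang="python">
import sympy as sp

e1, e2, e3, k, x1, x2, x3 = sp.symbols('e1 e2 e3 k x1 x2 x3')
e = sp.Matrix([[e1, e2, e3]])
Z_minus = sp.Matrix([[1 - k*x1,    -k*x1,    -k*x1, x1],
                     [   -k*x2, 1 - k*x2,    -k*x2, x2],
                     [   -k*x3,    -k*x3, 1 - k*x3, x3]])
c = e1*x1 + e2*x2 + e3*x3
expected = sp.Matrix([[e1 - k*c, e2 - k*c, e3 - k*c, c]])
print(sp.simplify(e * Z_minus - expected))   # the zero row: they agree
</syntaxhighlight>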
So remember, this was all just the stuff inside our norm bars. So we have
[math]
\begin{align}
\text{lils-C}^{*}(\textbf{r}) &= \|\textbf{r}L^{-1}Z^{-}\|_{\infty} \\
&= \left\| \left[ \begin{array} {r} e_1-kc & & e_2-kc & & e_3-kc & & c \end{array} \right] \right\|_{\infty} \\
&= \max(|e_1-kc|, |e_2-kc|, |e_3-kc|, |c|)
\end{align}
[/math]
And we want the minimum of that:
[math]
\min(\text{lils-C}^{*}(\textbf{r})) = \min(\max(|e_1-kc|, |e_2-kc|, |e_3-kc|, |c|))
[/math]
Remember, [math]k[/math] is just a constant we choose as part of our tuning scheme config. And [math]\textbf{r} = \textbf{t} - \textbf{j} = \textbf{g}M - \textbf{j}[/math], where [math]M[/math] is the mapping for the temperament we're optimizing a tuning for, [math]\textbf{g}[/math] is its generator tuning map, and [math]\textbf{j}[/math] is the corresponding just tuning map. So our only unknowns that we're optimizing for are [math]\textbf{g}[/math] and [math]c[/math]. We don't actually care what [math]c[/math] is, so long as we can find the value for it that allows us to come away from this with the [math]\textbf{t}[/math] that we know minimizes the particular simplicity-weight damage from [math]\textbf{t} = \textbf{g}M[/math] versus using just intonation with [math]\textbf{j}[/math].
So then we ask, as Mike did, what value of [math]c[/math] will minimize this? Here's a visual demonstration of how we know.
Visualize [math]e_1[/math], [math]e_2[/math], and [math]e_3[/math] plotted on a real number line. For now, imagine that [math]e_2[/math] and [math]e_3[/math] are random positive values and [math]e_1[/math] is a negative value of smaller magnitude, making a pattern of dots like this:
<pre>
             |
-----------•-+----•----•---
          e₁ |    e₂   e₃
             |
</pre>
And notice that the maximum of their absolute values is just the distance to the origin of whichever dot is most distant from the origin. In this case, [math]|e_3|[/math]. Note that it's [math]e_3[/math] here because [math]e_3 = \max(\left[ \begin{array} {r} e_1 & e_2 & e_3 \end{array} \right])[/math].
Now recognize that subtracting some constant from all of them would shift the pattern of dots to the left. For some constant [math]k[/math] we could end up with this.
<pre>
             |
----•------•-+--•----------
   e₁      e₂|  e₃
   -k      -k|  -k
             |
</pre>
Now [math]e_1-k[/math] is the one furthest from the origin, so the maximum of their absolute values will be [math]|e_1-k|[/math]. Note that it's [math]e_1[/math] here because [math]e_1[/math] = [math]\min(\left[ \begin{array} {r} e_1 & e_2 & e_3 \end{array} \right])[/math].
Clearly the maximum absolute value will be minimized for some choice of [math]k[/math] where the most positive dot and the most negative dot are the same distance from the origin, like this:
<pre>
             |
-------•-----+•----•--------
      e₁     |e₂   e₃
      -k     |-k   -k
             |
</pre>
And now realize that we have a minus [math]k[/math] essentially built in to our information already. We're actually looking at points that are these [math]e_i[/math] values but with [math]kc[/math] subtracted from them!
<pre>
             |
-------•-----+•----•--------
      e₁     |e₂   e₃
     -kc     |-kc  -kc
             |
</pre>
So while [math]k[/math] is fixed by our tuning scheme config, we can adjust [math]c[/math], our "centering constant", in order to slide these points left and right; [math]k[/math] simply affects by how much we have to change [math]c[/math] to make this happen. For example, with [math]k = 0.5[/math], the hybrid halfway between minimax-S and minimax-lils-S, we'd have to change [math]c[/math] by twice as much in order to shift things by the same amount as we would with [math]k = 1[/math]. This isn't important right now but will be in a little bit.
In the case we've shown above, the maximum distance from the origin will be half the distance between [math]e_1-kc[/math] and [math]e_3-kc[/math]:
[math]
\begin{align}
\frac{(e_3-kc) - (e_1-kc)}{2} &= \frac{e_3-kc-e_1+kc}{2} \\
&= \frac{e_3-\cancel{kc}-e_1+\cancel{kc}}{2} \\
&= \frac{e_3-e_1}{2}
\end{align}
[/math]
So the [math]kc[/math] part has mercifully canceled out.
But remember that the case with [math]e_1[/math] and [math]e_3[/math] was only an arbitrary illustrative example. Extrapolating from it, we can see that, in general, the minimized value, i.e. [math]\min(\max(|e_1-kc|, |e_2-kc|, |e_3-kc|))[/math], will be [math]\dfrac{\max(e_1, e_2, e_3) - \min(e_1, e_2, e_3)}{2}[/math], which is just another way of expressing this centering idea: whatever the situation looks like before we start shifting left or right, just take the distance from the leftmost point to the rightmost point and divide it by 2.
Also note that this works even if every [math]e_i[/math] is positive or every [math]e_i[/math] is negative, i.e. all of them are on the same side of the origin. This is basically explained by the fact that the absolute values of the max and min are not taken. So if the min is negative, it's subtracted, for a net positive, as it should be, and if it's positive, it's subtracted, for a net negative, as it should be. (More on this later.)
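Here's a tiny numerical illustration of that centering idea (a sketch using scipy's scalar minimizer and arbitrary example values): the best achievable shifted max matches the (max − min)/2 formula.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize_scalar

e = np.array([-1.0, 2.0, 3.0])    # arbitrary illustrative values (e₁ negative)
best = minimize_scalar(lambda shift: np.max(np.abs(e - shift)),
                       bounds=(e.min(), e.max()), method='bounded')
print(best.fun)                   # 2.0: the best achievable maximum
print((e.max() - e.min()) / 2)    # 2.0: agrees with the formula above
</syntaxhighlight>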
Okay, but it's not quite this simple. So far we've only dealt with the first three elements in that list of values whose max we're taking. We haven't dealt with the fourth element [math]|c|[/math] yet.
Let's first consider the simplest case, where [math]k = 1[/math]. Here, we could think of it as looking for
[math]
\min(\max(|e_1-c|, |e_2-c|, |e_3-c|, |c|))
[/math]
or in other words
[math]
\min(\max(|e_1-c|, |e_2-c|, |e_3-c|, |0-c|))
[/math]
so we're just including 0 among the points being shifted by [math]c[/math].
This would only make a difference if all values of [math]e_i[/math] are positive or all are negative. (Which may never happen anyway?) Or we could just think of it as plotting [math]c[/math] on our number line along with the other [math]e_i[/math] values. And here we can also see that [math]c[/math] will only be our maximum absolute value when all values of [math]e_i[/math] are positive or all are negative. That's because if, say, [math]\min(\textbf{e}) = 0[/math] and [math]\max(\textbf{e}) = 5[/math], then [math]c[/math] will be 2.5, the [math]\min(\textbf{e}) - c[/math] will be -2.5, and the [math]\max(\textbf{e}) - c[/math] will be 2.5, so in this case [math]c[/math] will be exactly tied with [math]\max(\textbf{e}) - c[/math]; but if we went any further past this, with the min also positive, then [math]c[/math] would finally exceed [math]\max(\textbf{e}) - c[/math].
And in the general case, where [math]k[/math] may equal something else, we have no choice but to think of it in this second manner, with [math]c[/math] plotted on the number line along with our [math]e_i[/math] values, but its relationship with them is a bit more complicated; e.g. for [math]k = 0.5[/math], we can already see [math]c[/math] being the maximum absolute value even if the minimum is not on the same side of the origin as the max, but within half of the distance of the max from the origin. (And if [math]k = 0[/math], then we can't slide anything with [math]c[/math], so it doesn't matter what we set [math]c[/math] to, so we may as well have it at 0, and then we're just taking our [math]e_i[/math] values as they are. And as [math]k \to 0[/math], then [math]c \to \infty[/math], which seems crazy, because then [math]c[/math] will almost certainly dominate; one would think that there'd be a smooth continuum from lp-C to lils-C, but it seems like there'd be some craziness happening in the zone right outside of lp-C.)
(If anyone can explain what the sliding back and forth of these [math]e_i[/math] values has to do with the log integer limit, by the way, that would be wonderful. But perhaps that relationship is wildly indirect at this point, so there's no intuitive link between them.)
Solving for the centering constant
Back on our [math]e_1[/math] and [math]e_3[/math] example, both points are equidistant from the origin, so we have:
[math]
\begin{align}
-(e_1-kc) &= (e_3-kc) \\
kc - e_1 &= e_3 - kc \\
2kc &= e_1 + e_3 \\
c &= \frac{e_1 + e_3}{2k}
\end{align}
[/math]
But in the general case, that's
[math]
c = \max(\left[ \begin{array} {r} e_1 & e_2 & e_3 & c \end{array} \right]) + \min(\left[ \begin{array} {r} e_1 & e_2 & e_3 & c \end{array} \right])
[/math]
So it's... circularly defined?! Except, it's fine when we realize that generally [math]c[/math] is neither the min nor the max. And if [math]c[/math] is the max, then we have [math]c = c + \min()[/math], so that just implies the min is 0. And if [math]c[/math] is the min, then we have [math]c = \max() + c[/math], so that just implies that the max is 0. So we can just simplify it to say that
[math]
c = \max(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right]) + \min(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right])
[/math]
That is, the same thing, but now with 0's instead of [math]c[/math]'s.
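As a numeric sanity check (using the 12-ET example from earlier on this page), the [math]c[/math] from this formula agrees with [math]c = \textbf{e}\textbf{x}[/math] when every [math]x_i[/math] is the single [math]x[/math] from the summary formula near the top of the page:

<syntaxhighlight lang="python">
import numpy as np

e = np.array([-5.866, -7.093, 0.000])    # the 12-ET example's rL⁻¹, i.e. e
with_zero = np.append(e, 0.0)
c = with_zero.max() + with_zero.min()    # the formula just above: ≈ -7.093
x = (with_zero.max() + with_zero.min()) / e.sum()   # the earlier formula for x
print(c, e @ np.full(3, x))              # both ≈ -7.093, i.e. c = ex
</syntaxhighlight>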
The reason we wanted to solve for [math]c[/math] is that we want to solve for our optimum [math]Z^{-}[/math]. Which means we need to know [math]\textbf{x}[/math].
Now substitute [math]e_1x_1+e_2x_2+e_3x_3 = \langle e_1 \; e_2 \; e_3][x_1 \; x_2 \; x_3\rangle = \textbf{e}\textbf{x}[/math] back in for [math]c[/math] to find:
[math]
\begin{align}
\textbf{e}\textbf{x} &= \max(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right]) + \min(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right]) \\
\textbf{x} &= \left( \max(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right]) + \min(\left[ \begin{array} {r} e_1 & e_2 & e_3 & 0 \end{array} \right]) \right) / \textbf{e} \\
\textbf{x} &= \left( \max(\textbf{e}, 0) + \min(\textbf{e}, 0) \right) / \textbf{e}
\end{align}
[/math]
So [math]x_1[/math], for example, is [math]\dfrac{\max(\textbf{e}, 0) + \min(\textbf{e}, 0)}{e_1}[/math]. And then you just need to plug those values into:
[math]
\left[ \begin{array} {c}
1-kx_1 & -kx_1 & -kx_1 & x_1 \\
-kx_2 & 1-kx_2 & -kx_2 & x_2 \\
-kx_3 & -kx_3 & 1-kx_3 & x_3 \\
\end{array} \right]
[/math]
And that will be the optimum [math]Z^{-}[/math] for any given [math]\textbf{r}[/math].
And equipped with that, we can just plug this in for [math]Z^{-}[/math] anywhere we'd normally have our dual norm's matrices. It's just that, unlike in most cases, the same parameters that affect [math]\textbf{r}[/math], that is, [math]\textbf{t}[/math], also affect [math]Z^{-}[/math].
Exact solutions
Mike's tipweil.py approach essentially shortcuts the whole optimal left-inverse process, going straight to an exact solution. But what if we still want to find an exact solution this way? Will we find that using the [math]Z^{-}[/math] as we've understood it here, in the coinciding-damage method, gives the same results?
The values of Mike's augmented generators (the ones we throw away when we optimize for minimax-lils-S tunings, and which Mike calls [math]r[/math] in the Tenney-Weil article) were, in every example we've checked, related to the max damage by similar proportions as we see with the formulas for [math]x[/math] and [math]c[/math] as demonstrated above. This augmented generator is basically there to allow us to shift the log-product-scaled retunings into their best balance with each other.
Perhaps we can show how Mike does it in tipweil.py, but with our variable names (note that the divisions by the [math]s_{\text{p}j}[/math], the prime sizes, are how the [math]L^{-1}[/math] enters here). Minimizing the norm of this:
[math]
\left[ \begin{array} {r}
g_1 & g_2 & c \\
\end{array} \right]
\left[ \begin{array} {r}
\frac{m_{11}}{s_{\text{p}1}} & \frac{m_{12}}{s_{\text{p}2}} & \frac{m_{13}}{s_{\text{p}3}} & 0 \\
\frac{m_{21}}{s_{\text{p}1}} & \frac{m_{22}}{s_{\text{p}2}} & \frac{m_{23}}{s_{\text{p}3}} & 0 \\
k & k & k & -1 \\
\end{array} \right]
-
\left[ \begin{array} {r}
j_1 & j_2 & j_3 & 0 \\
\end{array} \right]
[/math]
is the same as minimizing the norm of this:
[math]
\left[ \begin{array} {r}
g_1 & g_2 \\
\end{array} \right]
\left[ \begin{array} {r}
m_{11} & m_{12} & m_{13} \\
m_{21} & m_{22} & m_{23} \\
\end{array} \right]
\left[ \begin{array} {r}
1-kx_1 & -kx_1 & -kx_1 & x_1 \\
-kx_2 & 1-kx_2 & -kx_2 & x_2 \\
-kx_3 & -kx_3 & 1-kx_3 & x_3 \\
\end{array} \right]
-
\left[ \begin{array} {r}
j_1 & j_2 & j_3 & 0 \\
\end{array} \right]
[/math]
That is, those are the insides of our norm bars in each case.
It seems like there's a pretty good chance this works out to the same thing. Until it's totally worked out, though, we haven't really answered how to find an exact solution this way. My first attempts to do this have failed, unfortunately, but I probably just goofed somewhere, and have run out of time for now. Best of luck to the interested reader!
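For anyone who wants to pick this up, here's a sympy sketch that simply transcribes the two displays above, so the two "insides of the norm bars" can be experimented with side by side (all symbol names are placeholders; this doesn't settle the question by itself):

<syntaxhighlight lang="python">
import sympy as sp

g1, g2, c, k = sp.symbols('g1 g2 c k')
m11, m12, m13, m21, m22, m23 = sp.symbols('m11 m12 m13 m21 m22 m23')
s1, s2, s3 = sp.symbols('s_p1 s_p2 s_p3')      # prime sizes (diagonal of L)
j1, j2, j3 = sp.symbols('j1 j2 j3')
x1, x2, x3 = sp.symbols('x1 x2 x3')

# Mike's augmented form (the first display)
mike = sp.Matrix([[g1, g2, c]]) * sp.Matrix([
    [m11/s1, m12/s2, m13/s3,  0],
    [m21/s1, m22/s2, m23/s3,  0],
    [k,      k,      k,      -1]]) - sp.Matrix([[j1, j2, j3, 0]])

# The left-inverse form (the second display)
ours = sp.Matrix([[g1, g2]]) * sp.Matrix([
    [m11, m12, m13],
    [m21, m22, m23]]) * sp.Matrix([
    [1 - k*x1,    -k*x1,    -k*x1, x1],
    [   -k*x2, 1 - k*x2,    -k*x2, x2],
    [   -k*x3,    -k*x3, 1 - k*x3, x3]]) - sp.Matrix([[j1, j2, j3, 0]])

sp.pprint(sp.simplify(mike - ours))    # inspect where and how they differ
</syntaxhighlight>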
- ↑ https://yahootuninggroupsultimatebackup.github.io/tuning-math/topicId_21029.html
- ↑ from Jodár et al., 1991
- ↑ Note that even if a wide matrix is not full-rank, i.e. it is (row-)rank-deficient, it may still have a generalized inverse satisfying [math]AA^{-}A = A[/math], i.e. even if [math]A^{-}A \neq I[/math]. And similarly, even if a tall matrix is not full-rank, i.e. it is (column-)rank-deficient, it may still have a 1-inverse that satisfies that weaker condition that [math]AA^{-}A = A[/math]. Tall and wide matrices have different strong conditions, but share the same weaker condition. And as for a square matrix, when full-rank, it has a true inverse [math]A^{-1}[/math] which satisfies both the right-inverse condition [math]AA^{-1} = I[/math] and the left-inverse condition [math]A^{-1}A = I[/math]. Otherwise, when it is rank-deficient, it does not; however, it may still have a generalized inverse called a 1-inverse which satisfies the weaker condition that [math]AA^{-}A = A[/math].