Lils using left inverse: Difference between revisions

BudjarnLambeth (talk | contribs)
m Categorised this uncategorised page
Dave Keenan (talk | contribs)
m Generalized inverses: Added missing closing paren.
 
(3 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<div style="background: aliceblue; color: darkslategray; font-style: italic; border: 1px solid lightblue; margin: 15px; padding: 15px; text-align: center;">
<div style="background: aliceblue; color: darkslategray; font-style: italic; border: 1px solid lightblue; margin: 15px; padding: 15px; text-align: center;">


This page is a work in progress. It is working towards being a replacement for the [[Dave_Keenan_%26_Douglas_Blumeyer%27s_guide_to_RTT:_alternative_complexities#Log-integer-limit-squared|Log-integer-limit-squared section in D&D's guide]], adding a "left-inverse" approach.
This page is a work in progress. It is working towards being a replacement for the [[Dave Keenan & Douglas Blumeyer's guide to RTT/Alternative complexities#Log-integer-limit-squared|Log-integer-limit-squared section in D&D's guide]], adding a "left-inverse" approach.


</div>
</div>




==Log-integer-limit-squared==
== Log-integer-limit-squared ==
 
Here's where things start to get pretty weird.
Here's where things start to get pretty weird.


Line 13: Line 12:


We have three possible approaches here (which are closely related, mathematically):
We have three possible approaches here (which are closely related, mathematically):
# the max-minus-min approach
# The max-minus-min approach
# the tipweil.py approach
# The tipweil.py approach
# the left-inverse approach
# The left-inverse approach


The first approach only works when <math>\text{dual}(q) = ∞</math>, such as with minimax-lils-S (but not with minimax-E-lils-S). The other two approaches work for any retuning magnitude norm power, <math>∞</math> or otherwise.
The first approach only works when <math>\text{dual}(q) = ∞</math>, such as with minimax-lils-S (but not with minimax-E-lils-S). The other two approaches work for any retuning magnitude norm power, <math>∞</math> or otherwise.


===The min-minus-max approach===
=== The min-minus-max approach ===
 
Mike Battaglia worked out [[Dual_of_the_Weil_norm_proof|a proof]] that the dual norm to the log-integer-limit complexity is this:
Mike Battaglia worked out [[Dual_of_the_Weil_norm_proof|a proof]] that the dual norm to the log-integer-limit complexity is this:


Line 34: Line 32:
This can be used straight away in a numerical optimization problem solver, to find an approximate solution.
This can be used straight away in a numerical optimization problem solver, to find an approximate solution.


===The tipweil.py approach===
=== The tipweil.py approach ===
 
This approach is also credited to Mike<ref>https://yahootuninggroupsultimatebackup.github.io/tuning-math/topicId_21029.html</ref>. This approach integrates with the [[Generator_embedding_optimization#Coinciding-damage_method|coinciding-damage method]] originally developed by Keenan Pepper for use with TOP tuning, in tiptop.py. Unlike the min-minus-max approach, this one can give exact solutions. This way achieves the dual norm through custom-applied augmentations to the relevant matrices as used in this method.   
This approach is also credited to Mike<ref>https://yahootuninggroupsultimatebackup.github.io/tuning-math/topicId_21029.html</ref>. This approach integrates with the [[Generator_embedding_optimization#Coinciding-damage_method|coinciding-damage method]] originally developed by Keenan Pepper for use with TOP tuning, in tiptop.py. Unlike the min-minus-max approach, this one can give exact solutions. This way achieves the dual norm through custom-applied augmentations to the relevant matrices as used in this method.   


To see these tunings worked out with exact solutions in the form of generator embeddings, using Mike's technique, see [[Generator embedding optimization#Minimax-lils-S]] and [[Generator embedding optimization#Minimax-E-lils-S]].
To see these tunings worked out with exact solutions in the form of generator embeddings, using Mike's technique, see [[Generator embedding optimization#Minimax-lils-S]] and [[Generator embedding optimization#Minimax-E-lils-S]].


===The left-inverse approach===
=== The left-inverse approach ===
 
Either of the previous two methods require custom code paths to handle. We'd prefer to have a completely generic procedure, if possible. Is there a way to define a generalized inverse such that we still find the correct true optimum tunings we seek, but structure our approach in the same way as we can do it for all the other complexities? After a lengthy struggle, we have determined the answer to that question is "yes".
Either of the previous two methods require custom code paths to handle. We'd prefer to have a completely generic procedure, if possible. Is there a way to define a generalized inverse such that we still find the correct true optimum tunings we seek, but structure our approach in the same way as we can do it for all the other complexities? After a lengthy struggle, we have determined the answer to that question is "yes".


====Generalized inverses====
==== Generalized inverses ====
 
When <math>X</math> is a rectangular complexity pretransformer, such as the <math>ZL</math> that is used for <math>\text{lils-C}()</math>, there are alternative inverses available which will still effectively leverage the dual norm inequality in order to cap the maximum damage across all intervals. Specifically, any matrix <math>X^{-}</math> (that's <math>X</math> but with a superscript ''minus'', as opposed to the superscript ''plus'' that we use for the Moore-Penrose inverse) which satisfies <math>X^{-}X=I</math> will suffice. We call such a matrix a "left-inverse", because it cancels the original matrix out when it is left-multiplied by it.
When <math>X</math> is a rectangular complexity pretransformer, such as the <math>ZL</math> that is used for <math>\text{lils-C}()</math>, there are alternative inverses available which will still effectively leverage the dual norm inequality in order to cap the maximum damage across all intervals. Specifically, any matrix <math>X^{-}</math> (that's <math>X</math> but with a superscript ''minus'', as opposed to the superscript ''plus'' that we use for the Moore-Penrose inverse) which satisfies <math>X^{-}X=I</math> will suffice. We call such a matrix a "left-inverse", because it cancels the original matrix out when it is left-multiplied by it.


There's also a thing called a right-inverse, which is the opposite. Tall matrices (more rows than columns) like <math>ZL</math> have left-inverses (when they are [[full-rank]]. Wide matrices (more columns than rows) like mappings <math>M</math> have right-inverses (again, when they are full-rank). Square matrices (same count of rows and columns) have matrices that are both left-inverses and right-inverses, or in other words, true inverses (again, when they are full-rank).
There's also a thing called a right-inverse, which is the opposite. Tall matrices (more rows than columns) like <math>ZL</math> have left-inverses (when they are [[full-rank]]). Wide matrices (more columns than rows) like mappings <math>M</math> have right-inverses (again, when they are full-rank). Square matrices (same count of rows and columns) have matrices that are both left-inverses and right-inverses, or in other words, true inverses (again, when they are full-rank).


Fortunately for us, a left-inverse is the kind we need! That's because left-multiplication is the direction in which we need canceling out of <math>X</math> to occur, because of the way it figures in the dual norm inequality, as shown here:
Fortunately for us, a left-inverse is the kind we need! That's because left-multiplication is the direction in which we need canceling out of <math>X</math> to occur, because of the way it figures in the dual norm inequality, as shown here:
Line 58: Line 53:




(You may wish to review the relevant background info here: [[Dave Keenan & Douglas Blumeyer's guide to RTT: all-interval tuning schemes#Bringing it back to the dual norm inequality]].)
(You may wish to review the relevant background info [[Dave Keenan & Douglas Blumeyer's guide to RTT/All-interval tuning schemes#Bringing it back to the dual norm inequality|here]].)


Now it turns out that tall matrices have not just one left-inverse but a range of possible left-inverses (same goes for wide matrices and right-inverses, FWIW). These left-inverses follow a parameterized pattern, row-by-row, which you can learn about on this page, which is about a type of generalized inverse called a ''1-inverse'': https://mathworld.wolfram.com/Matrix1-Inverse.html. Left- and right- inverses are specific types of 1-inverse. Generalized inverses can be categorized by the subset of pseudoinverse conditions 1 through 4 that they satisfy. All generalized inverses must at least satisfy condition 1. A generalized inverse that satisfies conditions 1, 2, and 4 would be a (1,2,4)-inverse. So a 1-inverse is ''the most general'' of all types of generalized inverses. Condition 1 is that <math>AA^{-}A = A</math>. This condition could be satisfied by either a left-inverse or a right-inverse. A left-inverse would satisfy it because a left-inverse means <math>A^{-}A = I</math> and so
Now it turns out that tall matrices have not just one left-inverse but a range of possible left-inverses (same goes for wide matrices and right-inverses, FWIW). These left-inverses follow a parameterized pattern, row-by-row, which you can learn about on this page, which is about a type of generalized inverse called a ''1-inverse'': https://mathworld.wolfram.com/Matrix1-Inverse.html. Left- and right- inverses are specific types of 1-inverse. Generalized inverses can be categorized by the subset of pseudoinverse conditions 1 through 4 that they satisfy. All generalized inverses must at least satisfy condition 1. A generalized inverse that satisfies conditions 1, 2, and 4 would be a (1,2,4)-inverse. So a 1-inverse is ''the most general'' of all types of generalized inverses. Condition 1 is that <math>AA^{-}A = A</math>. This condition could be satisfied by either a left-inverse or a right-inverse. A left-inverse would satisfy it because a left-inverse means <math>A^{-}A = I</math> and so
Line 86: Line 81:
As a consequence of this, there is no more general way of generating all possible left inverses of <math>Z</math> than the method given at the MathWorld link above<ref>from Jodár et al, 1991</ref>, so our proof that follows below is sufficient. Any algorithm that generates all possible 1-inverses will generate only left-inverses in the case where <math>Z</math> is tall and full rank, as it is in the case of minimax-lils-S. And since there can be no left inverses <math>A^{-}A = I</math> that are not also 1-inverses <math>AA^{-}A = A</math>, the algorithm must generate all possible left-inverses.<ref>Note that even if a wide matrix is ''not'' full-rank, i.e. it is (row-)rank-deficient, it may still have a generalized inverse satisfying <math>AA^{-}A = A</math>, i.e. even if <math>A^{-}A ≠ I</math>. And similarly, even if a tall matrix is ''not'' full-rank, i.e. it is (column-)rank-deficient, it may still have a 1-inverse that satisfies that weaker condition that <math>AA^{-}A = A</math>. Tall and wide matrices have different strong conditions, but share the same weaker condition. And as for a square matrix, when full-rank, it has a true inverse <math>A^{-1}</math> which satisfies both the right-inverse condition <math>AA^{-1} = I</math> ''and'' the left-inverse condition <math>A^{-1}A = I</math>. Otherwise — when it is rank-deficient — it does not; however, it may still have a generalized inverse called a 1-inverse which satisfies the weaker condition that <math>AA^{-}A = A</math>.</ref>
As a consequence of this, there is no more general way of generating all possible left inverses of <math>Z</math> than the method given at the MathWorld link above<ref>from Jodár et al, 1991</ref>, so our proof that follows below is sufficient. Any algorithm that generates all possible 1-inverses will generate only left-inverses in the case where <math>Z</math> is tall and full rank, as it is in the case of minimax-lils-S. And since there can be no left inverses <math>A^{-}A = I</math> that are not also 1-inverses <math>AA^{-}A = A</math>, the algorithm must generate all possible left-inverses.<ref>Note that even if a wide matrix is ''not'' full-rank, i.e. it is (row-)rank-deficient, it may still have a generalized inverse satisfying <math>AA^{-}A = A</math>, i.e. even if <math>A^{-}A ≠ I</math>. And similarly, even if a tall matrix is ''not'' full-rank, i.e. it is (column-)rank-deficient, it may still have a 1-inverse that satisfies that weaker condition that <math>AA^{-}A = A</math>. Tall and wide matrices have different strong conditions, but share the same weaker condition. And as for a square matrix, when full-rank, it has a true inverse <math>A^{-1}</math> which satisfies both the right-inverse condition <math>AA^{-1} = I</math> ''and'' the left-inverse condition <math>A^{-1}A = I</math>. Otherwise — when it is rank-deficient — it does not; however, it may still have a generalized inverse called a 1-inverse which satisfies the weaker condition that <math>AA^{-}A = A</math>.</ref>


====Optimizable simplicity pretransformers====
==== Optimizable simplicity pretransformers ====
 
Now here's where things get ''really'' weird. The parameters of these left-inverses can ''themselves'' be optimized. In other words, when optimizing for a minimax-lils-S tuning, and attempting to use a left-inverse of our complexity pretransformer to do so, then we may find ourselves doing a ''nested optimization problem''! (At least, it seems this way at first. We'll figure a way out of this problem soon enough.)
Now here's where things get ''really'' weird. The parameters of these left-inverses can ''themselves'' be optimized. In other words, when optimizing for a minimax-lils-S tuning, and attempting to use a left-inverse of our complexity pretransformer to do so, then we may find ourselves doing a ''nested optimization problem''! (At least, it seems this way at first. We'll figure a way out of this problem soon enough.)


Line 117: Line 111:
Notice that there are no absolute values in that formula, and notice that it's <math>\max + \min</math>, not <math>\max - \min</math> as appears in Mike's max-minus-min approach (which of course still works, but in a different, closely related way).
Notice that there are no absolute values in that formula, and notice that it's <math>\max + \min</math>, not <math>\max - \min</math> as appears in Mike's max-minus-min approach (which of course still works, but in a different, closely related way).


So, given <math>𝒓</math>, we can calculate <math>𝒓L^{-1}</math>. From that we calculate <math>x</math>. From that we calculate <math>Z⁻</math>. From that we calculate <math>(𝒓L^{-1})Z⁻</math>. Essentially, we've defined <math>Z^{-}</math> as a function of <math>𝒓</math>. So when the solver (this assumes a numeric minimization, such as is described [[Dave_Keenan_%26_Douglas_Blumeyer%27s_guide_to_RTT:_tuning_computation#General_method|here]]) is then told to minimise <math>‖(𝒓L^{-1})Z^{-}‖_{∞}</math> by changing <math>𝒓</math>, it works a treat. No nested optimization anymore; it's just a single optimization, but the values of <math>𝒓</math> plug in a few more places than before.
So, given <math>𝒓</math>, we can calculate <math>𝒓L^{-1}</math>. From that we calculate <math>x</math>. From that we calculate <math>Z⁻</math>. From that we calculate <math>(𝒓L^{-1})Z⁻</math>. Essentially, we've defined <math>Z^{-}</math> as a function of <math>𝒓</math>. So when the solver (this assumes a numeric minimization, such as is described [[Dave Keenan & Douglas Blumeyer's guide to RTT/Tuning computation#General method|here]]) is then told to minimise <math>‖(𝒓L^{-1})Z^{-}‖_{∞}</math> by changing <math>𝒓</math>, it works a treat. No nested optimization anymore; it's just a single optimization, but the values of <math>𝒓</math> plug in a few more places than before.


This left-inverse matrix always gives the same results as the other two approaches from Mike, but can be written in the form of a pretransformed power-norm like all the other complexities as we wished. The main difference is that <math>Z^{-}</math> is not a constant matrix throughout the minimization, as the simplicity pretransformation typicially tends to be.
This left-inverse matrix always gives the same results as the other two approaches from Mike, but can be written in the form of a pretransformed power-norm like all the other complexities as we wished. The main difference is that <math>Z^{-}</math> is not a constant matrix throughout the minimization, as the simplicity pretransformation typicially tends to be.
Line 123: Line 117:
We know that any <math>x</math> we could possibly come up with still satisfy the requirement that <math>Z^{-}Z = I</math>, but only one <math>x</math> will be optimal for our particular temperament's tuning.
We know that any <math>x</math> we could possibly come up with still satisfy the requirement that <math>Z^{-}Z = I</math>, but only one <math>x</math> will be optimal for our particular temperament's tuning.


====Examples====
==== Examples ====
 
For meantone temperament, with mapping {{rket|{{map|1 1 0}} {{map|0 1 4}}}}, with <math>k=1</math> (minimax-lils-S), we find the optimal <math>x = 1</math> with an optimal <math>𝒓</math> of {{map|0.000 -3.393 0.000}}. So the max is 0, the min is -3.393, and the sum is -3.393, so (max+min)/sum = (0-3.393)/-3.393 = 1. We can see that any tuning which has only a single prime with nonzero error is going to result in <math>x = 1</math>. When <math>x = 1</math> and <math>k = 1</math> like this, <math>Z^{-} = Z^{+}</math>, that is, the pseudoinverse of <math>Z</math> is the optimal left-inverse.
For meantone temperament, with mapping {{rket|{{map|1 1 0}} {{map|0 1 4}}}}, with <math>k=1</math> (minimax-lils-S), we find the optimal <math>x = 1</math> with an optimal <math>𝒓</math> of {{map|0.000 -3.393 0.000}}. So the max is 0, the min is -3.393, and the sum is -3.393, so (max+min)/sum = (0-3.393)/-3.393 = 1. We can see that any tuning which has only a single prime with nonzero error is going to result in <math>x = 1</math>. When <math>x = 1</math> and <math>k = 1</math> like this, <math>Z^{-} = Z^{+}</math>, that is, the pseudoinverse of <math>Z</math> is the optimal left-inverse.


Line 131: Line 124:
For 12-ET, with mapping {{rket|{{map|12 19 28}}}}, we find the optimal <math>x = 0.547</math> with an optimal <math>𝒓</math> of {{map|-5.866 -7.093 0.000}}. So (max+min)/sum = -7.093/-12.959 = 0.547.
For 12-ET, with mapping {{rket|{{map|12 19 28}}}}, we find the optimal <math>x = 0.547</math> with an optimal <math>𝒓</math> of {{map|-5.866 -7.093 0.000}}. So (max+min)/sum = -7.093/-12.959 = 0.547.


====Derivation====
==== Derivation ====
 
To derive the above <math>Z^{-}</math> matrix which we've given in terms of <math>k</math> and <math>𝒙</math> from MathWorld's take on the topic, we have a little work to do. So MathWorld says we're looking for matrices <math>P</math> and <math>Q</math> such that <math>J</math> =
To derive the above <math>Z^{-}</math> matrix which we've given in terms of <math>k</math> and <math>𝒙</math> from MathWorld's take on the topic, we have a little work to do. So MathWorld says we're looking for matrices <math>P</math> and <math>Q</math> such that <math>J</math> =


Line 145: Line 137:


And <math>PAQ = J</math>. The reason why more than one possible generalize inverse exists arises from the fact that the <math>X</math>, <math>Y</math> and <math>Z</math> in MathWorld's final expression <math>A^{-}</math> =
And <math>PAQ = J</math>. The reason why more than one possible generalize inverse exists arises from the fact that the <math>X</math>, <math>Y</math> and <math>Z</math> in MathWorld's final expression <math>A^{-}</math> =
<math>
<math>
Q
Q
Line 244: Line 234:


Since <math>A^{-} = [I X]P</math>, and <math>P</math> is square, then <math>[I X]</math> must have the same shape as <math>A^{-}</math>, which must be the transpose of the shape of <math>A</math>. So <math>[I X]</math> must be (3,4)-shaped and <math>X</math> must be (3,1)-shaped, as follows. <math>X</math> =
Since <math>A^{-} = [I X]P</math>, and <math>P</math> is square, then <math>[I X]</math> must have the same shape as <math>A^{-}</math>, which must be the transpose of the shape of <math>A</math>. So <math>[I X]</math> must be (3,4)-shaped and <math>X</math> must be (3,1)-shaped, as follows. <math>X</math> =
<math>
<math>
\left[ \begin{array} {c}
\left[ \begin{array} {c}
Line 268: Line 256:


And therefore <math>A^{-} = [I X]P</math> =
And therefore <math>A^{-} = [I X]P</math> =
<math>
<math>


Line 296: Line 282:




====A visual demonstration====
==== A visual demonstration ====
 
Bringing it back to our goal of optimizing tuning, in general we want to find <math>\min(‖𝒓L^{-1}Z^{-}‖_{\text{dual}(q)})</math>, so we want to find the <math>x</math> parameter of <math>Z^{-}</math> that allows us to do this.
Bringing it back to our goal of optimizing tuning, in general we want to find <math>\min(‖𝒓L^{-1}Z^{-}‖_{\text{dual}(q)})</math>, so we want to find the <math>x</math> parameter of <math>Z^{-}</math> that allows us to do this.


For convenience we'd like a way to refer to the <math>𝒓L^{-1}</math> part of the expression. Let's make that <math>𝒆</math>. For a 5-limit example, we have <math>𝒆</math> = {{map|<math>e_1</math> <math>e_2</math> <math>e_3</math>}}. So the inside of our dual norm, <math>𝒓L^{-1}Z^{-} = 𝒆Z^{-}</math> =
For convenience we'd like a way to refer to the <math>𝒓L^{-1}</math> part of the expression. Let's make that <math>𝒆</math>. For a 5-limit example, we have <math>𝒆</math> = {{map|<math>e_1</math> <math>e_2</math> <math>e_3</math>}}. So the inside of our dual norm, <math>𝒓L^{-1}Z^{-} = 𝒆Z^{-}</math> =
<math>
<math>


Line 457: Line 440:
(If anyone can explain what the sliding back and forth of these <math>e_i</math> values has to do with log integer limit, by the way, that would be wonderful. But perhaps that relationship is wildly indirect at this point, so thre's no intuitive link between them.)
(If anyone can explain what the sliding back and forth of these <math>e_i</math> values has to do with log integer limit, by the way, that would be wonderful. But perhaps that relationship is wildly indirect at this point, so thre's no intuitive link between them.)


====Solving for the centering constant====
==== Solving for the centering constant ====
 
Back on our <math>e_1</math> and <math>e_3</math> example, both points are equidistant from the origin, so we have:
Back on our <math>e_1</math> and <math>e_3</math> example, both points are equidistant from the origin, so we have:


Line 519: Line 501:
And equipped with that, we can just plug this in for <math>Z^{-}</math> anywhere we'd normally have our dual norm's matrices. It's just that unlike most cases, the same parameters that affect <math>𝒓</math>, that is, <math>𝒈</math>, also affect <math>Z^{-}</math>.
And equipped with that, we can just plug this in for <math>Z^{-}</math> anywhere we'd normally have our dual norm's matrices. It's just that unlike most cases, the same parameters that affect <math>𝒓</math>, that is, <math>𝒈</math>, also affect <math>Z^{-}</math>.


====Exact solutions====
==== Exact solutions ====
 
Mike's tipweil.py approach essentially shortcuts the whole optimal left-inverse process, going straight to the an exact solution. But what if we still want to find an exact solution. Will we find that using the <math>Z^{-}</math> as we've understood it here, in the coinciding-damage method, will give the same results?
Mike's tipweil.py approach essentially shortcuts the whole optimal left-inverse process, going straight to the an exact solution. But what if we still want to find an exact solution. Will we find that using the <math>Z^{-}</math> as we've understood it here, in the coinciding-damage method, will give the same results?


Line 581: Line 562:


It seems like there's a pretty good chance this works out to the same thing. Until it's totally worked out, though, we haven't really answered how it works to find an exact solution. And my first attempts to do this have failed, unfortunately. But I probably just goofed somewhere. And have run out of time for now. Best of luck to the interested reader!
It seems like there's a pretty good chance this works out to the same thing. Until it's totally worked out, though, we haven't really answered how it works to find an exact solution. And my first attempts to do this have failed, unfortunately. But I probably just goofed somewhere. And have run out of time for now. Best of luck to the interested reader!


[[Category:Math]]
[[Category:Math]]
[[Category:Pages with proofs]]