Defactoring algorithms: Difference between revisions
Cmloegcmluin (talk | contribs) →How/why it works: note about actual defactoring conditions |
Cmloegcmluin (talk | contribs) →How/why it works: add proof from Tom Price's advice |
||
| Line 179: | Line 179: | ||
==== The actual defactoring conditions ==== | ==== The actual defactoring conditions ==== | ||
The HNF is used out of convenience, but in fact the conditions necessary to achieve defactoring are less strict. This was pointed out by Tom Price when he explained his defactoring method. The conditions for attaining the HNF are strict enough to enforce uniqueness, in particular the condition that entries in pivot columns must be between 0 and the pivot. This part is not relevant to defactoring. The only thing that is critical is that all-zero columns are created such that all the information in the matrix has been compressed into a single square submatrix, i.e. into a single non-zero minor determinant, and of course that this transformation has been achieved using only unimodular operations, so that the image of the matrix remains the same. So fitting the information from a rectangle into a square is the target, although in practice this is usually only possible by getting the information into echelon form. However, no method has been invented in mathematics that finds matrices like this; i.e. while it seems like it might be strictly easier to do this, to do the same stuff but stop earlier, it's not actually as simple as that or it would already exist. If such a thing did exist it would have massive ramifications for linear algebra in a broader sense, because it could be used for more basic things like finding null-space bases. And so if such a thing were possible to develop, it is probably very challenging to create. The Hermite decomposition is just a practical and commonly available resource for attaining the state we need to defactor, even if it is slightly overkill. And we need the HNF otherwise to achieve canonical form, so pedagogically speaking, we may as well just explain it and that be it; why explain a second method for being simpler, if it's simplest to just use the one method. The specific conditions for HNF are not firmly established and vary slightly from application to application, but one thing they do share is a level of strictness to enforce uniqueness. But Tom's square-only conditions are not that strict; there are many unimodular decompositions one can find for any given matrix that will lead you to the defactored result. So, Tom's conditions, which are the true conditions for defactoring, are not technically speaking HNF conditions. And so column Hermite defactoring can not truly be said to be named after a thing that is fundamental to its function, but merely a present-day practicality in implementing it. But we find this acceptable (and haven't found a better alternative). | The HNF is used out of convenience, but in fact the conditions necessary to achieve defactoring are less strict. This was pointed out by Tom Price when he explained his defactoring method. | ||
The conditions for attaining the HNF are strict enough to enforce uniqueness, in particular the condition that entries in pivot columns must be between 0 and the pivot. This part is not relevant to defactoring. The only thing that is critical is that all-zero columns are created such that all the information in the matrix has been compressed into a single square submatrix, i.e. into a single non-zero minor determinant, and of course that this transformation has been achieved using only unimodular operations, so that the image of the matrix remains the same. | |||
So fitting the information from a rectangle into a square is the target, although in practice this is usually only possible by getting the information into echelon form. | |||
However, no method has been invented in mathematics that finds matrices like this; i.e. while it seems like it might be strictly easier to do this, to do the same stuff but stop earlier, it's not actually as simple as that or it would already exist. If such a thing did exist it would have massive ramifications for linear algebra in a broader sense, because it could be used for more basic things like finding null-space bases. And so if such a thing were possible to develop, it is probably very challenging to create. The Hermite decomposition is just a practical and commonly available resource for attaining the state we need to defactor, even if it is slightly overkill. | |||
And we need the HNF otherwise to achieve canonical form, so pedagogically speaking, we may as well just explain it and that be it; why explain a second method for being simpler, if it's simplest to just use the one method. | |||
The specific conditions for HNF are not firmly established and vary slightly from application to application, but one thing they do share is a level of strictness to enforce uniqueness. But Tom's square-only conditions are not that strict; there are many unimodular decompositions one can find for any given matrix that will lead you to the defactored result. So, Tom's conditions, which are the true conditions for defactoring, are not technically speaking HNF conditions. And so column Hermite defactoring can not truly be said to be named after a thing that is fundamental to its function, but merely a present-day practicality in implementing it. But we find this acceptable (and haven't found a better alternative). | |||
==== Proof of why column Hermite defactoring (and Pernet-Stein defactoring) work ==== | |||
The following proof is adapted primarily from Tom Price's thinking: | |||
# The input matrix is an m×n matrix A. | |||
# It decomposes into a slightly bigger and square (n×n) unimodular matrix U and another m×n matrix which is not exactly A in HNF (because we only have to use unimodular operations so far as to get all the all-zero columns off to one side of A; we don't need to satisfy all of the conventional constraints of HNF), but we'll still call it H. The unimodular matrix is a transformation from A into H, so, AU = H. | |||
# If we were to actually slice off the all-zero cols we've isolated in H, we'd end up with a slightly smaller and square (m×m) matrix. So let's call this little square matrix S. | |||
# We can left-multiply both sides of our equation by the inverse of S (S⁻¹) and right-multiply both sides of our equation by the inverse of U (U⁻¹) to get S⁻¹AUU⁻¹ = S⁻¹HU⁻¹. The U's cancel out on the left so we end up with S⁻¹A = S⁻¹HU⁻¹. At first glance we don't seem to have gained any further insight. But there's more we can do from here. | |||
# Because H is just S with a bunch of 0 cols appended, S⁻¹H is just the identity matrix with a bunch of zero columns appended, in other words it is a truncated identity matrix. We could call that T, and now we have S⁻¹A = TU⁻¹. | |||
# Multiplying U⁻¹ on the left by a truncated identity matrix is the same as truncating U⁻¹. That's how we think of the output of column Hermite defactoring — our supposedly defactored matrix — so let's call that D. We now have S⁻¹A = D. (This is how we can see that the Pernet-Stein method of multiplying the input matrix by a transformation matrix that is a truncated and inversed column Hermite normal form of the input is equivalent to our column Hermite method, which takes the other route to the same result: inverting and truncating the unimodular result of the Hermite decomposition.) | |||
# We need to prove now that D has three qualities: a) it's defactored, b) it still represents the same temperament (i.e. it has the same null-space as A), and c) it's integer. | |||
# Proving (a) is easy. It's defactored because U was unimodular. U's determinant was 1, and neither inverting it nor truncating it would change that. Alternatively, we can prove this by showing how on the other side of the equation, S⁻¹A is surjective as a function on lattice points (in other words, there's no points in the tempered lattice that JI lattice points don't map to). We begin with the fact that H has the same image as A, because right-multiplication with a unimodular matrix such as U doesn't change the image. Then S has the same image as H, too, and therefore the same image as A, because removing the all-zero columns doesn't change the image either. Now that we've established this, we can assert that S⁻¹A is surjective by describing a lattice point x such that y = S⁻¹Ax for any given lattice point y. And because S and A have the same image, we know that Sy = Ax, and therefore y = S⁻¹Ax. | |||
# Proving (b) is even easier. Multiplying any matrix with an invertible matrix on the left keeps the null-space the same. S⁻¹ is clearly invertible, being itself the inverse of S. A way to understand this is: a non-invertible matrix is the same as a singular matrix, i.e. one whose determinant is 0. So as long as you don't wipe things out by essentially multiplying by 0, the null-space information is preserved, just scaled. | |||
# Proving (c) is a bit trickier, because S⁻¹ is not necessarily an integer matrix. But can show that S⁻¹A is an integer matrix by showing that it maps lattice points to lattice points. Suppose we have that same equation from the proof of (a), namely that y = S⁻¹Ax, where x is a lattice point. We want to show that y is a lattice point. Again, since A and S have the same image (when considered as functions on lattice points), there must be some lattice point z with Sz = Ax. But we also know that Sy = Ax. Since S is invertible, and therefore injective, y = z, so y is a lattice point. | |||
=== Various additional ways of thinking about how/why it works === | === Various additional ways of thinking about how/why it works === | ||