# Talk:Defactoring algorithms

## Readability and conclusion

This article is way harder to read than necessary. It looks more like a working note than an article, with development details that would be better collected in a dedicated "history" section. The algorithms should be presented much more concisely.

The algorithms are currently presented in Mathematica, which isn't easy to decode. I think they should be presented in math notation or pseudocode.

The conclusion seems pretty subjective. I'm not sure "Hermite decomposition" can be treated as a single operation. It's (perhaps superficially) the case in Mathematica, but not in other programming languages or in math in general.

~~I wonder if the original author(s) of this article will do me a favor to allow me to clean it up. Otherwise I'll have to start anew.~~

FloraC (talk) 10:09, 4 February 2023 (UTC)

- Aw, dang. I'm sorry that it was hard for you to read. It feels like so long since I thought about this topic. I think it's good for it to have a new pair of eyes on it, to help balance out my own idiosyncrasies. Please feel free to take a pass on everything, improving the article how you see fit. If we can come to a consensus on how best to present it, that will be best for the community. I only wish for anyone who wants to understand this stuff to have good resources for doing so. I'm glad you're interested in the same stuff! --Cmloegcmluin (talk) 17:32, 4 February 2023 (UTC)

- Well I changed my mind. I decided to leave the contents of this dev-note- and research-style page as is, and start anew at *Saturation, torsion, and contorsion/Methods*, since what I actually want is an encyclopedic article that is suitable for reference. Instead I'm taking the liberty of moving this page to something like *Research on defactoring algorithms*. FloraC (talk) 09:35, 9 February 2023 (UTC)

- Hm. Well, as I said before, I'd really rather you not start anew, for the benefit of the community. Could we please first at least try to find a way together to make this article work in your eyes, then? (Note: by the end of this reply, I come to see things closer to the way you see them, but I'd like you to see my thought process.)

- I have no problem with presenting algorithms in pseudocode. Feel free to make that change.

- I certainly agree that the "Other defactoring methods" section is dev-note and research related. Dave himself has recommended I leave it out. So I can take that part out, no problem.

- As for the main "Defactoring methods" section, however, it seems we have a decision to make. We can either:
- 1) consider column Hermite defactoring to be the only relevant method, and thus the info on the other two is dev notes,
- 2) consider all three methods to be relevant, and thus none of them are dev notes.

- Personally, I think that it's clear that column Hermite defactoring is the only relevant method, and so the information on the other two should be taken out.

- (Why did Dave and I include them in the first place? As you can probably understand, and maybe you also remember: we were introducing our defactoring findings as part of a greater effort, including terminological reform, and a questioning of the usefulness of the EA branch of RTT. So we felt it was important to err on the side of caution, carefully providing all of our research, evidence, proofs, etc. At this point 15 months later, however, it does definitely seem a bit overkill.)

- However, I'm not certain you agree that column Hermite defactoring is superior, because you say: "The conclusion seems pretty subjective." I don't know which "conclusion" you're referring to, but my best guess is that you are referring to the statement, "Column Hermite defactoring is arguably the best defactoring algorithm". However, I don't understand how the rest of what you say on that pertains to this. You question whether Hermite decomposition is a "single operation". I don't recall stating in the article that it is a single operation. And I don't know or care whether it is. That doesn't seem relevant to the argument.
- * Comparing column Hermite defactoring with the Pernet-Stein method, it's irrelevant because they have the same steps except Pernet-Stein has an extra step.
- * Comparing with Smith defactoring it's irrelevant, because Smith defactoring's Smith decomposition is equivalent to repeated Hermite decomposition, as we explain here: "...the nature of Smith decomposition is to essentially repeatedly Hermite decompose from either angle until you've achieved Smith normal form, which is wasteful and unnecessary in this context, when all that is required is a single column-Hermite decomposition. This helps explain why column Hermite defactoring is more performant in general than Smith defactoring, when code is run against thousands of examples at a time."

- But perhaps you have a different understanding of the relationship between the Hermite and Smith decompositions that you could explain for me. I do not consider myself an expert from the math side of things. My conclusions, as you know, are based on a bit of math research and a lot of empirical testing with code.

- I do appreciate that I have a tendency to write articles on this wiki in a style that is less like an encyclopedia and more like a beginner's textbook. Of course, Wikipedia is an encyclopedia, per the "pedia" part, but in general, wikis don't have to be encyclopedic in style. I rather like how this article starts by getting the reader comfortable with the idea of enfactored mappings and comma bases. And I think it's good to include the brief "Finding the greatest factor" section at the end, too.

- Anyway, now as I go over the "New development: column Hermite defactoring" section, even if we decided this was the only method worth keeping here, I can see that you might find a lot of that information tedious, too: the "How/why it works" and "By hand" sections are both way more in my textbook style, less in your encyclopedic style. So, perhaps we should provide two separate resources, since there may be two different learning types which shouldn't be compromised together. That said:
- * While it's fine with me if you think this article's title is misleading and needs to be changed, I don't think "Research on defactoring algorithms" is quite right. It's more than only that, particularly on account of the introduction for beginners, tutorials by example, proofs, etc. Perhaps it's more of a "Guide to defactoring". (Though I'm sorry it was not very helpful to you as a guide, that is certainly its intention, given its structure, style, and contents.)
- * I suggest that you would only need "Saturation, torsion, and contorsion/Method". Singular. That is, I think you only need to document column Hermite defactoring if you are doing a separate encyclopedic-type article.

- --Cmloegcmluin (talk) 17:13, 9 February 2023 (UTC)

- I'm afraid your conclusion (that the column Hermite defactoring was the superior method) was critically affected by Mathematica, the tool of your choice, as it wraps "Hermite decomposition" as a single operation where you simultaneously get both the Hermite normal form and the unimodular matrix. In general, this isn't the case. For example, in SymPy, you have direct access to the function HNF: A → H. You gotta obtain the unimodular matrix U = HA^{+} from UA = H. So using the unimodular matrix involves three steps:

- find H = HNF (A);
- find the pseudoinverse A^{+};
- multiply them.

- By contrast, the Pernet-Stein method only needs H, i.e. only the first step at this point. It could be simplified a bit in an algorithm that's designed to return both, but the unimodular matrix is still something more than the HNF itself (see discussion at add igcdLLL to numbers by smichr · Pull Request #22845 · sympy/sympy). Your comparison made it as if they were the same.

- Then you actually use U^{-1} instead of U, so to sum it up you're taking an inverse after taking a pseudoinverse, with multiplication involved also. That looks to me like there's a lot of back and forth in it.

- That's why I chose to implement Pernet-Stein in my code:

```python
import numpy as np
from scipy import linalg
from sympy import Matrix
from sympy.matrices import normalforms

def __hnf_col(main):
    # wrap SymPy's HNF to get the column-style form by flipping both axes
    return np.flip(np.array(normalforms.hermite_normal_form(Matrix(np.flip(main))), dtype=int))

def __sat(main):
    # Pernet-Stein: invert the column HNF and apply it to the original mapping
    return np.rint(linalg.inv(__hnf_col(main)) @ main).astype(int)
```

- where `__hnf_col` is just a wrapper for A → H due to SymPy's unusual implementation as we know. Note it somehow removes the extra rows of zeros already.

- So my finding is a 180-degree turn from yours, and I'll insist that Pernet-Stein be included. To be clear I never intended to remove any of these methods, but I just want every word to count. For example, in the first section I think one obvious enfactoring and one hidden enfactoring are enough to show the possibility as well as the difficulty of the heuristic method.

- I accept the other suggestions and I'm dropping the idea of starting anew or changing the title.

- Ah, very interesting. Thank you for this. I've reviewed that GitHub thread, in particular https://github.com/sympy/sympy/pull/22845#issuecomment-1013936367, "It looks like [Cohen's] Alg 2.4.10 is almost exactly the same [as Alg 2.4.5] except that it also computes the unimodular matrix." Cohen's 2.4.5 is what's implemented in sympy here: https://github.com/sympy/sympy/blob/99b4abb6313990136cde93735f552a6a98152fc6/sympy/polys/matrices/normalforms.py#L177-L248 . Cohen's work can be found here: http://tomlr.free.fr/Math%E9matiques/Math%20Complete/Number%20theory/A%20course%20in%20computational%20algebraic%20number%20theory%20-%20Cohen%20H..pdf (2.4.5 is on page 69, and 2.4.10 is on page 74). So, yes, I can now see that it is possible to compute HNF without also computing the unimodular matrix. However, the unimodular matrix comes essentially for free; it's nothing but the same operations the algorithm chooses to apply to the original matrix, but also applied to an identity matrix. The main effort of the algorithm is choosing those unimodular transformations; tracking them is almost zero cost, computationally.
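The "comes essentially for free" point can be illustrated with a bare-bones, textbook-style row HNF (an editorial sketch; this is not the article's algorithm, SymPy's, or Cohen's, just a naive one): U is obtained by replaying every row operation on an identity matrix, a constant-factor bookkeeping cost.

```python
def hnf_with_transform(A):
    """Naive row-style Hermite normal form over the integers.

    Returns (H, U) with U @ A == H and U unimodular. U starts as the
    identity and receives exactly the same row operations as A, which is
    why tracking it costs almost nothing beyond the HNF itself."""
    H = [list(row) for row in A]
    m, n = len(H), len(H[0])
    U = [[int(i == j) for j in range(m)] for i in range(m)]
    r = 0  # next pivot row
    for c in range(n):
        # gcd-style reduction: leave at most one nonzero entry in rows r..m-1
        while True:
            nonzero = [i for i in range(r, m) if H[i][c] != 0]
            if len(nonzero) <= 1:
                break
            p = min(nonzero, key=lambda i: abs(H[i][c]))
            for i in nonzero:
                if i != p:
                    q = H[i][c] // H[p][c]
                    H[i] = [a - q * b for a, b in zip(H[i], H[p])]
                    U[i] = [a - q * b for a, b in zip(U[i], U[p])]
        nonzero = [i for i in range(r, m) if H[i][c] != 0]
        if not nonzero:
            continue  # no pivot in this column
        p = nonzero[0]
        H[r], H[p] = H[p], H[r]  # move the pivot row into place...
        U[r], U[p] = U[p], U[r]  # ...mirroring the swap in U
        if H[r][c] < 0:  # normalize the pivot to be positive
            H[r] = [-a for a in H[r]]
            U[r] = [-a for a in U[r]]
        for i in range(r):  # reduce the entries above the pivot
            q = H[i][c] // H[r][c]
            if q:
                H[i] = [a - q * b for a, b in zip(H[i], H[r])]
                U[i] = [a - q * b for a, b in zip(U[i], U[r])]
        r += 1
    return H, U
```

For example, `hnf_with_transform([[2, 4, 4], [6, 6, 12]])` gives H = [[2, 4, 4], [0, 6, 0]] with U = [[1, 0], [3, -1]], and U·A = H with det U = -1.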

- So I think the article should be amended to reflect this fact. Sorry I missed it.

- However, in order to prove that Pernet-Stein is more efficient than column Hermite defactoring, you'd have to prove that its extra matrix multiplication step at the end is cheaper than just tracking the unimodular transformations applied to the original matrix. Intuitively, I doubt this is the case. Maybe they'd be almost the same. But empirically testing this would be more work than I'm willing to invest. So it's fine by me if you want to include both methods for now.

- I don't particularly care about "beating" Pernet and Stein. I only wanted to simplify things for our readership if I could. Maybe one day we could settle it. It would be a big win if we only had to teach them one method.

- And regardless of the outcome of possible empirical performance testing, assuming the results will be very similar whichever way that goes, I think we should teach the one for which it is most straightforward to understand *why*/*how* it works. I make an argument in the article for that being column Hermite defactoring. You are welcome to write a counterargument for Pernet-Stein if you see it differently, and believe that some readers will find *its* process more obvious and illuminating.

- In short, if we can't agree on one algorithm, that's alright. It's not necessarily a strict loss. We can still find a different sort of mutual strength through our different approaches to the problem.

- As for this part of what you said, though, I'm sorry, but I can't figure out what you're referring to: "Then you actually use U^{-1} instead of U, so to sum it up you're taking an inverse after taking a pseudoinverse, with multiplication involved also. That looks to me like there's a lot of back and forth in it." Can you explain another way? Perhaps it may be helpful to reference this set of flow charts I prepared: https://en.xen.wiki/w/File:Comparison_of_defactoring_algorithms.png --Cmloegcmluin (talk) 17:37, 10 February 2023 (UTC)

- I hope you've actively tried to understand what I said, cuz I believe I was presenting it clearly enough. I'm summarizing them as lists here since I apparently can't edit the figure. The Pernet-Stein method does these things for a mapping V:

- find the transpose A = V^{T};
- find the Hermite normal form H = HNF (A);
- take the first *r* rows H_{1:r};
- find the transpose H_{1:r}^{T};
- find the inverse (H_{1:r}^{T})^{-1};
- multiply it by the original matrix and return.

- You should see it involves an HNF, two transposes, an inverse, a multiplication, and a slice, as you noted. In comparison, the column Hermite method does these things:

- find the transpose A = V^{T};
- find the Hermite normal form H = HNF (A);
- find the pseudoinverse A^{+};
- multiply them for the unimodular matrix U = HA^{+};
- find the inverse U^{-1};
- find the transpose (U^{-1})^{T};
- take the first *r* rows and return.

- You should see it involves an HNF, two transposes, an inverse, a pseudoinverse, a multiplication, and a slice. So to sum it up, it has an additional pseudoinverse step compared to Pernet-Stein. Note I unwrapped the "Hermite decomposition" into an HNF, a pseudoinverse, and a multiplication, shown in steps 2–4, cuz that's what happens in general. After that you immediately take the inverse as in step 5. A pseudoinverse (step 3) followed by an inverse (step 5): that's what I meant by "back and forth".

- Now, as I said, an algorithm designed to return both the HNF and the unimodular matrix could somewhat optimize it, but that's still not the same as the HNF itself. Apparently you missed the point of the SymPy development thread I showed you – they said asking for the unimodular matrix would slow it down a lot. Moreover, you can't rely on such an optimized algorithm being implemented, as I've seen many libraries that come with an HNF routine without also giving the unimodular matrix, such as the current stable version of SymPy, as well as Sage, another Python-based library. And I'm afraid expecting the user to implement the Cohen algorithm themselves is even less practical.

- In fact, using the unimodular matrix with SymPy is more difficult than it seems, since the HNF routine removes the extra rows of zeros. I have to pad them back to the correct dimension, which I'm yet to implement, before doing the multiplication. I think it's more advisable to simply avoid the unimodular matrix and its quirks.
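The padding itself could look something like this (a hypothetical sketch of the helper described above; `pad_zero_rows` and its signature are illustrative, not part of the code in this thread):

```python
import numpy as np

def pad_zero_rows(h, n_rows):
    # Hypothetical helper: pad an HNF whose trailing zero rows were
    # stripped by the library back out to n_rows rows.
    h = np.atleast_2d(np.asarray(h, dtype=int))
    pad = np.zeros((n_rows - h.shape[0], h.shape[1]), dtype=int)
    return np.vstack((h, pad))
```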

- I am actively trying to understand what you're saying. I'm sorry I missed the point you were trying to make with the GitHub thread. Given how long and obscure it was, I should have asked you to be more specific, rather than try to guess at your intentions. It looked to me, based on a quick look over their code and Cohen's work it was based on, that their algorithm worked in a way fairly similar to my existing understanding of how HNF is computed, whereby tracking unimodular transformations is cheap. It's quite possible that their algorithm looks superficially similar but is completely different. I am not interested enough in this problem to figure it out, though, at least not now. Sorry. So again, it's fine by me for now if you want to keep both algorithms.

- But I'm still at a loss about how you've gotten the impression that column Hermite defactoring involves a pseudoinverse step. As you can see in the diagram I shared in my previous reply, it does not. The word "pseudoinverse" only appears once in the article, in a tangential footnote. On my end, "column Hermite defactoring" is a method which Dave Keenan and I developed about a year ago, and the only information about it exists on this wiki article. Are you confusing it with something else? These are the steps:

- find the transpose A = V^{T};
- find the unimodular matrix U = First[HermiteDecomposition[A]];
- find the inverse U^{-1};
- find the transpose (U^{-1})^{T};
- take the first *r* rows and return.

- You can find an example of us computing the entire thing by hand in the article. No pseudoinverse is required. --Cmloegcmluin (talk) 22:12, 10 February 2023 (UTC)

- Oh, allow me to correct myself. After some trial and error it turns out the pseudoinverse doesn't get you the unimodular matrix, but a singular matrix, so it doesn't work. The correct way seems to be using the augmented matrix to track the elementary row operations in HNF: (A|I) → (H|U), as you pointed out and as one of them showed in the development thread. Here's my implementation of the column Hermite method:

```python
def __sat(main):
    r = Matrix(main).rank()
    unimodular = __hnf_col(np.vstack((main, np.eye(main.shape[1], dtype=int))))[r:]
    return np.rint(linalg.inv(unimodular)[:r]).astype(int)
```

- As I said, "Hermite decomposition" isn't a single operation. Here it involves a concatenation, an HNF, and a slice, tho it doesn't involve the pseudoinverse. Compare Pernet-Stein:

```python
def __sat(main):
    r = Matrix(main).rank()
    return np.rint(linalg.inv(__hnf_col(main)[:, :r]) @ main).astype(int)
```

- Note the slice, and thus computing the rank, are redundant; they're there just for future-proofing.

- Edit: update the code to adopt column-style HNF so as to get rid of all the transpositions.