I am interested in applications of algebraic geometry to machine learning. I have found some papers and books, mainly by Bernd Sturmfels, on algebraic statistics and machine learning. However, all of this seems to be applicable only to rather low-dimensional toy problems. Is this impression correct? Is there something like computational algebraic machine learning that has practical value for real-world problems, maybe even very high-dimensional problems like computer vision?

$\begingroup$ honglangwang.wordpress.com/as $\endgroup$ – Mirco A. Mannucci, Mar 19 '16 at 21:57

$\begingroup$ books.google.com/books/about/… $\endgroup$ – Steve Huntsman, Mar 19 '16 at 23:34

$\begingroup$ I'm not an expert but I guess that low-rank matrix completion may provide an example. $\endgroup$ – user62562, Mar 20 '16 at 9:42
One useful remark is that dimension reduction is a critical problem in data science, with a variety of effective approaches. It matters because many good machine learning algorithms have complexity that depends on the number of parameters used to describe the data (sometimes exponentially!), so reducing the dimension can turn an impractical algorithm into a practical one.
This has two implications for your question. First, if you invent a cool new algorithm then don't worry too much about the dimension of the data at the outset: practitioners already have a bag of tricks for dealing with it (e.g. Johnson-Lindenstrauss embeddings, principal component analysis, various sorts of regularization). Second, it seems to me that dimension reduction is itself an area where more sophisticated geometric techniques could be brought to bear; many of the existing algorithms already have a geometric flavor.
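To make the first point concrete, here is a minimal sketch of a Johnson-Lindenstrauss-style random projection in NumPy. The dimensions and tolerances below are illustrative choices, not recommendations: a random Gaussian matrix, suitably scaled, approximately preserves pairwise distances with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 points in a 2,000-dimensional ambient space (sizes are illustrative).
n_points, ambient_dim, target_dim = 50, 2_000, 400
X = rng.standard_normal((n_points, ambient_dim))

# A Johnson-Lindenstrauss-style map: a random Gaussian matrix scaled by
# 1/sqrt(target_dim) approximately preserves pairwise distances.
P = rng.standard_normal((ambient_dim, target_dim)) / np.sqrt(target_dim)
Y = X @ P

def pairwise_dists(Z):
    diffs = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

# Compare distances before and after projection (off-diagonal pairs only).
mask = ~np.eye(n_points, dtype=bool)
ratios = pairwise_dists(Y)[mask] / pairwise_dists(X)[mask]
print(ratios.min(), ratios.max())  # both close to 1: distances survive
```

The payoff is that any distance-based algorithm can now run in 400 dimensions instead of 2,000 with only a small, controlled distortion.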
That said, there are a lot of barriers to entry for new machine learning algorithms at this point. One problem is that the marginal benefit of a great answer over a good answer is not very high (and sometimes even negative!) unless the data has been studied to death. The marginal gains are much higher for feature extraction, which is really more of a domain-specific problem than a math problem. Another problem is that many new algorithms which draw upon sophisticated mathematics wind up answering questions that nobody was really asking, often because they were developed by mathematicians who tend to focus more on techniques than applications. Topological data analysis is a typical example of this: it has generated a lot of excitement among mathematicians but most data scientists have never heard of it, and the reason is simply that higher order topological structures have not so far been found to be relevant to practical inference / classification problems.
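For readers who haven't met topological data analysis, its simplest piece, 0-dimensional persistence, can be sketched in a few lines of plain Python. The point cloud below is made up for illustration: components are born at scale 0 and die when clusters merge, which is essentially Kruskal's minimum-spanning-tree algorithm in disguise.

```python
import itertools, math

# Two well-separated clusters of points (toy data for illustration).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Process all pairwise edges in order of increasing length.
edges = sorted(
    (dist(points[i], points[j]), i, j)
    for i, j in itertools.combinations(range(len(points)), 2)
)

parent = list(range(len(points)))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

deaths = []  # each union-find merge kills one component at this scale
for d, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        deaths.append(round(d, 3))

print(deaths)  # short bars = noise, one long bar = two real clusters
```

The output is a "barcode": three components die almost immediately (noise within a cluster), while one survives until the two clusters finally merge, revealing that the data has two real pieces.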

$\begingroup$ ...and the reason is simply that higher order topological structures have not so far been found to be relevant to practical inference / classification problems. Persistent cohomology looks quite appealing at first glance in cosmology, as well as for neural data, owing to its robustness to rescaling. But the more I delved into it, the more I agree with your comment: it is involved with the data only formally. $\endgroup$ – Henry.L, Mar 14 '17 at 2:54

$\begingroup$ The same problem also occurs in some of Sturmfels's students' work: they seem to use tropical geometry as an optimization tool rather than as an explanatory framework when it comes to real data. Maybe I am wrong, but what I think AG should provide is a theoretical framework underlying the data... $\endgroup$ – Henry.L, Mar 14 '17 at 2:57
Check the websites of some of Bernd Sturmfels's former students and postdocs: Pablo Parrilo at MIT and Rekha Thomas at the University of Washington. They tackle real (meaning relatively big) problems with their algebraic geometry insight, mostly in connection with semidefinite programming and related problems. You can check Pablo's MIT course to get some idea: http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-972-algebraic-techniques-and-semidefinite-optimization-spring-2006/
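The core idea behind this line of work is that a polynomial is a sum of squares (SOS) exactly when it can be written as $z^T Q z$ for a positive semidefinite matrix $Q$ in a monomial basis $z$, and finding such a $Q$ is a semidefinite program. The sketch below does not solve an SDP; it merely verifies a hand-chosen certificate with NumPy to show what the certificate is.

```python
import numpy as np

# Certificate that p(x) = x^4 + 2x^2 + 1 is a sum of squares:
# p(x) = z(x)^T Q z(x) with z(x) = [1, x, x^2] and Q positive semidefinite.
# Finding such a Q in general is a semidefinite program; this Q is by hand.
Q = np.array([[1.0, 0.0, 1.0],
              [0.0, 0.0, 0.0],
              [1.0, 0.0, 1.0]])

# Eigendecomposition Q = sum_i lam_i v_i v_i^T turns the certificate into
# an explicit SOS: p(x) = sum_i ( sqrt(lam_i) * v_i . z(x) )^2.
lam, V = np.linalg.eigh(Q)
assert lam.min() > -1e-9  # Q is PSD, so the certificate is valid

def p(x):
    return x**4 + 2 * x**2 + 1

def p_from_sos(x):
    z = np.array([1.0, x, x**2])
    return sum(l * (v @ z) ** 2 for l, v in zip(lam, V.T) if l > 1e-9)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(p(x) - p_from_sos(x)) < 1e-8
print("SOS certificate verified")
```

Since an SOS polynomial is automatically nonnegative, such certificates turn hard global nonnegativity questions into tractable convex feasibility problems, which is the engine behind the semidefinite relaxations above.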
There is also a huge line of research on computational algebraic geometry. You can start by visiting the website of Jon Hauenstein, http://www3.nd.edu/~jhauenst/ — his group works on big industrial projects.
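The workhorse of numerical algebraic geometry is homotopy continuation: deform a start system with known roots into the target system and track each root numerically. Below is a toy single-variable sketch; the target polynomial, step count, and Newton iteration count are arbitrary choices, and real solvers (Bertini, PHCpack) do adaptive stepping and much more.

```python
import cmath

# Deform the easy start system g(x) = x^3 - 1 (roots: cube roots of unity)
# into the target p(x) = x^3 - x - 1 along the straight-line homotopy
# H(x, t) = (1 - t) g(x) + t p(x), tracking each root as t goes 0 -> 1.

def H(x, t):
    return (1 - t) * (x**3 - 1) + t * (x**3 - x - 1)

def dH_dx(x, t):
    return 3 * x**2 - t

# The three known roots of the start system.
paths = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

steps = 100
for k in range(1, steps + 1):
    t = k / steps
    for i, x in enumerate(paths):
        for _ in range(10):           # Newton-correct onto H(., t) = 0
            x = x - H(x, t) / dH_dx(x, t)
        paths[i] = x

for x in paths:
    print(x, abs(x**3 - x - 1))      # residuals ~ 0: all three roots found
```

The same predictor-corrector idea scales to systems of polynomial equations in many variables, where it finds all isolated solutions; that is what makes it usable on big industrial problems.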
Algebraic geometry is also used in topological data analysis, which is now becoming a big thing in data science.

$\begingroup$ Can you elaborate a bit on how the techniques you cited (the Lasserre/Parrilo hierarchy, numerical algebraic geometry) relate to the data science question above? $\endgroup$ – alpx, Apr 5 '20 at 3:52

$\begingroup$ To give you a quick answer, I will simply link this highly influential paper, in which algebraic varieties are featured already in the abstract: projecteuclid.org/download/pdfview_1/euclid.aos/1351602527 $\endgroup$ Apr 6 '20 at 5:18

$\begingroup$ Thanks. I understand you may not have time under the current pandemic, but I think it's not considered appropriate to link a paper without explanation in this forum. It would be great if you could link some papers accompanied by short explanations of how they apply algebraic geometry techniques to data science. You could update your answer with these precise references, for instance. Thanks again. $\endgroup$ – alpx, Apr 7 '20 at 10:16
I'm going to assume that by 'high dimensional problems' you specifically mean learning speech recognition and image recognition. If so, I'm afraid the answer seems to be 'sadly, no'. Genuinely high dimensional problems are almost always best solved by neural nets, specifically by 'deep learning', which amounts to building neural nets with many layers that predict effectively. I'm not personally aware of any approach that comes close to deep learning for these kinds of problems. I might add that I teach data mining, so my impression is somewhat stronger than just a casual one.

There are two main issues with high dimensional data and neural nets. The first is feature selection: the variables one is given are likely not the variables one wants. The second is the question of how the nodes in the various layers should be connected. The basic flow chart of a neural net can be encompassed in a directed acyclic graph, i.e. a DAG. One could fantasize that somehow choosing the correct DAG amounts to an algebraic geometry problem, along the lines of the uses of algebraic geometry in phylogenetics. But that is just a fantasy as far as I know.
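For what it's worth, the 'layered DAG' structure is easy to make concrete: a feedforward net is just a composition of affine maps and elementwise nonlinearities. Here is a toy forward pass in NumPy; the architecture below is an arbitrary illustrative choice, and nothing is trained.

```python
import numpy as np

# A feedforward net as a layered DAG: each layer applies an affine map
# followed by a nonlinearity. Weights here are random; no learning happens.
rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

# Architecture (a made-up choice): 8 inputs -> 16 -> 16 -> 2 outputs.
sizes = [8, 16, 16, 2]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)               # hidden layers
    return x @ weights[-1] + biases[-1]   # linear output layer

x = rng.standard_normal(8)
print(forward(x))  # a 2-vector of raw scores
```

The open questions in the answer above are exactly about this picture: which entries of each `W` should be forced to zero (the DAG's edge set), and why gradient descent on such compositions works at all.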

$\begingroup$ A question: I'm not sure what "algebraic geometry" means here, but until now I have had the feeling that the problem of finding something better than (tuned) gradient descent for optimizing the parameters of a neural net is mostly unsolved, and the question of what could (in general) be better than gradient descent is itself a quite deep conceptual/geometrical/AI problem, no? $\endgroup$ – reuns, Mar 20 '16 at 22:01

$\begingroup$ @user19... Indeed, to some extent deep learning is necessary because gradient descent tends to fail systematically in fully connected multilayered nets. Still, I think any solution would have to be called numerical analysis, as I think it's highly unlikely that an exact solution can (or will ever) be found. I would find it awesome if deep mathematics previously considered 'pure' somehow turned up in a solution. I certainly don't know of any such solutions at the moment. (continued) $\endgroup$ – meh, Mar 21 '16 at 14:05

$\begingroup$ @user19... There are a lot of very interesting mathematical questions associated with nets. How about "probabilistically, why do neural nets work?" Or "how does the choice of the error function affect the probabilities of the (tacit) distribution function?" I can think of others. Much like my comment about DAGs, one can fantasize. Meanwhile, in the real world, I got yelled at for calling Google's TensorFlow a numerical linear algebra library. Not that there's anything wrong with numerical linear algebra :) $\endgroup$ – meh, Mar 21 '16 at 14:08
This article by Sylvain Petitjean is worth reading:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.57.4923&rep=rep1&type=pdf
It may partially answer your question, or it may not.
An interesting question is to what extent any such method is feasible for large data sets (scalability). Unfortunately I do not know the answer, though I suspect that something can be done to address it.