I have a large amount of data and need to calculate the equation of a line of best fit. This may sound simple, but I need it to be of very high order, because each data set I have has 3000 data points that don't correlate particularly closely. The line will have to be a complex polynomial with over 1000 roots. I am prepared to have a computer do the actual number crunching, but I need some idea of how to calculate the equation. Thanks in advance, Pi
Is that the same as a linear regression line of best fit? If so, you can plot a scatter plot using Microsoft Excel and ask for the equation.
The linear regression line is a linear (y = mx + c) equation; I need a curved line. I tried the statistics program "Autograph", but that only works up to order 6. I have now consolidated my data to around 200 key points, if that makes it easier.
If you need a curved line, then don't select the linear option; you can still get the equation in Excel. I use SAS, but that involves writing code.
I did it in Excel, but Excel only goes up to order 6. I am interested in this SAS thing....
Do you have academic access to SAS? If so, you can access their online training for code writing: http://www.sas.com/technologies/analytics/statistics/stat/index.html
A polynomial with such high degree sounds like major overfitting. It will be memorising the data rather than giving you a few parameters to interpret. If you're sure that's what you want, though, look at polynomial regression. And if you don't have access to SAS, try R. It's open source. http://www.r-project.org/
Zephyr is spot-on here. It is very rare to need a model with thousands of terms, and when such a model is needed, millions of data points are needed rather than just a few thousand. The problem with overfitting is that an overfit model has very limited extrapolative capability. With even more overfitting, the resulting model has very little interpolative capability either.
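To make the extrapolation point concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the trend y = 2x + 1, the 12 sample points, and the fixed +/-0.5 wobble standing in for noise. It compares a 2-parameter straight line with the degree 11 polynomial that passes through every point exactly (built with Lagrange interpolation, which is the extreme case of "memorising the data").

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c (closed form)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def lagrange_eval(xs, ys, x):
    """Evaluate the degree n-1 polynomial through all n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def truth(x):
    # Hypothetical underlying trend, chosen only for this illustration.
    return 2.0 * x + 1.0


# Made-up data: the trend plus a fixed +/-0.5 wobble standing in for noise.
xs = [float(i) for i in range(12)]
ys = [truth(x) + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

m, c = fit_line(xs, ys)
x_new = 13.0  # just outside the range of the data

line_error = abs(m * x_new + c - truth(x_new))
poly_error = abs(lagrange_eval(xs, ys, x_new) - truth(x_new))

print(f"2-parameter line, error at x={x_new}: {line_error:.3f}")
print(f"degree 11 polynomial, error at x={x_new}: {poly_error:.3f}")
```

The high degree polynomial reproduces every training point exactly, yet its prediction just past the last point is off by orders of magnitude more than the straight line's. Real data will not behave exactly like this toy set, but the pattern is typical.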
Polynomial fitting is the same kind of problem as linear fitting: the model is still linear in its coefficients, so you just have to solve a bigger least squares problem. Fitting a degree 1000 polynomial gives you a 1001x1001 system of normal equations, which is nothing for a computer. I would suggest using spline fitting or low degree piecewise polynomial fitting instead, which are much more flexible and efficient.
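For anyone who wants to see what "solve a bigger least squares problem" looks like, here is a minimal sketch in plain Python. The function names and the sample data are my own, not from any package. It builds the (degree+1)x(degree+1) normal equations and solves them by Gaussian elimination. One caveat: with plain powers of x the normal equations become badly ill-conditioned as the degree grows, so treat this as a way to understand the method at low degree, not as a recipe for degree 1000.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Least squares polynomial fit; returns c with y ~ c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][k] = xs[i] ** k.
    # (A^T A)[j][k] = sum_i x_i^(j+k) and (A^T y)[j] = sum_i x_i^j * y_i.
    power_sums = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]
    aug = [
        [power_sums[j + k] for k in range(n)]
        + [sum((x ** j) * y for x, y in zip(xs, ys))]
        for j in range(n)
    ]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]

    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aug[row][n] - tail) / aug[row][row]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Made-up sample data lying exactly on y = 1 - 2x + 0.5x^2.
xs = [i / 10 for i in range(21)]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]

coeffs = polyfit_normal_equations(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

In practice a library routine such as numpy.polyfit does the same job with a more numerically careful solver, and a spline or piecewise fit avoids the high degree problem altogether.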
Depending on what you are doing, it's possible that specifying too high an order in your fitting equation will actually reduce your accuracy. Very few real situations in my admittedly limited experience (ecology/biology) yield an improved fit past order 5.
If a polynomial fit of order ten or less does not do the job, you should not try a polynomial fit at all. Are you sure that your data is not random? Have you plotted it? If so, what does it look like? If you have not plotted it, you have skipped a critical step in developing an approximating function.
No form of polynomial fitting is appropriate, probably. You have a wave more than a function. Why not just use a Fourier Transform?
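As a rough sketch of the Fourier idea, assuming the points are equally spaced (the thread doesn't say whether they are): take the discrete Fourier transform, keep only the lowest few frequencies, and transform back to get a smooth curve. The function names, the test signal, and the choice of how many frequencies to keep are all made up for illustration.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform, O(n^2); fine for a few hundred points."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
        for k in range(n)
    ]


def fourier_smooth(samples, keep):
    """Keep the mean and the `keep` lowest frequencies, then invert the DFT."""
    n = len(samples)
    spectrum = dft(samples)
    # For real input, bins k and n-k form a conjugate pair; keep both halves.
    kept = [
        coeff if (k <= keep or k >= n - keep) else 0.0
        for k, coeff in enumerate(spectrum)
    ]
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(kept)).real / n
        for t in range(n)
    ]


# Made-up signal: a slow wave (2 cycles) plus a fast ripple (20 cycles).
n = 64
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
ripple = [0.3 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
samples = [s + r for s, r in zip(slow, ripple)]

smoothed = fourier_smooth(samples, keep=3)
max_diff = max(abs(a - b) for a, b in zip(smoothed, slow))
print(f"largest difference from the slow wave: {max_diff:.2e}")
```

Here the ripple is removed and the slow wave comes back essentially unchanged. For larger data sets an FFT routine (for example numpy.fft.rfft) does the same transform much faster than this direct sum.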