PolyRegress

A Python polynomial regression analysis application


Copyright © 2010, P. Lutus





The PolyRegress program in operation
(click image to switch between matplotlib backend renderers:
FigureCanvasGTK / FigureCanvasGTKAgg)
Discussion

PolyRegress is a polynomial regression application: it produces polynomial coefficients meant to model user-entered data points to varying degrees of accuracy. There is also an online version of the same program written in Java, along with a technical description of the methods used. PolyRegress is a useful program in its own right, but its main purpose is to demonstrate some Python programming methods, in particular those that support and interact with a graphical user interface (GUI).

There are many GUI support environments — GTK, Qt and others. Each of them has graphic design utilities, programs that help you design the actual GUI. For this project I used the GTK toolkit and Glade 3.6, a user interface designer that supports GTK. It's important to say that Glade creates an XML output specification file, which means the result is not language-specific.

Because Glade's output is an XML file, a finished GUI design can be used in many environments and many languages. Each supported language has a module or library that is able to support the Glade XML database specification.

One goal of this project was to use as many existing libraries and modules as possible, rather than writing all the code explicitly as I have done in the past. To achieve this goal, I used SciPy for the numerical and scientific functions, and matplotlib for plotting. The matplotlib plotting display can be seamlessly integrated into an application's GUI, a nice feature. Because PolyRegress requires these libraries, users who plan to run this program will need to install the required Python modules with the usual package management utilities.

I designed the program to support the features most often asked about in online discussions of GUI design: for example, it responds to keyboard and mouse inputs and dynamically updates its display in response to user inputs. This allows the program listing to serve as a convenient resource and example for various popular design techniques.

Initially a set of sample data is placed in the text editing window of the "Data" tab. This allows the program to show a classic polynomial regression immediately. Users can paste any data pairs into the "Data" tab, with any sequential formatting — examples:

  • 1 2 2 4 3 6 4 8 5 10 ...
  • 1, 2, 2, 4, 3, 6, 4, 8, 5, 10 ...
  • [[1,2],[2,4],[3,6],[4,8],[5,10] ...]
  • {1 2 | 2 4 | 3 6 | 4 8 | 5 10 ...}
  • Data on individual lines:
    • 1, 2
    • 2, 4
    • 3, 6
    • 4, 8
    • 5,10

In each of these cases the program will successfully read and process the data. The only requirement is that the data be serially paired, i.e., x1, y1, x2, y2, x3, y3 ... Users can also simply type in numbers just to see what happens. In this case and in general, whenever there is an odd number of numerical entries (i.e., an x without a paired y or the reverse), the program will show an error message instead of a graph.
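
The parsing rule just described can be sketched without any GUI code. This is only a minimal illustration: it reuses the regular expression that appears at line 168 of the program listing, but the function name `parse_pairs` and the choice to treat a malformed token as a data error are assumptions made for this example, not part of PolyRegress.

```python
import re

# Same character class as the listing: digits, '.', 'e', '+' and '-'.
NUMBER = re.compile(r"[\d\.e\+-]+")

def parse_pairs(text):
    """Return (xs, ys) from free-form text, or None if the data is unusable."""
    values = []
    for token in NUMBER.findall(text):
        try:
            values.append(float(token))
        except ValueError:
            return None  # assumption: a malformed token counts as a data error
    if not values or len(values) % 2:
        return None  # no data, or an x without a paired y
    # Even positions are x values, odd positions are y values.
    return values[0::2], values[1::2]
```

Because every non-numeric character acts as a separator, brackets, commas, braces and line breaks are all ignored in the same way.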

The PolyRegress interface has three tabs, one for data entries, the next for plotting, and a third that lists the polynomial coefficients written as a mathematical function and the accuracy of the fit. In a practical application, the user would enter data in the data tab, use the arrow buttons (paired with the arrow keyboard keys) to choose the most desirable polynomial degree, and then copy the polynomial coefficients from the "function" window.

The plot display shows the user's x,y data pairs in red, plus the current polynomial regression function superimposed in blue. If the user moves the mouse cursor across the plot, a flyout "tooltip" text will appear with the present cursor coordinates and an explanatory note. This allows the user to identify specific data points as well as estimate the polynomial function's maxima and minima.

Program Details

GTK and Glade

For Python, the required Glade support module is "gtk" (line 28 below). For the scientific content, the scipy module is required (line 25). I use the Pango module (line 25) to specify a monospaced font for the data entry and results windows (lines 116-118). I use the matplotlib module and its various components (lines 26-27) to support plotting.

One of the critical aspects of GUI design is linking the interface's controls and events to the Python program's functions, so that when a user presses a button, the Python program carries out the desired action. The general procedure is:

  • In Glade, the GUI interface designer, create a button:

  • Give the button a name. Let's say we chose the name "my_button".

  • Save the Glade project file: "my_project.glade"

  • Create a Python source file and load the gtk module (line 28 below).

  • Use gtk.Builder functions to load the required Glade interface file and populate a local widget database (lines 93-94 below).

  • Create a function that will allow your class's "self" reference to be indexed by widget name, the way a dictionary is, to retrieve the GTK widgets (lines 85-86 below). This is a very convenient utility function that simplifies access to the various widgets specified in Glade.

  • Connect your button's "clicked" event to a Python function, for example line 101 from the program listing below:

    self['button_right'].connect('clicked',lambda w: self.adjust_poly(1))
              

    Notice "self['button_right']" in the above expression. This shows the beneficial effect of the function (lines 85-86 below) in simplifying access to the Glade XML database: it essentially makes the class accessible in the same way that a dictionary is accessible, using the same bracket syntax.

  • We now have a button in a GTK GUI, and a Python function that will run whenever the button is pressed.
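
The wiring described in the steps above can be tried without GTK installed. In the sketch below, `FakeButton` and `FakeBuilder` are hypothetical stand-ins for the real GTK button and `gtk.Builder`, so only the `__getitem__` delegation and the `connect` calls mirror the program listing. Everything else is an assumption made for illustration.

```python
class FakeButton:
    """Hypothetical stand-in for a GTK button: stores handlers, can 'click'."""
    def __init__(self):
        self.handlers = {}

    def connect(self, signal, handler):
        self.handlers[signal] = handler

    def click(self):
        # Like GTK, pass the widget itself to the handler.
        self.handlers['clicked'](self)


class FakeBuilder:
    """Hypothetical stand-in for gtk.Builder: maps Glade names to widgets."""
    def __init__(self, names):
        self.widgets = dict((name, FakeButton()) for name in names)

    def get_object(self, name):
        return self.widgets[name]


class App:
    def __init__(self):
        self.poly = 3
        self.builder = FakeBuilder(['button_left', 'button_right'])
        self['button_left'].connect('clicked', lambda w: self.adjust_poly(-1))
        self['button_right'].connect('clicked', lambda w: self.adjust_poly(1))

    def __getitem__(self, key):
        # Delegate self['name'] to the builder, as the listing does.
        return self.builder.get_object(key)

    def adjust_poly(self, n):
        self.poly += n


app = App()
app['button_right'].click()  # simulates the user pressing the right arrow button
```

With the real toolkit, the builder would be filled by `add_from_file()` and the click would come from the user rather than from a method call.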

In some cases, a particular widget is assigned to a local variable instead of being accessed using the above bracket syntax (example line 95). This is done to speed things up and/or make program writing more convenient for lazy programmers.

Keyboard access

I wanted to use the keyboard to control some program functions, so I connected "key-press-event" from the main window to a local function (line 108). The associated function "keyboard_event()" gets the source widget and the event as arguments. The keyboard event includes the identity of the key that was pressed, and my code associates this with a list of keys I am interested in (lines 110-112).

Each entry in the list pairs a string identifying a key with an associated function. My function tests whether the pressed key is present in the list and, if so, executes the associated function (lines 144-148). Notice that if a match is found, the function returns True, which prevents that key from having any other effect in the program. I did this in order to control the Tab key, which ordinarily shifts program focus to each widget in sequence.
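
The dispatch idea can be shown in isolation. This is a sketch under stated assumptions: the class name `KeyDispatcher`, the fixed page count and the explicit `False` return are hypothetical. In the real program the key name comes from `gtk.gdk.keyval_name(evt.keyval)`, and the handler simply falls through for keys it does not recognize.

```python
class KeyDispatcher:
    """Hypothetical, GUI-free model of the listing's key handling."""
    def __init__(self, n_pages=3):
        self.n_pages = n_pages
        self.page = 1
        self.poly = 3
        # Key name -> zero-argument callable.
        self.keymap = {
            'Tab': self.increment_page,
            'Right': lambda: self.adjust_poly(1),
            'Left': lambda: self.adjust_poly(-1),
        }

    def increment_page(self):
        # Wrap around to the first tab after the last one.
        self.page = (self.page + 1) % self.n_pages

    def adjust_poly(self, n):
        self.poly = max(0, self.poly + n)

    def on_key(self, key_name):
        """Return True if the key was handled here, False otherwise."""
        handler = self.keymap.get(key_name)
        if handler is None:
            return False  # let the toolkit process the key normally
        handler()
        return True  # tells the toolkit to stop processing this key
```

Returning a true value from a GTK event handler stops further handling of that event, which is why the Tab key no longer moves the focus between widgets.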

Mouse access

In much the same way, I connect mouse motions to a local function (line 123). In this case I use a special "connect" function provided by matplotlib (mpl_connect) that provides some numbers I am especially interested in. But the basic idea is that you connect mouse motions or actions for a specific widget to a local function. By doing this, you get mouse coordinates with respect to that widget, not the application as a whole.

My mouse-motion function (lines 127-138) acquires and displays the mouse's position over the matplotlib-generated plot, expressed in that plot's scale units. This makes it possible to orient oneself in the plot and identify specific data points. If I had instead connected to the plot's parent GTK widget, I would have gotten mouse coordinates expressed in that widget's screen dimensions, rather than the plot's dimensions. I would have had to convert from one to the other, which is not much fun.

The mouse motion tracker function dynamically updates the parent GTK widget's "tooltip" text (line 137), the flyout message that appears when one hovers the mouse cursor over a widget. This allows the user to see the mouse coordinates almost on top of the mouse cursor's present position.
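
The tooltip text itself is easy to build separately from the GUI. The helper below is a hypothetical sketch, not the listing's code: it relies on the fact that matplotlib reports `xdata` and `ydata` as `None` when the cursor is outside the plot axes, and tests for that directly instead of catching an exception as the listing does.

```python
def tooltip_text(xdata, ydata):
    """Build the tooltip; xdata/ydata are None when the cursor is off the axes."""
    legend = "Red = data points\nBlue = polynomial fit"
    if xdata is None or ydata is None:
        return legend  # no coordinates to report
    return "X=%.2f, Y=%.2f\n%s" % (xdata, ydata, legend)
```

In the program, the resulting string would be handed to the parent widget's `set_tooltip_text()`.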

The actual math

Computing a polynomial regression is a bit more complex than a typical mathematical operation, as I explain elsewhere. But SciPy contains all the infrastructure required to perform this and many other advanced mathematical operations, and SciPy is a Python module: no fuss, just import it and use it.

PolyRegress takes the text content from the user data window and converts it into two arrays for the x and y data points respectively (lines 160-185). Then some SciPy functions compute polynomial coefficients and associated values based on the user data and the user's choice of polynomial degree (lines 206-218). Finally, the resulting polynomial function's curve is drawn into the plot in blue (lines 219-226).
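
To make the numbers in the results tab concrete, here is a dependency-free sketch of the same calculation. It is not how PolyRegress does it: the program calls SciPy's `polyfit` and `polyval`, while this example swaps in a plain normal-equations solver so that it runs without SciPy. The function names (`polyfit_normal_eq`, `horner`, `fit_stats`) are hypothetical, and coefficients are ordered lowest power first, the reverse of the order SciPy returns. The two statistics follow the formulas in lines 208-218 of the listing.

```python
import math

def polyfit_normal_eq(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y], A = Vandermonde matrix.
    rows = [[float(sum(x ** (i + j) for x in xs)) for j in range(m)]
            + [float(sum(y * x ** i for x, y in zip(xs, ys)))]
            for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(m):
            if r != col:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]

def horner(coeffs, x):
    """Evaluate the polynomial at x (coefficients lowest power first)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def fit_stats(xs, ys, coeffs):
    """Return (r squared, standard error), as defined in the listing."""
    n = len(xs)
    fitted = [horner(coeffs, x) for x in xs]
    mean_y = sum(ys) / float(n)
    sr = sum((f - mean_y) ** 2 for f in fitted)   # variation explained by the fit
    st = sum((y - mean_y) ** 2 for y in ys)       # total variation in the data
    stderr = 0.0
    if n > 2:
        sse = sum((f - y) ** 2 for f, y in zip(fitted, ys))
        stderr = math.sqrt(sse / (n - 2))
    return sr / st, stderr
```

The normal equations become numerically ill-conditioned at high polynomial degrees, which is one practical reason to prefer the library routines in real use.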

Plotting and rendering

Compared to the many programs I have written to render graphics for the past 35 years, matplotlib saves a tremendous amount of time. Its plot display can be integrated into your application, where it cooperates by rescaling itself when you rescale the application window. But the programmer must choose one of a number of matplotlib backend renderers, and they aren't created equal. To see the difference, go back up to the top of this page and click the image of the PolyRegress application. Clicking the graphic will switch between images taken with two different backend renderers (FigureCanvasGTKAgg and FigureCanvasGTK). It turns out the FigureCanvasGTK backend doesn't support antialiasing, so its output looks terrible. It took me quite a while to sort this out.

Event-handling strategy

There are many online Glade/GTK code examples that require the programmer to specify in the Glade designer which events he wants to expose and forward to the application. This means the programmer must edit the Glade interface specification, then write Python code in parallel, keeping all the names straight, for each event. In this program I simply connect widgets and their GTK events to the intended local functions, which means I don't have to specify events in the Glade designer. This is much more efficient.

Lambda functions

Lambda functions are small, anonymous functions that are very useful. They can be used to associate variable names with functions in much the same way that a variable name is associated with a value. In this program I use them for things not easily managed in other ways. I use lambda functions to pass a function call with an argument in places where writing the call directly would execute it immediately (lines 111, 112) rather than pass a reference to it. In another case, for convenience I define a one-argument function that is actually more complex than it appears (line 207).
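
The difference between executing a call and deferring it can be seen in a few lines. This is a standalone sketch. The `calls` list and this simplified `adjust_poly` are stand-ins invented for the example, not the program's own code.

```python
calls = []

def adjust_poly(n):
    """Stand-in for the program's method: just record the argument."""
    calls.append(n)

# Not suitable as a callback: the call runs right now, and what gets
# stored is its return value (None), not a function.
executed_now = adjust_poly(1)

# Suitable as a callback: the lambda wraps the call, so nothing runs
# until the lambda itself is invoked.
deferred = lambda: adjust_poly(-1)
```

At this point only the first call has been recorded. The second is recorded when `deferred()` is called, which in the program happens when the key or button event arrives.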

Licensing, Source

Revision History

  • Version 1.2 12/15/2010. Made a few improvements, added a program icon in XPM format within the program listing.
  • Version 1.1 12/09/2010. Changed from the libglade library to GtkBuilder.
  • Version 1.0 12/01/2010. Initial Public Release.

Program Listing

  1: #!/usr/bin/env python
  2: # -*- coding: utf-8 -*-
  3: 
  4: # Version 1.2 12/15/2010
  5: 
  6: # ***************************************************************************
  7: # *   Copyright (C) 2010, Paul Lutus                                        *
  8: # *                                                                         *
  9: # *   This program is free software; you can redistribute it and/or modify  *
 10: # *   it under the terms of the GNU General Public License as published by  *
 11: # *   the Free Software Foundation; either version 2 of the License, or     *
 12: # *   (at your option) any later version.                                   *
 13: # *                                                                         *
 14: # *   This program is distributed in the hope that it will be useful,       *
 15: # *   but WITHOUT ANY WARRANTY; without even the implied warranty of        *
 16: # *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         *
 17: # *   GNU General Public License for more details.                          *
 18: # *                                                                         *
 19: # *   You should have received a copy of the GNU General Public License     *
 20: # *   along with this program; if not, write to the                         *
 21: # *   Free Software Foundation, Inc.,                                       *
 22: # *   59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.             *
 23: # ***************************************************************************
 24: 
 25: import re, scipy, pango
 26: import matplotlib, matplotlib.pyplot
 27: from matplotlib.backends.backend_gtkagg import FigureCanvasGTKAgg as Canvas
 28: import gtk
 29: 
 30: class Icon:
 31:   icon = [
 32:     "32 32 17 1",
 33:     "   c None",
 34:     ".  c #000000",
 35:     "+  c #070707",
 36:     "@  c #0C0C0C",
 37:     "#  c #141414",
 38:     "$  c #1D1D1D",
 39:     "%  c #222222",
 40:     "&  c #262626",
 41:     "*  c #2C2C2C",
 42:     "=  c #343434",
 43:     "-  c #3C3C3C",
 44:     ";  c #444444",
 45:     ">  c #4B4B4B",
 46:     ",  c #595959",
 47:     "'  c #646464",
 48:     ")  c #6D6D6D",
 49:     "!  c #808080",
 50:     "                                ",
 51:     "                                ",
 52:     "       =%@...................#  ",
 53:     "      ->>>>>>;;;----====***&$.  ",
 54:     "     *>;;----=====****&&&%%$..  ",
 55:     "    #;-----=====****&&&&%%#...  ",
 56:     "   .*#+.....%.......#%........  ",
 57:     "   +.      +$      .#$.         ",
 58:     "  ..       ##      .$#.         ",
 59:     "  .       .$@      .@+.         ",
 60:     "          .%+      .#+.         ",
 61:     "          .*.      +#@.         ",
 62:     "          .$.      @$@.         ",
 63:     "          +$.      @&@.         ",
 64:     "          @$.      #*@          ",
 65:     "          #$.      %*@          ",
 66:     "          %$.     .*=#          ",
 67:     "         .*$      .=-#          ",
 68:     "         .=#      .--#          ",
 69:     "         +-@      .;-#          ",
 70:     "         #-+      .;-$          ",
 71:     "        .=-.      .;-&.      ,  ",
 72:     "        %*=.      .--=.      ;  ",
 73:     "       $*&=.       -==#     .+  ",
 74:     "      $&$%*.       *=*=='  @#.  ",
 75:     "     $%#$&%        %*&%-)!,&&.  ",
 76:     "    +*+@#=@        +-%$###+$+   ",
 77:     "    @%+@#=.         *%##@+#$.   ",
 78:     "     =++*#.         +=$@@%$.    ",
 79:     "     +**#.           +%*&@.     ",
 80:     "      ...             ...       ",
 81:     "                                "
 82:   ]
 83: 
 84: class PolyRegress:
 85:   def __getitem__(self, key):
 86:     return self.builder.get_object(key)
 87:   def __init__(self):
 88:     self.copyright = "Copyright © 2010 P.Lutus, http://arachnoid.com"
 89:     self.n = 0
 90:     self.page = 1
 91:     self.poly = 3
 92:     self.data_error = False
 93:     self.builder = gtk.Builder()
 94:     self.builder.add_from_file('polyregress_gui.glade')
 95:     self.mainwindow = self['matplotlib_window']
 96:     self.mainwindow.set_icon(gtk.gdk.pixbuf_new_from_xpm_data(Icon.icon))
 97:     self.mainwindow.connect('destroy', lambda w: gtk.main_quit())
 98:     self.graphic_box = self['graphic_box']
 99:     self['quit_button'].connect('clicked', lambda w: gtk.main_quit())
100:     self['button_left'].connect('clicked', lambda w: self.adjust_poly(-1))
101:     self['button_right'].connect('clicked', lambda w: self.adjust_poly(1))
102:     self.tabbed_pane = self['tabbed_pane']
103:     self.tabbed_pane.set_current_page(self.page)
104:     self.poly_disp = self['poly_disp']
105:     self.data_text = self['data_text']
106:     self.data_textb = self.data_text.get_buffer()
107:     self.data_textb.connect('modified-changed', self.decode_text_data)
108:     self.mainwindow.connect('key-press-event', self.keyboard_event)
109:     self.keymap = {
110:       'Tab' : self.increment_page,
111:       'Right' : lambda: self.adjust_poly(1),
112:       'Left' : lambda: self.adjust_poly(-1),
113:     }
114:     self.function_text = self['function_text']
115:     self.function_textb = self.function_text.get_buffer()
116:     fontms = pango.FontDescription("Monospace 10")
117:     self.function_text.modify_font(fontms)
118:     self.data_text.modify_font(fontms)
119:     self.plt = matplotlib.pyplot
120:     self.figure = self.plt.figure()
121:     self.canvas = Canvas(self.figure)
122:     self.graphic_box.pack_start(self.canvas, True, True)
123:     self.canvas.mpl_connect('motion_notify_event', self.track_mouse_position)
124:     self.load_sample_data()
125:     self.decode_text_data()
126: 
127:   def track_mouse_position(self,evt):
128:     tt = "Red = data points\nBlue = polynomial fit"
129:     s = ""
130:     try: # automatically reject out-of-bounds condition
131:       xv = float(evt.xdata)
132:       yv = float(evt.ydata)
133:       s = "X=%.2f, Y=%.2f" % (xv,yv)
134:       tt = s + "\n" + tt
135:     except (TypeError, ValueError): # xdata/ydata are None outside the axes
136:       pass
137:     self.graphic_box.set_tooltip_text(tt)
138:     self['pos_label'].set_text(s)
139: 
140:   def increment_page(self):
141:     self.page = ((self.page + 1) % self.tabbed_pane.get_n_pages())
142:     self.tabbed_pane.set_current_page(self.page)
143: 
144:   def keyboard_event(self,widget,evt):
145:     ks = gtk.gdk.keyval_name(evt.keyval)
146:     if(ks in self.keymap):
147:       self.keymap[ks]()
148:       return True
149: 
150:   def load_sample_data(self):
151:     sample_data = [(-1,-1),(0,3),(1,2.5),(2,5),(3,4),(5,2),(7,5),(9,4)]
152:     a = []
153:     float_spec = "%+2.1f "
154:     for item in sample_data:
155:       for v in item:
156:         a.append(float_spec % v)
157:       a.append("\n")
158:     self.data_textb.set_text("".join(a))
159: 
160:   def decode_text_data(self,*args):
161:     x = []
162:     y = []
163:     n = 0
164:     self.maxx = self.maxy = -1e9
165:     self.minx = self.miny = 1e9
166:     start, end = self.data_textb.get_bounds()
167:     data = self.data_textb.get_text(start,end)
168:     for n,item in enumerate(re.finditer(r"([\d\.e\+-]+)",data)):
169:       s  = item.group(0)
170:       try:
171:         v = float(s)
172:       except ValueError:
173:         print("Data conversion error: %s" % s)
174:       if(n % 2):
175:         y.append(v)
176:       else:
177:         x.append(v)
178:     self.data_error = (n % 2 == 0)
179:     if not (self.data_error):
180:       self.xdata = scipy.array(x)
181:       self.ydata = scipy.array(y)
182:       self.minx,self.maxx = min(x),max(x)
183:       self.miny,self.maxy = min(y),max(y)
184:       self.datasize = len(self.xdata)
185:     self.adjust_poly(0)
186: 
187:   def adjust_poly(self,n):
188:     maxdeg = self.datasize-1
189:     q = self.poly + n
190:     if(q < 0): q = 0
191:     if(q > maxdeg): q = maxdeg
192:     self.poly = q
193:     self.poly_disp.set_text("Degree %d" % self.poly)
194:     self.polyfit_plot()
195: 
196:   def polyfit_plot(self):
197:     self.plt.clf()
198:     if(self.data_error):
199:       self.plt.title('Data Incomplete (no data or not in pairs)',size=16,color='red')
200:     else:
201:       self.plt.ylabel('Y data')
202:       self.plt.xlabel('X data')
203:       self.plt.title('Polynomial Regression')
204:       self.plt.grid(True)
205:       self.plt.plot(self.xdata, self.ydata, 'ro',markersize=4)
206:       self.polycoeffs = scipy.polyfit(self.xdata, self.ydata,self.poly)
207:       yfit = lambda x: scipy.polyval(self.polycoeffs, x)
208:       ya = [yfit(z) for z in self.xdata]
209:       yb = sum(self.ydata)/self.datasize
210:       sr = sum([ (yi - yb)**2 for yi in ya])
211:       st = sum([ (yi - yb)**2 for yi in self.ydata])
212:       self.corr_coeff = sr / st
213:       self.stderr = 0
214:       if(self.datasize > 2):
215:         a = 0
216:         for i,x in enumerate(self.xdata):
217:           a += (yfit(x) - self.ydata[i])**2
218:         self.stderr = scipy.sqrt(a / (self.datasize-2))
219:       delta = 0.1
220:       x = scipy.arange(self.minx,self.maxx+delta,delta)
221:       self.plt.plot(x,yfit(x), 'b-')
222:       self.plt.axis([self.minx-1,self.maxx+1,self.miny-2,self.maxy+2])
223:     self.figure.canvas.draw()
224:     self.canvas.show()
225:     self.gen_function()
226:     self.data_textb.set_modified(False)
227: 
228:   def gen_function(self):
229:     out = "Degree %d, %d x,y pairs" % (self.poly,len(self.xdata))
230:     out += "\nCorr. coeff. (r^2) = %+.16e" % self.corr_coeff
231:     out += "\nStandard Error     = %+.16e" % self.stderr
232:     out += "\n\nf(x) = "
233:     a = []
234:     for n,v in enumerate(self.polycoeffs[::-1]):
235:       s = "%+.16e" % v
236:       a.append("%s * x^%02d" % (s,n))
237:     self.function_textb.set_text(out + "\n       ".join(a) + "\n\n" + self.copyright)
238: 
239: # end of class PolyRegress
240: 
241: app=PolyRegress()
242: gtk.main()
243: 
 
