wxPython: Extracting XML from the RichTextCtrl

I recently came across a StackOverflow question asking how to get the XML data out of wxPython's RichTextCtrl so that it could be saved to a database. I don't know much about this control, but after a quick Google search, I found an article from 2008 that gave me the information I needed. I took that example and cut it down to the following:

import wx
import wx.richtext

from StringIO import StringIO


########################################################################
class MyFrame(wx.Frame):

    #----------------------------------------------------------------------
    def __init__(self):
        wx.Frame.__init__(self, None, title='Richtext Test')

        # The rich text control that the user types into
        self.rt = wx.richtext.RichTextCtrl(self)
        self.rt.SetMinSize((300,200))

        save_button = wx.Button(self, label="Save")
        save_button.Bind(wx.EVT_BUTTON, self.on_save)

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.rt, 1, wx.EXPAND|wx.ALL, 6)
        sizer.Add(save_button, 0, wx.EXPAND|wx.ALL, 6)

        self.SetSizer(sizer)
        self.Show()


    #----------------------------------------------------------------------
    def on_save(self, event):
        out = StringIO()
        handler = wx.richtext.RichTextXMLHandler()
        rt_buffer = self.rt.GetBuffer()
        handler.SaveStream(rt_buffer, out)
        self.xml_content = out.getvalue()
        print self.xml_content


#----------------------------------------------------------------------
if __name__ == "__main__":
    app = wx.App(False)
    frame = MyFrame()
    app.MainLoop()

Let's break this down a bit. First we create our lovely application and add an instance of the RichTextCtrl widget to the frame, along with a button for saving whatever we happen to write in said widget. Next we set up the binding for the button and lay out the widgets. Lastly, we create our event handler. This is where the magic happens. Here we create the RichTextXMLHandler and grab the RichTextCtrl's buffer so we can write out the data. But instead of writing to a file, we write to a file-like object, which is our StringIO instance. We do this so we can write the data to memory and then read it back out. The reason is that the person on StackOverflow wanted a way to extract the XML that the RichTextCtrl generates and write it to a database. We could have written it to disk first and then read that file, but this is less messy and faster.
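Since the whole point was to get the XML into a database, here is a minimal sketch of what that last step might look like. I'm using Python's built-in sqlite3 module as a stand-in for whatever database you actually have, and the documents table along with the save_document and load_document helpers are names I made up for illustration. It assumes you already have the XML as a string, such as self.xml_content from the handler above.

```python
import sqlite3


def save_document(connection, name, xml_content):
    """Insert one document's XML and return the id of the new row."""
    # Hypothetical table layout; adjust it to match your own schema.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, xml TEXT NOT NULL)"
    )
    cursor = connection.execute(
        "INSERT INTO documents (name, xml) VALUES (?, ?)",
        (name, xml_content),
    )
    connection.commit()
    return cursor.lastrowid


def load_document(connection, row_id):
    """Return the stored XML for row_id, or None if there is no such row."""
    row = connection.execute(
        "SELECT xml FROM documents WHERE id = ?", (row_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # An in-memory database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    row_id = save_document(conn, "note", "<richtext><paragraph/></richtext>")
    print(load_document(conn, row_id))
    conn.close()
```

The parameterized query (the question marks) matters here, because the XML will almost certainly contain quote characters that would break a query built by string formatting.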

Note, however, that if someone had written a novel into the RichTextCtrl, then holding all of it in memory could be a BAD idea! While it's not likely that we would run out of room, there are certainly text files large enough to exceed your computer's memory. If you know that the data you are working with is going to take up a lot of memory, then you wouldn't go this route. Instead you would read and write the data in chunks. Anyway, this code works for what we wanted to do. I hope you found this useful. It was certainly fun to figure out.
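For the curious, the chunked approach mentioned above could look roughly like the sketch below. It assumes the data is already available as a file-like object opened in binary mode (a file on disk, for instance). The copy_in_chunks function and its 64 KB default are my own choices for illustration, not anything that wxPython provides.

```python
import io


def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy source to destination one chunk at a time.

    Only chunk_size bytes are held in memory at once.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty result means end of data
            break
        destination.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # BytesIO objects stand in for real files in this demo.
    src = io.BytesIO(b"<richtext>" + b"x" * 200000 + b"</richtext>")
    dst = io.BytesIO()
    print(copy_in_chunks(src, dst))
```

The standard library's shutil.copyfileobj does essentially the same thing, so in real code you may prefer that over rolling your own loop.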

Unfortunately this code example doesn't work in **wxPython Phoenix**. In this next section, we will update the example so that it will!


Updating for Phoenix / wxPython 4

The first problem you'll encounter when running the example above in wxPython 4 (AKA Phoenix) is that the SaveStream method no longer exists. You will need to use SaveFile instead. The other problem is actually one introduced by Python 3. If you run this code in Python 3, you will find that the StringIO module doesn't exist and you'll need to use io instead. So for our next example, I updated the code to support both Python 3 and wxPython Phoenix. Let's see how it differs:

# wxPython 4 (Phoenix) / Python 3 Version

import wx
import wx.richtext

from io import BytesIO


class MyFrame(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, title='Richtext Test')

        # The rich text control that the user types into
        self.rt = wx.richtext.RichTextCtrl(self)
        self.rt.SetMinSize((300,200))

        save_button = wx.Button(self, label="Save")
        save_button.Bind(wx.EVT_BUTTON, self.on_save)

        sizer = wx.BoxSizer(wx.VERTICAL)
        sizer.Add(self.rt, 1, wx.EXPAND|wx.ALL, 6)
        sizer.Add(save_button, 0, wx.EXPAND|wx.ALL, 6)

        self.SetSizer(sizer)
        self.Show()

    def on_save(self, event):
        out = BytesIO()
        handler = wx.richtext.RichTextXMLHandler()
        rt_buffer = self.rt.GetBuffer()
        handler.SaveFile(rt_buffer, out)
        self.xml_content = out.getvalue()
        print(self.xml_content)


if __name__ == "__main__":
    app = wx.App(False)
    frame = MyFrame()
    app.MainLoop()

The main differences lie in the imports section at the beginning and the on_save method. You will note that we are using the io module's BytesIO class. Then we grab the rest of the data the same way as before, except that we swap SaveStream for SaveFile. The XML that is printed out is a bytes object rather than a regular string, so if you plan to parse it, you may need to decode it into a string first. I've had some XML parsers that wouldn't work with bytes correctly.
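Here is a small sketch of dealing with that bytes object using only the standard library. The SAMPLE_XML value is a simplified stand-in that I wrote by hand for what out.getvalue() returns, so the element names and the namespace in it are assumptions; your real output will have more attributes. The helper function names are also made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the bytes from out.getvalue().
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<richtext version="1.0.0.0" xmlns="http://www.wxwidgets.org">\n'
    b'  <paragraphlayout>\n'
    b'    <paragraph><text>Hello</text></paragraph>\n'
    b'    <paragraph><text>World</text></paragraph>\n'
    b'  </paragraphlayout>\n'
    b'</richtext>\n'
)


def xml_bytes_to_text(xml_bytes, encoding="utf-8"):
    """Decode the raw bytes into a regular str."""
    return xml_bytes.decode(encoding)


def extract_text_runs(xml_bytes):
    """Return the contents of every <text> element, ignoring namespaces."""
    root = ET.fromstring(xml_bytes)  # ElementTree accepts bytes directly
    runs = []
    for element in root.iter():
        # Tags look like "{namespace}text", so strip the namespace part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "text":
            runs.append(element.text or "")
    return runs


if __name__ == "__main__":
    print(xml_bytes_to_text(SAMPLE_XML))
    print(extract_text_runs(SAMPLE_XML))
```

Note that ElementTree is happy to parse the bytes as they are, so the decode step is only needed when you want a str, for example to store the XML in a text column or to hand it to a parser that insists on strings.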


Wrapping Up

While this article only covers extracting XML, you could easily extend it to extract the other formats that the RichTextCtrl supports, such as HTML or plain text. This can be a useful tool to have should you need to save the data in your application to a database or some other data storage.

Copyright © 2022 Mouse Vs Python | Powered by Pythonlibrary