Python: Bad Code of the Day (Oct 30th, 2013 Ed)

We all write bad code from time to time. I tend to do it deliberately when I’m experimenting in the interpreter or the debugger. I can’t say why I do it exactly except that it’s fun to see what I can fit in one line of code. Anyway, this week I needed to parse a file that looked like this:

00000       TM        YT                                                TSA1112223  0000000000000000000000000020131007154712PXXXX_Name_test_test_section.txt
data line 1
data line 2


You can download it here if you don’t want to scroll around the above snippet. Anyway, the objective was to parse out the part of the filename in the header that is labeled “section” in the above. The filename itself is XXXX_Name_test_test_section.txt. That part of the filename is used to identify something in a database. So in the Python interpreter, I came up with the following hack:

header.split("|")[0][125:].split("_")[-1].split(".")[0]

Let’s give this some more context. The above snippet fits into a function like this:

#----------------------------------------------------------------------
def getSection(path):
    """"""
    with open(path) as inPath:
        lines = inPath.readlines()
        
    header = lines.pop(0)
    section = header.split("|")[0][125:].split("_")[-1].split(".")[0]

That’s pretty ugly code and would require a fairly lengthy comment explaining what I’m doing. Not that I planned to actually use it in the first place, but it was fun to write. Anyway, I ended up breaking it down into some code that’s something like the following:

#----------------------------------------------------------------------
def getSection(path):
    """"""
    with open(path) as inPath:
        lines = inPath.readlines()
        
    header = lines.pop(0)
    filename = header[125:256].strip()
    fname = os.path.splitext(filename)[0]
    section = fname.split("_")[-1]
    
    print section
    print header.split("|")[0][125:].split("_")[-1].split(".")[0]
    
#----------------------------------------------------------------------
if __name__ == "__main__":
    path = "test.txt"
    getSection(path)

I’m leaving the “bad” code in the example above to show that the result is the same. Yes, it’s still a little ugly, but it’s much easier to follow and the code is self-documenting.

What funky code have you written lately?

9 thoughts on “Python: Bad Code of the Day (Oct 30th, 2013 Ed)”

  1. Pingback: Mike Driscoll: Python: Bad Code of the Day (Oct 30th, 2013 Ed) | The Black Velvet Room

  2. Scott, give the regular expression then? If the rules are this simple and don’t have to be more generic, I don’t see the need for regex.

    In this case I would simply do:

    with open(path) as f_in:
    header = f_in.readline()
    section = header[header.rfind(‘_’) + 1: header.rfind(‘.’)].

  3. A RegEx seems to be overkill. I like melker’s solution, it’s simple and elegant. I might have done a `os.path.splitext(fname)` before and simply used `header[header.rfind(‘_’) + 1:]`, but it seems unnessesary in this case.

  4. Why to read all the lines? Files that are extracts from a database may be very large. Why not to read only the header line as

    line = inPath.readline()

  5. you mean you don’t know how to write the RE? There is nothing simple or elegant about “header.split(“|”)[0][125:].split(“_”)[-1].split(“.”)[0]”. That could take someone an hour to figure out. Especially with no comments or example input line to reference. And it’s hard-coded (125). It’s fragile and brittle as hell. Bad, bad, bad.

Comments are closed.