
# Beautiful Soup: Turning tag soup into structured data

So a friend of mine has been trying to collect her various writings in one place. She has a LiveJournal, a couple of personal sites, and now a Drupal site.

One of her personal sites has journal-like entries going back to 1997, and she'd like to import them into her Drupal site. Unfortunately, they are hand-written HTML. Fortunately, they're all pretty consistently formatted.

Beautiful Soup to the rescue! Using it, I was able to screenscrape her site and pull out all the syntactic goodness I needed to get entry dates, titles, and text. It works great for the 2006 entries. Now I just have to massage it until the rest of the years work, which shouldn't be too hard.
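For the curious, here is a rough sketch of what that kind of extraction can look like. It is not the actual script. The markup is made up for illustration (an `<h2>` title, a `<p class="date">` line, then plain paragraphs), and `SAMPLE_HTML` and `extract_entries` are hypothetical names. It also assumes the current `bs4` package with Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real site's structure is not shown in this post.
SAMPLE_HTML = """
<html><body>
<h2>A rainy day</h2>
<p class="date">2006-03-02</p>
<p>Stayed in and wrote.</p>
<p>Then made tea.</p>
<hr>
<h2>Spring</h2>
<p class="date">2006-04-01</p>
<p>Flowers everywhere.</p>
</body></html>
"""


def extract_entries(html):
    """Return one dict (title, date, text) per <h2>-delimited entry."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for heading in soup.find_all("h2"):
        date = None
        paragraphs = []
        # Walk the tags that follow this heading until the next entry starts.
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break
            if sibling.name != "p":
                continue  # skip separators such as <hr>
            if "date" in (sibling.get("class") or []):
                date = sibling.get_text(strip=True)
            else:
                paragraphs.append(sibling.get_text(strip=True))
        entries.append({
            "title": heading.get_text(strip=True),
            "date": date,
            "text": "\n\n".join(paragraphs),
        })
    return entries


entries = extract_entries(SAMPLE_HTML)
for entry in entries:
    print(entry["date"], entry["title"])
```

Hand-written pages from other years will probably deviate from whatever pattern you pick, so expect to tweak the tag names and the stopping condition for each year.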

YAY FOR TOOLS THAT MAKE SCREENSCRAPING EASY AND FUN!

posted at: 2006 Apr 14 01:37 UTC | category: tech


Copyright © 2006-2008 Zach White