So a friend of mine has been trying to get her various writings collected in one place. She has an LJ, a couple of personal sites, and now a Drupal site.
One of her personal sites has journal-like entries going back to 1997. She'd like to import them into her Drupal site. Unfortunately, they're in hand-written HTML. Fortunately, they're all pretty consistently formatted.
Beautiful Soup to the rescue! Using it, I was able to screenscrape her site and pull out all the syntactic goodness I needed to get entry dates, titles, and text. I have it working great for the 2006 entries. Now I just have to massage it until the rest of the years work, which shouldn't be too hard.
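For the curious, here's a minimal sketch of what this kind of scrape can look like. It is not the actual script. The markup is an assumption: I'm pretending each entry is an h3 holding "date: title", followed by the p tags that make up its text. The names SAMPLE_HTML and parse_entries are made up for illustration, and the date format would need adjusting to whatever the real pages use.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical sample page. The real site's markup is not shown in this
# post, so this layout (h3 per entry, then p tags) is only an assumption.
SAMPLE_HTML = """
<html><body>
<h2>2006</h2>
<h3>April 2, 2006: Spring cleaning</h3>
<p>Threw out three boxes of old magazines.</p>
<p>Kept the comics.</p>
<h3>April 9, 2006: Garden</h3>
<p>Planted tomatoes.</p>
</body></html>
"""


def parse_entries(html):
    """Return a list of dicts with 'date', 'title', and 'text' keys."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for heading in soup.find_all("h3"):
        # Split "April 2, 2006: Spring cleaning" at the first colon.
        date_text, _, title = heading.get_text().partition(":")

        # Collect the paragraphs that follow this heading, stopping at
        # the first sibling tag that is not a <p> (e.g. the next <h3>).
        paragraphs = []
        for sibling in heading.find_next_siblings():
            if sibling.name != "p":
                break
            paragraphs.append(sibling.get_text(strip=True))

        entries.append({
            # %B expects an English month name under the default locale.
            "date": datetime.strptime(date_text.strip(), "%B %d, %Y").date(),
            "title": title.strip(),
            "text": "\n\n".join(paragraphs),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE_HTML):
        print(entry["date"], entry["title"])
```

Hand-written HTML tends to drift over the years, so the other years would probably need the heading tag or the date format tweaked. The loop itself should carry over.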
YAY FOR TOOLS THAT MAKE SCREENSCRAPING EASY AND FUN!
posted at: 2006 Apr 14 01:37 UTC | category: tech
Copyright © 2006-2008 Zach White