Tags : Python, BeautifulSoup, bookmark, browser, html
Priyans Murarka was having troubles managing his bookmarks. He wanted to keep all the bookmarks from all the browsers from all the OSes in one place. So, I suggested him to use rowz . It’s a really good website to keep your bookmarks at one place and also you can view others’ bookmarks also. Recently, they added a new feature, using which one could import the bookmarks from the browser. What they do is they ask you to export your bookmarks in an HTML file and upload it on the site. So, I see an HTML file, and all I can think of is BeautifulSoup. So, I open the HTML file. And.. OK! it’s damn easy!
from BeautifulSoup import BeautifulSoup import time
Get the HTML file for the bookmark.
Go to Browser > Bookmarks > Show All Bookmarks > Import and Backup > Export Bookmarks to HTML
Save the HTML file.
All the bookmark details are given in a <dt> tag. So just parse the HTML with BeautfulSoup, look up for <dt> and get relevant information.
Fetching Bookmark details
f = open('bookmarks.html','r') soup = BeautifulSoup(f.read()) f.close() dt= for d in soup.findAll('dt'): dt.append(d) for i in range(len(dt)): if dt[i].contents.has_key('href') and dt[i].contents.has_key('add_date'): uri = dt[i].contents['href'] title = dt[i].contents.contents add_date = time.ctime(int(dt[i].contents['add_date'])) last_modified = time.ctime(int(dt[i].contents['last_modified']))
It’s so simple doing anything in Python!
So, this has kind of a set back since an HTML bookmark file needs to be provided. I went to the Mozilla folder and checked few things up. I found out that firefox keeps few backup files for the bookmarks encoded in json! I never learned to use json! But then, today I learned it! It’s easy! And, I have successfully managed to extract bookmark information from the backup files. I would post about it later as it needs a finishing touch.
P.S. Things are looking good and easy with Python!