[Python Study 2] - urllib2/sgmllib

1. Use urllib2 and sgmllib to list all URLs on a web page:

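import urllib2
from sgmllib import SGMLParser


class URLLister(SGMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def reset(self):
        # SGMLParser.__init__ calls reset(), so self.urls exists right after construction
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # attrs is a list of (name, value) pairs for the opening <a> tag
        href = [v for k, v in attrs if k == 'href']
        if href:
            self.urls.extend(href)


f = urllib2.urlopen("http://icode.csdn.net")
try:
    # only parse a successful response
    if f.code == 200:
        parser = URLLister()
        parser.feed(f.read())
        parser.close()  # flush any data still buffered in the parser
        for url in parser.urls:
            print url
finally:
    # close the connection whether or not the page was parsed
    f.close()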
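sgmllib was removed in Python 3, and urllib2 was split into urllib.request and urllib.error. Below is a minimal sketch of the same URL lister for Python 3, assuming the standard-library html.parser.HTMLParser as a stand-in for SGMLParser. The helper names list_urls and list_page_urls are hypothetical and not part of the original code.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def reset(self):
        # HTMLParser.__init__ calls reset(), so self.urls exists right after construction
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            # skip href attributes that have no value
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def list_urls(html):
    """Return all <a href> values found in an HTML string (hypothetical helper)."""
    parser = URLLister()
    parser.feed(html)
    parser.close()  # flush any data still buffered in the parser
    return parser.urls


def list_page_urls(url):
    """Fetch a page and return its links (hypothetical helper; needs network access)."""
    with urlopen(url) as f:
        # fall back to UTF-8 when the server does not declare a charset
        charset = f.headers.get_content_charset() or "utf-8"
        return list_urls(f.read().decode(charset, errors="replace"))
```

For example, list_page_urls("http://icode.csdn.net") fetches the page used above and returns its links. Unlike urllib2, urlopen here raises urllib.error.HTTPError for error statuses, so no explicit status check is made.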
2. Use BeautifulSoup to parse and analyze the data:
http://www.crummy.com/software/BeautifulSoup/
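A minimal sketch of the same link-listing task with BeautifulSoup, assuming the current bs4 package (installed with pip install beautifulsoup4) rather than the older BeautifulSoup 3 release. The function name list_urls is a hypothetical helper, not part of the library.

```python
from bs4 import BeautifulSoup


def list_urls(html):
    """Return the href of every <a> tag that has one, in document order."""
    # "html.parser" is Python's built-in parser backend, so lxml is not required
    soup = BeautifulSoup(html, "html.parser")
    # href=True restricts the search to <a> tags that actually carry an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]
```

The href=True filter plays the same role as the `if href:` check in the SGMLParser version: anchors without an href are skipped.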