Scraping JavaScript-generated page content with selenium + pyquery


Posted by the author on 2015-03-08 20:36:03:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from pyquery import PyQuery as pq
from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch Firefox and load the page so that its JavaScript actually runs
browser = webdriver.Firefox()
browser.get('http://www.baidu.com/')

# Read the rendered DOM from the root element.
# Don't use browser.page_source: the page source it returns is non-standard.
html = browser.find_element(By.XPATH, "//*").get_attribute("outerHTML")
browser.quit()

html = pq(html)
html.find("script").remove()    # strip <script>...</script>
html.find("style").remove()     # strip <style>...</style>

print(html.outer_html())

