2020-03-27

Web Crawlers

Front-end code for saving structured data (code sourced from the internet)

(function (console) {
    // Adds console.save(data, filename): downloads data as a file from the browser console.
    console.save = function (data, filename) {
        if (!data) {
            console.error('Console.save: No data')
            return;
        }
        // Default file name when none is given.
        if (!filename) filename = 'console.json'
        // Serialize objects as JSON with a 4-space indent.
        if (typeof data === "object") {
            data = JSON.stringify(data, undefined, 4)
        }
        var blob = new Blob([data], { type: 'text/json' }),
            e = document.createEvent('MouseEvents'),
            a = document.createElement('a')
        a.download = filename
        a.href = window.URL.createObjectURL(blob)
        a.dataset.downloadurl = ['text/json', a.download, a.href].join(':')
        // Simulate a click on the link to trigger the download.
        e.initMouseEvent('click', true, false, window, 0, 0, 0, 0, 0, false, false, false, false, 0, null)
        a.dispatchEvent(e)
    }
})(console)

// Usage: console.save(obj)
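The same save step is sometimes needed on the crawler side as well. The following is a minimal Python sketch of a counterpart, not part of the original snippet: the function name `save_json` is hypothetical, and it only mirrors the defaults above (file name `console.json`, 4-space indent).

```python
import json


def save_json(data, filename="console.json"):
    """Write data to filename, mirroring the defaults of console.save."""
    if data is None:
        raise ValueError("save_json: no data")
    # Serialize anything that is not already a string with a 4-space indent,
    # similar to JSON.stringify(data, undefined, 4).
    if not isinstance(data, str):
        data = json.dumps(data, indent=4, ensure_ascii=False)
    with open(filename, "w", encoding="utf-8") as f:
        f.write(data)
    return filename
```

Calling `save_json(obj)` writes `console.json` to the current directory; `ensure_ascii=False` keeps non-ASCII text readable in the file.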

Encoding conversion when crawling with Python

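import requests
from bs4 import BeautifulSoup

# Placeholder target URL (an assumption; the original leaves `website` undefined).
website = 'https://example.com'
r = requests.get(website)

# Detect the page's encoding and convert the content to UTF-8.
# requests falls back to ISO-8859-1 when a text response's Content-Type header
# carries no charset, so in that case look for a charset declared in the page itself.
if r.encoding == 'ISO-8859-1':
    encodings = requests.utils.get_encodings_from_content(r.text)
    if encodings:
        encoding = encodings[0]
    else:
        encoding = r.apparent_encoding
else:
    encoding = r.encoding
# Both r.encoding and r.apparent_encoding can be None; fall back to UTF-8 then.
encoding = encoding or 'utf-8'
encode_content = r.content.decode(
    encoding, 'replace').encode('utf-8', 'replace')
# The bytes are now UTF-8, so tell BeautifulSoup not to trust the page's old charset declaration.
soup = BeautifulSoup(encode_content, features="html.parser", from_encoding="utf-8")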
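The selection logic above can be tried without a network request. The sketch below uses only the standard library and is a simplified illustration, not how requests works internally: the names `pick_encoding` and `to_utf8` are hypothetical, the regular expression is a rough stand-in for `get_encodings_from_content`, and a fixed `utf-8` fallback replaces `apparent_encoding`.

```python
import codecs
import re

# Matches charset declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
CHARSET_RE = re.compile(rb'<meta[^>]*?charset=["\']?([\w-]+)', re.IGNORECASE)


def pick_encoding(raw, header_encoding, fallback="utf-8"):
    """Choose an encoding for raw page bytes."""
    # Trust the header unless it is missing or is the ISO-8859-1 default.
    if header_encoding and header_encoding.upper() != "ISO-8859-1":
        return header_encoding
    match = CHARSET_RE.search(raw)
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # Accept only encodings Python knows.
            return name
        except LookupError:
            pass
    return fallback


def to_utf8(raw, header_encoding):
    """Decode raw with the chosen encoding and re-encode it as UTF-8."""
    encoding = pick_encoding(raw, header_encoding)
    return raw.decode(encoding, "replace").encode("utf-8", "replace")


# A GBK page whose server reported only the ISO-8859-1 default.
page = '<html><head><meta charset="gbk"></head><body>爬虫</body></html>'.encode("gbk")
print(to_utf8(page, "ISO-8859-1").decode("utf-8"))
```

Decoding with `'replace'` means undecodable bytes become replacement characters instead of raising an error, which suits crawling where a few damaged characters are preferable to a failed page.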

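Post title: Web Crawlers

Author: Wuny

Published: 2020-03-27, 19:01:53

Last updated: 2020-03-27, 19:24:07

Original link: https://a595859893.github.io/2020/03/27/Python爬虫/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please retain the original link and author when reposting.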