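Commonly Used Modules
Navigation:
- Date and time: datetime
- Collection types: collections
- Base64: base64
- Binary data conversion: struct
- Digest algorithms: hashlib
- Iteration tools: itertools
- Context managers: contextlib
- HTML parsing: html.parser (HTMLParser)
- URL access: urllib
- Third-party image processing library: pillow
- Function inspection: inspect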
datetime (dates and times)
Import the module: from datetime import datetime
- Get the current date and time:
>>> now = datetime.now()  # the current datetime
>>> print(now)
2015-05-18 16:28:07.198690
>>> print(type(now))
<class 'datetime.datetime'>
- Create a specific date and time:
>>> from datetime import datetime
>>> dt = datetime(2015, 4, 19, 12, 20)  # build a datetime from the given date and time
>>> print(dt)
2015-04-19 12:20:00
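- Converting between datetime and timestamp:
Inside a computer, time is represented as a number. The instant 1970-01-01 00:00:00 in the UTC+00:00 timezone is called the epoch and is recorded as 0 (times before 1970 have negative timestamps). The current time, expressed as the number of seconds relative to the epoch, is called a timestamp.
Beijing time is UTC+8. The timestamp() method of datetime converts a datetime object to the number of seconds since the epoch (a 365-day year is 31536000 seconds). A datetime without timezone information is interpreted in the timezone configured on the current computer, so the result below was produced on a machine set to UTC+8.
>>> from datetime import datetime
>>> dt = datetime(2015, 4, 19, 12, 20)  # build a datetime from the given date and time
>>> dt.timestamp()  # convert the datetime to a timestamp
1429417200.0
A datetime can also be created from a timestamp.
>>> from datetime import datetime
>>> t = 1429417200.0
>>> print(datetime.fromtimestamp(t))
2015-04-19 12:20:00
A timestamp has no notion of timezone, whereas a datetime does. fromtimestamp() converts to local time, and a timestamp can also be converted directly to UTC (note that utcfromtimestamp() is deprecated since Python 3.12).
>>> from datetime import datetime
>>> t = 1429417200.0
>>> print(datetime.fromtimestamp(t))  # local time
2015-04-19 12:20:00
>>> print(datetime.utcfromtimestamp(t))  # UTC time
2015-04-19 04:20:00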
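Because the results above depend on the machine's local timezone, it can help to pass the timezone explicitly. The following is a minimal sketch, not part of the original notes; it relies on the tz argument of datetime.fromtimestamp() and assumes UTC+8 stands for Beijing time.

```python
from datetime import datetime, timedelta, timezone

t = 1429417200.0

# Passing tz makes the result independent of the machine's local timezone.
utc_dt = datetime.fromtimestamp(t, tz=timezone.utc)
bj_dt = datetime.fromtimestamp(t, tz=timezone(timedelta(hours=8)))

print(utc_dt)  # 2015-04-19 04:20:00+00:00
print(bj_dt)   # 2015-04-19 12:20:00+08:00

# Both objects describe the same instant, so they map back to the same timestamp.
print(utc_dt.timestamp() == bj_dt.timestamp() == t)  # True
```

The two objects compare equal because aware datetimes are compared by the instant they represent, not by their wall-clock fields.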
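- Converting between str and datetime:
datetime.strptime(str, format) parses a date string str, written in the layout described by format, into a datetime object.
>>> from datetime import datetime
>>> cday = datetime.strptime('2015-6-1 18:19:59', '%Y-%m-%d %H:%M:%S')
>>> print(cday)
2015-06-01 18:19:59
%Y, %m, %d: four-digit year (0001-9999), two-digit month (01-12), two-digit day of the month (01-31).
%H, %M, %S: two-digit hour on the 24-hour clock (00-23), minute (00-59), second (00-59).
More details: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
The resulting datetime carries no timezone information.
The strftime(format) method of datetime converts a time to a string; format uses the same directives as in the previous example.
>>> from datetime import datetime
>>> now = datetime.now()
>>> print(now.strftime('%a, %b %d %H:%M'))
Mon, May 05 16:28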
- datetime arithmetic: datetime.timedelta lets you add to and subtract from dates
>>> from datetime import datetime, timedelta
>>> now = datetime.now()
>>> now
datetime.datetime(2015, 5, 18, 16, 57, 3, 540997)
>>> now + timedelta(hours=10)
datetime.datetime(2015, 5, 19, 2, 57, 3, 540997)
>>> now - timedelta(days=1)
datetime.datetime(2015, 5, 17, 16, 57, 3, 540997)
>>> now + timedelta(days=2, hours=12)
datetime.datetime(2015, 5, 21, 4, 57, 3, 540997)
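- Timezone conversion:
Attaching a timezone to the local time (the first step of converting local time to UTC):
>>> from datetime import datetime, timedelta, timezone
>>> tz_utc_8 = timezone(timedelta(hours=8))  # create the timezone UTC+8:00
>>> now = datetime.now()
>>> now
datetime.datetime(2015, 5, 18, 17, 2, 10, 871012)
>>> dt = now.replace(tzinfo=tz_utc_8)  # force the timezone to UTC+8:00
>>> dt
datetime.datetime(2015, 5, 18, 17, 2, 10, 871012, tzinfo=datetime.timezone(datetime.timedelta(0, 28800)))
The key to timezone conversion is this: when you receive a datetime, find out its correct timezone, then set that timezone explicitly so the object can serve as the reference time. replace(tzinfo=...) only attaches the label and does not shift the clock, so it must be the timezone the value was actually recorded in.
Given a datetime that carries a timezone, the astimezone() method converts it to any other timezone.
Note: the conversion does not have to start from UTC+0:00. Any datetime that carries a timezone converts correctly; for example, a Beijing-time datetime can be converted directly to Tokyo time.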
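The astimezone() conversions described above can be sketched as follows. The variable names utc_dt, bj_dt and tokyo_dt and the sample instant are illustrative assumptions, and fixed offsets (UTC+8, UTC+9) stand in for Beijing and Tokyo.

```python
from datetime import datetime, timedelta, timezone

# Start from an aware datetime in UTC.
utc_dt = datetime(2015, 5, 18, 9, 5, 12, tzinfo=timezone.utc)

# Convert UTC to Beijing time (UTC+8).
bj_dt = utc_dt.astimezone(timezone(timedelta(hours=8)))
print(bj_dt)     # 2015-05-18 17:05:12+08:00

# Convert Beijing time directly to Tokyo time (UTC+9), without going back to UTC.
tokyo_dt = bj_dt.astimezone(timezone(timedelta(hours=9)))
print(tokyo_dt)  # 2015-05-18 18:05:12+09:00
```

All three objects represent the same instant; only the wall-clock fields and the attached offset differ.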
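collections (collection types)
- namedtuple
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
namedtuple is a function that creates a custom tuple type with a fixed number of elements, whose elements can be referenced by attribute name instead of by index. This makes it easy to define a small data type that keeps the immutability of a tuple while allowing access by attribute. Point is also a subclass of tuple.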
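A short sketch of the properties just described (tuple subclass, immutability, attribute access). The _replace() and _asdict() helpers are part of the namedtuple API; the variable names are illustrative.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# A namedtuple instance is still a tuple: it unpacks and indexes like one.
print(isinstance(p, tuple))  # True
x, y = p
print(p[0] == p.x == x)      # True

# It is immutable: assigning to a field raises AttributeError.
try:
    p.x = 10
except AttributeError:
    print('cannot modify a namedtuple field')

# _replace() returns a new object with some fields changed.
moved = p._replace(x=10)
print(moved)                 # Point(x=10, y=2)

# _asdict() exposes the fields as a mapping.
fields = dict(p._asdict())
print(fields)                # {'x': 1, 'y': 2}
```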
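- deque
Similar to the double-ended queue in C++: in addition to append and pop at the tail, elements can be inserted and removed at the head of the queue (the appendleft and popleft methods).
>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])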
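To complement the insertions above, here is a minimal sketch of removing from both ends, plus the maxlen argument of deque, which is an addition not mentioned in the notes: a bounded deque silently drops items from the opposite end when it is full.

```python
from collections import deque

q = deque(['y', 'a', 'b', 'c', 'x'])

first = q.popleft()  # remove from the head
last = q.pop()       # remove from the tail
print(first, last)   # y x
print(q)             # deque(['a', 'b', 'c'])

# A bounded deque keeps only the most recent maxlen items.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(recent)        # deque([2, 3, 4], maxlen=3)
```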
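- defaultdict
With a plain dict, accessing a key that does not exist raises KeyError. A defaultdict takes a factory function that supplies a default value, so a missing key returns that default instead.
>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1']  # key1 exists
'abc'
>>> dd['key2']  # key2 does not exist, so the default is returned
'N/A'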
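A common use of the default factory is grouping. The sketch below uses list as the factory, with a made-up word list, and also shows a detail worth knowing: looking up a missing key with [] stores the default in the dictionary.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# list() is called to create the default value the first time a key is seen.
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Accessing a missing key with [] inserts it; use "in" or get() to avoid that.
print('z' in by_letter)  # False
_ = by_letter['z']
print('z' in by_letter)  # True
```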
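- OrderedDict
An OrderedDict keeps its keys in the order in which they were inserted. The keys of a plain dict were not guaranteed to be ordered before Python 3.7 (since 3.7 a plain dict also preserves insertion order, but it lacks popitem(last=False)). These properties make it possible to build a FIFO dict that evicts the oldest key once its capacity is exceeded:
from collections import OrderedDict

class LastUpdatedOrderedDict(OrderedDict):

    def __init__(self, capacity):
        super(LastUpdatedOrderedDict, self).__init__()
        self._capacity = capacity

    def __setitem__(self, key, value):
        containsKey = 1 if key in self else 0
        if len(self) - containsKey >= self._capacity:
            # capacity reached: drop the oldest entry (first in, first out)
            last = self.popitem(last=False)
            print('remove:', last)
        if containsKey:
            # re-insert an existing key so that it moves to the end
            del self[key]
            print('set:', (key, value))
        else:
            print('add:', (key, value))
        OrderedDict.__setitem__(self, key, value)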
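The eviction step that the class relies on can be seen in isolation in the sketch below. It is a simplified stand-in, not the class itself: a plain OrderedDict, a hypothetical capacity of 2, and made-up keys.

```python
from collections import OrderedDict

capacity = 2            # hypothetical limit for this demo
cache = OrderedDict()

for key, value in [('x', 1), ('y', 2), ('z', 3)]:
    if key not in cache and len(cache) >= capacity:
        # popitem(last=False) removes the oldest entry (FIFO order).
        evicted = cache.popitem(last=False)
        print('remove:', evicted)
    cache[key] = value

print(list(cache.items()))  # [('y', 2), ('z', 3)]
```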
- Counter
Count how many times each character occurs:
>>> from collections import Counter
>>> c = Counter()
>>> for ch in 'programming':
...     c[ch] = c[ch] + 1
...
>>> c
Counter({'g': 2, 'm': 2, 'r': 2, 'a': 1, 'i': 1, 'o': 1, 'n': 1, 'p': 1})
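The explicit loop above can usually be replaced by passing the iterable straight to Counter. A small sketch using the same string:

```python
from collections import Counter

# Counter counts the items of any iterable directly.
c = Counter('programming')

print(c['g'])  # 2
print(c['z'])  # 0, a missing key counts as zero instead of raising KeyError

# most_common(n) returns the n highest counts as (item, count) pairs.
top = c.most_common(3)
print(top)

# update() adds further counts to an existing Counter.
c.update('go')
print(c['g'], c['o'])  # 3 2
```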
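Base64 encoding
Base64 encodes binary data by table lookup, mapping every 3 bytes to 4 characters from a 64-character alphabet. The encoded result is plain text and can be displayed directly.
>>> import base64
>>> base64.b64encode(b'binary\x00string')
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b'YmluYXJ5AHN0cmluZw==')
b'binary\x00string'
Special characters such as + and / sometimes cannot appear in a URL. The functions whose names start with urlsafe_ replace + and / with - and _:
>>> base64.b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd++//'
>>> base64.urlsafe_b64encode(b'i\xb7\x1d\xfb\xef\xff')
b'abcd--__'
>>> base64.urlsafe_b64decode('abcd--__')
b'i\xb7\x1d\xfb\xef\xff'
The = character can also appear in Base64 output as padding, but = is ambiguous inside URLs and cookies, so many applications strip the = after encoding.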
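When the = padding has been stripped, it has to be restored before decoding, because the encoded length must be a multiple of 4. The helper name b64decode_nopad below is hypothetical; this is one possible sketch rather than a standard-library function.

```python
import base64

def b64decode_nopad(s: bytes) -> bytes:
    """Decode URL-safe Base64 whose trailing '=' padding was stripped."""
    # Base64 text length must be a multiple of 4; add back the missing '='.
    s += b'=' * (-len(s) % 4)
    return base64.urlsafe_b64decode(s)

encoded = base64.urlsafe_b64encode(b'abcd').rstrip(b'=')
print(encoded)                   # b'YWJjZA'
print(b64decode_nopad(encoded))  # b'abcd'
```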
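struct
This module converts between bytes and other binary data types.
>>> import struct
>>> struct.pack('>I', 10240099)
b'\x00\x9c@c'
The first argument of pack is the format string. In '>I', > means the byte order is big-endian (network order) and I means a 4-byte unsigned integer. The number of remaining arguments must match the format string.
unpack turns bytes back into the corresponding data types:
>>> struct.unpack('>IH', b'\xf0\xf0\xf0\xf0\x80\x80')
(4042322160, 32896)
According to >IH, the bytes are decoded in order as I, a 4-byte unsigned integer, and H, a 2-byte unsigned integer.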
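A quick round-trip sketch of the same format strings. struct.calcsize() reports how many bytes a format occupies, and switching > to < shows the effect of byte order.

```python
import struct

fmt = '>IH'  # big-endian: 4-byte unsigned int followed by 2-byte unsigned short

# calcsize() tells how many bytes the format describes.
print(struct.calcsize(fmt))           # 6

packed = struct.pack(fmt, 4042322160, 32896)
print(packed)                         # b'\xf0\xf0\xf0\xf0\x80\x80'
print(struct.unpack(fmt, packed))     # (4042322160, 32896)

# The same integer in little-endian order has its bytes reversed.
print(struct.pack('>I', 10240099))    # b'\x00\x9c@c'
print(struct.pack('<I', 10240099))    # b'c@\x9c\x00'
```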
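More information: https://docs.python.org/3/library/struct.html#format-characters
Reading BMP information:
The BMP format stores data in little-endian order. The file header consists of the following fields, in order:
Two bytes: 'BM' for a Windows bitmap, 'BA' for an OS/2 bitmap;
One 4-byte integer: the size of the bitmap;
One 4-byte integer: reserved, always 0;
One 4-byte integer: the offset of the actual image data;
One 4-byte integer: the number of bytes in the header;
One 4-byte integer: the image width;
One 4-byte integer: the image height;
One 2-byte integer: always 1;
One 2-byte integer: the number of colors (bits per pixel).
# coding=utf-8
# Read the header information of a BMP file
import struct

bmpfile = 'banner.bmp'
with open(bmpfile, 'rb') as f:
    b = f.read(30)  # only the first 30 bytes are needed
print(b)
print(struct.unpack('<ccIIIIIIHH', b))
"""Output
First 30 bytes of the BMP: b'BM6=\x0e\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00\xc0\x03\x00\x00D\x01\x00\x00\x01\x00\x18\x00'
BMP information: (b'B', b'M', 933174, 0, 54, 40, 960, 324, 1, 24)
"""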
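The header layout listed above can also be wrapped in a small function that names each field and checks the signature. This is a sketch under stated assumptions: the names BmpHeader, parse_bmp_header and the field names are hypothetical, and the sample bytes are the 30 bytes shown in the output above, so no file is needed to run it.

```python
import struct
from collections import namedtuple

# Hypothetical field names for the header layout described in the text.
BmpHeader = namedtuple(
    'BmpHeader',
    'signature file_size reserved data_offset header_size '
    'width height planes bits_per_pixel')

HEADER_FORMAT = '<2sIIIIIIHH'  # little-endian, 30 bytes in total

def parse_bmp_header(data):
    """Parse the first 30 bytes of a BMP file into a BmpHeader."""
    size = struct.calcsize(HEADER_FORMAT)
    if len(data) < size:
        raise ValueError('need at least %d bytes' % size)
    header = BmpHeader._make(struct.unpack(HEADER_FORMAT, data[:size]))
    if header.signature not in (b'BM', b'BA'):
        raise ValueError('not a BMP file')
    return header

sample = (b'BM6=\x0e\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00'
          b'\xc0\x03\x00\x00D\x01\x00\x00\x01\x00\x18\x00')
info = parse_bmp_header(sample)
print(info.width, info.height, info.bits_per_pixel)  # 960 324 24
```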
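Digest algorithms (hashlib)
Introduction to digest algorithms:
Python's hashlib provides the common digest algorithms, such as MD5 and SHA1.
What is a digest algorithm? A digest algorithm, also called a hash algorithm, uses a function to convert data of arbitrary length into a data string of fixed length (usually written as a hexadecimal string).
For example, suppose you wrote an article whose content is the string 'how to use python hashlib - by Michael' and published its digest '2d73d4f15c0db7f5ecb321b6a65e5d6d' along with it. If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can immediately show that Bob altered it, because the digest computed from 'how to use python hashlib - by Bob' differs from the digest of the original article.
In other words, a digest algorithm applies a digest function f() to data of arbitrary length and produces a fixed-length digest, with the goal of detecting whether the original data has been tampered with.
A digest algorithm can reveal tampering because the digest function is one-way: computing f(data) is easy, but recovering data from the digest is extremely hard. Moreover, changing even a single bit of the original data produces a completely different digest.
Taking the common digest algorithm MD5 as an example, compute the MD5 value of a string:
import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
# prints d26a53750bc40b38b65a520292f69306
Calling update() several times on successive chunks yields the same result as a single call on the whole data:
import hashlib

md5 = hashlib.md5()
md5.update('how to use md5 in '.encode('utf-8'))
md5.update('python hashlib?'.encode('utf-8'))
print(md5.hexdigest())
The SHA1 digest algorithm:
import hashlib

sha1 = hashlib.sha1()
sha1.update('how to use sha1 in '.encode('utf-8'))
sha1.update('python hashlib?'.encode('utf-8'))
print(sha1.hexdigest())
The result of SHA1 is 160 bits (20 bytes), usually written as a 40-character hexadecimal string.
SHA256 and SHA512 are more secure than SHA1, but the more secure algorithms are slower and produce longer digests.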
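The digest lengths mentioned above can be checked directly. The sketch below uses hashlib.new(), which selects an algorithm by name; the input string is made up for the demo.

```python
import hashlib

data = 'how to use sha256 in python hashlib?'.encode('utf-8')

# hashlib.new() selects an algorithm by name; digest_size is given in bytes.
sizes = {}
for name in ('md5', 'sha1', 'sha256', 'sha512'):
    h = hashlib.new(name, data)
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print(name, sizes[name])
# md5 (128, 32)
# sha1 (160, 40)
# sha256 (256, 64)
# sha512 (512, 128)
```

Each hexadecimal character encodes 4 bits, which is why the hex string is always a quarter of the bit length.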
Is it possible for two different pieces of data to produce the same digest under some digest algorithm? Certainly, because every digest algorithm maps an infinite set of inputs onto a finite set of outputs. This situation is called a collision. For example, Bob might try to craft an article 'how to learn hashlib in python - by Bob' whose digest happens to be exactly the same as that of your article. This is not impossible, but it is extremely difficult.
The program below stores user passwords as MD5 digests. Note: prepending the username (which serves as a unique identifier) and appending a fixed suffix (a salt) to the original password makes it harder to look up common passwords from precomputed digests, and prevents two users with the same password from ending up with the same digest.
# coding=utf-8
# hashlib example: a program that stores user login information
import hashlib

userinfo = {}  # maps username -> salted MD5 digest


def salted_md5(username, passwd):
    # digest of username + password + fixed salt
    md = hashlib.md5()
    md.update(username.encode('utf-8'))
    md.update(passwd.encode('utf-8'))
    md.update('_the_salt'.encode('utf-8'))
    return md.hexdigest()


class UserInfo(object):
    def login(self, username, passwd):
        if username in userinfo:
            if salted_md5(username, passwd) == userinfo[username]:
                print('Welcome!')
            else:
                print('Incorrect password')
        else:
            print('No such username')

    def __setitem__(self, username, passwd):
        userinfo[username] = salted_md5(username, passwd)

    def display(self):
        for k, v in userinfo.items():
            print('Username : ', k, ',Password(MD5) : ', v)


if __name__ == '__main__':
    i = UserInfo()
    i['shu'] = 'abc'
    i['fang'] = 'bbc'
    i.login('shu', 'abc')
    i.display()
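For comparison, here is a variant that swaps in a different technique from the one used above: instead of a single salted MD5, it uses hashlib.pbkdf2_hmac with a random per-user salt and compares digests with hmac.compare_digest. The function names hash_password and verify_password and the iteration count are assumptions made for this sketch, not a recommendation taken from the original notes.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this demo


def hash_password(password, salt=None):
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password('abc')
print(verify_password('abc', salt, stored))  # True
print(verify_password('bbc', salt, stored))  # False
```

Because the salt is random, two users with the same password get different digests, and the salt has to be stored next to the digest.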
itertools (iteration tools)
# coding=utf-8
# itertools tools
import itertools

naturals = itertools.count(1)  # start at 1 with a step of 1
end = itertools.takewhile(lambda x: x <= 10, naturals)  # stop once the value exceeds 10
for n in end:
    print(n)  # prints 1 to 10, one number per line

# cycle: repeats a sequence forever
cs = itertools.cycle('ABC')  # 'ABC' is treated as a sequence
for i, c in enumerate(cs):
    if i >= 6:  # cycle never ends on its own, so stop explicitly
        break
    print(c)  # A, B, C, A, B, C, one per line

# repeat
re = itertools.repeat('A', times=3)
for n in re:
    print(n)  # prints A three times; without times it repeats forever

# chain
for c in itertools.chain('ABC', 'XYZ'):
    print(c)  # joins ABC and XYZ into A B C X Y Z

# groupby: here case-insensitive
for key, group in itertools.groupby('AAABBBCCaaaAAAa', lambda x: x.upper()):
    print(key, list(group))  # adjacent items with the same key (the second argument) form one group
"""Output
A ['A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
A ['a', 'a', 'a', 'A', 'A', 'A', 'a']
"""
All of these are iterators and are consumed with for or similar statements. Any of them can be truncated with the itertools.takewhile() function, which yields items for as long as the given predicate holds:
>>> naturals = itertools.count(1)
>>> ns = itertools.takewhile(lambda x: x <= 10, naturals)
>>> list(ns)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
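takewhile() cuts an iterator by a condition on the values. When the goal is simply "the first n items", itertools.islice() is an alternative; this is an addition to the notes, shown as a small sketch.

```python
import itertools

# islice(iterable, n) takes the first n items of any iterator, even an infinite one.
first_seven = list(itertools.islice(itertools.cycle('ABC'), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# count(start, step) combined with islice gives a bounded arithmetic sequence.
evens = list(itertools.islice(itertools.count(0, 2), 5))
print(evens)        # [0, 2, 4, 6, 8]
```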
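contextlib
A convenient way to create context managers for your own types.
# coding=utf-8
# Context manager written by hand
class Query(object):
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('begin....')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type:
            print('Error')
        else:
            print('End')

    def query(self):
        print('Query info about %s...' % self.name)

    def __getattr__(self, att):
        if att == 'error':
            raise RuntimeError('user-triggered error')
        else:
            return 'no problem'


with Query('hahaha') as qq:
    qq.query()
    print(qq.nanananan)
    print(qq.error)
The __enter__ method runs when the with block is entered, and its return value is bound to the name after as. The __exit__ method runs when the block is left, whether normally or because of an exception. They play the roles of setup and cleanup; they are not tied to object construction or destruction.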
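The order of the calls, and the effect of the return value of __exit__, can be observed with a small recorder class. The class name Trace is hypothetical; the sketch suppresses only ValueError to show that a true return value from __exit__ swallows the exception.

```python
class Trace:
    """Records when __enter__ and __exit__ are called."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this object is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        name = exc_type.__name__ if exc_type else 'no error'
        self.events.append('exit: ' + name)
        # Returning True suppresses the exception; here only ValueError.
        return exc_type is ValueError


with Trace() as t:
    t.events.append('body')
    raise ValueError('swallowed by __exit__')

print(t.events)  # ['enter', 'body', 'exit: ValueError']
```

In the Query example above, __exit__ returns None, so the RuntimeError still propagates after 'Error' is printed.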
# coding=utf-8
# Context managers, part 2: using the library
from contextlib import contextmanager


class Query(object):
    def __init__(self, name):
        self.name = name

    def query(self):
        print('Query info about %s...' % self.name)

    def __getattr__(self, att):
        if att == 'error':
            raise RuntimeError('user-triggered error')
        else:
            return 'no problem'


@contextmanager
def create_query(name):
    print('Begin')
    q = Query(name)
    yield q
    print('End')


with create_query('hahaha') as qq:
    qq.query()
    print(qq.nanananan)
    # print(qq.error)


@contextmanager
def tag(name):
    print('<%s>' % name)
    yield
    print('</%s>' % name)


with tag('h1'):
    print('hello')
    print('world')
"""Output
<h1>
hello
world
</h1>
"""

# closing() gives an equivalent context manager for an object that has close()
from contextlib import closing
from urllib.request import urlopen

with closing(urlopen('https://www.python.org')) as page:
    for line in page:
        print(line)
The @contextmanager decorator accepts a generator function. The value produced by the yield statement is bound to var in with ... as var (in the example below, 'Bob' is passed as the argument of create_query), and the with statement then works as usual:
with create_query('Bob') as q:
    q.query()
@contextmanager can also be used to run specific code automatically before and after a block of code, as the tag function above does.
The code executes in this order:
1. The with statement first runs the statements before yield, so <h1> is printed;
2. At yield, control passes to the body of the with statement, which prints hello and world;
3. Finally the statements after yield run, printing </h1>.
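One caveat about the three steps above: if the body of the with statement raises, the code after yield is skipped unless the yield is wrapped in try/finally. The sketch below is a variation on tag; the extra out parameter is an assumption added so the result can be inspected instead of printed.

```python
from contextlib import contextmanager


@contextmanager
def tag(name, out):
    out.append('<%s>' % name)       # runs before the with body
    try:
        yield                       # the with body runs here
    finally:
        out.append('</%s>' % name)  # runs even if the body raised


out = []
try:
    with tag('h1', out):
        out.append('hello')
        raise RuntimeError('body failed')
except RuntimeError:
    pass

print(out)  # ['<h1>', 'hello', '</h1>']
```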
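contextlib.closing turns an object that does not implement the context manager protocol, but does have a close() method, into a context manager. An example is the with closing(urlopen('https://www.python.org')) as page part of the code above.
Some other decorators in contextlib: http://blog.jobbole.com/64175/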
HTMLParser:
An HTML parsing tool:
# coding=utf-8
# HTML parser
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):  # start of a tag
        if tag == 'a':
            # attrs is a list of (name, value) tuples; look values up by name
            # so that extra attributes such as rel do not shift the positions
            attrs = dict(attrs)
            print('Event:%s, Link:%s, Event info:' % (attrs.get('title'), attrs.get('href')), end='')

    def handle_endtag(self, tag):  # end of a tag
        pass

    def handle_startendtag(self, tag, attrs):  # self-closing element
        pass

    def handle_data(self, data):  # text inside a tag
        if data.strip():  # skip whitespace between tags
            print(data)

    def handle_comment(self, data):  # comments
        pass

    def handle_charref(self, name):  # escaped character references
        pass


parser = MyHTMLParser()
parser.feed('''
<ul class="subnav menu" role="menu" aria-hidden="true">
<li class="tier-2 element-1" role="treeitem"><a href="/blogs/" rel="external nofollow" title="Python Insider Blog Posts">Python News</a></li>
<li class="tier-2 element-2" role="treeitem"><a href="http://planetpython.org/" rel="external nofollow" title="Planet Python">Community News</a></li>
<li class="tier-2 element-3" role="treeitem"><a href="http://pyfound.blogspot.com/" rel="external nofollow" title="PSF Blog">PSF News</a></li>
<li class="tier-2 element-4" role="treeitem"><a href="http://pycon.blogspot.com/" rel="external nofollow" title="PyCon Blog">PyCon News</a></li></ul>
''')
"""Output: the Python event information
Event:Python Insider Blog Posts, Link:/blogs/, Event info:Python News
Event:Planet Python, Link:http://planetpython.org/, Event info:Community News
Event:PSF Blog, Link:http://pyfound.blogspot.com/, Event info:PSF News
Event:PyCon Blog, Link:http://pycon.blogspot.com/, Event info:PyCon News
"""
urllib:
Dedicated to working with URLs.
from urllib import request

with request.urlopen('https://api.douban.com/v2/book/2129650') as f:
    data = f.read()
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', data.decode('utf-8'))
This opens a URL and reads its content. An HTTP response starts with the HTTP headers; the getheaders method returns the response headers as (name, value) pairs that can be iterated over (multiple values within one header are separated by ;).
Output: 200 OK means the request succeeded.
Status: 200 OK
Server: ADSSERVER/45672
Date: Sat, 27 May 2017 11:46:21 GMT
Content-Type: application/json; charset=utf-8
......
Data: {"rating":{"max":10,"numRaters":16,"average":"7.4","min":0},"subtitle":....
# This URL returns data in JSON format
HTTP protocol explained: http://www.ruanyifeng.com/blog/2016/08/http.html and http://www.cnblogs.com/li0803/archive/2008/11/03/1324746.html
To simulate a browser's HTTP GET request, use a Request object. By adding HTTP headers to the Request object, the request can be disguised as coming from a browser. For example, simulate my Huawei phone (an Android device) requesting the Douban home page:
# Simulate an HTTP request
from urllib import request

req = request.Request('http://www.douban.com/')
# User agent of my Windows 10 machine:
# req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36')
# User agent of my Huawei phone:
req.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 7.0; HUAWEI NXT-AL10 Build/HUAWEINXT-AL10) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30')
# req.add_header('User-Agent', 'Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A403 Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
Returned values:
X-DAE-Node: daisy1a
X-DAE-App: talion
Strict-Transport-Security: max-age=15552000;
X-Content-Type-Options: nosniff
Set-Cookie: __ads_session=eic+R2qK6gjexggAYQA=; domain=.douban.com; path=/
X-Powered-By-ADS: chn-shads-1-09
Data: the content of the whole page. Different user agents receive page content tailored to them; here the mobile version is returned:
<head>
<meta charset="UTF-8">
<title>豆瓣(手机版)</title>
<meta name="viewport" content="width=device-width, height=device-height, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
<meta name="format-detection" content="telephone=no">
<link rel="canonical" href="https://m.douban.com/" rel="external nofollow" >
<link href="https://img3.doubanio.com/f/talion/50f7d50f2f898665f7ee8d2888545e01b164b6b4/css/card/base.css" rel="external nofollow" rel="stylesheet">
To send a request with POST, simply pass the data argument as bytes. Here we simulate a Weibo login: first read the login email and password, then pass them encoded as username=xxx&password=xxx, following the format of the weibo.cn login page:
# coding=utf-8
# Send a request with POST
from urllib import request, parse
import urllib

print('Login to weibo.cn...')
email = input('name:')  # does input() not work in VS Code?
passwd = input('password:')
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])

req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
Returned content:
Access-Control-Allow-Origin: https://passport.weibo.cn
Access-Control-Allow-Credentials: true
Set-Cookie: SUB=_2A250LRstDeRhGeVG71cS8i7KwjSIHXVX0aVlrDV6PUJbkdBeLVbnkW2be1jrkGlx-AOo2s6GuQ3JgVe80Q..; Path=/; Domain=.weibo.cn; Expires=Sun, 27 May 2018 12:05:17 GMT; HttpOnly
Set-Cookie: SUHB=0vJ4VlnVbO68bt; expires=Sunday, 27-May-2018 12:05:17 GMT; path=/; domain=.weibo.cn
Set-Cookie: SCF=ApB3kGpV0nSmD6_kvit8yqY0eLhZB6zr7VTIOmQ8mn3_xgOI89lhyE7NP45RRXR_P0M5RXxcCQor9SNxBIJK80s.; expires=Tuesday, 25-May-2027 12:05:17 GMT; path=/; domain=.weibo.cn; httponly
Set-Cookie: SSOLoginState=1495886717; path=/; domain=weibo.cn
Set-Cookie: ALF=1498478717; expires=Monday, 26-Jun-2017 12:05:17 GMT; path=/; domain=.sina.cn
DPOOL_HEADER: dryad22
SINA-LB: aGEuOTAuZzEucXhnLmxiLnNpbmFub2RlLmNvbQ==
SINA-TS: ZTdjYTk0Y2UgMCAwIDAgOCA4NjIK
Data: xxxxx}
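The two pieces that make a request a POST, the urlencoded body and the data argument, can be examined without touching the network. The URL, credentials and User-Agent below are placeholders invented for this sketch; nothing is sent.

```python
from urllib import parse, request

# urlencode() escapes reserved characters and joins the pairs with "&".
form = parse.urlencode([
    ('username', 'alice@example.com'),  # placeholder credentials
    ('password', 'p@ss word'),
])
print(form)  # username=alice%40example.com&password=p%40ss+word

# Supplying data (as bytes) is what turns the request into a POST.
req = request.Request('https://example.com/login', data=form.encode('utf-8'))
req.add_header('User-Agent', 'demo-client/0.1')

print(req.get_method())              # POST
print(req.get_header('User-agent'))  # demo-client/0.1

# Without data, the same URL would be requested with GET.
print(request.Request('https://example.com/login').get_method())  # GET
```

Request stores header names in capitalized form, which is why the lookup uses 'User-agent'.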
Proxy servers:
# Using a handler
import urllib.request

proxy_handler = urllib.request.ProxyHandler({'http': 'http://127.0.0.1:1080'})
opener = urllib.request.build_opener(proxy_handler)
with opener.open('http://httpbin.org/ip') as f:
    print(f.read())
Returned page content:
b'{\n "origin": "45.32.60.xx"\n}\n'
The IP address is the address of the proxy server.
A small example: using the Yahoo Weather API
URL routes of the Yahoo Weather API; <region> is the English name of the location to look up.
XML version:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22<region>%2C%20china%22)%20and%20u%3D%27c%27%20&format=xml&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
JSON version:
https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22<region>%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys
Format of the data returned by the JSON version:
{
 "query": {
  "count": 1,
  "created": "2017-05-18T11:59:45Z",
  "lang": "zh-CN",
  "results": {
   "channel": {
    "units": { "distance": "mi", "pressure": "in", "speed": "mph", "temperature": "F" },
    "title": "Yahoo! Weather - Xiangtan, Hunan, CN",
    "link": "http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-2142701/",
    "description": "Yahoo! Weather for Xiangtan, Hunan, CN",
    "language": "en-us",
    "lastBuildDate": "Thu, 18 May 2017 07:59 PM CST",
    "ttl": "60",
    "location": { "city": "Xiangtan", "country": "China", "region": " Hunan" },
    "wind": { "chill": "82", "direction": "190", "speed": "11" },
    "atmosphere": { "humidity": "51", "pressure": "1004.0", "rising": "0", "visibility": "16.1" },
    "astronomy": { "sunrise": "5:37 am", "sunset": "7:13 pm" },
    "image": { "title": "Yahoo! Weather", "width": "142", "height": "18", "link": "http://weather.yahoo.com", "url": "http://l.yimg.com/a/i/brand/purplelogo//uh/us/news-wea.gif" },
    "item": {
     "title": "Conditions for Xiangtan, Hunan, CN at 07:00 PM CST",
     "lat": "27.859819",
     "long": "112.892067",
     "link": "http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-2142701/",
     "pubDate": "Thu, 18 May 2017 07:00 PM CST",
     "condition": { "code": "30", "date": "Thu, 18 May 2017 07:00 PM CST", "temp": "82", "text": "Partly Cloudy" },
     "forecast": [
      { "code": "28", "date": "18 May 2017", "day": "Thu", "high": "88", "low": "66", "text": "Mostly Cloudy" },
      { "code": "28", "date": "19 May 2017", "day": "Fri", "high": "82", "low": "69", "text": "Mostly Cloudy" },
      { "code": "30", "date": "20 May 2017", "day": "Sat", "high": "88", "low": "70", "text": "Partly Cloudy" },
      { "code": "4", "date": "21 May 2017", "day": "Sun", "high": "87", "low": "74", "text": "Thunderstorms" },
      { "code": "4", "date": "22 May 2017", "day": "Mon", "high": "89", "low": "76", "text": "Thunderstorms" },
      { "code": "47", "date": "23 May 2017", "day": "Tue", "high": "83", "low": "73", "text": "Scattered Thunderstorms" },
      { "code": "30", "date": "24 May 2017", "day": "Wed", "high": "83", "low": "70", "text": "Partly Cloudy" },
      { "code": "30", "date": "25 May 2017", "day": "Thu", "high": "82", "low": "69", "text": "Partly Cloudy" },
      { "code": "30", "date": "26 May 2017", "day": "Fri", "high": "85", "low": "67", "text": "Partly Cloudy" },
      { "code": "30", "date": "27 May 2017", "day": "Sat", "high": "87", "low": "68", "text": "Partly Cloudy" }
     ],
     "description": "<![CDATA[<img src=\"http://l.yimg.com/a/i/us/we/52/30.gif\"/>\n<BR />\n<b>Current Conditions:</b>\n<BR />Partly Cloudy\n<BR />\n<BR />\n<b>Forecast:</b>\n<BR /> Thu - Mostly Cloudy. High: 88Low: 66\n<BR /> Fri - Mostly Cloudy. High: 82Low: 69\n<BR /> Sat - Partly Cloudy. High: 88Low: 70\n<BR /> Sun - Thunderstorms. High: 87Low: 74\n<BR /> Mon - Thunderstorms. High: 89Low: 76\n<BR />\n<BR />\n<a href=\"http://us.rd.yahoo.com/dailynews/rss/weather/Country__Country/*https://weather.yahoo.com/country/state/city-2142701/\">Full Forecast at Yahoo! Weather</a>\n<BR />\n<BR />\n(provided by <a href=\"http://www.weather.com\" >The Weather Channel</a>)\n<BR />\n]]>",
     "guid": { "isPermaLink": "false" }
    }
   }
  }
 }
}
#!/bin/python3
# coding=utf-8
# Weather report
# yahoo api https://developer.yahoo.com/weather/#python
import requests


class WeatherReport(object):
    def __init__(self, json_data):
        channel = json_data['query']['results']['channel']
        # location
        self.city = channel['location']['city']
        self.country = channel['location']['country']
        self.region = channel['location']['region']
        # units
        # self.unit_distance = channel['units']['distance']
        # self.unit_pressure = channel['units']['pressure']
        # self.unit_speed = channel['units']['speed']
        self.unit_temperature = channel['units']['temperature']  # temperature unit (F)
        # current conditions
        self.curr_temp = channel['item']['condition']['temp']  # current temperature
        self.curr_wea = channel['item']['condition']['text']
        # forecast: date -> [low, high, text]
        self.forcast = {}
        for vs in channel['item']['forecast']:
            self.forcast[vs['date']] = [vs['low'], vs['high'], vs['text']]

    def report(self):
        print('Here is weather of %s, %s, %s' % (self.city, self.region, self.country))
        print('Currect weather is %s, %s %s' % (self.curr_wea, self.curr_temp, self.unit_temperature))
        print('Weather forcast :')
        for da, dat in self.forcast.items():
            print('On %s Low:%s %s High:%s %s Weather:%s' % (da, dat[0], self.unit_temperature, dat[1], self.unit_temperature, dat[2]))


city = 'tianjin'
# city = input('Please input the city name (eg \'xiangtan\') : ')
link = 'https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20weather.forecast%20where%20woeid%20in%20(select%20woeid%20from%20geo.places(1)%20where%20text%3D%22' + city + '%22)&format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys'
s = requests.session()
wp = WeatherReport(s.get(link).json())
wp.report()
requests.session() opens a persistent session and returns a Session object.
s.get(link) fetches the response within the session s; the json() method of the response parses the body as JSON and returns its Python representation (a dict).
Output:
Here is weather of Tianjin, Tianjin, China
Currect weather is Mostly Cloudy, 76 F
Weather forcast :
On 27 May 2017 Low:65 F High:87 F Weather:Mostly Sunny
On 28 May 2017 Low:70 F High:93 F Weather:Mostly Sunny
On 29 May 2017 Low:70 F High:79 F Weather:Cloudy
On 30 May 2017 Low:65 F High:81 F Weather:Mostly Cloudy
On 31 May 2017 Low:64 F High:88 F Weather:Mostly Sunny
On 01 Jun 2017 Low:69 F High:91 F Weather:Mostly Sunny
On 02 Jun 2017 Low:67 F High:84 F Weather:Partly Cloudy
On 03 Jun 2017 Low:66 F High:80 F Weather:Partly Cloudy
On 04 Jun 2017 Low:65 F High:86 F Weather:Partly Cloudy
On 05 Jun 2017 Low:67 F High:86 F Weather:Partly Cloudy
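The same dictionary walk can be tried offline with the standard json module. The sketch below uses a trimmed copy of the sample response shown earlier (only a few fields and two forecast entries are kept), and the helper f_to_c is a hypothetical addition that converts the Fahrenheit strings to Celsius.

```python
import json

# Trimmed sample of the JSON response shown earlier (most fields omitted).
sample = '''{"query": {"results": {"channel": {
  "units": {"temperature": "F"},
  "location": {"city": "Xiangtan", "country": "China", "region": " Hunan"},
  "item": {"condition": {"temp": "82", "text": "Partly Cloudy"},
           "forecast": [
             {"date": "18 May 2017", "high": "88", "low": "66", "text": "Mostly Cloudy"},
             {"date": "19 May 2017", "high": "82", "low": "69", "text": "Mostly Cloudy"}]}}}}}'''


def f_to_c(fahrenheit):
    """Convert a Fahrenheit value (given as a string) to Celsius."""
    return round((float(fahrenheit) - 32) * 5 / 9, 1)


channel = json.loads(sample)['query']['results']['channel']
current = f_to_c(channel['item']['condition']['temp'])
print(channel['location']['city'], current)  # Xiangtan 27.8

forecast = {day['date']: (f_to_c(day['low']), f_to_c(day['high']))
            for day in channel['item']['forecast']}
print(forecast['18 May 2017'])  # (18.9, 31.1)
```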
Third-party module: pillow
pillow contains many modules: Image (image processing), ImageFilter (filters), ImageDraw (drawing methods), ImageFont (image fonts) and more. For further details see the official documentation.
# coding=utf-8
# Using the pillow library
from PIL import Image
from PIL import ImageFilter  # filters

im = Image.open('bing42.jpg')
w, h = im.size
print('width : %s, height : %s' % (w, h))
# Create a thumbnail
im.thumbnail((w // 2, h // 2))  # width and height floor-divided by 2
im.save('thumbnail1.png', 'png')
# Blur effect, using a filter
im = Image.open('thumbnail1.png')
im2 = im.filter(ImageFilter.BLUR)
im2.save('thumbnail_blur.png')
Generating a CAPTCHA:
# coding=utf-8
# pillow ref https://pillow.readthedocs.io/en/4.1.x/
# Generate a CAPTCHA
from PIL import Image, ImageDraw, ImageFont, ImageFilter
import random


# Random letter
def rndChar():
    return chr(random.randint(65, 90))  # A to Z; chr converts a Unicode code point (ASCII included) to a character


# Random color 1: used for the background
def rndColor():
    return (random.randint(64, 255), random.randint(64, 255), random.randint(64, 255))


# Random color 2: used mainly for the CAPTCHA text
def rndColor2():
    return (random.randint(32, 127), random.randint(32, 127), random.randint(32, 127))


# 240 x 60:
width = 60 * 4
height = 60
# The background color is white. If color is omitted the image is black; if it is None the image is not initialised.
# ref: https://pillow.readthedocs.io/en/4.1.x/reference/Image.html
image = Image.new('RGB', (width, height), (255, 255, 255))
# Create the Font object:
font = ImageFont.truetype('CALIFB.ttf', 36)  # the font file only needs to be in the current directory; font size 36
# Create the Draw object:
draw = ImageDraw.Draw(image)
# Fill every pixel:
for x in range(width):
    for y in range(height):
        draw.point((x, y), fill=rndColor())  # fill is the fill color
# Draw the text:
for t in range(4):
    # arguments: top-left coordinate (x, y), text, font object, fill color
    draw.text((60 * t + 10, 10), rndChar(), font=font, fill=rndColor2())
# Blur:
image = image.filter(ImageFilter.BLUR)
image.save('code.jpg', 'jpeg')
(Figure: result of the thumbnail processing.)
(Figure: the generated CAPTCHA.)
The inspect module:
The signature(funcname) function of inspect returns the signature of the function funcname. The returned object is of type inspect.Signature:
The inspect.Signature.parameters attribute is a mappingproxy object wrapping an OrderedDict (ordered by the order in which the function's parameters are defined).
The keys of the OrderedDict are the parameter names.
The values of the OrderedDict are inspect.Parameter objects:
The inspect.Parameter object:
The kind attribute: a value of the _ParameterKind enumeration:
VAR_KEYWORD: variable-length keyword arguments, e.g. **kw
KEYWORD_ONLY: keyword-only parameter (must appear after * or a variable-length positional parameter)
VAR_POSITIONAL: variable-length positional arguments, e.g. *p
POSITIONAL_ONLY: positional-only parameter
POSITIONAL_OR_KEYWORD: parameter that can be passed by position or by keyword
The default attribute (whether there is a default value): if the parameter has a default value, that value is returned; otherwise inspect.Parameter.empty (displayed as inspect._empty) is returned.
More attributes: https://docs.python.org/3/library/inspect.html
Test example:
# coding=utf-8
import inspect


def a(a, b=0, *c, d, e=1, **f):
    pass


aa = inspect.signature(a)
print('aa type is %s, value is %s' % (type(aa), aa))
ab = aa.parameters
print('ab type is %s, value is %s' % (type(ab), ab))
for k, v in ab.items():
    print('parameter : %s' % k)
    print('parameter info type: %s, kind is %s, default is %s' % (type(v), v.kind, v.default))
"""Output
aa type is <class 'inspect.Signature'>, value is (a, b=0, *c, d, e=1, **f)
ab type is <class 'mappingproxy'>, value is OrderedDict([('a', <Parameter "a">), ('b', <Parameter "b=0">), ('c', <Parameter "*c">), ('d', <Parameter "d">), ('e', <Parameter "e=1">), ('f', <Parameter "**f">)])
parameter : a
parameter info type: <class 'inspect.Parameter'>, kind is POSITIONAL_OR_KEYWORD, default is <class 'inspect._empty'>
parameter : b
parameter info type: <class 'inspect.Parameter'>, kind is POSITIONAL_OR_KEYWORD, default is 0
parameter : c
parameter info type: <class 'inspect.Parameter'>, kind is VAR_POSITIONAL, default is <class 'inspect._empty'>
parameter : d
parameter info type: <class 'inspect.Parameter'>, kind is KEYWORD_ONLY, default is <class 'inspect._empty'>
parameter : e
parameter info type: <class 'inspect.Parameter'>, kind is KEYWORD_ONLY, default is 1
parameter : f
"""
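One practical use of kind and default is finding out which arguments a caller must supply, and Signature.bind() shows how concrete arguments map onto the parameters. The function name func is a stand-in with the same parameter list as the test example; this is a sketch, not part of the original notes.

```python
import inspect


def func(a, b=0, *c, d, e=1, **f):
    pass


sig = inspect.signature(func)

# Parameters without a default that are neither *args nor **kwargs are required.
required = [
    name for name, p in sig.parameters.items()
    if p.default is inspect.Parameter.empty
    and p.kind in (p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY)
]
print(required)  # ['a', 'd']

# bind() maps call arguments to parameter names without calling the function.
bound = sig.bind(1, 2, 3, 4, d=5, x=6)
arguments = dict(bound.arguments)
print(arguments)  # {'a': 1, 'b': 2, 'c': (3, 4), 'd': 5, 'f': {'x': 6}}
```

bind() raises TypeError when a required argument is missing, which makes it usable for validating a call before it is made.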
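Interlude
For some reason, importing hashlib causes no problem in the VS Code debugger, but it fails when the script is run normally. On the other hand, the VS Code debugger does not seem to accept data typed in through Python's input function.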
References
- Many parts draw on Liao Xuefeng's blog, with thanks: http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014320027235877860c87af5544f25a8deeb55141d60c5000
- HTTP protocol explained: http://www.cnblogs.com/li0803/archive/2008/11/03/1324746.html and http://www.ruanyifeng.com/blog/2016/08/http.html
- pillow module reference: https://pillow.readthedocs.io/en/4.1.x/reference/Image.html
- The Python inspect module: http://blog.csdn.net/weixin_35955795/article/details/53053762 and the official documentation: https://docs.python.org/3/library/inspect.html
- The context management module contextlib: http://blog.jobbole.com/64175/
- Using requests: http://docs.python-requests.org/en/master/user/advanced/ and http://www.jianshu.com/p/cba83709c64c
- HTTP persistent connection (Wikipedia): https://en.wikipedia.org/wiki/HTTP_persistent_connection
- Introduction to the Yahoo Weather API: https://developer.yahoo.com/weather/#python