Big code cleanup: remove no-longer-used code; fetch domestic IP ranges from the Loyalsoldier/geoip project; support IPv6; support rule lookup for plain-IP hosts; use a prefix tree for IP lookup; shrink the generated pac file as much as possible; optimize rule evaluation; polish the README.

This commit is contained in:
zhiyi
2024-10-03 01:06:25 +08:00
parent f926c61f1f
commit dafd79edda
9 changed files with 18757 additions and 91240 deletions


@@ -13,10 +13,10 @@ jobs:
uses: actions/checkout@v4
- name: Download delegated-apnic-latest
run: curl https://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest -o delegated-apnic-latest.txt
run: curl https://raw.githubusercontent.com/Loyalsoldier/geoip/refs/heads/release/text/cn.txt -o cidrs-cn.txt
- name: Run gfw-pac.py script
run: ./gfw-pac.py -f gfw.pac -p "PROXY 127.0.0.1:3128" --proxy-domains=proxy-domains.txt --direct-domains=direct-domains.txt --localtld-domains=local-tlds.txt --ip-file=delegated-apnic-latest.txt
run: ./gfw-pac.py -f gfw.pac -p "PROXY 127.0.0.1:3128" --proxy-domains=proxy-domains.txt --direct-domains=direct-domains.txt --localtld-domains=local-tlds.txt --ip-file=cidrs-cn.txt
- name: Commit gfw.pac
run: |


@@ -1,20 +1,23 @@
# gfw-pac
A PAC file for circumventing the GFW, plus its generator. Generates a PAC (Proxy auto-config) file from custom domain lists and CN IP ranges. Domains on the custom lists, and domains whose resolved IP is not a CN IP, go through the proxy.
A PAC file for circumventing the GFW, plus its generator. Generates a PAC (Proxy auto-config) file from custom domain lists and CN IP ranges. Domains on the custom lists, and domains whose resolved IP is not a CN IP, go through the proxy. IPv6 is supported.
**This repository automatically fetches the domestic IPv4 ranges from APNIC via GitHub Actions every 14 days and updates the gfw.pac file**
**Every 14 days this repository automatically fetches the domestic IP ranges from `Loyalsoldier/geoip` via GitHub Actions and updates the `gfw.pac` file**
## Proxy tools generally support routing rules, so why still use a PAC file?
If all browser traffic enters the proxy program, then even traffic that matches a direct-connect rule is still relayed through the proxy program, which hurts performance. If the browser first decides via the PAC file whether to use the proxy or connect directly, direct traffic never passes through the proxy program and performance is better. Almost every popular proxy frontend ships a built-in PAC file: when you select the frontend's "PAC mode", it points the browser at the PAC file it generates automatically.
## Features
* Fast: domains are matched first, saving resolution time for common domains
* IP rules built in: if a domain resolves to a domestic IPv4 address, direct is returned and traffic never passes through the proxy program
* Customizable domains that use the proxy
* Customizable direct-connect domains
* Customizable direct-connect TLDs, e.g. .test
* The ready-to-use `gfw.pac` includes common direct and proxied domains
* Works out of the box: the ready-to-use `gfw.pac` includes common direct and proxied domains, plus domestic IPv4/IPv6 ranges.
* IP rules built in: if a domain resolves to a domestic IP address, direct is returned and traffic never passes through the proxy program.
* Fast: domains are matched first, saving resolution time for common domains. IP-range matching uses a radix tree, so each lookup is bounded by the address length (effectively constant time).
* IPv6 support: IPv6 ranges are handled correctly.
* Plain IP addresses are handled correctly, so apps that use HTTP DNS work normally.
* Supports iOS/macOS/Windows/Android/Chrome/Edge/Firefox. The generated PAC file is small and uses only ES5, so most systems can run it.
* Customizable domains that use the proxy (requires Python to regenerate)
* Customizable direct-connect domains (requires Python to regenerate)
* Customizable direct-connect TLDs, e.g. .test (requires Python to regenerate)
## Usage
@@ -27,7 +30,7 @@
[--proxy-domains file of custom domains that use the proxy]
[--direct-domains file of custom direct-connect domains]
[--localtld-domains local TLD file]
[--ip-file the delegated file downloaded from APNIC]
[--ip-file the text/cn.txt file downloaded from Loyalsoldier/geoip/blob/release]
Options:
@@ -37,7 +40,7 @@
--proxy-domains file of domains that should use the proxy, one domain per line
--direct-domains file of direct-connect domains, one domain per line
--localtld-domains file of direct-connect top-level domains, one per line, with a leading dot (e.g. .test)
--ip-file local IP allocation file downloaded from APNIC; if omitted, it is downloaded from APNIC automatically
--ip-file the text/cn.txt file downloaded from the Loyalsoldier/geoip release
Example:
@@ -46,10 +49,9 @@
--proxy-domains=proxy-domains.txt \
--direct-domains=direct-domains.txt \
--localtld-domains=local-tlds.txt \
--ip-file=delegated-apnic-latest.txt
--ip-file=cidrs-cn.txt
## Tips
* If automatic download of the APNIC IP allocation file is slow, download <https://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest> through a proxy yourself, then point `--ip-file` at the downloaded file.
* Solve DNS poisoning on your own.
* Your proxy tool should also be configured with GEOIP routing rules.
* Your proxy tool should also be configured with GEOIP/GEOSITE routing rules.
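The radix-tree (binary-trie) lookup that the generated PAC file performs can be sketched in Python. This is only an illustration of the idea under the same semantics as the template (a stored prefix matches once its bit path is exhausted); the names `insert_cidr` and `lookup` are invented here and are not part of the project:

```python
import ipaddress

# Minimal binary-trie sketch of the PAC file's CIDR matching (illustrative only).
def insert_cidr(trie, cidr):
    net = ipaddress.ip_network(cidr)
    # Key each network by the first prefixlen bits of its network address.
    bits = format(int(net.network_address), '0%db' % net.max_prefixlen)
    node = trie
    for bit in bits[:net.prefixlen]:
        node = node.setdefault(bit, {})

def lookup(trie, ip):
    addr = ipaddress.ip_address(ip)
    bits = format(int(addr), '0%db' % addr.max_prefixlen)
    node = trie
    for bit in bits:
        if bit not in node:
            return False
        node = node[bit]
        if not node:  # reached the end of a stored prefix: match
            return True
    return False

trie = {}
insert_cidr(trie, '1.0.1.0/24')
insert_cidr(trie, '2409:8000::/20')
```

Each lookup walks at most 32 (IPv4) or 128 (IPv6) nodes regardless of how many CIDRs are stored, which is why matching cost does not grow with the size of the IP list.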

cidrs-cn.txt (new file, 18417 lines)

File diff suppressed because it is too large


File diff suppressed because it is too large


@@ -1,10 +1,16 @@
gov.cn
115.com
123pan.com
123957.com
baidu.com
baidupcs.com
baidustatic.com
bdimg.com
bdstatic.com
cdn.bcebos.com
cdnnode.cn
qq.com
weixinbridge.com
gtimg.com
gtimg.cn
qstatic.com
@@ -37,11 +43,13 @@ alicdn.com
alibabausercontent.com
alipay.com
alipayobjects.com
aliyundrive.com
dingtalk.com
mmstat.com
tmall.com
jd.com
360buyimg.com
300hu.com
126.com
163.com
189.cn
@@ -55,8 +63,14 @@ amemv.com
ecombdapi.com
baike.com
byteimg.com
douyin.com
douyinpic.com
douyinstatic.com
douyinvod.com
supercachenode.com
bytedance.com
bytescm.com
bytecdn.cn
cmbchina.com
mi.com
xiaomi.com
@@ -64,7 +78,12 @@ amap.com
autonavi.com
meituan.com
meituan.net
sogou.com
dianping.com
o.pki.goog
www.googletagmanager.com
www.google-analytics.com
pagead2.googlesyndication.com
adservice.google.com
fonts.googleapis.com
fonts.gstatic.com
@@ -120,8 +139,13 @@ weatherkit.apple.com
adcdownload.apple.com
alpdownloadit.cdn-apple.com
bricks.cdn-apple.com
pancake.apple.com
storage.live.com
blob.core.windows.net
self.events.data.microsoft.com
mobile.events.data.microsoft.com
browser.events.data.microsoft.com
ocsp.globalsign.com
ocsp2.globalsign.com
ocsp.digicert.cn
ocsp.dcocsp.cn


@@ -1,20 +1,12 @@
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
import math
import socket
import struct
import pkgutil
import urllib.parse
import json
import logging
import urllib.request, urllib.error, urllib.parse
from argparse import ArgumentParser
import base64
gfwlist_url = 'https://raw.githubusercontent.com/gfwlist/gfwlist/master/gfwlist.txt'
import ipaddress
import json
def parse_args():
parser = ArgumentParser()
@@ -31,222 +23,70 @@ def parse_args():
parser.add_argument('--localtld-domains', dest='localtld_rule',
help='local TLD rule file, not proxied, one per line, starting with a dot')
parser.add_argument('--ip-file', dest='ip_file',
help='delegated-apnic-latest from apnic.net')
help='China IP CIDR file')
return parser.parse_args()
#from https://github.com/Leask/Flora_Pac
def ip2long(ip):
packedIP = socket.inet_aton(ip)
return struct.unpack("!L", packedIP)[0]
def convert_cidr(cidr):
if '/' in cidr:
network = ipaddress.ip_network(cidr.strip(), strict=False)
network_address = network.network_address
prefixlen = network.prefixlen
else:
network = ipaddress.ip_address(cidr.strip())
network_address = network
prefixlen = network.max_prefixlen
if network.version == 4:
return format(int(network_address), '08x') + '/' + str(prefixlen)  # zero-pad to 8 hex digits so the PAC can decode byte pairs
else:
return network.compressed
#from https://github.com/Leask/Flora_Pac
def fetch_ip_data():
def generate_cnip_cidrs():
""" 从文件中读取CIDR地址 """
args = parse_args()
if (args.ip_file):
with open(args.ip_file, 'r') as f:
data = f.read()
else:
#fetch data from apnic
print("Fetching data from apnic.net, it might take a few minutes, please wait...")
url=r'https://ftp.apnic.net/apnic/stats/apnic/delegated-apnic-latest'
# url=r'http://flora/delegated-apnic-latest' #debug
data=urllib.request.urlopen(url).read().decode('utf-8')
with open(args.ip_file, 'r') as file:
cidrs = file.read().splitlines()
converted_cidrs = []
for cidr in cidrs:
converted_cidrs.append(convert_cidr(cidr))
cnregex=re.compile(r'apnic\|cn\|ipv4\|[0-9\.]+\|[0-9]+\|[0-9]+\|a.*',re.IGNORECASE)
cndata=cnregex.findall(data)
cidr_list = ','.join(converted_cidrs)
return f"'{cidr_list}'.split(',')"
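The per-entry encoding produced by `convert_cidr` can be demonstrated with the standard `ipaddress` module (this sketch assumes the zero-padded IPv4 variant, where each IPv4 prefix becomes 8 hex digits plus a prefix length, and IPv6 keeps its compressed `addr/prefix` form):

```python
import ipaddress

# Sketch of the compact per-entry CIDR encoding (zero-padded variant assumed).
def encode(cidr):
    net = ipaddress.ip_network(cidr.strip(), strict=False)
    if net.version == 4:
        # 8 hex digits + prefix length, e.g. '01000100/24' for 1.0.1.0/24
        return format(int(net.network_address), '08x') + '/' + str(net.prefixlen)
    # compressed IPv6 network string already includes '/prefix'
    return net.compressed

print(encode('1.0.1.0/24'))
print(encode('2409:8000::/20'))
```

The hex form halves the size of IPv4 entries compared to dotted-quad text, which is part of how the generated file stays small.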
results=[]
prev_net=''
for item in cndata:
unit_items=item.split('|')
starting_ip=unit_items[3]
num_ip=int(unit_items[4])
imask=0xffffffff^(num_ip-1)
#convert to string
imask=hex(imask)[2:]
mask=[0]*4
mask[0]=imask[0:2]
mask[1]=imask[2:4]
mask[2]='0' #imask[4:6]
mask[3]='0' #imask[6:8]
#convert str to int
mask=[ int(i,16 ) for i in mask]
mask="%d.%d.%d.%d"%tuple(mask)
#mask in *nix format
mask2=32-int(math.log(num_ip,2))
ip=starting_ip.split('.')
ip[2] = '0'
ip[3] = '0'
starting_ip = '.'.join(ip)
if starting_ip != prev_net:
results.append((ip2long(starting_ip), ip2long(mask), mask2))
prev_net = starting_ip
results.insert(0, (ip2long('127.0.0.1'), ip2long('255.0.0.0'), 0))
results.insert(1, (ip2long('10.0.0.0'), ip2long('255.0.0.0'), 0))
results.insert(2, (ip2long('172.16.0.0'), ip2long('255.240.0.0'), 0))
results.insert(3, (ip2long('192.168.0.0'), ip2long('255.255.0.0'), 0))
def ip(item):
return item[0]
results = sorted(results, key = ip)
return results
def decode_gfwlist(content):
# decode base64 if have to
try:
if '.' in content:
raise Exception()
return base64.b64decode(content).decode('utf-8')
except:
return content
def get_hostname(something):
try:
# quite enough for GFW
if not something.startswith('http:'):
something = 'http://' + something
r = urllib.parse.urlparse(something)
return r.hostname
except Exception as e:
logging.error(e)
return None
def add_domain_to_set(s, something):
hostname = get_hostname(something)
if hostname is not None:
s.add(hostname)
def combine_lists(content, user_rule=None):
gfwlist = content.splitlines(False)
if user_rule:
gfwlist.extend(user_rule.splitlines(False))
return gfwlist
def parse_gfwlist(gfwlist):
domains = set()
for line in gfwlist:
if line.find('.*') >= 0:
continue
elif line.find('*') >= 0:
line = line.replace('*', '/')
if line.startswith('||'):
line = line.lstrip('||')
elif line.startswith('|'):
line = line.lstrip('|')
elif line.startswith('.'):
line = line.lstrip('.')
if line.startswith('!'):
continue
elif line.startswith('['):
continue
elif line.startswith('@'):
# ignore white list
continue
add_domain_to_set(domains, line)
return domains
def reduce_domains(domains):
# reduce 'www.google.com' to 'google.com'
# remove invalid domains
with open('./tld.txt', 'r') as f:
tld_content = f.read()
tlds = set(tld_content.splitlines(False))
new_domains = set()
for domain in domains:
domain_parts = domain.split('.')
last_root_domain = None
for i in range(0, len(domain_parts)):
root_domain = '.'.join(domain_parts[len(domain_parts) - i - 1:])
if i == 0:
if not tlds.__contains__(root_domain):
# root_domain is not a valid tld
break
last_root_domain = root_domain
if tlds.__contains__(root_domain):
continue
else:
break
if last_root_domain is not None:
new_domains.add(last_root_domain)
uni_domains = set()
for domain in new_domains:
domain_parts = domain.split('.')
for i in range(0, len(domain_parts)-1):
root_domain = '.'.join(domain_parts[len(domain_parts) - i - 1:])
if domains.__contains__(root_domain):
break
else:
uni_domains.add(domain)
return uni_domains
def generate_pac_fast(domains, proxy, direct_domains, cnips, local_tlds):
def generate_pac_fast(domains, proxy, direct_domains, cidrs, local_tlds):
# render the pac file
with open('./pac-template', 'r') as f:
proxy_content = f.read()
domains_dict = {}
domains_list = []
for domain in domains:
domains_dict[domain] = 1
domains_list.append(domain)
proxy_content = proxy_content.replace('__PROXY__', json.dumps(str(proxy)))
proxy_content = proxy_content.replace(
'__DOMAINS__',
json.dumps(domains_dict, indent=2, sort_keys=True)
json.dumps(domains_list, sort_keys=True, separators=(',', ':'))
)
direct_domains_dict = {}
direct_domains_list = []
for domain in direct_domains:
direct_domains_dict[domain] = 1
direct_domains_list.append(domain)
proxy_content = proxy_content.replace(
'__DIRECT_DOMAINS__',
json.dumps(direct_domains_dict, indent=2, sort_keys=True)
json.dumps(direct_domains_list, sort_keys=True, separators=(',', ':'))
)
proxy_content = proxy_content.replace(
'__CN_IPS__',
json.dumps(cnips, indent=2, sort_keys=False)
'__CIDRS__', cidrs
)
tlds_dict = {}
tlds_list = []
for domain in local_tlds:
tlds_dict[domain] = 1
tlds_list.append(domain)
proxy_content = proxy_content.replace(
'__LOCAL_TLDS__',
json.dumps(tlds_dict, indent=2, sort_keys=True)
json.dumps(tlds_list, sort_keys=True, separators=(',', ':'))
)
return proxy_content
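The size win in `generate_pac_fast` comes from swapping pretty-printed `{domain: 1}` objects for minified sorted arrays; a small standalone comparison (sample domains are arbitrary):

```python
import json

domains = ['qq.com', 'baidu.com']
# old template output: a pretty-printed {domain: 1} object
old = json.dumps({d: 1 for d in domains}, indent=2, sort_keys=True)
# new template output: a minified sorted array, no whitespace
new = json.dumps(sorted(domains), sort_keys=True, separators=(',', ':'))
print(new)
```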
def generate_pac_precise(rules, proxy):
def grep_rule(rule):
if rule:
if rule.startswith('!'):
return None
if rule.startswith('['):
return None
return rule
return None
# render the pac file
proxy_content = pkgutil.get_data('gfwlist2pac', './abp.js')
rules = list(filter(grep_rule, rules))
proxy_content = proxy_content.replace('__PROXY__', json.dumps(str(proxy)))
proxy_content = proxy_content.replace('__RULES__',
json.dumps(rules, indent=2))
return proxy_content
def main():
args = parse_args()
user_rule = None
@@ -292,10 +132,10 @@ def main():
else:
localtld_rule = []
cnips = fetch_ip_data()
cidrs = generate_cnip_cidrs()
# domains = reduce_domains(domains)
pac_content = generate_pac_fast(user_rule, args.proxy, direct_rule, cnips, localtld_rule)
pac_content = generate_pac_fast(user_rule, args.proxy, direct_rule, cidrs, localtld_rule)
with open(args.output, 'w') as f:
f.write(pac_content)

gfw.pac (11229 lines)

File diff suppressed because one or more lines are too long


@@ -8,78 +8,81 @@ var domainsUsingProxy = __DOMAINS__;
var localTlds = __LOCAL_TLDS__;
var cnips = __CN_IPS__;
var cidrs = __CIDRS__;
var hasOwnProperty = Object.hasOwnProperty;
var ipRegExp = new RegExp(/^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/);
function isIPv6(ip) {
// Split the IP address into groups of hexadecimal digits
const groups = ip.split(':');
// An IPv6 address must have at least one group and at most 8 groups
if (groups.length < 1 || groups.length > 8) {
return false;
}
// Check that each group is a valid hexadecimal number
for (const group of groups) {
// Check that the group is not null, undefined, or an empty string before calling parseInt()
if (group === null || group === undefined || group === '') {
continue;
}
// Use parseInt() to check if the group is a valid hexadecimal number
const value = parseInt(group, 16);
if (isNaN(value) || value < 0 || value > 0xFFFF) {
return false;
}
}
// If the address contains a double colon, ensure that it appears only once
if (ip.includes('::')) {
if (ip.indexOf('::') !== ip.lastIndexOf('::')) {
return false;
}
}
// The address is valid if it passes all the checks
return true;
function isIpAddress(ip) {
return /^\d{1,3}(\.\d{1,3}){3}$/.test(ip) || /^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$/.test(ip);
}
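`isIpAddress` deliberately uses loose regexes: a quick shape check rather than full validation. The same two patterns in Python show the trade-off; note they accept out-of-range octets like `999.1.1.1`, which `dnsResolve` would never produce (the helper name `is_ip_address` is ours):

```python
import re

# The same two patterns the template's isIpAddress() uses (shape check only)
IPV4 = re.compile(r'^\d{1,3}(\.\d{1,3}){3}$')
IPV6 = re.compile(r'^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$')

def is_ip_address(host):
    return bool(IPV4.match(host) or IPV6.match(host))
```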
function convertAddress(ipchars) {
var bytes = ipchars.split('.');
var result = ((bytes[0] & 0xff) << 24) |
((bytes[1] & 0xff) << 16) |
((bytes[2] & 0xff) << 8) |
(bytes[3] & 0xff);
return result;
function RadixTree() {
this.root = {};
}
function match(ip) {
var left = 0, right = cnips.length;
do {
var mid = Math.floor((left + right) / 2),
ipf = (ip & cnips[mid][1]) >>> 0,
m = (cnips[mid][0] & cnips[mid][1]) >>> 0;
if (ipf == m) {
return true;
} else if (ipf > m) {
left = mid + 1;
RadixTree.prototype.insert = function(string) {
var node = this.root;
for (var i = 0; i < string.length; i++) {
var char = string[i];
if (!node[char]) {
node[char] = {};
}
node = node[char];
}
};
RadixTree.prototype.to_list = function() {
return this.root;
};
function ipToBinary(ip) {
// Check if it's IPv4
if (/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
return ip.split('.').map(function(num) {
return ("00000000" + parseInt(num, 10).toString(2)).slice(-8);
}).join('');
} else if (/^([0-9a-fA-F]{0,4}:){1,7}[0-9a-fA-F]{0,4}$/.test(ip)) {
// Expand the IPv6 address if it contains '::'
var parts = ip.split('::');
var left = parts[0] ? parts[0].split(':') : [];
var right = parts[1] ? parts[1].split(':') : [];
// Calculate the number of zero groups to insert
var zeroGroups = 8 - (left.length + right.length);
// Create the full address by inserting zero groups
var fullAddress = left.concat(Array(zeroGroups + 1).join('0').split('')).concat(right);
// Convert each group to binary and pad to 16 bits
return fullAddress.map(function(group) {
return ("0000000000000000" + parseInt(group || '0', 16).toString(2)).slice(-16);
}).join('');
}
}
function searchRadixTree(bits) {
var currentNode = radixTree;
var string = '';
var isLastNode = false;
for (var i=0; i<bits.length; i++) {
var char = bits[i];
string += char;
if (currentNode[char]) {
currentNode = currentNode[char];
isLastNode = Object.keys(currentNode).every(function(key) {
return !currentNode[key]
})
} else {
right = mid;
break;
}
} while (left + 1 <= right)
return false;
}
return isLastNode
}
function isInDirectDomain(host) {
if (hasOwnProperty.call(directDomains, host)) {
return true;
}
for (var domain in directDomains) {
if (host.endsWith('.' + domain)) {
for (var i = 0; i < directDomains.length; i++) {
var domain = directDomains[i];
if (host === domain || host.slice(-(domain.length + 1)) === '.' + domain) { // ES5-safe suffix check
return true;
}
}
@@ -87,11 +90,9 @@ function isInDirectDomain(host) {
}
function isInProxyDomain(host) {
if (hasOwnProperty.call(domainsUsingProxy, host)) {
return true;
}
for (var domain in domainsUsingProxy) {
if (host.endsWith('.' + domain)) {
for (var i = 0; i < domainsUsingProxy.length; i++) {
var domain = domainsUsingProxy[i];
if (host === domain || host.slice(-(domain.length + 1)) === '.' + domain) { // ES5-safe suffix check
return true;
}
}
@@ -104,7 +105,9 @@ function isLocalTestDomain(domain) {
if (tld === domain) {
return false;
}
return Object.hasOwnProperty.call(localTlds, tld);
return localTlds.some(function(localTld) {
return tld === localTld;
});
}
/* https://github.com/frenchbread/private-ip */
@@ -121,37 +124,65 @@ function isPrivateIp(ip) {
}
function FindProxyForURL(url, host) {
if (isPlainHostName(host)
|| isPrivateIp(host)
|| isLocalTestDomain(host)
|| host === 'localhost') {
return direct;
}
if (shExpMatch(url, "http:*")) {
return direct;
}
if (!ipRegExp.test(host)) {
if (isInDirectDomain(host)) {
return direct
}
if (isInProxyDomain(host)) {
debug('matched direct-connect domain', host, 'N/A');
return direct;
} else if (isInProxyDomain(host)) {
debug('matched proxy domain', host, 'N/A');
return proxy;
}
strIp = dnsResolve(host);
} else {
strIp = host
}
if (!strIp) {
return proxy;
}
intIp = convertAddress(strIp);
if (match(intIp)) {
} else if (isPlainHostName(host) || host === 'localhost' || isLocalTestDomain(host)) {
debug('matched plain hostname or local TLD', host, 'N/A');
return direct;
} else if (isPrivateIp(host)) {
debug('matched private IP address', host, 'N/A');
return direct;
}
var ip = isIpAddress(host) ? host : dnsResolve(host);
if (!ip) {
debug('failed to resolve IP address', host, 'N/A');
return proxy;
} else if (isPrivateIp(ip)) {
debug('resolved to a private IP address', host, ip);
return direct;
} else if (searchRadixTree(ipToBinary(ip))) {
debug('matched direct-connect IP range', host, ip);
return direct;
}
debug('no rule matched', host, ip);
return proxy;
}
var allowAlert = true
function debug(msg, host, ip) {
if (!allowAlert) {
return
}
try {
alert('[' + host + ' -> ' + ip + '] ' + msg);
} catch (e) {
allowAlert = false
}
}
var radixTree = new RadixTree();
(function () {
var startTime = new Date().getTime();
debug('building radix tree', 'PAC file load started', startTime.toString());
for (var i = 0; i < cidrs.length; i++) {
var cidr = cidrs[i];
var parts = cidr.split('/');
var ip = parts[0], prefixLen = parts[1];
if (cidr.indexOf(':') === -1) {
// IPv4 prefixes are stored as hex digits; zero-pad to 8 before decoding to dotted-quad
ip = ('00000000' + ip).slice(-8).match(/.{2}/g).map(function(byte) {
return parseInt(byte, 16);
}).join('.');
}
var bits = ipToBinary(ip).slice(0, prefixLen);
radixTree.insert(bits);
}
radixTree = radixTree.to_list();
debug('radix tree built', 'PAC file loaded', cidrs.length.toString() + ' CIDR entries');
})();
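The template's `ipToBinary()` (address to fixed-width bit string) can be mirrored and sanity-checked in Python with the standard `ipaddress` module; the helper name `ip_to_binary` is ours:

```python
import ipaddress

# Python mirror of the template's ipToBinary(): fixed-width bit string per address
def ip_to_binary(ip):
    addr = ipaddress.ip_address(ip)
    # 32 bits for IPv4, 128 bits for IPv6
    return format(int(addr), '0%db' % addr.max_prefixlen)

print(ip_to_binary('1.0.1.0')[:24])  # the first 24 bits key a /24 entry
```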


@@ -20,6 +20,7 @@ wikipedia.org
godaddy.com
cloudflare.com
twitter.com
twimg.com
docker.com
facebook.com
facebook.net
@@ -27,3 +28,4 @@ fbcdn.net
segment.io
unpkg.com
jsdelivr.com
tv.apple.com