26 releases (5 breaking)
Version | Released |
---|---|
0.6.3 | Aug 5, 2024 |
0.6.2 | Jul 27, 2024 |
0.5.5 | Jun 23, 2024 |
0.5.4 | Mar 23, 2024 |
#104 in Network programming
531 downloads per month
66KB
1K SLoC
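Summary
Inspired by HellPot, pandoras_pot is an HTTP honeypot designed to bring even more misery to unruly web crawlers that do not respect your robots.txt.
The goal of pandoras_pot is to send as much data as possible to unsolicited incoming connections without using up the resources of your web server, which could probably be doing better things with its time.
To keep bots from detecting pandoras_pot, it generates random data that looks like a website (to a bot), really fast. Like crazy fast. Like a whirlwind, you could say. Hopefully.
pandoras_pot supports several generation modes, depending on its configuration. For example, it can emit random strings as data, or use Markov chains to generate "real" sentences. Neat!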
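To make the Markov chain mode a little more concrete, here is a minimal word-level sketch in Python. It is not the generator pandoras_pot ships (that one is written in Rust and its internals are not described here); the names build_chain and generate, the order-1 word model, and the tiny corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the source text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, length, seed=None):
    """Walk the chain for `length` words, restarting at a random word on dead ends."""
    rng = random.Random(seed)
    starts = sorted(chain)  # sorted so a fixed seed gives a repeatable walk
    word = rng.choice(starts)
    output = [word]
    while len(output) < length:
        followers = chain.get(word)
        # The last word of the source may have no follower; pick a fresh start.
        word = rng.choice(followers) if followers else rng.choice(starts)
        output.append(word)
    return " ".join(output)


# Hypothetical corpus; a real setup would read a larger text file instead.
corpus = "the bot reads the page and the page feeds the bot more pages"
chain = build_chain(corpus)
sentence = generate(chain, length=12, seed=1)
print(sentence)
```

Because repeated followers stay in the list, common word pairs are picked more often, which is what makes the output look vaguely like the source text.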
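Features
- Blazingly fast
- Written in Rust
- TOML configuration format, see the example below (but the defaults need no configuration!)
- Optional health port for reverse proxy health checks
- Multiple generation modes, and it is easy to add more! Send pure random data, text generated with Markov chains, or a static file!
- Configurable abuse protection (maximum number of concurrent generating connections, plus time and size limits)
- Did I mention it is written in Rust?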
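The abuse protection in the list above can be pictured roughly like this. It is only a sketch in Python under stated assumptions: the class name LimitedGenerator is hypothetical, and refusing a connection outright when all slots are busy is a guess, not a description of what pandoras_pot actually does.

```python
import os
import threading
import time


class LimitedGenerator:
    """Hypothetical wrapper that caps concurrency, duration, and bytes sent."""

    def __init__(self, max_concurrent, time_limit, size_limit, chunk_size=1024):
        # Following the convention in this README's config, 0 means "no limit".
        self._slots = threading.Semaphore(max_concurrent) if max_concurrent else None
        self.time_limit = time_limit
        self.size_limit = size_limit
        self.chunk_size = chunk_size

    def stream(self):
        """Yield random chunks until a limit is hit; yield nothing if no slot is free."""
        if self._slots is not None and not self._slots.acquire(blocking=False):
            return  # too many active generators, so this sketch refuses the stream
        try:
            started = time.monotonic()
            sent = 0
            while True:
                if self.time_limit and time.monotonic() - started >= self.time_limit:
                    break
                if self.size_limit and sent >= self.size_limit:
                    break
                chunk = os.urandom(self.chunk_size)
                sent += len(chunk)
                yield chunk
        finally:
            # Free the slot when the stream ends or the consumer goes away.
            if self._slots is not None:
                self._slots.release()


limiter = LimitedGenerator(max_concurrent=2, time_limit=0, size_limit=4096)
total = sum(len(chunk) for chunk in limiter.stream())
print(total)
```

The size check happens before each chunk, so a stream can overshoot the limit by at most one chunk when the limit is not a multiple of the chunk size.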
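Setup
Web and reverse proxy
The most likely use case is to run another server as a reverse proxy and then pick some paths that should be forwarded to pandoras_pot, such as /wp-login.php, /.git/config, and /.env.
Note that the URIs you use should be marked Disallow in your /robots.txt, otherwise you might get in trouble with the likes of googlebot, which will hate your weird "page of death". For the paths above, you could have a robots.txt like this: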
User-agent: *
Disallow: /wp-login.php
Disallow: /.git
Disallow: /.env
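Before pointing real traffic at the honeypot, it can be worth checking that every forwarded path really is disallowed. The following is a small sketch using Python's standard urllib.robotparser; the helper name allowed_honeypot_paths and the hard-coded path list are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-login.php
Disallow: /.git
Disallow: /.env
"""

# Paths that the reverse proxy forwards to the honeypot (taken from the text above).
HONEYPOT_PATHS = ["/wp-login.php", "/.git/config", "/.env"]


def allowed_honeypot_paths(robots_txt, paths, user_agent="*"):
    """Return the honeypot paths that robots.txt would still let a crawler fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if parser.can_fetch(user_agent, path)]


leaks = allowed_honeypot_paths(ROBOTS_TXT, HONEYPOT_PATHS)
print(leaks or "all honeypot paths are disallowed")
```

Any path that shows up in leaks is one a well-behaved crawler could legitimately request, so it should either be added to robots.txt or removed from the proxy rules.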
Common reverse proxies include nginx, httpd (Apache), and Caddy.
In Caddy, you can add the following to match the /robots.txt we created:
(pandorust) {
    @pandorust_paths {
        path /wp-login.php /.git* /.env*
    }
    handle @pandorust_paths {
        reverse_proxy localhost:6669 # Or whatever you run pandoras_pot on
    }
}
# ...
example.com {
    # ...
    # Your actual website
    # ...
    import pandorust
}
Then you can simply run the following (if you installed it with cargo install pandoras_pot):
pandoras_pot
Done!
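Using Docker
The easiest way to set up pandoras_pot is with Docker. You can pass a config file to the build with Docker's --build-arg CONFIG=<path to your config> flag (the file must be available inside the build context).
First, clone the repository by running: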
git clone git@github.com:ginger51011/pandoras_pot.git
cd pandoras_pot
Then you can build an image and deploy it, here named and tagged pandoras_pot, and make it available on localhost:6669:
docker build -t pandoras_pot . # You can add --build-arg CONFIG=<...> here
docker run --name=pandoras_pot --restart=always -p 6669:8080 -d pandoras_pot
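systemd service
You can also easily set up a systemd service. This requires Rust to be installed, but has a smaller footprint than a Docker image and makes reloading the configuration easier. In this example I will set up a new user, pandora-user, but you can use any user you like (we will lock pandora-user down, though).
Note: apart from cloning and building pandoras_pot, most of the commands here require root privileges.
First, clone the repository and build pandoras_pot (after installing Rust):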
git clone git@github.com:ginger51011/pandoras_pot.git
cd pandoras_pot
cargo build --release
# Move the binary to a better place
cp ./target/release/pandoras_pot /usr/bin/
Then we create the user that will run the process; this user is not root and cannot even log in:
adduser --disabled-password --gecos '' --shell /sbin/nologin --no-create-home --home /iamadirandidontexist 'pandora-user'
Then we create a directory to hold our configuration (and things like the data files some generators need):
mkdir /etc/pandoras_pot
# Ensure the config file exists; you can copy the default one in this README
# into this file
touch /etc/pandoras_pot/config.toml
# Optionally you can create your data file here. You need to point to it from
# the config.
# Make pandora-user the owner of this dir
chown -R pandora-user:pandora-user /etc/pandoras_pot
Now we create the actual service. If you have followed the examples here, you can copy and paste this directly into a new file at /etc/systemd/system/pandorad.service:
[Unit]
Description=Pandora's Pot "service"
After=network.target
StartLimitIntervalSec=0
[Service]
# Change to another user/group if needed
User=pandora-user
Group=pandora-user
Restart=always
RestartSec=1
WorkingDirectory=/etc/pandoras_pot/
# Requires that the file /etc/pandoras_pot/config.toml exists; you can also
# remove config.toml to use plain default settings.
ExecStart=/usr/bin/pandoras_pot config.toml
###
## Hardening; this is optional and can be commented out, but is generally
## good practice. Some might prevent pandoras_pot from functioning, see below.
##
## Other settings may exist and be suitable.
##
## For more info, see systemd.exec(5)
##
MemoryDenyWriteExecute=yes
NoNewPrivileges=yes
PrivateDevices=yes
PrivateTmp=yes
PrivateUsers=yes
ProtectClock=yes
ProtectControlGroups=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
RestrictNamespaces=yes
RestrictSUIDSGID=yes
# These might prevent pandoras_pot from writing to a log file if ReadWritePaths is misconfigured.
ProtectHome=yes
ProtectSystem=strict
# This should point to the output log file; this is the default value.
# It should be the same as `logging.output_path` in the config.toml.
# A sane alternative is `/var/log/pandoras.log`.
ReadWritePaths=/etc/pandoras_pot/pandoras.log
##
## End of hardening
###
[Install]
WantedBy=multi-user.target
Then you need to reload the systemd daemon, and enable and start your service:
systemctl daemon-reload
systemctl enable pandorad.service
systemctl start pandorad.service
You can check that everything is working:
systemctl status pandorad.service
Done!
Configuration
pandoras_pot uses TOML as its configuration format. If you are not using Docker, you can pass the config as an argument like this:
pandoras_pot <path-to-config>
or put it in a file at $HOME/.config/pandoras_pot/config.toml.
Here is an example file:
[http]
# Make sure this matches your Dockerfile's "EXPOSE" if using Docker
port = "8080"
# Routes to send misery to. Is overridden by `http.catch_all`
routes = ["/wp-login.php", "/.env"]
# If all routes are to be served.
catch_all = true
# How many connections that can be made over `http.rate_limit_period` seconds. Will
# not set any limit if set to 0.
rate_limit = 0
# Amount of seconds that `http.rate_limit` checks on. Does nothing if rate limit is set
# to 0.
rate_limit_period = 300 # 5 minutes
# Enables `http.health_port` to be used for health checks (to see if
# `pandoras_pot` is running). Useful if you want to use your chad gaming PC
# that might not always be up and running to back up an instance running on
# your RPi 3 web server.
health_port_enabled = false
# Port to be used for health checks. Should probably not be accessible from the
# outside. Has no effect if `http.health_port_enabled` is `false`.
health_port = "8081"
# The `Content-Type` header set in responses.
content_type = "text/html; charset=utf-8"
[generator]
# The size of each generated chunk in bytes. Has a big impact on performance, so
# play around a bit! Note that if this is set too low (like 10 bytes), `pandoras_pot`
# will refuse to run.
chunk_size = 16384 # 1024 * 16
# The type of generator to be used
type = { name = "random" }
# For generator.type it is also possible to set a markov chain generator, using
# a text file as a source of data. Then you can use this (but uncommented, duh):
# type = { name = "markov_chain", data = "<path to some text file>" }
# Another alternative is a static generator, that always outputs the full contents
# of a file. Does not respect chunking.
# type = { name = "static", data = "<path to some file>" }
# The max amount of simultaneous generators that can produce output.
# Useful for preventing abuse. `0` means no limit.
max_concurrent = 100
# The amount of time in seconds a generator can be active before
# it stops sending. `0` means no limit.
time_limit = 0
# The amount of data in bytes that a generator can
# send before it stops sending. `0` means no limit.
size_limit = 0
# How many chunks should be buffered for each connection. Higher values mean
# more memory usage, but may lead to increased performance. Must be >= 1.
chunk_buffer = 20
# Prefix that will be used for the first message to an incoming connection.
# Usually used to set an HTML prefix. Can be set to "" to disable.
#
# Example usage: Set to "{" for a static generator using a JSON file to make
# output look like a valid stream of JSON that will eventually end (it won't).
prefix = "<!DOCTYPE html><html><body>"
[logging]
# Output file for logs.
output_path = "pandoras.log"
# If pretty logs should be written to standard output.
print_pretty_logs = true
# If no logs at all should be printed to stdout. Overrides other stdout logging
# settings.
no_stdout = false
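Measuring output
You can easily measure how fast your setup sends data using curl. Note that using localhost may be unreliable, since it does not show what an outsider would see. A better option may be to measure from another machine.
This example assumes that you have enabled http.catch_all; otherwise you should add a valid route.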
curl localhost:8080/ >> /dev/null
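If you would rather get a number than watch curl's progress meter, something like the following Python sketch can time a download for a fixed window. The function names measure_throughput and measure_url are hypothetical, and the in-memory stream at the bottom only stands in for a real server so that the snippet runs offline.

```python
import io
import time
import urllib.request


def measure_throughput(stream, seconds=5.0, read_size=64 * 1024):
    """Read from a file-like object for up to `seconds`; return (bytes, bytes/s)."""
    started = time.monotonic()
    total = 0
    while time.monotonic() - started < seconds:
        data = stream.read(read_size)
        if not data:  # the other side closed the stream
            break
        total += len(data)
    elapsed = max(time.monotonic() - started, 1e-9)  # avoid division by zero
    return total, total / elapsed


def measure_url(url, seconds=5.0):
    """Open `url` and measure how fast the response body arrives."""
    with urllib.request.urlopen(url, timeout=seconds) as response:
        return measure_throughput(response, seconds)


# Offline demonstration with an in-memory stream instead of a real server.
total, rate = measure_throughput(io.BytesIO(b"x" * 1_000_000), seconds=1.0)
print(f"read {total} bytes at {rate / 1e6:.1f} MB/s")
```

Against a running instance you would call measure_url("http://localhost:8080/") instead, assuming the port from the example config and http.catch_all enabled; as noted above, running it from another machine gives a more honest figure.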
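Support
I do not accept any donations. But if you find any of the software I write for fun useful, please consider donating to an effective charity, one that saves or improves the most lives per $CURRENCY spent.
GiveWell.org is an excellent website that helps you donate to the world's most effective charities. Alternatives that list current top charities include Founders Pledge and, for animal welfare, Animal Charity Evaluators.
- Residents of Sweden can make tax-deductible donations to GiveWell through Ge Effektivt.
- Residents of Norway can do the same through Gi Effektivt.
This list is not exhaustive; there may be an equivalent for your country.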
Dependencies
~13–24MB
~337K SLoC