1 unstable release
0.1.0 | Nov 5, 2022
#334 in Machine learning
1.5MB
4.5K SLoC
A preprint of Stitch is available here.
Tutorial coming soon!
Stitch
Quickstart
Run cargo run --release --bin=compress -- data/cogsci/nuts-bolts.json --max-arity=3 --iterations=10
In under a second, this should produce output like the following:
=======Compression Summary=======
Found 10 inventions
Cost Improvement: (11.93x better) 1919558 -> 160946
fn_0 (1.78x wrt orig): utility: 837792 | final_cost: 1079238 | 1.78x | uses: 320 | body: [fn_0 arity=2: (T (repeat (T l (M 1 0 -0.5 (/ 0.5 (tan (/ pi #1))))) #1 (M 1 (/ (* 2 pi) #1) 0 0)) (M #0 0 0 0))]
fn_1 (3.81x wrt orig): utility: 572767 | final_cost: 503538 | 2.14x | uses: 190 | body: [fn_1 arity=3: (repeat (T (T #2 (M 0.5 0 0 0)) (M 1 0 (* #1 (cos (/ pi 4))) (* #1 (sin (/ pi 4))))) #0 (M 1 (/ (* 2 pi) #0) 0 0))]
fn_2 (6.06x wrt orig): utility: 185436 | final_cost: 316890 | 1.59x | uses: 168 | body: [fn_2 arity=1: (T (T c (M 2 0 0 0)) (M #0 0 0 0))]
fn_3 (7.18x wrt orig): utility: 48984 | final_cost: 267198 | 1.19x | uses: 82 | body: [fn_3 arity=2: (C #1 (T r (M #0 0 0 0)))]
fn_4 (8.29x wrt orig): utility: 35046 | final_cost: 231646 | 1.15x | uses: 88 | body: [fn_4 arity=2: (C (fn_0 4 #1) (fn_0 #0 6))]
fn_5 (9.04x wrt orig): utility: 18885 | final_cost: 212456 | 1.09x | uses: 95 | body: [fn_5 arity=3: (C #2 (fn_1 #1 1.5 #0))]
fn_6 (9.93x wrt orig): utility: 18885 | final_cost: 193266 | 1.10x | uses: 95 | body: [fn_6 arity=3: (C #2 (fn_1 #1 3 #0))]
fn_7 (10.53x wrt orig): utility: 10604 | final_cost: 182358 | 1.06x | uses: 54 | body: [fn_7 arity=2: (C #1 (fn_0 #0 6))]
fn_8 (11.20x wrt orig): utility: 10503 | final_cost: 171450 | 1.06x | uses: 36 | body: [fn_8 arity=2: (C (fn_0 4 #1) (fn_2 #0))]
fn_9 (11.93x wrt orig): utility: 10202 | final_cost: 160946 | 1.07x | uses: 52 | body: [fn_9 arity=0: (fn_4 4.25 6)]
Time: 227ms
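A brief guide to reading this output:
- fn_0 is the autogenerated name of the abstraction.
- (1.78x wrt orig) means that rewriting the programs with fn_0 yields a compressed corpus 1.78x smaller than the original. The second 1.78x later in the line is the compression ratio relative to the previous step (for the first step the two are the same).
- utility: 837792 measures how many fewer primitives the programs contain once they are rewritten with this abstraction (divide by 100 to get the approximate number of primitives removed).
- uses: 320 means the abstraction is used in 320 places across the set of programs.
- Note that in these abstractions #i is used for abstraction variables and $i for original program variables.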
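To make the two ratios concrete, the sketch below recomputes them from the final_cost column of the example output: the "wrt orig" ratio matches the original cost divided by the cost after that step, and the per-step ratio matches the previous step's cost divided by the current one. The regex and the helper name check_summary are illustrative assumptions, not part of Stitch, and the body: [...] fields are dropped.

```python
import re

# Lines from the example Compression Summary above; the body: [...] fields
# are dropped because they are not needed to check the ratios.
SUMMARY = """\
Cost Improvement: (11.93x better) 1919558 -> 160946
fn_0 (1.78x wrt orig): utility: 837792 | final_cost: 1079238 | 1.78x | uses: 320
fn_1 (3.81x wrt orig): utility: 572767 | final_cost: 503538 | 2.14x | uses: 190
fn_2 (6.06x wrt orig): utility: 185436 | final_cost: 316890 | 1.59x | uses: 168
fn_3 (7.18x wrt orig): utility: 48984 | final_cost: 267198 | 1.19x | uses: 82
fn_4 (8.29x wrt orig): utility: 35046 | final_cost: 231646 | 1.15x | uses: 88
fn_5 (9.04x wrt orig): utility: 18885 | final_cost: 212456 | 1.09x | uses: 95
fn_6 (9.93x wrt orig): utility: 18885 | final_cost: 193266 | 1.10x | uses: 95
fn_7 (10.53x wrt orig): utility: 10604 | final_cost: 182358 | 1.06x | uses: 54
fn_8 (11.20x wrt orig): utility: 10503 | final_cost: 171450 | 1.06x | uses: 36
fn_9 (11.93x wrt orig): utility: 10202 | final_cost: 160946 | 1.07x | uses: 52
"""

# Matches one per-invention line (hypothetical parser, not part of Stitch).
INVENTION = re.compile(
    r"(?P<name>fn_\d+) \((?P<wrt_orig>[\d.]+)x wrt orig\): "
    r"utility: (?P<utility>\d+) \| final_cost: (?P<final_cost>\d+) \| "
    r"(?P<step>[\d.]+)x \| uses: (?P<uses>\d+)"
)


def check_summary(text):
    """Recompute both printed ratios for every invention from final_cost."""
    orig_cost = int(re.search(r"(\d+) -> \d+", text).group(1))
    prev_cost = orig_cost
    rows = []
    for m in INVENTION.finditer(text):
        cost = int(m["final_cost"])
        rows.append({
            "name": m["name"],
            # cumulative ratio: original corpus cost / cost after this step
            "wrt_orig_ok": f"{orig_cost / cost:.2f}" == m["wrt_orig"],
            # per-step ratio: cost before this step / cost after it
            "step_ok": f"{prev_cost / cost:.2f}" == m["step"],
            # rough number of primitives removed (utility / 100, per the guide)
            "approx_prims_removed": int(m["utility"]) / 100,
        })
        prev_cost = cost
    return rows


for row in check_summary(SUMMARY):
    print(row)
```

Every row reports True for both checks, so the printed ratios are exactly these two cost quotients rounded to two decimals.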
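Common command-line arguments
- --max-arity=2 or -a2 controls the maximum arity of the abstractions found (default 2)
- --iterations=10 or -i10 controls how many iterations of compression to run. Each iteration yields one abstraction (which can build on previous abstractions)
- --threads=10 or -t10 is an easy way to improve performance through multithreading (default 1)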
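If you drive the compress binary from a script, a small helper can assemble these flags. This is a hypothetical sketch: build_compress_argv and run_compress are not part of Stitch, and they only use flags that appear in the --help output below (defaults copied from it).

```python
import subprocess


def build_compress_argv(input_json, max_arity=2, iterations=3, threads=1, out="out/out.json"):
    """Assemble a cargo command line for the compress binary.

    Hypothetical helper; the defaults mirror the --help defaults.
    """
    return [
        "cargo", "run", "--release", "--bin=compress", "--",
        input_json,
        f"--max-arity={max_arity}",
        f"--iterations={iterations}",
        f"--threads={threads}",
        f"--out={out}",
    ]


def run_compress(input_json, **options):
    """Run compression from inside the stitch checkout (raises if cargo fails)."""
    subprocess.run(build_compress_argv(input_json, **options), check=True)


if __name__ == "__main__":
    # Same settings as the Quickstart command, printed rather than executed.
    print(" ".join(build_compress_argv("data/cogsci/nuts-bolts.json", max_arity=3, iterations=10)))
```

Keeping the argv as a list (instead of one shell string) avoids quoting problems when the input path contains spaces.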
All command-line arguments
From cargo run --release --bin=compress -- --help:
ARGS:
<FILE> json file to read compression input programs from
OPTIONS:
-a, --max-arity <MAX_ARITY>
max arity of abstractions to find (will find all from 0 to this number inclusive)
[default: 2]
--args-from-json
extracts argument values from the json; specifically assumes a key value pair like
"stitch_args": "data/dc/logo_iteration_1_stitchargs.json -a3 -t8 --fmt=dreamcoder
--dreamcoder-drop-last --no-mismatch-check", in the toplevel dictionary of the json. All
other commandline args get discarded when you specify this option
-b, --batch <BATCH>
how many worklist items a thread will take at once [default: 1]
--dreamcoder-comparison
anything related to running a dreamcoder comparison
--dynamic-batch
threads will autoadjust how large their batches are based on the worklist size
--fmt <FMT>
the format of the input file, e.g. 'programs-list' for a simple JSON array of programs
or 'dreamcoder' for a JSON in the style expected by the original dreamcoder codebase.
See [formats.rs] for options or to add new ones [default: programs-list] [possible
values: dreamcoder, programs-list, split-programs-list]
--follow-track
for debugging: prunes all branches except the one that leads to the `--track`
abstraction
-h, --help
Print help information
--hole-choice <HOLE_CHOICE>
Method for choosing hole to expand at each step, doesn't have a huge effect [default:
depth-first] [possible values: random, breadth-first, depth-first, max-largest-subset,
high-entropy, low-entropy, max-cost, min-cost, many-groups, few-groups, few-apps]
-i, --iterations <ITERATIONS>
Number of iterations to run compression for (number of inventions to find) [default: 3]
-n, --inv-candidates <INV_CANDIDATES>
Number of invention candidates compression_step should return in a *single* step. Note
that these will be the top n optimal candidates modulo subsumption pruning (and the top-
1 is guaranteed to be globally optimal) [default: 1]
--no-mismatch-check
disables the safety check for the utility being correct; you only want to do this if you
truly dont mind unsoundness for a minute
--no-opt
disable all optimizations
--no-opt-arity-zero
disable the arity zero priming optimization
--no-opt-force-multiuse
disable the force multiuse pruning optimization
--no-opt-free-vars
disable the free variable pruning optimization
--no-opt-single-task
disable the single task pruning optimization
--no-opt-single-use
disable the single structurally hashed subtree match pruning
--no-opt-upper-bound
disable the upper bound pruning optimization
--no-opt-useless-abstract
disable the useless abstraction pruning optimization
--no-other-util
makes it so utility is based purely on corpus size without adding in the abstraction
size
--no-stats
Disable stat logging - note that stat logging in multithreading requires taking a mutex
so it can be a source of slowdown in the massively multithreaded case, hence this flag
to disable it
--no-top-lambda
makes it so inventions cant start with a lambda at the top
-o, --out <OUT>
json output file [default: out/out.json]
--print-stats <PRINT_STATS>
print stats this often (0 means never) [default: 0]
-r, --show-rewritten
print out programs rewritten under abstraction
--rewrite-check
whenever you finish an invention do a full rewrite to check that rewriting doesnt raise
a cost mismatch exception
--save-rewritten <SAVE_REWRITTEN>
saves the rewritten frontiers in an input-readable format
--shuffle
shuffle order of set of inventions
-t, --threads <THREADS>
number of threads (no parallelism if set to 1) [default: 1]
--track <TRACK>
for debugging: pattern or abstraction to track
--truncate <TRUNCATE>
truncate set of inventions to include only this many (happens after shuffle if shuffle
is also specified)
--utility-by-rewrite
calculate utility exhaustively by performing a full rewrite; mainly used when cost
mismatches are happening and we need something slow but accurate
--verbose-best
prints whenever a new best abstraction is found
--verbose-worklist
prints every worklist item as it is processed (will slow things down a ton due to
rendering out expressins)
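The --fmt help text above describes the default programs-list format as a simple JSON array of programs. The sketch below writes such a file, assuming each program is an s-expression string like those in the Python bindings example; write_programs_list and the file name my_programs.json are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_programs_list(programs, path):
    """Write programs as a plain JSON array (the programs-list input format)."""
    if not all(isinstance(p, str) for p in programs):
        raise TypeError("each program should be an s-expression string")
    path = Path(path)
    path.write_text(json.dumps(programs, indent=2))
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_programs_list(["(a a a)", "(b b b)"], Path(tmp) / "my_programs.json")
    # A file like this could then be passed as <FILE>, e.g.
    #   cargo run --release --bin=compress -- my_programs.json --fmt=programs-list
    print(json.loads(path.read_text()))
```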
Disabling optimizations
cargo run --release --bin=compress -- data/cogsci/nuts-bolts.json --no-opt
Or see the other command-line arguments starting with --no-opt- to disable specific optimizations.
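To see what each optimization contributes, one option is to run one configuration per flag. The sketch below only builds the command lines (it does not run or time them); the helper name ablation_commands and the choice of input file are illustrative, while the flag names are copied from the --help output above.

```python
# The --no-opt-* flags listed in the --help output above.
NO_OPT_FLAGS = [
    "--no-opt-arity-zero",
    "--no-opt-force-multiuse",
    "--no-opt-free-vars",
    "--no-opt-single-task",
    "--no-opt-single-use",
    "--no-opt-upper-bound",
    "--no-opt-useless-abstract",
]

BASE = ["cargo", "run", "--release", "--bin=compress", "--", "data/cogsci/nuts-bolts.json"]


def ablation_commands(base=BASE):
    """One command per configuration: all optimizations on, all off, then each one off."""
    runs = {"all-opts": list(base), "no-opt": base + ["--no-opt"]}
    for flag in NO_OPT_FLAGS:
        runs[flag[2:]] = base + [flag]  # key is the flag without its leading "--"
    return runs


for name, argv in ablation_commands().items():
    print(f"{name}: {' '.join(argv)}")
```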
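Python bindings
Initial Python bindings are now available.
- Depending on your OS, run ./gen_bindings_osx.sh or ./gen_bindings_linux.sh to build the bindings (they will be added to bindings/). If this doesn't work, please let me know or open an issue! It will likely vary by OS, and the current commands may be overfit to my computer.
- Add the stitch/bindings/ folder to your $PYTHONPATH, for example by adding export PYTHONPATH="$PYTHONPATH:path/to/stitch/bindings/" to your ~/.bashrc or the equivalent for your shell / venv. This puts the stitch.so file on your Python path, which lets you import it.
- Launch python and try import stitch (nothing should be printed if it succeeds).
- As a simple example, running the Python code import stitch,json; result = json.loads(stitch.compression(["(a a a)", "(b b b)"], iterations=1, max_arity=2)); print("Result:", result) should find the (#0 #0 #0) abstraction.
- Note that the result is currently a large dictionary similar to stitch's usual out/out.json output.
- Many more keyword arguments are available (the full list is in examples/stitch.rs, which is where the bindings live, since keeping them in examples/ was a workaround to get the project to generate a cdylib for the Python bindings). Basically anything you can find in cargo run --release --bin=compress -- --help is included.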
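Since the bindings accept roughly the same options as the CLI, a small wrapper can translate a CLI-style flag list into keyword arguments. This is a sketch under an assumption: that each keyword is the long flag name with dashes turned into underscores (as --max-arity becomes max_arity in the example above) and that bare flags map to True; check examples/stitch.rs for the real names. flags_to_kwargs and compress are hypothetical helpers.

```python
import json


def _parse_value(text):
    """Convert a flag value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def flags_to_kwargs(flags):
    """Turn CLI-style flags into keyword arguments.

    Assumption: --max-arity=3 -> max_arity=3, and a bare flag such as
    --no-opt -> no_opt=True.
    """
    kwargs = {}
    for flag in flags:
        name, sep, value = flag.lstrip("-").partition("=")
        kwargs[name.replace("-", "_")] = _parse_value(value) if sep else True
    return kwargs


def compress(programs, flags=()):
    """Call the bindings and decode the JSON string they return."""
    import stitch  # requires stitch/bindings/ on $PYTHONPATH
    return json.loads(stitch.compression(programs, **flags_to_kwargs(flags)))


print(flags_to_kwargs(["--iterations=1", "--max-arity=2"]))
```

With the bindings built, compress(["(a a a)", "(b b b)"], ["--iterations=1", "--max-arity=2"]) would mirror the one-line example above.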
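Details
--save-baseline=main saves a named baseline (if one already exists, the new results are compared against it before it is overwritten)
--load-baseline=feature means don't run any benchmarks; just load the file as if it were the result you had just generated
--baseline=master overrides which baseline we compare against
--bench=compress_bench avoids the verbose "unrecognized option" error described here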
Flamegraph
If you don't have it installed yet: cargo install flamegraph
cargo flamegraph --root --open --deterministic --output=out/flamegraph.svg --bin=compress -- data/cogsci/nuts-bolts.json
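Acknowledgments
This work was supported by NSF grant 1918839 "Understanding the World Through Code" http://www.neurosymbolic.org/
This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under the program Symbiotic Design for Cyber Physical Systems (SDCPS), Contract FA8750-20-C-0542 (Systemic Generative Engineering). The views, opinions, and/or findings expressed are those of the author(s) and do not necessarily reflect the views of DARPA.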
Dependencies
~5–12MB
~127K SLoC