4 releases
Version | Date |
---|---|
0.2.4 | Mar 21, 2022 |
0.2.3 | Nov 11, 2021 |
0.2.2 | Oct 8, 2021 |
0.2.1 | Oct 5, 2021 |
weggli
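Introduction
weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
weggli performs pattern matching on abstract syntax trees based on user-provided queries. Its query language resembles C and C++ code, which makes it easy to turn interesting code patterns into queries.
weggli is inspired by great tools like Semgrep, Coccinelle, joern and CodeQL, but makes some different design decisions: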
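- C++ support: weggli has first-class support for modern C++ constructs such as lambda expressions, range-based for loops and constexpr.
- Minimal setup: weggli should work out of the box against most software you will encounter. It does not require the ability to build the software and can work with incomplete sources or missing dependencies.
- Interactive: weggli is designed for interactive use and fast query performance. Most of the time, a weggli query will be faster than a grep search. The goal is an interactive workflow in which you can switch quickly between code review and query creation or refinement.
- Greedy: weggli's pattern matching is designed to find as many (useful) matches as possible for a given query. This increases the risk of false positives, but it simplifies query creation. For example, the query $x = 10; will match both assignment expressions (foo = 10;) and declarations (int bar = 10;).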
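To make the greedy behaviour concrete, here is a minimal C sketch of code that contains both forms the query $x = 10; is described as matching. The function name greedy_match_demo and the variable names are invented for this illustration; they do not come from weggli.

```c
/* Hypothetical function, invented for this illustration. */
int greedy_match_demo(void)
{
    int foo;
    foo = 10;      /* assignment expression: $x would bind to foo */
    int bar = 10;  /* declaration with initializer: $x would bind to bar */
    return foo + bar;
}
```

A stricter tool would need one query per form; here a single query is expected to report both lines.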
Usage
Use -h for short descriptions and --help for more details.
Homepage: https://github.com/googleprojectzero/weggli
USAGE: weggli [OPTIONS] <PATTERN> <PATH>
ARGS:
<PATTERN>
A weggli search pattern. weggli's query language closely resembles
C and C++ with a small number of extra features.
For example, the pattern '{_ $buf[_]; memcpy($buf,_,_);}' will
find all calls to memcpy that directly write into a stack buffer.
Besides normal C and C++ constructs, weggli's query language
supports the following features:
_ Wildcard. Will match on any AST node.
$var Variables. Can be used to write queries that are independent
of identifiers. Variables match on identifiers, types,
field names or namespaces. The --unique option
optionally enforces that $x != $y != $z. The --regex option can
enforce that the variable has to match (or not match) a
regular expression.
_(..) Subexpressions. The _(..) wildcard matches on arbitrary
sub expressions. This can be helpful if you are looking for some
operation involving a variable, but don't know more about it.
For example, _(test) will match on expressions like test+10,
buf[test->size] or f(g(&test));
not: Negative sub queries. Only show results that do not match the
following sub query. For example, '{not: $fv==NULL; not: $fv!=NULL *$v;}'
would find pointer dereferences that are not preceded by a NULL check.
strict: Enable stricter matching. This turns off statement unwrapping
and greedy function name matching. For example 'strict: func();'
will not match on 'if (func() == 1)..' or 'a->func()' anymore.
weggli automatically unwraps expression statements in the query source
to search for the inner expression instead. This means that the query `{func($x);}`
will match on `func(a);`, but also on `if (func(a)) {..}` or `return func(a)`.
Matching on `func(a)` will also match on `func(a,b,c)` or `func(z,a)`.
Similarly, `void func($t $param)` will also match function definitions
with multiple parameters.
Additional patterns can be specified using the --pattern (-p) option. This makes
it possible to search across functions or type definitions.
<PATH>
Input directory or file to search. By default, weggli will search inside
.c and .h files for the default C mode or .cc, .cpp, .cxx, .h and .hpp files when
executing in C++ mode (using the --cpp option).
Alternative file endings can be specified using the --extensions (-e) option.
When combining weggli with other tools or preprocessing steps,
files can also be specified via STDIN by setting the directory to '-'
and piping a list of filenames.
OPTIONS:
-A, --after <after>
Lines to print after a match. Default = 5.
-B, --before <before>
Lines to print before a match. Default = 5.
-C, --color
Force enable color output.
-X, --cpp
Enable C++ mode.
--exclude <exclude>...
Exclude files that match the given regex.
-e, --extensions <extensions>...
File extensions to include in the search.
-f, --force
Force a search even if the queries contain syntax errors.
-h, --help
Prints help information.
--include <include>...
Only search files that match the given regex.
-l, --limit
Only show the first match in each function.
-p, --pattern <p>...
Specify additional search patterns.
-R, --regex <regex>...
Filter variable matches based on a regular expression.
This feature uses the Rust regex crate, so most Perl-style
regular expression features are supported.
(see https://docs.rs/regex/1.5.4/regex/#syntax)
Examples:
Find calls to functions starting with the string 'mem':
weggli -R 'func=^mem' '$func(_);'
Find memcpy calls where the last argument is NOT named 'size':
weggli -R 's!=^size$' 'memcpy(_,_,$s);'
-u, --unique
Enforce uniqueness of variable matches.
By default, two variables such as $a and $b can match on identical values.
For example, the query '$x=malloc($a); memcpy($x, _, $b);' would
match on both
void *buf = malloc(size);
memcpy(buf, src, size);
and
void *buf = malloc(some_constant);
memcpy(buf, src, size);
Using the unique flag would filter out the first match as $a==$b.
-v, --verbose
Sets the level of verbosity.
-V, --version
Prints version information.
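The statement unwrapping described in the help text above can be pictured with a small C sketch. According to that description, the query {func($x);} should report all three call sites below, because weggli searches for the inner call expression rather than the whole statement. The name func is taken from the help text; unwrap_demo and the function bodies are assumptions made for illustration.

```c
/* Hypothetical callee; only its name comes from the help text. */
static int func(int a)
{
    return a > 0;
}

/* Hypothetical caller showing three places where func(a) appears. */
int unwrap_demo(int a)
{
    func(a);             /* plain expression statement */
    if (func(a)) {       /* call nested in an if condition */
        return func(a);  /* call nested in a return statement */
    }
    return 0;
}
```

Prefixing the query with strict: is described as turning this unwrapping off, which would leave only the first call site.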
Examples
Calls to memcpy that write into a stack buffer
weggli '{
_ $buf[_];
memcpy($buf,_,_);
}' ./target/src
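As a rough illustration of what this query targets, the C sketch below declares a local array and passes it as the destination of memcpy inside the same block. The function first_byte_after_copy is hypothetical and exists only to show the shape of a match.

```c
#include <string.h>

/* Hypothetical function, invented for this illustration. */
char first_byte_after_copy(const char *src, size_t n)
{
    char buf[64];         /* corresponds to `_ $buf[_];` */
    memcpy(buf, src, n);  /* corresponds to `memcpy($buf,_,_);` */
    return buf[0];        /* n was never checked against sizeof buf */
}
```

A match only says that a stack array is the memcpy destination; whether n can exceed the array size still needs manual review.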
Calls to foo that don't check the return value
weggli '{
strict: foo(_);
}' ./target/src
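The following C sketch shows the distinction this query relies on. Based on the description of strict: in the usage text, the bare call should be reported while the call inside the if condition should not. The name foo comes from the query; caller and the function bodies are assumptions for illustration.

```c
/* Hypothetical callee that signals failure through its return value. */
static int foo(int x)
{
    return x < 0 ? -1 : 0;
}

/* Hypothetical caller, invented for this illustration. */
int caller(int x)
{
    foo(x);             /* return value dropped: the case the query looks for */
    if (foo(x) != 0) {  /* return value inspected: expected to be skipped under strict: */
        return -1;
    }
    return 0;
}
```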
Potentially vulnerable snprintf() users
weggli '{
$ret = snprintf($b,_,_);
$b[$ret] = _;
}' ./target/src
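The reason this pattern is interesting is that snprintf returns the number of characters the full output would have needed, not the number actually written, so using that value as an index can land past the end of the buffer. The C sketch below shows both facts. The functions would_be_length and format_name are hypothetical and written only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: the length snprintf reports for s, with no output buffer. */
int would_be_length(const char *s)
{
    return snprintf(NULL, 0, "%s", s);
}

/* Hypothetical function with the shape the query should flag. */
int format_name(char *b, size_t cap, const char *name)
{
    int ret = snprintf(b, cap, "%s", name);  /* `$ret = snprintf($b,_,_);` */
    b[ret] = '\0';  /* `$b[$ret] = _;` writes out of bounds once ret >= cap */
    return ret;
}
```

With an 8-byte buffer and a 10-character name, ret would be 10 and the write to b[ret] would fall outside the buffer; a common remedy is to compare ret against cap before indexing.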
Potentially uninitialized pointers
weggli '{ _* $p;
NOT: $p = _;
$func(&$p);
}' ./target/src
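A minimal C sketch of the situation this query is looking for: a pointer is declared without an initializer, is never assigned directly, and its address is handed to another function that may or may not fill it in. The functions lookup and describe are hypothetical, invented for this illustration.

```c
/* Hypothetical callee: only writes *out on the success path. */
static int lookup(int key, char **out)
{
    static char one[] = "one";
    if (key != 1) {
        return -1;  /* error path leaves *out untouched */
    }
    *out = one;
    return 0;
}

/* Hypothetical caller with the shape the query should flag. */
char *describe(int key)
{
    char *p;          /* `_* $p;` declared, never assigned here */
    lookup(key, &p);  /* `$func(&$p);` return value ignored */
    return p;         /* indeterminate if lookup failed */
}
```

Initializing p to NULL at the declaration would add an assignment that the negative sub query excludes, so the fixed code should no longer be reported.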
Potentially insecure WeakPtr usage
weggli --cpp '{
$x = _.GetWeakPtr();
DCHECK($x);
$x->_;}' ./target/src
Debug-only iterator validation
weggli -X 'DCHECK(_!=_.end());' ./target/src
Functions that write into a stack buffer based on a function argument
weggli '_ $fn(_ $limit) {
_ $buf[_];
for (_; $i<$limit; _) {
$buf[$i]=_;
}
}' ./target/src
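For orientation, here is a small C sketch of a function with the structure this query describes: a single parameter that bounds a loop, and a local array indexed by the loop counter. The function fill_prefix is hypothetical and exists only for illustration.

```c
/* Hypothetical function, invented for this illustration. */
int fill_prefix(int limit)
{
    int buf[16];                   /* `_ $buf[_];` */
    int i;
    for (i = 0; i < limit; i++) {  /* `$i<$limit` ties the bound to the parameter */
        buf[i] = i + 1;            /* `$buf[$i]=_;` overflows once limit > 16 */
    }
    return buf[0];
}
```

The query does not prove an overflow; it narrows review down to functions where a caller-controlled value decides how far a stack array is written.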
Functions with the string decode in their name
weggli -R func=decode '_ $func(_) {_;}' ./target/src
Encoding/conversion functions
weggli '_ $func($t *$input, $t2 *$output) {
for (_($i);_;_) {
$input[$i]=_($output);
}
}' ./target/src
Install
$ cargo install weggli
Build instructions
# optional: install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/googleprojectzero/weggli.git
cd weggli; cargo build --release
./target/release/weggli
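Implementation details
weggli is built on top of the tree-sitter parsing library and its C and C++ grammars. Search queries are first parsed using an extended version of the corresponding grammar, and the resulting AST is then transformed into a set of tree-sitter queries in builder.rs. The actual query matching is implemented in query.rs, a relatively small wrapper around tree-sitter's query engine that adds weggli-specific features.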
Contributing
See CONTRIBUTING.md for details.
License
Apache 2.0; see LICENSE for details.
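Disclaimer
This project is not an official Google project. It is not supported by Google, and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.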