1 unstable release

0.1.1	Jul 14, 2024

Apache-2.0

66KB
1K SLoC

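Model card tools

A versatile command-line tool for working with model cards.

The CLI supports two main working modes:

Pipeline mode: for use in CI/CD pipelines or as a standalone utility in the terminal
Project mode: for creating custom schemas and templates

The following subcommands work in either mode:

completion - Generate shell completions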
```
❯ modelcards completion
```

help - Print general help or the help of a given subcommand

❯ modelcards help
A fast modelcard generator with built-in templates

Usage: modelcards [OPTIONS] <COMMAND>

Commands:
  init        Create a new modelcard project
  build       Deletes the output directory if there is one and builds the modelcard
  check       Try to build the project without rendering it. Checks inputs
  validate    Validate the modelcard data file against the schema
  render      Render the modelcard using template
  merge       Merge multiple modelcard data files into one
  completion  Generate shell completion
  help        Print this message or the help of the given subcommand(s)

Options:
  -r, --root <ROOT>      Directory to use as root of project [default: .]
  -c, --config <CONFIG>  Path to a config file other than config.toml in the root of project [default: config.toml]
  -v, --verbose...       Increase logging verbosity
  -q, --quiet...         Decrease logging verbosity
  -h, --help             Print help
  -V, --version          Print version

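Pipeline mode

Pipeline mode currently supports three subcommands:

merge - Merge multiple JSON files

The merge subcommand merges two or more JSON files at the value level. This lets you work on parts of a large JSON structure separately, or keep default or global values in their own JSON file.

This is especially useful for reducing the documentation effort for developers. For example, you can keep global defaults in one JSON file that pre-fills required fields or sets company-wide copyright notices, references and so on. Use-case documentation (intended uses, considerations and so on) can go into a second JSON file that is shared by all models for that use case, and the details of a specific model go into a third. You can then generate the complete model card JSON data file with the following command: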

❯ modelcards merge defaults.json usecase.json model.json -o modelcard.json
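What a value-level merge could look like is sketched below in Python. The semantics are an assumption and are not confirmed by the tool's documentation: later sources win on conflicts, nested objects are merged key by key, and any other value (including arrays) is replaced as a whole. The helper names `deep_merge` and `merge_sources` and the sample field names are hypothetical.

```python
import copy
from functools import reduce


def deep_merge(base, override):
    """Return a new value combining base and override (assumed semantics).

    Objects (dicts) are merged key by key; everything else, including
    lists, is replaced by the override value.
    """
    if isinstance(base, dict) and isinstance(override, dict):
        merged = {key: copy.deepcopy(value) for key, value in base.items()}
        for key, value in override.items():
            if key in merged:
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    return copy.deepcopy(override)


def merge_sources(*sources):
    """Merge any number of parsed JSON documents, left to right."""
    return reduce(deep_merge, sources, {})


# Illustrative data only; the field names are made up for this example.
defaults = {"model_details": {"licenses": [{"identifier": "Apache-2.0"}], "owners": []}}
usecase = {"considerations": {"use_cases": [{"description": "demo"}]}}
model = {"model_details": {"name": "first_model", "owners": [{"name": "ML team"}]}}

modelcard = merge_sources(defaults, usecase, model)
```

Under these assumptions the model file only has to carry what differs from the shared files, which is the workflow described above. Whether the real tool concatenates or replaces arrays should be checked against its actual output.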

Syntax

Usage: modelcards merge [OPTIONS] [SOURCES]...

Arguments:
  [SOURCES]...  The source modelcard data files to be merged

Options:
  -o, --target <TARGET>  The output file to write the merged data to
  -v, --verbose...       Increase logging verbosity
  -q, --quiet...         Decrease logging verbosity

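validate - Validate model card data against a JSON schema

Pass model card JSON data files to validate. If no schema is given, the built-in schema from Google's Model Card Toolkit is used.

If you pass more than one JSON file, the files are not validated one by one. They are merged first, as if you had called the merge command and then validated the result.

To validate against the Google schema:

❯ modelcards validate modelcard.json

To validate against your own custom schema:

❯ modelcards validate modelcard.json -s myschema.json

Syntax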

Usage: modelcards validate [OPTIONS] [SOURCES]...

Arguments:
  [SOURCES]...  The source modelcard data file to be verified

Options:
  -s, --schema <SCHEMA>  The schema file to validate against (defaults to build-in schema)
  -v, --verbose...       Increase logging verbosity
  -q, --quiet...         Decrease logging verbosity
  -h, --help             Print help

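render - Render the model card using a given Jinja template

The render command uses a Jinja template to convert model card JSON data into any format you need.

Pass model card JSON data files to render. If no template is given, the built-in Markdown template for the Google Model Card Toolkit data schema is used.

If you pass more than one JSON file, the files are not rendered one by one. They are merged first, as if you had called the merge command and then rendered the result.

The result is stored in a file named after the last model card source you passed, with the extension changed to .md.

To render with the default template: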

❯ modelcards render modelcard.json

This creates the output file modelcard.md.

Or, if you pass multiple files:

❯ modelcards render default.json usecase.json model.json

This creates the output file model.md, because model.json is the last source passed.
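The naming rule can be expressed in a few lines of Python. This is only a sketch of the rule as described here, not the tool's implementation. The function name `default_render_target` is hypothetical, and the sketch keeps the directory of the last source, which is an assumption that the documentation does not confirm.

```python
from pathlib import Path


def default_render_target(sources):
    """Name the output after the last source, swapping the extension for .md."""
    if not sources:
        raise ValueError("at least one source file is required")
    return Path(sources[-1]).with_suffix(".md")


single = default_render_target(["modelcard.json"])
several = default_render_target(["default.json", "usecase.json", "model.json"])
```

Because only the last source determines the name, the order of the arguments matters both for the merge result and for the output file.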

To render with your own custom template:

❯ modelcards render modelcard.json -t my-html-template.jinja

Syntax

Usage: modelcards render [OPTIONS] [SOURCES]...

Arguments:
  [SOURCES]...  The source modelcard data file to be verified

Options:
  -t, --template <TEMPLATE>  The jinjia template file to use (defaults to build-in markdown template)
  -v, --verbose...           Increase logging verbosity
  -q, --quiet...             Decrease logging verbosity
  -h, --help                 Print help

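Continuous integration example

To use the CLI utility effectively in a machine learning project, suppose your repository contains default.json, usecase.json, first_model.json and second_model.json. You can update the model JSON files with the latest metrics, then merge, validate and render the model cards.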

# merge the shared files with each model's details into one modelcard per model
modelcards merge default.json usecase.json first_model.json -o modelcard_first.json
modelcards merge default.json usecase.json second_model.json -o modelcard_second.json
# ensure that modelcard data is valid (exits with 1 on validation error and 0 if data is valid)
modelcards validate modelcard_first.json
modelcards validate modelcard_second.json
# render the data to markdown
modelcards render modelcard_first.json
modelcards render modelcard_second.json
# optionally create links to the generated modelcards in your README.md
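The same steps can be driven from a small Python script, for example when the CI job already runs Python to compute the metrics. The sketch below only uses the merge, validate and render invocations shown above and relies on the documented exit code of validate. The function names and the `run` parameter are hypothetical, and the commands are grouped per model rather than per step.

```python
import subprocess

SHARED = ["default.json", "usecase.json"]
# Maps each model data file to the merged modelcard file it produces.
MODELS = {
    "first_model.json": "modelcard_first.json",
    "second_model.json": "modelcard_second.json",
}


def pipeline_commands(shared, models):
    """Build the merge, validate and render commands for every model."""
    commands = []
    for model_file, target in models.items():
        commands.append(["modelcards", "merge", *shared, model_file, "-o", target])
        commands.append(["modelcards", "validate", target])
        commands.append(["modelcards", "render", target])
    return commands


def run_pipeline(commands, run=subprocess.run):
    """Run the commands in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


commands = pipeline_commands(SHARED, MODELS)
```

A CI job would end with `sys.exit(run_pipeline(commands))` so that a validation error fails the build. Passing `run` as a parameter keeps the sketch testable without the modelcards binary installed.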

Project mode

Documentation for project mode will follow. Currently three subcommands work in project mode:

init - Create a new model card project

Syntax

Usage: modelcards init [OPTIONS] [NAME]

Arguments:
  [NAME]  Name of the project. Will create a new directory with that name in the current directory [default: .]

Options:
  -f, --force       Force creation of project even if directory is non-empty
  -v, --verbose...  Increase logging verbosity
  -q, --quiet...    Decrease logging verbosity
  -h, --help        Print help

check - Build the project without rendering it, to check all inputs

Syntax

Usage: modelcards check [OPTIONS]

Options:
  -s, --source <SOURCE>  The source modelcard data file to be verified (defaults to sample.json or settings in config.toml)
  -v, --verbose...       Increase logging verbosity
  -q, --quiet...         Decrease logging verbosity
  -h, --help             Print help

build - Build the model card project into the output directory

Syntax

Usage: modelcards build [OPTIONS]

Options:
  -s, --source <SOURCE>  The source modelcard data file to be build (defaults to all in 'data' dir in project root)
  -o, --target <TARGET>  Outputs the generated site in the given path (by default 'card' dir in project root)
  -f, --force <FORCE>    Force building the modelcard even if output directory is non-empty [possible values: true, false]
  -v, --verbose...       Increase logging verbosity
  -q, --quiet...         Decrease logging verbosity

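Features

Create model cards from templates
Layered settings (defaults, config.toml, environment, CLI arguments)
Pretty output using the crossterm crate
Data input from the terminal via the inquire crate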
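How layered settings typically resolve can be sketched as follows. This is an illustration of the general technique, not the tool's implementation. It assumes that the layers are listed in increasing order of precedence (CLI arguments override the environment, which overrides config.toml, which overrides the defaults), and the `MODELCARDS_` environment prefix and the function name `resolve_settings` are hypothetical. Only the two default values are taken from the help output.

```python
from collections import ChainMap

# Defaults as shown in the help output for --root and --config.
DEFAULTS = {"root": ".", "config": "config.toml"}


def resolve_settings(file_settings, env, cli_args, prefix="MODELCARDS_"):
    """Combine the layers; earlier maps in the ChainMap shadow later ones."""
    # Hypothetical convention: MODELCARDS_ROOT in the environment sets "root".
    env_settings = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    # CLI options that were not given (None) must not shadow lower layers.
    cli_settings = {key: value for key, value in cli_args.items() if value is not None}
    return dict(ChainMap(cli_settings, env_settings, file_settings, DEFAULTS))


settings = resolve_settings(
    file_settings={"root": "docs"},
    env={"MODELCARDS_ROOT": "env_root", "HOME": "/home/user"},
    cli_args={"root": None, "config": "alt.toml"},
)
```

In this example the environment overrides the root from the config file, and the CLI argument overrides the default config path.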

Contributing

Integration of HuggingCard templates is planned.

Dependencies

~13–21MB
~369K SLoC