6 versions
0.1.5 | March 27, 2024 |
---|---|
0.1.4 | March 27, 2024 |
#1691 in Command line utilities
25KB
299 lines of code
solr_post
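This is a simple library and CLI for posting the files in a directory to a Solr collection for indexing. It is a Rust-based alternative (more than 10x faster) to the Java-based post tool that ships with Solr by default. It also adds features the default Solr post tool lacks, such as filtering files with include/exclude regex patterns.
Library
The library provides a function named solr_post(). You pass it a PostConfig struct, plus optional progress callbacks that you can use to monitor or log progress.
Basic example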
use solr_post::{PostConfig, solr_post};
use std::path::PathBuf;

#[tokio::main]
async fn main() {
    // Configure the target Solr server and the directory to index
    let config = PostConfig {
        host: String::from("localhost"),
        port: 8983,
        collection: String::from("my_collection"),
        directory_path: PathBuf::from("/var/www/html"),
        ..Default::default()
    };
    // Make the Solr post request without any progress callbacks
    solr_post(config, None, None, None).await;
}
In this example, the files under /var/www/html are recursively indexed into the "my_collection" collection on a Solr server running locally (localhost:8983).
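The CLI help further down gives http://localhost:8983/solr/my_collection/update as an example of a base update URL, and notes that an explicit URL overrides the collection, host, and port. The sketch below illustrates that rule under stated assumptions: `resolve_update_url` is a hypothetical helper and not part of the solr_post API, and the `http` scheme for the built URL is assumed from the example in the help text.

```rust
/// Resolve the Solr update URL from an explicit URL or from host/port/collection.
/// Hypothetical helper for illustration; not part of the solr_post API.
fn resolve_update_url(url: Option<&str>, host: &str, port: u16, collection: &str) -> String {
    match url {
        // An explicit, non-empty URL wins; host, port, and collection are ignored.
        Some(explicit) if !explicit.trim().is_empty() => explicit.trim().to_string(),
        // Otherwise build the URL in the shape shown in the CLI help (scheme assumed).
        _ => format!("http://{host}:{port}/solr/{collection}/update"),
    }
}
```

Calling `resolve_update_url(None, "localhost", 8983, "my_collection")` yields the same URL shape as the help text example, while passing `Some(...)` returns that URL unchanged.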
Example with progress callbacks
use solr_post::{solr_post, PostConfig};
use std::io::{self, Write};
use std::sync::{Mutex, OnceLock};

#[tokio::main]
async fn main() {
    // Shared counter so that on_next can read the total reported to on_start
    static TOTAL_FILES_TO_INDEX: OnceLock<Mutex<u64>> = OnceLock::new();
    TOTAL_FILES_TO_INDEX.get_or_init(|| Mutex::new(0u64));

    let on_start = move |total_files: u64| {
        // Retrieve the shared total from the static variable
        let total_files_to_index = TOTAL_FILES_TO_INDEX.get().unwrap();
        // Lock the mutex to update the value
        let mut total_files_to_index = total_files_to_index.lock().unwrap();
        // Record the total number of files that will be indexed
        *total_files_to_index = total_files;
        println!(
            "Start indexing {} files",
            total_files_to_index
        );
    };

    // Log the progress as a percentage complete
    let on_next = |indexed_count: u64| {
        let total_files_to_index = TOTAL_FILES_TO_INDEX.get().unwrap();
        let total_files_to_index = total_files_to_index.lock().unwrap();
        // Compute the percentage complete as a float
        let percent_complete = (indexed_count as f64 / *total_files_to_index as f64) * 100.0;
        // Print the percentage complete to a precision of 2 decimal places
        print!(
            "{}/{} indexed {:.2}%\r",
            indexed_count, *total_files_to_index, percent_complete
        );
        io::stdout().flush().unwrap(); // Flush the output buffer
    };

    let on_finish = || {
        println!("\nFinished indexing.");
    };

    // Configure the server, directory, and the file extensions to index
    let config = PostConfig {
        host: String::from("localhost"),
        port: 8983,
        collection: String::from("my_collection"),
        directory_path: std::path::PathBuf::from("/var/www/html"),
        file_extensions: vec![
            String::from("html"),
            String::from("txt"),
            String::from("pdf"),
        ],
        ..Default::default()
    };

    solr_post(
        config,
        Some(Box::new(on_start)),
        Some(Box::new(on_next)),
        Some(Box::new(on_finish)),
    )
    .await;
}
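In this example, only the html, txt, and pdf files under /var/www/html are recursively indexed into the "my_collection" collection on a Solr server running locally (localhost:8983). Callbacks are also supplied to print progress information. Running it produces output similar to the following: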
Start indexing 152 files
152/152 indexed 100.00%
Finished indexing.
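The progress line in that output comes from a simple ratio of indexed files to total files. As a minimal sketch, the calculation can be pulled into a pure function that is easy to test. `format_progress` is a hypothetical name and not part of solr_post, and treating an empty directory as 100% complete is an assumption made here to avoid printing NaN.

```rust
/// Format a progress line like the one printed by the callback example.
/// Hypothetical helper for illustration; not part of solr_post.
fn format_progress(indexed_count: u64, total_files: u64) -> String {
    // Guard against an empty directory: 0.0 / 0.0 would otherwise yield NaN.
    // Reporting 100% in that case is an assumption of this sketch.
    let percent_complete = if total_files == 0 {
        100.0
    } else {
        (indexed_count as f64 / total_files as f64) * 100.0
    };
    // Two decimal places, matching the {:.2} format used in the example.
    format!("{}/{} indexed {:.2}%", indexed_count, total_files, percent_complete)
}
```

An on_next callback could then print `format_progress(indexed_count, total)` followed by `\r` and flush stdout, as the example above does.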
CLI usage
A binary is also included. Install it by running cargo install solr_post, then use it from the command line.
Usage: solr-post -c <collection> [-h <host>] [-p <port>] [--url <url>] [-u <user>] -d <directory> [-f <file-extensions>] [--concurrency <concurrency>] [-e <exclude-regex>] [-i <include-regex>]

Post files to a solr collection

Options:
  -c, --collection  the solr collection to post to
  -h, --host        the host of the solr server, defaults to localhost
  -p, --port        the port of the solr server, defaults to 8983
  --url             base Solr update URL, e.g.
                    http://localhost:8983/solr/my_collection/update. if this is
                    set, the collection, host, and port are ignored
  -u, --user        basic auth user credentials, e.g. "username:password"
  -d, --directory   the directory to search for files to post
  -f, --file-extensions
                    the file extensions to post, e.g. "html,txt,json". defaults to
                    xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
  --concurrency     concurrency level: the number of concurrent requests to make
                    to the solr server. defaults to 8
  -e, --exclude-regex
                    exclude files whose content contains this regex pattern, e.g.
                    "no_index". only files whose content does not contain this
                    pattern will be indexed. this is case insensitive. if both
                    exclude_regex and include_regex are set, exclude_regex takes
                    precedence.
  -i, --include-regex
                    include only files whose content contains this regex pattern,
                    e.g. "index_me". only files whose content contains this
                    pattern will be indexed. this is case insensitive. if both
                    exclude_regex and include_regex are set, exclude_regex takes
                    precedence.
  --help            display usage information
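The interaction between the include and exclude options can be sketched as a small decision function. This is an illustration only, not the crate's implementation: `should_index` is a hypothetical name, and case-insensitive substring matching stands in for real regex matching so that the sketch needs nothing beyond the standard library.

```rust
/// Decide whether a file's content passes the include/exclude filters.
/// Hypothetical helper: case-insensitive substring matching is used here
/// as a stand-in for regex matching; it is not the solr_post implementation.
fn should_index(content: &str, include: Option<&str>, exclude: Option<&str>) -> bool {
    let haystack = content.to_lowercase();
    let contains = |pattern: &str| haystack.contains(&pattern.to_lowercase());

    // Exclusion is checked first, so it takes precedence over inclusion.
    if exclude.map_or(false, |pattern| contains(pattern)) {
        return false;
    }
    // With no include pattern, everything that was not excluded is indexed.
    include.map_or(true, |pattern| contains(pattern))
}
```

A file containing both "index_me" and "no_index" is rejected when both patterns are set, which mirrors the precedence rule stated in the help text.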
Example
solr-post -c my_collection -d /var/www/html -f html,txt,pdf
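The -f value in this command is a comma-separated list of extensions. A minimal sketch of how such a list might be parsed and applied to file paths follows. Both function names are hypothetical, and the tolerance for spaces, leading dots, and mixed case is an assumption of this sketch rather than documented behavior of solr-post.

```rust
use std::path::Path;

/// Split a comma-separated list such as "html,txt,pdf" into lowercase extensions.
/// Hypothetical helper; trimming spaces and leading dots is an assumption.
fn parse_extensions(list: &str) -> Vec<String> {
    list.split(',')
        .map(|ext| ext.trim().trim_start_matches('.').to_lowercase())
        .filter(|ext| !ext.is_empty())
        .collect()
}

/// Return true if the path's extension is in the allowed list.
/// Case-insensitive comparison is an assumption of this sketch.
fn has_allowed_extension(path: &Path, allowed: &[String]) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_lowercase())
        .map_or(false, |ext| allowed.iter().any(|a| *a == ext))
}
```

With `parse_extensions("html,txt,pdf")` as the allowed list, a path such as /var/www/html/index.html passes, while a file with no extension or with an unlisted one is skipped.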
Dependencies
~12–25MB
~381K SLoC