11次发布
0.1.12 | 2024年6月1日 |
---|---|
0.1.11 | 2024年5月11日 |
0.1.7 | 2024年4月27日 |
#1036 in 解析器实现
每月102次下载
88KB
1.5K SLoC
XlsxBatchReader
一个纯Rust实现的Excel/OpenDocument电子表格文件批处理读取器。此crate支持Office 2007或更高版本文件格式(xlsx, xlsm等)。与其他Excel文件读取crate最明显的区别是它不会将整个文件读入内存,而是按批次读取。因此,它可以保持低内存使用,尤其是在读取大文件时。此crate支持日期和时间识别,以及获取合并单元格范围。为了提高速度,它仅支持读取数据,不支持公式和其他样式。
示例
- 简单读取器
use xlsx_batch_reader::{get_ord_from_tuple, read::XlsxBook, MAX_COL_NUM};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut book = XlsxBook::new("xlsx/test.xlsx", true)?;
for shname in book.get_visible_sheets().clone() {
// left_ncol should not be 0
// the tail empty rows will be ignored, if you want the length of cells in each row is fixed, you can set right_ncol to a number not MAX_COL_NUM
let sheet = book.get_sheet_by_name(&shname, 100, 0, 1, MAX_COL_NUM, false)?;
for batch in sheet {
let (rows_nums, rows_data) = batch?;
// empty rows will be skiped
for (row, cells) in rows_nums.into_iter().zip(rows_data) {
for (col, cel) in cells.into_iter().enumerate() {
// supprted types: String, i64, f64, bool, Date32, Timestamp(v0.1.4), NaiveDate, NaiveDateTime(v0.1.2), NaiveTime(v0.1.2)
let val: String = cel.get()?.unwrap();
println!("the value of {} is {val}; raw cell is {:?}", get_ord_from_tuple(row, (col+1) as u16)?, cel);
}
}
};
}
Ok(())
}
可能的输出
the value of A1 is a; raw cell is Shared("a")
the value of B1 is ; raw cell is Blank
the value of C1 is c; raw cell is Shared("c")
the value of D1 is d; raw cell is Shared("d")
the value of A2 is 1; raw cell is Number(1.0)
the value of B2 is ; raw cell is Blank
the value of C2 is s; raw cell is Shared("s")
the value of A4 is 2024-01-04; raw cell is Date(45295.58405092593)
the value of B4 is ; raw cell is Blank
the value of C4 is 4; raw cell is Number(4.0)
- 合并范围
use xlsx_batch_reader::{get_num_from_ord, is_merged_cell, read::XlsxBook};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut book = XlsxBook::new("xlsx/test.xlsx", true)?;
for shname in book.get_visible_sheets().clone() {
// left_ncol should not be 0
// each row will have 3 cells.
let mut sheet = book.get_sheet_by_name(&shname, 100, 0, 1, get_num_from_ord("C".as_bytes())?, true)?;
// this is not necessary, if you don't care about the headers.
let (_, _header) = sheet.get_header_row()?;
if let Some((_rows_nums, _rows_data)) = sheet.get_remaining_cells()? {
// some code
};
// should be called when all data have been scaned.
let merged_rngs = sheet.get_merged_ranges()?;
match is_merged_cell(merged_rngs, 2, get_num_from_ord("A".as_bytes())?) {
(true, None) => {
println!("a merged cell(not top left cell)");
},
(true, Some((nrow, ncol))) => {
println!("a merged cell(top left cell), taking {nrow} row(s) and {ncol} column(s)");
},
_ => {
println!("not a merged cell");
}
}
}
Ok(())
}
可能的输出
a merged cell(top left cell), taking 2 row(s) and 2 column(s)
- 读取日期和时间
use chrono::{NaiveDate, NaiveDateTime, NaiveTime};
use xlsx_batch_reader::{read::XlsxBook, Timestamp, MAX_COL_NUM};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut book = XlsxBook::new("xlsx/test.xlsx", true)?;
for shname in book.get_visible_sheets().clone() {
// left_ncol should not be 0
// the tail empty cells will be ignored, if you want the length of cells in each row is fixed, you can set right_ncol to a number not MAX_COL_NUM
let mut sheet = book.get_sheet_by_name(&shname, 100, 3, 1, MAX_COL_NUM, false)?;
if let Some((_, rows_data)) = sheet.get_remaining_cells()? {
let row = &rows_data[0];
let val_dt: NaiveDate = row[0].get()?.unwrap();
let val_tm: NaiveTime = row[0].get()?.unwrap();
let val_dttm: NaiveDateTime = row[0].get()?.unwrap();
let val_stamp: Timestamp = row[0].get()?.unwrap(); // since v0.1.4
println!("date:{}\ntime:{}\ndatetime:{}\ntimestamp:{}", val_dt, val_tm, val_dttm, val_stamp.utc());
};
}
Ok(())
}
可能的输出
date:2024-01-04
time:14:01:02
datetime:2024-01-04 14:01:02
timestamp:1704376862
- 缓存读取器(应启用缓存功能)
use xlsx_batch_reader::{read::XlsxBook, MAX_COL_NUM};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut book = XlsxBook::new("xlsx/test.xlsx", true)?;
for shname in book.get_visible_sheets().clone() {
// left_ncol should not be 0
// iter_batch will be supported in the future
// the tail empty cells will be ignored, if you want the length of cells in each row is fixed, you can set right_ncol to a number not MAX_COL_NUM
let sheet = book.get_cached_sheet_by_name(&shname, 100, 1, 1, MAX_COL_NUM, false)?;
println!("sheet: {}, row_ranges: {:?}, col_ranges: {:?}", sheet.sheet_name(), sheet.row_range(), sheet.column_range());
let (_, merge_info) = sheet.get_cell_value_with_merge_info("B2")?;
match merge_info {
(true, None) => {
println!("B2 is a merged cell(not top left cell)");
},
(true, Some((nrow, ncol))) => {
println!("B2 is a merged cell(top left cell), taking {nrow} row(s) and {ncol} column(s)");
},
_ => {
println!("B2 is not a merged cell");
}
};
let a4 = sheet.get_cell_value("A4")?;
println!("A4={:?}", a4);
}
Ok(())
}
可能的输出
sheet: Sheet1, row_ranges: (2, 4), col_ranges: (1, 4)
B2 is a merged cell(not top left cell)
A4=Date(45295.58405092593)
- 简单批处理写入器(应启用xlsxwriter功能)
use xlsx_batch_reader::{get_num_from_ord, read::XlsxBook, write::XlsxWriter};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut writer = XlsxWriter::new();
let mut book = XlsxBook::new("xlsx/test.xlsx", true)?;
for shname in book.get_visible_sheets().clone() {
// left_ncol should not be 0
// each row will have 3 cells.
let mut sheet = book.get_sheet_by_name(&shname, 100, 0, 1, get_num_from_ord("C".as_bytes())?, true)?;
// the sheet name will be write at the begin of each row
let pre_cells = vec![shname];
if let Some((rows_nums, rows_data)) = sheet.get_remaining_cells()? {
writer.append_rows("sheet", rows_nums, rows_data, &pre_cells)?;
// if you don't want row numbers to be writed before data, set nrows = vec![];
};
};
writer.save_as("xlsx/out.xlsx")?;
Ok(())
}
- 按列名写入行(应启用xlsxwriter功能)
use std::collections::HashMap;
use xlsx_batch_reader::write::XlsxWriter;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut writer = XlsxWriter::new();
// if you have many sheets call this for each sheet
writer.with_columns("Sheet1".to_string(), vec!["A".to_string(), "B".to_string(), "C".to_string(), "D".to_string()], true);
let row: HashMap<String, i32> = vec![("A".to_string(), 1), ("C".to_string(), 3)].into_iter().collect();
writer.append_row_by_name("Sheet1", row)?;
let row1: HashMap<String, &str> = vec![("A".to_string(), "A3"), ("B".to_string(), "B3"), ("D".to_string(), "D3")].into_iter().collect();
let row2: HashMap<String, &str> = vec![("B".to_string(), "B4"), ("C".to_string(), "C4")].into_iter().collect();
writer.append_rows_by_name("Sheet1", vec![row1, row2])?;
writer.save_as("xlsx/out.xlsx")?;
Ok(())
}
功能
功能 | 描述 |
---|---|
cached |
启用缓存xlsx工作表(在创建时将所有工作表数据读入内存) |
xlsxwriter |
启用简单xlsx写入器(依赖于rust_xlsxwriter crate) |
full |
启用功能cached和xlsxwriter |
待办事项
- 获取包含额外信息的单元格
依赖项
~7–14MB
~152K SLoC