9个版本 (稳定版)

2.0.0	2021年3月10日
1.2.1	2021年1月19日
1.1.0	2020年12月31日
1.0.0	2020年11月17日
0.1.3	2020年11月12日

#596 in 数据结构

每月57次下载

MIT许可证

100KB
2K SLoC

后缀集合

针对子串搜索、最长公共前缀数组（lcp数组）的后缀数组和后缀树的快速实现。

Unicode

当前实现使用字节构建后缀结构，构建和搜索过程中不解码Unicode字符串。但如果在构建和搜索之前将Unicode字符串标准化，则结构支持Unicode（因为所有字节序列都被无歧义地解码）。此外，搜索和lcp返回索引，如字节数组，而不是解码后的Unicode字符串。

示例

SuffixTree

 use suff_collections::{array::*, tree::*, lcp::*};

 let word: &str = "Some word";
 let find: &str = "word";

 // construct suffix tree
 let st: SuffixTree = SuffixTree::new(word);

 // finds the entry position of the line 'find' in 'word'
 let res: Option<usize> = st.find(find);

 // construct lcp
 // lcp[i] = max_pref(sa[i], sa[i - 1]) && lcp.len() == sa.len()
 // let lcp: LCP<u8> = st.lcp::<u8>();
 // let lcp: LCP<u16> = st.lcp::<u16>();
 // let lcp: LCP<u32> = st.lcp::<u32>();
 let lcp: LCP<usize> = st.lcp::<usize>();

 // convert suffix tree to suffix array
 // let sa = SuffixArray::<u8>::from(st);
 // let sa = SuffixArray::<u16>::from(st);
 // let sa = SuffixArray::<u32>::from(st);
 let sa = SuffixArray::<usize>::from(st);

 // construct online suffix tree
 let mut ost: OnlineSuffixTree = OnlineSuffixTree::new();

 // add word to online suffix tree
 ost.add(word);

 // finds the entry position of the line 'find' in 'word'
 let res: Option<usize> = ost.find(find);

 // convert online suffix tree to suffix tree
 let st: SuffixTree = ost.finish();

 let graph = SuffixTree::new("mississippi").to_graphviz(&mut buff);
 println!("{}", &graph);

"mississippi"单词的后缀树

SuffixArray

 use suff_collections::{array::*, tree::*, lcp::*};

 // let word = "Some word";
 let word: &str = "Some word\0";
 let find: &str = "word";

 // construct suffix array
 // let sa = SuffixArray::<usize>::new_stack(word);
 // let sa = SuffixArray::<u8>::new(word);
 // let sa = SuffixArray::<u16>::new(word);
 // let sa = SuffixArray::<u32>::new(word);
 let sa = SuffixArray::<usize>::new(word);

 // construct lcp
 // lcp[i] = max_pref(sa[i], sa[i - 1]) && lcp.len() == sa.len()
 let lcp: LCP<usize> = sa.lcp();

 // finds the entry position of the line 'find' in 'word'
 // O(|find| * log(|word|))
 let res: Option<usize> = sa.find(find);

 // finds all the entry position of the line 'find' in 'word'
 // O(|find| * log(|word|))
 let res_all: &[usize] = sa.find_all(find);

 // finds the entry position of the line 'find' in 'word'
 // O(|word|)
 let res: Option<usize> = sa.find_big(&sa.lcp(), find);

 // finds all the entry position of the line 'find' in 'word'
 // O(|word|)
 let res_all: &[usize] = sa.find_all_big(&sa.lcp(), find);

 // convert suffix array to suffix tree
 let st = SuffixTree::from(sa);

 let word = "mississippi";
 let sa = SuffixArray::<u32>::new(word);
 let lcp = sa.lcp();
 println!("LCP    index    suffixe's");
 sa.iter().zip(lcp.iter()).for_each(|(&sa, &lcp)| {
           let suff = std::str::from_utf8(&word.as_bytes()[sa as usize..]).unwrap();
           println!("{}    {}    {}", lcp, sa, suff);
      }
 );

"mississippi"单词的后缀数组和lcp

LCP    index    suffixe's
0      11
0      10       i
1      7        ippi
1      4        issippi
4      1        ississippi
0      0        mississippi
0      9        pi
1      8        ppi
0      6        sippi
2      3        sissippi
1      5        ssippi
3      2        ssissippi

所有构建和搜索操作均为O(n)。对于后缀树实现，采用Ukkonen算法，对于后缀数组实现，采用SA-IS算法。

no-std suff_collections

9个版本 (稳定版)

后缀集合

Unicode

示例

无运行时依赖