Rust in Practice | Implementing a Simple grep

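This hands-on project implements a simple grep-like program: it finds every line of a file that contains the target string and prints it.

Contents of the file to be searched: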

  • text.txt
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know!

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

1 Reading command-line arguments

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let query = &args[1];
    let filename = &args[2];

    println!("Search for {}", query);
    println!("In file {}", filename);
}
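
The comment above notes that env::args() cannot cope with arguments that are not valid Unicode. As a minimal sketch of one way around this, the standard library also offers env::args_os(), which yields OsString values. The helper name lossy_args below is hypothetical and not part of the original program; it replaces invalid sequences with U+FFFD instead of panicking.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical helper: convert raw OS arguments to Strings,
/// replacing any invalid Unicode with the U+FFFD replacement character.
fn lossy_args<I: IntoIterator<Item = OsString>>(raw: I) -> Vec<String> {
    raw.into_iter()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // env::args_os() does not panic on non-Unicode arguments.
    let args = lossy_args(env::args_os());
    println!("{:?}", args);
}
```

Whether lossy conversion is acceptable depends on the use case: a file name containing invalid Unicode would no longer match the file on disk after conversion.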

2 Reading the file contents

use std::env;
use std::fs;

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let query = &args[1];
    let filename = &args[2];

    println!("Search for {}", query);
    println!("In file {}", filename);

    let contents = fs::read_to_string(filename)
        .expect("Something went wrong when opening file!");

    println!("With text:\n{}", contents);
}

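This program appears to do the job, but it has several flaws:

  1. main does too many things. It is better to split the code into several functions, each responsible for a single task, so that the code stays maintainable as it grows;
  2. the error handling for opening the file is neither specific nor flexible enough;
  3. query and filename are related and belong together in a struct;
  4. when the program panics, the error output is hard for a user to understand.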

3 Refactoring the program

3.1 Improving modularity

Extract a function:

use std::env;
use std::fs;

fn main() {
    let args: Vec<String> = env::args().collect();

    let (query, filename) = parse_config(&args);

    println!("Search for {}", query);
    println!("In file {}", filename);

    let contents = fs::read_to_string(filename)
        .expect("Something went wrong when opening file!");

    println!("With text:\n{}", contents);
}

fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let filename = &args[2];

    (query, filename)
}

Use a struct and a constructor function:

use std::env;
use std::fs;

struct Config {
    query: String,
    filename: String,
}

impl Config {
    fn new(args: &[String]) -> Config {
        let query = args[1].clone();
        let filename = args[2].clone();

        Config { query, filename }
    }
}

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let config = Config::new(&args);

    println!("Search for {}", config.query);
    println!("In file {}", config.filename);

    let contents =
        fs::read_to_string(config.filename).expect("Something went wrong when opening file!");

    println!("With text:\n{}", contents);
}

3.2 Error handling

Handle the case of too few arguments:

impl Config {
    fn new(args: &[String]) -> Config {
        if args.len() < 3 {
            panic!("Not enough arguments");
        }
        let query = args[1].clone();
        let filename = args[2].clone();

        Config { query, filename }
    }
}

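The program above does print an error message, but a panic message still carries a lot of information that is just noise to the user. Returning a Result from new lets main print a short message and exit instead: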

use std::{env, process, fs};

struct Config {
    query: String,
    filename: String,
}

impl Config {
    fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        let query = args[1].clone();
        let filename = args[2].clone();

        Ok(Config { query, filename })
    }
}

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let config = Config::new(&args).unwrap_or_else(
        |err| {
            println!("Problem parsing arguments: {}", err);
            process::exit(1);
        }
    );

    println!("Search for {}", config.query);
    println!("In file {}", config.filename);

    let contents =
        fs::read_to_string(config.filename).expect("Something went wrong when opening file!");

    println!("With text:\n{}", contents);
}
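
To see the Result-based constructor in isolation, here is a small sketch that calls it with hand-built argument lists instead of the real command line. The lists too_few and enough are made up for illustration, and Debug is derived only so the result can be inspected.

```rust
#[derive(Debug)]
struct Config {
    query: String,
    filename: String,
}

impl Config {
    // Same logic as the listing above: report a problem instead of panicking.
    fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        Ok(Config {
            query: args[1].clone(),
            filename: args[2].clone(),
        })
    }
}

fn main() {
    // Hypothetical argument lists, standing in for env::args().collect().
    let too_few = vec![String::from("my_grep")];
    let enough = vec![
        String::from("my_grep"),
        String::from("body"),
        String::from("text.txt"),
    ];

    match Config::new(&too_few) {
        Ok(_) => println!("unexpectedly parsed"),
        Err(e) => println!("Problem parsing arguments: {}", e),
    }

    let config = Config::new(&enough).expect("three arguments were supplied");
    println!("Search for {} in {}", config.query, config.filename);
}
```

Because the error is an ordinary value, the caller decides what to do with it: print and exit, retry, or assert on it in a test.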

Handling errors from the application logic:

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let config = Config::new(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    println!("Search for {}", config.query);
    println!("In file {}", config.filename);

    if let Err(e) = my_grep::run(config) {
        println!("Application error: {}", e);
        process::exit(1);
    }
}
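
The if let Err(e) pattern only works if the function being called returns a Result. As a rough sketch of that idea, the hypothetical helper read_contents below propagates an I/O error with the ? operator; the file name is an assumption, chosen because it should not exist.

```rust
use std::error::Error;
use std::fs;

// Hypothetical helper: returns the file contents, or hands any I/O error
// back to the caller instead of panicking.
fn read_contents(path: &str) -> Result<String, Box<dyn Error>> {
    let contents = fs::read_to_string(path)?; // early return on failure
    Ok(contents)
}

fn main() {
    // Assumed not to exist, so the Err branch is taken.
    if let Err(e) = read_contents("this/path/should/not/exist.txt") {
        println!("Application error: {}", e);
    }
}
```

The exact wording of the printed error comes from the operating system, so it differs between platforms.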

3.3 Test-driven development (TDD)

  • lib.rs
use std::error::Error;
use std::fs;

pub struct Config {
    pub query: String,
    pub filename: String,
}

impl Config {
    pub fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        let query = args[1].clone();
        let filename = args[2].clone();

        Ok(Config { query, filename })
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // ? propagates the failure; Box<dyn Error> accepts any error type that implements the Error trait
    let contents = fs::read_to_string(config.filename)?;
    for line in search(&config.query, &contents) {
        println!("{}", line);
    }
    Ok(())
}

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            result.push(line);
        }
    }

    result
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn one_result() {
        let query = "duct";
        let content = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, content));
    }
}

Run the program from the terminal: cargo run body text.txt
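
As a quick check of what that command should find, the sketch below runs the same search function over the poem from text.txt, embedded as a constant (named POEM here only for illustration) so that no file is needed.

```rust
// Same search function as in lib.rs.
fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            result.push(line);
        }
    }
    result
}

// The contents of text.txt, embedded for the example.
const POEM: &str = "\
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know!

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!";

fn main() {
    // Prints the three lines that contain "body":
    //   I'm nobody! Who are you?
    //   Are you nobody, too?
    //   How dreary to be somebody!
    for line in search("body", POEM) {
        println!("{}", line);
    }
}
```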

4 Using environment variables

Use an environment variable to switch between case-sensitive and case-insensitive search:

use std::error::Error;
use std::{env, fs};

pub struct Config {
    pub query: String,
    pub filename: String,
    pub case_sensitive: bool,
}

impl Config {
    pub fn new(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        let query = args[1].clone();
        let filename = args[2].clone();
        let case_sensitive = env::var("CASE_INSENSITIVE").is_err(); // only whether the variable is set matters, not its value

        Ok(Config { query, filename, case_sensitive })
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // ? propagates the failure; Box<dyn Error> accepts any error type that implements the Error trait
    let contents = fs::read_to_string(config.filename)?;
    
    let results = if config.case_sensitive {
        search(&config.query, &contents)
    } else {
        search_case_insensitive(&config.query, &contents)
    };
    
    for line in results {
        println!("{}", line);
    }
    Ok(())
}

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            result.push(line);
        }
    }

    result
}

pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    let query = query.to_lowercase();   // allocates a new lowercase String that shadows the borrowed query

    for line in contents.lines() {
        if line.to_lowercase().contains(&query) {
            result.push(line);
        }
    }

    result
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn case_sensitive() {
        let query = "duct";
        let content = "\
Rust:
safe, fast, productive.
Duct three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, content));
    }

    #[test]
    fn case_insensitive() {
        let query = "duct";
        let content = "\
Rust:
safe, fast, productive.
Duct three.";

        assert_eq!(
            vec!["safe, fast, productive.", "Duct three."],
            search_case_insensitive(query, content)
        );
    }
}

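I am on macOS, so I prefix the command with env CASE_INSENSITIVE=1 to set the variable temporarily. It applies to that one command only and is gone once the command finishes. To keep it in effect for the rest of the current terminal session, set the variable in the shell first and then run the program, chaining the two with && (which runs the next command only if the previous one succeeded).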
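
Because the program only checks whether the variable is present, the value itself is ignored. The sketch below isolates that check in a hypothetical helper, case_sensitive_from, so the behaviour can be seen without touching the real environment.

```rust
use std::env::{self, VarError};

// Hypothetical helper: mirrors `env::var("CASE_INSENSITIVE").is_err()`.
// Any successfully read value, even "0" or an empty string, counts as "set".
fn case_sensitive_from(var: Result<String, VarError>) -> bool {
    var.is_err()
}

fn main() {
    let case_sensitive = case_sensitive_from(env::var("CASE_INSENSITIVE"));
    println!("case_sensitive = {}", case_sensitive);
}
```

One consequence worth knowing: env::var also returns an error when the value is not valid Unicode, so with this check such a value is treated the same as an unset variable.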

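5 Redirecting standard output, and standard error

Use the eprintln! macro to write error messages to standard error, so that they still appear in the terminal, while > redirects standard output (whatever println! prints) to a file: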

fn main() {
    let args: Vec<String> = env::args().collect(); // collects into a Vec; panics if an argument is not valid Unicode

    let config = Config::new(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    if let Err(e) = my_grep::run(config) {
        eprintln!("Application error: {}", e);
        process::exit(1);
    }
}
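
To make the split between the two streams concrete, here is a small sketch that writes results and problems to two separate writers. The function report and its parameters are hypothetical; in main they are wired to the real stdout and stderr, which is roughly what println! and eprintln! do.

```rust
use std::io::{self, Write};

// Hypothetical helper: matches go to `out`, problems go to `err`.
fn report<O: Write, E: Write>(
    out: &mut O,
    err: &mut E,
    matches: &[&str],
    problem: Option<&str>,
) -> io::Result<()> {
    for line in matches {
        writeln!(out, "{}", line)?;
    }
    if let Some(msg) = problem {
        writeln!(err, "Application error: {}", msg)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // With `> output.txt`, only the first stream ends up in the file.
    report(
        &mut io::stdout(),
        &mut io::stderr(),
        &["Are you nobody, too?"],
        Some("example problem"),
    )
}
```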

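6 Improving the code with iterators and closures

6.1 Reading the command-line arguments through an iterator

In fact, env::args() returns a std::env::Args, which implements the Iterator trait, so it is itself an iterator. new can therefore take this iterator directly as its parameter and consume it to take ownership of the values it yields, which avoids the clone() calls.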

    let config = Config::new(env::args()).unwrap_or_else(|err| {    
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

impl Config {
    pub fn new(mut args: env::Args) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        args.next();

        let query = match args.next() {
            Some(v) => v,
            None => {
                return Err("Can't get query string!");
            }
        };
        let filename = match args.next() {
            Some(v) => v,
            None => {
                return Err("Can't get file name!");
            }
        };
        let case_sensitive = env::var("CASE_INSENSITIVE").is_err(); // only whether the variable is set matters, not its value

        Ok(Config {
            query,
            filename,
            case_sensitive,
        })
    }
}
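
A constructor that takes env::Args is awkward to exercise in a test, because an Args value can only come from the real command line. One possible variation, shown here as a sketch, accepts any iterator of String. The name from_iter_args is hypothetical, and the case_sensitive field is left out to keep the example short.

```rust
#[derive(Debug)]
struct Config {
    query: String,
    filename: String,
}

impl Config {
    // Hypothetical variant of `new`: works with env::args() and with test data.
    fn from_iter_args(mut args: impl Iterator<Item = String>) -> Result<Config, &'static str> {
        args.next(); // skip the program name

        // ok_or turns a missing argument into an error; ? returns it early.
        let query = args.next().ok_or("Can't get query string!")?;
        let filename = args.next().ok_or("Can't get file name!")?;

        Ok(Config { query, filename })
    }
}

fn main() {
    // Made-up arguments standing in for env::args().
    let fake = ["my_grep", "body", "text.txt"].iter().map(|s| s.to_string());
    let config = Config::from_iter_args(fake).expect("three arguments were supplied");
    println!("Search for {} in {}", config.query, config.filename);
}
```

Since each missing argument already produces its own error, the up-front length check is not needed in this variant.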

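6.2 Collecting the matches with an iterator, filter, and a closure

In the search functions, contents.lines() also returns an iterator. Its filter() method takes a closure and returns an iterator over only the items that satisfy the condition, and collect() turns that iterator into a collection.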

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    // let mut result = Vec::new();
    //
    // for line in contents.lines() {
    //     if line.contains(query) {
    //         result.push(line);
    //     }
    // }
    //
    // result

    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    // let mut result = Vec::new();
    // let query = query.to_lowercase(); // allocates a new lowercase String that shadows the borrowed query
    //
    // for line in contents.lines() {
    //     if line.to_lowercase().contains(&query) {
    //         result.push(line);
    //     }
    // }
    //
    // result

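    let query = query.to_lowercase(); // lowercase the query once, not once per line

    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query)) // &String derefs to the &str pattern that contains() accepts
        .collect()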
}
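
To confirm that the rewrite does not change behaviour, the sketch below keeps both versions side by side under the hypothetical names search_loop and search_iter and compares their results on the test text used earlier.

```rust
// The original version with an explicit loop and a mutable Vec.
fn search_loop<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut result = Vec::new();
    for line in contents.lines() {
        if line.contains(query) {
            result.push(line);
        }
    }
    result
}

// The iterator version: filter keeps matching lines, collect builds the Vec.
fn search_iter<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

fn main() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.";
    assert_eq!(search_loop("duct", contents), search_iter("duct", contents));
    println!("{:?}", search_iter("duct", contents)); // ["safe, fast, productive."]
}
```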

Source code of the program after these improvements:

  • main.rs
use std::{env, process};
use my_grep::Config;

fn main() {
    let config = Config::new(env::args()).unwrap_or_else(|err| {    
        eprintln!("Problem parsing arguments: {}", err);
        process::exit(1);
    });

    if let Err(e) = my_grep::run(config) {
        eprintln!("Application error: {}", e);
        process::exit(1);
    }
}
  • lib.rs
use std::error::Error;
use std::{env, fs};

pub struct Config {
    pub query: String,
    pub filename: String,
    pub case_sensitive: bool,
}

impl Config {
    pub fn new(mut args: env::Args) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("Not enough arguments!");
        }
        args.next();

        let query = match args.next() {
            Some(v) => v,
            None => {
                return Err("Can't get query string!");
            }
        };
        let filename = match args.next() {
            Some(v) => v,
            None => {
                return Err("Can't get file name!");
            }
        };
        let case_sensitive = env::var("CASE_INSENSITIVE").is_err(); // only whether the variable is set matters, not its value

        Ok(Config {
            query,
            filename,
            case_sensitive,
        })
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // ? propagates the failure; Box<dyn Error> accepts any error type that implements the Error trait
    let contents = fs::read_to_string(config.filename)?;

    let results = if config.case_sensitive {
        search(&config.query, &contents)
    } else {
        search_case_insensitive(&config.query, &contents)
    };

    for line in results {
        println!("{}", line);
    }
    Ok(())
}

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
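    let query = query.to_lowercase(); // lowercase the query once, not once per line

    contents
        .lines()
        .filter(|line| line.to_lowercase().contains(&query)) // &String derefs to the &str pattern that contains() accepts
        .collect()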
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn case_sensitive() {
        let query = "duct";
        let content = "\
Rust:
safe, fast, productive.
Duct three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, content));
    }

    #[test]
    fn case_insensitive() {
        let query = "duct";
        let content = "\
Rust:
safe, fast, productive.
Duct three.";

        assert_eq!(
            vec!["safe, fast, productive.", "Duct three."],
            search_case_insensitive(query, content)
        );
    }
}
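
Repost notice:

Unless otherwise stated, all articles on this site are original work by debussy and are licensed under CC BY-NC-SA 4.0. Please credit the source when reposting: Include Everything 的博客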