The hash-object command

We will want to put our own data in our repositories, though. hash-object is basically the opposite of cat-file: it reads a file and computes its hash as an object, either storing it in the repository (if the -w flag is passed) or just printing its hash.
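Git does not hash the file's bytes alone. It hashes a short header followed by the content. The header is the object type, a space, the content length in bytes, and a NUL byte. Below is a minimal std-only sketch of building those bytes. `object_bytes` is a hypothetical helper used only for illustration; it is not part of this project:

```rust
/// Builds the raw bytes Git hashes for an object:
/// "<type> <length in bytes>\0<content>".
/// `object_bytes` is a hypothetical helper, used here only for illustration.
pub fn object_bytes(object_type: &str, content: &[u8]) -> Vec<u8> {
    // Header: type name, a space, the content length in decimal, then NUL.
    let mut bytes = format!("{} {}\0", object_type, content.len()).into_bytes();
    // The content follows the header unchanged.
    bytes.extend_from_slice(content);
    bytes
}
```

For a file containing `hello\n`, this produces `blob 6\0hello\n`. The SHA-1 of those bytes is the ID that hash-object prints.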

As before, we will refactor the code again after this step. Refactoring keeps the code easy to extend as we add more features.

Add hash-object to argument parser

Let's add a new Command entry:

src/cli/mod.rs
#[derive(Debug)]
pub enum Command {
    Init {
        path: String,
    },
    CatFile {
        object_type: GitObjectType,
        object_hash: String,
    },
    HashObject {
        object_type: GitObjectType,
        filename: String,
        write: bool,
    },
}

If write is set (i.e. -w or --write is passed to the application), we also store the object in the repository. In both cases, we print its hash.

Now let's update parse_args():

pub fn parse_args() -> Result<Command, ParseArgumentsError> {
    let matches = command!()
        .subcommand(
            ClapCommand::new("init").arg(Arg::new("path").value_name("PATH").required(true)),
        )
        .subcommand(
            ClapCommand::new("cat-file")
                .arg(Arg::new("type").value_name("TYPE").required(true))
                .arg(Arg::new("object").value_name("OBJECT").required(true)),
        )
        .subcommand(
            ClapCommand::new("hash-object")
                .about("Compute object ID and optionally create a blob from a file")
                .arg(
                    Arg::new("write")
                        .short('w')
                        .long("write")
                        .action(ArgAction::SetTrue),
                )
                .arg(
                    Arg::new("type")
                        .value_name("TYPE")
                        .short('t')
                        .long("type")
                        .default_value("blob"),
                )
                .arg(Arg::new("file").value_name("FILE").required(true)),
        )
        .get_matches();

To make write a boolean flag, we give it the ArgAction::SetTrue action. We also set a default value of blob for type. Finally, we add another else if branch that handles the hash-object subcommand:

    } else if let Some(subcommand) = matches.subcommand_matches("hash-object") {
        let filename: String = subcommand.get_one::<String>("file").unwrap().clone();
        let object_type = subcommand.get_one::<String>("type").unwrap();
        let write = subcommand.get_flag("write");
        Ok(Command::HashObject {
            filename,
            object_type: object_type.parse()?,
            write,
        })
    } else {
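The `object_type.parse()?` call only compiles if GitObjectType implements FromStr and its error converts into ParseArgumentsError. Presumably this was already set up for cat-file. For reference, here is a minimal std-only sketch of such an implementation. The enum is a stand-in for the project's own, and `UnknownObjectType` is a hypothetical error type. In the project you would also need a `From<UnknownObjectType> for ParseArgumentsError` conversion for `?` to work:

```rust
use std::fmt;
use std::str::FromStr;

/// Stand-in for the tutorial's GitObjectType enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GitObjectType {
    Commit,
    Tree,
    Tag,
    Blob,
}

/// Hypothetical error type returned for an unknown type name.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownObjectType(pub String);

impl fmt::Display for UnknownObjectType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown object type: {}", self.0)
    }
}

impl std::error::Error for UnknownObjectType {}

impl FromStr for GitObjectType {
    type Err = UnknownObjectType;

    // Maps the value of the -t/--type argument to an enum variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "commit" => Ok(GitObjectType::Commit),
            "tree" => Ok(GitObjectType::Tree),
            "tag" => Ok(GitObjectType::Tag),
            "blob" => Ok(GitObjectType::Blob),
            other => Err(UnknownObjectType(other.to_string())),
        }
    }
}
```

Because type defaults to blob in the parser, `hash-object FILE` without -t takes the `"blob"` branch.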

Now let's update main.rs:

src/main.rs
fn main() -> Result<()> {
    let command = parse_args().unwrap();
    match command {
        Command::Init { path } => {
            GitRepository::create(path)?;
        }
        Command::CatFile {
            object_type,
            object_hash,
        } => {
            cmd_cat_file(object_type, object_hash)?;
        }
        Command::HashObject {
            object_type,
            filename,
            write,
        } => {
            cmd_hash_object(filename, object_type, write)?;
        }
    };

    Ok(())
}

If the command is HashObject, we call the cmd_hash_object function:

fn cmd_hash_object(
    filename: String,
    object_type: git_object::GitObjectType,
    write: bool,
) -> Result<()> {
    let hash = if write {
        let current_directory = std::env::current_dir()?;
        let repo = GitRepository::find(&current_directory)?;
        git_object::write(repo, filename, object_type)?
    } else {
        let (hash, _) = git_object::create(filename, object_type)?;
        hash
    };

    println!("{hash}");

    Ok(())
}

If the write flag is set, we find the repository that contains the current directory and write the object into it. Otherwise, we only create the object to compute its hash. In both cases, we print the hash.
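GitRepository::find was implemented earlier in the series. Assume it follows the usual approach: walk up from the starting directory until reaching one that contains a `.git` directory. The search itself can then be sketched with std alone. `find_repo_root` is a hypothetical name, and it returns only the path rather than a GitRepository:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for GitRepository::find: checks `start` and then
/// each of its ancestors, returning the first directory that contains a
/// `.git` subdirectory, or None if no repository is found.
pub fn find_repo_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| dir.join(".git").is_dir())
        .map(Path::to_path_buf)
}
```

Walking up the ancestors is what would let `hash-object -w` work from any subdirectory of the working tree.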

write Function

Add a new write function to the git_object module. It calls the create function, which we'll add next:

src/git_object.rs
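// Assumes these imports at the top of src/git_object.rs:
// use std::io::Read;
// use anyhow::Context;
// use flate2::{read::ZlibEncoder, Compression};
pub fn write(
    repo: GitRepository,
    filename: String,
    object_type: GitObjectType,
) -> Result<String, ObjectCreateError> {
    // Build the object (header + content) and compute its hash.
    let (hash, data) = create(filename, object_type)?;
    // Convert the hash to the path of the object file.
    let file_path = repo.directory_manager.sha_to_file_path(&hash);

    // Create the directory that will contain the object file.
    std::fs::create_dir_all(
        file_path
            .parent()
            .context("Failed to get the parent directory")?,
    )?;

    // Compress the object data with zlib and write it to disk.
    let mut z = ZlibEncoder::new(data.as_bytes(), Compression::fast());
    let mut buffer = Vec::new();
    z.read_to_end(&mut buffer)?;
    std::fs::write(file_path, buffer)?;
    Ok(hash)
}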

First, we call create to build the object. It returns the object's hash and its data (the header followed by the serialized content).

Next, we convert the hash to the object's file path using sha_to_file_path, which we implemented earlier.

Then we create the directory that will contain the object file, since it may not exist yet.

Finally, we compress the object data with zlib, write it to the file, and return the hash.
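sha_to_file_path was implemented earlier. It presumably mirrors Git's loose-object layout. In that layout, the first two hex digits of the hash name a subdirectory of `.git/objects`, and the remaining 38 digits name the file. Here is a minimal sketch of that mapping. `object_path` is a hypothetical name, and the sketch assumes a full 40-character hex hash:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical equivalent of sha_to_file_path: a loose object lives at
/// <git_dir>/objects/<first 2 hex digits>/<remaining 38 hex digits>.
/// Assumes `sha` is an ASCII hex string of at least 2 characters.
pub fn object_path(git_dir: &Path, sha: &str) -> PathBuf {
    let (dir, file) = sha.split_at(2);
    git_dir.join("objects").join(dir).join(file)
}
```

This layout is also why write calls create_dir_all on the parent path: the two-digit directory may not exist yet.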

create function

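Now let's add the create function. First, we read the file and, based on the object type, call GitObject::serialize to get the serialized data. Only blobs are handled for now; the other types are left as todo!().

Next, we write the header and then the serialized data into a Vec<u8> through a BufWriter. The header is the object type, a space, and the length of the serialized data, followed by a NUL byte. Note that the header uses object_type.to_string(), so GitObjectType must implement ToString. We get that for free by implementing Display in the next section.

Finally, we compute the SHA-1 hash of the whole buffer with the sha1_smol crate and return the hash together with the object data.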

src/git_object.rs
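// Assumes these imports at the top of src/git_object.rs:
// use std::io::{BufWriter, Write};
// use anyhow::Context;
pub fn create(
    filename: String,
    object_type: GitObjectType,
) -> Result<(String, String), ObjectCreateError> {
    // Read the input file and serialize it according to its type.
    let input_data = std::fs::read_to_string(filename)?;
    let serialized = match object_type {
        // Only blobs are supported for now.
        GitObjectType::Commit => todo!(),
        GitObjectType::Tree => todo!(),
        GitObjectType::Tag => todo!(),
        GitObjectType::Blob => {
            let object = BlobObject { blob: input_data };
            object.serialize()
        }
    };

    // Write "<type> <size in bytes>\0<data>" into an in-memory buffer.
    let buffer = Vec::<u8>::new();
    let mut buf_writer = BufWriter::new(buffer);
    write!(
        buf_writer,
        "{} {}\x00{}",
        object_type.to_string(),
        serialized.len(),
        serialized
    )?;

    // Flush and take the Vec<u8> back out of the BufWriter.
    buf_writer.flush()?;
    let buffer = buf_writer
        .into_inner()
        .context("Failed to take buffer out of buf writer")?;
    // The object ID is the SHA-1 of header + data, as a hex string.
    let hash = sha1_smol::Sha1::from(&buffer).hexdigest();
    Ok((hash, String::from_utf8(buffer)?))
}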
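To see what sha1_smol computes over the header-plus-content buffer, here is a compact std-only SHA-1 written for illustration only. This is a from-scratch implementation of the standard algorithm, not sha1_smol's code; in the project, keep using the crate. The names `sha1_hex` and `git_blob_hash` are hypothetical:

```rust
/// Compact SHA-1 (FIPS 180-4) using only std, for illustration.
/// The project itself uses the sha1_smol crate instead.
pub fn sha1_hex(data: &[u8]) -> String {
    // Initial hash state defined by the standard.
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: append 0x80, then zero bytes until the length is 56 mod 64,
    // then the original message length in bits as a big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    // Process the padded message in 512-bit (64-byte) blocks.
    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words, expanded to 80.
        let mut w = [0u32; 80];
        for (i, word) in block.chunks_exact(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let [mut a, mut b, mut c, mut d, mut e] = h;
        for (i, &wi) in w.iter().enumerate() {
            // The round function and constant change every 20 rounds.
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }

        // Add this block's result to the running hash state.
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }

    // Concatenate the five words as 40 lowercase hex digits.
    h.iter().map(|v| format!("{:08x}", v)).collect()
}

/// Hypothetical helper: the object ID Git assigns to a blob with `content`.
pub fn git_blob_hash(content: &[u8]) -> String {
    let mut object = format!("blob {}\0", content.len()).into_bytes();
    object.extend_from_slice(content);
    sha1_hex(&object)
}
```

`git_blob_hash(b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, and `git_blob_hash(b"hello\n")` gives `ce013625030ba8dba906f756967f9e9ca394464a`. These are the IDs `git hash-object` reports for an empty file and for a file containing `hello\n`, so they are handy for checking your own output.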

Don't forget to add the sha1_smol crate to the [dependencies] section of your Cargo.toml file:

sha1_smol = { version = "1.0.0", features = ["std"] }

Implement Display for GitObjectType

According to the Rust documentation: "Implementing this trait for a type will automatically implement the ToString trait for the type, allowing the usage of the .to_string() method. Prefer implementing the Display trait for a type, rather than ToString."

src/git_object.rs
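impl std::fmt::Display for GitObjectType {
    // Format each variant as the lowercase name used in object headers.
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let string = match self {
            GitObjectType::Commit => "commit",
            GitObjectType::Tree => "tree",
            GitObjectType::Tag => "tag",
            GitObjectType::Blob => "blob",
        };

        write!(f, "{}", string)
    }
}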

Add ObjectCreateError

src/error.rs
#[derive(Debug, Error)]
pub enum ObjectCreateError {
    #[error(transparent)]
    Utf8Error(#[from] std::string::FromUtf8Error),

    #[error(transparent)]
    IoError(#[from] std::io::Error),

    #[error(transparent)]
    UnexpectedError(#[from] anyhow::Error),
}
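The `#[from]` attributes (from the thiserror crate) let `?` convert `std::io::Error` and `FromUtf8Error` into ObjectCreateError inside create and write. `#[error(transparent)]` reuses the wrapped error's message and source. The following is a rough, std-only approximation of that generated code, written by hand to make the mechanism explicit. It is a sketch, not thiserror's literal expansion. It leaves out the anyhow-based UnexpectedError variant, and `read_utf8` is a hypothetical helper:

```rust
use std::fmt;
use std::path::Path;
use std::string::FromUtf8Error;

/// Hand-written approximation of the thiserror-derived ObjectCreateError
/// (the anyhow-based UnexpectedError variant is omitted).
#[derive(Debug)]
pub enum ObjectCreateError {
    Utf8Error(FromUtf8Error),
    IoError(std::io::Error),
}

// Like #[error(transparent)]: Display is forwarded to the wrapped error.
impl fmt::Display for ObjectCreateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ObjectCreateError::Utf8Error(e) => fmt::Display::fmt(e, f),
            ObjectCreateError::IoError(e) => fmt::Display::fmt(e, f),
        }
    }
}

// Like #[error(transparent)]: source() is forwarded to the wrapped error.
impl std::error::Error for ObjectCreateError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ObjectCreateError::Utf8Error(e) => std::error::Error::source(e),
            ObjectCreateError::IoError(e) => std::error::Error::source(e),
        }
    }
}

// Like #[from]: these conversions let `?` wrap the errors automatically.
impl From<FromUtf8Error> for ObjectCreateError {
    fn from(err: FromUtf8Error) -> Self {
        ObjectCreateError::Utf8Error(err)
    }
}

impl From<std::io::Error> for ObjectCreateError {
    fn from(err: std::io::Error) -> Self {
        ObjectCreateError::IoError(err)
    }
}

/// Hypothetical helper showing both conversions in action.
pub fn read_utf8(path: &Path) -> Result<String, ObjectCreateError> {
    let bytes = std::fs::read(path)?; // io::Error -> IoError
    Ok(String::from_utf8(bytes)?) // FromUtf8Error -> Utf8Error
}
```

With the derive, thiserror writes this boilerplate for you, which is why the error enum in error.rs stays so short.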
