Code Review

178 readers

2 users here now

Welcome to the programming.dev code review community! This is a place where you can review other people's code or request reviews on your own

Before posting please read the sections below (click on one to expand it)

What is a code review?

A code review is a process where people who did not write a piece of code look over it to find errors, suggest improvements, and make sure it follows coding standards

When doing a code review look for these aspects:

Correctness: Does the code behave as expected and meet the requirements?
Readability: Is the code easy to understand and maintain?
Efficiency: Are there more efficient ways to accomplish the same task?
Best Practices: Does the code follow established coding standards and best practices?
Security: Are there potential security vulnerabilities?
Scalability: Will the code perform well as the application scales?

Post Guidelines

Guidelines

Put the programming language you used for your code in square brackets ahead of your post title. e.g. [Python], [React], [C#], etc.
Provide a link to a spot where people can view your code (e.g. github, etc.) in your post body
In your post title put a small description of what the thing is that you want reviewed and then you can expand on that in more details if you want in your post body

Rules

Any code review requests you post after your first must be done after you have reviewed two other people's code (and then you need another two for each request after that as well).

Youre free to post your first code review request in this community though without needing to follow this

Credits

Icon base by Lorc under CC BY 3.0 with modifications to add a gradient

Wormhole

!react@programming.dev

founded 1 year ago

MODERATORS

Vacant

[Rust] Implementation of an algorithm I invented - RANDEVU (lemm.ee)

submitted 11 months ago* (last edited 11 months ago) by TypicalHog@lemm.ee to c/code_review

10 comments fedilink hide all child comments

I came up with an algorithm and implemented it in Rust.
I want to ensure the code is of sufficient quality before releasing it on Crates.io and GitHub. You can also find more info about the algorithm and its purpose there.

I'm eager to hear your feedback and thoughts and am open to ANY suggestions that may help me improve the code further.
The code should be performant, clean, easy to use, idiomatic, etc.
I'm also looking for potentially better names for certain variables and stuff.

Note that some things have changed a bit since the last release (for example - the hash calculation has been changed from blake3(blake3(OBJECT) || blake3(DATE)) to blake3::keyed_hash(DATE, OBJECT) to improve performance by eliminating 2 out of 3 hash calculations), but the README is still valid (and hopefully at least somewhat decent source of info about the algorithm - I did my best trying to explain stuff, but I'm still looking to improve it further in the future).

The functions can panic if they are given a date where the year is outside of the 0000 to 9999 range.
I'm not sure if I should make the function return the result/option instead of panicking since I'd like to avoid having to unwrap() the output and would prefer if the function returned just a simple u32/chrono::NaiveTime (but I'm open to changing my mind on this one).

I'm also curious if there is a cleaner way to create a key from a date. Previously, I've been using format("%Y-%m-%d") to convert it into a string slice and copy it into the key array, but I've found that approach to be very slow (over 50% of the CPU time was being spent just for that alone), so I opted out for an approach you can see below.

The newest function rdvt() is not yet documented anywhere, but I'll do my best to explain it here: In addition to an object and a date (same as rdv()), it also takes a rank (which is a positive integer - u32).
The function calculates the keyed blake3 hash of an object, using the date and a rank as key, and uses the resulting (pseudorandom) bits to generate and return a random uniform time (chrono::NaiveTime) between 0h and 24h (down to nanosecond precision).
Time is calculated by taking the first bit of the hash and in case it's a binary one - 12h is added to the time, then we add 6h if the 2nd bit of the hash is a one, 3h for the 3rd bit, 1.5h for 4th and so on until the increment reaches the small enough value where it doesn't contribute anything to the time (when it becomes less than 1ns, essentially). This means if all of the bits in the hash were zeros - time would be zero, and if they were all ones - time would be 23:59:59:999:999:999h (the very last and highest possible value). The code short-circuits and stops earlier than going through all 256 bits since we usually only need around 46 bits before the increment becomes smaller than 1ns (the code stops only in case the sum of tiny sub 1ns increments can't contribute enough to change the last digit in the total time (even if all of the rest of the bits in the hash were to be ones))):

Feel free to ask ANY questions regarding the code, the algorithm, its function, use cases, or anything else you'd like explained or clarified.

Here's the code (I haven't written any tests yet, so they are not included for review):

//! The official Rust implementation of the [RANDEVU](https://github.com/TypicalHog/randevu) algorithm
//!
//! # Example
//! ```rust
//! use chrono::Utc;
//! use randevu::{rdv, rdvt};
//!
//! fn main() {
//!     let object = "THE_SIMPSONS";
//!     let date = Utc::now();
//!     let rdv = rdv(object, &date);
//!     let rdvt = rdvt(0, object, &date);
//!
//!     println!("Object {} has RDV{} today with RDVT0 at {:?}", object, rdv, rdvt);
//! }
//! ```

use blake3;
use chrono::{DateTime, Datelike, NaiveTime, TimeDelta, Utc};
use itoa;

/// Returns the 32-byte KEY `[u8; 32]` created from a given DATE `&DateTime<Utc>` and an optional RANK `Option<u32>`
fn create_key(date: &DateTime<Utc>, rank: Option<u32>) -> [u8; 32] {
    let mut key = [0u8; 32];

    let mut year = Datelike::year(date);
    let mut month = Datelike::month(date);
    let mut day = Datelike::day(date);

    let mut year_len = 4;
    let mut prefix_len = 0;

    // Add a prefix (-/+) if the year is not between 0 and 9999 (-YYYY-MM-DD / +YYYY-MM-DD)
    if year < 0 {
        key[0] = b'-';
        prefix_len = 1;

        year = year.abs(); // Make year positive
    } else if year > 9999 {
        key[0] = b'+';
        prefix_len = 1;
    }

    // Adjust year_len for very large years (both positive and negative)
    if year > 9999 {
        year_len += 1;
        if year > 99999 {
            year_len += 1;
        }
    }

    let full_year_len = prefix_len + year_len;

    // If a rank is provided, write it into the key after the date, separated by an '_'
    if rank != None {
        let mut buffer = itoa::Buffer::new();
        let rank_str = buffer.format(rank.unwrap());
        key[7 + full_year_len..7 + full_year_len + rank_str.len()]
            .copy_from_slice(&rank_str.as_bytes()[..rank_str.len()]);

        key[6 + full_year_len] = b'_';
    }

    // Write the day into the key
    key[5 + full_year_len] = b'0' + (day % 10) as u8;
    day /= 10;
    key[4 + full_year_len] = b'0' + day as u8;

    key[3 + full_year_len] = b'-';

    // Write the month into the key
    key[2 + full_year_len] = b'0' + (month % 10) as u8;
    month /= 10;
    key[1 + full_year_len] = b'0' + month as u8;

    key[full_year_len] = b'-';

    // Write the year into the key
    for i in (prefix_len..full_year_len).rev() {
        key[i] = b'0' + (year % 10) as u8;
        year /= 10;
    }

    key
}

/// Returns the RDV value `u32` for an OBJECT `&str` on a specific DATE `&DateTime<Utc>`
///
/// **RDV = number of leading zero bits in blake3::keyed_hash(key: DATE, data: OBJECT)**
pub fn rdv(object: &str, date: &DateTime<Utc>) -> u32 {
    let hash = blake3::keyed_hash(&create_key(date, None), object.as_bytes());

    // Count the number of leading zero bits in the hash
    let mut rdv = 0;
    for &byte in hash.as_bytes() {
        rdv += byte.leading_zeros();

        if byte != 0 {
            break;
        }
    }

    rdv
}

/// Returns the RDVT time `DateTime<Utc>` of a given RANK `u32` for an OBJECT `&str` on a specific DATE `&DateTime<Utc>`
pub fn rdvt(rank: u32, object: &str, date: &DateTime<Utc>) -> DateTime<Utc> {
    let hash = blake3::keyed_hash(&create_key(date, Some(rank)), object.as_bytes());

    // Calculate the time using bits from the hash
    let mut total: f64 = 0.0;
    let mut increment = 12.0 * 60.0 * 60.0 * 1_000_000_000.0; // 12h in nanoseconds
    for (i, byte) in hash.as_bytes().iter().enumerate() {
        for j in (0..8).rev() {
            let bit = (byte >> j) & 1;
            if bit == 1 {
                total += increment;
            }
            increment /= 2.0;
        }
        // Stop once increments become too small to affect the total
        if i > 4 && (2.0 * increment) < (1.0 - total.fract()) {
            break;
        }
    }

    // Construct the RDVT time from total
    let rdvt = date.with_time(NaiveTime::MIN).unwrap() + TimeDelta::nanoseconds(total as i64);

    rdvt
}

you are viewing a single comment's thread
view the rest of the comments

[–] TypicalHog@lemm.ee 1 points 11 months ago* (last edited 11 months ago) (1 children)

Thank you for the answer!

This version is a huge overhaul of the currently live version. I could've made a separate branch for it, but I've never done that before since I haven't really had a need yet. I definitely need to learn how to do it tho, eventually (learn git branches).
I've decided to create version 2.0 and just publish it once I think it's good enough and when I'm confident I've finally decided on the final version of the algorithm.

The algorithm uses NaiveDate and NaiveTime because it's UTC-based. Users need to make sure they pass UTC time to the algorithm. Are you saying problems could arise if people pass a timezonelss local time to the function instead of UTC time?

I might try your suggestion about using the match statement, just not sure how to go about it or if it's even possible to do considering the way total is calculated, but it seems like it might not be very clean (or perhaps it will be even cleaner, idk). I haven't really used macros yet. I don't think we can go from lowest to highest due to the nature of the calculation, except if we go over all 32 bytes.

2.0 * increment < 1 - total.fraq() checks if the sum of all increments from the current point on are able to contribute enough to make the whole part of the total increase. Btw, note the < instead of >.

Example:
Let's imagine the increment is 0.1 and the total is 123.2. Since 2 * 0.1< 1 - 0.2 is true, we can conclude that even if we added all of the rest of the increments to the total (0.1 + 0.05 + 0.025 + 0.0125 + 0.00625 + ... = 0.2), will never be able to change total enough to cause its integer part to increase from 123 to 124, thus, we stop. If the increment was 0.1 and the total was 123.9, we would continue because it is possible the rest of the increments (somewhere between 0.0 to 0.2) could make 123.9 go over 124.

[–] sukhmel 1 points 11 months ago (1 children)

Regarding NaiveDate: if there is a type for date and time that enforces UTC or includes timezone information, use it. Documenting that the user must pass UTC time is a weak spot, whatever you can enforce in types, you should enforce in types.

Regarding the stopping condition: yeah, I got it, it may be a good place to add a comment about why (although it is quite straightforward, so I might have been too stupid with that)

Regarding the macro usage: well, I'm not sure how that should be, but I imagine using a macro to precalculate what are possible coefficients multiplied by increment. Since coefficients will be calculated in compile-time, you may ignore the extra precision (it will not require additional computation) and only check per byte.

Also, it may be so, that removing precision check and doing the math for all the positions you care about will be more efficient because of how optimisations work. Less branching is usually better, and separate bytes may be processed in parallel. That depends on the compiler and the target processor, and I'm far from being expert, so you may want to research that.

[–] TypicalHog@lemm.ee 2 points 11 months ago* (last edited 11 months ago) (1 children)

Hi, I've modified my code and implemented your suggestion about enforcing UTC with types. I edited the post to include the newest code. I'm just not sure if I should still call the variables/arguments "date" since now it's a DateTime.

[–] sukhmel 2 points 11 months ago

If you feel like datetime or date_time would be clearer, go with it

I will try to take a fresh look and tell how it feels some time later