Rust notes
The meaning of "!" in Rust
In Rust, the "!" in println! signifies that we are calling a macro rather than a regular function.
Here's a breakdown of what that means:
- Macros vs. Functions: While they can look similar, the key difference is that macros are expanded at compile time, meaning they write code that is then compiled with the rest of your program. Functions, on the other hand, are called at runtime. The ! is special syntax in Rust that makes it clear a macro is being invoked.
- Why println! is a Macro: println! needs to be a macro to provide its powerful features, primarily handling a variable number of arguments. A Rust function must declare a specific number and type of arguments it accepts. However, println! can take a format string and a varying number of additional arguments to print, like so:
let x = 5;
let y = "hello";
println!("The value of x is {} and y is '{}'", x, y);
A regular function couldn't be defined to handle this flexibility, but a macro can be programmed to parse the format string and generate the appropriate code to handle the arguments at compile time.
In short, the "!" is a visual cue in Rust's syntax to let you know you're using a macro, which is a piece of code that generates other code.
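As a rough illustration of how a macro can accept a varying number of arguments, here is a minimal sketch using macro_rules!. The macro sum_all and the two wrapper functions are hypothetical names made up for this example. println! itself is not written this way; it relies on the compiler built-in format_args!.

```rust
// Hypothetical macro: accepts one or more comma-separated expressions
// and expands to an expression that adds them together.
macro_rules! sum_all {
    ($first:expr $(, $rest:expr)*) => { $first $(+ $rest)* };
}

// Small wrappers so the expansions can be called like ordinary functions.
fn sum_of_two() -> i32 {
    sum_all!(1, 2)
}

fn sum_of_four() -> i32 {
    sum_all!(1, 2, 3, 4)
}

fn main() {
    // The same macro handles different argument counts,
    // which a single ordinary function signature could not do.
    println!("{}", sum_of_two());
    println!("{}", sum_of_four());
}
```

The expansion happens before type checking, so each call site ends up as plain addition code.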
The meaning of ":?" in Rust
The :? in that println! macro is a format specifier that tells Rust to print the variable using its debug format.
Here’s a more detailed explanation:
- What is Formatting?: In Rust, when we use macros like println! or format!, the {} curly braces are placeholders for variables. We can add extra information inside the braces to change how the variable is printed.
- {} vs. {:?}:
  - {} (Display Trait): This is the standard, "pretty" format, intended for user-facing output. For a type to be printable with {}, it must implement the Display trait. Simple types like numbers (i32, f64) and strings (&str) have this by default.
  - {:?} (Debug Trait): This is the "debug" format, intended for developer-facing output. It provides a more detailed, programmer-friendly representation of the value. For a type to be printable with {:?}, it must implement the Debug trait.
- Why is {:?} Needed Here?: The function array_and_vec() likely returns a compound type like an array or a Vec (a vector). These types don't have a single, simple value to display to an end user, so they don't implement the Display trait by default. You can't print a whole vector with {}. However, they do implement the Debug trait, which knows how to print all the elements within the collection for debugging purposes.
Example with a Struct:
If you create your own struct, it won't be printable by default. You can automatically make it printable for debugging by adding #[derive(Debug)].
// This annotation automatically implements the Debug trait for our struct.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}
fn main() {
    let p = Point { x: 10, y: 20 };
    // This would cause a compile error because Point doesn't implement Display.
    // println!("My point is {}", p);
    // This works perfectly because we derived the Debug trait.
    // It will print a developer-friendly representation of the struct.
    println!("My point is {:?}", p);
    // Output: My point is Point { x: 10, y: 20 }
}
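If the struct should also work with {}, the Display trait can be implemented by hand. The sketch below is one possible way to do it; the "(x, y)" output format is an arbitrary choice made for this example, not something the notes above prescribe.

```rust
use std::fmt;

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Hand-written Display: we decide how a Point looks to an end user.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 10, y: 20 };
    println!("{}", p); // uses our Display implementation
    println!("{:?}", p); // uses the derived Debug implementation
}
```

Unlike Debug, Display cannot be derived, because the compiler has no way to guess what user-facing format is appropriate.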
Bonus Tip: Pretty-Printing
You can make the debug output even more readable for complex types by using :#?. This adds newlines and indentation.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}
fn main() {
    let p = Point { x: 10, y: 20 };
    println!("My point, pretty-printed:\n{:#?}", p);
}
Output:
My point, pretty-printed:
Point {
    x: 10,
    y: 20,
}
The meaning of "&" in Rust
The & in the function signature fn vec_loop(input: &[i32]) signifies a reference, but the combination &[i32] as a whole means the function accepts a slice of i32 integers.
Let's break down what that means.
1. The & Symbol: Borrowing
In Rust, the & symbol is used to create a reference. A reference allows us to "borrow" access to data without taking ownership of it.
- Without & (Ownership Transfer): If the type were just Vec<i32>, the function would take ownership of the vector passed to it. The original owner in the calling code could no longer use that vector after the function call.
- With & (Borrowing): By using a reference, the function vec_loop only borrows the data. The calling code retains ownership and can continue to use the data after the function finishes. It also means the caller never has to clone the whole collection just to keep a usable copy for itself.
2. The [i32] part: The Slice
The [i32] part of the type is not an array (which would be [i32; N] with a fixed size N). Instead, [i32] represents a "slice": a dynamically sized view into a contiguous sequence of i32s.
A slice is powerful because it's an abstraction. It doesn't care if the underlying data comes from a Vec<i32>, an array [i32; 5], or even part of another slice. It just represents "a sequence of integers in memory."
Putting It Together: &[i32] (A Borrowed Slice)
When we combine them, &[i32] means "a borrowed reference to a slice of i32s". This is the idiomatic and most flexible way to accept a sequence of items in Rust.
Here’s why it's so useful in the vec_loop function:
- Efficiency: The function avoids copying the data by just borrowing it.
- Flexibility: The function is more generic. It can now accept input from multiple sources without changing the signature.
Code Example:
Notice how the vec_loop function can accept a reference to a Vec and a reference to an array, and even a slice of an array, without any changes.
fn vec_loop(input: &[i32]) -> Vec<i32> {
    let mut output = Vec::new();
    for item in input {
        // We're just doubling the items for this example.
        output.push(item * 2);
    }
    output
}
fn main() {
    // 1. Call it with a Vector
    let my_vec = vec![1, 2, 3];
    // Rust automatically converts &Vec<i32> into &[i32] (this is called "deref coercion")
    let result_from_vec = vec_loop(&my_vec);
    println!("From Vec: {:?}", result_from_vec); // Prints "From Vec: [2, 4, 6]"
    println!("Original vec is still usable: {:?}", my_vec); // Prints "[1, 2, 3]"
    // 2. Call it with an Array
    let my_array = [10, 20, 30];
    let result_from_array = vec_loop(&my_array);
    println!("From Array: {:?}", result_from_array); // Prints "From Array: [20, 40, 60]"
    // 3. Call it with a slice of an array
    let result_from_slice = vec_loop(&my_array[1..3]); // Pass only [20, 30]
    println!("From slice of Array: {:?}", result_from_slice); // Prints "From slice of Array: [40, 60]"
}
Summary Table
Parameter Type | Meaning | Can it accept a Vec? | Can it accept an array? | Ownership |
---|---|---|---|---|
Vec<i32> | An owned vector. | Yes | No | Takes ownership. |
&Vec<i32> | A reference to a vector. | Yes | No | Borrows. |
&[i32] | A reference to a slice. | Yes | Yes | Borrows. |
The meaning of "|" in Rust (anonymous function)
The | characters are used to define a closure.
A closure is a short, anonymous (unnamed) function that can be passed around as a variable or as an argument to other functions.
Let's break down:
input.iter().map(|element| element + 1).collect()
- .iter(): This creates an iterator from the input slice. An iterator is a special object that lets you process a sequence of items one by one.
- .map(...): This is a method on the iterator that transforms each item. It takes a closure as its argument, and it calls that closure on every single element the iterator produces.
- |element| element + 1: This is the closure itself.
  - The |...| part: This defines the parameters for the anonymous function. In this case, |element| means "this closure accepts one argument, and we will call it element inside the closure's body."
  - The element + 1 part: This is the body of the closure. It's the code that gets executed for each element. It takes the input element and returns its value plus one.
So, for each number that .iter() provides from the input slice, the .map() function calls our closure. The number is passed in as element, and the closure returns element + 1.
A Simple Analogy
Imagine we have a list of numbers and we tell a friend: "For every number I give you, add one to it and tell me the result."
- The list of numbers is input.iter().
- Our instruction, "add one to it," is the closure |element| element + 1.
- The process of applying that instruction to each number is .map().
- Finally, .collect() is the gathering of our friend's answers into a new list.
How it could look with a regular function
To make it clearer, we could achieve the same result by defining a separate, named function and passing it to map.
// A named function that does the same thing as the closure.
fn add_one(n: &i32) -> i32 {
    n + 1
}
fn vec_map_example_with_function(input: &[i32]) -> Vec<i32> {
    // Note: iter() yields references (`&i32`), so our function
    // must accept a reference.
    input.iter().map(add_one).collect()
}
fn main() {
    let numbers = vec![1, 2, 3];
    let result = vec_map_example_with_function(&numbers);
    println!("{:?}", result); // Prints "[2, 3, 4]"
}
The closure |element| element + 1 is just a much more concise, inline way of writing the add_one function right where you need it.
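One thing a closure can do that a plain named function cannot is capture variables from the scope around it. The sketch below stores a closure in a variable and lets it use a value from the enclosing function; add_offset and shift are hypothetical names chosen for this example.

```rust
// Hypothetical helper: adds `offset` to every element of `input`.
fn add_offset(input: &[i32], offset: i32) -> Vec<i32> {
    // The closure is stored in a variable and captures `offset`
    // from the enclosing scope. A named fn could not see `offset`.
    let shift = |element: &i32| element + offset;
    input.iter().map(shift).collect()
}

fn main() {
    let numbers = [1, 2, 3];
    println!("{:?}", add_offset(&numbers, 10));
}
```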
The meaning of "<T>" type in Rust
Let's take the following example: fn largest<T>(list: &[T]) -> &T.
High-Level Meaning
In plain English, this signature means:
"I am defining a function named largest that works on a slice of any type, as long as all elements in the slice are of the same type. It doesn't take ownership of the slice, and it will return a reference to the single largest element found within that slice."
Detailed Breakdown
1. fn largest
This is the standard way to start a function definition. It declares a function named largest.
2. <T> (The Generic Type Parameter)
This is the core of what makes the function "generic."
- T is a placeholder for a specific, concrete type. Think of it as a variable for a data type. It's conventional to use T for "Type."
- By using <T>, we are telling the Rust compiler: "This function can operate on any type T we want, whether it's i32, f64, char, or a struct we made ourselves. The compiler will figure out what T is when the function is called."
- This allows us to write one function that works for many different types, avoiding code duplication (e.g., not having to write largest_i32, largest_char, etc.).
3. list: &[T] (The Parameter)
This defines the function's single input parameter, named list.
- [T]: This is a "slice" of elements of our generic type T. A slice is a view into a block of memory, like a part of an array or a Vec.
- &: This is the "borrow" symbol. It means the function is taking a reference to the slice. It does not take ownership of the data.
- Putting it together (&[T]): The function accepts a borrowed slice of items of some type T. This is highly efficient and flexible because:
  - Efficient: The program doesn't need to copy the entire list of data into the function. It just passes a pointer to the original data together with its length.
  - Flexible: It can accept a reference to a Vec<T>, an array [T; N], or a part of either.
4. -> &T (The Return Type)
This specifies what the function will return.
- ->: The arrow syntax simply indicates that what follows is the return type.
- &T: The function returns a reference to a value of type T. This is very important. It means the function is not returning a new copy of the largest value. Instead, it's returning a pointer that points directly to the largest element's location within the original list that was passed in.
The Missing Piece: The Trait Bound
As written, that signature is not enough: any function body that compares elements with > will fail to compile. The compiler will ask: "You told me T could be any type, but how do I know I can compare two values of type T to see which one is larger?"
To fix this, we need to add a trait bound to guarantee that T is a type that can be ordered.
The corrected, working signature looks like this:
fn largest<T: PartialOrd>(list: &[T]) -> &T {
- : PartialOrd: This is the "trait bound." It constrains the generic type T.
- Meaning: It says, "T can be any type, as long as it implements the PartialOrd trait." The PartialOrd trait is what gives types the ability to be compared with operators like > and <. Standard types like i32 and char already implement this.
Complete Example
Here is how the full function would look and be used:
// The complete, working signature with the trait bound.
// This function requires that the type T has an ordering.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}
fn main() {
    // Example 1: Use it with numbers
    let number_list = vec![34, 50, 25, 100, 65];
    // Here, the compiler sees you passed a `&Vec<i32>`, so it sets T = i32.
    let result = largest(&number_list);
    println!("The largest number is {}", result); // Prints "The largest number is 100"
    // Example 2: Use it with characters
    let char_list = vec!['y', 'm', 'c', 'a'];
    // Here, the compiler sets T = char.
    let result = largest(&char_list);
    println!("The largest char is {}", result); // Prints "The largest char is y"
}
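One caveat with the function above: indexing list[0] panics when the slice is empty. A possible variant, sketched here under the hypothetical name largest_checked, returns an Option so that the empty case becomes an ordinary value instead of a crash.

```rust
// Variant of `largest` that returns None for an empty slice
// instead of panicking on `list[0]`.
fn largest_checked<T: PartialOrd>(list: &[T]) -> Option<&T> {
    // `first()` yields None when the slice has no elements.
    let mut largest = match list.first() {
        Some(first) => first,
        None => return None,
    };
    // The slice is known to be non-empty here, so `[1..]` is in bounds.
    for item in &list[1..] {
        if item > largest {
            largest = item;
        }
    }
    Some(largest)
}

fn main() {
    let numbers = [34, 50, 25, 100, 65];
    println!("{:?}", largest_checked(&numbers));

    let empty: [i32; 0] = [];
    println!("{:?}", largest_checked(&empty));
}
```

The trait bound and the borrowed return type are the same as before; only the handling of the empty slice changes.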
Summary Table
Part of Signature | Meaning |
---|---|
fn largest | Declares a function named largest. |
<T> | Makes the function generic; T is a placeholder for any type. |
: PartialOrd | (Required for it to work) A trait bound, ensuring T can be compared. |
list: &[T] | It takes one argument, list, which is a borrowed slice of type T. |
-> &T | It returns a reference to a value of type T from within the original slice. |
The meaning of "Option" in a Rust function signature
Example: fn maybe_icecream(hour_of_day: u16) -> Option<u16>
The Option type is one of the most important and fundamental concepts in Rust for writing safe and robust code.
In simple terms, Option is an enum that represents the possibility of a value being either present or absent.
Think of it as a box that could either contain one item or be empty. We can't know for sure until we check.
The Problem Option Solves: The End of null
In many other programming languages (like Java, C#, Python, JavaScript), there is a concept of null or None. A null value means "no value here." This often leads to runtime errors when code uses a value that turns out to be missing.
Safe Rust solves this problem by having no null value at all and using the Option enum instead.
The Two States of Option
The Option enum is defined like this:
enum Option<T> {
    Some(T), // A value of type T is present.
    None,    // No value is present.
}
- Some(T): This variant means the "box" contains a value. The T is a generic type parameter, meaning Option can hold any type of value. For example, Some(5) is an Option<i32>, and Some("hello") is an Option<&str>.
- None: This variant means the "box" is empty. It's the explicit way of saying "there is no value."
Applying This to Function Signature
Let's look at the specific function:
fn maybe_icecream(hour_of_day: u16) -> Option<u16>
- Function Name: maybe_icecream is well-named. It hints that you might not always get ice cream.
- Input: hour_of_day: u16 is the time of day (as a number).
- Return Type: -> Option<u16> is the crucial part. This signature makes a promise to the compiler:
  - "This function will always return an Option<u16>."
  - If it returns Some(u16), it means "Yes, you get ice cream," and the value inside might represent the number of scoops.
  - If it returns None, it means "Sorry, no ice cream for you."
This is powerful because the compiler forces whoever calls this function to handle both possibilities (Some and None).
How to Implement and Use It
Here’s a possible implementation and how we would use the result:
// Let's say we can only have ice cream between 8 PM (20) and 10 PM (22).
fn maybe_icecream(hour_of_day: u16) -> Option<u16> {
    if hour_of_day >= 20 && hour_of_day <= 22 {
        // It's a valid time! Return Some value.
        // Let's say we get 2 scoops.
        Some(2)
    } else {
        // It's not time for ice cream. Return None.
        None
    }
}
fn main() {
    // Let's check at 9 PM (21:00)
    let nine_pm_result = maybe_icecream(21);
    // To use the Option, we must check what's inside.
    // The idiomatic way to do this is with a 'match' expression.
    match nine_pm_result {
        Some(scoops) => {
            println!("Yay! I get {} scoops of ice cream!", scoops);
        },
        None => {
            println!("Aww, no ice cream for me.");
        }
    }
    // Now let's check at 10 AM (10:00)
    let ten_am_result = maybe_icecream(10);
    match ten_am_result {
        Some(scoops) => {
            println!("Yay! I get {} scoops of ice cream!", scoops);
        },
        None => {
            println!("Aww, no ice cream for me."); // This line will be printed.
        }
    }
}
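Besides match, the standard library offers shorter ways to handle an Option when only one case matters or a default will do. The sketch below restates maybe_icecream in compact form and adds a hypothetical helper, scoops_or_zero, to show unwrap_or alongside if let.

```rust
// Same rule as above: ice cream only between 20 and 22 inclusive.
fn maybe_icecream(hour_of_day: u16) -> Option<u16> {
    if (20..=22).contains(&hour_of_day) {
        Some(2)
    } else {
        None
    }
}

// Hypothetical helper: falls back to 0 scoops when there is no ice cream.
fn scoops_or_zero(hour_of_day: u16) -> u16 {
    maybe_icecream(hour_of_day).unwrap_or(0)
}

fn main() {
    // `if let` handles only the Some case and skips None entirely.
    if let Some(scoops) = maybe_icecream(21) {
        println!("{} scoops", scoops);
    }
    println!("{}", scoops_or_zero(10));
}
```

Both forms still deal with the None case explicitly; they only make the common patterns shorter to write.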
Summary
Concept | Meaning |
---|---|
Option<T> | An enum representing a value that could be present or absent. |
Some(T) | The variant for a present value. |
None | The variant for an absent value. |
Why? | To eliminate null-related bugs by making the possibility of absence an explicit part of the type system that the compiler forces you to handle. |
The meaning of "?" in Rust
The ? in a function is the try operator, also known as the question mark operator. It is a powerful piece of syntactic sugar for propagating errors.
In short, the ? unwraps a Result or Option, and if it's an error (Err) or None, it immediately returns from the current function.
Let's break it down in the context of this function:
fn abs_i64(inputs: &[Series]) -> PolarsResult<Series> {
    let s = &inputs[0];
    let ca: &Int64Chunked = s.i64()?;
    // (The step that computes `out` from `ca` is not shown in this excerpt.)
    Ok(out.into_series())
}
1. The Context: A Function That Can Fail
First, the function's signature:
fn abs_i64(inputs: &[Series]) -> PolarsResult<Series>
The key part is the return type: PolarsResult<Series>. In Polars, PolarsResult<T> is an alias for Rust's standard Result<T, PolarsError>. This signature tells us:
"The abs_i64 function will either succeed and return an Ok(Series), or it will fail and return an Err(PolarsError)."
Because this function is explicitly designed to return a Result, we are allowed to use the ? operator inside it.
2. The Line with the ? Operator
let ca: &Int64Chunked = s.i64()?;
Here's what's happening step-by-step:
- s.i64() is called.
  - s is a Polars Series, which is a column of data. A Series can hold data of any type (integers, floats, strings, etc.).
  - The .i64() method is an attempt to get an i64 (64-bit integer) representation of the series.
  - This attempt can fail! What if the Series actually contains strings? We can't treat a column of "hello" as a column of integers.
  - Therefore, .i64() does not return a plain &Int64Chunked. It returns a Result: specifically PolarsResult<&Int64Chunked>.
- The ? operator is applied to the Result. The ? checks the Result that s.i64() returned and does one of two things:
  - Success Case: If s.i64() returns Ok(&Int64Chunked), the ? operator unwraps it, extracting the &Int64Chunked value from inside the Ok. This extracted value is then assigned to the variable ca, and the function continues executing normally.
  - Failure Case: If s.i64() returns Err(PolarsError), the ? operator immediately stops the execution of the abs_i64 function and returns that Err(PolarsError) to whatever code called abs_i64. The code after the ? is never reached.
What the Code Looks Like Without the ?
To fully appreciate the ? operator, it's helpful to see the code you would have to write without it. The line let ca = s.i64()?; is a shortcut for this match expression:
let ca: &Int64Chunked = match s.i64() {
    Ok(chunked_array) => {
        // The result was Ok, so we can proceed with the value.
        chunked_array
    },
    Err(error) => {
        // The result was an error. We must stop and return this error
        // from the `abs_i64` function. The `.into()` is used to make sure
        // the error type is compatible.
        return Err(error.into());
    }
};
The ? operator is a much more concise and readable way to handle errors that we simply want to pass up the call stack.
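The error conversion mentioned in the expanded match can be seen without Polars. In this standard-library-only sketch, InputError and parse_and_double are hypothetical names invented for the example; the From implementation is what lets ? turn one error type into another.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum InputError {
    NotANumber(String),
}

// This conversion is what `?` relies on when the error types differ.
impl From<ParseIntError> for InputError {
    fn from(err: ParseIntError) -> Self {
        InputError::NotANumber(err.to_string())
    }
}

fn parse_and_double(text: &str) -> Result<i64, InputError> {
    // parse() returns Result<i64, ParseIntError>. On Err, `?` converts
    // the error through the From impl above and returns early.
    let n: i64 = text.parse()?;
    Ok(n * 2)
}

fn main() {
    println!("{:?}", parse_and_double("21"));
    println!("{:?}", parse_and_double("abc"));
}
```

Without the From implementation the line with ? would not compile, because the compiler would have no way to turn a ParseIntError into an InputError.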
Summary Table
The ? operator works on both Result and Option types.
If the expression is a Result<T, E>... | The ? operator... |
---|---|
Ok(value) | ...extracts value of type T. |
Err(error) | ...returns Err(error.into()) from the current function. |
If the expression is an Option<T>... | The ? operator... |
---|---|
Some(value) | ...extracts value of type T. |
None | ...returns None from the current function. |
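The Option rows of the table can be tried out with a small standard-library example. The function first_initial is a hypothetical name; it returns None as soon as either step finds nothing.

```rust
// Returns the upper-cased first character of the first word,
// or None if the text contains no words.
fn first_initial(text: &str) -> Option<char> {
    // If there is no first word, `?` returns None from this function.
    let word = text.split_whitespace().next()?;
    // Likewise if the word has no characters.
    let ch = word.chars().next()?;
    Some(ch.to_ascii_uppercase())
}

fn main() {
    println!("{:?}", first_initial("hello world"));
    println!("{:?}", first_initial("   "));
}
```

Note that ? on an Option is only allowed in a function that itself returns an Option, just as ? on a Result requires a Result return type.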
Trait bounds in Rust
Example function:
fn impl_abs_numeric<T>(ca: &ChunkedArray<T>) -> ChunkedArray<T>
where
    T: PolarsNumericType,
    T::Native: Signed,
{
    // NOTE: there's a faster way of implementing `abs`, which we'll
    // cover in section 7.
    ca.apply(|opt_v: Option<T::Native>| opt_v.map(|v: T::Native| v.abs()))
}
High-Level Purpose
In simple terms, this function, impl_abs_numeric, calculates the absolute value for every element in a Polars ChunkedArray (which is the core data structure for a column/Series). It is designed to work on any numeric type that can have a sign (like i32, i64, f64) and to correctly handle any null values in the data.
Detailed Breakdown
1. The Function Signature
fn impl_abs_numeric<T>(ca: &ChunkedArray<T>) -> ChunkedArray<T>
- fn impl_abs_numeric<T>: This declares a generic function named impl_abs_numeric. The <T> means it can work with multiple types. T is a placeholder for a specific Polars data type (like Int32Type, Float64Type, etc.).
- ca: &ChunkedArray<T>: The function takes one argument, ca, which is a reference (&) to a ChunkedArray of type T. A ChunkedArray is Polars' internal representation of a column of data. Using a reference is efficient because it avoids copying the entire column.
- -> ChunkedArray<T>: The function returns a brand new ChunkedArray of the same type T. This new array will contain the results of the calculation.
2. The where Clause (Trait Bounds)
This is the most complex but also the most powerful part of the signature. It puts constraints on what the generic type T is allowed to be.
where
    T: PolarsNumericType,
    T::Native: Signed,
- T: PolarsNumericType: This is the first constraint. It says, "T can be any type, as long as it implements the PolarsNumericType trait." This trait is used to mark all of Polars' numeric types, like Int32Type, UInt64Type, Float32Type, etc. This ensures the function can't be accidentally called on a column of strings.
- T::Native: Signed: This is the second, more interesting constraint.
  - T::Native: This is an associated type. For a PolarsNumericType, Native is the actual, underlying Rust primitive type. For example:
    - If T is Int32Type, then T::Native is i32.
    - If T is Float64Type, then T::Native is f64.
  - : Signed: This constrains the associated type T::Native. The standard library has no trait named Signed, so this is presumably the one from the num-traits crate, which is implemented for Rust's signed integer types (i8, i16, i32, i64, i128, isize) and for the floating-point types (f32, f64).
  - Putting it together: This constraint says, "The underlying Rust type for our column must be a signed number." This is crucial because the concept of "absolute value" (.abs()) only makes sense for numbers that can be negative. The compiler will now prevent you from calling this function on an unsigned integer column (like u32), which is correct behavior.
3. The Function Body
ca.apply(|opt_v: Option<T::Native>| opt_v.map(|v: T::Native| v.abs()))
- ca.apply(...): The .apply() method on a ChunkedArray is a powerful tool. It iterates through every single value in the column and applies a function (a closure) to it. It then collects the results into a new ChunkedArray.
- |opt_v: Option<T::Native>| ...: This is the closure that .apply() will execute for each value.
  - opt_v: This is the argument to our closure. It represents one value from the column.
  - Option<T::Native>: Notice the type! The value is wrapped in an Option. This is how Polars handles nulls. If the value in the column is valid (e.g., -5), opt_v will be Some(-5). If the value is null, opt_v will be None.
- opt_v.map(|v: T::Native| v.abs()): This is the core logic that handles the Option.
  - .map(): This is a standard method on Rust's Option type. It's a clean way to apply a function to the value inside an Option, without messy if/else checks.
  - How it works:
    - If opt_v is Some(v) (e.g., Some(-5)), the .map() method will execute the inner closure |v| v.abs() on the contained value -5. The closure calculates (-5).abs(), which is 5, and .map() re-wraps it, returning Some(5).
    - If opt_v is None, the .map() method does nothing and immediately returns None.
This single line elegantly handles all null values by passing them through, while applying the .abs() function only to the valid, non-null data.
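The behavior of Option::map described above can be checked in isolation with plain i64 values. In this sketch, abs_opt and abs_opt_manual are hypothetical names; the second function spells out by hand what map does.

```rust
// Applies abs() to the value inside the Option, if there is one.
fn abs_opt(opt_v: Option<i64>) -> Option<i64> {
    opt_v.map(|v| v.abs())
}

// The same logic written out with match, to show what map does.
fn abs_opt_manual(opt_v: Option<i64>) -> Option<i64> {
    match opt_v {
        Some(v) => Some(v.abs()),
        None => None,
    }
}

fn main() {
    println!("{:?}", abs_opt(Some(-5)));
    println!("{:?}", abs_opt(None));
    println!("{:?}", abs_opt_manual(Some(-5)));
}
```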
Summary Walkthrough
Let's imagine ca is a column of i32s: [10, -20, null, 30].
- impl_abs_numeric is called. The compiler confirms that T (Int32Type) is a PolarsNumericType and its Native type (i32) is Signed. Everything is valid.
- ca.apply starts iterating.
- First element is 10: opt_v is Some(10). .map runs |v| v.abs() on 10. Result: Some(10).
- Second element is -20: opt_v is Some(-20). .map runs |v| v.abs() on -20. Result: Some(20).
- Third element is null: opt_v is None. .map does nothing. Result: None.
- Fourth element is 30: opt_v is Some(30). .map runs |v| v.abs() on 30. Result: Some(30).
- apply collects these results (Some(10), Some(20), None, Some(30)) and constructs a new ChunkedArray, which is [10, 20, null, 30]. This is the return value.
Difference between ":" and "::"
- : (Colon) is the type annotator. It is used to say "this variable/parameter/field is of this type".
- :: (Double Colon) is the path separator. It is used to access items within a namespace, like a module, enum, or type.
They are not interchangeable and have completely different meanings. There is no concept of "type assignment" with ::.
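The Colon (:): The Type Annotator
In declarations, the single colon (:) states the type of a variable, parameter, or field. It also appears in a few related places, such as trait bounds (T: PartialOrd) and the field: value pairs of a struct literal (Point { x: 10, y: 20 }).
Think of it as answering the question: "What type is this?"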
The Double Colon (::): The Path Separator
The double colon (::) is a namespace resolver. It allows us to access something that is defined inside something else. The nickname "turbofish" refers specifically to the ::<> form, which supplies explicit generic arguments, as in "5".parse::<i32>().
Think of it as answering the question: "Where can I find this?" or as showing a relationship of "in".
1. Accessing Items in a Module: This is the most common use. It's like a directory path for our code.
// "Use the `HashMap` type which is found in the `collections` module,
// which is in the `std` (standard library) crate."
use std::collections::HashMap;
2. Calling Associated Functions (like static methods): These are functions that belong to a type itself, not to a specific instance of it.
// "Call the `new` function that is associated with the `String` type."
let s = String::new();
// "Call the `from` function that is associated with the `Vec` type."
let v = Vec::from([1, 2, 3]);
3. Accessing Enum Variants: Enum variants live inside the namespace of the enum.
// "The `my_option` variable holds the `Some` variant of the `Option` enum."
let my_option: Option<i32> = Option::Some(5);
// "The `my_result` variable holds the `Err` variant of the `Result` enum."
let my_result: Result<i32, &str> = Result::Err("Something went wrong");
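To see both symbols side by side, including the turbofish form, here is a small sketch. The functions word_lengths and parse_age are hypothetical names used only for illustration.

```rust
use std::collections::HashMap;

// `words: &[&str]` uses `:` to state the parameter's type.
fn word_lengths(words: &[&str]) -> HashMap<String, usize> {
    // `:` annotates the variable's type; `::` reaches the `new`
    // function that lives inside the HashMap type.
    let mut lengths: HashMap<String, usize> = HashMap::new();
    for word in words {
        lengths.insert(String::from(*word), word.len());
    }
    lengths
}

fn parse_age(text: &str) -> Option<u8> {
    // `::<u8>` (the "turbofish") passes an explicit type argument to parse.
    text.parse::<u8>().ok()
}

fn main() {
    println!("{:?}", word_lengths(&["hi", "rust"]).get("rust"));
    println!("{:?}", parse_age("30"));
    println!("{:?}", parse_age("300"));
}
```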
Summary Table
Symbol | Name | Primary Purpose | Analogy | Example |
---|---|---|---|---|
: | Colon | Type Annotation | "is a" | let age: u8 = 30; ("age is a u8") |
:: | Double Colon | Path Resolution | "in" | Option::Some(5) ("Some which is in Option") |