January 14, 2025
Paweł Dąbrowski
Founder / Software Engineer
FFaker is a popular Ruby library that generates sample data for test and development environments. It provides different data types, such as names, e-mails, credit card numbers, etc. The source code may seem straightforward, but while the codebase is relatively small, it still contains interesting examples of metaprogramming, thread usage, and deterministic probability implementation. Let’s dive into the code to understand how the FFaker library works under the hood.
For this investigation, I will use the FFaker::Name.first_name and FFaker::Name.unique.first_name calls and explain exactly what happens when you call them in your application.
Let’s start with the exact definition of the first_name method:
module FFaker
  module Name
    extend self

    FIRST_NAME = (FIRST_NAMES_FEMALE + FIRST_NAMES_MALE).freeze

    def first_name
      fetch_sample(FIRST_NAME)
    end
  end
end
There are two interesting things here:
First, the first_name method becomes a singleton method of the FFaker::Name module because of the extend self line. This line makes the module’s instance methods available as singleton methods of the module itself.
Second, the FIRST_NAMES_FEMALE and FIRST_NAMES_MALE constants are not defined anywhere in the codebase. They are created “on the fly” when they are first requested.
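To make the effect of extend self concrete, here is a minimal sketch. The Greeter module and its hello method are hypothetical names used only for illustration; they are not part of FFaker.

```ruby
# Hypothetical module, used only to illustrate `extend self`.
module Greeter
  # Mixes the module into its own singleton class, so every instance
  # method defined here can also be called directly on the module.
  extend self

  def hello
    'hello'
  end
end

greeting = Greeter.hello                  # callable without including Greeter anywhere
responds = Greeter.respond_to?(:hello)    # the module itself responds to the method
```

Because the module is mixed into its own singleton class, methods defined after the extend self line are picked up as well, which is why FFaker can place it at the top of the module.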
The library provides many different data formats, and loading all of them on initialization would slow down the process. After all, we don’t need to load credit card data when we only want to generate names. Because of that, FFaker loads each data set when you first request it.
When FFaker::Name::FIRST_NAMES_FEMALE is referenced for the first time, Ruby can’t find the constant definition, so it calls the const_missing method instead of raising a NameError right away. You are probably more familiar with the method_missing method, which serves a similar purpose for methods rather than constants.
The library keeps the test data in simple text files located in the lib/ffaker/data directory. Values for the FFaker::Name::FIRST_NAMES_FEMALE constant are located in the lib/ffaker/data/name/first_names_female file. You can probably see the naming pattern: the underscored module name followed by the underscored constant name.
To build the path to the test data, we need the constant name and the name of the module, which the code reads from the first element of ancestors (for a module, that is normally the module itself):
def const_missing(const_name)
  if const_name =~ /[a-z]/ # Not a constant, probably a class/module name.
    super const_name
  else
    mod_name = ancestors.first.to_s.split('::').last
    data_path = "#{FFaker::BASE_LIB_PATH}/ffaker/data/#{underscore(mod_name)}/#{underscore(const_name.to_s)}"
    data = k File.read(data_path, mode: 'r:UTF-8').split("\n")
    const_set const_name, data
    data
  end
end
Once the data_path is built, the library simply loads the file’s content and sets it as the constant value. The next time you call FFaker::Name.first_name, the needed constants will already exist. The k method comes from the library source code, and it freezes every element of the array.
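The same lazy-loading idea can be tried outside of FFaker. The sketch below is a simplified illustration, not the library’s code: the LazyWords module, its DATA_DIR constant, and the colors data file are all hypothetical, and the data file is written to a temporary directory so the example is self-contained.

```ruby
require 'tmpdir'

# Hypothetical module (not part of FFaker) that loads word lists lazily.
module LazyWords
  # Assumption for the demo: data files live in a temporary directory.
  DATA_DIR = Dir.mktmpdir('lazy_words')
  File.write(File.join(DATA_DIR, 'colors'), "red\ngreen\nblue\n")

  # Called by Ruby when a constant lookup on this module fails.
  def self.const_missing(const_name)
    path = File.join(DATA_DIR, const_name.to_s.downcase)
    # No matching data file: fall back to the default behavior (NameError).
    return super unless File.exist?(path)

    data = File.read(path, mode: 'r:UTF-8').split("\n").map(&:freeze).freeze
    # Define the constant so later lookups never reach const_missing again.
    const_set(const_name, data)
  end
end

defined_before = LazyWords.const_defined?(:COLORS, false)
colors = LazyWords::COLORS # triggers const_missing and loads the file
defined_after = LazyWords.const_defined?(:COLORS, false)
```

The first reference pays the cost of reading the file; every later reference is a regular constant lookup.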
Of course, the library returns random values, so once the array of values is fetched, FFaker randomly selects the value, which is one of the most interesting parts of the source code.
You might say that calling Array#sample would be enough to randomize the test data. You are right and wrong at the same time. Indeed, the library uses Array#sample but implements its own random number generator to provide deterministic probability.
Deterministic probability means that the same sequence of outputs is produced every time the generator starts from the same state (the same seed). Here is the part of the tests that verifies this:
# Accepts a block. Executes the block multiple times after resetting
# the internal Random Number Generator state and compares the results of
# each execution to make sure they are the same.
def assert_deterministic(options = {}, &block)
  raise ArgumentError, 'Must pass a block' unless block

  options = { message: 'Results are not repeatable' }.merge(options)
  returns = Array.new(2) do
    FFaker::Random.reset!
    Array.new(5, &block)
  end
  assert(returns.uniq.length == 1, options[:message])
end
It’s time to see how the internal random number generator works.
A quick challenge: You want to generate a random number below 100 three times. How can you repeat this call and get the same results with each attempt?
The answer is to use the seed value. The seed is just a large integer used to initialize the internal state of the random number generator. If we initialize the internal state twice with the same seed, we get the same results:
seed = Random.new_seed
random = Random.new(seed)
random.rand(100) # => 10
random.rand(100) # => 67
random.rand(100) # => 18
random = Random.new(seed)
random.rand(100) # => 10
random.rand(100) # => 67
random.rand(100) # => 18
The implementation of the random number generator in FFaker is simple, but it makes it possible to test and debug the library properly:
module FFaker
  module Random
    # Returns the current RNG seed.
    def self.seed
      @seed ||= ::Random.new_seed
    end

    # Sets the RNG seed and creates a new internal RNG.
    def self.seed=(new_seed)
      @seed = new_seed
      reset!
      new_seed
    end

    # Reset the RNG back to its initial state.
    def self.reset!
      @rng = new_rng
    end

    # Returns a random number using an RNG with a known seed.
    def self.rand(max = nil)
      return rng.rand(max) if max

      rng.rand
    end

    # Returns the current Random object.
    def self.rng
      @rng ||= new_rng
    end

    # Returns a new Random object instantiated with #seed.
    def self.new_rng
      ::Random.new(seed)
    end
  end
end
The FFaker::RandomUtils module contains all definitions for methods used to randomly select values from an array of values. FFaker uses the sample and shuffle methods but with an additional random parameter:
list.sample(random: FFaker::Random)
list.shuffle(random: FFaker::Random)
As I mentioned before, with the custom random number generator, you can reproduce the result of sampling or shuffling:
arr = [1, 2, 3, 4, 5]
seed = Random.new_seed
random = Random.new(seed)
arr.sample(random: random) # => 2
arr.sample(random: random) # => 5
random = Random.new(seed)
arr.sample(random: random) # => 2
arr.sample(random: random) # => 5
It works the same way for the Array#shuffle method.
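Note that FFaker::Random is a module, not a Random instance. This works because sample and shuffle only need the object passed as random: to respond to rand. The sketch below shows the idea with a hypothetical DemoRandom module and a hard-coded seed; both are assumptions made for the demo, not FFaker code.

```ruby
# Hypothetical stand-in for FFaker::Random: a module that wraps a seeded
# ::Random and exposes rand, which is what sample and shuffle call.
module DemoRandom
  SEED = 42 # fixed seed, chosen arbitrarily for the demo

  # Rebuilds the internal generator so the sequence starts over.
  def self.reset!
    @rng = ::Random.new(SEED)
  end

  def self.rand(max = nil)
    @rng ||= ::Random.new(SEED)
    max ? @rng.rand(max) : @rng.rand
  end
end

list = %w[a b c d e]

DemoRandom.reset!
first_run = [list.shuffle(random: DemoRandom), list.sample(random: DemoRandom)]

DemoRandom.reset!
second_run = [list.shuffle(random: DemoRandom), list.sample(random: DemoRandom)]

same_results = first_run == second_run # both runs started from the same state
```

Resetting the generator before each run plays the same role as FFaker::Random.reset! in the assert_deterministic helper shown earlier.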
Suppose you want to generate a test value with FFaker but need to ensure that it’s unique among other values generated in tests or during the same generation session. In that case, you can prefix the call with the unique method, for example FFaker::Name.unique.first_name.
This time, the library uses metaprogramming and threads to ensure the value’s uniqueness. The logic for uniqueness is placed in the FFaker::UniqueUtils class.
The source of the unique method is simple:
def unique(max_retries = 10_000)
  FFaker::UniqueUtils.add_instance(self, max_retries)
end
In our case, self is FFaker::Name. Then, the first_name method is called on an instance of FFaker::UniqueUtils, but since no such method is defined there, the method_missing method is triggered:
def method_missing(name, *args, **kwargs)
  @max_retries.times do
    result = @generator.public_send(name, *args, **kwargs)

    next if previous_results[[name, args, kwargs]].include?(result)

    previous_results[[name, args, kwargs]] << result
    return result
  end

  raise RetryLimitExceeded, "Retry limit exceeded for #{name}"
end
Also, when add_instance was invoked, a new entry was added to Thread.current[:ffaker_unique_utils] to keep the history of unique values for each generator. Because that registry lives in Thread.current, each thread keeps its own history.
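To see the whole mechanism in one place, here is a simplified, self-contained sketch of a uniqueness proxy. It is an illustration under stated assumptions, not FFaker’s implementation: the UniqueProxy class, its for method, the :demo_unique_proxies key, and the Cycle generator are all hypothetical names.

```ruby
require 'set'

# Hypothetical, simplified proxy that retries until it gets an unseen value.
class UniqueProxy
  class RetryLimitExceeded < StandardError; end

  # Keeps one proxy per generator in thread-local storage.
  def self.for(generator, max_retries)
    registry = (Thread.current[:demo_unique_proxies] ||= {})
    registry[generator] ||= new(generator, max_retries)
  end

  def initialize(generator, max_retries)
    @generator = generator
    @max_retries = max_retries
    # One set of already returned values per (method, arguments) combination.
    @previous_results = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Any method the proxy does not define is forwarded to the generator.
  def method_missing(name, *args, **kwargs)
    return super unless @generator.respond_to?(name)

    @max_retries.times do
      result = @generator.public_send(name, *args, **kwargs)
      next if @previous_results[[name, args, kwargs]].include?(result)

      @previous_results[[name, args, kwargs]] << result
      return result
    end
    raise RetryLimitExceeded, "Retry limit exceeded for #{name}"
  end

  def respond_to_missing?(name, include_private = false)
    @generator.respond_to?(name, include_private) || super
  end
end

# Deterministic demo generator that repeats every value twice.
module Cycle
  extend self
  VALUES = %w[ann ann bob bob eve eve].freeze

  def next_name
    @index = (@index || -1) + 1
    VALUES[@index % VALUES.length]
  end
end

proxy = UniqueProxy.for(Cycle, 10)
names = Array.new(3) { proxy.next_name } # duplicates are skipped silently
```

Once all three distinct values have been returned, a fourth call can only produce duplicates, so the proxy gives up after max_retries attempts and raises RetryLimitExceeded.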