
Under the hood of FFaker gem: metaprogramming meets deterministic probability

Ruby

January 14, 2025

Paweł Dąbrowski

Founder / Software Engineer

FFaker is a popular Ruby library that generates sample data for test and development environments. It provides many data types, such as names, e-mails, and credit card numbers. The codebase is relatively small and may look straightforward, but it contains interesting examples of metaprogramming, thread-local storage, and a deterministic probability implementation. Let’s dive into the code to understand how the FFaker library works under the hood.

For this investigation, I will use FFaker::Name.first_name and FFaker::Name.unique.first_name calls and explain what exactly happens when you call them in your application.

Loading the test data on demand

Let’s start with the exact definition of the first_name method:

module FFaker
  module Name
    extend self

    FIRST_NAME = (FIRST_NAMES_FEMALE + FIRST_NAMES_MALE).freeze

    def first_name
      fetch_sample(FIRST_NAME)
    end
  end
end

There are two interesting things here:

  • The first_name method becomes a singleton method of the FFaker::Name module because of the extend self line, which mixes the module’s own instance methods into its singleton class. That is why you can call FFaker::Name.first_name directly on the module.
  • You won’t find the definition of FIRST_NAMES_FEMALE and FIRST_NAMES_MALE constants in the codebase. They are created “on the fly” when they are requested.
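To see what extend self does in isolation, here is a minimal sketch with a hypothetical Greeter module (not part of FFaker). The instance method becomes callable on the module itself because the module is mixed into its own singleton class.

```ruby
# Hypothetical module used only to illustrate `extend self`.
module Greeter
  extend self # mix the module's instance methods into its own singleton class

  def hello(name)
    "Hello, #{name}!"
  end
end

Greeter.hello("Ruby")                     # => "Hello, Ruby!"
Greeter.singleton_class.include?(Greeter) # => true
```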

The library provides many different data sets, and loading all of them on initialization would slow it down. Besides, there is no need to load credit card data when you only want to generate names. Because of that, FFaker loads each data set the first time you request it.

Handling undefined constants

When FFaker::Name::FIRST_NAMES_FEMALE is referenced, Ruby can’t find the constant definition, so it calls the module’s const_missing method. Only the default implementation of const_missing raises the NameError. You are probably more familiar with method_missing, which serves a similar purpose for methods rather than constants.
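Here is a minimal, hypothetical illustration of the hook. The Catalog module and its SECRET fallback rule are made up for this sketch. Ruby passes the missing name as a symbol, and whatever const_missing returns becomes the value of the expression. Calling super falls back to the default behavior, which raises NameError.

```ruby
# Hypothetical module: unknown constants resolve to placeholder strings,
# except names starting with SECRET, which fall back to Ruby's default.
module Catalog
  def self.const_missing(const_name)
    # Ruby calls this hook instead of raising NameError right away.
    return super if const_name.to_s.start_with?("SECRET") # default raises NameError

    "placeholder for #{const_name}"
  end
end

# Nothing is stored with const_set here, so the hook runs on every access.
Catalog::COLORS # => "placeholder for COLORS"
```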

The library keeps the test data in simple text files located in the lib/ffaker/data directory. Values for the FFaker::Name::FIRST_NAMES_FEMALE constant are located in the lib/ffaker/data/name/first_names_female file. You can probably see the naming pattern: the underscored module name becomes the directory, and the underscored constant name becomes the file name.
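The naming pattern can be sketched as follows. This assumes an ActiveSupport-style underscore helper and uses a placeholder BASE_LIB_PATH. Both are assumptions for illustration; FFaker’s own underscore helper may differ in details.

```ruby
# Hypothetical stand-in for FFaker::BASE_LIB_PATH.
BASE_LIB_PATH = "/path/to/ffaker/lib"

# ActiveSupport-style approximation of an underscore helper (assumption:
# FFaker's own helper may differ in details).
def underscore(string)
  string
    .gsub("::", "/")
    .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # "HTMLParser" -> "HTML_Parser"
    .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # "PhoneNumber" -> "Phone_Number"
    .tr("-", "_")
    .downcase
end

# Builds the data file path from a module name and a constant name.
def data_path(mod_name, const_name)
  "#{BASE_LIB_PATH}/ffaker/data/#{underscore(mod_name)}/#{underscore(const_name.to_s)}"
end

mod_name = "FFaker::Name".split("::").last # => "Name"
data_path(mod_name, :FIRST_NAMES_FEMALE)
# => "/path/to/ffaker/lib/ffaker/data/name/first_names_female"
underscore("PhoneNumber") # => "phone_number"
```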

To build the path to the test data, we need the constant name and the name of the module in which the constant is missing (the first element of ancestors, which is the module itself):

def const_missing(const_name)
  if const_name =~ /[a-z]/ # Not a constant, probably a class/module name.
    super const_name
  else
    mod_name = ancestors.first.to_s.split('::').last
    data_path = "#{FFaker::BASE_LIB_PATH}/ffaker/data/#{underscore(mod_name)}/#{underscore(const_name.to_s)}"
    data = k File.read(data_path, mode: 'r:UTF-8').split("\n")
    const_set const_name, data
    data
  end
end

Once the data_path is built, the library loads the file’s content, splits it into lines, and sets the result as the constant value with const_set. From then on, the constant exists, so const_missing is not triggered for it again, and the next FFaker::Name.first_name call can use the data directly. The k method is a small helper defined in FFaker itself, and it freezes every element of the array.
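The whole lazy-loading flow can be reproduced end to end in a simplified sketch. It makes several assumptions. A temporary directory stands in for lib/ffaker/data. The LazyConstants and Names modules are hypothetical. The sketch checks whether the file exists instead of using FFaker’s lowercase-letter check. Each string and the array are frozen, in the spirit of the k helper.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical data directory standing in for lib/ffaker/data/<module>.
DATA_DIR = Dir.mktmpdir
at_exit { FileUtils.remove_entry(DATA_DIR) }
File.write(File.join(DATA_DIR, "first_names_female"), "Anna\nMaria\nZofia\n")

# Hypothetical loader module; FFaker keeps the real logic in its own helpers.
module LazyConstants
  def const_missing(const_name)
    path = File.join(DATA_DIR, const_name.to_s.downcase)
    return super unless File.exist?(path) # fall back to the default NameError

    # Freeze every string and the array itself.
    data = File.read(path, mode: "r:UTF-8").split("\n").map(&:freeze).freeze
    const_set(const_name, data) # const_set returns the value it stores
  end
end

module Names
  extend LazyConstants # const_missing becomes a singleton method of Names
end

Names::FIRST_NAMES_FEMALE # first access: const_missing reads the file
Names::FIRST_NAMES_FEMALE # later accesses: the constant already exists
```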

Of course, the library returns random values, so once the array of values is loaded, FFaker randomly selects one of them. This is one of the most interesting parts of the source code.


Randomizing the test data

You might say that calling Array#sample would be enough to randomize the test data. You are right and wrong at the same time. The library does use Array#sample, but it passes its own seeded random number generator to it to provide deterministic probability.
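The selection step can be sketched with a hypothetical helper. The name sample_from, the explicit rng argument, and the count: option are assumptions for this sketch, not FFaker’s API. It shows that passing a seeded generator to Array#sample makes both single and multi-element selections repeatable.

```ruby
# Hypothetical helper in the spirit of FFaker's sampling: the name,
# the explicit rng argument and the count: option are assumptions.
def sample_from(list, rng, count: nil)
  # Array#sample uses the given generator through the random: keyword.
  count ? list.sample(count, random: rng) : list.sample(random: rng)
end

names = %w[Anna John Maria Piotr Zofia]

rng = Random.new(1234)
first_run = [sample_from(names, rng), sample_from(names, rng, count: 2)]

rng = Random.new(1234) # same seed, same internal state
second_run = [sample_from(names, rng), sample_from(names, rng, count: 2)]

first_run == second_run # => true
```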

Deterministic probability means that the same output is produced every time the input (the seed of the random number generator) is the same. Here is a helper from the test suite that verifies this:

# Accepts a block. Executes the block multiple times after resetting
# the internal Random Number Generator state and compares the results of
# each execution to make sure they are the same.
def assert_deterministic(options = {}, &block)
  raise ArgumentError, 'Must pass a block' unless block

  options = { message: 'Results are not repeatable' }.merge(options)

  returns = Array.new(2) do
    FFaker::Random.reset!
    Array.new(5, &block)
  end

  assert(returns.uniq.length == 1, options[:message])
end

It’s time to see how the internal random number generator works.

Random number generator

A quick challenge: you want to generate a random number from 0 to 99 three times. How can you repeat these calls and get the same results on each attempt?

The answer is to use a seed. The seed is just a (usually very large) integer used to initialize the internal state of the random number generator. If we initialize two generators with the same seed, we get the same sequence of results:

seed = Random.new_seed

random = Random.new(seed)
random.rand(100) # => 10
random.rand(100) # => 67
random.rand(100) # => 18

random = Random.new(seed)
random.rand(100) # => 10
random.rand(100) # => 67
random.rand(100) # => 18

FFaker’s implementation of the random number generator is simple, but it makes the library easy to test and debug:

module FFaker
  module Random
    # Returns the current RNG seed.
    def self.seed
      @seed ||= ::Random.new_seed
    end

    # Sets the RNG seed and creates a new internal RNG.
    def self.seed=(new_seed)
      @seed = new_seed
      reset!
      new_seed
    end

    # Reset the RNG back to its initial state.
    def self.reset!
      @rng = new_rng
    end

    # Returns a random number using an RNG with a known seed.
    def self.rand(max = nil)
      return rng.rand(max) if max

      rng.rand
    end

    # Returns the current Random object.
    def self.rng
      @rng ||= new_rng
    end

    # Returns a new Random object instantiated with #seed.
    def self.new_rng
      ::Random.new(seed)
    end
  end
end

Array methods with a custom random number generator

The FFaker::RandomUtils module contains the methods used to randomly select values from an array. FFaker uses the sample and shuffle methods, but with an additional random: keyword argument:

list.sample(random: FFaker::Random)
list.shuffle(random: FFaker::Random)
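FFaker::Random is a module rather than a Random instance, so it is worth noting why it can be passed as random:. Array#sample and Array#shuffle accept any object that responds to rand(max). Here is a minimal sketch of that duck typing with a hypothetical DemoRandom module that mimics the seeding idea shown earlier.

```ruby
# Hypothetical RNG wrapper: like FFaker::Random, it is a plain module that
# implements rand, which is what Array#sample and #shuffle call on it.
module DemoRandom
  def self.seed=(value)
    @rng = ::Random.new(value)
  end

  def self.rand(max = nil)
    max ? @rng.rand(max) : @rng.rand
  end
end

list = %w[Anna John Maria Ola]

DemoRandom.seed = 42
first = Array.new(3) { list.sample(random: DemoRandom) }

DemoRandom.seed = 42
second = Array.new(3) { list.sample(random: DemoRandom) }

first == second # => true
```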

As I mentioned before, with the custom random number generator, you can reproduce the result of sampling or shuffling:

arr = [1, 2, 3, 4, 5]
seed = Random.new_seed

random = Random.new(seed)
arr.sample(random: random) # => 2
arr.sample(random: random) # => 5

random = Random.new(seed)
arr.sample(random: random) # => 2
arr.sample(random: random) # => 5

It works the same way for the Array#shuffle method.
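A quick check that mirrors the sample example above: two generators created from the same seed produce the same permutation.

```ruby
arr = [1, 2, 3, 4, 5]
seed = Random.new_seed

first = arr.shuffle(random: Random.new(seed))
second = arr.shuffle(random: Random.new(seed))

first == second   # => true
first.sort == arr # => true, shuffle only reorders the elements
```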


Making the test data unique

Suppose you want to generate a test value with FFaker but need to ensure that it’s unique among the other values generated in your tests or during the same generation session. In that case, you can prefix the call with the unique method, for example, FFaker::Name.unique.first_name.

This time, the library uses metaprogramming and thread-local storage (Thread.current) to ensure the value’s uniqueness. The uniqueness logic lives in the FFaker::UniqueUtils class.

The source of the unique method is simple:

def unique(max_retries = 10_000)
  FFaker::UniqueUtils.add_instance(self, max_retries)
end

In our case, self is FFaker::Name, and add_instance returns an instance of FFaker::UniqueUtils that wraps it. The first_name method is then called on that instance, but since UniqueUtils doesn’t define such a method, method_missing is triggered:

def method_missing(name, *args, **kwargs)
  @max_retries.times do
    result = @generator.public_send(name, *args, **kwargs)

    next if previous_results[[name, args, kwargs]].include?(result)

    previous_results[[name, args, kwargs]] << result
    return result
  end

  raise RetryLimitExceeded, "Retry limit exceeded for #{name}"
end

Also, when add_instance is invoked, it registers the UniqueUtils instance in Thread.current[:ffaker_unique_utils], keyed by generator, so each generator keeps its own history of returned values within the current thread.
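The following sketch ties the pieces together in a simplified, self-contained form. UniqueProxy, wrap, and the Colors generator are hypothetical names. The sketch also adds respond_to_missing?, which is commonly defined alongside method_missing but does not appear in the snippet above. The real FFaker::UniqueUtils may differ in details.

```ruby
require "set"

# Hypothetical, simplified re-creation of the idea behind FFaker::UniqueUtils;
# class and method names here are illustrative, not the library's API.
class UniqueProxy
  class RetryLimitExceeded < StandardError; end

  # One registry per thread (stored in Thread.current), keyed by generator,
  # so repeated wrap calls reuse the same proxy and its history.
  def self.wrap(generator, max_retries = 10_000)
    registry = (Thread.current[:unique_proxy_registry] ||= {})
    registry[generator] ||= new(generator, max_retries)
  end

  def initialize(generator, max_retries)
    @generator = generator
    @max_retries = max_retries
    # Each [method, args, kwargs] combination gets its own Set of seen values.
    @previous_results = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def method_missing(name, *args, **kwargs)
    return super unless @generator.respond_to?(name)

    key = [name, args, kwargs]
    @max_retries.times do
      result = @generator.public_send(name, *args, **kwargs)
      next if @previous_results[key].include?(result) # seen before, retry

      @previous_results[key] << result
      return result
    end

    raise RetryLimitExceeded, "Retry limit exceeded for #{name}"
  end

  def respond_to_missing?(name, include_private = false)
    @generator.respond_to?(name, include_private) || super
  end
end

# Hypothetical generator with only three possible values.
module Colors
  extend self

  def color
    %w[red green blue].sample
  end
end

# Three calls return three different colors; a fourth would exhaust the retries.
unique_colors = Array.new(3) { UniqueProxy.wrap(Colors).color }
```

Because the registry lives in Thread.current, a new thread starts with an empty history, so the same generator can return previously seen values there.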
