Concurrency and Parallelism Sandbox
This project provides a basic concurrency problem useful for exploring
different multitasking paradigms available in Ruby. Fundamentally, we have a
set of miners and a set of movers. A miner takes some amount of time to
mine ore, which is given to a mover. When a mover has enough ore for a full
batch, the delivery takes some amount of time before more ore can be
loaded.
A miner is given some depth (e.g. 1 to 100) to mine down to, which will
take an increasing amount of time with depth. More depth provides greater ore
results as well. Ore is gathered at each depth; either a fixed amount or
randomized, based on depth. The amount of time spent mining each level is
independent and may be randomized.
In this case, miners are rewarded by calculating fibonacci(depth), using
classic, inefficient fibonacci. fibonacci(35) yields around 10M ore, while
fibonacci(30) yields under 1M ore.
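As a rough illustration of the reward function described above, here is a minimal sketch of the classic doubly-recursive Fibonacci. The method body is an assumption about what "classic, inefficient" means here, not a copy of the library's code.

```ruby
# Classic doubly-recursive Fibonacci (assumed implementation).
# Deliberately inefficient: running time grows quickly with n.
def fibonacci(n)
  n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2)
end

puts fibonacci(30)  # 832040  (under 1M ore)
puts fibonacci(35)  # 9227465 (around 10M ore)
```

The exponential cost is the point: deeper mining takes noticeably longer and pays noticeably more.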
A mover has a batch size, say 10. As the mover accumulates ore over time,
once the batch size is reached, the mover delivers the ore to the destination.
Larger batches take longer. The delivery time can be randomized.
The time and work spent delivering ore can be simulated three ways,
configured via :work_type
:wait - represents waiting on IO; calls sleep(duration)
:cpu - busy work; calls fibonacci(30) until duration is reached
:instant - useful for testing; returns immediately
You’ll want to use Ruby 3.1+ (CRuby) to make the most of Ractors, Fibers,
and Fiber::Scheduler.
This gem can be used on JRuby and TruffleRuby, but several concurrency options
are not available: process forking, Ractors, and Fiber::Scheduler.
However, their threading performance exceeds CRuby’s as they don’t have a
Global VM Lock (GVL).
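The three :work_type modes described earlier can be sketched roughly as follows. The simulate_work helper and its signature are hypothetical, written only for illustration; the library's own method names may differ.

```ruby
# Classic recursive Fibonacci, used here as CPU-bound busy work.
def fibonacci(n)
  n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2)
end

# Hypothetical helper (not the library's API): spend roughly
# `duration` seconds according to `work_type`.
def simulate_work(duration, work_type)
  case work_type
  when :wait
    sleep(duration)                       # stands in for waiting on IO
  when :cpu
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # keep the CPU busy until the duration has elapsed
    fibonacci(30) while Process.clock_gettime(Process::CLOCK_MONOTONIC) - start < duration
  when :instant
    # no work at all; useful for tests
  else
    raise ArgumentError, "unknown work type: #{work_type.inspect}"
  end
  duration
end

simulate_work(0.05, :wait)
simulate_work(0.05, :cpu)
simulate_work(0.05, :instant)
```

The distinction matters for the demos: :wait releases the execution lock while sleeping, whereas :cpu holds it for as long as the thread is scheduled.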
Right now, a gem installation only provides the Miner Mover library.
Use the Development process below to access all of the demonstration
scripts showing the different concurrency strategies.
gem install miner_mover
For Ruby 3.1+ on Linux, you’ll also want:
gem install fiber_scheduler io-event
git clone https://github.com/rickhull/miner_mover
cd miner_mover
bundle config set --local with development
bundle install
Try: rake -T to see available Rake tasks
$ rake -T
rake config # Run demo/config.rb
rake demo # Run all demos
rake fiber # Run demo/fiber.rb
rake fiber_scheduler # Run demo/fiber_scheduler.rb
rake jvm_demo # Run JVM compatible demos
rake process_pipe # Run demo/process_pipe.rb
rake process_socket # Run demo/process_socket.rb
rake ractor # Run demo/ractor.rb
rake serial # Run demo/serial.rb
rake test # Run tests
rake thread # Run demo/thread.rb
Try: rake test
Included demonstration scripts can be executed via Rake tasks.
The following order is recommended:
rake config
rake serial
rake fiber
rake fiber_scheduler
rake thread
rake process_pipe
rake process_socket
Try each task; each one runs for about 6 seconds and logs many lines of
output. These rake tasks correspond to the scripts within demo/.
LOAD_PATH
Rake tasks take care of LOAD_PATH, so the following is only necessary when
not using rake tasks:
~/miner_mover
-I lib as a flag to ruby or irb to update LOAD_PATH so that
require 'miner_mover' will work.
require_relative
irb
$ irb -I lib
irb(main):001:0> require 'miner_mover/worker'
=> true
irb(main):002:0> include MinerMover
=> Object
irb(main):003:0> miner = Miner.new
=>
#<MinerMover::Miner:0x00007fbee8a3a080
...
irb(main):004:0> mover = Mover.new
=>
#<MinerMover::Mover:0x00007fbee8a8a6c0
...
irb(main):005:0> miner.state
=>
{:id=>"00050720",
:logging=>false,
:debugging=>false,
:timer=>10200,
:variance=>0,
:depth=>5,
:partial_reward=>false}
irb(main):006:0> mover.state
=>
{:id=>"00057860",
:logging=>false,
:debugging=>false,
:timer=>10456,
:variance=>0,
:work_type=>:cpu,
:batch_size=>10000000,
:batch=>0,
:batches=>0,
:ore_moved=>0}
irb(main):007:0> miner.mine_ore
=> 7
irb(main):008:0> mover.load_ore 7
=> 7
irb(main):009:0> miner.state
=>
{:id=>"00050720",
:logging=>false,
:debugging=>false,
:timer=>28831,
:variance=>0,
:depth=>5,
:partial_reward=>false}
irb(main):010:0> mover.state
=>
{:id=>"00057860",
:logging=>false,
:debugging=>false,
:timer=>27959,
:variance=>0,
:work_type=>:cpu,
:batch_size=>10000000,
:batch=>7,
:batches=>0,
:ore_moved=>0}
These scripts implement a full miner mover simulation using different
multitasking paradigms in Ruby.
demo/serial.rb
demo/fiber.rb
demo/fiber_scheduler.rb
demo/thread.rb
demo/ractor.rb
demo/process_pipe.rb
demo/process_socket.rb
See config/example.cfg for configuration.
It will be loaded by default.
Note that serial.rb and fiber.rb have no concurrency and cannot use
multiple miners or movers.
Execute via e.g. ruby -Ilib demo/thread.rb
demo/serial.rb
One miner, one mover. The miner mines to a depth, then loads the ore.
When the mover has a full batch, the batch is moved while the miner waits.
demo/fiber.rb
Without a Fiber Scheduler, fibers only change how the code is organized.
Again, one miner, one mover. The mover has its own fiber, and the mining
fiber can pass ore to the moving fiber. There is no concurrency, so the
performance is roughly the same as before.
demo/fiber_scheduler.rb
TBD
demo/thread.rb
An array of mining threads and an array of moving threads.
A single shared queue for loading ore from miners to movers.
All threads contend for the same execution lock (GVL).
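The threaded layout above (mining threads, moving threads, one shared queue) might look roughly like this. It is a simplified sketch using Ruby's core Queue, not the code in demo/thread.rb; the depths, the ore-per-level rule, and the variable names are assumptions made for illustration.

```ruby
ore_queue = Queue.new        # thread-safe FIFO shared by all threads
depths = [3, 5, 8]           # illustrative: one miner per depth

# Mining threads: push some ore for each level mined.
miners = depths.map do |depth|
  Thread.new do
    (1..depth).each { |level| ore_queue.push(level) }  # assumed: `level` ore per level
  end
end

# Moving threads: pop ore until the queue is closed and drained.
movers = Array.new(2) do
  Thread.new do
    total = 0
    while (ore = ore_queue.pop)   # pop returns nil once the queue is closed and empty
      total += ore
    end
    total                         # becomes Thread#value
  end
end

miners.each(&:join)
ore_queue.close                   # signal the movers that no more ore is coming
ore_moved = movers.sum(&:value)   # Thread#value joins each mover
puts ore_moved                    # 57
```

Closing the queue is one simple way to shut the movers down; a sentinel value per mover would work as well.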
demo/ractor.rb
Moving threads execute in their own ractor.
Mining threads contend against mining threads. Moving threads, likewise.
demo/process_pipe.rb
Similar to ractors, but using Process.fork for movers, using a pipe to send
ore from the parent mining process.
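A rough sketch of the fork-and-pipe arrangement described above: the parent mines and writes 4-byte ore amounts to a pipe, and a forked child reads and totals them. The second pipe for returning the total, the ore values, and the variable names are assumptions for illustration; this is not the code in demo/process_pipe.rb, and it requires a platform that supports fork.

```ruby
ore_r, ore_w = IO.pipe         # parent (miner) -> child (mover)
result_r, result_w = IO.pipe   # child -> parent, only to report the total

pid = Process.fork do
  # Child process: the mover. Close the pipe ends it does not use.
  ore_w.close
  result_r.close
  total = 0
  while (bytes = ore_r.read(4))      # nil at EOF, once the parent closes ore_w
    total += bytes.unpack1('L')      # 4-byte native unsigned integer
  end
  result_w.write([total].pack('Q'))  # 8-byte total back to the parent
  result_w.close
end

# Parent process: the miner.
ore_r.close
result_w.close
[7, 13, 21].each { |ore| ore_w.write([ore].pack('L')) }
ore_w.close                          # EOF tells the mover to finish
ore_moved = result_r.read(8).unpack1('Q')
Process.wait(pid)
puts ore_moved                       # 41
```

Because a pipe is a byte stream, fixed-width packing is what lets the reader find message boundaries.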
demo/process_socket.rb
As above, but with Unix sockets (not network sockets), using any of the
SOCK_STREAM, SOCK_DGRAM, or SOCK_SEQPACKET socket types.
In all cases, ore amounts are 4 bytes so the types behave roughly equivalently.
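To illustrate the 4-byte ore messages mentioned above, here is a small single-process sketch using a datagram socket pair. It leaves out the fork to keep the example short; the ore values and variable names are assumptions, and this is not the code in demo/process_socket.rb.

```ruby
require 'socket'

# A connected pair of Unix datagram sockets; one end per role.
miner_sock, mover_sock = UNIXSocket.pair(:DGRAM)

ores = [7, 13, 21]                       # illustrative ore amounts
ores.each do |ore|
  miner_sock.send([ore].pack('L'), 0)    # one 4-byte datagram per ore amount
end

# Each recv returns one datagram; the count is known here, so no EOF is needed.
received = ores.size.times.map { mover_sock.recv(4).unpack1('L') }
p received                               # [7, 13, 21]

miner_sock.close
mover_sock.close
```

With datagram sockets each send arrives as one message. With a stream socket the same 4-byte reads work because every message has the same fixed size.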
Multitasking here means “the most general sense of performing several tasks
or actions at the same time”. “At the same time” can mean fast switching
between tasks, or left and right hands operating truly in parallel.
In the broadest sense, two tasks are concurrent if they happen at the
same time, as above. When I tell Siri to call home while I drive, I perform
these tasks concurrently.
In the strictest sense of parallelism, one executes several identical tasks
using multiple facilities that operate independently and in parallel.
Multiple lanes on a highway offer parallelism for the task of driving from
A to B.
If there is a bucket brigade to put out a fire, all members of the brigade are
operating in parallel. The last brigade member is dousing the fire instead of
handing the bucket to the next member. While this might not meet the most
strict definition of parallelism, it is broadly accepted as parallel. It is
certainly concurrent. Often though, concurrent means merely concurrent,
where there is only one facility switching between tasks rather than multiple
devices operating in parallel.
The default Ruby runtime is known as CRuby, named for its implementation in
the C language. It is also known as MRI (Matz's Ruby Interpreter), named for
its creator, Yukihiro "Matz" Matsumoto. Some history:
schedule(waiting, running) YES
schedule(waiting, waiting) NO
schedule(running, running) NO
schedule(running, waiting) OH DEAR
fork with Copy-on-write for efficiency
Fiber.yield(arg) # call within a Fiber to suspend execution and yield a value
Fiber#resume     # tell a Fiber to proceed and return the next yielded value
fiber = Fiber.new do
  Fiber.yield 1
  2
end
fiber.resume
#=> 1
fiber.resume
#=> 2
fiber.resume
# FiberError: attempt to resume a terminated fiber
Any argument(s) passed to Fiber#resume on its first call (to start the Fiber)
will be passed to the Fiber.new block:
fiber = Fiber.new do |arg1, arg2|
  Fiber.yield arg1
  arg2
end
fiber.resume(:x, :y)
#=> :x
fiber.resume
#=> :y
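On later calls, the argument to Fiber#resume becomes the return value of the pending Fiber.yield inside the fiber. That is enough to pass ore from a mining loop to a moving fiber. The sketch below uses a hypothetical mover fiber with an assumed batch size of 10; it is not the code in demo/fiber.rb.

```ruby
batch_size = 10   # assumed, for illustration

# Hypothetical mover fiber: accumulates ore and counts full batches.
mover = Fiber.new do |ore|          # first resume(arg) arrives as the block argument
  batch = 0
  batches = 0
  loop do
    batch += ore
    if batch >= batch_size
      batches += 1                  # "deliver" the batch
      batch = 0
    end
    ore = Fiber.yield(batches)      # suspend; the next resume(arg) returns arg here
  end
end

# The "miner" is just the main fiber handing ore over, one load at a time.
results = [4, 4, 4, 12].map { |ore| mover.resume(ore) }
p results                           # [0, 0, 1, 2]
```

Control moves back and forth explicitly, so nothing runs concurrently: the miner is suspended whenever the mover is working.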
Fiber::Scheduler (Ruby 3.x)
Fiber::Scheduler is introduced to manage non-blocking fibers
The concept of non-blocking fiber was introduced in Ruby 3.0. A non-blocking
fiber, when reaching an operation that would normally block the fiber (like
sleep, or wait for another process or I/O) will yield control to other fibers
and allow the scheduler to handle blocking and waking up (resuming) this fiber
when it can proceed.
For a Fiber to behave as non-blocking, it needs to be created in Fiber.new
with blocking: false (which is the default), and Fiber.scheduler should be
set with Fiber.set_scheduler. If Fiber.scheduler is not set in the current
thread, blocking and non-blocking fibers’ behavior is identical.
Thus, any fiber without a scheduler behaves as a blocking fiber. If a fiber
is created with blocking: true, it is a blocking fiber. Otherwise, if it has
a scheduler, it is non-blocking.
Fiber.scheduler # get the current scheduler
Fiber.set_scheduler # set the current scheduler
Fiber.schedule # perform a given block in a non-blocking manner
Fiber::Scheduler # scheduler interface
Fiber::Scheduler
Fiber::Scheduler is not an implementation but an interface
Ractors are an abstraction and a container for threads. Threads within a
Ractor can share memory. Threads must use message passing to communicate
across Ractors. Also, each Ractor holds its own execution lock on YARV, so
threads in different Ractors do not contend with each other for that lock.
# get the current Ractor object
r = Ractor.current
# create a new Ractor (block will execute in parallel via thread creation)
Ractor.new(arg1, arg2, etc) { |arg1, arg2, etc|
  # now use arg1 and arg2 from outside
}
Ractor#send - puts a message at the incoming port of a Ractor
Ractor.receive - returns a message from the current Ractor's incoming port
Ractor.yield - current Ractor sends a message on the outgoing port
Ractor#take - returns the next outgoing message from a Ractor
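As a small illustration of the message-passing calls listed above, the sketch below has a mover Ractor total the ore it receives and send the result back to the main Ractor. It uses only Ractor#send and Ractor.receive. The :done sentinel, the ore values, and the variable names are assumptions for illustration, and CRuby prints a warning that Ractor is experimental.

```ruby
# The main Ractor is passed in explicitly so the mover can reply to it.
mover = Ractor.new(Ractor.current) do |main|
  total = 0
  # Block on the incoming port until the (assumed) :done sentinel arrives.
  while (ore = Ractor.receive) != :done
    total += ore
  end
  main.send(total)     # reply via the main Ractor's incoming port
end

[7, 13, 21].each { |ore| mover.send(ore) }   # Integers are shareable
mover.send(:done)

ore_moved = Ractor.receive                   # wait for the mover's reply
puts ore_moved                               # 41
```

Replying with send rather than Ractor.yield keeps the example to the incoming-port half of the API.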
There are many ways to create a process in Ruby, some more useful than
others. My favorites:
Process.fork - when called with a block, the block is only executed in the child process
Process.spawn - extensive options, nonblocking, call Process.wait(pid)
Open3.popen3 - for access to STDIN, STDOUT, STDERR
IO.pipe (streaming / bytes / unidirectional)
UNIXSocket.pair :RAW
UNIXSocket.pair :DGRAM (datagram / message / “like UDP”)
UNIXSocket.pair :STREAM (streaming / bytes / “like TCP”)
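To illustrate the Process.spawn entry in the list of favorites above, here is a minimal sketch of spawning a child and waiting on it. The child command (the current Ruby interpreter running a one-line script that exits with status 3) is an arbitrary choice for illustration.

```ruby
require 'rbconfig'

# RbConfig.ruby is the path of the running interpreter.
pid = Process.spawn(RbConfig.ruby, '-e', 'exit 3')

# spawn returns immediately; the parent is free to do other work here.

Process.wait(pid)          # block until the child exits; sets $?
status = $?.exitstatus
puts status                # 3
```

Process.spawn does not wait for the child, which is why the explicit Process.wait(pid) is needed to collect its exit status.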