A Secure Multiparty Computation (MPC) protocol for computing linear regression on vertically distributed datasets.
A protocol for secure distributed linear regression. For a detailed description of the protocol and evaluation results, please have a look at the Paper.
The following libraries need to be installed for a successful build:
OBLIVC_PATH
environment variable should point.libssl-dev
.In order to download submodules and compile, simply run make
.
By default, computations are performed using 64 bit fixed-point arithmetic.
One can change the bit width for phase 1 of the protocol to 32 bit by setting the flag BIT_WIDTH_32_P1=1
at compile time.
(either by passing it to make
or setting it directly in the Makefile
).
One can change the bit width for phase 2 of the protocol to 32 bit by setting the flag BIT_WIDTH_32_P2=1
at compile time.
(either by passing it to make
or setting it directly in the Makefile
).
Currently only the following combinations of bitwidths for both phases are supported:
bit width phase 1 | bit width phase 2 |
---|---|
32 | 32 |
64 | 64 |
64 | 32 |
Using smaller bit widths will increase computation speed at the cost of accuracy.
The protocol consists of two phases. bin/linreg
is used to run both phases at once.
Usage: bin/linreg [Input_file] [Precision] [Party] [Algorithm] [Num. iterations CGD] [Lambda] [Options]
Options: --use_ot: Enables the OT-based phase 1 protocol
--prec_phase2=<Precision phase 2>: Use different precision for phase 2 of the protocol
[Precision]
specifies the number of bits used for the fractional part of fixed-point encoded numbers.
The role of the process is given by [Party]
.
Values of 1 and 2 denote the CSP and Evaluator, respectively.
Higher values denote data providers.
[Algorithm]
is the algorithm used for phase 2 of the protocol and can be either cholesky
, ldlt
, or cgd
.
In the case of CGD, [Num. iterations CGD]
gives the number of iterations used before terminating.
[Lambda]
specifies the regularization parameter, and the --use-ot
flag enables the aggregation phase protocol based on Oblivious Transfers.
When using different bit widths for phase 1 and 2 (see above) one can use --prec_phase2<=Precision phase 2>
to set a different precision for the second
phase of the protocol.
An example input file can be found in examples/readme_example.in
:
10 5 3
localhost:1234
localhost:1235
localhost:1236 0
localhost:1237 1
localhost:1238 2
10 5
-0.49490189906 0.204027068031 1.0 -0.0048744428595 0.101856000869
0.49217428514 0.233090721372 -0.0370213771185 0.657720359628 0.525310770733
0.39721227919 -0.470002959473 0.277398798616 0.726013362994 0.498046786861
-0.595183792269 -0.0827269428691 -0.227611333929 0.80744916955 0.543968791627
-0.419150270985 0.292994257369 -0.0232190555386 -0.108185527543 0.238913650953
-0.247990663017 -1.0 -0.0241769053355 -0.842345696475 -1.0
-1.0 0.437037219652 -0.140750839011 1.0 -0.0679355153937
0.777015345458 0.981488358907 0.2870338525 0.579832272908 -0.262265338898
-0.0431656936622 0.0910166551363 -0.0724090181375 -0.144008713528 0.202706144381
-0.470188350902 -0.261201944989 -0.0468243599479 0.944605903172 0.455480894023
10
0.463624257142 1.03914869701 0.641866385451 -0.289962112725 -0.328203220996 -1.64668349374 -0.205128259075 2.13151959591 -0.129983842519 -0.0766970254399
It contains the number of samples, features and parties, followed by a network endpoint for each party.
The special roles CSP and Evaluator are defined by the first two lines.
Then, the data providers follow (3 in this example), each with a network endpoint and the starting index of its partition.
Afterwards, the dimensions of X
are specified, followed by X
itself.
Finally, the length and values of y
are given.
Running this example locally with
for party in {1..5}; do bin/linreg examples/readme_example.in 56 $party cgd 10 0.001 & done
yields the following result, in addition to some debug outputs:
Time elapsed: 5.673679
Number of gates: 18569664
Result: 0.984331027786964 0.792399824970372 0.754117840176144 0.592849130685193 0.057351715952213