A factorization software using a Quadratic Sieve (SIQS) written in C.
This general-purpose integer factorization software is released into the public domain. It is provided “as is” without any warranty, express or implied.
Ready to be deployed in production environments as a single-threaded stateless factorizer.
This document describes the factorization software v2.0.0, which was released on 1 February 2025.
math.h
.Enjoy mathematics applied to software, and factor all numbers up to 50 digits, 75 digits and more.
AX^2 + 2BX + C
coefficientsWindows users can download the provided factor.exe
file for convenience, navigate to its folder as shown below, and proceed with factorization.
Otherwise, to build the executable, macOS and Linux users should use the Terminal, while Windows users should use PowerShell. The procedure takes just a few minutes :
cd
command.gcc -Wall -pedantic -O2 -std=c99 main.c -o factor
.If gcc
is not installed on Windows, you can install MinGW, which provides it. Then, simply replace factor
with factor.exe
in the command above.
gcc
with clang
, which acts the same and is available natively.gcc
with sudo apt update && sudo apt install gcc
.Compilation takes a few seconds, then you can start the factorizations :
./factor.exe 2840222005804112554242019945562782034485263657432907
./factor 2840222005804112554242019945562782034485263657432907
The software will quickly display the factors in the default format :
Number: 2840222005804112554242019945562782034485263657432907
Factors: 186565459987 (prime), 123384476380949634131^2
These options are designed to facilitate the interaction with the software :
Name | Short | Goal | Example |
---|---|---|---|
--help |
-h |
Print the help. | -h |
--input-file |
-i |
Factor the entire file. | -i input.txt |
--output-file |
-o |
Write results to a file. | -o output.txt |
--output-json |
-j |
Format results as JSON. | -j |
--output-json-extended |
-J |
Format the results as JSON with details on the duration and completeness of each factorization. | -J (uppercase) |
--output-csv |
-c |
Format results as CSV. | -c |
--output-csv-extended |
-C |
Format the results as CSV with details on the duration and completeness of each factorization. | -C (uppercase) |
--verbose |
-v |
Show more or less details, 1 for task progress, 2 for status message, 3 for maintenance messages, 4 for detailed information. The default is 1 when you have a terminal, 0 otherwise. |
• -v 0 • -v 1 • -v 2 • -v 3 • -v 4 |
--force |
-f |
Overrides the limits at which the software refuses to factor: 8,191 digits for a number and 220-bit for the Quadratic Sieve. | -f |
--timeout |
-t |
Interrupt the Quadratic Sieve after a specified duration in seconds. | -t 1 |
--rand-seed |
-r |
Specify a custom (64-bit) XorShift seed value. This option always provides a new seed value when 0 is specified. |
• -r 0 • -r 123 |
--demand |
-d |
Designed for testing, this option creates a suitable input file named demand.txt , containing non-trivial sample numbers. Its parameters, all optional, are -d <bits-min> <bits-max> <count> . |
• -d • -d 220 • -d 120 170 • -d 16 64 2000 |
Run ./factor --help
to display the built-in manual.
The Quadratic Sieve accepts custom parameters to override the default ones :
Quadratic Sieve option | Typical range of values and goal |
---|---|
--qs-multiplier <value> |
From 1 to 150 , test a multiplier intended to optimize runtime and factorization success. |
--qs-base-size <value> |
From 1000 to 50000 , test a factor base size. |
--qs-large-prime <value> |
From 300000 to 30000000 , test a limit for the partial relations collection. |
--qs-alloc-mb <value> |
From 1 to a few hundred, test a memory allocation in Megabytes. |
--qs-sieve <value> |
With 31744 as a default value, test a half sieve size. |
--qs-threshold <value> |
From 60 to 110 , test a compromise between rejecting good relations and checking unusable ones. |
--qs-error-bits <value> |
From 15 to 35 , test a tolerance for small errors (due to approximations of logarithms) in the smoothness test during relations collection. |
--qs-laziness <value> |
From 80 to 120 , test a percentage for collecting more/fewer relations than recommended. |
These technical parameters can be used to review the software configuration, manually or even automatically. Developers can browse the source to see their implementation, and users have --verbose 4
to trace.
The software is designed to be a general-purpose factorizer for integers, also supporting negative numbers. Featuring an algorithm that handles large numbers (e.g. a highly composite 1500-bit number or 10,000!), its performance varies mainly depending on the structure of the number.
The software deploying a Quadratic Sieve is evaluated according to the RSA integers :
Number of bits | Number of decimal digits | Factorization duration |
---|---|---|
100 | 31 | 50 milliseconds |
150 | 46 | 250 milliseconds |
200 | 68 | 7 seconds |
250 | 75 | 5 minutes |
300 | 91 | 2 hours |
For every additional 10 bits (or 3 decimal digits) to the input number, the factorization duration roughly doubles. Measurements made during software development are available to visualize execution times at different bit sizes.
[!NOTE]
RSA numbers are integers having exactly two prime factors of comparable size, these factors are carefully selected for their resistance to factorization attempts. Use--force
when these numbers exceed 220-bit.
The software can factor multiple numbers in a row at no extra cost, either from the command line or from a file.
./factor 20 21 22 23
Contents of the file example.txt
:
# A comment
20
21 # Another comment
22
23
Command to quietly generate the example-factored.json
file :
./factor --input-file "example.txt" --output-file "example-factored.json" --verbose 0 --output-json
The software has been extensively tested during its preparation and has a production ready status. The built-in generator, consistent across all platforms, produces reproducible samples in a demand.txt
file. This feature is designed to quickly simulate factorization demands.
0
if and only if the results are completely accurate and verified.2
or higher explicitly indicates whether the results are completely accurate and verified.-Example of a test command that factors 10 numbers of 150-bit based on seed 1
, taking a few seconds :
./factor --demand 150 150 10 --rand-seed 1
./factor -i demand.txt -c -v 3
Example of a verbose test command that factors 1392602085554410049903335201603514917187205908900868592635698356747
, a 220-bit number in 20 seconds :
./factor --demand 220
./factor -i demand.txt -v 4
Example of a test command that factors sample numbers ranging from 120-bit to 170-bit, taking 20 seconds :
./factor --demand 120 170
./factor -i demand.txt -C -v 3
Example of a test command that factors traditional integers ranging from 16-bit to 64-bit, taking 20 seconds :
./factor --demand 16 64 2000 --rand-seed 1
./factor -i demand.txt -c -v 3
Example of a test command that shows in one minute that the Quadratic Sieve is likely to select a multiplier that fits better than 1 (meaning no multiplier), which speeds up the average execution time by a factor of 2 :
./factor --demand 170 170 10
./factor -v 3 -c -i demand.txt
./factor -v 3 -c -i demand.txt --qs-multiplier 1
Example of the test command that generated the provided sample .csv
and .json
files, an operation that took 11 minutes per file on the evaluation machine :
./factor --demand --output-file example.txt
./factor --input-file example.txt --output-file example-result.csv --output-csv
./factor --input-file example.txt --output-file example-result.json --output-json
The use cases here are for Ubuntu, tools like valgrind
and wget
are not available in the same way on Windows.
Example of a quick test command based on the OEIS list of highly composite numbers :
OEIS="002182"
wget -qO- https://oeis.org/A$OEIS/b$OEIS.txt | cut -d' ' -f 2 > oeis-list-$OEIS.txt
./factor --input-file oeis-list-$OEIS.txt --verbose 3
Example of a test command that logs both outputs to a file for later analysis, taking 10 seconds :
./factor --demand 50 150 --rand-seed 1
./factor --input-file demand.txt --output-csv --verbose 3 > maintenance.log 2>&1
# Perform a basic search for numbers showing debugs
grep -FA 1 Maintenance maintenance.log
Example of a command that checks using valgrind if the memory is well freed :
./factor --demand 50 150
valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./factor -i demand.txt -c -v 3
This cleanliness should apply to any use case, like the one above which after 5 minutes shows :
HEAP SUMMARY:
in use at exit: 0 bytes in 0 blocks
total heap usage: 107 allocs, 107 frees, 101,104,405 bytes allocated
All heap blocks were freed -- no leaks are possible
ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Your suggestions for tests in case of challenge are welcome, in this section users can easily :
--demand <bits-min> <bits-max> <count>
.--rand-seed <value>
(or explicitly leave it at 0
to choose the Unix timestamp).OEIS
to target another list such as Hyperfactorials list at 002109
in the OEIS model.This allows the software to be directed in various testing situations.
Before starting, the input integers are parsed; if an error is detected, the factorization will not start and users will be notified via standard error. Note that 0
is an erroneous input because the factorization of zero is undefined.
The Factorization Manager, in order :
--verbose <value>
is 2
or higher.0
when the results are completely accurate and verified, non-zero otherwise.An exit code of 0
indicates that the results are completely accurate and verified. This means that :
Only correct results are displayed, the software does mathematical double-checks at the risk of fatal error.
Memory allocations are passed to assert
, in case of memory allocation errors (e.g. insufficient memory available) the software will exit with a specific message, and it is recommended to restart your device.
Commonly required memory | Purpose |
---|---|
15 Kilobytes | complete a common 64-bit factorization |
150 Kilobytes | complete a highly composite number factorization |
2 Megabytes | complete a 150-bit Quadratic Sieve |
10 Megabytes | complete a 175-bit Quadratic Sieve |
25 Megabytes | complete a 200-bit Quadratic Sieve |
25 Megabytes extra | for every 25 extra bits subjected to the Quadratic Sieve |
Memory problem troubleshooting tips are :
--verbose 4
.--qs-multiplier <value>
.--rand-seed <value>
.The software consists of the assembly of 3 projects (AVL, cint, Quadratic Sieve with Block Lanczos).
The avl
functions (released “as is”, into the public domain, without any warranty, express or implied) manage thousands of integers in AVL trees, making Quadratic Sieve relations storage and retrieval straightforward. This solution guarantees worst-case O(log n)
time complexity for all operations (search, insert, delete).
The cint
library (released “as is”, into the public domain, without any warranty, express or implied) is provided for handling large integers. It includes a variety of basic and advanced mathematical functions to support calculations. This solution used to operate the Quadratic Sieve does not use global variables, it’s stateless and thread-safe.
The Block Lanczos algorithm (released “as is”, into the public domain, without any warranty, express or implied) solve a large, sparse system of linear equations, and identify dependencies among the rows of the matrix formed by the smooth relations collected by the Quadratic Sieve. This self-managing solution is necessary so that, through a method similar to Fermat’s, the Quadratic Sieve can exploit a congruence of squares that leads to a factorization of N.
The software’s source code is organized into the following files :
Filename | Purpose |
---|---|
64-bits-factorization.c |
Contains functions for factoring classical integers, including Pollard’s Rho algorithm and a deterministic Miller-Rabin primality checker. |
avl-trees.c |
Contains the AVL Trees functions, originally a separate project from the integer factorization project. |
basic-math.c |
Includes a prime number generator capable of yielding the first 326,983 prime numbers, along with the Kronecker symbol function, Tonelli-Shanks function, and modular inverse function. |
big-num.c |
Contains the cint library, also initially a separate project from the integer factorization project. |
block-lanczos.c |
Provides the linear algebra part of the Quadratic Sieve, including a matrix reducer and a manager for this task. |
headers.h |
Contains definitions for types, structures, functions, and DEBUG macros used throughout the project. |
main.c |
Contains the program’s entry point and transfers control to the factorization manager after about 30 lines. |
manager.c |
Includes I/O functions (parser, formatter), help documentation, shortcuts used by Quadratic Sieve, a sample data generator used for testing, and the factorization manager. |
quadratic-sieve.c |
Contains the Quadratic Sieve implementation, divided into around thirty functions, with a main function that manages the factorization process in about 30 lines. Search for “debug” in this file to find the debugging elements displayed. |
To obtain random numbers with reproducible behavior across platforms, the well-known XorShift generator is implemented in the source, its uniform distribution can be modified at runtime with --rand-seed <value>
.
The software :
The Self-Initializing Quadratic Sieve (SIQS) is a fast algorithm for factoring large semiprimes, based on the Multiple Polynomial Quadratic Sieve (MPQS). A suitable structure for this implementation was chosen as follows :
Handshake between the Quadratic Sieve and the factorization manager, during which it appropriates resources.
The input N is duplicated and called “kN” after this function complete. The algorithm :
[1, 127]
that improves runtime and memory efficiency.To find a good multiplier k
such that kN factor easier, the preparation_part_3_default_proposition
function try values of k
such that kN is a square modulo a large number of small primes. These primes will then be factor base primes, and the more small factor base primes, the faster relations will accumulate, since they hit the sieving interval more often. A preparation_part_3_alternative_proposition
is also provided for further investigation purposes.
Variable | Information |
---|---|
N | Prime factors are removed from N, and N is updated until N = 1 . |
kN | The algorithm computes with kN, which remains constant. |
See how multipliers impact the execution of ./factor 10146246660331135529849066552891483667878618078493758287
:
Option used | Makes the algorithm factoring | Comment on the multiplier | Took |
---|---|---|---|
--qs-multiplier 7 |
kN = 7 * N |
Selected by the algorithm | 3.1 s |
--qs-multiplier 1 |
kN = 1 * N |
No multiplier | 6.3 s |
--qs-multiplier 113 |
kN = 113 * N |
Large multipliers are sometimes good | 8.2 s |
--qs-multiplier 29 |
kN = 29 * N |
One of the worst | 19.4 s |
--qs-multiplier 35 |
kN = 35 * N |
One of the worst | 26.3 s |
[!IMPORTANT]
If a factorization seems longer than normal, use the--verbose 4
option to see the proposed multipliers and try combining them with--qs-multiplier <value>
to accelerate the process using the correct one. The proposed method finds a balance that improves the average runtime by a factor 2. The Quadratic Sieve best multiplier (known as Knuth-Schroeppel) can be up to 10 times faster than the worst in addition to saving a lot of memory.
Correct parameter setting improves factorization speed.
There is a nested structure inside the qs_sheet called mem :
qs_sheet holds 3 AVL tree manager :
Note for users: it is possible to override the value of algorithm parameters at runtime for investigation or troubleshooting purposes. Users comments and analysis are welcome in the discussion.
Note for developers: the void* mem.now pointer can be used like a calloc
, just update its value after storing something, or zero out the memory without updating in case of temporary needs. This technique is used all over the Quadratic Sieve (and in its Block Lanczos algorithm).
Constructs the factor base by filling the base array with suitable prime numbers :
Computes D the optimal value of the hypercube, later used to generate the polynomial coefficient A.
AX^2 + 2BX + C
coefficientsThe next steps are to efficiently construct several polynomials, along with their resolution data, suitable for sieving.
Coefficient | is a constant after | is a constant until |
---|---|---|
A | iteration_part_1 |
inner_continuation_condition completion |
B | iteration_part_4 |
for loop final expression |
C | iteration_part_6 |
for loop final expression |
[!NOTE]
A custom--rand-seed <value>
produces another set of polynomial curves that affects the Quadratic Sieve solving procedure, this should not improve speed, but may be useful for debugging or testing.
This small utility function is not more mathematical than technical, it :
Produces the coefficient A using the random number generator across the factor base to construct a product that approach the ideal value D. The algorithm always works (due to the fairly large factor base, and the quality of the PRNG) with distinct values of A, this is required.
Produces a template for the B coefficient alongside the data for polynomials that depend only on A and the factor base primes (and not on the current B coefficient). This data is later used to generate polynomial values which will be multiples of the factor base.
Since SIQS is based on MPQS, the polynomial curves are iteratively materialized using a Gray code. The coefficient B is produced using this technique through the previously prepared data. The algorithm also calculates the sieve offset for all the factor base prime. The generated values of B as well as the values of A are always distinct.
Produce the coefficient C as C = (B * B - kN) / A
. Because the previous steps went well, division never has a remainder, that’s all it takes to be ready for sieving.
These steps concern the sieving process itself. The sieve is named M, and filling the sieve consists of taking an interval [−M/2, +M/2]
and efficiency determining for which X in [−M/2, +M/2]
a given prime number divides AX^2 + 2BX + C
. To achieve this task the resolution data produced between iteration_part_3
and iteration_part_6
is used.
The sieve phase dominates the run time of the algorithm; it takes most of the time when N becomes large.
This function evaluates the previously filled sieve to find relations, iterates over values of X in [-M/2, +M/2]
and beyond a threshold, computes f(X)
to check the regularity of the candidate. When the function finds interesting to define the variable X , calculates the value of the polynomial in X then divides the value with the factor base, it thus tries to establish relations. The assertion (A * X + B)^2 = A * f(X) + kN
remains valid when this operation begins.
Buffered knowledge is then structured by register_regular_relation
and register_partial_relation
:
register_regular_relation
immediately build a matrix of regular smooth qs_relation*
using AVL tree.register_partial_relation
combine partial qs_relation*
together using AVL tree (single large-prime variation).The AVL tree is used in register_partial_relation
to identify partial relations sharing the same variation (within the last prime cofactors), to retrieve the data associated with this cycle and to process the single large-prime variation.
In short: when the sieve reports two partial relations sharing the same large prime, the register_partial_relation
function merges them (with a minor loss of efficiency) to obtain a regular smooth relation.
During the execution of register_relations
, after trial division by the factor base primes, if the quotient remains greater than 1 but is less than the large prime bound, the software uses register_partial_relation
to save this interesting 1-partial relation in a separate location from full (smooth) relations.
Any two partial relations that share the same variation together yield a suitable smooth relation. So, when the large prime collected matches a prime from another 1-partial relation (detected via the AVL tree), the software combines these two relations and then passes the result to register_regular_relation
, which saves all regular relations.
As shown below, the single large-prime variation often speeds things up by a factor of 2.
N bits | Relations added regularly | Relations added by Single Large-Prime Variation | Relations needs |
---|---|---|---|
150 | 2551 | 3 | 2554 |
170 | 3168 | 659 | 3827 |
190 | 3212 | 1786 | 4998 |
210 | 3409 | 3012 | 6421 |
230 | 5659 | 4686 | 10345 |
250 | 10051 | 9793 | 19844 |
270 | 14026 | 15484 | 29510 |
290 | 12820 | 20201 | 33021 |
310 | 22522 | 22488 | 45008 |
[!NOTE]
When a third match arrives, a dependency problem appears in linear algebra. This more complex case known as Double Large-Prime Variation is still an opportunity (at the cost of some non-trivial extra programming) to speed up difficult factorizations by another factor 2, but is not implemented in this simple version.
Decides whether the sieving continues or stops, the relation counter is the main condition.
get_started_iteration
.The Block Lanczos algorithm is likely to quickly report the success of the Quadratic Sieve, just having to find in its smooth relations a subset of all the exponent vectors such that the sum of their exponent vectors is the zero vector :
A built-in manager coordinates attempts and reduces the matrix formed by the Y field of the relations as necessary. The reduce_matrix
function process is limited to removing columns that contain a singleton row, and then resizing the matrix to have a few more columns than rows. This step is not intended to speed up the algorithm (although it does); it is just that without it, there are factorizations for which the large-scale structural eigenproblem remains.
The finalization_part_1
function performs the square root + GCD step of the Quadratic Sieve based on the lanczos_block
answer, which classically leads by exploiting a difference of squares mod N to reveal divisors of N. The qs_register_divisor
function it calls takes care of inferences that actually lead to a complete factorization of N.
The qs_register_divisor
function :
--verbose 4
is used.This function is used to decide if the algorithm should return the control, or return to sieving.
[!NOTE]
When a prime factor has been found at this stage, the algorithm prefers to resign itself so that the manager can, with a better-sized Quadratic Sieve, continue the factorization of N recursively. Otherwise, before giving up the algorithm returns toget_started_iteration
and searches for up to 50% more relations.
The Quadratic Sieve algorithm gracefully resigns to the factorization manager.
Users can set the software as a support tool in most languages. Here is an example for a PHP/JSON application :
<?php
// We must factor an array of numbers.
$numbers = array(
'32077677441073603974168813573464241602309',
'-37591720365997281829030157222946707796343835483',
'37235529565343195291273612090240230075203335881564807',
);
// We call the factorization software.
$result = '' ;
$fp = popen('./factor -j ' . implode(' ', $numbers), 'r');
while(!feof($fp))
$result .= fread($fp, 8192);
pclose($fp);
// We optionally decode the JSON string into a PHP array.
$result = json_decode($result, true) ;
// We use the results.
print_r($result);
This integer factorization software, whose source code is entirely in the public domain, is designed to help :
There are many people to thank :
GitHub users can exchange their views on this version or motivate a multi-threaded rewrite.
You can report any issues, and we’ll investigate to correct the source files.