json diff

Compute the difference between two JSON-serializable Ruby objects.

json-diff

Takes two Ruby objects that can be serialized to JSON and outputs an array of operations (additions, removals, replacements, moves) that would convert the first one into the second one.

gem install json-diff  # Or `gem 'json-diff'` in your Gemfile.
require 'json-diff'
JsonDiff.diff(1, 2)
#> [{'op' => 'replace', 'path' => '', 'value' => 2}]

The output follows RFC 6902 (JSON Patch). See hana for a JSON Patch implementation that can apply this output.
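Because the output is plain RFC 6902, any JSON Patch applier can replay it. To make that concrete, here is a deliberately minimal applier. MiniPatch is a hypothetical name and this is not hana's code: it supports only add, remove, replace and move, ignores the "-" array index, and does little error checking. It follows the RFC's definition of move as a remove at "from" followed by an add at "path".

```ruby
# MiniPatch: hypothetical, minimal RFC 6902 applier (not hana).
# Supports only add, remove, replace and move; no "-" index, little validation.
module MiniPatch
  module_function

  # "/a/0" => ["a", "0"], undoing RFC 6901 escapes ("~1" => "/", "~0" => "~").
  def tokens(pointer)
    pointer.split('/', -1).drop(1).map { |t| t.gsub('~1', '/').gsub('~0', '~') }
  end

  # Walk to the container that holds the last token of a non-empty pointer.
  def locate(doc, pointer)
    *parents, last = tokens(pointer)
    parent = parents.reduce(doc) { |node, t| node.is_a?(Array) ? node[Integer(t, 10)] : node[t] }
    [parent, last]
  end

  def add_at(doc, path, value)
    return value if path.empty? # the whole document is replaced
    parent, key = locate(doc, path)
    if parent.is_a?(Array)
      parent.insert(Integer(key, 10), value) # array add inserts, shifting later elements
    else
      parent[key] = value
    end
    doc
  end

  # Returns [doc, removed_value].
  def remove_at(doc, path)
    parent, key = locate(doc, path)
    value = parent.is_a?(Array) ? parent.delete_at(Integer(key, 10)) : parent.delete(key)
    [doc, value]
  end

  def apply(doc, ops)
    doc = Marshal.load(Marshal.dump(doc)) # deep copy so the caller's object is untouched
    ops.each do |op|
      case op['op']
      when 'add' then doc = add_at(doc, op['path'], op['value'])
      when 'remove' then doc, _removed = remove_at(doc, op['path'])
      when 'replace'
        doc, _removed = remove_at(doc, op['path']) unless op['path'].empty?
        doc = add_at(doc, op['path'], op['value'])
      when 'move'
        doc, value = remove_at(doc, op['from'])
        doc = add_at(doc, op['path'], value)
      else
        raise ArgumentError, "unsupported op: #{op['op'].inspect}"
      end
    end
    doc
  end
end
```

For instance, MiniPatch.apply(1, [{'op' => 'replace', 'path' => '', 'value' => 2}]) returns 2, matching the JsonDiff.diff(1, 2) example above.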

Options

  • include_was: include a was field in remove and replace operations to show the old value. This allows computing reverse operations without the source JSON.
  • moves*: include move operations. Set it to false to remove clutter.
  • additions*: include add operations. Set it to false to remove clutter.
  • original_indices*: array indices refer to the source array (for from fields, and for path fields on remove operations) or to the target array (for other path fields). This eases manual checking of differences.
  • similarity: procedure taking (before, after) objects. Returns a probability between 0 and 1 of how likely after is a modification of before, or nil if you wish to fall back to the default computation.

* Changing this option prevents the use of the output for JSON patching.
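With include_was enabled, every remove and replace operation carries the old value, so a patch can be inverted on its own. A minimal sketch, assuming operations are string-keyed hashes like the ones shown in this README and that the old value sits under 'was'; reverse_patch is a hypothetical helper, not part of the gem. Each operation is inverted individually, and the list is reversed so the undo steps run last-to-first.

```ruby
# Hypothetical helper: inverts a patch produced with include_was enabled.
# Assumes string keys ('op', 'path', 'from', 'value', 'was') as in this README.
def reverse_patch(ops)
  ops.reverse.map do |op|
    case op['op']
    when 'add'
      # Undo an insertion; keep the value as 'was' so the result can be reversed again.
      { 'op' => 'remove', 'path' => op['path'], 'was' => op['value'] }
    when 'remove'
      { 'op' => 'add', 'path' => op['path'], 'value' => op.fetch('was') } # needs include_was
    when 'replace'
      { 'op' => 'replace', 'path' => op['path'], 'value' => op.fetch('was'), 'was' => op['value'] }
    when 'move'
      # The element now lives at 'path'; move it back to 'from'.
      { 'op' => 'move', 'from' => op['path'], 'path' => op['from'] }
    else
      raise ArgumentError, "cannot reverse #{op['op'].inspect}"
    end
  end
end
```

One caveat: an RFC 6902 add on an existing object member overwrites it, and this sketch would invert such an add into a plain remove; the sketch assumes that case does not occur in the input.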

The gem also ships a command-line program:

gem install json-diff  # On Linux, you may need to use sudo
json-diff before.json after.json

Heart

  • Recursive similarity computation between any two Ruby values.
  • For arrays, pair up elements whose similarity exceeds a threshold, and treat each pair as a move.
    • Matching happens highest-similarity first.
    • Move operations are generated by detecting rings (cycles) in the list of moved elements (e.g., A → B → C → A).
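The three steps above can be sketched in a few lines of Ruby. This is not the gem's code: the scoring rules in similarity (equal values score 1, hashes average over the union of their keys, arrays average each element's best match) and the 0.5 threshold in match_pairs are hypothetical choices, and rings is a plain cycle decomposition of the resulting index mapping. Turning each ring into JSON Patch move operations, which must account for shifting array indices, is not shown.

```ruby
# Hypothetical recursive similarity in [0, 1]; NOT json-diff's actual formula.
def similarity(a, b)
  return 1.0 if a == b
  if a.is_a?(Hash) && b.is_a?(Hash)
    keys = a.keys | b.keys
    # Keys present on only one side contribute 0.
    keys.sum { |k| a.key?(k) && b.key?(k) ? similarity(a[k], b[k]) : 0.0 } / keys.size
  elsif a.is_a?(Array) && b.is_a?(Array)
    return 0.0 if a.empty? || b.empty?
    # Order-insensitive: average of each element's best match on the other side.
    a.sum { |x| b.map { |y| similarity(x, y) }.max } / [a.size, b.size].max
  else
    0.0 # different scalars or different types
  end
end

# Greedy pairing, highest similarity first; pairs below the (hypothetical)
# threshold are never matched. Returns { before_index => after_index }.
def match_pairs(before, after, threshold: 0.5)
  candidates = []
  before.each_with_index do |x, i|
    after.each_with_index do |y, j|
      score = similarity(x, y)
      candidates << [score, i, j] if score >= threshold
    end
  end
  matched = {}
  used_after = {}
  candidates.sort_by { |score, i, j| [-score, i, j] }.each do |_score, i, j|
    next if matched.key?(i) || used_after.key?(j)
    matched[i] = j
    used_after[j] = true
  end
  matched
end

# Cycle decomposition of an index mapping. Fixed points come out as rings of
# length 1 (nothing to move); chains that leave the mapping are not rings.
def rings(mapping)
  seen = {}
  result = []
  mapping.keys.sort.each do |start|
    next if seen[start]
    path = []
    i = start
    while mapping.key?(i) && !seen[i]
      seen[i] = true
      path << i
      i = mapping[i]
    end
    k = path.index(i) # the walk closed a ring only if it returned to this path
    result << path[k..-1] if k
  end
  result
end
```

For two swapped records, match_pairs returns {0 => 1, 1 => 0}, and rings turns that into the single ring [[0, 1]].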

Pros:

  • For lists whose order is not significant, this approach yields far better results than LCS (longest common subsequence).
  • Move operations require no custom code to match elements.

Cons:

  • Output quality relies heavily on the similarity algorithm. Empirically, the default yields sensible output, and it can be improved with a user-defined procedure (the similarity option).
  • The default similarity computation has an overhead that grows with the total number of entities in the structure.
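When array elements carry a stable identifier, a user-defined procedure can encode that knowledge: two hashes with the same identifier are the same entity (score 1.0), two different identifiers are unrelated (0.0), and anything else returns nil so the default computation applies. The 'code' key and the station_similarity name are illustrative assumptions. The procedure is presumably handed to the gem through the similarity option, e.g. JsonDiff.diff(before, after, similarity: station_similarity), but check the gem's source for the exact calling convention.

```ruby
# Hypothetical similarity procedure: hashes with a 'code' key are compared by
# that key alone; everything else falls through to the default (nil).
station_similarity = lambda do |before, after|
  if before.is_a?(Hash) && after.is_a?(Hash) && before.key?('code') && after.key?('code')
    before['code'] == after['code'] ? 1.0 : 0.0
  end # implicit nil: fall back to the default computation
end
```

With a score of 1.0 for a renamed station that kept its code, the two versions are treated as the same element, so a rename shows up as a change to that element rather than as a removal plus an addition.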

Comparisons

HashDiff: LCS, no move operations.

require "json-diff"
JsonDiff.diff([1, 2, 3, 4, 5], [6, 4, 3, 2])
# [{'op' => 'remove', 'path' => '/4'},
#  {'op' => 'remove', 'path' => '/0'},
#  {'op' => 'move', 'from' => '/0', 'path' => '/2'},
#  {'op' => 'move', 'from' => '/1', 'path' => '/0'},
#  {'op' => 'add', 'path' => '/0', 'value' => 6}]

require "hashdiff"
HashDiff.diff([1, 2, 3, 4, 5], [6, 4, 3, 2])
# [["-", "[0]", 1],
#  ["+", "[0]", 6],
#  ["+", "[1]", 4],
#  ["+", "[2]", 3],
#  ["-", "[6]", 5],
#  ["-", "[5]", 4],
#  ["-", "[4]", 3]]
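HashDiff keeps a longest common subsequence of the two arrays and expresses the rest as deletions and insertions. In [1, 2, 3, 4, 5] and [6, 4, 3, 2] the shared values 2, 3 and 4 appear in opposite orders, so any common subsequence has length 1, which is why the HashDiff output deletes 3 and 4 and inserts them again. A textbook dynamic-programming LCS (not HashDiff's own code) confirms the length:

```ruby
# Textbook LCS: table[i][j] is the LCS length of a[i..] and b[j..].
def lcs(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] =
        if a[i] == b[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end
  # Walk the table forwards to recover one LCS (ties favor advancing in a).
  result = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      result << a[i]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      i += 1
    else
      j += 1
    end
  end
  result
end
```

Which single element survives depends on tie-breaking: this sketch returns [4] for the arrays above, whereas HashDiff kept 2. Either way, only one of the three shared values can stay in place.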

jsondiff: no similarity matching, no LCS.

require "json-diff"
JsonDiff.diff(
  [{'code' => "ABW", 'name' => "Abbey Wood"}, {'code' => "KGX", 'name' => "Kings Cross"}],
  [{'code' => "KGX", 'name' => "Kings Cross"}, {'code' => "ABW", 'name' => "Abbey Wood"}]
)
# [{'op' => 'move', 'from' => '/0', 'path' => '/1'}]

require "jsondiff"
JsonDiff.generate(
  [{'code' => "ABW", 'name' => "Abbey Wood"}, {'code' => "KGX", 'name' => "Kings Cross"}],
  [{'code' => "KGX", 'name' => "Kings Cross"}, {'code' => "ABW", 'name' => "Abbey Wood"}]
)
# [{:op => :replace, :path => '/0/code', :value => 'KGX'},
#  {:op => :replace, :path => '/0/name', :value => 'Kings Cross'},
#  {:op => :replace, :path => '/1/code', :value => 'ABW'},
#  {:op => :replace, :path => '/1/name', :value => 'Abbey Wood'}]
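jsondiff's four replacements are what index-by-index comparison produces: element 0 is compared with element 0, and so on, so swapping two records looks as if every field changed. A hypothetical positional differ (not jsondiff's code) shows the effect. It emits string-keyed operations where jsondiff uses symbols, escapes keys per RFC 6901, and handles length differences by appending or trimming at the end.

```ruby
# RFC 6901 escaping for one JSON Pointer token: "~" -> "~0", then "/" -> "~1".
def escape_token(key)
  key.to_s.gsub('~', '~0').gsub('/', '~1')
end

# Hypothetical positional diff: no similarity matching, no LCS.
def positional_diff(before, after, path = '')
  if before.is_a?(Array) && after.is_a?(Array)
    ops = []
    common = [before.size, after.size].min
    common.times { |i| ops.concat(positional_diff(before[i], after[i], "#{path}/#{i}")) }
    # Extra target elements are appended; extra source elements are removed from the end.
    (common...after.size).each { |i| ops << { 'op' => 'add', 'path' => "#{path}/#{i}", 'value' => after[i] } }
    (before.size - 1).downto(common) { |i| ops << { 'op' => 'remove', 'path' => "#{path}/#{i}" } }
    ops
  elsif before.is_a?(Hash) && after.is_a?(Hash)
    ops = []
    (before.keys - after.keys).each { |k| ops << { 'op' => 'remove', 'path' => "#{path}/#{escape_token(k)}" } }
    after.each do |k, v|
      child = "#{path}/#{escape_token(k)}"
      if before.key?(k)
        ops.concat(positional_diff(before[k], v, child))
      else
        ops << { 'op' => 'add', 'path' => child, 'value' => v }
      end
    end
    ops
  elsif before == after
    []
  else
    [{ 'op' => 'replace', 'path' => path, 'value' => after }]
  end
end
```

Applied to the two station lists above, positional_diff yields the same four replacements (/0/code, /0/name, /1/code, /1/name), while JsonDiff.diff's similarity matching recognizes a single move.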

Plans & Bugs

Roughly ordered by priority.

  • Support LCS as an option. (The default will remain whichever approach yields the best results, regardless of how long it takes.)
  • Support specifying a depth for similarity computation.
  • Character-wise substring add and remove operations.
  • SVG output.

See the LICENSE file for licensing information.