Class DiffUtil
-
Method Summary
Modifier and TypeMethodDescriptionstatic <E> Set<E>
computeIgnoredElements
(Comparison comparison, IEqualityHelper equalityHelper, List<E> candidates, Diff diff, boolean rightToLeft) When computing the insertion index of an element in a list, we need to ignore all elements present in that list that feature unresolved Diffs on the same feature.static double
diceCoefficient
(String first, String second) Computes the dice coefficient between the two given String's bigrams.static <E> int
findInsertionIndex
(Comparison comparison, Iterable<E> ignoredElements, List<E> source, List<E> target, E newElement) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list.static <E> int
findInsertionIndex
(Comparison comparison, List<E> source, List<E> target, E newElement) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list.static int
findInsertionIndex
(Comparison comparison, Diff diff, boolean rightToLeft) This is the main entry point forfindInsertionIndex(Comparison, Iterable, List, List, Object)
.static <E> int
findInsertionIndexForElementAt
(Comparison comparison, List<E> source, List<E> target, List<E> lcs, int currentIndexInSource) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list.The set of all the atomic diffs that refine the given diff or one of its refining diffs, recursively.getAllRefiningDiffs
(Diff diff) The set of all the diffs (atomic or not) that refine the given diff or one of its refining diffs, recursively.getRootRefinedDiffs
(Diff diff) Determines the root refined diff of a refining diff, i.e. a refined diff that is not refining another diff.static boolean
isContainmentReference
(EStructuralFeature feature) static <E> List<E>
longestCommonSubsequence
(Comparison comparison, Iterable<E> ignoredElements, List<E> sequence1, List<E> sequence2) This will compute the longest common subsequence between the two given Lists, ignoring any object that is included inignoredElements
.static <E> List<E>
longestCommonSubsequence
(Comparison comparison, List<E> sequence1, List<E> sequence2) This will compute the longest common subsequence between the two given Lists.
-
Method Details
-
getAllRefiningDiffs
The set of all the diffs (atomic or not) that refine the given diff or one of its refining diffs, recursively.- Parameters:
diff
- The diff for which all the refining diffs are seeked- Returns:
- A set of all the refining diffs (as opposed to getting only the 1st level of refining diffs that would be obtained by calling getRefinedBy() on diff.
-
getRootRefinedDiffs
Determines the root refined diff of a refining diff, i.e. a refined diff that is not refining another diff.- Parameters:
diff
- The diff for which the root refined diffs are to be determined- Returns:
- The root refined diffs of the provided (refining) diff, as a list. Never
null
. Empty if the provided diff does not refine any diff.
-
getAllAtomicRefiningDiffs
The set of all the atomic diffs that refine the given diff or one of its refining diffs, recursively. An atomic diff is a diff that is not refined by any diff.- Parameters:
diff
- The diff for which all the refining diffs are seeked- Returns:
- A set of all the atomic refining diffs (as opposed to getting only the 1st level of refining diffs that would be obtained by calling getRefinedBy() on diff.
-
diceCoefficient
Computes the dice coefficient between the two given String's bigrams.This implementation is case sensitive.
Note that this implementation handles two- (or less) character-long strings specifically, degrading into a "char-by-char" comparison instead of using the bigram unit of the dice coefficient. We want the similarity between
"v1"
and"v2"
to be0.5
and not0
. However, we also want"v1"
and"v2"
to be "more similar" to each other than"v"
and"v2"
and"v1"
and"v11"
to be "more similar" than"v"
and"v11"
while this latter also needs to be "less similar" than"v1"
and"v2"
. This requires a slightly different handling for comparisons with a "single character"-long string than for "two character"-long ones. A set of invariants we wish to meet can be found in the unit tests.- Parameters:
first
- First of the two Strings to compare.second
- Second of the two Strings to compare.- Returns:
- The dice coefficient of the two given String's bigrams, ranging from 0d to 1d.
-
longestCommonSubsequence
public static <E> List<E> longestCommonSubsequence(Comparison comparison, Iterable<E> ignoredElements, List<E> sequence1, List<E> sequence2) This will compute the longest common subsequence between the two given Lists, ignoring any object that is included inignoredElements
. We will useIEqualityHelper.matchingValues(Object, Object)
in order to try and match the values from both lists two-by-two. This can thus be used both for reference values or attribute values. If there are two subsequences of the same "longest" length, the first (according to the second argument) will be returned.Take note that this might be slower than
longestCommonSubsequence(Comparison, List, List)
and should only be used when elements should be removed from the potential LCS. This is mainly aimed at merge operations during three-way comparisons as some objects might be in conflict and thus shifting the computed insertion indices.Please see
longestCommonSubsequence(Comparison, List, List)
for a more complete description.- Type Parameters:
E
- Type of the sequences content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.ignoredElements
- Specifies elements that should be excluded from the subsequences.sequence1
- First of the two sequences to consider.sequence2
- Second of the two sequences to consider.- Returns:
- The LCS of the two given sequences. Will never be the same instance as one of the input sequences.
-
longestCommonSubsequence
public static <E> List<E> longestCommonSubsequence(Comparison comparison, List<E> sequence1, List<E> sequence2) This will compute the longest common subsequence between the two given Lists. We will useIEqualityHelper.matchingValues(Object, Object)
in order to try and match the values from both lists two-by-two. This can thus be used both for reference values or attribute values. If there are two subsequences of the same "longest" length, the first (according to the second argument) will be returned.For example, it the two given sequence are, in this order,
{"a", "b", "c", "d", "e"}
and{"c", "z", "d", "a", "b"}
, there are two "longest" subsequences :{"a", "b"}
and{"c", "d"}
. The first of those two subsequences in the second list is{"c", "d"}
. On the other hand, the LCS of{"a", "b", "c", "d", "e"}
and{"y", "c", "d", "e", "b"}
is{"c", "d", "e"}
.The following algorithm has been inferred from the wikipedia article on the Longest Common Subsequence, http://en.wikipedia.org/wiki/Longest_common_subsequence_problem at the time of writing. It is decomposed in two : we first compute the LCS matrix, then we backtrack through the input to determine the LCS. Evaluation will be shortcut after the first part if the LCS is one of the two input sequences.
Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its contract.- Type Parameters:
E
- Type of the sequences content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.sequence1
- First of the two sequences to consider.sequence2
- Second of the two sequences to consider.- Returns:
- The LCS of the two given sequences. Will never be the same instance as one of the input sequences.
-
findInsertionIndex
public static <E> int findInsertionIndex(Comparison comparison, Iterable<E> ignoredElements, List<E> source, List<E> target, E newElement) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list. We expectnewElement
to be an element from thesource
or to have a Match that allows us to map it to one of thesource
list's elements.The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists, ignoring all elements from that LCS that have changed between the target list and the common origin of the two. If there are more than one "longest" subsequence between the two lists, the insertion index will be relative to the first that comes in the
target
list.Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its contract.- Type Parameters:
E
- Type of the sequences content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.ignoredElements
- If there are elements fromtarget
that should be ignored when searching for an insertion index, set them here. Can benull
or an empty list.source
- The List from which one element has to be added to thetarget
list.target
- The List into which one element fromsource
has to be added.newElement
- The element fromsource
that needs to be added intotarget
.- Returns:
- The index at which
newElement
should be inserted intarget
. - See Also:
- Restriction:
- This method is not intended to be referenced by clients.
-
findInsertionIndexForElementAt
public static <E> int findInsertionIndexForElementAt(Comparison comparison, List<E> source, List<E> target, List<E> lcs, int currentIndexInSource) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list.The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists.
- Type Parameters:
E
- Type of the sequences content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.source
- The List from which one element has to be added to thetarget
list.target
- The List into which one element fromsource
has to be added.lcs
- The precomputed LCS between these two lists.currentIndexInSource
- The current index (in source) of the element we want to insert in target.- Returns:
- The index at which
newElement
should be inserted intarget
. - Restriction:
- This method is not intended to be referenced by clients.
-
findInsertionIndex
public static <E> int findInsertionIndex(Comparison comparison, List<E> source, List<E> target, E newElement) This will try and determine the index at which a given element from thesource
list should be inserted in thetarget
list. We expectnewElement
to be an element from thesource
or to have a Match that allows us to map it to one of thesource
list's elements.The expected insertion index will always be relative to the Longest Common Subsequence (LCS) between the two given lists. If there are more than one "longest" subsequence between the two lists, the insertion index will be relative to the first that comes in the
target
list.For example, assume
source
is{"1", "2", "4", "6", "8", "3", "0", "7", "5"}
andtarget
is{"8", "1", "2", "9", "3", "4", "7"}
; I try to merge the addition of"0"
in the right list. The returned "insertion index" will be5
: just after"3"
. There are two subsequence of the same "longest" length 4 :{"1", "2", "3", "7"}
and{"1", "2", "4", "7"}
. However, the first of those two intarget
is{"1", "2", "3", "7"}
. The closest element before"0"
in this LCS insource
is"3"
.Note : we are not using Iterables as input in order to make use of the random access cost of ArrayLists. This might also be converted to directly use arrays. This implementation will not play well with LinkedLists or any List which needs to iterate over the values for each call to
List.get(int)
, i.e any list which is not instanceof RandomAccess or does not satisfy its contract.- Type Parameters:
E
- Type of the sequences content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.source
- The List from which one element has to be added to thetarget
list.target
- The List into which one element fromsource
has to be added.newElement
- The element fromsource
that needs to be added intotarget
.- Returns:
- The index at which
newElement
should be inserted intarget
. - See Also:
-
findInsertionIndex
This is the main entry point forfindInsertionIndex(Comparison, Iterable, List, List, Object)
. It will use default algorithms to determine the source and target lists as well as the list of elements that should be ignored when computing the insertion index.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.diff
- The diff which merging will trigger the need for an insertion index in its target list.rightToLeft
-true
if the merging will be done into the left list, so that we should consider the right model as the source and the left as the target.- Returns:
- The index at which this
diff
's value should be inserted into the 'target' list, as inferred fromrightToLeft
. - See Also:
-
isContainmentReference
- Parameters:
feature
- The feature to check.- Returns:
true
if thefeature
is a containment reference,false
otherwise.
-
computeIgnoredElements
public static <E> Set<E> computeIgnoredElements(Comparison comparison, IEqualityHelper equalityHelper, List<E> candidates, Diff diff, boolean rightToLeft) When computing the insertion index of an element in a list, we need to ignore all elements present in that list that feature unresolved Diffs on the same feature. This algorithm is expensive and because this method is generally called O{n^2} times we must take steps to optimize it. The caller assumes that a non-empty returned set is modifiable.- Type Parameters:
E
- Type of the list's content.- Parameters:
comparison
- This will be used in order to retrieve the Match for EObjects when comparing them.equalityHelper
- This will be used to check for equality of values.candidates
- The list in which we need to compute an insertion index.diff
- The diff we are computing an insertion index for.rightToLeft
- Direction of the merging.true
if the merge is to be done on the left side, making 'target' the right side,false
otherwise.- Returns:
- The list of elements that should be ignored when computing the insertion index for a new
element in
candidates
.
-