-
Notifications
You must be signed in to change notification settings - Fork 187
Duplicate entry hardening #2096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
282f906
949b5d8
d422f73
0d81c02
a87bbaa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -728,6 +728,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti, | |
| strintmap_clear_func(&renames->deferred[i].possible_trivial_merges); | ||
| strset_clear_func(&renames->deferred[i].target_dirs); | ||
| renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */ | ||
| free(renames->pairs[i].queue); | ||
| diff_queue_init(&renames->pairs[i]); | ||
| } | ||
| renames->cached_pairs_valid_side = 0; | ||
| renames->dir_rename_mask = 0; | ||
|
|
@@ -1008,32 +1010,34 @@ static int traverse_trees_wrapper(struct index_state *istate, | |
| info->traverse_path = renames->callback_data_traverse_path; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> traverse_trees_wrapper() saves entries from a first pass through
> traverse_trees() and then replays them through the real callback
> (collect_merge_info_callback). However, the replay loop silently
> discards the callback return value. This means any error reported by
> the callback during replay -- including a future check for malformed
> trees -- would be ignored, allowing the merge to proceed with corrupt
> state.
>
> Capture the return value, stop the loop on negative (error) returns,
> and propagate the error to the caller. Note that the callback returns
> a positive mask value on success, so we normalize non-negative returns
> to 0 for the caller.
All makes perfect sense.
How would the externally visible behaviour change at this step?
Upon an error from the callback, we used to keep going and processed
other callback data in the renames structure. We now leave the rest
unprocessed.
The caller of this helper would never have seen a failure, but now
they will. Both callers, collect_merge_info_callback() and
handle_deferred_entries(), are reacting to a negative "error" return
well (perhaps because they sometimes call traverse_trees() in the
same control flow, which does return an error already), so
presumably there is no downside caused by aborting the innermost
process upon the first error return.
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> merge-ort.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 00923ce3cd..4b8e32209d 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -1008,18 +1008,20 @@ static int traverse_trees_wrapper(struct index_state *istate,
> info->traverse_path = renames->callback_data_traverse_path;
> info->fn = old_fn;
> for (i = old_offset; i < renames->callback_data_nr; ++i) {
> - info->fn(n,
> - renames->callback_data[i].mask,
> - renames->callback_data[i].dirmask,
> - renames->callback_data[i].names,
> - info);
> + ret = info->fn(n,
> + renames->callback_data[i].mask,
> + renames->callback_data[i].dirmask,
> + renames->callback_data[i].names,
> + info);
> + if (ret < 0)
> + break;
> }
>
> renames->callback_data_nr = old_offset;
> free(renames->callback_data_traverse_path);
> renames->callback_data_traverse_path = old_callback_data_traverse_path;
> info->traverse_path = NULL;
> - return 0;
> + return ret < 0 ? ret : 0;
> }
>
> static void setup_path_info(struct merge_options *opt, |
||
| info->fn = old_fn; | ||
| for (i = old_offset; i < renames->callback_data_nr; ++i) { | ||
| info->fn(n, | ||
| renames->callback_data[i].mask, | ||
| renames->callback_data[i].dirmask, | ||
| renames->callback_data[i].names, | ||
| info); | ||
| ret = info->fn(n, | ||
| renames->callback_data[i].mask, | ||
| renames->callback_data[i].dirmask, | ||
| renames->callback_data[i].names, | ||
| info); | ||
| if (ret < 0) | ||
| break; | ||
| } | ||
|
|
||
| renames->callback_data_nr = old_offset; | ||
| free(renames->callback_data_traverse_path); | ||
| renames->callback_data_traverse_path = old_callback_data_traverse_path; | ||
| info->traverse_path = NULL; | ||
| return 0; | ||
| return ret < 0 ? ret : 0; | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> Trees with duplicate entries are malformed; fsck reports "contains
> duplicate file entries" for them. merge-ort has from the beginning
> assumed that we would never hit such trees. It was written with the
> assumption that traverse_trees() calls collect_merge_info_callback() at
> most once per path. The "sanity checks" in that callback (added in
> d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
> 2020-12-13)) verify properties of each individual call but not that
> invariant. The strmap_put() in setup_path_info() silently overwrites
> the entry from any prior call for the same path, because it assumed
> there would be no other path. Unfortunately, supplemental data
> structures for various optimizations could still be tweaked before the
> extra paths were overwritten, and those data structures not matching
> expected state could trip various assertions.
>
> Change the return type of setup_path_info() from void to int to allow us
> to detect this case, and abort the merge with a clear error message when
> it occurs.
OK.
> @@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt,
> */
> mi->is_null = 1;
> }
> - strmap_put(&opt->priv->paths, fullpath, mi);
> + if (strmap_put(&opt->priv->paths, fullpath, mi))
> + return error(_("tree has duplicate entries for '%s'"), fullpath);
OK. I was wondering what _other_ kind of malformed trees would the
updated code by this change is prepared to handle (most notably,
tree entries must be sorted, and one way to detect duplicate is to
remember one single path that we saw earlier, which would work as
long as the entries are sorted). This "ah, we saw that path already"
approach is much more robust in that it does not have to depend on a
sorted tree.
Makes sense. |
||
| } | ||
|
|
||
| static void setup_path_info(struct merge_options *opt, | ||
| struct string_list_item *result, | ||
| const char *current_dir_name, | ||
| int current_dir_name_len, | ||
| char *fullpath, /* we'll take over ownership */ | ||
| struct name_entry *names, | ||
| struct name_entry *merged_version, | ||
| unsigned is_null, /* boolean */ | ||
| unsigned df_conflict, /* boolean */ | ||
| unsigned filemask, | ||
| unsigned dirmask, | ||
| int resolved /* boolean */) | ||
| static int setup_path_info(struct merge_options *opt, | ||
| struct string_list_item *result, | ||
| const char *current_dir_name, | ||
| int current_dir_name_len, | ||
| char *fullpath, /* we'll take over ownership */ | ||
| struct name_entry *names, | ||
| struct name_entry *merged_version, | ||
| unsigned is_null, /* boolean */ | ||
| unsigned df_conflict, /* boolean */ | ||
| unsigned filemask, | ||
| unsigned dirmask, | ||
| int resolved /* boolean */) | ||
| { | ||
| /* result->util is void*, so mi is a convenience typed variable */ | ||
| struct merged_info *mi; | ||
|
|
@@ -1077,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt, | |
| */ | ||
| mi->is_null = 1; | ||
| } | ||
| strmap_put(&opt->priv->paths, fullpath, mi); | ||
| if (strmap_put(&opt->priv->paths, fullpath, mi)) | ||
| return error(_("tree has duplicate entries for '%s'"), fullpath); | ||
| result->string = fullpath; | ||
| result->util = mi; | ||
| return 0; | ||
| } | ||
|
|
||
| static void add_pair(struct merge_options *opt, | ||
|
|
@@ -1346,9 +1352,10 @@ static int collect_merge_info_callback(int n, | |
| */ | ||
| if (side1_matches_mbase && side2_matches_mbase) { | ||
| /* mbase, side1, & side2 all match; use mbase as resolution */ | ||
| setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+0, mbase_null, 0 /* df_conflict */, | ||
| filemask, dirmask, 1 /* resolved */); | ||
| if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+0, mbase_null, 0 /* df_conflict */, | ||
| filemask, dirmask, 1 /* resolved */)) | ||
| return -1; /* Quit traversing */ | ||
| return mask; | ||
| } | ||
|
|
||
|
|
@@ -1360,9 +1367,10 @@ static int collect_merge_info_callback(int n, | |
| */ | ||
| if (sides_match && filemask == 0x07) { | ||
| /* use side1 (== side2) version as resolution */ | ||
| setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+1, side1_null, 0, | ||
| filemask, dirmask, 1); | ||
| if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+1, side1_null, 0, | ||
| filemask, dirmask, 1)) | ||
| return -1; /* Quit traversing */ | ||
| return mask; | ||
| } | ||
|
|
||
|
|
@@ -1374,18 +1382,20 @@ static int collect_merge_info_callback(int n, | |
| */ | ||
| if (side1_matches_mbase && filemask == 0x07) { | ||
| /* use side2 version as resolution */ | ||
| setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+2, side2_null, 0, | ||
| filemask, dirmask, 1); | ||
| if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+2, side2_null, 0, | ||
| filemask, dirmask, 1)) | ||
| return -1; /* Quit traversing */ | ||
| return mask; | ||
| } | ||
|
|
||
| /* Similar to above but swapping sides 1 and 2 */ | ||
| if (side2_matches_mbase && filemask == 0x07) { | ||
| /* use side1 version as resolution */ | ||
| setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+1, side1_null, 0, | ||
| filemask, dirmask, 1); | ||
| if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, names+1, side1_null, 0, | ||
| filemask, dirmask, 1)) | ||
| return -1; /* Quit traversing */ | ||
| return mask; | ||
| } | ||
|
|
||
|
|
@@ -1409,8 +1419,9 @@ static int collect_merge_info_callback(int n, | |
| * unconflict some more cases, but that comes later so all we can | ||
| * do now is record the different non-null file hashes.) | ||
| */ | ||
| setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, NULL, 0, df_conflict, filemask, dirmask, 0); | ||
| if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath, | ||
| names, NULL, 0, df_conflict, filemask, dirmask, 0)) | ||
| return -1; /* Quit traversing */ | ||
|
|
||
| ci = pi.util; | ||
| VERIFY_CI(ci); | ||
|
|
@@ -1738,7 +1749,6 @@ static int collect_merge_info(struct merge_options *opt, | |
| setup_traverse_info(&info, opt->priv->toplevel_dir); | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> collect_merge_info() has set info.show_all_errors = 1 since
> d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
> 2020-12-13). This setting was copied from unpack-trees.c where it
> controls batching of error messages for porcelain display, but
> merge-ort has no such error-batching logic and never needed it.
>
> With show_all_errors set, traverse_trees() captures a negative callback
> return but continues processing remaining entries rather than stopping
> immediately. Removing the setting restores the default behavior where
> a negative return from collect_merge_info_callback() breaks out of the
> traversal loop right away, allowing a future commit to exit early when
> a corrupt tree is detected.
Nice spotting. As the error handling eventually is to die without
making any further damange, returning early without seeing "more
errors" is a good change.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> merge-ort.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 4b8e32209d..74e9636020 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -1740,7 +1740,6 @@ static int collect_merge_info(struct merge_options *opt,
> setup_traverse_info(&info, opt->priv->toplevel_dir);
> info.fn = collect_merge_info_callback;
> info.data = opt;
> - info.show_all_errors = 1;
>
> if (repo_parse_tree(opt->repo, merge_base) < 0 ||
> repo_parse_tree(opt->repo, side1) < 0 || |
||
| info.fn = collect_merge_info_callback; | ||
| info.data = opt; | ||
| info.show_all_errors = 1; | ||
|
|
||
| if (repo_parse_tree(opt->repo, merge_base) < 0 || | ||
| repo_parse_tree(opt->repo, side1) < 0 || | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| #!/usr/bin/perl | ||
| # | ||
| # Build a v2 index file from entries listed on stdin. | ||
| # Each line: "octalmode hex-oid name" | ||
| # Output: binary index written to stdout. | ||
| # | ||
| # This bypasses all D/F safety checks in add_index_entry(), simulating | ||
| # what happens when code uses ADD_CACHE_JUST_APPEND to bulk-load entries. | ||
| use strict; | ||
| use warnings; | ||
| use Digest::SHA qw(sha1 sha256); | ||
|
|
||
| my $hash_algo = $ENV{'GIT_DEFAULT_HASH'} || 'sha1'; | ||
| my $hash_func = $hash_algo eq 'sha256' ? \&sha256 : \&sha1; | ||
|
|
||
| my @entries; | ||
| while (my $line = <STDIN>) { | ||
| chomp $line; | ||
| my ($mode, $oid_hex, $name) = split(/ /, $line, 3); | ||
| push @entries, [$mode, $oid_hex, $name]; | ||
| } | ||
|
|
||
| my $body = "DIRC" . pack("NN", 2, scalar @entries); | ||
|
|
||
| for my $ent (@entries) { | ||
| my ($mode, $oid_hex, $name) = @{$ent}; | ||
| # 10 x 32-bit stat fields (zeroed), with mode in position 7 | ||
| my $stat = pack("N10", 0, 0, 0, 0, 0, 0, oct($mode), 0, 0, 0); | ||
| my $oid = pack("H*", $oid_hex); | ||
| my $flags = pack("n", length($name) & 0xFFF); | ||
| my $entry = $stat . $oid . $flags . $name . "\0"; | ||
| # Pad to 8-byte boundary | ||
| while (length($entry) % 8) { $entry .= "\0"; } | ||
| $body .= $entry; | ||
| } | ||
|
|
||
| binmode STDOUT; | ||
| print $body . $hash_func->($body); |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| #!/bin/sh | ||
|
|
||
| test_description='verify_cache() must catch non-adjacent D/F conflicts | ||
|
|
||
| Ensure that verify_cache() can complain about bad entries like: | ||
|
|
||
| docs <-- submodule | ||
| docs-internal/... <-- sorts here because "-" < "/" | ||
| docs/... <-- D/F conflict with "docs" above, not adjacent | ||
|
|
||
| In order to test verify_cache, we directly construct a corrupt index | ||
| (bypassing the D/F safety checks in add_index_entry) and verify that | ||
| write-tree rejects it. | ||
| ' | ||
|
|
||
| . ./test-lib.sh | ||
|
|
||
| if ! test_have_prereq PERL | ||
| then | ||
| skip_all='skipping verify_cache D/F tests; Perl not available' | ||
| test_done | ||
| fi | ||
|
|
||
| # Build a v2 index from entries on stdin, bypassing D/F checks. | ||
| # Each line: "octalmode hex-oid name" (entries must be pre-sorted). | ||
| build_corrupt_index () { | ||
| perl "$TEST_DIRECTORY/t0093-direct-index-write.pl" >"$1" | ||
| } | ||
|
|
||
| test_expect_success 'setup objects' ' | ||
| test_commit base && | ||
| BLOB=$(git rev-parse HEAD:base.t) && | ||
| SUB_COMMIT=$(git rev-parse HEAD) | ||
| ' | ||
|
|
||
| test_expect_success 'adjacent D/F conflict is caught by verify_cache' ' | ||
| cat >index-entries <<-EOF && | ||
| 0160000 $SUB_COMMIT docs | ||
| 0100644 $BLOB docs/requirements.txt | ||
| EOF | ||
| build_corrupt_index .git/index <index-entries && | ||
|
|
||
| test_must_fail git write-tree 2>err && | ||
| test_grep "You have both docs and docs/requirements.txt" err | ||
| ' | ||
|
|
||
| test_expect_success 'non-adjacent D/F conflict is caught by verify_cache' ' | ||
| cat >index-entries <<-EOF && | ||
| 0160000 $SUB_COMMIT docs | ||
| 0100644 $BLOB docs-internal/README.md | ||
| 0100644 $BLOB docs/requirements.txt | ||
| EOF | ||
| build_corrupt_index .git/index <index-entries && | ||
|
|
||
| test_must_fail git write-tree 2>err && | ||
| test_grep "You have both docs and docs/requirements.txt" err | ||
| ' | ||
|
|
||
| test_done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Junio C Hamano wrote on the Git mailing list (how to reply to this email):