-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Carbon fuzzing 3/3: added actual fuzzer implementation and a fuzzverter utility for investigating crashing protos #1156
Carbon fuzzing 3/3: added actual fuzzer implementation and a fuzzverter utility for investigating crashing protos #1156
Conversation
…into executable_semantics_fuzzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not getting too far here -- I think the ASAN issues needs significant work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stepping back a moment, looking over the tooling, can you please add a README.md to executable_semantics/fuzzing? Think about usage in terms of workflows: what would an engineer be trying to do when running these tools?
I think documentation would help clarify the big-picture view and make finishing this review easier.
A few related things I'm looking at:
-
"Using binary proto format for the fuzzer in anticipation of frequent changes to carbon.proto."
- How do binary protos support frequent changes?
- Wouldn't plain source code be more resilient to proto changes?
- Why would text protos be a problem?
- (text protos and source code would both be more readable and easily edited than binary protos)
-
"or to generate new binary protos from carbon source for seeding the fuzzer corpus:"
- Why wouldn't someone just add a file to executable_semantics/testdata in this case?
-
What's your intent for a fuzzer corpus?
- It's empty -- is it going to be generated?
- Do we need actually need one, given executable_semantics/testdata?
-
Are all the conversion modes supported by fuzzverter necessary?
-
What would be a minimum set? It feels a bit like proto -> text is the only conversion path needed, and the fuzzer itself could be printing source code equivalents, so I'm not too sure of fuzzverter's need as a distinct tool. But some documentation on typical fuzzer-related workflows might help clarify your intent here.
-
If fuzzverter is providing distinct value, would it make sense to put it with the fuzzer proto (//common/fuzzing) instead of making it executable-semantics-specific? There, it could still convert between proto modes and print source code; the only thing it would avoid is parsing source code, but it's not obvious to me that that's required here.
-
I'll review fuzzverter more carefully once I'm clear on the approach being taken.
-
-
AddPrelude already has a way to add code to an AST; MaybeAddMain feels like a different approach to the same goal. Could MaybeAddMain mimic AddPrelude instead?
…into executable_semantics_fuzzer
Added a readme.
With binary protos, it's easier to make changes to proto definition without breaking existing serialized instances of the protos and the code compiled with the old definition (https://developers.google.com/protocol-buffers/docs/proto#updating). For example, renaming a field is essentially a no-op ( as fields are keyed by tag ids not strings), and text proto using an old field name either fails to parse or ignores the field and its contents.
I was going to populate the corpus using
carbon_source -> binary_proto for generating corpus entries, and binary_proto to carbon_source as a convenient way to see the source for a crash, and be able to run
Right but it needs to be able to support parsing the source for carbon_source -> proto conversion for seeding the corpus, and needs a dependency on executable_semantics.
In the current implementation, |
…into executable_semantics_fuzzer
…into executable_semantics_fuzzer
…into executable_semantics_fuzzer
Updated per the discussion during the meeting: switched to text proto format for the fuzzer, and removed binary_proto from fuzzverter. Also rewrote PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're both on the same page about direction here, and thank you for the updates.
I've gone through with comments, but I view these as fairly narrow at this point, I think collapsing the --from
/--to
flags would be the most substantial change.
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review!
} // namespace Carbon | ||
|
||
auto main(int argc, char* argv[]) -> int { | ||
llvm::InitLLVM init_llvm(argc, argv); | ||
|
||
llvm::cl::opt<Carbon::ConversionMode> mode( | ||
"mode", llvm::cl::desc("Conversion mode"), | ||
llvm::cl::values( | ||
clEnumValN(Carbon::ConversionMode::TextProtoToCarbon, | ||
"proto_to_carbon", "Convert text proto to Carbon source"), | ||
clEnumValN(Carbon::ConversionMode::CarbonToTextProto, | ||
"carbon_to_proto", | ||
"Convert Carbon source to text proto"))); | ||
llvm::cl::opt<std::string> input_file_name( | ||
"input", llvm::cl::desc("<input file>"), llvm::cl::init("/dev/stdin")); | ||
llvm::cl::opt<std::string> output_file_name( | ||
"output", llvm::cl::desc("<output file>"), llvm::cl::init("/dev/stdout")); | ||
llvm::cl::ParseCommandLineOptions(argc, argv); | ||
|
||
const Carbon::ErrorOr<Carbon::Success> result = [&]() { | ||
switch (mode) { | ||
case Carbon::ConversionMode::TextProtoToCarbon: | ||
return Carbon::TextProtoToCarbon(input_file_name, output_file_name); | ||
case Carbon::ConversionMode::CarbonToTextProto: | ||
return Carbon::CarbonToTextProto(input_file_name, output_file_name); | ||
} | ||
}(); | ||
if (!result.ok()) { | ||
llvm::errs() << GetEnumString(mode) | ||
<< " conversion failed: " << result.error().message() << "\n"; | ||
return EXIT_FAILURE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} // namespace Carbon | |
auto main(int argc, char* argv[]) -> int { | |
llvm::InitLLVM init_llvm(argc, argv); | |
llvm::cl::opt<Carbon::ConversionMode> mode( | |
"mode", llvm::cl::desc("Conversion mode"), | |
llvm::cl::values( | |
clEnumValN(Carbon::ConversionMode::TextProtoToCarbon, | |
"proto_to_carbon", "Convert text proto to Carbon source"), | |
clEnumValN(Carbon::ConversionMode::CarbonToTextProto, | |
"carbon_to_proto", | |
"Convert Carbon source to text proto"))); | |
llvm::cl::opt<std::string> input_file_name( | |
"input", llvm::cl::desc("<input file>"), llvm::cl::init("/dev/stdin")); | |
llvm::cl::opt<std::string> output_file_name( | |
"output", llvm::cl::desc("<output file>"), llvm::cl::init("/dev/stdout")); | |
llvm::cl::ParseCommandLineOptions(argc, argv); | |
const Carbon::ErrorOr<Carbon::Success> result = [&]() { | |
switch (mode) { | |
case Carbon::ConversionMode::TextProtoToCarbon: | |
return Carbon::TextProtoToCarbon(input_file_name, output_file_name); | |
case Carbon::ConversionMode::CarbonToTextProto: | |
return Carbon::CarbonToTextProto(input_file_name, output_file_name); | |
} | |
}(); | |
if (!result.ok()) { | |
llvm::errs() << GetEnumString(mode) | |
<< " conversion failed: " << result.error().message() << "\n"; | |
return EXIT_FAILURE; | |
} | |
int Main(int argc, char* argv[]) -> ErrorOr<Success> { | |
llvm::InitLLVM init_llvm(argc, argv); | |
llvm::cl::opt<ConversionMode> mode( | |
"mode", llvm::cl::desc("Conversion mode"), | |
llvm::cl::values( | |
clEnumValN(ConversionMode::TextProtoToCarbon, | |
"proto_to_carbon", "Convert text proto to Carbon source"), | |
clEnumValN(ConversionMode::CarbonToTextProto, | |
"carbon_to_proto", | |
"Convert Carbon source to text proto"))); | |
llvm::cl::opt<std::string> input_file_name( | |
"input", llvm::cl::desc("<input file>"), llvm::cl::init("/dev/stdin")); | |
llvm::cl::opt<std::string> output_file_name( | |
"output", llvm::cl::desc("<output file>"), llvm::cl::init("/dev/stdout")); | |
llvm::cl::ParseCommandLineOptions(argc, argv); | |
switch (mode) { | |
case ConversionMode::TextProtoToCarbon: | |
return TextProtoToCarbon(input_file_name, output_file_name); | |
case ConversionMode::CarbonToTextProto: | |
return CarbonToTextProto(input_file_name, output_file_name); | |
} | |
} | |
} // namespace Carbon | |
auto main(int argc, char* argv[]) -> int { | |
auto result = Carbon::Main(argc, argv) | |
if (!result.ok()) { | |
llvm::errs() << GetEnumString(mode) | |
<< " conversion failed: " << result.error().message() << "\n"; | |
return EXIT_FAILURE; | |
} |
Note how moving the Main() into the Carbon namespace allows removing some of the repetition of Carbon::
, but really I'm suggesting this because I'd like to eliminate the inline lambda call (which could also be done by a more targeted helper function, but I think this has side-benefits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how moving the Main() into the Carbon namespace allows removing some of the repetition of
Carbon::
, but really I'm suggesting this because I'd like to eliminate the inline lambda call (which could also be done by a more targeted helper function, but I think this has side-benefits).
With the 'full' Main
in Carbon
namespace, there is an issue that mode
is not available in 'real' main for printing the final error message, so I went with a helper function in the end - PTAL
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
Co-authored-by: Jon Meow <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes! I may take a closer look at the main, but this seems good for now.
…into executable_semantics_fuzzer
Co-authored-by: Jon Meow <[email protected]>
…604014/carbon-lang into executable_semantics_fuzzer
…er utility for investigating crashing protos (#1156) * finished fuzzer and added fuzzverter util * fixed typo * renamed cmd line params * fixed libproto_mutator download path * small fixes * small fixes * small fixes * renamed sample corpus proto * small fixes * try building on github with LIBCPP_DEBUG enabled * temporarily marked proto fuzzer as a manual test * code review * use a dedicated proto-fuzzer feature to work around LIBCPP_DEBUG=1 crash in proto code * code review comments, added README.md * minor fixes to the text * Update bazel/cc_toolchains/clang_cc_toolchain_config.bzl Co-authored-by: Jon Meow <[email protected]> * use Carbon source representation for "empty Main()" instead of text format proto representation * fixed typo * made FuzzerUtil produce the full carbon source (proto converted + Main if needed) to decrease code duplication a bit * typo * switched to text proto format per code review * Update executable_semantics/prelude.h Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * review comments * removed unnecessary file mode variables * Update executable_semantics/fuzzing/fuzzverter.cpp Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * Update executable_semantics/fuzzing/README.md Co-authored-by: Jon Meow <[email protected]> * code review comments * Update executable_semantics/syntax/BUILD Co-authored-by: Jon Meow <[email protected]> * buildifier Co-authored-by: Jon Meow <[email protected]>
Added new dependency to WORKSPACE -
libprotobuf_mutator
. The library does not come with a bazel BUILD file, but it's simple to craft one.Using binary proto format for the fuzzer in anticipation of frequent changes to carbon.proto.
fuzzverter
utility can be used to convert crashing inputs in binary proto format to carbon source or text proto:fuzzverter --from=binary_proto --input /tmp/crash.binaryproto --to=carbon_source
or to generate new binary protos from carbon source for seeding the fuzzer corpus:
fuzzverter --from=carbon_source --input testdata/simple.carbon --to=binary_proto