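Trying out "Sawzall," the language Google designed for distributed processing, on Mac OSX
Are you doing distributed processing? I don't do much of it myself.
Getting right to it: I installed Sawzall, the language covered in the Publickey article "Google open-sources Sawzall, a language designed for distributed processing," on Mac OSX.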
Szl - A Compiler and Runtime for the Sawzall Language - Google Project Hosting
Installation
I've set up homebrew, so I tried installing szl under brew's management too.
The steps go like this, y'know (Misawa style).
```
#
# How to install Sawzall, Google's language designed for distributed processing,
# on mac osx under homebrew
#
# http://code.google.com/p/szl/
#
# References:
# http://news.ycombinator.net/item?id=1865992
# http://www.publickey1.jp/blog/10/sawzall.html
#

# Install go, and symlink gobjdump to objdump
brew install go --use-git-head
cd /usr/local/bin; ln -s gobjdump objdump

# Install the dependencies (I also listed them under depends_on in the Formula, just in case)
brew install binutils
brew link icu4c
brew install protobuf
brew install protobuf-c
brew install pcre++

# Create the szl Formula.
# An editor opens up, so enter the following and save.
# The Formula is slapdash and doesn't pass any options, so if you need some,
# run ./configure --help or whatever and add them yourself, 'kay
brew create szl http://szl.googlecode.com/files/szl-1.0.tar.gz

require 'formula'

class Szl < Formula
  url 'http://szl.googlecode.com/files/szl-1.0.tar.gz'
  homepage 'http://code.google.com/p/szl/'
  md5 'd25f73b2adf4b92229d8b451685506d1'

  depends_on 'binutils'
  depends_on 'go'
  depends_on 'icu4c'
  depends_on 'protobuf'
  depends_on 'protobuf-c'
  depends_on 'pcre++'

  def install
    system "./configure", "--disable-debug", "--disable-dependency-tracking",
                          "--prefix=#{prefix}"
    # system "cmake . #{std_cmake_parameters}"
    system "make"
    system "make install"
  end
end

# Install with brew
brew install szl
```
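Reference: http://news.ycombinator.net/item?id=1865992
Even if you don't use homebrew, I bet you can get it installed by putting the dependencies in place properly and shouting "configuuure!!" and "make installll!!" (that is, ./configure, make, make install).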
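The steps above hinge on certain tools being findable on PATH (presumably why gobjdump gets symlinked as objdump). The following is a minimal Ruby sketch that reports which tools are missing before you kick off the build. The helper name `missing_tools` is made up for illustration. Checking `objdump` and `protoc` specifically is also my own assumption: protoc is what szl's `-protocol_compiler` flag points to by default (`/usr/local/bin/protoc`). Neither szl nor Homebrew provides this check.

```ruby
# Hypothetical pre-flight check (not part of szl or Homebrew).
# Returns the names in `tools` that are not found as an executable
# regular file in any directory listed in `path` (PATH by default).
# File.file? follows symlinks, so a dangling objdump -> gobjdump link
# counts as missing.
def missing_tools(tools, path = ENV.fetch("PATH", ""))
  dirs = path.split(File::PATH_SEPARATOR)
  tools.reject do |tool|
    dirs.any? do |dir|
      candidate = File.join(dir, tool)
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # Assumption: objdump and protoc are the tools worth checking here.
  missing = missing_tools(%w[objdump protoc])
  puts missing.empty? ? "all tools found" : "missing: #{missing.join(', ')}"
end
```

Save it as, say, check_tools.rb and run `ruby check_tools.rb`. Add whatever else your build complains about to the list.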
Trying the example
Once it's installed, the first thing to try is typing szl --help. A whole lot of stuff comes out, but I can't make heads or tails of it.
```
ozaki@mbp $ szl --help | pbcopy
szl
  -optimize_sawzall_code (run the optimizer pass for faster Sawzall execution) type: bool default: true
  -remove_unreachable_functions (aid optimization by removing functions that are never referenced) type: bool default: true
  -eliminate_dead_code (enable dead code elimination) type: bool default: true
  -szl_bb_count (generate szl basic block execution counts) type: bool default: false
  -deep_composite_fields (all fields of composite expressions will be considered read) type: bool default: false
  -trace_code (trace executed instructions) type: bool default: false
  -trace_traps (trace trap resolution (debugging)) type: bool default: false
  -debug_whens (print when statements before and after rewriting) type: bool default: false
  -heap_chunk_size (heap chunk size in KB) type: int32 default: 10240
  -stack_size (stack size in KB) type: int32 default: 256
  -stacktrace_length (maximum stacktrace length) type: int32 default: 20
  -restrict (enable security restrictions. Programs will not be allowed to use the 'proc' and 'file' modifiers.) type: bool default: false
  -ignore_multiple_inclusion (disregard second and subsequent invocation of include or proto file) type: bool default: true
  -show_multiple_inclusion_warnings (show warnings generated by --ignore_multiple_inclusion when a file is included multiple times) type: bool default: false
  -preallocate_default_proto (allocate default values for proto buffer TupleTypes) type: bool default: true
  -trace_refs (trace reference counts (debugging)) type: bool default: false
  -sawzall_mm_checks (enable additional memory manager checks) type: bool default: true
  -read_all_fields (for debugging purposes, ignore field reference analysis and keep all tuple fields) type: bool default: false
  -test_function_cloning (use cloned copies of all functions for testing purposes) type: bool default: false
  -enable_proto_conversion_hack (temporary flag during proto conversion: allow unit<->int, string<->bytes conversion. This flag will go away around 2010/07/01.) type: bool default: false
  -print_tree (generate tree output (default is source code)) type: bool default: false
  -print_proto_clauses (print proto clauses instead of expanded generated code) type: bool default: false
  -strict_input_types (unknown tags in input buffers are fatal) type: bool default: false
  -parsed_messages (convert parsed messages back into parsed messages) type: bool default: true
  -test_backend_type_conversion (perform backend type conversions for output typesas sawzall-to-backend-to-sawzall-to-backend type conversions for testing purposes) type: bool default: false
  -szl_includepath (Comma-separated list of directories in which to search for include files if they are not found in the directory of the including file, and for program files if they are not found in the current directory.) type: string default: ""
  -protocol_compiler (file name of protocol-compiler binary) type: string default: "/usr/local/bin/protoc"
  -protocol_compiler_plugin (file name of protocol-compiler szl plugin binary) type: string default: "/usr/local/bin/protoc_gen_szl"
  -protocol_compiler_temp (temporary directory for protocol compiler output) type: string default: "/tmp"
  -report_all_errors (report all errors, even if on the same line) type: bool default: false
  -trace (list of tracers enabled) type: string default: ""
  -print_rewritten_source (print rewritten program source; output is a descriptive approximation only) type: bool default: false
  -silent_init (No log message on initialization) type: bool default: true
  -help (show help on all flags [tip: all flags can have two dashes]) type: bool default: true
  -helpxml (produce an xml version of help) type: bool default: false
  -version (show version and build info and exit) type: bool default: false
  -v (Show all VLOG(m) messages for m <= this.) type: int32 default: 0
  -logfile (Direct log output messages to this file.) type: string default: ""
  -saw_use_key_from_double (Use KeyFromDouble to encode floats; otherwise use EncodeDouble) type: bool default: true
  -bootstrapsum_fastpath (Enable fast path sampling.) type: bool default: true
  -bootstrapsum_seed (Seed used when SetRandomSeed is not called.) type: string default: ""
  -V (print version) type: bool default: false
  -program (sawzall source file. If the file is not found in the current directory, look for it in --szl_includepath) type: string default: ""
  -execute (execute program) type: bool default: true
  -skip_files (skip processing of input files) type: bool default: false
  -print_source (print program source) type: bool default: false
  -print_raw_source (print raw program source) type: bool default: false
  -always_print_raw_source (print raw program source) type: bool default: false
  -print_code (print generated code) type: bool default: false
  -trace_files (trace input files) type: bool default: false
  -trace_input (trace input records) type: bool default: false
  -use_recordio (use record I/O to read input files) type: bool default: false
  -ignore_undefs (silently ignore undefined variables/statements) type: bool default: false
  -info (print Sawzall version information) type: bool default: false
  -begin_record (first record to process) type: int64 default: 0
  -end_record (first record not to process (-1 => end of file)) type: int64 default: -1
  -num_records (number of input records to process (-1 => all)) type: int64 default: -1
  -e (program snippet on command line) type: string default: ""
  -explain (print definition of a predeclared identifier) type: string default: "zlitslepmur"
  -print_html (print html documentation) type: bool default: false
  -print_histogram (print byte code histogram for each process) type: bool default: false
  -print_tables (print output tables) type: bool default: false
  -print_input_proto_name (print the name of the protocol buffer associated with "input") type: bool default: false
  -print_referenced_tuple_field_names (print the names of the referenced fields in the specified tuple; use "<input>" to specify the input proto tuple and "<all>" to specify all named tuples) type: string default: ""
  -profile (print function use profile for each process) type: bool default: false
  -native (generate native code instead of interpreted byte code) type: bool default: true
  -gen_elf (generate ELF file representing generated native code) type: string default: ""
  -table_output (comma-separated list of table names or * to display the aggregated output for.) type: string default: ""
```
For now, let's run the sample from the official site. Create a source file named example.szl with the following contents.
```
topwords: table top(3) of word: string weight count: int;
fields: array of bytes = splitcsvline(input);
w: string = string(fields[0]);
c: int = int(string(fields[1]), 10);
if (c != 0) {
  emit topwords <- w weight c;
}
```
Next, prepare the data to feed to example.szl as a CSV file. Let's just call it szl_example_data.csv.
```
abc,1
def,2
ghi,3
def,4
jkl,5
```
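To get a feel for what example.szl does with each record, here's a rough Ruby re-creation. It's only a sketch under a couple of assumptions. First, splitcsvline is approximated by a plain split on commas, with no quoted fields. Second, each CSV line counts as one record. The output just mimics the emit lines szl prints. `emits_for` is a made-up name. Fed the five rows above, it prints the same five emit lines as the szl run below.

```ruby
# Rough Ruby stand-in for the per-record part of example.szl.
# Assumptions: each input line is one record, and splitcsvline is
# approximated by a plain split on "," (no quoted fields).
def emits_for(lines)
  lines.filter_map do |line|
    fields = line.chomp.split(",")   # splitcsvline(input)
    w = fields[0]                    # string(fields[0])
    c = Integer(fields[1], 10)       # int(string(fields[1]), 10)
    next if c.zero?                  # if (c != 0) { ... }
    %(emit topwords <- "#{w}" weight #{c};)
  end
end

data = ["abc,1", "def,2", "ghi,3", "def,4", "jkl,5"]
puts emits_for(data)
```

Rows with a count of 0 are dropped, just like the `if (c != 0)` guard in the Sawzall source.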
Now let's run it. Type the command 'szl example.szl szl_example_data.csv'.
```
ozaki@mbp $ szl example.szl szl_example_data.csv
emit topwords <- "abc" weight 1;
emit topwords <- "def" weight 2;
emit topwords <- "ghi" weight 3;
emit topwords <- "def" weight 4;
emit topwords <- "jkl" weight 5;
```
Something came out.
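The output above lists one line per emit. The program, however, declares `topwords` as a `top(3)` table, which should end up holding the three heaviest words. The Ruby sketch below works out what that table ought to contain. It rests on two assumptions. First, the table sums the weights emitted for the same word and keeps the three largest totals. Second, the real Sawzall top table uses an approximate algorithm, while this sketch computes the result exactly. `top_n` is a made-up name. If you'd rather have szl print the aggregated table itself, the `-table_output` flag from the help output looks like the one to try (e.g. `szl --table_output=topwords example.szl szl_example_data.csv`), though that is untested here.

```ruby
# Exact stand-in for a top(N) table: sums the weights emitted for each
# word and keeps the N heaviest. Assumption: this matches the intent of
# Sawzall's top table, which itself uses an approximate algorithm.
def top_n(emits, n)
  totals = Hash.new(0)
  emits.each { |word, weight| totals[word] += weight }
  # Heaviest first; ties broken alphabetically so the result is deterministic.
  totals.sort_by { |word, weight| [-weight, word] }.first(n)
end

emits = [["abc", 1], ["def", 2], ["ghi", 3], ["def", 4], ["jkl", 5]]
top_n(emits, 3).each { |word, weight| puts "#{word} #{weight}" }
# def 6
# jkl 5
# ghi 3
```

"def" shows up twice (weights 2 and 4), so it comes out on top with 6, and "abc" (weight 1) falls out of the top 3.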
It runs, so I'm satisfied for now. I haven't read the language spec at all, so I'll leave the rest to anyone fired up enough to dig in.