
( ꒪⌓꒪) ゆるよろ日記


Trying out "Sawzall", Google's language designed for distributed processing, on Mac OS X

Are you distributing? Me, not so much.


Getting right to it: I tried installing Sawzall on Mac OS X. It's the language introduced in the Publickey article "Google open-sources Sawzall, a language designed for distributed processing".





Szl - A Compiler and Runtime for the Sawzall Language - Google Project Hosting

Installation

I've started using Homebrew, so I went and installed szl under brew's management too, you know?
The steps are as follows (in Misawa voice).

#
# How to install "Sawzall", Google's language designed for distributed processing, on Mac OS X under Homebrew
#
# http://code.google.com/p/szl/
#     
# References
#   http://news.ycombinator.net/item?id=1865992
#   http://www.publickey1.jp/blog/10/sawzall.html

# Install go, and symlink gobjdump to objdump
brew install go --use-git-head
cd /usr/local/bin; ln -s gobjdump objdump

# Install the dependencies (they are also listed in the Formula's depends_on, just in case)
brew install binutils
brew link icu4c
brew install protobuf
brew install protobuf-c
brew install pcre++

# Create a Formula for szl. An editor opens; enter the following and save.
# The Formula is quick and dirty and sets no options,
# so if you need any, run ./configure --help or the like and add whatever fits.
brew create szl http://szl.googlecode.com/files/szl-1.0.tar.gz

  require 'formula'

  class Szl < Formula
    url 'http://szl.googlecode.com/files/szl-1.0.tar.gz'
    homepage 'http://code.google.com/p/szl/'
    md5 'd25f73b2adf4b92229d8b451685506d1'

    depends_on 'binutils'
    depends_on 'go'
    depends_on 'icu4c'
    depends_on 'protobuf'
    depends_on 'protobuf-c'
    depends_on 'pcre++'

    def install
      system "./configure", "--disable-debug", "--disable-dependency-tracking",
                            "--prefix=#{prefix}"
      # system "cmake . #{std_cmake_parameters}"
      system "make"
      system "make install"
    end
  end

# Install with brew
brew install szl

Reference: http://news.ycombinator.net/item?id=1865992


Even if you don't use Homebrew, install the dependencies properly, yell "configuuure!!" and "make install!!", and it should probably go in, right?
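Whichever route you take, a quick check that the executables mentioned in this post are on PATH can save some head-scratching. This is only a sketch: the list of names is an assumption pieced together from this post (szl itself, the objdump symlink from the steps above, and the protoc and protoc_gen_szl defaults that appear in the help output later on), not an official list of requirements, and missing_tools is a made-up helper name.

```python
import shutil

# Executable names taken from this post; NOT an official requirements list.
EXPECTED_TOOLS = ["szl", "protoc", "protoc_gen_szl", "objdump"]

def missing_tools(names, which=shutil.which):
    """Return the names that `which` cannot find on PATH."""
    return [name for name in names if which(name) is None]

if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all expected executables found")
```

The which parameter is there only so the lookup can be swapped out when trying the function without touching the real PATH.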

Trying the example

Once it's installed, typing szl --help for starters prints all sorts of things, none of which I understand.

ozaki@mbp $ szl --help | pbcopy 
szl
    -optimize_sawzall_code (run the optimizer pass for faster Sawzall
      execution) type: bool default: true
    -remove_unreachable_functions (aid optimization by removing functions that
      are never referenced) type: bool default: true
    -eliminate_dead_code (enable dead code elimination) type: bool
      default: true
    -szl_bb_count (generate szl basic block execution counts) type: bool
      default: false
    -deep_composite_fields (all fields of composite expressions will be
      considered read) type: bool default: false
    -trace_code (trace executed instructions) type: bool default: false
    -trace_traps (trace trap resolution (debugging)) type: bool default: false
    -debug_whens (print when statements before and after rewriting) type: bool
      default: false
    -heap_chunk_size (heap chunk size in KB) type: int32 default: 10240
    -stack_size (stack size in KB) type: int32 default: 256
    -stacktrace_length (maximum stacktrace length) type: int32 default: 20
    -restrict (enable security restrictions. Programs will not be allowed to
      use the 'proc' and 'file' modifiers.) type: bool default: false
    -ignore_multiple_inclusion (disregard second and subsequent invocation of
      include or proto file) type: bool default: true
    -show_multiple_inclusion_warnings (show warnings generated by
      --ignore_multiple_inclusion when a file is included multiple times)
      type: bool default: false
    -preallocate_default_proto (allocate default values for proto buffer
      TupleTypes) type: bool default: true
    -trace_refs (trace reference counts (debugging)) type: bool default: false
    -sawzall_mm_checks (enable additional memory manager checks) type: bool
      default: true
    -read_all_fields (for debugging purposes, ignore field reference analysis
      and keep all tuple fields) type: bool default: false
    -test_function_cloning (use cloned copies of all functions for testing
      purposes) type: bool default: false
    -enable_proto_conversion_hack (temporary flag during proto conversion:
      allow unit<->int, string<->bytes conversion. This flag will go away
      around 2010/07/01.) type: bool default: false
    -print_tree (generate tree output (default is source code)) type: bool
      default: false
    -print_proto_clauses (print proto clauses instead of expanded generated
      code) type: bool default: false
    -strict_input_types (unknown tags in input buffers are fatal) type: bool
      default: false
    -parsed_messages (convert parsed messages back into parsed messages)
      type: bool default: true
    -test_backend_type_conversion (perform backend type conversions for output
      typesas sawzall-to-backend-to-sawzall-to-backend type conversions for
      testing purposes) type: bool default: false
    -szl_includepath (Comma-separated list of directories in which to search
      for include files if they are not found in the directory of the including
      file, and for program files if they are not found in the current
      directory.) type: string default: ""
    -protocol_compiler (file name of protocol-compiler binary) type: string
      default: "/usr/local/bin/protoc"
    -protocol_compiler_plugin (file name of protocol-compiler szl plugin
      binary) type: string default: "/usr/local/bin/protoc_gen_szl"
    -protocol_compiler_temp (temporary directory for protocol compiler output)
      type: string default: "/tmp"
    -report_all_errors (report all errors, even if on the same line) type: bool
      default: false
    -trace (list of tracers enabled) type: string default: ""
    -print_rewritten_source (print rewritten program source; output is a
      descriptive approximation only) type: bool default: false
    -silent_init (No log message on initialization) type: bool default: true
    -help (show help on all flags [tip: all flags can have two dashes])
      type: bool default: true
    -helpxml (produce an xml version of help) type: bool default: false
    -version (show version and build info and exit) type: bool default: false
    -v (Show all VLOG(m) messages for m <= this.) type: int32 default: 0
    -logfile (Direct log output messages to this file.) type: string
      default: ""
    -saw_use_key_from_double (Use KeyFromDouble to encode floats; otherwise use
      EncodeDouble) type: bool default: true
    -bootstrapsum_fastpath (Enable fast path sampling.) type: bool
      default: true
    -bootstrapsum_seed (Seed used when SetRandomSeed is not called.)
      type: string default: ""
    -V (print version) type: bool default: false
    -program (sawzall source file.  If the file is not found in the current
      directory, look for it in --szl_includepath) type: string default: ""
    -execute (execute program) type: bool default: true
    -skip_files (skip processing of input files) type: bool default: false
    -print_source (print program source) type: bool default: false
    -print_raw_source (print raw program source) type: bool default: false
    -always_print_raw_source (print raw program source) type: bool
      default: false
    -print_code (print generated code) type: bool default: false
    -trace_files (trace input files) type: bool default: false
    -trace_input (trace input records) type: bool default: false
    -use_recordio (use record I/O to read input files) type: bool
      default: false
    -ignore_undefs (silently ignore undefined variables/statements) type: bool
      default: false
    -info (print Sawzall version information) type: bool default: false
    -begin_record (first record to process) type: int64 default: 0
    -end_record (first record not to process (-1 => end of file)) type: int64
      default: -1
    -num_records (number of input records to process (-1 => all)) type: int64
      default: -1
    -e (program snippet on command line) type: string default: ""
    -explain (print definition of a predeclared identifier) type: string
      default: "zlitslepmur"
    -print_html (print html documentation) type: bool default: false
    -print_histogram (print byte code histogram for each process) type: bool
      default: false
    -print_tables (print output tables) type: bool default: false
    -print_input_proto_name (print the name of the protocol buffer associated
      with "input") type: bool default: false
    -print_referenced_tuple_field_names (print the names of the referenced
      fields in the specified tuple; use "<input>" to specify the input proto
      tuple and "<all>" to specify all named tuples) type: string default: ""
    -profile (print function use profile for each process) type: bool
      default: false
    -native (generate native code instead of interpreted byte code) type: bool
      default: true
    -gen_elf (generate ELF file representing generated native code)
      type: string default: ""
    -table_output (comma-separated list of table names or * to display the
      aggregated output for.) type: string default: ""
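That wall of text is easier to skim as a table of flag name, type and default. The following is a rough sketch that assumes every entry has the shape seen above, "-name (description) type: T default: D", possibly wrapped over several lines. parse_flags and SAMPLE are names made up for this illustration, and SAMPLE holds just three entries copied from the output.

```python
import re

# A flag entry starts with "-name ("; wrapped description lines do not.
FLAG_START = re.compile(r"\s*-\w+ \(")
FLAG_ENTRY = re.compile(r"-(\w+) \((.*)\) type: (\w+) default: (.*)")

def parse_flags(help_text):
    """Map flag name -> (type, default, description)."""
    entries = []
    for line in help_text.splitlines():
        text = line.strip()
        if FLAG_START.match(line):
            entries.append(text)
        elif entries and text:
            entries[-1] += " " + text  # continuation of a wrapped entry
    flags = {}
    for entry in entries:
        m = FLAG_ENTRY.fullmatch(entry)
        if m:
            name, desc, typ, default = m.groups()
            flags[name] = (typ, default, desc)
    return flags

# Three entries copied from the help output above.
SAMPLE = """szl
    -stack_size (stack size in KB) type: int32 default: 256
    -eliminate_dead_code (enable dead code elimination) type: bool
      default: true
    -trace_traps (trace trap resolution (debugging)) type: bool default: false
"""

if __name__ == "__main__":
    for name, (typ, default, _desc) in parse_flags(SAMPLE).items():
        print(f"{name}: {typ} = {default}")
```

To run it on the real thing, save the help output to a file and pass its contents to parse_flags instead of SAMPLE.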

For now, let's run the sample from the official site. Create a source file named example.szl with the following contents.

topwords: table top(3) of word: string weight count: int;
fields: array of bytes = splitcsvline(input);
w: string = string(fields[0]);
c: int = int(string(fields[1]), 10);
if (c != 0) {
  emit topwords <- w weight c;
}
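For readers who, like me, haven't read the spec: as far as I can tell the program above runs once per input record, with input holding that record. Here is a rough Python equivalent of that per-record step, assuming each line of a text file is one record. process_record is a made-up name, and the csv module is only a stand-in for splitcsvline, not the same implementation.

```python
import csv

def process_record(line):
    """Return the (word, weight) pair the program would emit, or None."""
    fields = next(csv.reader([line]))  # stand-in for splitcsvline(input)
    w = fields[0]
    c = int(fields[1], 10)             # base-10 parse, like int(..., 10)
    if c != 0:
        return (w, c)                  # corresponds to: emit topwords <- w weight c
    return None                        # c == 0: nothing is emitted

if __name__ == "__main__":
    for record in ["abc,1", "def,2", "zero,0"]:
        print(record, "->", process_record(record))
```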

Prepare some CSV data to feed to example.szl. I'll just call the file szl_example_data.csv.

abc,1
def,2
ghi,3
def,4
jkl,5


OK, let's run it. Type the command 'szl example.szl szl_example_data.csv'.

ozaki@mbp $ szl example.szl szl_example_data.csv
emit topwords <- "abc" weight 1;
emit topwords <- "def" weight 2;
emit topwords <- "ghi" weight 3;
emit topwords <- "def" weight 4;
emit topwords <- "jkl" weight 5;

Something came out.
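What came out is the individual emits, not an aggregated table. As a rough picture of what a top(3) table weighted by count is presumably meant to hold, here is a Python sketch that sums the weights per word and keeps the three largest totals. That reading of the declaration is an assumption, not something checked against the spec, and the real aggregator may well be approximate. top_n is a made-up name.

```python
from collections import Counter

# The five emits printed above, as (word, weight) pairs.
EMITS = [("abc", 1), ("def", 2), ("ghi", 3), ("def", 4), ("jkl", 5)]

def top_n(emits, n):
    """Sum weights per word and return the n largest (word, total) pairs."""
    totals = Counter()
    for word, weight in emits:
        totals[word] += weight
    return totals.most_common(n)

if __name__ == "__main__":
    print(top_n(EMITS, 3))  # [('def', 6), ('jkl', 5), ('ghi', 3)]
```

The help output also lists a -table_output flag, described as displaying the aggregated output for the named tables. That looks like the way to get szl to print the table itself, but it is untested here.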


It ran, so I'm satisfied for now. I haven't read the language spec at all, so I'll leave the rest to whoever has the killer drive for it.