blockparser

by znort987

znort987 / blockparser

Simple C++ bitcoin blockchain parser

450 Stars 290 Forks Last release: Not found 138 Commits 0 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

blockparser

Who wrote it ?
--------------

Author:

    [email protected]

Tip here if you find it useful:

    1ZnortsoStC1zSTXbW6CUtkvqew8czMMG

I've also been cherry-picking changes I found useful from various github forks.
Credits for these:

     git log | grep Author | grep -iv Znort

Canonical source code repo:

git clone github.com:znort987/blockparser.git

License:

Code is in the public domain.

What is it ?

A barebone C++ block parser that parses the entire block chain from scratch
to extract various types of information from it.

The code chews "linearly" through the block chain and calls "user-defined"
callbacks when it hits on certain "events" in the chain. Here:

    "events" essentially means that the parser is starting to assemble a new
    blockchain data structure (a block, a tx, an input, etc ...), or that the
    parser has just completed a data structure, in which case it will usually
    run the callback with the completed data structure. The blockchain data
    structure level of granularity at which these "events" happen is somewhat
    arbitrary.  For example you won't get called every time a new byte is seen.

    "user-defined" means that if you want to extract new types of information
    from the chain, you have to add your own C++ piece of code to those already
    in directory "cb". Your C++ code will get called by the main parser at
    "events" of your choosing.

    "linearly" is a bit of an abuse because the parser code often has to jump
    back to previously seen parts of the blockchain to provide user callbacks
    with fully complete data structures. The parser code also has to walk the
    blockchain a few times to compute the longest (valid) chain. But the user
    callbacks get a fairly linear view of it all.


Blockparser was designed for bitcoin but works on most altcoins that were
derived from the bitcoin code base.

What it is not:

Blockparser is *not* a verifier. It assumes a clean blockchain, as typically
constructed and verified by the bitcoin core client. blockparser does not
perform any kind of verification and will likely crash if applied to an unclean
chain.

Blockparser is not very efficient if you want to perform repetitive tasks on
thr block chain: the basic idea/premise of blockparser is that it's going to
chew through the *entire* block chain, *every* time. Given the size of the
blockchain these days, that's not something you want to do very 5 minutes.

Blockparser is not lean and mean. It used to be, when the blockchain was still
relatively small.  Now that we are inching towards the 100's of gigabytes, the
very proposition that it has to chew through entire chain by design implies that
it's going to take quite a while, whichever way you slice it. Also, the entire
index is built on the fly and kept in RAM. At current sizes, this is not a very
smart choice. This might get addressed in the near future.

Why write this ?

It started as an exercise for me to get a "close to the metal" understanding of
how bitcoin works. The quality and state of the original bitcoin codebase made
this damn near impossible (it's clear to me satosh, albeit clearly a genius, was
not a professional software engineer. Also, things have vastly improved since then).
It then grew into a fun hobby project.

The parser code is minimal and very easy to follow. If someone wants to quickly
understand "for real" how the block chain is structured, it is a good place to
start

It has also slowly grown into an altcoin zoo. It is very far from being a
compendium (there's so many of the darn things these days), but adding your
fave alt is very easy.

Talking about zoo, I've also started to track and document "weird" TXO's
in the chain (comments, p2sh, multi-sigs, bugs, etc ...). Not a complete
compendium yet, but getting there.

A side goal was also to build something that can independently (as in : the
codebase is *very* different from that of bitcoin core) verify some of the
conclusions of other bitcoin implementations, such as how many coins are
attached to an address.

Another thing that blockparser is really nice for is to easily reconstruct
"snapshots" of the state of the blockchain from a specific time (e.g. the -a
option of the "allBalances" command).

How do I build it ?

You'll need a 64-bit Unix box (because of RAM consumption, blockparser won't work
inside a 32bit address space).

If you are unfortunate enough to still have to use windows, there is a port floating
somehwere on github.

I also have heard rumors of it working on OSX.

You'll need a block chain somewhere on your hard drive. This is typically created
by a statoshi bitcoin client such as this one: https://github.com/bitcoin/bitcoin.git

Install dependencies:

    sudo apt-get install libssl-dev build-essential g++ libboost-all-dev libsparsehash-dev git-core perl

Get the source:

    git clone git://github.com/znort987/blockparser.git

Build it:

    cd blockparser
    make

It crashes

At this point, blockparser uses a *lot* of memory (20+ Gig is typical). This
can cause all sorts of woes on an under-dimensioned box, chief amongst which:

    - box goes into heavy swapping, and parser takes for ever to complete task

    - parser eats up all RAM and all SWAP and crashes. Here's a possible remedy:

         http://askubuntu.com/questions/178712/how-to-increase-swap-space

How does blockparser deal with multi-sig transactions ?

AFAIK, there are two types of multi-sig transactions:

    1) Pay-to-script (which is in fact more general than multisig). This one is
       easy, because it pays to a hash, which can readily be converted to an
       address that starts with the character '3' instead of '1'

    2) Naked multi-sig transactions. These are harder, because the output of
       the transactions does not neatly map to a specific bitcoin address. I
       think I have found a neat work-around: I compute:

             hash160(M, N, sortedListOfAddresses)

       which can now be properly mapped to a bitcoin address. To mark the fact
       that this addres is neither a "pay to script" (type '3') nor a
       "pay to pubkey or pubkeyhash" (type '1'), I prefix them with '4'

       Note : this may be worthy of an actual BIP. If someone writes one,
       I'll happily adjust the code.

       Note : this trick is only a blockparser thing. This means that these
       new address types starting with a '4' won't be recognized by other
       bitcoin implementations (such as blockchain.info)

Examples

. Show all supported commands

    ./parser help

. Show help for a specific command

    ./parser allBalances --help

. Compute simple blockchain stats

    ./parser simple

. Extract all transactions for a very popular address 1dice6wBxymYi3t94heUAG6MpG5eceLG1

    ./parser transactions 06f1b66fa14429389cbffa656966993eab656f37

. Compute the closure of an address, that is the list of addresses that very probably belong to the same person:

    ./parser closure 06f1b66fa14429389cbffa656966993eab656f37

. Compute and print the balance for all keys ever used since the beginning of time:

    ./parser all >all.txt

. See how much of the BTC 10K pizza tainted all the subsequent TX in the chain
  (chances are you have some dust coming from that famous TX lingering on one
  of your addresses)

    ./parser taint >pizzaTaint.txt

. See all the block rewards and fees:

    ./parser rewards >rewards.txt

. See a greatly detailed dump of the famous pizza transaction

    ./parser show

. Track all mined blocks with unspent reward:

    ./parser pristine

. Show the first valid "pay to script hash (P2SH)" transaction in the chain:

    ./parser showtx 9c08a4d78931342b37fd5f72900fb9983087e6f46c4a097d8a1f52c74e28eaf6

. Show the first valid naked multi-sig transaction in the chain (it's a 1 Of 2 multi-sig)

    ./parser showtx 60a20bd93aa49ab4b28d514ec10b06e1829ce6818ec06cd3aabd013ebcdc4bb1

NOTE: the general syntax is:

./parser <command> <option> </option>
... ...

NOTE: use "parser help " or "parser --help" to get detailed help for a specific command.

NOTE: may have multiple aliases and can also be abbreviated. For example, "parser tx", "parser tr", and "parser transactions" are equivalent.

NOTE: whenever specifying a list of things (e.g. a list of addresses), you can instead enter "file:list.txt" and the list will be read from the file.

NOTE: whenever specifying a list file, you can use "file:-" and blockparser will read the list directly from stdin.

Caveats:

. You will need an x86-84 ubuntu box and a recent version of GCC(&gt;=4.4), a recent version of
  boost and openssl-dev. You may be able to compile on other platforms, but the code wasn't
  really designed for those.

. As of this writing, it needs a log of RAM to work, typically upwards of 25Gigs. I will switch
  to an on-disk hash table at some point, but for now you'll just need that if you ever hope to
  see it finish in a reasonable amount of time (or at all if your swap space is too small).

. The code could be cleaner and better architected. It was just a quick and dirty way for me
  to learn about bitcoin. There really isn't much in the way of comments either :D

. OTOH, it is fairly simple, short. If you want to understand how the blockchain data structures
  work, the code in parser.cpp is a solid way to start.

Hacking the code:

. parser.cpp contains the generic parser that reads the blockchain, parses it and calls
  "user-defined" callbacks as it hits interesting bits of information. It typically calls
  out when it begins reading finishes assembling a data structure.

. util.cpp contains a grab-bag of useful bitcoin related routines. Interesting examples include:

    showScript
    getBaseReward
    solveOutputScript
    decompressPublicKey

. blockparser comes with a bunch of interesting "user callbacks".

    . cb/allBalances.cpp    :   code to all balance of all addresses.
    . cb/closure.cpp        :   code to compute the transitive closure of an address
    . cb/dumpTX.cpp         :   code to display a transaction in very great detail
    . cb/help.cpp           :   code to dump detailed help for all other commands
    . cb/pristine.cpp       :   code to show all "pristine" (i.e. unspent) blocks
    . cb/rewards.cpp        :   code to show all block rewards (including fees)
    . cb/simpleStats.cpp    :   code to compute simple stats.
    . cb/sql.cpp            :   code to product an SQL dump of the blockchain
    . cb/taint.cpp          :   code to compute the taint from a given TX to all TXs.
    . cb/transactions.cpp   :   code to extract all transactions pertaining to an address.


. You can very easily add your own custom command. You can use the existing callbacks in
  directory ./cb/ as a template to build your own:

        cp cb/allBalances.cpp cb/myExtractor.cpp
        Add to Makefile
        Hack away
        Recompile
        Run

. You can also read the file callback.h (the base class from which you derive to implement your
  own new commands). It has been heavily commented and should provide a good basis to pick what
  to overload to achieve your goal.

. The code makes heavy use of the google dense hash maps. You can switch it to use sparse hash
  maps (see Makefile, search for: DENSE, undef it). Sparse hash maps are slower but save quite a
  bit of RAM.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.