A "decentralized exchange" is a currency exchange that lives and runs entirely as a smart contract on a blockchain, with no central authority or party operating the backend. Users' funds are held by the smart contract and secured by each user's public/private key pair, so every buy, sell, or withdraw can be invoked only by the wallet owner, never by an exchange operator or administrator.
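To make the ownership model concrete, here is a toy, in-memory Python model of an exchange contract's balance book. It is not EtherDelta's code or any real contract API: the class and method names are hypothetical, and `sender` stands in for the address that signed the transaction. Because a withdrawal always debits the caller's own balance, there is no code path through which an administrator could move another user's funds.

```python
from dataclasses import dataclass, field


class InsufficientBalance(Exception):
    """Raised when a caller tries to withdraw more than it holds."""


@dataclass
class ToyExchangeContract:
    """Hypothetical, in-memory stand-in for an exchange contract's balance book.

    Balances are keyed by (token, owner). Every state-changing method takes
    `sender`, which plays the role of the address that signed the transaction.
    """

    balances: dict = field(default_factory=dict)

    def deposit(self, sender: str, token: str, amount: int) -> None:
        # Credit the caller's own balance only.
        key = (token, sender)
        self.balances[key] = self.balances.get(key, 0) + amount

    def withdraw(self, sender: str, token: str, amount: int) -> None:
        # Withdrawals only ever debit the caller's own balance, so an
        # "admin" address has no way to reach other users' funds.
        key = (token, sender)
        balance = self.balances.get(key, 0)
        if amount > balance:
            raise InsufficientBalance(f"{sender} holds only {balance} of {token}")
        self.balances[key] = balance - amount

    def balance_of(self, token: str, owner: str) -> int:
        return self.balances.get((token, owner), 0)


if __name__ == "__main__":
    book = ToyExchangeContract()
    book.deposit("0xalice", "ETH", 100)
    try:
        book.withdraw("0xadmin", "ETH", 100)  # the admin owns nothing here
    except InsufficientBalance as err:
        print("rejected:", err)
    book.withdraw("0xalice", "ETH", 40)
    print(book.balance_of("ETH", "0xalice"))  # 60
```

A real contract enforces the same rule by keying balances on the transaction's signer rather than on a caller-supplied argument, which is what the `sender` parameter imitates.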
The smart contract itself runs on the Ethereum Virtual Machine (EVM), which is executed by thousands of independently operated nodes running on people's personal computers (and GPU farms!), each of which records every event on a shared public ledger. This makes Ethereum a powerful platform for investors, but also for money launderers and "pump and dump" schemers.
For this demo, we will use popular data science tools to analyze the books of EtherDelta, a cryptocurrency exchange with over 1 billion USD worth of funds held in its smart contract, and use this publicly available dataset to expose which coins may be associated with scams as they happen.
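As a hedged illustration of what exposing a scam from the books could look like, the sketch below flags tokens whose trade price spikes and then collapses. This is a simple heuristic chosen for illustration, not necessarily the analysis used in this demo; the `Trade` record shape and the `pump_ratio`/`dump_ratio` thresholds are hypothetical and would need tuning against real data.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Trade(NamedTuple):
    token: str       # token contract address or symbol
    timestamp: int   # e.g. block timestamp in seconds
    price: float     # ETH paid per unit of the token


def flag_pump_and_dump(
    trades: Iterable[Trade],
    pump_ratio: float = 3.0,  # hypothetical: peak must be >= 3x the first price
    dump_ratio: float = 0.5,  # hypothetical: last price must be <= 50% of the peak
) -> List[str]:
    """Return tokens whose price rose sharply and then collapsed."""
    by_token = defaultdict(list)
    for trade in trades:
        by_token[trade.token].append(trade)

    flagged = []
    for token, history in by_token.items():
        history.sort(key=lambda t: t.timestamp)  # oldest trade first
        first = history[0].price
        peak = max(t.price for t in history)
        last = history[-1].price
        if first <= 0:
            continue  # skip malformed prices rather than divide by zero
        if peak / first >= pump_ratio and last / peak <= dump_ratio:
            flagged.append(token)
    return sorted(flagged)


if __name__ == "__main__":
    trades = [
        Trade("SCAM", 1, 0.001), Trade("SCAM", 2, 0.004), Trade("SCAM", 3, 0.0015),
        Trade("STEADY", 1, 0.010), Trade("STEADY", 2, 0.011), Trade("STEADY", 3, 0.0105),
    ]
    print(flag_pump_and_dump(trades))  # ['SCAM']
```

On a full trade history in Spark, the same per-token logic would more naturally be expressed as a grouped aggregation (first, max, and last price per token) than as a Python loop.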
On the technology side, we will show how events on a blockchain, such as the logs emitted when a smart contract executes, can be analyzed with modern big data architectures, using Spark from a Jupyter or Zeppelin notebook to perform ETL on a remote Hadoop cluster. We will also share our experience with how slow and limited it is to query data directly from the blockchain, and explain why a Kafka producer/consumer model works well for fine-grained analysis of applications running on any of the many blockchains and tangles emerging in the decentralized crypto-currency compute world.
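The decode step of that ETL can be sketched in plain Python. The sketch assumes EtherDelta's `Trade` event is declared as `Trade(address tokenGet, uint amountGet, address tokenGive, uint amountGive, address get, address give)` with no indexed parameters, so all six values are ABI-encoded as 32-byte words in the log's `data` field; check this against the deployed contract's ABI before relying on it. A `queue.Queue` stands in for a Kafka topic, and the `produce`/`consume` helpers are hypothetical: in a real pipeline a Kafka client would publish the raw logs and a Spark job would consume them.

```python
import queue
from typing import Iterable, List, NamedTuple

WORD = 64  # one 32-byte ABI word, as hex characters


class TradeEvent(NamedTuple):
    token_get: str
    amount_get: int
    token_give: str
    amount_give: int
    get: str
    give: str


def decode_trade(data_hex: str) -> TradeEvent:
    """Decode the data field of an (assumed) EtherDelta Trade log."""
    body = data_hex[2:] if data_hex.startswith("0x") else data_hex
    if len(body) != 6 * WORD:
        raise ValueError(f"expected {6 * WORD} hex chars, got {len(body)}")
    words = [body[i:i + WORD] for i in range(0, len(body), WORD)]

    def addr(word: str) -> str:
        return "0x" + word[-40:].lower()  # address = low-order 20 bytes

    return TradeEvent(
        token_get=addr(words[0]),
        amount_get=int(words[1], 16),
        token_give=addr(words[2]),
        amount_give=int(words[3], 16),
        get=addr(words[4]),
        give=addr(words[5]),
    )


_END = object()  # sentinel marking the end of the stream


def produce(raw_logs: Iterable[dict], topic: queue.Queue) -> None:
    """Producer side: publish raw logs as fetched from a node, undecoded."""
    for log in raw_logs:
        topic.put(log)
    topic.put(_END)


def consume(topic: queue.Queue) -> List[TradeEvent]:
    """Consumer side: decode each raw log into a structured record."""
    events = []
    while True:
        item = topic.get()
        if item is _END:
            break
        events.append(decode_trade(item["data"]))
    return events
```

Splitting the pipeline this way lets the slow step (pulling logs from a node) and the analysis step run and scale independently, which is the property a Kafka topic provides between them.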