Top 12 Tutorials and Workshops for Hadoop and related big data technologies




Using Spring-XD to Load Files into GemFire XD

Spring XD Scripts

My General Setup Script (I save it in setup.xd and load it via script --file setup.xd)

hadoop config fs --namenode hdfs://pivhdsne:8020
admin config server http://localhost:9393
hadoop fs ls /
stream list

The Script for Loading a File into GemFireXD via Spring-XD

stream create --name fileload --definition "file --dir=/tmp/xd/input/load --outputType=text/plain |  jdbc --tableName=APP.filetest --columns=id,name" --deploy
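
To exercise the stream, something has to drop a file into /tmp/xd/input/load. Here is a minimal Java sketch for that. It assumes the jdbc sink maps a JSON payload onto the named columns (my understanding of how the sink behaves when --columns is set to something other than payload), so each file holds one JSON document with id and name keys. The class name, method names, file naming scheme and sample values are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FileLoadSample {

    // Builds one JSON document whose keys match the sink's --columns=id,name.
    static String toJson(int id, String name) {
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"id\":" + id + ",\"name\":\"" + escaped + "\"}";
    }

    // Writes the record to <dir>/record-<id>.json and returns the path.
    static Path writeRecord(Path dir, int id, String name) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve("record-" + id + ".json");
        Files.write(target, toJson(id, name).getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory watched by the fileload stream.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/xd/input/load");
        System.out.println("Wrote " + writeRecord(dir, 1, "first row"));
    }
}
```

Run it once, then check the table with select * from filetest; to see whether the row arrived. If your sink expects a different payload format, only toJson needs to change.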


Spring XD Configuration for GemFire XD

Copy the GemFire XD JDBC Driver to Spring-XD (might need tools.jar as well)

cp /usr/lib/gphd/Pivotal_GemFireXD_10/lib/gemfirexd-client.jar /opt/pivotal/spring-xd/xd/lib/

Modify the Sink’s JDBC properties to point to your GemFire XD instance. If you are using the Pivotal HD VM and installed Spring XD with yum (sudo yum update spring-xd), this is the location:

url = jdbc:gemfirexd://localhost:1527
username = gfxd
password = gfxd
driverClassName = com.pivotal.gemfirexd.jdbc.ClientDriver

The Peer Client Driver needs more files from the GemFire XD lib directory (the .so binaries); linking to them is probably a good idea.


GemFire XD Setup

connect client 'localhost:1527';

create table filetest (id int, name varchar(100)) REPLICATE PERSISTENT;
select id, kind, netservers from sys.members; 
select * from filetest;
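
The same check can be run from plain JDBC, which also confirms that the driver jar and connection settings work outside the gfxd shell. This is a rough sketch, not a tested client: it reuses the URL, user, password and driver class from the sink properties in this post, and it assumes the table landed in the APP schema (as the stream definition implies). The class and helper names are invented for the example, and gemfirexd-client.jar has to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class FiletestCheck {

    // Connection settings copied from the sink properties in this post.
    static final String URL = "jdbc:gemfirexd://localhost:1527";
    static final String DRIVER = "com.pivotal.gemfirexd.jdbc.ClientDriver";

    static Properties credentials() {
        Properties props = new Properties();
        props.setProperty("user", "gfxd");
        props.setProperty("password", "gfxd");
        return props;
    }

    // Formats one row as "id<TAB>name" for printing.
    static String formatRow(int id, String name) {
        return id + "\t" + name;
    }

    public static void main(String[] args) throws Exception {
        // Load the client driver explicitly; fails fast if the jar is missing.
        Class.forName(DRIVER);
        try (Connection conn = DriverManager.getConnection(URL, credentials());
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM APP.filetest")) {
            while (rs.next()) {
                System.out.println(formatRow(rs.getInt("id"), rs.getString("name")));
            }
        }
    }
}
```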

Spring XD Commands

stream list

show your jobs




Quick Look: Spring XD


Spring XD: a really quick way to batch process data from an easy-to-run shell. It’s very easy to set up a one-node version of XD and use it on Windows, Mac or Linux. Looks like a great tool.




Quick Tip: Spring REST Utility for Current HTTP Request

Utility Method for Spring REST

 public static HttpServletRequest getCurrentRequest() {
     RequestAttributes requestAttributes = RequestContextHolder.getRequestAttributes();
     Assert.state(requestAttributes != null, "Could not find current request via RequestContextHolder");
     Assert.isInstanceOf(ServletRequestAttributes.class, requestAttributes);
     HttpServletRequest servletRequest = ((ServletRequestAttributes) requestAttributes).getRequest();
     Assert.state(servletRequest != null, "Could not find current HttpServletRequest");
     return servletRequest;
 }

Sometimes it’s easier to grab the underlying Servlet request to read some headers or other request values.

 final String userIpAddress = getCurrentRequest().getRemoteAddr();
 final String userAgent = getCurrentRequest().getHeader("user-agent");
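
One caveat when the app sits behind a router or load balancer (as it does on Cloud Foundry): getRemoteAddr() may return the proxy's address, not the user's. A small helper like the one below can prefer the first entry of the X-Forwarded-For header when it is present. It is only a sketch; the class and method names are invented here, and it assumes the proxy in front of the app sets that header and can be trusted.

```java
public class ClientIp {

    // Returns the first address listed in an X-Forwarded-For value,
    // falling back to remoteAddr when the header is missing or blank.
    static String resolve(String forwardedFor, String remoteAddr) {
        if (forwardedFor == null || forwardedFor.trim().isEmpty()) {
            return remoteAddr;
        }
        int comma = forwardedFor.indexOf(',');
        String first = (comma >= 0 ? forwardedFor.substring(0, comma) : forwardedFor).trim();
        return first.isEmpty() ? remoteAddr : first;
    }

    public static void main(String[] args) {
        System.out.println(resolve("203.0.113.7, 10.0.0.1", "10.0.0.1")); // prints 203.0.113.7
        System.out.println(resolve(null, "127.0.0.1"));                   // prints 127.0.0.1
    }
}
```

With the utility above, the call would be ClientIp.resolve(getCurrentRequest().getHeader("X-Forwarded-For"), getCurrentRequest().getRemoteAddr()).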

This is used in a simple REST service handling HTTP POST on the awesome Cloud Foundry:


Tool for Creating Your Test JSON.

Spring Boot Documentation


Quick Links: Modern Tools for Modern Big Data Applications

Modern Front-End Tools (Front-End Build Wars)

A ton of great tools for running all kinds of tasks and building modern front-ends. These are to the front end what Maven and Gradle are to the back end.
Build Wars: Gulp vs Grunt
Spring-cleaning Unused CSS With Grunt, Gulp, Broccoli or Brunch
Yeoman – Modern workflows for modern webapps
Environment-specific Builds With Grunt, Gulp or Broccoli
Gulp, Grunt, Whatever – Pony Foo
Brunch | ultra-fast HTML5 build tool
Modern Web Apps (Java 8 + AngularJS + Spring 4 + Gradle)
trackr: An AngularJS app with a Java 8 backend – Part I | techdev Solutions
trackr: An AngularJS app with a Java 8 backend – Part II | techdev Solutions
Pagination with Spring Data and HATEOAS in an AngularJS app | Patrick Grimard’s Java Blog
Modern Rapid Development and Production Deployment Tools (Spring Boot, Spring 4)
RestController in Spring 4.0
Spring 4 Tutorials
Spring Transaction Management
Spring Framework 4 on Java 8 // Speaker Deck
Pivotal presentations channel
Big Data (Spring with Hadoop)
SpringOne2GX 2013 Replay: Real Time Analytics with Spring
SpringOne2GX 2013 Replay: Hadoop – Just the Basics for Big Data Rookies
Seven Databases in Seven Weeks: Hbase, Day 2
Apache Solr real-time live index updates at scale with Apache Hadoop | Java Code Geeks
Modern Cloud Development and Hosting (Cloud Foundry)
Run Your Java Code on Cloud Foundry – Andy Piper (Pivotal)
Run the site on Cloud Foundry · spring-io/sagan Wiki
Pivotal Resources 
A real-time architecture using Hadoop and Storm @ JAX London
Spring Boot with Groovy and Friends
HyperLogLog – Wikipedia, the free encyclopedia
The lost outpost | a weblog by Andy Piper about technology, photography, and life
Installing a Hadoop Cluster with three Commands | codecentric Blog
Using Ambari Blueprints to automatically provision and install the Lambda Architecture | codecentric Blog
3 lessons in database design from the team behind Twitter’s Manhattan — Tech News and Analysis
Bloom filter – Wikipedia, the free encyclopedia
Hadoop Tips: Bloom Filters in HBase and Chrome
Enterprise-ready production-ready Java batch applications powered by Spring Boot | codecentric Blog
MapReduce Introduction – Tutorial
Scaling SQL with Redis – David Cramer’s Blog
HBase – Apache HBase™ Home
Work with Hadoop and NoSQL Databases with Toad for Cloud – ReadWrite
Google Research Publication: MapReduce
Overview | Postgres-XL
In 45 Min, Set Up Hadoop (Pivotal HD) on a Multi-VM Cluster & Run Test Data | Pivotal P.O.V.
BOSH Components | Cloud Foundry Docs
Pivotal Open Source Hub (San Francisco, CA) – Meetup
WebScaleSQL | “We’re Gonna Need A Bigger Database”
Eight Terminal Utilities Every OS X Command Line User Should Know
Report: NoSQL Databases – Providing Extreme Scale and Flexibility — Gigaom Research
HDFS Architecture
PostgreSQL: PostgreSQL 9.4 Beta 1 Released
Using Redis at Pinterest for Billions of Relationships | Pivotal P.O.V.
Replication, Clustering, and Connection Pooling – PostgreSQL wiki
Hadoop Content on InfoQ
JSON on Hadoop Example for Extending HAWQ Data Formats Using Pivotal eXtension Framework (PXF)
When should I use Greenplum Database versus HAWQ? | PivotalGuru
PostgreSQL: Documentation: 9.3: High Availability, Load Balancing, and Replication
PostgreSQL – Wikipedia, the free encyclopedia
The Hadoop Ecosystem Table
Pivotal Hadoop Distribution and HAWQ Realtime Query Engine | Architects Zone
EMC World 2014: Pivotal and Isilon Take Hadoop Prime Time in the Enterprise
Pivotal ship Hadoop distro complete with ‘world’s fastest’ SQL query engine
HAWQ | Pivotal P.O.V.
Cloud Foundry Eclipse Plugin | Pivotal Docs
Transform Your Skills: Simple Steps to Set Up SQL on Hadoop
What Makes Hadoop So Important & How To Gain Business Value From It
Search Results For: hadoop pivotal
Cloud Foundry Environment Variables | Cloud Foundry Docs
PCC Installation Checklist | Pivotal Docs
Pivotal Web Services Documentation | Pivotal CF Docs
Hadoop Tutorials from Hortonworks
Tips for Java Developers | Cloud Foundry Docs
Spring for Apache Hadoop
Spring Framework Reference Documentation
16. Web MVC framework
Getting Started with Pivotal Web Services | Pivotal CF Docs
6 Easy Steps: Deploy Pivotal’s Hadoop on Docker | Pivotal P.O.V.
OpenTSDB – A Distributed, Scalable Monitoring System
Big Data Benchmark
Pivotal Web Services

Quick Links: Hibernate / JPA Testing Strategies


Bootstrap CDI