Thursday, December 5, 2013

HBase? No! Kiji + HBase!

I started using HBase about a year and a half ago. It took me some time to learn the terminology, configuration, APIs, etc.
Then I discovered Kiji and started using it. A well-documented framework with code examples, tutorials, and a clear design definitely improved my life.

Just check this out
http://www.kiji.org/



KijiSchema

Provides a simple Java API and command line interface for importing, managing, and retrieving data from HBase (that's the MAIN component!)
  • Set up HBase layouts using user-friendly tools including a DDL (see the sketch after this list)
  • Implement HBase best practices in table management
  • Use evolving Avro schema management to serialize complex data
  • Perform both short-request and batch processing on data in HBase
  • Import data from HDFS into structured HBase tables
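As a taste of the DDL, here is a sketch of creating a simple user table in the KijiSchema shell. The table and column names are made up, and the syntax is reproduced from the KijiSchema tutorial from memory, so treat kiji.org as the authoritative reference:

CREATE TABLE users WITH DESCRIPTION 'A table of users'
ROW KEY FORMAT HASHED
WITH LOCALITY GROUP default WITH DESCRIPTION 'main storage' (
  MAXVERSIONS = INFINITY,
  TTL = FOREVER,
  COMPRESSED WITH GZIP,
  FAMILY info WITH DESCRIPTION 'basic user information' (
    name "string" WITH DESCRIPTION 'the user name',
    email "string"
  )
);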

 KijiMR
  • KijiMR allows KijiSchema users to employ MapReduce-based techniques to develop many kinds of applications, including those using machine learning and other complex analytics.
    KijiMR is organized around three core MapReduce job types: Bulk Importers, Producers and Gatherers.
  • Bulk Importers make it easy to efficiently load data into Kiji tables from a variety of formats, such as JSON or CSV files stored in HDFS.
  • Producers are entity-centric operations that use an entity’s existing data to generate new information and store it back in the entity’s row. One typical use case for producers is to generate new recommendations for a user based on the user’s history.
  • Gatherers provide flexible MapReduce computations that scan over Kiji table rows and output key-value pairs. By using different outputs and reducers, gatherers can export data in a variety of formats (such as text or Avro) or into other Kiji tables.
KijiREST 
Provides a REST (Representational State Transfer) interface for Kiji, allowing applications to interact with a Kiji cluster over HTTP by issuing the four standard actions: GET, POST, PUT, and DELETE.

KijiExpress
 is a set of tools designed to make defining data processing MapReduce jobs quick and expressive, particularly for data stored in Kiji tables.

Scoring 
is the application of a trained model against data to produce an actionable result. This result could be a recommendation, classification, or other derivation.

Anyhow: no more pure HBase API. Only Kiji!

Saturday, November 16, 2013

Apache Kafka 0.8 beta compiled with Scala 2.10

After several changes to the Scala code and the sbt configuration, I successfully compiled and tested Kafka 0.8 against Scala 2.10.


 Feel free to download and use:

https://drive.google.com/file/d/0B83rvqbRt-ksZDlOSzFiQTFpRm8/edit?usp=sharing


Saturday, October 12, 2013

Apache Kafka on Windows





After downloading Apache Kafka, I failed to start it up on Windows with the existing scripts.
I made adjustments to some of the scripts and now the broker starts successfully.

Anyone who needs it, feel free to download:

https://docs.google.com/file/d/0B83rvqbRt-ksUEVUY1ZNTTRfcVk/edit?usp=sharing

or download updated scripts only:

https://docs.google.com/file/d/0B83rvqbRt-ksR2VPb2JzeHA1U2M/edit?usp=sharing

Just unrar it, then start ZooKeeper and the server, as shown below.
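Assuming the fixed scripts keep the standard Kafka names and the default config files, startup from the Kafka root directory looks roughly like this (each in its own console, ZooKeeper first):

bin\windows\zookeeper-server-start.bat config\zookeeper.properties
bin\windows\kafka-server-start.bat config\server.properties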






Fastest REST service




Target
Handle 100K TPS per core on a 64-bit Linux (RHEL) JSON-based service.

After deep investigation

I compared the following web servers:
Undertow with Servlet
Tomcat with Servlet
Jetty with Servlet

Frameworks
play http://www.playframework.com/
spray.io http://spray.io/

Low level frameworks
Netty
ZeroMQ


Conclusions

1. To reach the target load, we have to release the service I/O thread as soon as possible.
2. Prefer bulk requests over single-entry requests.
3. Zero computation in the I/O thread.

That's how I reached the performance target.
Final architecture:
1. Jetty with a servlet-based service (POST implementation)
2. Bulk mode with 100K entries per request.
3. Release of the request ASAP (return 200 as soon as possible, then process - sketched below)
4. Still a synchronous servlet.
5. JMeter load testing.
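A minimal sketch of point 3, assuming a plain synchronous servlet: acknowledge the bulk POST immediately and hand the payload to a worker pool, so the container's I/O thread is released at once. The class name and pool size are made up for illustration:

import java.io.BufferedReader;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BulkIngestServlet extends HttpServlet {

    // Processing happens off the I/O thread, on a dedicated pool.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Read the bulk body (up to 100K entries per request).
        StringBuilder body = new StringBuilder();
        BufferedReader reader = req.getReader();
        String line;
        while ((line = reader.readLine()) != null) {
            body.append(line);
        }
        final String payload = body.toString();

        // Release the request ASAP: queue the work, then return 200.
        workers.submit(new Runnable() {
            public void run() {
                process(payload);
            }
        });
        resp.setStatus(200);
    }

    private void process(String payload) {
        // Parse the JSON bulk and do the real work here.
    }
}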

Measuring
On the server side: define an int[120], bucket on System.currentTimeMillis() / 1000, and increment the appropriate slot in the array:
myArray[(int) ((System.currentTimeMillis() / 1000) % 120)]++;
then print once every 2 minutes and zero the array.
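A sketch of that counter with the wrap-around made explicit; AtomicIntegerArray keeps it safe when many Jetty worker threads increment concurrently:

import java.util.concurrent.atomic.AtomicIntegerArray;

public class ThroughputCounter {

    // One slot per second, 120 seconds (= 2 minutes) per window.
    private final AtomicIntegerArray perSecond = new AtomicIntegerArray(120);

    /** Call once per processed request. */
    public void hit() {
        int slot = (int) ((System.currentTimeMillis() / 1000) % 120);
        perSecond.incrementAndGet(slot);
    }

    /** Print the window and zero it; call once every 2 minutes. */
    public void dumpAndReset() {
        for (int i = 0; i < 120; i++) {
            System.out.print(perSecond.getAndSet(i, 0) + " ");
        }
        System.out.println();
    }
}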

Limitation on single core
taskset -c 0 mycommand --option  # start a command with the given affinity
taskset -c 0 -p 1234             # set the affinity of a running process
BTW:
when the process was already running, taskset -p <PID> didn't work for me.

Future investigation

1. Async servlet.
2. Akka-based async service.
3. Netty + RESTEasy framework.

Friday, October 4, 2013

REST Webservice on NETTY RESTEasy



I had to implement a simple REST service.
Requirements were:
1. low latency
2. highly scalable
3. robust
4. high throughput
5. simple
6. JSON-based parameter passing
I started with Tomcat with a servlet on top of it and got bad throughput.
I tried Jetty, but the throughput was still terrible.
Then I decided to use Netty with a REST layer on top of it.
Will do benchmarking and update you soon...

Resteasy
http://www.jboss.org/resteasy

Service :


@Path("/message")
public class RestService {

@Path("/test")
@POST
@Consumes("application/json")
@Produces("application/json")
public Response addOrder(Container a)
{
return Response.status(200).entity("DONE").build();
}

}

import java.util.ArrayList;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Container {

    private ArrayList<Parameter> parameters = new ArrayList<>();

    public void addParameter(Parameter p) {
        this.parameters.add(p);
    }

    @XmlElement
    public ArrayList<Parameter> getParameters() {
        return parameters;
    }

    public void setParameters(ArrayList<Parameter> parameters) {
        this.parameters = parameters;
    }
}

import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Parameter {

    private String name;
    private int age;

    @XmlElement
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @XmlElement
    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }
}

import java.util.HashMap;
import java.util.Map;
import org.jboss.resteasy.plugins.server.netty.NettyJaxrsServer;
import org.jboss.resteasy.spi.ResteasyDeployment;
import org.jboss.resteasy.test.TestPortProvider;

public class Server {

    public static void main(String[] args) {
        ResteasyDeployment deployment = new ResteasyDeployment();

        // Map URL extensions to media types (e.g. /message/test.json).
        Map<String, String> mediaTypeMappings = new HashMap<String, String>();
        mediaTypeMappings.put("xml", "application/xml");
        mediaTypeMappings.put("json", "application/json");
        deployment.setMediaTypeMappings(mediaTypeMappings);

        // Embed JAX-RS on top of Netty.
        NettyJaxrsServer netty = new NettyJaxrsServer();
        netty.setDeployment(deployment);
        netty.setPort(TestPortProvider.getPort()); // defaults to 8081
        netty.setRootResourcePath("");
        netty.setSecurityDomain(null);
        netty.start();

        deployment.getRegistry().addPerRequestResource(RestService.class);
    }
}

Client :

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Entity;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.Response;

public class ClientMain {

    public static void main(String[] args) {
        Client client = ClientBuilder.newBuilder().build();
        WebTarget target = client.target("http://localhost:8081/message/test");

        Container c = new Container();
        Parameter param = new Parameter();
        param.setAge(11);
        param.setName("RAMI");
        c.addParameter(param);

        Response response = target.request().post(Entity.entity(c, "application/json"));
        String value = response.readEntity(String.class);
        System.out.println(value);
        response.close();
    }
}

Tuesday, September 24, 2013

Typical NoSQL big data solution (part 1)

Big data components

Inflow

This is the data that gets into the system. It can be files, any kind of events, or web pages. We don't care.

Distributor

When we receive our inflow we need to distribute it. Distribution can be based on replicating the data to several destinations, or on routing by some detail in the data.
Example: if a log record contains the word "event", send it to HDFS only.
Examples: Apache Flume, Logstash, Fluentd
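The routing rule above is trivial to sketch in code; sendToHdfs and sendToShortTerm are hypothetical hooks standing in for whatever sinks the distributor (Flume, Logstash, Fluentd) is configured with:

void route(String logRecord) {
    // Replicate-or-route decision based on the record's content.
    if (logRecord.contains("event")) {
        sendToHdfs(logRecord);       // long-term storage only
    } else {
        sendToShortTerm(logRecord);  // fast-access store
    }
}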


Storages - Long term, short term 

Then we save the data to storage. We have several types of storage, and each one has its pros and cons.


Long term 

We need it to hold the whole data set and analyze it by batch processing. In most cases it will be Hadoop-based HDFS storage, and we run MapReduce / Hive / Pig jobs to create reports.
As you can understand, it's a heavy and slow process.


Short term 

If we need our data to be easily and quickly accessible, we use some highly scalable database. We have several types here:
Key-value stores. Examples: Redis, Riak, Dynamo, GemFire
Column stores. Examples: Vertica, MonetDB
Document stores. Examples: MongoDB, Cassandra, CouchDB
Graph databases. Examples: Neo4j

Data Model           Performance  Scalability      Flexibility  Complexity  Functionality
Key-value Stores     high         high             high         none        variable (none)
Column Store         high         high             moderate     low         minimal
Document Store       high         variable (high)  high         low         variable (low)
Graph Database       variable     variable         high         high        graph theory
Relational Database  variable     variable         low          moderate    relational algebra

The data is accessed much faster and is much more structured.



Real time processing

In most cases this component will be Storm (http://storm-project.net/). It pulls the data (in our case from Kafka, http://kafka.apache.org/) and processes it against the short-term, fast-access data.
Its decisions can then be sent to external systems to notify the end user.


End User 

The end user will use some stack for visualizing the data.
It can also include a service for querying the data, in most cases against the short-term storage.

Next part is coming...

Wednesday, September 11, 2013

Add an auto-generated field to Solr 4





1.       Define a new type:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
2.       Add a new field:
<field name="rami" type="uuid" indexed="true" stored="true" default="NEW"/>
(the parameter default="NEW" does the trick!)

3.       Define an update request processor chain:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">rami</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

4.       Add the chain to the relevant handler.
Example: for /update/extract
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
                  <str name="update.chain">uuid</str>
    </lst>
  </requestHandler>



Now you can execute /update/extract without passing the field "rami", and it will be generated automatically.
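For example, with the standard extract-handler invocation (adjust host, core, and file to your setup):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@tutorial.html"

No value for "rami" is passed; the uuid chain fills it in on the way in.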

Thursday, September 5, 2013

Big Data analytics - tools



All the traditional players such as SAS, IBM SPSS, KXEN, Matlab, Statsoft, Tableau, Pentaho, and others are working toward Hadoop-based Big Data analytics. However, each of these software players has to balance their current technology and customer portfolio against the incredible pace of innovation occurring in the open-source community. Most of the tools have high-speed connectors to move data back and forth between Hadoop and their tool/environment. With Big Data, the objective is to keep the data in place and bring the analytics processing to the data, to avoid the bottleneck and constraints associated with data movement. Over time, each vendor will develop a strategy and approach to keep data in place and move their analytics processing to the data.
In the meantime, there are new commercial vendors and open-source projects evolving to address the voracious appetite for Big Data analytics. Karmasphere (https://karmasphere.com/) is a native Hadoop-based tool for data exploration and visualization. Datameer (http://www.datameer.com/) is a spreadsheet-like presentation tool. Alpine Data Miner (http://www.alpinedatalabs.com/) has a cross-platform analytic workbench.
R (http://cran.r-project.org/) is by far the most dominant analytics tool in the Big Data space. R is an open-source statistical language with constructs that make it easy for data scientists to explore and build models. R is also renowned for the plethora of available analytics. There are libraries focused on industry problems (i.e., clinical trials, genetics, finance, and others) as well as general purpose libraries (i.e., econometrics, natural language processing, optimization, time series, and many more). At this point, there are supposedly over two million R users around the globe and a commercial distribution is available via Revolution Analytics.
Reference: 
  • Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses


Tuesday, August 27, 2013

Actor Model description


Actor model


1. Thread problem
The traditional way of offering concurrency in a programming language is by using threads. In this model,
the execution of the program is split up into concurrently running tasks. It is as if the program is being
executed multiple times, the difference being that each of these copies operated on shared memory.
This can lead to a series of hard-to-debug problems, two of which are described below. The first is
the lost-update problem. Suppose two processes try to increment the value of a shared object acc. They
both retrieve the value of the object, increment the value and store it back into the shared object. As these
operations are not atomic, it is possible that their execution gets interleaved, leading to an incorrectly
updated value of acc (a runnable demonstration follows this section).
The solution to these problems is the use of locks. Locks provide mutual exclusion, meaning that only one
process can acquire the lock at the same time. By using a locking protocol, making sure the right locks
are acquired before using an object, lost-update problems are avoided. However, locks have their own
share of problems. One of them is the deadlock problem. Suppose two processes try to acquire the same
two locks A and B. When both do so, but in a different order, a deadlock occurs: each waits on the other
to release the lock, which will never happen.
These are just some of the problems that might occur when attempting to use threads and locks.
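A minimal demonstration of the lost-update problem, using nothing but the JDK: two threads each increment a shared counter 100,000 times, and because ++ is a read-modify-write of three non-atomic steps, the final value almost always comes out below 200,000:

public class LostUpdate {

    // The shared object "acc" from the example above.
    static int acc = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable incrementer = new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    acc++; // read, increment, write - interleavings lose updates
                }
            }
        };
        Thread t1 = new Thread(incrementer);
        Thread t2 = new Thread(incrementer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("acc = " + acc); // almost always < 200000
    }
}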

2. Actor model as solution
In the actor model, each object is an actor. This is an entity that has a mailbox and a behaviour. Messages
can be exchanged between actors, which will be buffered in the mailbox. Upon receiving a message, the
behaviour of the actor is executed, upon which the actor can: send a number of messages to other actors,
create a number of actors and assume new behaviour for the next message to be received.
Of importance in this model is that all communications are performed asynchronously. This implies
that the sender does not wait for a message to be received upon sending it, it immediately continues its
execution. There are no guarantees in which order messages will be received by the recipient, but they
will eventually be delivered.
A second important property is that all communications happen by means of messages: there is no shared
state between actors. If an actor wishes to obtain information about the internal state of another actor, it
will have to use messages to request this information. This allows actors to control access to their state,
avoiding problems like the lost-update problem. Manipulation of the internal state also happens through
messages.
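A toy sketch of the mailbox idea in plain Java (not a real actor library such as Akka): the actor's state is private, the queue is the only way in, and a single thread drains the mailbox, so no locks are needed:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CounterActor implements Runnable {

    // The mailbox: the only channel into the actor.
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<String>();

    // State is never shared; only the actor's own thread touches it.
    private int count = 0;

    public void send(String message) {
        mailbox.offer(message);
    }

    public void run() {
        try {
            while (true) {
                String msg = mailbox.take(); // blocks until a message arrives
                if (msg.equals("inc")) {
                    count++;                 // safe: single consumer thread
                } else if (msg.equals("print")) {
                    System.out.println("count = " + count);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Calling actor.send("inc") from any number of threads is safe, because all mutations funnel through the mailbox.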


Erlang and Scala have built-in support for the actor model.
Link
http://savanne.be/articles/concurrency-in-erlang-scala/


Saturday, August 24, 2013

Scalding - WordCount example in local mode











Scala IDE based on Eclipse
Scalding on Scala 2.9


How to run Scalding in Eclipse

1. Install Eclipse Indigo (preferably the J2EE edition, and add the Maven m2e plugin from the update repository: Help -> Install New Software).
2. In Help -> Install New Software, add the site http://download.scala-ide.org/sdk/e37/scala29/stable/site
and install the Scala IDE plugin
http://scala-ide.org/download/current.html
We will work with the Scalding template created by Amit Nithan
http://hokiesuns.blogspot.co.il/2012/07/running-your-scalding-jobs-in-eclipse.html
It already contains the needed Scalding dependencies; the job itself is the classic WordCount, sketched below.
Once you have followed the article and tested Scalding in local mode, your next step is to run it on a real Hadoop cluster.
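For reference, the WordCount job from the Scalding tutorial looks roughly like this (fields-based API on Scala 2.9; input and output paths come from the command line):

import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  // Read lines, split them into words, count per word, write TSV.
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("\\s+") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}

In local mode you run it through com.twitter.scalding.Tool with --local --input input.txt --output output.txt.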
In order to run Maven's package or other commands from eclipse do:
  • right-click project
  • run as
  • run configurations..
  • double click maven build (to create a new configuration)
  • give a name for configuration e.g. package
  • click variables
  • select "selected_resource_loc" and click ok
  • write your goal e.g. "package" or "clean package"
  • run
The next time when you want to package another project, you can use this configuration again:
  • right-click project
  • run as
  • run configurations..
  • select your maven configuration
  • run
ENJOY:)

Thursday, August 22, 2013

Java profilers

1. Yourkit
http://www.yourkit.com/

2. VisualVM
http://visualvm.java.net/

3. JProfiler
http://www.ej-technologies.com/products/jprofiler/overview.html

4. JProbe
http://www.javaperformancetuning.com/tools/jprobe/

Generate sequence diagram
https://code.google.com/p/jtracert/
http://jsonde.sourceforge.net/



NoSQL - types and use cases



In order to explain what NoSQL is and why it's needed, let's first look at what was there before it came into the big game. Before NoSQL we had the RDBMS, which gave us the main concept: ACID.
Atomicity - actions are transactional: if a transaction contains actions A, B and C, all of them should succeed. If one of them fails, we roll back the previous ones to the initial state.
Consistency - if transaction A is executed and it should do two actions, 1. increase the balance of account a1 by 200$ and 2. decrease account b1 by 100$, then once the transaction completes successfully both numbers will be updated. (We don't care what happens during transaction execution; consistency is about what happens at the end.)
Isolation - transactions executed in parallel don't impact each other.
Durability - we don't care if the electricity goes down in the whole area or the machine dies completely: once a transaction has committed, even if the system goes down, its result will be there when the system comes up again.

Great! The concept is clear, and here come the main key players:
Oracle, SQL Server, MySQL, DB2, etc.
Everything went fine... till the data started growing at huge speed. Then it became clear that a single machine is not enough anymore. On top of that, Oracle licensing is priced per CPU, so simple math shows that something totally different should come and replace the RDBMS... Not yet! Let's shoot a last bullet - the last nail in the coffin of the RDBMS...
CAP theorem (Brewer's theorem)
It states that it is impossible for a distributed system to simultaneously provide Consistency, Availability, and Partition tolerance.
So let's summarize:
We have ACID (Oracle) and we want to add more machines to reach scalability, so we go into distributed computing. And then the CAP theorem comes along and kicks our a**.
So what now?
We want ACID and are willing to pay for the license, but no license buys infinite scalability.
One option is to use an open-source RDBMS and save the money...
But what if I told you that you are still in trouble? Your RDBMS-based system can't be infinitely scalable.
Once you reach a petabyte, you will be so slow that your business will crash.
Then NoSQL comes into the game...
And what do you have there?
* Open source
* Greatly scalable
* Not relational
* Distributed
Systems that over the last 10 years have proved themselves in different companies.
We have four families: key-value stores, column stores, document stores, and graph databases.
Each family has its advantages and disadvantages:

Data Model           Performance  Scalability      Flexibility  Complexity  Functionality
Key-value Stores     high         high             high         none        variable (none)
Column Store         high         high             moderate     low         minimal
Document Store       high         variable (high)  high         low         variable (low)
Graph Database       variable     variable         high         high        graph theory
Relational Database  variable     variable         low          moderate    relational algebra

So you have different databases, and then you definitely ask: which one should I pick?
That's what I asked the first time I saw them. Sometimes I needed a graph, sometimes relational, and sometimes key-value...
The answer is simple: you can hold all of them... and combine them...

Sources:
http://en.wikipedia.org/wiki/NoSQL
http://db-engines.com/en/ranking
http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/


PS: If you look at the db-engines link you can see the top DB engines ranked. Sure, Oracle is number 1! But I'm almost sure that NoSQL and other technologies will do to Oracle what Linux did to Windows. Sure, Oracle will stay forever... but the world around it will change.

Monday, June 24, 2013

Big Data - Better tomorrow...




Going over the big data world, I tried to stop and figure out what big data actually needs.
What products can guide the future of big data, and in which areas will the upcoming startup exits happen?
So I tried to summarize my findings...

1. Intuitive interfaces - SQL / non-SQL-like... something that will give us tools for managing and operating it all easily. The success of Hive/Pig comes from that potential area.

2. Easy analytics - a tool or set of tools with which companies can easily analyze their big data and get some value out of it, without bringing in 200K data analysts or outsourcing companies to squeeze some revenue out of petabytes of data stored in Hadoop.

3. Single-click management - all the big data infrastructure should be easy to maintain. By easily I mean not configuring thousands of configuration parameters; in my view, configuration must be reduced to 3-5 parameters for the user, with all the rest hidden from the user's eyes. Configuration and management should be user friendly - probably HTML5 GUI based services.

4. Real-time data - processing of real-time data will be added and probably hidden from the user.
By adding various data sources on the visual scheme, things will automatically connect and become part of the analytical engine.

5. Intuitive visualization engine - things are getting very complicated, and managing petabytes of data can be messy. But visualizing it - that is the real challenge.

6. Core AI engine - connecting all the parts:
Easy maintenance & management
Easy analytics
Intuitive interfaces
Real-time data sources
To make all those parts easy and intuitive, there will be a need for an AI engine that can learn and optimize the users' work.


Tuesday, June 18, 2013

Java objects sizes

Cost per element/entry in various well-known Java/Guava data structures

Ever wondered what's the cost of adding each entry to a HashMap? Or one new element in a TreeSet? Here are the answers: the cost per-entry for each well-known structure in Java and Guava. You can use this to estimate the cost of a structure, like this: if the per-entry cost of a structure is 32 bytes, and your structure contains 1024 elements, the structure's footprint will be around 32 kilobytes.
Note that non-tree mutable structures are amortized (adding an element might trigger a resize, and be expensive, otherwise it would be cheap), making the measurement of the "average per element cost" measurement hard, but you can expect that the real answers are close to what is reported below (and for some simple structures I know the correct answer analytically, so I can tune the tested sizes to derive the correct measurement).
There is a blog post about this data as well.
Since the most interesting and analyzed structure here is [Loading]Cache, here is a short cheat-sheet to help you memorize the cost of each feature:
  • If you use ConcurrentHashMap, you pay 8 words (32 bytes) for each entry.
  • If you switch to Cache, add 4 words (16 bytes) for each entry
  • If you add expiration of any kind (after write, or after access, or both), add 4 words (16 bytes) for each entry
  • If you use maximumSize(), add 4 words (16 bytes) for each entry
  • If you use weakKeys(), add 4 words (16 bytes) for each entry
  • If you use weakValues() or softValues(), add 4 words (16 bytes) for each entry
To put this in perspective: For every two features you pick (expiration, maxSize, weakKeys, weak/softValues), you could have bought a whole new ConcurrentHashMap (with the same entries) for the same cost.
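A quick worked example using this cheat sheet: a Cache with expiration and maximumSize() costs roughly 32 (ConcurrentHashMap) + 16 (Cache) + 16 (expiration) + 16 (maximumSize) = 80 bytes per entry, which matches the Cache_Expires_MaxSize row below (80.24). A million such entries therefore cost about 80 MB of cache overhead, before counting the keys and values themselves.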
Legend:
  • "Obj": the number of objects
  • "NonNull": the number of non-null references
  • "Null": the number of null references
  • "Scalars": primitives
No compressed oops used for 64-bit vm (if I report it)
==========================================     32-bit architecture    ==========================================
==========================================     Primitive Wrappers     ==========================================

                       java.lang.Boolean :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [boolean]
                          java.lang.Byte :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [byte]
                         java.lang.Short :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [short]
                     java.lang.Character :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [char]
                       java.lang.Integer :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [int]
                          java.lang.Long :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [long]
                         java.lang.Float :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [float]
                        java.lang.Double :: Bytes:  16.00, Obj:  1.00 NonNull:  0.00 Null:  0.00 scalars: [double]
==========================================   Basic Lists, Sets, Maps  ==========================================

                               ArrayList :: Bytes:   5.00, Obj:  0.00 NonNull:  1.00 Null:  0.25 scalars: {}
                     Singleton ArrayList :: Bytes:  40.00, Obj:  2.00 NonNull:  2.00 Null:  0.00 scalars: [int x 2]

                              LinkedList :: Bytes:  24.00, Obj:  1.00 NonNull:  3.00 Null:  0.00 scalars: {}
                    Singleton LinkedList :: Bytes:  48.00, Obj:  2.00 NonNull:  3.00 Null:  2.00 scalars: [int x 2]

                           ImmutableList :: Bytes:   4.00, Obj:  0.00 NonNull:  1.00 Null:  0.00 scalars: {}
                 Singleton ImmutableList :: Bytes:  16.00, Obj:  1.00 NonNull:  1.00 Null:  1.00 scalars: []

                                 HashSet :: Bytes:  32.00, Obj:  1.00 NonNull:  3.00 Null:  2.00 scalars: {int=1.0}
                       Singleton HashSet :: Bytes: 168.00, Obj:  5.00 NonNull:  5.00 Null: 19.00 scalars: [int x 4, float]

                          CompactHashSet :: Bytes:  21.00, Obj:  0.00 NonNull:  1.00 Null:  0.25 scalars: {long=1.25, int=1.5}
                Singleton CompactHashSet :: Bytes:  96.00, Obj:  4.00 NonNull:  4.00 Null:  0.00 scalars: [long, int x 4, float]

                    CompactLinkedHashSet :: Bytes:  31.00, Obj:  0.00 NonNull:  1.00 Null:  0.25 scalars: {long=1.25, int=4.0}
          Singleton CompactLinkedHashSet :: Bytes: 144.00, Obj:  6.00 NonNull:  6.00 Null:  0.00 scalars: [long, int x 8, float]

                            ImmutableSet :: Bytes:  12.00, Obj:  0.00 NonNull:  2.00 Null:  1.00 scalars: {}
                  Singleton ImmutableSet :: Bytes:  24.00, Obj:  1.00 NonNull:  1.00 Null:  1.00 scalars: [int]

                           LinkedHashSet :: Bytes:  40.00, Obj:  1.00 NonNull:  5.00 Null:  2.00 scalars: {int=1.0}
                 Singleton LinkedHashSet :: Bytes: 216.00, Obj:  6.00 NonNull: 10.00 Null: 22.00 scalars: [int x 5, float, boolean]

                                 TreeSet :: Bytes:  32.00, Obj:  1.00 NonNull:  4.00 Null:  1.00 scalars: {boolean=1.0}
                       Singleton TreeSet :: Bytes: 104.00, Obj:  4.00 NonNull:  4.00 Null:  9.00 scalars: [int x 2, boolean]

                      ImmutableSortedSet :: Bytes:   4.00, Obj:  0.00 NonNull:  1.00 Null:  0.00 scalars: {}
            Singleton ImmutableSortedSet :: Bytes:  48.00, Obj:  3.00 NonNull:  3.00 Null:  3.00 scalars: []

                                 HashMap :: Bytes:  32.00, Obj:  1.00 NonNull:  3.00 Null:  2.00 scalars: {int=1.0}
                       Singleton HashMap :: Bytes: 144.00, Obj:  3.00 NonNull:  4.00 Null: 19.00 scalars: [int x 4, float]

                            ImmutableMap :: Bytes:  27.00, Obj:  1.00 NonNull:  4.00 Null:  0.38 scalars: {}
                  Singleton ImmutableMap :: Bytes:  40.00, Obj:  1.00 NonNull:  2.00 Null:  5.00 scalars: []

                           LinkedHashMap :: Bytes:  40.00, Obj:  1.00 NonNull:  5.00 Null:  2.00 scalars: {int=1.0}
                 Singleton LinkedHashMap :: Bytes: 192.00, Obj:  4.00 NonNull:  9.00 Null: 22.00 scalars: [int x 5, float, boolean]

                         IdentityHashMap :: Bytes:  16.00, Obj:  0.00 NonNull:  2.00 Null:  2.00 scalars: {}
                  Singleton IdentityHash :: Bytes:  88.00, Obj:  2.00 NonNull:  3.00 Null:  9.00 scalars: [int x 3]

                                 TreeMap :: Bytes:  32.00, Obj:  1.00 NonNull:  4.00 Null:  1.00 scalars: {boolean=1.0}
                        Singleton TreeMap :: Bytes:  80.00, Obj:  2.00 NonNull:  3.00 Null:  9.00 scalars: [int x 2, boolean]

                      ImmutableSortedMap :: Bytes:   8.00, Obj:  0.00 NonNull:  2.00 Null:  0.00 scalars: {}
            Singleton ImmutableSortedMap :: Bytes: 104.00, Obj:  5.00 NonNull:  6.00 Null:  9.00 scalars: []
          newSetFromMap(IdentityHashMap) :: Bytes:  16.00, Obj:  0.00 NonNull:  2.00 Null:  2.00 scalars: {}
========================================== ConcurrentHashMap/MapMaker/Cache ==========================================

                       ConcurrentHashMap :: Bytes:  32.30, Obj:  1.00 NonNull:  3.00 Null:  2.06 scalars: {int=1.003024193548387, float=7.560483870967742E-4}
                                MapMaker :: Bytes:  32.24, Obj:  1.00 NonNull:  3.00 Null:  2.06 scalars: {int=1.0}
                      MapMaker_Computing :: Bytes:  48.25, Obj:  2.00 NonNull:  4.00 Null:  2.06 scalars: {int=1.0}
                                   Cache :: Bytes:  48.25, Obj:  2.00 NonNull:  4.00 Null:  2.06 scalars: {int=1.0}
                        MapMaker_Expires :: Bytes:  64.25, Obj:  2.00 NonNull:  6.00 Null:  2.06 scalars: {long=1.0, int=1.0}
                           Cache_Expires :: Bytes:  64.25, Obj:  2.00 NonNull:  6.00 Null:  2.06 scalars: {long=1.0, int=1.0}
                        MapMaker_MaxSize :: Bytes:  56.24, Obj:  2.00 NonNull:  6.00 Null:  2.06 scalars: {int=1.0}
                           Cache_MaxSize :: Bytes:  64.25, Obj:  2.00 NonNull:  6.00 Null:  2.06 scalars: {long=1.0, int=1.0}
                MapMaker_Expires_MaxSize :: Bytes:  72.24, Obj:  2.00 NonNull:  8.00 Null:  2.06 scalars: {long=1.0, int=1.0}
                   Cache_Expires_MaxSize :: Bytes:  80.24, Obj:  2.00 NonNull:  8.00 Null:  2.06 scalars: {long=2.0, int=1.0}
                       MapMaker_WeakKeys :: Bytes:  64.24, Obj:  2.00 NonNull:  5.00 Null:  4.06 scalars: {int=1.0}
                          Cache_WeakKeys :: Bytes:  64.24, Obj:  2.00 NonNull:  5.00 Null:  4.06 scalars: {int=1.0}
                     MapMaker_WeakValues :: Bytes:  64.24, Obj:  2.00 NonNull:  6.00 Null:  4.06 scalars: {int=1.0}
                        Cache_WeakValues :: Bytes:  64.25, Obj:  2.00 NonNull:  6.00 Null:  4.06 scalars: {int=1.0}
                 MapMaker_WeakKeysValues :: Bytes:  80.24, Obj:  2.00 NonNull:  7.00 Null:  6.06 scalars: {int=1.0}
                    Cache_WeakKeysValues :: Bytes:  80.25, Obj:  2.00 NonNull:  7.00 Null:  6.06 scalars: {int=1.0}
               MapMaker_MaxSize_WeakKeys :: Bytes:  72.24, Obj:  2.00 NonNull:  7.00 Null:  4.06 scalars: {int=1.0}
                  Cache_MaxSize_WeakKeys :: Bytes:  80.24, Obj:  2.00 NonNull:  7.00 Null:  4.06 scalars: {long=1.0, int=1.0}
             MapMaker_MaxSize_WeakValues :: Bytes:  72.24, Obj:  2.00 NonNull:  8.00 Null:  4.06 scalars: {int=1.0}
                Cache_MaxSize_WeakValues :: Bytes:  80.24, Obj:  2.00 NonNull:  8.00 Null:  4.06 scalars: {long=1.0, int=1.0}
         MapMaker_MaxSize_WeakKeysValues :: Bytes:  88.24, Obj:  2.00 NonNull:  9.00 Null:  6.06 scalars: {int=1.0}
            Cache_MaxSize_WeakKeysValues :: Bytes:  96.24, Obj:  2.00 NonNull:  9.00 Null:  6.06 scalars: {long=1.0, int=1.0}
               MapMaker_Expires_WeakKeys :: Bytes:  80.24, Obj:  2.00 NonNull:  7.00 Null:  4.06 scalars: {long=1.0, int=1.0}
                  Cache_Expires_WeakKeys :: Bytes:  80.24, Obj:  2.00 NonNull:  7.00 Null:  4.06 scalars: {long=1.0, int=1.0}
             MapMaker_Expires_WeakValues :: Bytes:  80.25, Obj:  2.00 NonNull:  8.00 Null:  4.06 scalars: {long=1.0, int=1.0}
                Cache_Expires_WeakValues :: Bytes:  80.24, Obj:  2.00 NonNull:  8.00 Null:  4.06 scalars: {long=1.0, int=1.0}
         MapMaker_Expires_WeakKeysValues :: Bytes:  96.24, Obj:  2.00 NonNull:  9.00 Null:  6.06 scalars: {long=1.0, int=1.0}
            Cache_Expires_WeakKeysValues :: Bytes:  96.24, Obj:  2.00 NonNull:  9.00 Null:  6.06 scalars: {long=1.0, int=1.0}
       MapMaker_Expires_MaxSize_WeakKeys :: Bytes:  88.24, Obj:  2.00 NonNull:  9.00 Null:  4.06 scalars: {long=1.0, int=1.0}
          Cache_Expires_MaxSize_WeakKeys :: Bytes:  96.25, Obj:  2.00 NonNull:  9.00 Null:  4.06 scalars: {long=2.0, int=1.0}
     MapMaker_Expires_MaxSize_WeakValues :: Bytes:  88.24, Obj:  2.00 NonNull: 10.00 Null:  4.06 scalars: {long=1.0, int=1.0}
        Cache_Expires_MaxSize_WeakValues :: Bytes:  96.24, Obj:  2.00 NonNull: 10.00 Null:  4.06 scalars: {long=2.0, int=1.0}
 MapMaker_Expires_MaxSize_WeakKeysValues :: Bytes: 104.24, Obj:  2.00 NonNull: 11.00 Null:  6.06 scalars: {long=1.0, int=1.0}
    Cache_Expires_MaxSize_WeakKeysValues :: Bytes: 112.24, Obj:  2.00 NonNull: 11.00 Null:  6.06 scalars: {long=2.0, int=1.0}
==========================================         Multisets          ==========================================

                      HashMultiset_Worst :: Bytes:  48.24, Obj:  2.00 NonNull:  3.00 Null:  2.06 scalars: {int=2.0}
                LinkedHashMultiset_Worst :: Bytes:  56.24, Obj:  2.00 NonNull:  5.00 Null:  2.06 scalars: {int=2.0}
                      TreeMultiset_Worst :: Bytes:  48.00, Obj:  1.00 NonNull:  4.00 Null:  1.00 scalars: {long=1.0, int=3.0}
            ConcurrentHashMultiset_Worst :: Bytes:  48.30, Obj:  2.00 NonNull:  3.00 Null:  2.06 scalars: {int=2.003024193548387, float=7.560483870967742E-4}
        ImmutableMultisetPopulator_Worst :: Bytes:  26.94, Obj:  1.00 NonNull:  4.00 Null:  0.38 scalars: {}
  ImmutableSortedMultisetPopulator_Worst :: Bytes:  16.02, Obj:  0.00 NonNull:  1.00 Null:  0.00 scalars: {long=1.0, int=1.0005040322580645}

                      HashMultiset_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
                LinkedHashMultiset_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
                      TreeMultiset_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
            ConcurrentHashMultiset_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
        ImmutableMultisetPopulator_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
  ImmutableSortedMultisetPopulator_Best  :: Bytes:   0.00, Obj:  0.00 NonNull:  0.00 Null:  0.00 scalars: {}
==========================================         Multimaps          ==========================================

                      HashMultimap_Worst :: Bytes: 144.24, Obj:  5.00 NonNull:  8.00 Null:  9.06 scalars: {int=5.0, float=1.0}
                LinkedHashMultimap_Worst :: Bytes: 144.24, Obj:  4.00 NonNull: 17.00 Null:  4.06 scalars: {int=4.0}
                      TreeMultimap_Worst :: Bytes: 128.00, Obj:  4.00 NonNull:  9.00 Null:  9.00 scalars: {int=2.0, boolean=2.0}
                 ArrayListMultimap_Worst :: Bytes:  80.24, Obj:  3.00 NonNull:  5.00 Null:  4.06 scalars: {int=3.0}
                LinkedListMultimap_Worst :: Bytes: 152.73, Obj:  5.00 NonNull: 15.00 Null:  8.18 scalars: {int=4.0}
                 ImmutableMultimap_Worst :: Bytes:  43.02, Obj:  2.00 NonNull:  5.00 Null:  1.39 scalars: {}
             ImmutableListMultimap_Worst :: Bytes:  43.03, Obj:  2.00 NonNull:  5.00 Null:  1.39 scalars: {}
              ImmutableSetMultimap_Worst :: Bytes:  51.00, Obj:  2.00 NonNull:  5.00 Null:  1.39 scalars: {int=1.0}

                      HashMultimap_Best  :: Bytes:  32.24, Obj:  1.00 NonNull:  3.00 Null:  2.06 scalars: {int=1.0}
                LinkedHashMultimap_Best  :: Bytes:  44.12, Obj:  1.00 NonNull:  7.00 Null:  1.03 scalars: {int=1.0}
                      TreeMultimap_Best  :: Bytes:  32.00, Obj:  1.00 NonNull:  4.00 Null:  1.00 scalars: {boolean=1.0}
                 ArrayListMultimap_Best  :: Bytes:   4.07, Obj:  0.00 NonNull:  1.00 Null:  0.02 scalars: {}
                LinkedListMultimap_Best  :: Bytes:  32.00, Obj:  1.00 NonNull:  6.00 Null:  0.00 scalars: {}
                 ImmutableMultimap_Best  :: Bytes:   4.00, Obj:  0.00 NonNull:  1.00 Null:  0.00 scalars: {}
             ImmutableListMultimap_Best  :: Bytes:   4.00, Obj:  0.00 NonNull:  1.00 Null:  0.00 scalars: {}
              ImmutableSetMultimap_Best  :: Bytes:  12.24, Obj:  0.00 NonNull:  2.00 Null:  1.06 scalars: {}
Note: we now use the default static factories for each multimap. Older versions used #create(1, 1) factory where applicable, thus smaller numbers were reported. E.g. now HashMultimap_Worst is 144, with the default create(), but:
  • with create(x, 1), the cost is ~136 bytes
  • with create(x, 8), the old default (<= Guava 12) the cost was ~192
==========================================           Tables           ==========================================

                       HashBasedTable_Square :: Bytes:  30.61, Obj:  1.03 NonNull:  3.04 Null:  1.47 scalars: {int=1.0428427419354838, float=0.010710685483870967}
                       ImmutableTable_Square :: Bytes:  41.34, Obj:  1.04 NonNull:  6.10 Null:  1.08 scalars: {int=0.032132056451612906}
                       TreeBasedTable_Square :: Bytes:  32.86, Obj:  1.02 NonNull:  4.04 Null:  1.09 scalars: {int=0.021421370967741934, boolean=1.010710685483871}
                     HashBasedTable_Diagonal :: Bytes: 120.24, Obj:  4.00 NonNull:  7.00 Null:  7.06 scalars: {int=5.0, float=1.0}
                     ImmutableTable_Diagonal :: Bytes: 170.26, Obj:  5.00 NonNull: 17.00 Null: 11.84 scalars: {}
                     TreeBasedTable_Diagonal :: Bytes: 112.00, Obj:  3.00 NonNull:  8.00 Null:  9.00 scalars: {int=2.0, boolean=2.0}
                    HashBasedTable_SingleRow :: Bytes:  32.24, Obj:  1.00 NonNull:  3.00 Null:  2.06 scalars: {int=1.0}
                    ImmutableTable_SingleRow :: Bytes:  87.26, Obj:  3.00 NonNull: 10.00 Null:  1.45 scalars: {int=2.0}
                    TreeBasedTable_SingleRow :: Bytes:  32.00, Obj:  1.00 NonNull:  4.00 Null:  1.00 scalars: {boolean=1.0}
                 HashBasedTable_SingleColumn :: Bytes: 120.24, Obj:  4.00 NonNull:  7.00 Null:  7.06 scalars: {int=5.0, float=1.0}
                 ImmutableTable_SingleColumn :: Bytes: 103.25, Obj:  4.00 NonNull: 11.00 Null:  1.45 scalars: {int=2.0}
                 TreeBasedTable_SingleColumn :: Bytes: 112.00, Obj:  3.00 NonNull:  8.00 Null:  9.00 scalars: {int=2.0, boolean=2.0}

Link: https://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures