I Like It !

Wednesday, September 28, 2011

NoSQL: An Overview of NoSQL Databases

The term NoSQL was coined in 1998. Many people think NoSQL is a derogatory
term created to poke fun at SQL. In reality, the term means Not Only SQL. The idea
is that both technologies can coexist and each has its place. The NoSQL movement
has been in the news in the past few years as many of the Web 2.0 leaders have
adopted a NoSQL technology. Companies like Facebook, Twitter, Digg, Amazon,
LinkedIn and Google all use NoSQL in one way or another. Let's break down NoSQL
so you can explain it to your CIO or even your co-workers.

NoSQL Emerged From a Need

Data Storage: The world's stored digital data is measured in exabytes. An
exabyte is equal to one billion gigabytes (GB) of data. According to
Internet.com, the amount of stored data added in 2006 was 161 exabytes. Just
four years later, in 2010, the amount of data stored was projected to reach
almost 1,000 exabytes, an increase of over 500%. In other words, there is a lot
of data being stored in the world and it's just going to continue growing.
Interconnected Data: Data continues to become more connected. The
creation of the web ushered in hyperlinks, blogs have pingbacks, and every major
social network has tags that tie things together. Major systems are built
to be interconnected.
Complex Data Structure: NoSQL can handle hierarchical nested data
structures easily. To accomplish the same thing in SQL, you would need multiple
relational tables with all kinds of keys. In addition, there is a relationship
between performance and data complexity. Performance can degrade in a
traditional RDBMS as we store the massive amounts of data required in social
networking applications and the semantic web.

What is NoSQL?

One way to define NoSQL is to consider what it's not. It's not SQL and
it's not relational. As the name suggests, it's not a replacement for an RDBMS
but complements it. NoSQL is designed for distributed data stores with very
large scale data needs. Think about Facebook with its 500,000,000 users or
Twitter, which accumulates terabytes of data every single day.
In a NoSQL database, there is no fixed schema and no joins. An RDBMS "scales up"
by getting faster and faster hardware and adding memory. NoSQL, on the other
hand, can take advantage of "scaling out". Scaling out refers to spreading the
load over many commodity systems. This is the component of NoSQL that makes it
an inexpensive solution for large datasets.
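
To make "scaling out" concrete, here is a minimal sketch of spreading keys over several machines by hashing. It is an illustration only: the names `node_for_key` and `ShardedStore` and the node labels are made up for this example, and each "node" is just an in-memory dict standing in for a separate commodity server.

```python
import hashlib


def node_for_key(key, nodes):
    """Pick a node by hashing the key, so a given key always lands on the same node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class ShardedStore:
    """Toy scale-out store: each node is a dict standing in for one machine."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.data = {node: {} for node in self.nodes}

    def put(self, key, value):
        # Only the node that owns the key does any work for this write.
        self.data[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        # Reads go straight to the owning node; other nodes are not consulted.
        return self.data[node_for_key(key, self.nodes)].get(key)


store = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

Simple modulo hashing like this moves most keys to a different node whenever a node is added or removed. The Dynamo paper describes consistent hashing as a way to limit that reshuffling.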

NoSQL Categories

The current NoSQL world fits into 4 basic categories.
  1. Key-Value Stores are based primarily on Amazon's Dynamo paper, which was
    published in 2007. The main idea is the existence of a hash table where
    there is a unique key and a pointer to a particular item of data. These
    mappings are usually accompanied by cache mechanisms to maximize performance.


  2. Column Family Stores were created to store and process very large amounts
    of data distributed over many machines. There are still keys but they point to
    multiple columns. In the case of
    BigTable (Google's Column Family NoSQL model), rows are
    identified by a row key with the data sorted and stored by this key. The columns
    are arranged by column family.


  3. Document Databases were inspired by Lotus Notes and are similar to
    key-value stores. The model is basically versioned documents that are
    collections of other key-value collections. The semi-structured documents
    are stored in formats like JSON.


  4. Graph Databases are built with nodes, relationships between nodes and the
    properties of nodes. Instead of tables of rows and columns and the rigid
    structure of SQL, a flexible graph model is used which can scale across many
    machines.
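
The four categories above can be sketched with plain Python data structures. This is a rough illustration of the shape of the data in each model, not how any particular product stores it. All keys, field names and helper functions (`read_column`, `neighbors`) are invented for the example.

```python
import json

# 1. Key-value: a unique key points at one opaque item of data.
kv_store = {"user:42": b'{"name": "Ada"}'}

# 2. Column family: a row key points at columns grouped into families.
column_store = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2011-09-28"},
    },
}


def read_column(store, row_key, family, column):
    """Look up one column value by row key, column family and column name."""
    return store[row_key][family][column]


# 3. Document: a nested, semi-structured document serialized as JSON.
document = {
    "_id": "post:1",
    "title": "An Overview of NoSQL Databases",
    "tags": ["nosql", "databases"],
    "comments": [{"author": "Bob", "text": "Nice summary"}],
}
serialized = json.dumps(document)

# 4. Graph: nodes with properties, plus named relationships between nodes.
nodes = {"ada": {"kind": "person"}, "bob": {"kind": "person"}}
edges = [("ada", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "ada")]


def neighbors(node, relation):
    """Return the nodes reachable from `node` over edges of type `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

Note how the document keeps its comments nested inside the post, where a relational design would typically use a separate table and a join.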

Major NoSQL Players

The major players in NoSQL have emerged primarily because of the organizations
that have adopted them. Some of the largest NoSQL technologies include:
  • Dynamo: Dynamo was created by Amazon.com and is the most prominent
    Key-Value NoSQL database. Amazon needed a highly scalable distributed
    platform for its e-commerce businesses, so it developed Dynamo. Amazon S3
    uses Dynamo as the storage mechanism.


  • Cassandra: Cassandra was open sourced by Facebook and is a
    column-oriented NoSQL database.


  • BigTable: BigTable is Google's proprietary column-oriented database.
    Google allows the use of BigTable, but only through the Google App Engine.


  • SimpleDB: SimpleDB is another Amazon database. Used with Amazon EC2 and
    S3, it is part of Amazon Web Services, which charges fees depending on
    usage.


  • CouchDB: CouchDB and MongoDB are open source document-oriented NoSQL
    databases.


  • Neo4j: Neo4j is an open source graph database.

Querying NoSQL

The question of how to query a NoSQL database is what most developers are
interested in. After all, data stored in a huge database doesn't do anyone any
good if you can't retrieve and show it to end users or web services. Most NoSQL
databases do not provide a high level declarative query language like SQL.
Instead, querying these databases is data-model specific.
Many of the NoSQL platforms allow for RESTful interfaces to the data. Others
offer query APIs. There are a couple of query tools that have been developed
that attempt to query multiple NoSQL databases. These tools typically work
across a single NoSQL category. One example is SPARQL, a declarative query
specification designed for graph databases (specifically, data stored as RDF).
Here is an example of a SPARQL query that retrieves the URL of a particular
blogger (courtesy of IBM):
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?url
FROM <bloggers.rdf>
WHERE {
  ?contributor foaf:name "Jon Foobar" .
  ?contributor foaf:weblog ?url .
}
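
To show what that query asks for, here is a small Python sketch that evaluates the same two patterns by hand over a list of (subject, predicate, object) triples. It is not a SPARQL engine. The sample triples, the example.org URLs and the function name `weblogs_for` are all made up for illustration, since the contents of bloggers.rdf are not shown here.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical stand-in for the data in bloggers.rdf.
triples = [
    ("_:b1", FOAF + "name", "Jon Foobar"),
    ("_:b1", FOAF + "weblog", "http://example.org/jon/blog"),
    ("_:b2", FOAF + "name", "Jane Doe"),
    ("_:b2", FOAF + "weblog", "http://example.org/jane/blog"),
]


def weblogs_for(name, triples):
    """Return weblog URLs of every contributor whose foaf:name equals `name`."""
    # Pattern 1: ?contributor foaf:name "<name>"
    contributors = {s for s, p, o in triples if p == FOAF + "name" and o == name}
    # Pattern 2: ?contributor foaf:weblog ?url, joined on the shared ?contributor
    return [o for s, p, o in triples if p == FOAF + "weblog" and s in contributors]


urls = weblogs_for("Jon Foobar", triples)
```

The shared variable `?contributor` is what ties the two patterns together: only weblogs whose subject also matched the name pattern are returned.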

Future of NoSQL

Organizations that have massive data storage needs are looking seriously at
NoSQL. Apparently, the concept isn't getting as much traction in smaller
organizations. In a survey conducted by Information Week, 44% of business IT
professionals haven't heard of NoSQL. Further, only 1% of the respondents
reported that NoSQL is a part of their strategic direction. Clearly, NoSQL has
its place in our connected world but will need to continue to evolve to get the
mass appeal that many think it could have.

Thursday, September 15, 2011

Sun Java System Messenger Express

 
