Innovation and the Octopus

To create something innovative, one needs to look at things in a new way, taking a quantum leap away from the well-trodden paths of the usual. To stimulate creativity, try examining something unusual, something very different from our familiar worldview.

Our way of viewing the world is that of an upright biped, with two upper limbs, each ending in a hand with prehensile fingers.

Our companion animals are usually four-legged, mostly with more acute senses than ours, but overall not too unlike ourselves and our view.

Our robotic creations tend to follow those structures and patterns, remaining close to the familiar.
For a very different perspective, consider the octopus and its marvelous abilities; its lineage is millions of years older than our own.

As an invertebrate, even a large octopus can squeeze through a 2-inch opening (or smaller), which makes it a challenge to keep a curious octopus inside a man-made container or tank. To date, our robotic designs have been vertebrate-centric; one might instead consider a soft robot modeled after the octopus.

While almost all of our cognitive neural capacity is centralized in the brain, the octopus uses a more distributed model: only about half is centrally located, and the rest is distributed through its limbs.

The octopus uses tools and plays with toys, so one might contemplate what it considers amusing or beautiful. What are the means and media of beauty and harmony for an octopus? How might they intersect with our own aesthetic concepts? Does the geometric pleasure of a Bach fugue resonate for an octopus?

Octopuses exhibit a wide range of behavioral patterns, perhaps falling into categories: are some more extroverted than others? How could one design a Myers-Briggs-type personality test for an octopus? For robots, how would one differentiate an introvert from an extrovert?

One might not readily associate the octopus with fashion and design, but creative ideas might arise from contemplating how a color-blind creature camouflages itself superbly, and more quickly than a chameleon. Could a chair adapt to the color or pattern of the clothes worn by the person sitting in it? Could jewelry adapt to nearby attire (e.g., a modernized, Tiffany-class mood ring)?

Differences this great from our own way of being and our perceptions should inspire some innovation.

For those curious about the wonders of the octopus, try Sy Montgomery’s “The Soul of an Octopus”. It’s a rich, clear and easy read, with the author’s tinge of wonder about the world.

Feeding the Elephant – The Release of Java 7

Java was born of the intelligence and generosity of Sun Microsystems; over the past 15 years, it has evolved, grown in features and put down roots in an incredibly large part of the technology landscape.  The generosity of Sun made Java open source (for the most part), furthering its spread.

New releases of Java occurred about every two years through 2006.  Sun was in trouble as a company, and Java languished, with no new release in 2008 or 2009.

Oracle bought Sun, with all of its assets, including Java.  Oracle already had some presence in the Java environment through its earlier acquisition of BEA, noted for its WebLogic app server and for its high-performance JVM, JRockit.

With the release of Java 7, the first since 2006, Oracle appears to be positioning Java for renewed life.  Note that Java SE 7 is still not available for the Mac, and no projected date for it has been mentioned.

There are only a few noteworthy new features, such as the new I/O library (NIO.2); better directory support; symbolic link support; features to take advantage of multi-core processors (the Fork/Join framework); etc.  It’s more of a new foundation for work to come, such as the merger of JRockit with Sun’s HotSpot JVM.

Java benefits from a large talent base of highly experienced developers, many of them enthusiastic about the language despite the lack of enhancement in recent years.  At the local JavaSIG meeting, several developers expressed concern about Java over the longer term.  For some, the language was not only mature but starting to feel a bit dated.  It’s not clear to me how much of that is valid and objective, and how much is an emotional reaction to Oracle now being in charge of Java’s future.

Oracle has smart product managers and an energetic marketing arm.  They surely recognize that they need to win the confidence and trust of the development community and of those making strategic technology decisions.  This release of Java is a reasonable first step.  The next should be a detailed roadmap, followed by timely delivery of the items on it.

The strategic question some ask is, “Will Java offer the performance and features we need to stay competitive in 5 years?”  The jury is waiting for Oracle to prove its commitment over the next two years.

Java for Oracle may wind up being like the story of the person who bought an elephant, and then was faced with the cost of caring for and feeding it.


Databases, Data Streams, and Quantum Mechanics

What do algorithmic trading, medical systems monitoring, web searches and clicks, and fraud detection have in common?  Part of the answer is streams of data: large amounts of data, from sources that stream it continuously.

Traditional databases (DBMS) are oriented towards processing and storing transactions, which represent discrete events.  The view one has is typically a table.  When transactions are kept over a long enough period, a data warehouse approach might be used, with an eye towards data mining and data analytics, mostly reading (and seldom updating) the stored data for OLAP (online analytic processing) with its classic cube.  In both databases and data warehouses, tools (e.g., SQL and OLAP tools) are in the hands of skilled business analysts and data analysts, not just programmers.
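To make the “cube” a little more concrete: an OLAP cube pre-aggregates a measure over every combination of its dimensions, so that roll-ups and slices can be answered without rescanning the detail rows. The minimal Python sketch below, with made-up sales rows and hypothetical dimension names, computes every such aggregate (including the grand total); real OLAP engines store and index these far more cleverly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, quarter, amount)
SALES = [
    ("East", "Widget", "Q1", 100.0),
    ("East", "Gadget", "Q1", 50.0),
    ("West", "Widget", "Q1", 70.0),
    ("West", "Widget", "Q2", 30.0),
]
DIMENSIONS = ("region", "product", "quarter")

def build_cube(rows):
    """Sum the measure for every subset of dimensions (the 'cuboids' of a cube).

    Keys are tuples of (dimension, value) pairs in DIMENSIONS order;
    the empty tuple holds the grand total.
    """
    cube = defaultdict(float)
    for *dims, amount in rows:
        labelled = list(zip(DIMENSIONS, dims))
        for k in range(len(DIMENSIONS) + 1):
            for subset in combinations(labelled, k):
                cube[subset] += amount
    return dict(cube)

cube = build_cube(SALES)
print(cube[()])                                          # 250.0 (grand total)
print(cube[(("region", "West"),)])                       # 100.0 (roll-up by region)
print(cube[(("product", "Widget"), ("quarter", "Q1"))])  # 170.0 (a slice)
```

Precomputing every combination trades storage for query speed, which is the classic OLAP bargain.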

Classical physics treated light as waves, with different colors having different wavelengths.  Wave behavior allows for interference patterns and for coherence of light, with coherence epitomized by lasers.

Along came Einstein and his contemporaries with fresh insights, revealing that light can also exhibit particle behavior, discrete entities rather than the continuity of waves.  This and far more is explained by quantum physics.

Faster networks in recent years, and faster, larger storage of data, have brought us streaming data.  A stream of data; a continuous wave of it.  New paradigms are needed to fully cope with this, to facilitate new insights.  The tables of relational DBMS and cubes of OLAP only give us a partial view.  Perhaps other geometries or coordinate systems (polar or spherical instead of Cartesian) could provide other insights.
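One way in which stream processing differs from querying a table is that results are computed incrementally as events arrive, usually over a window of recent data rather than over everything stored. A minimal sketch, assuming a stream of (timestamp, value) readings that arrive in timestamp order, of a time-based sliding-window average:

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

def sliding_average(stream: Iterable[Tuple[float, float]],
                    window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of the values seen in the last window_seconds).

    Assumes events arrive in non-decreasing timestamp order and window_seconds > 0.
    Each event is added once and evicted once, so the cost per event is amortized O(1).
    """
    window = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)

ticks = [(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]  # made-up readings
for ts, avg in sliding_average(ticks, window_seconds=3):
    print(ts, avg)
```

The generator never holds more than one window of data, which is the essential shift from “store, then query” to “query as it flows”.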

To maximize effectiveness and business value, the new tools for these new insights need to be accessible to the same skilled people, business analysts and data analysts, not just to programmers.

New solutions are evolving, glimmerings of new paradigms; for example, IBM’s BigInsights.  For the curious, look at what major vendors offer under topics like “data streaming” and “complex event processing”.
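Complex event processing goes a step beyond windowed aggregates: it watches a stream for patterns that span several events. A hypothetical rule in the spirit of the fraud-detection use case mentioned earlier: flag a card that shows up in more than N distinct cities within a time span. The rule, thresholds and event fields are illustrative assumptions, not any vendor's API.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp, card_id, city): a hypothetical schema

def detect_city_hopping(events: Iterable[Event], max_cities: int,
                        window_seconds: float) -> List[Tuple[float, str]]:
    """Return (timestamp, card_id) whenever a card appears in more than
    max_cities distinct cities within window_seconds. Events must be in time order."""
    recent = defaultdict(deque)  # card_id -> deque of (timestamp, city) inside the window
    alerts = []
    for ts, card, city in events:
        history = recent[card]
        history.append((ts, city))
        while history[0][0] <= ts - window_seconds:
            history.popleft()
        if len({c for _, c in history}) > max_cities:
            alerts.append((ts, card))
    return alerts

events = [  # made-up card transactions, times in seconds
    (0, "card-1", "Boston"),
    (60, "card-2", "Denver"),
    (120, "card-1", "Chicago"),
    (180, "card-1", "Miami"),
    (4000, "card-1", "Boston"),
]
print(detect_city_hopping(events, max_cities=2, window_seconds=3600))  # [(180, 'card-1')]
```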

Deep Technology – Data

Deep space refers to a view of space involving great distances, measured in light years: a cosmic perspective.   In recent years, one may also hear of deep time: a view of time across eons, a greatly extended perspective.  Along those lines, deep technology examines tech from a similarly extended vista; here the topic is data, placing it in perspective and sowing seeds for ideas to flower.

We are currently going through a time of transition in how we view and use data.  Recent decades have seen the ascendance of relational databases and their associated schemas, now used for almost all everyday business systems and transactions, with SQL as a standard.  Hardware improvements have allowed relational db’s to grow bigger and perform faster, but some of their limitations have become major friction points as applications have grown, social media being a prime example.  Large once meant gigabytes; now we deal in terabytes, petabytes and exabytes.

The relational requirement for a schema that is consistent across all records becomes a severe constraint with very large databases, where adding another column can require an outage of days.  For that reason, and because of other constraints on very large db’s, some assert that relational doesn’t really scale and that other solutions are needed.

In the beginning there was simply data, not even stored.  Then came stored data, and as it grew, improved means to access it, using keys (key-value stores), and indices.
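The step from stored data to access by key, and then by index, can be sketched in a few lines. This is an in-memory toy (Python dictionaries standing in for files), with a hypothetical KeyValueStore class and made-up records: a primary-key lookup plus one secondary index that answers “which keys have this value?” without scanning every record.

```python
from collections import defaultdict

class KeyValueStore:
    """Hypothetical toy key-value store with one optional secondary index (in memory only)."""

    def __init__(self, indexed_field=None):
        self._data = {}                   # primary key -> record (a dict)
        self._indexed_field = indexed_field
        self._index = defaultdict(set)    # indexed field value -> set of primary keys

    def put(self, key, record):
        old = self._data.get(key)
        if old is not None and self._indexed_field:
            # Remove the stale index entry before overwriting the record.
            self._index[old.get(self._indexed_field)].discard(key)
        self._data[key] = record
        if self._indexed_field:
            self._index[record.get(self._indexed_field)].add(key)

    def get(self, key):
        return self._data.get(key)

    def find_by(self, value):
        """Look up keys via the secondary index instead of a full scan."""
        return sorted(self._index.get(value, ()))

store = KeyValueStore(indexed_field="city")
store.put("c1", {"name": "Ada", "city": "London"})
store.put("c2", {"name": "Alan", "city": "London"})
store.put("c3", {"name": "Grace", "city": "New York"})
store.put("c2", {"name": "Alan", "city": "Manchester"})  # update moves the index entry
print(store.get("c3")["name"])    # Grace
print(store.find_by("London"))    # ['c1']
```

Note that keeping the index correct on update is the program's job here; that burden is part of why later systems moved such logic into the database itself.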

Indexed sequential files (ISAM) grew in popularity, were used in multiple operating systems, and still provide reliable, high-performing solutions today, though their scalability is arguable.  In some cases, they became the underpinnings of a database system.  But all access was programmatic, not user friendly.  There was no standard language for queries or updates.  Analytic processing was rarely considered in the design; at best, it was an add-on.
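The central idea of indexed sequential access is that records are kept in key order, so one structure supports both direct lookup (a binary search of the index) and sequential reads in key order, such as a range scan. A minimal in-memory sketch using Python's bisect module, with a hypothetical SortedIndex class and made-up account keys; real ISAM files add block-level indexes and overflow areas, which are omitted here.

```python
import bisect

class SortedIndex:
    """Hypothetical toy: keys kept sorted, binary search for lookup, sequential range scans."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # replace the existing record
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, low, high):
        """Yield (key, value) for low <= key < high, in key order."""
        i = bisect.bisect_left(self._keys, low)
        while i < len(self._keys) and self._keys[i] < high:
            yield self._keys[i], self._values[i]
            i += 1

idx = SortedIndex()
for acct, balance in [("A-300", 5), ("A-100", 12), ("A-200", 7), ("B-100", 1)]:
    idx.insert(acct, balance)
print(idx.get("A-200"))                # 7
print(list(idx.range("A-", "B-")))     # [('A-100', 12), ('A-200', 7), ('A-300', 5)]
```

Everything here is reached through program calls, which illustrates the point above: fast and reliable, but not something a non-programmer can query.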

Database systems rose and evolved, hierarchical and network at first, and then came the dawn of the relational era, with SQL becoming the standard for queries and making access available to the non-programmer.
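What made SQL significant is that a query states what result is wanted, not how to navigate the records to get it. A small illustration using Python's built-in sqlite3 module and an in-memory database; the table and data are made up. The SELECT statement is the part a business analyst might write; the surrounding Python is only scaffolding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Acme", 120.0), ("Globex", 75.5), ("Acme", 30.0)],  # made-up orders
)

# Declarative: no loops or record pointers, just a description of the result.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.5)]
conn.close()
```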

Relational db’s grew larger and faster, and some of the wrinkles in the fabric became more apparent.  Dealing with data other than basic numbers and text was difficult.  Various types of binary data, such as images, audio and video, are stored as binary large objects (BLOBs), with no good conventions for access or update, and no provision for more granular access within the object.  Storing documents as objects presents similar problems.  The idea of a whole document as simply a field (column) within a record (row), while still allowing useful access to its contents, does not fit well within conventional relational approaches.  This has given rise to document-oriented db solutions, from early contenders such as Lotus Notes and its contemporaries to recent ones such as MongoDB and CouchDB.  Mongo, by the way, is part of the technology underlying Twitter.
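A document-oriented approach in miniature shows the contrast: each record is a self-describing JSON document, records need not share the same fields, and queries can reach into nested structure. This toy uses made-up documents and a hypothetical `find` helper that matches on nested paths; it only hints at what MongoDB or CouchDB provide, and their actual query APIs differ from this.

```python
import json

# Made-up documents: each carries its own structure, and fields vary between them.
DOCUMENTS = [json.loads(s) for s in (
    '{"_id": 1, "title": "Q3 report", "author": {"name": "Kim", "dept": "Finance"}}',
    '{"_id": 2, "title": "Memo", "tags": ["hr", "policy"]}',
    '{"_id": 3, "title": "Budget", "author": {"name": "Lee", "dept": "Finance"}}',
)]

def get_path(doc, dotted):
    """Follow a dotted path like 'author.dept'; return None if any part is missing."""
    node = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node

def find(docs, **criteria):
    """Hypothetical query helper: '__' in a keyword stands for '.' in the path."""
    return [d for d in docs
            if all(get_path(d, k.replace("__", ".")) == v for k, v in criteria.items())]

print([d["_id"] for d in find(DOCUMENTS, author__dept="Finance")])  # [1, 3]
```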

Some email systems could arguably be called document-oriented db’s, but their potential for enhanced functionality is seldom explored.

The rise of XML and the need to store such information have furthered both document-oriented db’s and hybrid solutions.  One hybrid approach is to parse the XML into its components, in some cases creating a suitable sub-schema.  Yet this loses the flexibility XML provides, where not every record needs to define every field.
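The parse-into-a-sub-schema approach (often called shredding) can be sketched with Python's standard xml.etree.ElementTree. The element names and the fixed column list are assumptions for illustration. Notice what happens to the flexibility: a field missing from a record becomes an empty (None) column, and any element outside the chosen schema is silently dropped.

```python
import xml.etree.ElementTree as ET

# Made-up XML: the two records do not share the same set of elements.
XML = """
<customers>
  <customer id="1"><name>Ada</name><email>ada@example.com</email></customer>
  <customer id="2"><name>Alan</name><phone>555-0100</phone><nickname>Al</nickname></customer>
</customers>
"""

COLUMNS = ("name", "email", "phone")  # hypothetical fixed sub-schema for the table

def shred(xml_text):
    """Flatten each <customer> into a fixed-width row; elements not in COLUMNS are lost."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for cust in root.iter("customer"):
        row = {"id": int(cust.get("id"))}
        for col in COLUMNS:
            row[col] = cust.findtext(col)   # None when the element is absent
        rows.append(row)
    return rows

for row in shred(XML):
    print(row)
```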

Even with plain numeric fields there have been issues, mostly associated with precision.  They are most obvious when decimal fractions are stored as binary floating point, affecting the accuracy of both storage and calculations, with dreaded rounding errors.  This isn’t just a problem for scientific data; it also arises in securities processing, foreign currency, and a surprising number of more mundane cases.
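The rounding problem is easy to demonstrate. Binary floating point cannot represent most decimal fractions exactly, so results drift; a decimal type (Python's standard decimal module here, playing the role that DECIMAL/NUMERIC column types play in SQL) keeps the decimal digits exactly and rounds only where told to. The two-place, round-half-even rule below is an assumption for the example; actual rounding rules vary by currency and regulation.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point: 0.1 and 0.2 have no exact binary representation.
float_total = 0.1 + 0.2
print(float_total == 0.3)        # False
print(repr(float_total))         # 0.30000000000000004

# Decimal arithmetic keeps the decimal digits exactly.
dec_total = Decimal("0.1") + Decimal("0.2")
print(dec_total == Decimal("0.3"))  # True

# Explicit, predictable rounding to cents (round-half-even assumed here).
price = Decimal("2.675")
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))  # 2.68
print(round(2.675, 2))           # 2.67: the stored float is slightly below 2.675
```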

Geo-spatial data, with its precise location capability, has grown greatly in usage in the past few years, perhaps due to the increased use of GPS.  However, it seldom has well-integrated functionality in the db, and SQL support for it is uneven across products.  This type of data is used in marketing, business analysis and many other cases; perhaps it would be more widely exploited with better supporting functionality.
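As an example of the kind of function one would like built into the db, consider “how far apart are these two points?” For latitude/longitude pairs, the haversine formula gives the great-circle distance. The sketch below assumes a spherical Earth with a mean radius of 6371 km (real geodesy uses ellipsoids, so results are approximate), and the store locations and `within_radius` helper are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treats the Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points, center, radius_km):
    """Hypothetical spatial 'query': names of points within radius_km of center."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in points.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]

# Approximate, made-up store coordinates.
stores = {"Midtown": (40.754, -73.984), "Newark": (40.735, -74.172), "Boston": (42.360, -71.058)}
print(within_radius(stores, center=(40.758, -73.985), radius_km=25))  # ['Midtown', 'Newark']
```

A built-in version would also need a spatial index, since computing this distance against every row does not scale.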

There will be other new data types and usages arising in the years to come, and our data solutions need to provide for integrating them, with functionality around them that helps create new insights.

The marked increase in the size of data being handled, and new solutions to the associated problems, have given rise to a new movement, NoSQL.  In the past year or so the name has morphed from No SQL to Not Only SQL, seeking to leverage the widespread usage and knowledge of SQL, but in a non-relational context.

In addition to the document-oriented solutions discussed earlier, there are numerous others.  Not unlike the days before relational became the norm, when proprietary solutions were common, there is a lack of standards or consistency between products.  One can note the resurgence of key-value stores, such as Bigtable (an ill-chosen name, since it has nothing to do with relational tables and does not use SQL).  A brief, very incomplete list of such solutions and associated technologies: CouchDB; MongoDB; Bigtable; Cassandra; Hadoop (Apache) and MapReduce (Google); Project Voldemort (underpinning LinkedIn); Riak; Redis, etc.

Looking towards the future, I can envision improved searching, with enhanced functionality, especially for non-alphameric data such as images.  Why not have a query involving an image, or some layered graphics pattern?  Think of a visual version of the SoundsLike function one can use on text data.
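For comparison, the text-side “sounds like” matching is usually built on Soundex, which several SQL dialects expose as a SOUNDEX function. Below is a compact Python version of the classic American Soundex rules: keep the first letter, map consonants to digits, collapse adjacent duplicate codes (H and W do not separate duplicates, vowels and Y do), and pad to four characters. A visual counterpart would need an analogous coarse fingerprint of an image, which remains speculative.

```python
# Consonant groups of American Soundex; vowels, H, W and Y have no code.
CODES = {c: d for d, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items() for c in letters}

def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "HW":   # H and W do not separate identical codes...
            prev = code     # ...but vowels (code "") do
    return (first + "".join(digits) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))   # R163 R163
print(soundex("Smith") == soundex("Smyth"))   # True
```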

Why should we have to extract the metadata of something in order to inquire about it, instead of letting the query engine do that for us?  For example, for a fashion item, I’d like to be able to query based on the item itself, not on a list of attributes we have created for it.  Copy and paste the item into a search box and let the search find things like it: similar to the approach of faceted browsing, but implicit within the search.
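One common way to approach search by the item itself is to have the engine compute a numeric feature vector for each item and return the nearest neighbors of the query item's vector. In the sketch below, the feature vectors are hypothetical hand-written values standing in for whatever an image-analysis step would extract, and cosine similarity ranks the catalog.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical features, e.g. [redness, blueness, stripes, sleeve length].
CATALOG = {
    "red striped shirt":  [0.9, 0.1, 0.8, 0.7],
    "red plain shirt":    [0.9, 0.1, 0.0, 0.7],
    "blue striped dress": [0.1, 0.9, 0.8, 0.2],
}

def most_similar(query_vec, catalog, k=2):
    """Return the k catalog items whose features best match the query item."""
    ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]), reverse=True)
    return ranked[:k]

pasted_item = [0.85, 0.15, 0.75, 0.6]   # features of the item pasted into the search box
print(most_similar(pasted_item, CATALOG))  # ['red striped shirt', 'red plain shirt']
```

The hard part in practice is the feature extraction itself; the ranking step shown here is simple by comparison.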

We need built-in semantic processing, with ontologies either generated or readily linked from reference ontologies.  Data exists in a context, and meaning and insights are more readily generated from those semantic connections and constructs.  As the semantic web and the use of metadata become more widespread, one can hope this will bring tighter integration of semantics and metadata into data solutions.
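A hint of what built-in semantics could mean: store facts as subject-predicate-object triples (the model used by the semantic web's RDF) and let a query follow ontology relationships such as subClassOf. The tiny vocabulary and the `instances_of` helper below are hypothetical; real systems would use an RDF store and a query language such as SPARQL.

```python
# Facts as (subject, predicate, object) triples; the vocabulary is made up.
TRIPLES = {
    ("Sneaker", "subClassOf", "Shoe"),
    ("Shoe", "subClassOf", "Footwear"),
    ("Sandal", "subClassOf", "Footwear"),
    ("item42", "type", "Sneaker"),
    ("item43", "type", "Sandal"),
    ("item44", "type", "Hat"),
}

def subclasses(cls, triples):
    """All classes that are cls or (transitively) a subClassOf cls."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result

def instances_of(cls, triples):
    """Items typed as cls or any of its subclasses: the query uses the ontology."""
    classes = subclasses(cls, triples)
    return sorted(s for s, p, o in triples if p == "type" and o in classes)

print(instances_of("Footwear", TRIPLES))  # ['item42', 'item43']
```

No item is tagged “Footwear” directly; the answer comes from the ontology, which is the point of having semantics in the engine.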

Data can be perceived as having a life cycle: the most recent data is of highest value, needs the fastest access of its life, and therefore sits on the most expensive storage.  Less recent data moves to cheaper storage, perhaps read-only; often this is a data warehouse.  Still older data, accessed less often, is placed on even less costly storage and is finally archived.  In current practice, data moves through the elaborate process of Extract, Transform, Load (ETL), usually from a relational db to a data warehouse (DW).  Note that some DW’s store data by column rather than by row, since most queries against the data tend to be column (field) oriented.
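The row-versus-column distinction is easy to picture. The sketch below performs a toy “transform” step, pivoting row-oriented records into a column-oriented layout, and then answers analytic questions that touch only the columns they need. The records are made up, and real columnar warehouses add compression and on-disk layouts that this ignores.

```python
# Row-oriented: each record stored together (good for fetching or updating one transaction).
ROWS = [  # made-up order records
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 45.0},
]

def to_columns(rows):
    """Toy 'transform' step: pivot row records into one list per column.
    Assumes every row has the same fields as the first."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

COLUMNS = to_columns(ROWS)

# Column-oriented: an aggregate over 'amount' reads only that list,
# rather than every field of every record.
total_amount = sum(COLUMNS["amount"])
east_total = sum(a for r, a in zip(COLUMNS["region"], COLUMNS["amount"]) if r == "East")
print(total_amount, east_total)  # 245.0 165.0
```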

Data analytics and OLAP (online analytic processing, noted for its multi-dimensional analysis) are on the rise of late, fueled in part by major players (e.g., IBM, Oracle) acquiring smaller niche providers, as well as by the need for more information, better risk analysis, etc.  Nowadays more priority is given to questions such as: What information or insights can be established about a customer, a transaction, or a processing location?

Solutions for this space still tend to be add-on, rather than well integrated.  Microsoft SQL Server and its analytic abilities have been a pleasant exception to the typical relational solution.

One is also cautioned against burdening a high-performance, transaction-oriented db with the potentially crippling overhead of OLAP queries scanning large amounts of the data.  In some cases, Big Data solutions such as Hadoop, MapReduce and Cassandra are used against data stores outside the relational db, working on huge log files and data files outside any db, with data sharding and massively parallel processing (hundreds or thousands of servers), providing answers with astonishingly short turnaround.  But with this, we’re mostly back to solutions dependent upon programming instead of built-in functionality.
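The MapReduce model behind Hadoop can be sketched in miniature: the input is split into shards, a map function emits key/value pairs from each shard independently, the pairs are grouped by key (the shuffle), and a reduce function combines each group. Here a local thread pool from Python's concurrent.futures stands in for the cluster, and the log lines are made up; Hadoop distributes the same steps across many machines and adds fault tolerance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain

LOG_LINES = [  # made-up web log lines: method, path, HTTP status
    "GET /home 200", "GET /cart 200", "POST /cart 500",
    "GET /home 200", "GET /home 404", "POST /checkout 200",
]

def shard(data, n):
    """Split the input into n roughly equal shards."""
    return [data[i::n] for i in range(n)]

def map_phase(lines):
    """Map: emit (status_code, 1) for every log line in one shard."""
    return [(line.split()[-1], 1) for line in lines]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)

with ThreadPoolExecutor(max_workers=3) as pool:   # threads stand in for servers
    mapped = pool.map(map_phase, shard(LOG_LINES, 3))
    grouped = shuffle(chain.from_iterable(mapped))
    counts = dict(pool.map(lambda kv: reduce_phase(*kv), grouped.items()))

print(sorted(counts.items()))  # [('200', 4), ('404', 1), ('500', 1)]
```

Even in this toy, the answer comes from writing map and reduce functions, which illustrates the point that we are back to programming.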

The ongoing pattern is to empower the end user, the data analyst, with built-in tools and standard languages.

Why should we need external tools such as Google Refine (originally Freebase Gridworks) for data cleansing? Cannot this be integrated into SQL?
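Some basic cleansing can already be expressed in SQL, which hints at what deeper integration might look like. The sketch below uses Python's sqlite3 with made-up vendor names and groups spellings that collapse to the same normalized key; it is a much cruder cousin of the clustering Google Refine offers. The normalization rule (remove periods, trim whitespace, lowercase) is an assumption chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (name TEXT)")
conn.executemany("INSERT INTO vendors VALUES (?)",
                 [("Acme Corp",), ("  acme corp",), ("ACME Corp.",), ("Globex",)])  # made-up data

# Group spellings that collapse to the same normalized key.
clusters = conn.execute("""
    SELECT LOWER(TRIM(REPLACE(name, '.', ''))) AS cleaned,
           COUNT(*) AS variants
    FROM vendors
    GROUP BY cleaned
    ORDER BY variants DESC, cleaned
""").fetchall()
print(clusters)  # [('acme corp', 3), ('globex', 1)]
conn.close()
```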

Why not include common output functionality, such as dashboards, into the db systems?

I remain curious about what new solutions will arise, and about the challenges yet to be solved and the problems not yet formed.