  1. Dec 2019
    1. I'll give a little bit of the history to provide context. My own involvement in this started around 2008 after we had shipped our key-value store. My next project was to try to get a working Hadoop setup going, and move some of our recommendation processes there. Having little experience in this area, we naturally budgeted a few weeks for getting data in and out, and the rest of our time for implementing fancy prediction algorithms. So began a long slog. We originally planned to just scrape the data out of our existing Oracle data warehouse. The first discovery was that getting data out of Oracle quickly is something of a dark art. Worse, the data warehouse processing was not appropriate for the production batch processing we planned for Hadoop; much of the processing was non-reversible and specific to the reporting being done. We ended up avoiding the data warehouse and going directly to source databases and log files. Finally, we implemented another pipeline to load data into our key-value store for serving results. This mundane data copying ended up being one of the dominant items for the original development. Worse, any time there was a problem in any of the pipelines, the Hadoop system was largely useless: running fancy algorithms on bad data just produces more bad data. Although we had built things in a fairly generic way, each new data source required custom configuration to set up. It also proved to be the source of a huge number of errors and failures. The site features we had implemented on Hadoop became popular and we found ourselves with a long list of interested engineers. Each user had a list of systems they wanted integration with and a long list of new data feeds they wanted. ETL in Ancient Greece. Not much has changed.

      A great anecdote on the pains of data integration.

  2. Feb 2014
    1. Before I knew it, I found myself on a street I had never been to, in a city I had never been to, about to head into a room full of people I had never met before, all united by one simple symbol... A penguin. An hour before, that penguin had seemed so inviting and friendly. It was a symbol that encompassed everything about the movement it represented, a movement that came together in spirit and mind to build a system that drove a new generation of technology and freedom... a movement that celebrated this drive by forming user groups in unknown streets, in unknown cities, and with unknown people.