CLOUD+DATA NEXT CONFERENCE

Innovation powered by cloud and data technology

Silicon Valley | Jul 15-16, 2017

Workshop

Title:Cadence: Middleware For Long Running Business Applications
Date:9:00am-12:00pm, 7/15
Speaker:Venkatraghavan Srinivasan, Senior Engineer, Uber

Maxim Fateev, Staff SDE, Uber

Venkat works as a Senior Software Engineer at Uber, where he helped build `Cherami`, a distributed pub-sub messaging platform, and more recently has been working on `Cadence`, a web service for running asynchronous workflows. Prior to Uber, Venkat was with AWS S3, building a distributed load balancer and an HTTPS proxy, among other things. His primary areas of interest are scalability, distributed systems, and networking.

Maxim Fateev spent eight and a half years at Amazon, where, among other projects, he led the architecture of the AWS Simple Workflow Service and the storage backend of the Simple Queue Service. At Uber he has applied his experience building large-scale distributed systems to the Cherami messaging system and the Cadence workflow service, both of which were recently open sourced by Uber. He also worked at Google and Microsoft.
Abstract:Cadence is a combination of a managed service and a client-side programming framework that makes it easier to build scalable, fault-tolerant distributed applications involving the coordination of asynchronous, long-running, loosely coupled tasks and micro-services. With Cadence, application developers focus on the core business logic, while the underlying distributed-systems boilerplate needed for state persistence, fault tolerance, and task routing is managed by the web service and client SDK.

In this codelab, you will learn the basic concepts of Cadence and then use the Go SDK to build a distributed application on top of Cadence.

What you'll learn:

1) Basic concepts to get started with Cadence

2) Running Cadence server on localhost

3) Go Client SDK experience for writing workflows and activities

a) Workflows, b) Activities, c) Signals, d) Timeouts, e) Child workflows, f) Task routing, g) Heartbeating, h) Writing unit tests for workflows and activities

4) Visibility into workflow executions

5) Debuggability of workflow executions

6) Workflow upgrades and versioning
Level:Intermediate to Advanced
Prerequisite:Go development, Docker setup
Target Audience:Application Developer, Full-stack Engineer, Backend Engineer
Key takeaways:Understanding of basic Cadence concepts; building business logic using the Cadence Go SDK; executing and troubleshooting applications using Cadence
Title:TensorFlow from beginner to expert
Date:9:00am-12:00pm, 7/15
Speaker:Mark Daoust, Developer Programs Engineer, Google

Marianne Monteiro, Developer Programs Engineer, Google

Mark spent 9 years building embedded ML models for aircraft, and now works at Google as a Developer Programs Engineer for TensorFlow. He has a Bachelor's in Mechanical Engineering, specializing in control system design.

Marianne is an undergraduate student of Computer Science at Universidade Federal de Campina Grande, and currently is an intern at Google working with the TensorFlow Developer Relations team. She's interested in education, cloud computing and deep learning.

Abstract:1) TensorFlow Overview: What is it? How does it work? What can you do with it? How do you get started? This is an overview for anyone who's curious but doesn't already have a machine-learning background or familiarity with TensorFlow. A couple of "getting started" code examples will be shown at the end.

2) High-Level TensorFlow. In addition to its powerful low-level interface, TensorFlow includes a set of high-level components. In many cases these solid, well-understood components are all you need. They eliminate boilerplate code and have best practices built in, allowing developers to build standard models with a single line of code or to easily customize their models.
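
To give a flavor of what "a single line of code" means here, the sketch below builds and trains a canned linear regressor with the tf.estimator API. It assumes a TensorFlow 1.x installation and uses toy numpy data; it is illustrative only and is not the workshop's official code.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Describe how the model should interpret each input feature.
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]

# A canned high-level model: no graph construction or training-loop boilerplate.
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)

# Toy data fed through a ready-made numpy input function.
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, -1.0, -2.0, -3.0])
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=1, shuffle=False)

estimator.train(input_fn=train_input_fn, steps=1000)
print(estimator.evaluate(input_fn=eval_input_fn))
```

Swapping LinearRegressor for another canned estimator, or for a custom model_fn, is how the same high-level machinery supports customized models.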
Level:Beginner to Intermediate
Prerequisite:
Target Audience:Application Developer, Full-stack Engineer, Backend Engineer
Key takeaways:Get started with TensorFlow and learn best practices
Title:Kubernetes 101
Date:1:30pm-4:30pm, 7/15
Speaker:Sandeep Dinesh, Developer Advocate, Google

Mark Mandel, Developer Advocate, Google
Abstract:1. The basic concepts of Kubernetes, including Pods, Deployments, ReplicaSets, Services, and ConfigMaps.

2. Best practices around building and deploying your containers that will let you run more stably, efficiently, and securely.
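
For attendees who like to poke at these objects programmatically as well as through kubectl, here is a minimal sketch using the official kubernetes Python client; it assumes a working kubeconfig (for example from minikube) and is not part of the workshop materials.

```python
from kubernetes import client, config  # official client: pip install kubernetes

# Load credentials from the local kubeconfig.
config.load_kube_config()

core = client.CoreV1Api()
apps = client.AppsV1Api()

# Pods are the basic unit of scheduling; Deployments manage ReplicaSets of Pods.
for pod in core.list_pod_for_all_namespaces(watch=False).items:
    print("pod", pod.metadata.namespace, pod.metadata.name, pod.status.phase)

for dep in apps.list_deployment_for_all_namespaces(watch=False).items:
    print("deployment", dep.metadata.namespace, dep.metadata.name,
          dep.status.ready_replicas, "/", dep.spec.replicas)
```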
Level:Beginner to Intermediate
Prerequisite:Kubernetes and Docker experience
Target Audience:Application Developer, Full-stack Engineer, Backend Engineer
Key takeaways:Advanced Kubernetes and Docker best practices that will help attendees run containers in production better.

Conference

Track:Keynote
Title:Architecting for sustainability: micro-services and the path to cloud native operations
Date:9:00am-9:45am, 7/16
Speaker:Craig McLuckie, CEO, Heptio
Abstract:Kubernetes and Linux Application Containers are making it easier than ever to build and manage production distributed systems. During this talk we will explore how to architect for sustainability. We will look at how cloud native application frameworks like Kubernetes support not only faster development of applications, but offer a path to sustainability as applications grow, become more complex and connected to additional systems in your production environment.
Track:Keynote
Title:How Airbnb Does Data Science
Date:9:45am-10:30am, 7/16
Speaker:Jeff Feng, Head of Analytics & Experimentation, Airbnb

Jeff Feng leads development for Analytic Products, Data Visualization, the Experimentation Platform, and Machine Learning Infrastructure as a Product Manager at Airbnb. Prior to Airbnb, Jeff was a Product Manager at Tableau, where he led their Big Data product roadmap, a consultant in the high-tech practice at McKinsey, and a Product Manager at Apple, where he helped with the launch of the iPhone 4. He holds an MBA from MIT Sloan and an MS and BS in Electrical Engineering from the University of Illinois at Urbana-Champaign.
Abstract:Airbnb is a trusted community marketplace for people to list, discover, and book unique accommodations and experiences around the world. As a consumer company, data represents the voice of Airbnb's users at scale, and Data Scientists play the role of interpreter. Data Scientists at Airbnb face unique challenges due to the dynamics of our two-sided marketplace, the online and offline components of our platform and the vast surface area of our product.

At the Cloud + Data NEXT Conference, I will discuss Airbnb's emphasis on having an experimentation-centric culture and applying machine learning rigorously to address our unique product challenges. Additionally, I will summarize Airbnb's approach to scaling a team from 1 to 100 Data Scientists. This includes our strategy of embedding Data Scientists on every team, investing heavily in data infrastructure and tooling before scaling Data Science, and providing broadly accessible data education to everyone in the company.
Track:Keynote
Title:Evaluating gradient boosted decision trees for billions of users
Date:11:00am-11:45am, 7/16
Speaker:Aleksandar Ilic, Staff Software Engineer, Facebook

Aleksandar Ilic (1984) holds a PhD in Computer Science from the University of Niš (2011). He has published over seventy papers on combinatorial and spectral graph theory and the design of algorithms, and has fifteen patents related to social networks. Since July 2011 he has worked as a software engineer at Facebook on friends ranking, and for the last three years he has led the 20-engineer notifications system team for targeting and optimizing notification sending via push, SMS, and email. Aleksandar has won more than 20 medals and diplomas at international competitions in mathematics and informatics for high school and university students, most notably silver medals at both the International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). He is also the main organizer and problem proposer for national competitions, the IOI, and HackerCup.
Abstract:Facebook uses machine learning and ranking models to deliver the best experiences across many different parts of the app. By improving the efficiency of the model evaluation, we can rank more inventory in the same time frame and with the same computing resources. In this talk, we compare different implementations of gradient boosted decision trees (GBDT) and describe multiple improvements in C++ that resulted in more efficient evaluations (ternary expressions with annotations, common and categorical features, and model range evaluation). This approach has been applied successfully to several ranking models at Facebook, including notifications filtering, feed ranking, and suggestions for people and pages to follow, with more than 5x improvements compared to the basic implementation. This is a joint work with Oleksandr Kuvshynov.
Track:Lunch Panel Discussion
Title:Cognitive Cloud and Watson Applications in the Healthcare and Life Sciences Industry
Date:12:00-1:00pm, 7/16
Speaker:Bob Zimmermann, Global Healthcare Cloud Portfolio Lead, IBM

Alan Dickman, Hybrid Cloud Technical Lead, IBM

Jay Chen, co-founder and CEO of HealthyBee

Robert J Zimmermann, CEBS, is IBM's Global Healthcare & Life Sciences Cloud Portfolio Leader. In this role Bob shapes and evangelizes the vision for cloud in healthcare and life sciences, both internally and externally to IBM. Bob started his health care career over 30 years ago with a Midwest BCBS, now Anthem. He has held senior executive positions with health care organizations, and was founder and owner of a care management organization, where he pioneered provider performance analytics and health care consumerism. For the last twenty-plus years Bob has been assisting provider and payer organizations with strategic clinical and business-driven technology initiatives such as enterprise-wide health plan administration systems, EMRs, and population health projects with leading Health Care IT vendors including Trizetto, CSC, and EDS/HP. Prior to returning to IBM in his current role, Bob led Deutsche Telekom's North American healthcare business. Bob holds a Bachelor of Business Administration from the University of Wisconsin - Milwaukee, and earned a Certified Employee Benefits Specialist designation from the Wharton School at the University of Pennsylvania. Mobile: 331-201-0385, rjzimmer@us.ibm.com

Alan Dickman is a Solution Architect and Executive Consultant with IBM’s Hybrid Cloud team in North America. He supports the specification, sale, design, and delivery of business and technical solutions to IBM customers. Areas of interest include the application of business process automation, cognitive services, decision management and information analytics to increase business efficiency. In 2010, Mr. Dickman received a Global Technical Award for his contributions to IBM's Advanced Metering Infrastructure business solution offering. In 2012, Alan acted as the technical lead to build a mobile EMR patient care solution based on business process management, mobile, and collaboration technologies.

Jay Chen is the co-founder and CEO of HealthyBee, a B2B marketplace provider with a focus on IoT and health tech innovations for babies and moms. Its service is enabling a new category of offerings for healthcare providers, health insurance providers, employee benefit managers, child care providers, and retail networks. Jay is an advisor to Pudong Hospital, the comprehensive medical center that services Disney Shanghai resort. Previously, Jay was an investment professional at Zhangjiang's corporate investment arm in Silicon Valley, covering TMT and Healthcare companies. Jay was also a strategic program manager supporting GSK’s vaccine online eCommerce and fulfillment operation. Most recently, Jay completed a Health Information Technology leadership program from Harvard School of Public Health.
Abstract:How IBM is helping healthcare and life sciences organizations gain competitive advantages, create new revenue streams, attract new capital investment and fundamentally disrupt and transform through the use of its Cognitive Computing technology, Watson.
Track:Architect for Scalability
Title:Scale Database at Azure - Cosmos DB
Date:1:00pm, 7/16
Speaker:Rimma Nehme, Architect, Microsoft

Rimma is currently an Architect on Azure Cosmos DB and the Open-Source Software Analytics team at Microsoft. Azure Cosmos DB is Microsoft's globally distributed, multi-model database service, representing the next big thing in the world of massively scalable cloud databases. Previously, Rimma was a principal engineer with 10 years of experience in systems building, database management systems, Big Data, query optimization, and data storage. At Microsoft, her accomplishments include jump-starting PolyBase technology and shipping it in SQL DW, SQL Server PDW, and SQL Server 2016, as well as developing the cost-based query optimizer in SQL Server PDW. Rimma holds a Computer Science PhD from Purdue and an MBA from the University of Chicago.
Abstract:Azure Cosmos DB is Microsoft’s globally distributed multi-model database. Azure Cosmos DB was built from the ground up with global distribution and horizontal scale at its core. It offers turnkey global distribution across any number of Azure regions by transparently scaling and replicating your data wherever your users are. You can elastically scale throughput and storage worldwide, and pay only for the throughput and storage you need. Azure Cosmos DB guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world, offers multiple well-defined consistency models to fine-tune performance, and guarantees high availability with multi-homing capabilities—all backed by industry leading service level agreements (SLAs). Azure Cosmos DB is truly schema-agnostic; it automatically indexes all the data without requiring you to deal with schema and index management. It’s also multi-model, natively supporting document, key-value, graph, and column-family data models. With Azure Cosmos DB, you can access your data using APIs of your choice, as DocumentDB SQL (document), MongoDB (document), Azure Table Storage (key-value), and Gremlin (graph) are all natively supported.

In this talk, I will describe how Azure Cosmos DB works, what makes it unique among other systems out there, and how anyone can start leveraging Cosmos DB for their applications.
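
As a hedged illustration of the schema-agnostic, multi-model API surface described above, the sketch below stores and queries JSON documents with the azure-cosmos Python SDK. The account URL, key, database, and container names are placeholders, and the talk itself may use different APIs or SDKs.

```python
from azure.cosmos import CosmosClient, PartitionKey  # pip install azure-cosmos

# Placeholder endpoint and key for an Azure Cosmos DB account.
client = CosmosClient("https://example-account.documents.azure.com:443/",
                      credential="<account-key>")

database = client.create_database_if_not_exists(id="appdb")
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/userId"),  # key used for horizontal scale-out
    offer_throughput=400,                        # provisioned request units per second
)

# Schema-agnostic: documents are indexed automatically, no DDL required.
container.upsert_item({"id": "1", "userId": "u42", "total": 19.99})

# Query with the SQL (DocumentDB) API across partitions.
orders = container.query_items(
    query="SELECT * FROM c WHERE c.userId = @uid",
    parameters=[{"name": "@uid", "value": "u42"}],
    enable_cross_partition_query=True,
)
for order in orders:
    print(order["id"], order["total"])
```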
Track:Microservice / Container
Title:A DevOps State of Mind with Microservices, Containers and Kubernetes
Date:1:00pm, 7/16
Speaker:Chris Van Tuin, Chief Technologist, Red Hat

Chris Van Tuin, Chief Technologist, NA West at Red Hat, has over 20 years of experience in IT and Software. Since joining Red Hat in 2005, Chris has been architecting solutions for strategic customers and partners and is a frequent speaker on DevOps, Security, and Containers. He started his career at Intel in IT and Managed Hosting followed by leadership roles in services and sales engineering at Loudcloud and Linux startups. Chris holds a Bachelors of Electrical Engineering from Georgia Institute of Technology.
Abstract: Rapid innovation, changing business landscapes, and new IT demands force businesses to make changes quickly. In the eyes of many, containers are on the brink of becoming a pervasive technology in enterprise IT to accelerate microservices delivery. In this presentation, you'll learn about:

• The transformation of IT to a DevOps, Microservices, and Container based Architecture

• What containers are and how DevOps practices can operate in a Microservices based environment

• How Kubernetes can reduce software delivery cycle times, drive automation, and increase efficiency

• How other organizations are using DevOps + Containers with Microservices and how to replicate their success

Also included: a demonstration of automated container-based Microservices builds and pipelines, running Jenkins CI on Kubernetes, and continuous deployments of containerized Microservices with Kubernetes.
Track:SRE / Devops
Title:Holistic Reliability: SRE at LinkedIn
Date:1:00pm, 7/16
Speaker:Todd Palino, Senior Staff Engineer, LinkedIn

Todd Palino is a Senior Staff Site Reliability Engineer at LinkedIn, tasked with keeping Zookeeper, Kafka, and Samza deployments fed and watered. He is responsible for architecture, day-to-day operations, and tools development, including the creation of an advanced monitoring and notification system. Previously, Todd was a Systems Engineer at Verisign, developing service management automation for DNS, networking, and hardware management, as well as managing hardware and software standards across the company. In his spare time, Todd is the developer of the open source project Burrow, a Kafka consumer monitoring tool, and can be found sharing his experience on both SRE and Apache Kafka at industry conferences and tech talks. He is also in the middle of co-authoring Kafka: The Definitive Guide, soon to be available from O’Reilly Media. When that’s not keeping him busy, you’ll find him out on the trails, training for his next marathon.
Abstract: Site Reliability Engineering is a discipline, born on the West Coast but now adopted around the world, that differs widely from one company to another. At LinkedIn, SRE is part of a layered approach to reliability, with many application specialists, as well as generalists who know how it all fits together. This approach lets Software Engineers concentrate on developing their applications, not tools. SREs are the experts who know how to deploy, scale, and run the applications. When the inevitable failure happens, many experts come together to quickly identify and resolve the problem and improve the entire stack for everyone.
Track:Machine Learning/AI
Title:Google's Neural Machine Translation System
Date:1:00pm, 7/16
Speaker:Xiaobing Liu, Staff Engineer, Google Brain

Xiaobing is a Staff Software Engineer and machine learning researcher at Google Brain. In his work, he focuses on TensorFlow and key applications where TensorFlow can be applied to improve Google products, such as Google Search, Play recommendations, Google Translate, and Medical Brain. His research interests span from systems to the practice of machine learning. His research contributions have been successfully implemented in various commercial products at Tencent, Yahoo, and Google. He has served on the program committee for ACL 2017 and as a session chair for AAAI 2017, and has publications at top conferences such as RecSys, NIPS, and ACL.
Abstract:Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. In this talk, I will cover the model architecture, the word-piece design, the training algorithm, and how to make training and serving faster. I may also touch on zero-shot translation with the multilingual model.
Track:Data Infrastructure / Analytics
Title:Apache Superset - a modern, open source, enterprise-ready business intelligence web application
Date:1:00pm, 7/16
Speaker: Maxime Beauchemin, Data engineer, Airbnb

Maxime Beauchemin works at Airbnb as part of the "Analytics & Experimentation Products" team, developing open source products that reduce friction and help generate insight from data. He is the creator and a lead maintainer of Apache Airflow (incubating), a workflow engine, and Superset, a data visualization platform, and is recognized as a thought leader in the data engineering field. Before Airbnb, Maxime worked at Facebook on computation frameworks powering engagement and growth analytics, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.
Abstract: Airbnb developed Superset to provide all employees with interactive access to data while minimizing friction. Superset provides a quick way to intuitively visualize datasets by allowing users to create and share interactive dashboards; a rich set of visualizations to analyze your data, as well as a flexible way to extend the capabilities; an extensible, high-granularity security model allowing intricate rules on who can access which features and integration with major authentication providers (database, OpenID, LDAP, OAuth, and REMOTE_USER through Flask AppBuilder); a simple semantic layer, allowing you to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function (metrics) are made available to the user; and deep integration with Druid that allows for Superset to stay blazing fast while working with large, real-time datasets.

Superset's main goal is to make it easy to slice, dice, and visualize data. Maxime Beauchemin explains how Superset empowers each and every employee to perform analytics at the speed of thought.
Track:Architect for Scalability
Title:Scale Messaging System at Uber
Date:1:45pm, 7/16
Speaker:Maxim Fateev, Staff Engineer, Uber

Maxim Fateev spent eight and a half years at Amazon, where, among other projects, he led the architecture of the AWS Simple Workflow Service and the storage backend of the Simple Queue Service. At Uber he has applied his experience building large-scale distributed systems to the Cherami messaging system and the Cadence workflow service, both of which were recently open sourced by Uber. He also worked at Google and Microsoft.
Abstract: Cherami is an open source distributed, scalable, durable, and highly available message queue service built by Uber. It is written in Go, with clients currently available in Go, Java, and Python. The talk focuses on its backend architecture, including replication, consistency, and high availability.
Track:Microservice / Container
Title:Mashing Linux and Windows Apps in Containers on a Hybrid Docker Swarm
Date:1:45pm, 7/16
Speaker:Elton Stoneman, Engineer, Docker

Abstract: Linux containers run on Linux. Windows containers run on Windows. You can't mix them on a single host, but you can build a cluster of hosts into a single Docker swarm, using a mixture of Windows and Linux servers. That swarm can run both Windows and Linux containers, you deploy and manage them in the same way, and the containers can talk to each other with overlay networking.

This session will show you how to make that happen, but more importantly you'll see why it's such an important capability: one that will change the way you design, build, and deliver software. With a hybrid Docker swarm you can build a distributed solution where you pick the right technology stack for each component, and leverage high-quality open-source applications to minimize the amount of custom software you need to write and maintain.
Track:SRE / Devops
Title:Day 2 Operations Of Cloud-Native Systems
Date:1:45pm, 7/16
Speaker:Elizabeth K. Joseph, Developer Advocate, Mesosphere

Elizabeth K. Joseph is a Developer Advocate at Mesosphere focused on DC/OS and Apache Mesos. Previously, she worked for a decade as a Linux Systems Administrator, spending the past four years at HPE on the OpenStack Infrastructure team. She is the author of Common OpenStack Deployments (2016) and The Official Ubuntu Book, 8th (2014) and 9th (2016) editions.
Abstract: Today’s cloud-native systems often have a whole stack of infrastructure built to automatically handle scaling, individual system failures, and more in your infrastructure. In this stack you may have a datacenter of hardware or the need to balance your applications across regions in a public cloud. On top of that, your applications may be running in containers, and using some kind of container management platform like DC/OS or Kubernetes.

For all this automatic tooling gets you in terms of reliability and speed, it also adds complexity that engineers have to understand in order to effectively monitor, debug, and solve problems when something goes wrong. This infrastructure also needs to be maintained and upgraded, and you need to make sure sufficient backups are being made at every level.

This talk will walk you through some of these "Day 2 Operations" challenges and include a checklist of things to look for when evaluating a platform. Finally, we will discuss general good practices as you seek to monitor, maintain, and troubleshoot your cloud-native system.
Track:Machine Learning/AI
Title:Lessons from Integrating Machine Learning Models into Data Products
Date:1:45pm, 7/16
Speaker:Sharath Rao, machine learning lead, Instacart

Sharath Rao is a Data Scientist/Engineering Manager at Instacart, where he leads the engineering/ML efforts in recommendation systems, search relevance, and personalization. Prior to Instacart, he built systems and algorithms at Yahoo and Intent Media. He has a Master's in Language Technologies from Carnegie Mellon University.
Abstract: All machine learning (ML) model prototypes look similar, but every product integration of a model looks different in its own way! In this talk, I will share practical lessons for integrating ML models into product workflows by drawing on our experience with search ranking and recommendation systems at Instacart. Also, as organizations expand their use of ML models, economies of scope can be effectively exploited. As an example, I will speak about a shared ML feature pipeline that is now used across multiple data products at Instacart.
Track:Data Infrastructure / Analytics
Title:Unified processing with the Samza High-level API
Date:1:45pm, 7/16
Speaker:Yi Pan, Sr. Staff Engineer, LinkedIn

Yi Pan has worked on distributed platforms for Internet applications for 9 years. He started at Yahoo! on a NoSQL database project, leading the development of multiple features such as real-time notification of database updates, secondary indexes, and live migration from legacy systems to the NoSQL database. He later joined and led the distributed Cloud Messaging System project, which is used heavily for pub-sub and as transaction logs for distributed databases at Yahoo!. In 2014 he joined LinkedIn, where he quickly became the lead of the Samza team and a Committer and PMC Chair in Apache Samza.
Abstract: More and more applications need to process both batch and stream data sets with a unified programming API and a flexible deployment model. The newly released version (0.13) of Apache Samza improves the simplicity and portability of Samza applications. The new high-level API supports common operations like windowing, map, and join on streams. Developers can now express application logic concisely in a few lines of code and accomplish what previously used to require several jobs. The other exciting Samza 0.13.0 feature is flexible deployment. It empowers developers to deploy and scale Samza applications as a simple embedded library, which is much more flexible than the original YARN deployment model. This talk will cover the new high-level API and flexible deployment, as well as batch processing, both in terms of what is available and what is coming in the future.
Track:Architect for Scalability
Title:High-Level TensorFlow
Date:2:30pm, 7/16
Speaker:Mark Daoust, Developer Programs Engineer, Google

Marianne Monteiro, Developer Programs Engineer, Google

Mark spent 9 years building embedded ML models for aircraft, and now works at Google as a Developer Programs Engineer for TensorFlow. He has a Bachelor's in Mechanical Engineering, specializing in control system design.

Marianne is an undergraduate student of Computer Science at Universidade Federal de Campina Grande, and currently is an intern at Google working with the TensorFlow Developer Relations team. She's interested in education, cloud computing and deep learning.

Abstract:In addition to its powerful low-level interface, TensorFlow includes a set of high-level components. In many cases these solid, well-understood components are all you need. They eliminate boilerplate code and have best practices built in, allowing developers to build standard models with a single line of code or to easily customize their models.
Track:Microservice / Container
Title:Building a cloud native architecture for Enterprise Monitoring As A Service
Date:2:30pm, 7/16
Speaker:Suqiang Song, Lead Enterprise Architect, MasterCard

Suqiang Song is a lead enterprise architect at MasterCard, where he works on next-generation technology solutions to enable continued growth and innovation of the MasterCard platforms. He has been using quantitative analysis, data processing, mining, and modeling to solve various big data problems at MasterCard.
Abstract:This talk will discuss architecture design principles and technical tips for building Enterprise Monitoring as a Service on a cloud-native big data infrastructure.
Track:SRE / Devops
Title:Platform Agnostic and Self Organizing Software Packages
Date:2:30pm, 7/16
Speaker:Nell Shamrell-Harrington, Sr. Software Engineer, Chef

Nell Shamrell-Harrington is a Software Development Engineer at Chef and core maintainer of the Habitat and Supermarket open source projects. She also sits on the advisory board for the University of Washington Certificates in Ruby Programming and DevOps. She specializes in Chef, Ruby, Rails, Rust, Regular Expressions, and Test Driven Development and has traveled the world speaking on these topics. Prior to entering the world of software development, she studied and worked in the field of theatre. The world of theatre prepared her well for the dynamic world of creating software applications. In both, she strives to create a cohesive and extraordinary experience. In her free time she enjoys practicing the martial art Naginata.
Abstract:One of the dreams of development is to build a software package once, then be able to deploy it anywhere. With current open source projects this dream is closer than ever. Come to this talk to learn how to create software packages that run (almost) anywhere. You will see how the same application can be run on bare metal, on a VM, or in a container, with everything needed to automate that application already built into the package itself. This even works with a mixed infrastructure: metal for your static compute-heavy loads, VMs for your persistent data stores, and ephemeral short-lived containers for your applications managed by Kubernetes or other container scheduling services. You will also learn how to build and deploy these packages with the intelligence to self-organize into topologies, no central orchestrator needed. Learn how the dream of platform-agnostic and self-organizing packages is fulfilled today and where it will evolve in the future.
Track:Machine Learning/AI
Title:Natural Intelligence: the Human Factor in AI
Date:2:30pm, 7/16
Speaker:Jennifer Prendki, Senior Data Science Manager, Walmart Labs

Jennifer Prendki is a Senior Data Science Manager at Walmart eCommerce. She received her PhD in Particle Physics from UPMC - La Sorbonne in 2009 and has since worked as a data scientist in many different industries before switching to retail. She enjoys addressing both technical and non-technical audiences at conferences and sharing her knowledge and experience with aspiring data scientists.
Abstract:As Artificial Intelligence gains more and more attention from the scientific community, many foresee a future filled with machines capable of performing most of the tasks traditionally handled by human beings, and the most pessimistic essentially envision these machines outperforming us at virtually every function and gradually taking our jobs away. However, while AI is maturing, we humans still have a very central role to play in making AI and machine learning reach their full potential, and manual data curation, data tagging, and crowdsourcing are key to the main advances still being made in these areas. In this talk, I will cover these different concepts and their role in today's big data landscape, including the human-in-the-loop paradigm.
Track:Data Infrastructure / Analytics
Title:Be A Hero: Transforming GoPro Analytics Data Pipeline
Date:2:30pm, 7/16
Speaker:Chester Chen, Head of Data Science, GoPro

Chester is the Head of Data Science & Engineering at GoPro. Before joining GoPro, Chester was the Director of Engineering at Alpine Data Labs, a machine learning startup that provides an analytics platform for Fortune 500 companies. He is also the founder and organizer of the SF Big Analytics meetup, with 6900+ members. Previously, he held various positions at Symantec, Ascent Media, and other companies. He has given talks at the SF Scala Meetup, Big Data Scala, Hadoop Summit, and the IEEE Big Data Conference.
Abstract:At GoPro, we have massive amounts of heterogeneous data being generated from our consumer devices and applications. The rapid increase in data creates many challenges for GoPro's data pipeline. In this talk, we share some of the challenges and lessons learned during our data platform development. In particular, I will discuss the following:

· The old GoPro data platform architecture vs. the new data platform architecture. As we transform from the existing data platform to a new platform, I will discuss the challenges associated with the old data platform in terms of scalability, operations, cost, and performance.

· Dynamic DDL: adding data schema automatically on the fly with Spark DataFrames, Spark Streaming, Kafka, HBase, Hive, and S3. This approach gives data scientists quick access to data in tabular format via SQL. I will go into technical details of both the streaming and batch implementations in Spark.
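
A minimal PySpark sketch of the "dynamic DDL" idea described above: the schema is inferred from incoming JSON and the result is registered as a SQL-queryable table. The S3 path, database, and table names are hypothetical, and this is not GoPro's actual pipeline code.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dynamic-ddl-sketch")
         .enableHiveSupport()       # lets saveAsTable register tables in the metastore
         .getOrCreate())

# Infer the schema directly from semi-structured device/app events (hypothetical path).
events = spark.read.json("s3a://example-bucket/events/2017/07/16/")
events.printSchema()

# Register the inferred schema as a table so analysts can query it with plain SQL.
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
events.write.mode("append").saveAsTable("analytics.device_events")
spark.sql("SELECT count(*) FROM analytics.device_events").show()
```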
Track:Architect for Scalability
Title:Scale Video Infrastructure at Instagram
Date:3:20pm, 7/16
Speaker:Ning Zhang, Engineering Manager, Instagram

Abstract:
Track:Microservice / Container
Title:Cloud Native Data Pipelines
Date:3:20pm, 7/16
Speaker:Sid Anand, Data Architect, Agari

Sid Anand is a hands-on software architect with deep experience building and scaling web sites that millions of people visit every day. He currently serves as the Data Architect for Agari, a rising email security company. Prior to joining Agari, Sid held several technical and leadership positions including LinkedIn’s Search Architect, Netflix’s Cloud Data Architect, Etsy’s VP of Engineering, and several technical roles at eBay. He has over 15 years of experience in building websites that millions of people visit every day. Outside of work, Sid co-chairs QCon SF & London, is an active committer/PPMC member on Apache Airflow, and provides advisory services to Start-ups in the area of Big Data. Sid earned his BS and MS degrees in CS from Cornell University, where he focused on Distributed Systems. When not working, Sid enjoys spending time with his lovely wife and 2 kids.
Abstract:Big Data companies (e.g. LinkedIn, Facebook, & Google) have historically built custom data pipelines over bare metal in custom-designed data centers. In order to meet strict requirements on data security, fault-tolerance, cost control, job scalability, network topology, and compute and storage placement, they need to closely manage their core technology. In recent years, companies with Big Data needs have started migrating to the public cloud. How does the public cloud change the game? Specifically, how can companies effectively marry cloud best practices with big data technology in order to leverage the benefits of both? Agari, a leading email security company, is applying big data best practices to both the security industry and to the cloud in order to secure the world against email-borne threats. Come to this talk to learn more.
Track:SRE / Devops
Title:Make software better - A discussion of new dev infra in the DevOps era
Date:3:20pm, 7/16
Speaker:Yucong Sun, Co-Founder, Postverta

Yucong has 10+ years of experience in the IT industry, is familiar with all major programming languages, and has extensive experience building fast and reliable distributed systems. As a CTO at a startup, he managed a full-stack agile DevOps team focused on quickly and safely delivering software. He is the author of the book SRE: Google 运维解密, an official Chinese translation of Site Reliability Engineering: How Google Runs Production Systems.
Abstract:The speaker will give an overview of the current dev infrastructure landscape, then discuss how it is changing due to the DevOps/SRE movement. Furthermore, the speaker will discuss what the future of dev infra may look like and review projects currently heading in that direction.
Track:Machine Learning/AI
Title:Deep Learning at Scale on Twitter's Timelines
Date:3:20pm, 7/16
Speaker:Nicolas Koumchatzky, Engineering Manager, Twitter

Anton Andryeyev, Software Engineer, Twitter

Nicolas Koumchatzky is the lead for Twitter Cortex. Cortex is in charge of building Twitter's AI platform and latest deep learning models. He started as a quant in Paris, then joined Madbits, a startup specializing in using deep learning for content understanding. When Madbits was acquired by Twitter in 2014, he joined as a deep learning expert and led a few projects in Cortex, including a real-time live video classification product for Periscope. In 2016, he focused on building a scalable AI platform for the company. In early 2017, he became the lead for the team.

Anton Andryeyev joined Twitter in November 2015 to work on timeline ranking. Prior to that, he spent almost 10 years working at Google on distributed systems and data processing pipelines to improve and scale Machine Translation (powering Google Translate) as well as other NLP tasks. His interests include large-scale machine learning infrastructure, applied ML within consumer products, and deep learning.
Abstract:The Cortex team at Twitter has made significant investments in a scalable and easy-to-use platform for deep learning. This allows various product engineering teams to quickly try out and launch improved prediction and ranking models into production. Specifically, Twitter has recently updated its timeline ranking framework to rely on deep learning models. In this talk we will cover key aspects of Twitter's deep learning platform and the challenges of using deep learning at scale, and go through the concrete example of using the platform for the timeline ranking launch.
Track:Data Infrastructure / Analytics
Title:From Three Nines to Five Nines: The Kafka Journey at Netflix
Date:3:20pm, 7/16
Speaker:Allen Wang, Senior Software Engineer, Netflix

Allen Wang is a member of the Real-Time Data Infrastructure team at Netflix. He has been working with Kafka for more than two years and is one of the early adopters running Kafka at scale in AWS. He is a contributor to Apache Kafka (KIP-36: Rack aware replica assignment) and to various cloud platform projects in NetflixOSS.
Abstract:Netflix built its Kafka-based data pipeline in AWS at the end of 2015. Through lessons learned, we have made notable changes to our deployment strategy and fine-tuned our broker and client configuration. At the scale of delivering over a trillion messages per day, we improved our producer-side message delivery rate from three nines to five nines. The talk will introduce our data pipeline architecture and highlight the challenges of running Kafka in the cloud and the steps we have taken since our initial deployment to improve our SLA.
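
Much of producer-side delivery-rate tuning comes down to durability-oriented client configuration. The sketch below shows the general idea with the kafka-python client; the broker addresses and settings are illustrative and are not Netflix's actual configuration.

```python
from kafka import KafkaProducer  # pip install kafka-python

# Favor durability over latency: wait for the full in-sync replica set and
# retry on transient broker failures instead of silently dropping messages.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092"],  # placeholder brokers
    acks="all",                                # leader waits for all in-sync replicas
    retries=5,                                 # resend on transient errors
    max_in_flight_requests_per_connection=1,   # keep ordering intact while retrying
    linger_ms=5,                               # small batching window
)

future = producer.send("events", b'{"type": "playback_start"}')
record_metadata = future.get(timeout=10)       # surfaces delivery failures explicitly
producer.flush()
```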
Track:Architect for Scalability
Title:Scale Query Processing at Pinterest
Date:4:05pm, 7/16
Speaker:Changshu Liu, Staff Software Engineer, Pinterest

Changshu Liu is a staff software engineer and tech lead of the Presto/Hive/Workflow teams at Pinterest. He previously worked at Microsoft Research and Facebook on data/search infrastructure.
Abstract: Changshu will talk about the challenges and lessons learned from operating a 100 PB-scale data warehouse on AWS at Pinterest, and share first-hand experience running large-scale data infrastructure on cloud-based systems.
Track:Microservice / Container
Title:Kubernetes Best Practices
Date:4:05pm, 7/16
Speaker:Sandeep Dinesh, Developer Advocate, Google

Abstract:Kubernetes and friends are powerful tools that can really simplify your operations. However, there are many gotchas and common pitfalls that can ruin your experience. I’ll share some best practices around building and deploying your containers that will let you run more stably, efficiently, and securely.
Track:Machine Learning/AI
Title:Machine(/deep) learning on edge devices
Date:4:05pm, 7/16
Speaker:Pradeep Nagaraju, Software Engineer, Splunk

Pradeep is a software engineer at Splunk Inc. working on the big data search engine and machine learning algorithms. His research initiatives at Splunk include edge analytics and machine learning for industrial IoT deployments. Prior to this, he worked at Qualcomm Inc. on machine learning algorithms for WiFi that are currently in production on millions of Android-powered devices. He holds 5 patents and has 4 publications in machine learning, big data, and IoT.
Abstract:Smart IoT edge devices must take real-time actions based on the data collected and analyzed. Currently, most machine learning inference is done on the server side and the actions are passed down to the IoT devices. Because all the data has to be transferred back to the server for analytics and inference, this leads to a high total cost of ownership (TCO). Components adding to TCO include bandwidth utilization, storage requirements, and server-side computation. Also, since the inference is passed down from the server, it results in non-real-time actions on the edge devices. I will discuss the new architecture that we built for executing machine learning and analytics on edge devices using Splunk's late-binding schema techniques.
Track:Data Infrastructure / Analytics
Title:Predictive Model & Record Descriptions Using Segmented Sensitivity Analysis
Date:4:05pm, 7/16
Speaker:Greg Makowski, Director of Data Science, Ligadata

Greg Makowski has been deploying data mining models since 1992. Early on, he learned the importance of generalization and good communication with clients. Deploying 90+ data mining models provided the experience to develop the automated data mining internals of 6 SaaS or enterprise applications. He has led teams of 5 direct reports as well as larger consulting teams. His vertical experience includes security, fraud detection, financial services, web/mobile behavior, targeted marketing, and retail supply chain.
Abstract: It can be a competitive advantage to be able to describe an arbitrary predictive model or a black-box ensemble of models, both at the overall level and for each record's forecast. The descriptions of the first sprint's models feed into strategies for the next sprint. This also helps manage "behavior drift" of the business domain, when market, pricing, or competition changes shift the landscape away from the one represented by the training data. Drift detection quantifies when it is time to refresh a model system.
Track:Machine Learning/AI
Title:Generative Models for Training with Very Little Data
Date:4:45pm, 7/16
Speaker:Xinghua Lou, Head of Commercialization, Vicarious AI

Xinghua Lou, Ph.D., is Head of Commercialization at Vicarious AI, responsible for strategic planning and customer development. Xinghua is a veteran machine learning researcher and practitioner. He has published in many top-tier venues such as NIPS, ICML, CVPR, and MIT Press, and won the best paper award at Machine Learning for Medical Imaging 2012. Xinghua received his Ph.D. from Universität Heidelberg (Heidelberg, Germany) and his M.Sc./B.Sc. from Tsinghua University (Beijing, China). His current interest is applications of artificial intelligence in industrial robotics and other areas of industrial automation.
Abstract:In this talk, we will present Vicarious' recent papers on generative shape models and hierarchical feature learning. We will discuss our performance on public text recognition benchmarks and show how to outperform deep learning using 6,000 times less training data.
Track:Data Infrastructure / Analytics
Title:Optimizing and Deploying High Performance Spark, Scikit-Learn, and TensorFlow Models in Production with GPUs
Date:4:45pm, 7/16
Speaker:Chris Fregly, Founder/Research Engineer, PipelineIO

Chris Fregly is Founder and Research Engineer at PipelineIO, a streaming machine learning and artificial intelligence startup based in San Francisco. He is also an Apache Spark contributor, a Netflix Open Source committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O'Reilly training and video series "High Performance TensorFlow in Production." Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.
Abstract:Using the latest advancements from TensorFlow, including the Accelerated Linear Algebra (XLA) framework, the JIT/AOT compiler, and the Graph Transform Tool, I'll demonstrate how to optimize, profile, and deploy TensorFlow models in a GPU-based production environment.

This talk contains many Spark ML and TensorFlow AI demos using PipelineIO's 100% open source Community Edition (http://community.pipeline.io). All code and Docker images are available at https://github.com/fluxcapacitor/pipeline to reproduce on your own CPU- or GPU-based cluster.
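
For readers who want to try the XLA JIT piece ahead of the talk, the sketch below enables session-wide JIT compilation using the TensorFlow 1.x API. It is a generic example under those assumptions, not code taken from the PipelineIO repository.

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x

# Turn on the XLA just-in-time compiler for the whole session.
config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

x = tf.placeholder(tf.float32, shape=[None, 1024], name="x")
w = tf.Variable(tf.random_normal([1024, 10]))
y = tf.matmul(x, w)  # eligible ops can be clustered and compiled by XLA

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.zeros((8, 1024), dtype=np.float32)})
    print(out.shape)  # (8, 10)
```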