Guest Lecture: "Big Data in the Cloud" by Dr. Janette Müller-Lehmann et al. (Jul 06, 2020)
News from Jun 29, 2020
Abstract
SoundCloud is the world's largest open audio platform, built by a community of creators, curators, and listeners on the pulse of what's new, now, and next in sound culture. Creators can produce, share, and grow their audio catalog of original tracks, DJ sets, podcasts, and mixes. Curators then group, organize, and share the best and most relevant content around a specific music topic. Listeners can discover, connect with, and support creators.
At SoundCloud, we continuously work to develop and improve our platform to support our community. To do this, it is crucial to understand how our users (creators, curators, and listeners) interact with each other, which technical difficulties they face, and how they react to the new features we offer. To enable this understanding, we employ big data and cloud computing. Cloud computing provides an easy and flexible way to process, store, and then use large amounts of data.
In this talk, we provide an overview of how we use the Google Cloud Platform to leverage the power of data at SoundCloud. First, we discuss how we use our data for product and business decision-making. Second, we describe the underlying infrastructure with which we collect and store our data. Finally, we explain how our most critical data is processed and modeled in a "Data Corpus" that is easy to access and can be used by everyone in the company.
Biographies
Dr. Janette Müller-Lehmann joined SoundCloud in 2015, where she worked as a data scientist on various projects, including listener engagement, advertising, and modeling client event tracking. Since 2020, she has been leading the Data Corpus team, which works on a data warehouse that SoundCloud employees can universally use to answer product and business questions. Janette started her career at the University of Potsdam, where she studied social networks using Wikipedia. Afterward, she worked for Yahoo Labs and at Universitat Pompeu Fabra, where she obtained her Ph.D. in computer science. During this time, she focused on defining and evaluating novel methods to measure user engagement with and across websites.
Ana Pereira is a Data Scientist with 8 years of experience in various industries. She has been at SoundCloud since December 2018, working as part of the Data Corpus team, which focuses on creating a data warehouse that SoundCloud employees can universally use to answer product and business questions. With a background in applied mathematics and computation, she started her career at Fidelidade (an insurance company in Portugal), where she went from Data Analyst to Actuary while completing a master's degree in Actuarial Sciences. She then moved to trivago in Düsseldorf, where she spent most of her time on the Automated Bidding team, which developed an automated bidding algorithm that allowed advertisers to hit their profitability targets.
Isabella Pighi joined SoundCloud in May 2019 after a long tenure at Google and YouTube in different roles and locations, encompassing data management and data science. She currently leads the Data Organization. One of her main goals at SoundCloud has been establishing infrastructure and processes for implementing sustainable and innovative data governance. Isabella studied at the State University of Milan and University College London, where she earned master's degrees in Computer Science and Human-Computer Interaction, respectively.