Datacast

Episode 106: Advancing AI Adoption with Dânia Meira

Episode Summary

Dânia Meira has been a senior expert and mathematician in the data field since 2012, with a Data Science career in Berlin startups where her work focused on ML for predictive analytics. She is also an experienced teacher and mentor. Dânia is the Director of #datalift and a Founding Member of the AI Guild.

Episode Notes

Show Notes

(01:32) Dânia shared her upbringing in Brazil and her college experience studying Applied Mathematics at the University of Campinas.
(05:58) Dânia touched on her early career working in marketing intelligence in Brazil.
(10:38) Dânia described her thesis on scalable implementations of the Alternating Least Squares algorithm for Collaborative Filtering recommendation, conducted during her Master's degree in Computer Science from the University of Fluminense.
(16:10) Dânia recalled her hustling phase working and getting a Master's degree simultaneously.
(24:19) Dânia reflected on her move to Berlin to work as a data scientist in several startups.
(31:00) Dânia looked back at her time working at MYTOYS GROUP's Analytics team, responsible for Predictive Analytics and Machine Learning Modeling.
(34:12) Dânia compared doing data science to practicing mixed martial arts.
(38:35) Dânia reflected on her involvement with Data Science for Social Good Berlin as a data ambassador and Data Science Retreat as a SQL Masterclass Teacher.
(43:14) Dânia shared the founding story of AI Guild - the go-to community for data and business professionals advancing AI adoption - where she is a founding member.
(47:36) Dânia gave her thoughts on barriers preventing more women from entering the data field.
(51:21) Dânia discussed the #datalift initiative, which pushes to productionize more data analytics and machine learning solutions.
(58:27) Dânia explained her work supporting the advancement of #datacareer talents and experts.
(01:01:22) Dânia gave her take on the evolution of the data field over the past decade.
(01:03:16) Closing segment.

Dânia's Contact Info

AI Guild's Resources

Mentioned Content

People

Andrew Ng: Founder of deeplearning.ai, co-founder of Coursera
Alessandra Sala: President of Women in AI, Sr. Director of Artificial Intelligence and Data Science at Shutterstock
Joy Buolamwini: Founder and Executive director of The Algorithmic Justice League and maker of the "Coded Bias" documentary, available on Netflix

Book

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are the highlights from my conversation with Dânia:

On Her Upbringing and Education

I am from Brazil and studied at the University of Campinas, in my hometown, where I was born. The city is quite famous for the university, so it was always my dream to study there. It is one of the top math and computer science universities in Brazil and Latin America. I was very privileged to have the opportunity to study there and make many friends for life.

When I started thinking about what I would study, I did not have the idea of becoming a data scientist. When I started college in 2007, data science was not yet talked about as a profession. But I always liked math, so I decided to study applied mathematics. The major gave me a theoretical background in math, linear algebra, statistics, the basics of programming, algorithms, etc. I had all the essential skills to become a data scientist, even though I did not know I would do that.

By the time I finished my studies, I was aware that I wanted to go into industry and not continue my academic career. Thus, I looked for an opportunity to apply my knowledge in the industry and found out about the data world.

On Her Early Career In Marketing Intelligence

I started as a marketing intelligence analyst, building models to understand customer behavior. I learned that we needed to connect what we were doing on the statistical and mathematical side to whatever business applications required them. Most of the time, we worked with marketing teams from other companies, mainly in retail. They always wanted to do some targeted promotion for some customer segment or product they were launching. For them, it was not important what kind of models we used. It was more important that we could find the right target customers, the right segment, or those with the highest probability of returning to their stores.

At that time, I was very much in the phase of being in love with math and algorithms. I wanted to build the best-performing models and get that 1% increase in accuracy (or whatever performance metric we had). But a lot of my senior colleagues and team leaders at the time pointed out to me that a 1% increase would not change the list of top 100 customers we had. So I learned to put the customer first and work hard until we got what they needed, not beyond that need.

The mathematician inside me wanted to see if there was a possibility of getting something better. But as a consultant or a marketing analyst, this was not required. So I had to balance my personal motivation and the company's needs.

On Her Master's Thesis

After working for a few years, I realized maybe I wanted to do a Master's program. My team lead even supported the idea. We worked a lot with the marketing teams and did customer understanding for them. I ended up doing a recommendation project for a client. The idea was: What kinds of products would a customer like to buy for their next purchase? I found out about the topic of recommendation systems, which was big in computer science with a lot of complex algorithms.

This was 2014, the beginning of Spark. Before that, we only had Hadoop as the big data technology, and this was around the time when Spark version 1.0 was launched. So I decided to do a Master's degree in my time after work in order to learn how big data technologies work. I focused on one algorithm for recommendation called Alternating Least Squares (ALS) and compared two different implementations of it - one in Hadoop and one in Spark.

ALS is an expensive algorithm to run because it is iterative. As the name suggests, it is a least-squares optimization that takes turns. If we think about recommendations, we have products and customers. The algorithm alternates: it optimizes the problem for the products first, then for the customers, and goes back and forth, each step building on the previous iteration, until the improvement falls below some delta that you predefined. Because you have to do this multiple times, the matrices that represent the problem are updated over and over. In the end, you have a problem that operates on many matrices, which makes it computationally expensive.
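As a rough illustration of that back-and-forth, here is a minimal sketch in plain Python. It is not the thesis implementation: the function name `als_rank1`, the toy ratings, and the parameters `reg`, `tol`, and `max_iter` are all assumptions made for the example. To keep it short, each customer and product gets a single number (rank 1), so every least-squares step reduces to one division; real implementations use several factors and solve a small linear system at each step.

```python
def als_rank1(ratings, reg=0.1, tol=1e-6, max_iter=100):
    """Toy rank-1 ALS. `ratings` maps (customer, product) -> observed rating."""
    customers = sorted({c for c, _ in ratings})
    products = sorted({p for _, p in ratings})
    u = {c: 1.0 for c in customers}  # one factor per customer
    v = {p: 1.0 for p in products}   # one factor per product

    def loss():
        # Squared error on observed ratings plus an L2 penalty on the factors.
        err = sum((r - u[c] * v[p]) ** 2 for (c, p), r in ratings.items())
        penalty = reg * (sum(x * x for x in u.values())
                         + sum(x * x for x in v.values()))
        return err + penalty

    history = [loss()]
    for _ in range(max_iter):
        # Turn 1: hold customers fixed, solve least squares for each product.
        for p in products:
            num = sum(r * u[c] for (c, q), r in ratings.items() if q == p)
            den = reg + sum(u[c] ** 2 for (c, q) in ratings if q == p)
            v[p] = num / den
        # Turn 2: hold products fixed, solve least squares for each customer.
        for c in customers:
            num = sum(r * v[p] for (d, p), r in ratings.items() if d == c)
            den = reg + sum(v[p] ** 2 for (d, p) in ratings if d == c)
            u[c] = num / den
        history.append(loss())
        # Stop once the improvement is smaller than the predefined delta.
        if history[-2] - history[-1] < tol:
            break
    return u, v, history


# Hypothetical purchase ratings, invented for the example.
ratings = {
    ("ana", "book"): 5, ("ana", "toy"): 1,
    ("bea", "book"): 4, ("bea", "game"): 2,
    ("caio", "toy"): 2, ("caio", "game"): 4,
}
u, v, history = als_rank1(ratings)
# Predicted score for a pair that was never observed.
predicted = u["ana"] * v["game"]
```

Each turn solves its sub-problem exactly, so the loss recorded in `history` never goes up from one iteration to the next. The cost comes from repeating both turns over every customer and product until the change is small enough.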

At the time, Spark was revolutionary. Until then, there was only Hadoop, which ran all those iterations on disk. For each iteration, you had to wait for the results to be written to disk and then read back from disk. Spark was much quicker because those iterations all happened in the in-memory cache. In my thesis, I learned these technologies, implemented the algorithm myself, and had the results to prove that Spark was much faster than Hadoop. Furthermore, I learned how to program in Python, which was a great advantage for working as a data scientist.

On Leveling Up Her Programming Skills

During my undergraduate study, I had one course about algorithmic thinking and one or two mandatory classes about programming with C.

C is a very low-level programming language in which you must define everything. You even have to define the memory allocation you want for each variable. If you program in Python, you just need to say A = 10 and assign the variable A. Whereas in C, you have to declare the type, and with it the memory size, of each integer before assigning its value. I had a painful experience with programming because it was not fun.
Also, in C, we were using a UI that was not very friendly. Every time you finish typing a statement, you have to end it with a semicolon. If you do not put the semicolon, the code just does not run and gives you an error.

I started doing a lot of SQL at work, which is much simpler. I started to understand that programming does not always have to be painful. It can be useful in a way that you can avoid repeating a lot of stuff. If you have your code, then next time, you do not have to start from scratch. You can copy some of this stuff from before and make adjustments. Even working as a team, you can take things from GitHub repositories and work from that. You rarely start from scratch. This was a big difference I learned about programming compared to when I was in school.

When I went to do my Master's in Computer Science, I already had in mind the tricks from programming at work. I knew that I could choose a less complex programming language. That is why I chose Python instead of Java or Scala. I also learned about the aspect of doing collaborative work. We had to do group projects, where we split up who would write the code for each part of the project, and then we would all upload it to GitHub. With that, I could see other people's code and learn from it.

On Moving to Berlin

After finishing my Master's, I knew at some point that I would look for a job in another country. I did not have any preference. My sister spent the summer in Berlin and kept telling me how great a city it was, which made me curious to learn more. It happened by chance that a recruiter found me on LinkedIn and asked me to do Skype interviews for their startup. They were looking for people with my exact experience - working with data and predictive modeling for customer behavior. They offered me the job and sponsored my visa. After three months, I got the paperwork and moved to Berlin.

It was, for me, a great experience working in startups. It is a very dynamic environment with people from all over the world. Also, it was the first time I was working in English. I had some difficulties at first adapting to the different culture. Even though Berlin is a very multicultural city, you still need to learn German if you want to live in Germany. I can speak basic German but am not confident working in German.

On Data Science Specialization

I got an offer to work at MYTOYS GROUP's Analytics team. It is an established German company. Before that, I was working only for startups with small data teams - maybe me as the data scientist, a data engineer, some data analysts, or backend engineers helping with data engineering. Data science is very complex, with different skills involved. When working for startups, I learned a lot as I had the chance to wear all those hats - analytics, dashboards, pipelines, ML models, etc. I built a good foundation for the end-to-end data science work and started to know my strengths.

At this point, I wanted to continue learning but focus on specific tasks. It would be hard for me to do that while working for startups because they would keep requiring me to do all these different tasks. I wanted to focus on ML modeling, and the offer to move to MYTOYS gave me that. In a bigger company (but not that big), I still had the flexibility to learn different things, but I would have my own predefined role for predictive analytics and ML. I would have by my side data analysts creating dashboards and data engineers building data pipelines.

The way I like to think about data science as a discipline is similar to MMA (Mixed Martial Arts).

In data science, you have a lot of different skills that you need to combine to make it work - analytics, science, and engineering. In MMA, you have to learn different techniques from different combat sports. You will not perform well in the competition if you do not have all those skills. In that sense, they can be comparable because they are similar in complexity and the number of different skills.
But the difference is that data science is a team sport. You have all those skills in a team with multiple people complementing each other. In MMA, one fighter needs to dominate all those techniques to perform well. I enjoy watching MMA fights, but I would not be a professional MMA fighter, just a professional data scientist.

On Volunteering and Teaching

I started engaging in activities outside my workplace because I was looking for ways to level up. I decided to volunteer with Data Science for Social Good Berlin. They were working with NGOs in other fields through different data challenges. The work I did with them as a data ambassador was understanding the requirements: What are the questions the NGO needs to answer? Do they have the data that would allow data scientists to get that answer? I learned to put the customers first by understanding their needs and asking them about the data. Getting access to their data was not easy, given all the privacy concerns. I had to do the initial data cleaning and normalization so that other volunteers could work on the project in a weekend hackathon. In brief, I learned how to understand project feasibility and work with customers directly.

I also started teaching at a data science boot camp. I have been working with SQL since 2012 when I had my first role as a marketing analyst. But in 2015-16, when I started teaching, I really had to understand why I was doing this. I started questioning a lot and understanding the logic of SQL queries better, which helped me execute them more efficiently. I also had the chance to grow my network by getting to know the other teachers and students in the program. These are professionals whom I could ask for support. Networking is one of the better ways to grow your career because you will see other opportunities that you may not have heard of directly.

On The Founding Story of AI Guild

The first thing I noticed is that not many women were working in data science. Could I find other women working in the field? Until then, I had very few female colleagues at work, and I did not enjoy that. So I reached out to other networks to find other women in the field and see if I could learn from them to support more women entering the field. Initially, we had a networking group for women working in data science. We used to get together in a company that one of us worked for and have sessions about what it is like to be a woman in your company.

After that, we broadened this circle and found more people to work towards the same goal. Then we started talking about other challenges in data science, such as wanting to have more impact on businesses. There were many instances where we developed projects that would not be used. So we started sharing about the challenges of model deployment. Furthermore, data people sometimes did not have much support for career growth. That translated into career paths that were not moving forward.

Therefore, we started the AI Guild community for anyone working in data to find solutions for the issues above.

On Driving The #datalift Initiative

It was frustrating for practitioners and companies to invest in data science and not see the results. In many instances, it requires the company to shift from its previous culture to a data culture, understand data as the primary source of information, and leverage that advantage. During my career, when I was hired as one of the first data scientists in a company, there were no data storage or data pipelines. You have to ensure that the data is ingested, treated, normalized, and reliable. Then you can start building analytics and models on top of the data. You also have to monitor models and data to ensure the model is not drifting and the data follows the same distribution as during model training. All of that requires a change in thinking.
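One simple way to check whether live data still follows the training distribution is to compare the two samples of a feature directly. The sketch below is only an illustration in plain Python, not a description of any tooling used in the projects above: it computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions), and the simulated feature values and the alert threshold `DRIFT_THRESHOLD` are assumptions chosen for the example.

```python
import random
from bisect import bisect_right


def ks_statistic(train, live):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a, b = sorted(train), sorted(live)
    gap = 0.0
    # Both empirical CDFs are step functions that only change at sample
    # points, so checking every observed value finds the largest gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


rng = random.Random(0)
# Simulated values of a single numeric feature (assumption for the example).
training = [rng.gauss(0.0, 1.0) for _ in range(1000)]
live_same = [rng.gauss(0.0, 1.0) for _ in range(1000)]     # same distribution
live_shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # mean has moved

DRIFT_THRESHOLD = 0.15  # hypothetical cut-off; tune it per feature in practice

same_gap = ks_statistic(training, live_same)
shifted_gap = ks_statistic(training, live_shifted)
drift_detected = shifted_gap > DRIFT_THRESHOLD
```

In a running system, a check like this would typically be scheduled per feature, with an alert or a retraining job triggered when the gap stays above the threshold. Libraries such as SciPy provide the same statistic together with a p-value.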

Working with data is a different way of working that requires investment in technology and culture. Over the past decade, companies have better understood what they need to do to accommodate data science work. They have to accept the probabilistic nature of data science. They need to do many iterations in order to get better models.

#datalift provides a forum for practitioners to exchange, on a deep level, their learnings from building real-world machine learning and helping others. We do this via online and in-person events, where practitioners share their challenges and lessons learned in production use cases. AI Guild also offers a consulting business. We have a network of more than 1,400 practitioners who understand challenges in specific verticals, the level of maturity in data and AI adoption, and the possible use cases in production.

On Advancing #datacareer Talents And Experts

#datacareer provides 101 services to practitioners at whatever level they are in their careers.

For the early-career folks, we support them with resources to transition from academia to industry or from a different career to a data career.
On the other side of the spectrum, for the later-career folks with 5+ years of experience, we have an accreditation board through which other experts can validate their expertise and recognize them in specific domains (deep learning, product management for data products, etc.).