Internship Experience at Lio

Ananya Jain
Nov 28, 2022

Hello everyone! I have been interning as a Data Engineer at Lio for the past couple of months as part of my curriculum for the final year of my Engineering course at Bennett University. I am thrilled to share my learnings and experience working with the company.

About the company:

Lio is a free digital register and mobile spreadsheet app for small businesses, where they can manage their Udhaar register, register book, khata register, GST register, attendance register, and more. Users can record all their personal and business data in Lio, and it is 100% free, safe and secure. It works like an Excel sheet for all types of businesses and serves as a notebook, register, or record book for creating expense spreadsheets, customer sheets, attendance sheets, to-do lists, or daily activity logs. The app also works as a to-do register, debit register book, or credit register book. Businesses mainly use it to maintain customer books, order books, payment spreadsheets, stock sheets, income registers, and expense registers, while individuals use it to keep a record of personal activities.

Its tagline is: “One App For All Your Data.”

My role at Lio:

The main objective of my internship is to develop and improve the Business Intelligence platform so that it meets the analytical needs of the organisation. My work primarily revolves around the following responsibilities:

  • Understand business processes, identify the correct data sources and validate the quality of the data sources
  • Perform data analysis and develop data ingestion and transformation pipelines
  • Analyse and visualise the business impact of data products
  • Transform and migrate historical data loads to the analytics platform
  • Explore data and perform data analysis to support business requirements
  • Work with engineers to develop a high-quality data product
  • Document analytical methodologies used in the execution of data products

Key Concepts:

  • Data Warehouse Management
  • ETL Support and Maintenance
  • Big Data
  • Data Export pipelines
  • Google Cloud Platform
  • Scheduling Services

Flow Diagrams

Figure: Existing ingestion flow for event data

Figure: Improvement on existing event dataflow

Dataflow pipeline

Figure: Input/Output for the pipeline

Figure: A snippet of Dataflow Pipeline

Data Analytics on Mixpanel (Business Intelligence Platform)

Project Timeline:

September~

  • General GCP understanding, Source Code management
  • Development of a data load pipeline from BigQuery to the Mixpanel platform (an end-to-end pipeline using Jenkins, Compute Engine, Cloud Functions, Cloud Storage, and Python client libraries)
  • Historical bulk loads and data validation
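To give a feel for the BigQuery-to-Mixpanel load step, here is a minimal sketch in plain Python. It is not the code used at Lio: the column names (`user_id`, `event_name`, `event_ts`), the helper names, and the batch size are all assumptions made for illustration. It only shapes rows into event dictionaries (an event name plus a properties map carrying `distinct_id`, `time`, and `$insert_id`) and groups them into batches; the actual query and upload calls are left out.

```python
import hashlib
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical column names; the real warehouse schema is not shown in this post.
KEY_COLUMNS = ("user_id", "event_name", "event_ts")


def row_to_event(row: Dict[str, Any]) -> Dict[str, Any]:
    """Shape one warehouse row into an event dict (event name + properties)."""
    # A deterministic ID means re-running the same load yields the same ID,
    # which lets the receiving side drop duplicates.
    raw = f"{row['user_id']}|{row['event_name']}|{row['event_ts']}"
    insert_id = hashlib.md5(raw.encode("utf-8")).hexdigest()

    # Every non-key column is passed through as an event property.
    properties = {k: v for k, v in row.items() if k not in KEY_COLUMNS}
    properties["distinct_id"] = str(row["user_id"])
    properties["time"] = int(row["event_ts"])
    properties["$insert_id"] = insert_id
    return {"event": row["event_name"], "properties": properties}


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `size` items so each upload stays small."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    rows = [
        {"user_id": 7, "event_name": "sheet_created", "event_ts": 1669593600, "plan": "free"},
        {"user_id": 8, "event_name": "row_added", "event_ts": 1669593660, "plan": "free"},
        {"user_id": 7, "event_name": "row_added", "event_ts": 1669593720, "plan": "free"},
    ]
    events = (row_to_event(r) for r in rows)
    for batch in batched(events, size=2):  # size is an illustrative value
        print(len(batch), "events ready to upload")
```

The same two helpers could serve both the historical bulk load and a daily load, since only the set of input rows changes between them.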

October-November~

  • Set up daily incremental loads and validation processes
  • Create enhancements for two new datasets (CleverTap and Branch)
  • Set up a new pipeline to set/update user properties on Mixpanel
  • Evaluate and set up session ID lookups in Mixpanel using BigQuery tables
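The idea behind daily incremental loads and their validation can be sketched as follows. This is only an illustration under my own assumptions, not the production logic: the function names are hypothetical, the "watermark" is simply the last date that was fully loaded, and validation is reduced to comparing per-day row counts between source and destination.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def pending_days(last_loaded: date, today: date) -> List[date]:
    """Days after the watermark up to yesterday.

    Today is excluded because its data is assumed to be still arriving.
    """
    gap = (today - last_loaded).days - 1
    return [last_loaded + timedelta(days=i) for i in range(1, gap + 1)]


def count_mismatches(
    source_counts: Dict[date, int], loaded_counts: Dict[date, int]
) -> Dict[date, Tuple[int, int]]:
    """Return {day: (source_rows, loaded_rows)} for days that do not match."""
    return {
        day: (expected, loaded_counts.get(day, 0))
        for day, expected in source_counts.items()
        if loaded_counts.get(day, 0) != expected
    }


if __name__ == "__main__":
    days = pending_days(last_loaded=date(2022, 11, 25), today=date(2022, 11, 28))
    print("days to load:", days)

    source = {date(2022, 11, 26): 1200, date(2022, 11, 27): 980}
    loaded = {date(2022, 11, 26): 1200, date(2022, 11, 27): 975}
    print("mismatched days:", count_mismatches(source, loaded))
```

Keeping the watermark as a plain date makes a failed run easy to retry: the next run recomputes the same pending days and loads them again.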

December~

  • Enhance the existing event flattening pipeline to include user property calculations and metric calculations
  • ETL support and maintenance, cost optimisation
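As a rough illustration of what "event flattening" with user property calculations means, here is a small sketch in plain Python. The real pipeline runs on Dataflow and is not reproduced here; the nested event shape, the field names (`user`, `id`, `ts`), and the two computed properties are assumptions chosen only to show the idea.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def flatten(record: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Turn nested dicts into one flat dict, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def user_properties(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Compute simple per-user properties from raw (nested) events."""
    props: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"event_count": 0, "last_seen": 0}
    )
    for event in events:
        flat = flatten(event)
        user = props[flat["user_id"]]  # "user_id" comes from flattening user -> id
        user["event_count"] += 1
        user["last_seen"] = max(user["last_seen"], flat["ts"])
    return dict(props)


if __name__ == "__main__":
    raw_events = [
        {"user": {"id": "u1", "geo": {"city": "Delhi"}}, "event": "row_added", "ts": 100},
        {"user": {"id": "u1", "geo": {"city": "Delhi"}}, "event": "sheet_created", "ts": 250},
        {"user": {"id": "u2", "geo": {"city": "Pune"}}, "event": "row_added", "ts": 180},
    ]
    print(flatten(raw_events[0]))
    print(user_properties(raw_events))
```

Computing the properties from the flattened record keeps the two steps decoupled, so new metrics can be added without touching the flattening logic.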

Tools and Technologies:

  • Google Cloud Platform and its various components, such as BigQuery, Dataflow, App Engine, Cloud Functions, and similar Cloud services
  • Python for scripting
  • Jenkins for scheduling Dataflow pipelines

Learnings and Experience:

1. New technologies:

  • Google Cloud Platform: Cloud Functions, BigQuery, Pub/Sub, Cloud Dataflow, etc.
  • Scheduling platform: Jenkins

2. Data Engineering:

  • Data validation, Data cleaning, Data modelling, Data analytics

3. Design choices:

  • Evaluating multiple design choices to develop a more efficient and generic solution, and writing code that is future-proof

It has been a delight to work here at Lio. This internship has enabled me to learn new technologies and grow in the Data Engineering space alongside advancing my analytical and problem-solving skills.

Thank you for investing your time!
