Kareem Baba Shaik

Machine Learning Engineer & AI Researcher

Building intelligent systems that solve real-world problems through machine learning, deep learning, and data science.

About Me

Professional Summary

Motivated and hands-on professional with strong experience in machine learning, data science, and AI engineering through academic research, internships, and independent projects. Skilled in building end-to-end ML pipelines, deploying deep learning models, and performing advanced data analysis across cloud-native and real-time environments.

Adept at applying tools like Python, TensorFlow, PyTorch, and AWS to solve practical problems in healthcare, scientific research, and education. Actively seeking entry-level roles as a Machine Learning Engineer, AI Engineer, or Data Scientist to apply technical expertise and contribute to impactful, data-driven solutions.

Contact Information

Email

KareemShaikBaba@gmail.com

Website

KareemShaik.com

Location

Denton, TX

Professional Experience

AI Researcher

Responsible AI Lab

Mar 2024 – Jan 2025 Denton, TX
  • Published and presented original research at the ICCS 2024 conference in Spain, exploring LLM-driven software analysis. [arXiv Paper]
  • Collaborated with scientists from Argonne and Oak Ridge National Labs to analyze scientific codebases, with a focus on the E3SM climate model.
  • Built LLM-based pipelines using LLaMA, GPT-3.5, MPT, Falcon, and Claude to extract and summarize complex documentation and metadata.
  • Designed Transformer-based systems (BERT, RoBERTa) for entity recognition, citation tracking, and summarization of research articles.
  • Trained sequential models (LSTM, GRU) using up to 8 NVIDIA GPUs, leveraging CUDA programming and NVIDIA APIs for high-performance optimization.
  • Created Power BI dashboards to visualize simulation trends, performance metrics, and model evaluations.
  • Built ETL workflows to handle large volumes of unstructured research and simulation data, enabling smoother experimentation and reproducibility.
  • Guided graduate students on topics like model fine-tuning, data pipelines, and documentation practices within cloud-native workflows using AWS SageMaker, Docker, and CI/CD.

Environment:

Python TensorFlow PyTorch Keras CUDA NVIDIA APIs Hugging Face AWS SageMaker Power BI MLOps

Machine Learning Engineer

Medcords

Dec 2021 – Nov 2022 Hyderabad, IN
  • Designed and implemented a machine learning system to predict cancer outcomes, helping doctors improve diagnosis and treatment planning.
  • Applied regression, KNN, and ensemble models in Python (Scikit-learn, Pandas, NumPy) to improve prediction accuracy and reduce clinical decision lag.
  • Built interactive dashboards and tools using React and Angular, allowing medical staff to visualize patient insights clearly and in real time.
  • Created and integrated RESTful APIs to connect front-end interfaces with backend analytics engines, enabling seamless access to predictive results and raw data.
  • Automated ETL workflows with AWS Glue and Lambda to pull data from multiple sources, store it in S3, and push it into Oracle DB for use in real-time dashboards.
  • Worked with PySpark and Spark SQL on Databricks to process large healthcare datasets; optimized cluster performance for speed and cost efficiency.
  • Led the migration of SQL Server workloads to Azure cloud and connected Azure Machine Learning with MS R Server for cloud-based analysis.
  • Collaborated with statisticians and business analysts, writing complex SQL scripts and building Tableau dashboards to support clinical and business insights.

Environment:

Scikit-learn PySpark AWS Azure Kafka Snowflake Tableau React Angular REST APIs

Founder (Instructor, Bootcamp Trainer)

Bootcamp Startup (learnbaba.in)

Jul 2020 – Oct 2021 Hyderabad, IN
  • Founded and ran a coding bootcamp that operated in person across Hyderabad, Nalgonda, and Miryalaguda, earning over ₹50,000/month in revenue.
  • Taught more than 1,000 hours of live, hands-on classes in Python, Java, C/C++, Data Structures & Algorithms (DSA), full-stack web development, Android development, and introductory data science and machine learning with the supporting math.
  • Built and launched custom student registration portals, leading to hundreds of student sign-ups from schools and engineering colleges.
  • Organized and led weekend bootcamps and two-day technical workshops, focusing on practical skills and real-world projects.
  • Personally guided students aged 12 to 22 through project-based learning, helping them overcome their fear of programming and build confidence in problem-solving.
  • Designed a project-based curriculum and supported learners in building industry-level applications to strengthen their resumes.
  • Recruited, trained, and mentored a small team of instructors to ensure quality teaching and student support across multiple batches.

Technical Skills

Programming Languages

Python C++ JavaScript TypeScript SQL

ML & AI Frameworks

PyTorch TensorFlow Keras Scikit-learn Hugging Face Caffe

Deep Learning & NLP

Transformers LSTM GRU Text Classification Sentiment Analysis LLM Fine-tuning

Data Science & Analytics

Predictive Modeling Time-Series Statistical Analysis Document Parsing Data Imputation

Big Data & Cloud

AWS Azure Apache Spark PySpark Kafka Snowflake

MLOps & DevOps

Docker Jenkins Git CI/CD Model Deployment

Education

Master's in Artificial Intelligence

University of North Texas

Jan 2023 – Dec 2024 Denton, TX

Bachelor of Technology (B.Tech) in Computer Science

Malla Reddy Engineering College

Aug 2018 – Jun 2022 Hyderabad, India

Machine Learning Blog

Sharing insights, tutorials, and research in machine learning, deep learning, and AI engineering.

June 15, 2024 10 min read

Fine-tuning LLMs for Scientific Code Analysis

Exploring techniques to adapt large language models for analyzing and summarizing complex scientific codebases like climate models.

May 28, 2024 8 min read

Building Predictive Models for Healthcare

Lessons learned from developing machine learning systems for cancer outcome prediction in clinical settings.

April 10, 2024 12 min read

MLOps Best Practices for Model Deployment

A practical guide to implementing CI/CD pipelines, monitoring, and versioning for machine learning models in production.
