EDA and Revenue Prediction

Overview

This project began as a simple Exploratory Data Analysis (EDA) exercise, focusing on basic Python skills and data visualization techniques. As the project evolved, it expanded into preparing datasets, creating predictive models, comparing model performance, and ultimately predicting movie revenue.

Challenges

One significant challenge was the relationship between the target variable and the predictors. Many predictors, such as movie ratings, review counts, and popularity, are only generated after a movie's release. A model trained on them relies on information that would not exist when predicting revenue for a movie that hasn't been released yet. Training on the full dataset is still useful for understanding what correlates with revenue, but the resulting model wouldn't perform well for pre-release predictions.
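One way to guard against this is to drop post-release fields before training. The following is a minimal sketch, not the project's actual code: the column names (vote_average, vote_count, popularity, revenue) and the sample row are assumptions made up for illustration, and the real dataset's columns may differ.

```python
# Hypothetical column names; the real dataset's columns may differ.
POST_RELEASE_COLUMNS = {"vote_average", "vote_count", "popularity"}
TARGET_COLUMN = "revenue"


def pre_release_features(row: dict) -> dict:
    """Return only the fields that would be known before a movie's release."""
    excluded = POST_RELEASE_COLUMNS | {TARGET_COLUMN}
    return {name: value for name, value in row.items() if name not in excluded}


# Made-up row, for illustration only.
movie = {
    "budget": 150_000_000,
    "runtime": 120,
    "genre": "Action",
    "vote_average": 7.1,
    "vote_count": 5400,
    "popularity": 88.2,
    "revenue": 500_000_000,
}

features = pre_release_features(movie)
print(features)
```

Keeping the excluded columns in one named set makes it easy to review which predictors are treated as unavailable in advance.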

Future Aspirations

To create a more accurate model for predicting the revenue of an unreleased movie, additional data that is available before release would be essential. This includes marketing data, particularly ad expenditure, as a measure of audience exposure. Incorporating these factors would enhance the model's predictive power for pre-release scenarios.

Learning Outcomes

The main objective of using this dataset was to practice and demonstrate my abilities in data visualization, data preparation, and predictive modeling. Throughout the project, I:

  • Conducted thorough EDA to understand data distributions and relationships.

  • Prepared the dataset for model training and testing.

  • Developed various predictive models and compared their performance.

  • Analyzed model predictions and identified areas for improvement.
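
The model comparison step can be sketched roughly as follows. This is an illustration under stated assumptions, not the models used in the project: it uses synthetic data, a single hypothetical "budget" feature, a mean-value baseline, and a one-feature least-squares regression, and it scores both on a held-out split with RMSE.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_mean(xs, ys):
    """Baseline model: always predict the mean training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """One-feature least-squares regression: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Synthetic data for illustration only: revenue is roughly 3x budget plus noise.
rng = random.Random(0)
budgets = [rng.uniform(1, 200) for _ in range(200)]
revenues = [3 * b + rng.gauss(0, 20) for b in budgets]

# Hold out the last 20% of rows so both models are scored on unseen data.
split = int(0.8 * len(budgets))
x_train, x_test = budgets[:split], budgets[split:]
y_train, y_test = revenues[:split], revenues[split:]

scores = {}
for name, fit in {"mean baseline": fit_mean, "linear regression": fit_line}.items():
    model = fit(x_train, y_train)
    scores[name] = rmse(y_test, [model(x) for x in x_test])

# Lower RMSE is better, so the best model is printed first.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.1f}")
```

Including a trivial baseline gives each candidate model a reference point: a model that cannot beat the mean prediction is not learning anything useful from its features.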

Conclusion

This project showcases my capability to handle real-world data, identify challenges, and adapt my approach to improve model performance. It highlights my expertise in Python, data analysis, and predictive modeling, and serves as a testament to my continuous learning and skill development in data science.

Movie Recommendation App

Overview

In this section, I developed a Movie Recommendation App utilizing Natural Language Processing (NLP) techniques. The goal was to create an application that, given a movie title, could recommend similar movies based on their summaries.
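
A common way to compare summaries is TF-IDF weighting with cosine similarity. The sketch below assumes that technique purely for illustration; the specific NLP method used in the app is not stated here. The titles and summaries are made up, and the function names (tokenize, tfidf_vectors, cosine, recommend) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a summary and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(summaries):
    """Map each title to a {word: tf-idf weight} dictionary."""
    tokens = {title: tokenize(text) for title, text in summaries.items()}
    n_docs = len(tokens)
    # Document frequency: in how many summaries each word appears.
    doc_freq = Counter(word for words in tokens.values() for word in set(words))
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        vectors[title] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, summaries, top_n=3):
    """Return the top_n (title, similarity) pairs most similar to the given title."""
    vectors = tfidf_vectors(summaries)
    query = vectors[title]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != title]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up titles and summaries, for illustration only.
summaries = {
    "Street Racers": "A crew of street racers pulls off a daring heist with fast cars.",
    "Street Racers 2": "The crew of street racers returns for another heist with faster cars.",
    "Deep Space": "Astronauts explore a distant planet and fight to survive in space.",
    "Quiet Garden": "A widow restores a garden and finds friendship in a small village.",
}

print(recommend("Street Racers", summaries, top_n=2))
```

In this toy data the sequel ranks first for the original, since the two summaries share most of their distinctive words.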

Challenges and Solutions

One challenge with this approach was the redundancy of recommendations. For instance, because the Fast & Furious franchise contains so many movies, the app tended to recommend other Fast & Furious movies whenever any one of them was entered. To create a more exploratory recommendation system akin to a "you might also like" algorithm, I incorporated small weights for review scores, the number of reviews, and popularity metrics found in the dataset. This adjustment introduced some variety, but it required fine-tuning. Adding too much weight to any single metric produced recommendations that simply reflected the top movies in that category, which was not ideal.
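
The weighting trade-off described above can be illustrated with a small sketch. Everything in it is an assumption for demonstration: the candidate movies, their numbers, the field names (similarity, rating, review_count, popularity), the min-max normalization, and the weight values are all made up and are not the values used in the app.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range (all zeros if constant)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def blended_ranking(candidates, weights):
    """Rank candidates by a weighted sum of similarity and normalized metrics."""
    metrics = ("rating", "review_count", "popularity")
    # Normalize each metric so that no metric dominates because of its scale.
    normalized = {m: min_max([c[m] for c in candidates]) for m in metrics}
    ranked = []
    for i, c in enumerate(candidates):
        score = weights["similarity"] * c["similarity"]
        for m in metrics:
            score += weights[m] * normalized[m][i]
        ranked.append((c["title"], score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up candidates for one input movie, for illustration only.
candidates = [
    {"title": "Sequel A", "similarity": 0.90, "rating": 6.0, "review_count": 1000, "popularity": 40.0},
    {"title": "Sequel B", "similarity": 0.85, "rating": 5.5, "review_count": 800, "popularity": 35.0},
    {"title": "Heist Film", "similarity": 0.60, "rating": 8.2, "review_count": 4000, "popularity": 70.0},
    {"title": "Blockbuster", "similarity": 0.10, "rating": 8.5, "review_count": 9000, "popularity": 95.0},
]

# Light weights nudge a well-reviewed, related film above a weaker sequel.
light = {"similarity": 1.0, "rating": 0.15, "review_count": 0.15, "popularity": 0.15}
# Heavy weights let the metrics overwhelm similarity entirely.
heavy = {"similarity": 1.0, "rating": 1.0, "review_count": 1.0, "popularity": 1.0}

print([title for title, _ in blended_ranking(candidates, light)])
print([title for title, _ in blended_ranking(candidates, heavy)])
```

With the light weights, the related heist film moves ahead of the second sequel while the unrelated blockbuster stays last. With the heavy weights, the blockbuster jumps to the top despite its low similarity, which is the failure mode described above.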

Future Enhancements

To further enhance this app, I could connect it to a real-time data source that tracks what people are watching and correlate that activity with the input movie, which would provide more dynamic recommendations. Adjusting the weights with a K-Nearest Neighbors (KNN) algorithm could also make the recommendations more interesting. As long as the app relies on static data, however, the algorithm itself remains static and does not learn from new data over time.

Learning Outcomes

This project allowed me to:

  • Apply NLP techniques for text similarity and feature extraction.

  • Experiment with weighting different metrics to improve recommendation diversity.

  • Identify the limitations of static data and the potential benefits of real-time data integration.

Conclusion

The Movie Recommendation App showcases my ability to apply NLP techniques to practical problems and iteratively improve solutions. It highlights my skills in data analysis, algorithm development, and the deployment of applications, including hosting on platforms like Heroku. This project serves as a testament to my capability in building, refining, and deploying dynamic and effective data-driven applications.