As promised, here is part 2 of my web scraping story. Up to this point, my experience with web scraping had come from APIs and wrappers that collected what I wanted from different websites; on this adventure, however, I found that those tools were not available. As I approached the end of my Data Science cohort, I began work on my capstone project. I settled on a hotel recommendation system that would generate recommendations from user-generated text input and compare it against the text descriptions on hotel websites. For the project I scraped over 21,000 hotel web pages from some of the largest hotel companies in the world. In this article I will describe my experience with one of those companies, using the Chrome web browser and its developer tools, and Beautiful Soup with requests. …
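The basic requests-plus-Beautiful-Soup workflow described above can be sketched as follows. This is a minimal illustration, not the article's actual code: the URL argument and the `div.hotel-description` selector are placeholders, since the real class name would be found by inspecting a given hotel page in Chrome's developer tools.

```python
import requests
from bs4 import BeautifulSoup

def parse_description(html):
    """Extract the hotel description text from a page's HTML.
    'div.hotel-description' is a placeholder selector -- the real
    class name comes from inspecting the page in Chrome dev tools."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("div.hotel-description")
    return tag.get_text(strip=True) if tag else None

def scrape_hotel(url):
    """Download a single hotel page and parse its description."""
    headers = {"User-Agent": "Mozilla/5.0"}  # some sites reject the default requests agent
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return parse_description(response.text)
```

Running `scrape_hotel` over a list of hotel page URLs, with the selector adjusted per brand, is the general shape of a scrape like the one described.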

It’s just now midnight, and I’m on my third shift back at work in my old field of hospitality. If you read my previous articles, you will know that I was laid off from my position as a General Manager of a hotel in March. Long story short, I am now back working as a Night Auditor at another property, for what appears to be a temporary assignment. I have neglected to write anything regarding my data science journey for almost two months now. This piece is focused on the topic of web scraping.

During the middle of my Data Science Immersive bootcamp, we were introduced to web scraping as we prepared to build a model that would predict which subreddit a particular post came from. In those lessons, we covered a brief introduction to HTML and then used Beautiful Soup to perform our scraping. For this project, I used the Pushshift API to collect posts from two subreddits. The main limitation I encountered was that each request returned at most 100 posts, which was not enough data to train a model. To address this, I wrote a function that used the ‘time’ package to wait 2 seconds between requests, a while loop to run a set number of iterations, and a variable to restart each iteration at the last post collected. On each pass through the loop I replaced the value of that variable with the UTC timestamp of the last post collected in the scraping, and used it to determine where the next iteration should begin. Ultimately I was able to collect all of the data for my particular project using the function in the picture below. …
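The looping pattern described above can be sketched like this. It is a reconstruction of the approach, not the function from the picture: the function name and the `total_posts` parameter are my own, and it assumes the Pushshift submission-search endpoint with its `size` and `before` parameters.

```python
import time
import requests

def scrape_subreddit(subreddit, total_posts=2000, pause=2):
    """Collect posts from a subreddit via the Pushshift API,
    100 at a time, paginating backwards by UTC timestamp."""
    url = "https://api.pushshift.io/reddit/search/submission"
    posts = []
    before = None  # restart each iteration at the last post collected
    while len(posts) < total_posts:
        params = {"subreddit": subreddit, "size": 100}
        if before is not None:
            params["before"] = before
        response = requests.get(url, params=params)
        batch = response.json().get("data", [])
        if not batch:  # no more posts available
            break
        posts.extend(batch)
        before = batch[-1]["created_utc"]  # UTC of the last post collected
        time.sleep(pause)  # wait 2 seconds between requests
    return posts
```

Each iteration passes the previous batch's last timestamp as `before`, so the next request picks up exactly where the last one left off.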

Do I know what I’m doing? Will I ever get this? I won’t make any sense when I try to explain this. Apparently, this is what people call “imposter syndrome.” I had never actually heard the term until the second day of my Data Science cohort. It came up in a conversation in which the instructor asked, “What do YOU need to be successful?” Of course, people gave the usual answers: clear expectations, minimal distractions, patience, constructive feedback, and COFFEE. I responded with “willingness to fail, but learn.” But the response that gained the most attention and generated the most discussion was “limiting imposter syndrome.” It immediately caught on, with people chiming in one after another. But what is imposter syndrome? To be honest, my immediate thought was, “oh wow, they are on the lookout for people trying to fake their way through this, or fake it til you make it.” …

“Ain’t it fun, Living in the real world? Ain’t it good, Being all alone?” These are the (obviously sarcastic) lyrics from the song “Ain’t It Fun” by Paramore.

March 14, 2020, was the day that the original COVID-19 travel restrictions to and from Europe went into effect for the United States. That same day my wife and I were returning from Greece. We are citizens of the USA, so there was no restriction on our return home. Still, the day of travel was somewhat surreal: normally bustling airports in Athens, Dusseldorf, and London were sparse, with more employees than travelers. I immediately knew everything was going to change. I thought I had been through this before and would know how to navigate the situation; I entered the workforce after college at the start of the early-2000s recession and quickly experienced the difficulty of layoffs and job uncertainty. Fast forward to 2019: my wife and I had each been in the hospitality industry for nearly 10 years. My career began as an hourly front desk agent, and within 4 years I was developing and opening new hotel properties. We moved across the country for an extraordinary new opportunity, and just over a year later, in 2020, I saw the writing on the wall. It came out of nowhere: overnight, travel and business completely stopped. I began to monitor the future reservations for the next month; over the next week, the hotel I managed reached a peak of losing 1,000 reservations in less than 12 hours (the equivalent of $300,000 in revenue every 24 hours).

Chris Johnson