3 Levels of a Digital Humanities Project
Our team worked with the datasets provided by InsideAirbnb, which are sourced from publicly available information on the Airbnb website. The data on InsideAirbnb has been cleansed and aggregated to facilitate analysis. InsideAirbnb is an ideal source for our project because its stated goal is to demonstrate how Airbnb listings disrupt housing and communities, and it derives its data directly from Airbnb's website and information pages. In his introductory article, Cairo argues that data extracted from multiple sources should stay connected through a common root: the sources must tie together and work synergistically to support each point we make. We decided to analyze data from five major cities in the United States: Asheville, North Carolina; Austin, Texas; Denver, Colorado; Los Angeles, California; and New York City, New York. Because we wanted an accurate representation of different parts of the U.S., we chose these locations based on each city's populated area, its diverse communities, and its geographic location. The company's exponential growth has taken the hospitality industry and many cities by surprise, which motivated us to address the detrimental effects Airbnb has on some of the areas where it operates. InsideAirbnb provides data in a clean format that we could easily use and manipulate toward our goal of demonstrating the negative aspects of Airbnb.
We furthered our investigation by reviewing academic journals, news articles, and additional research conducted by others. Chapter 5 of Nathan Yau's text made it clear that every source we drew from needed corroborating material behind it, since the articles we based our arguments on were mainly single-author sources. We drew on academic journals and news outlets from reliable databases such as JSTOR, ScienceDirect, and Taylor & Francis Online, which we accessed through the UCLA Library and Google Scholar. These journals and articles gave us deeper insight into Airbnb than news articles alone could. Furthermore, Chapter 1 of Yau's text made us realize how important it is to look more deeply into our sources, which is why we did extra research to back up our original sources and let them work synergistically. Overall, our research found that many scholars agree Airbnb is coming to dominate the market, and they support this claim with extensive data and research.
After collecting the relevant Airbnb datasets from the InsideAirbnb listings site, we used Microsoft Excel to clean the data of irrelevant information. Chapter 4 of Turabian's text emphasizes data cleaning and notes that sources are less credible when cleaning is not done early on. We therefore treated data cleaning not as an option but as a requirement, both to give our sources extra validity and to make our visualizations easier to understand. Data cleaning was an essential step in preparing our visualizations because the InsideAirbnb datasets contain every Airbnb listing in every city, which amounts to hundreds of thousands of homes. While cleaning, we also found many duplicate listings that could skew the data and lead us to false conclusions, as well as information that was not relevant to our research.
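We did this cleaning by hand in Excel; the sketch below shows an equivalent, scriptable version of the same steps in Python with pandas. The file name, the set of columns kept, and the exact filters are illustrative assumptions rather than our actual workbook.

```python
import pandas as pd

# Load one city's InsideAirbnb listings export (file name is an assumption).
listings = pd.read_csv("asheville_listings.csv")

# Keep only the columns relevant to our questions (illustrative subset).
columns_of_interest = ["id", "name", "description", "neighbourhood", "room_type", "price"]
listings = listings[columns_of_interest]

# Drop duplicate listings, which would otherwise skew counts.
listings = listings.drop_duplicates(subset="id")

# Remove rows missing the fields we analyze.
listings = listings.dropna(subset=["neighbourhood", "room_type", "price"])

listings.to_csv("asheville_listings_clean.csv", index=False)
```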
Using the listings data provided by InsideAirbnb, we compiled the CSV files in Microsoft Excel and Google Sheets to organize and evaluate our data. From there, we used several visualization tools to model it: Tableau, Rawgraphs.io, and Excel's graphing functions, which we used to make our bar charts.
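Before loading the spreadsheets into the visualization tools, we combined the per-city files into a single table. A minimal sketch of that step, assuming cleaned CSVs named after each city and a room_type column as in the InsideAirbnb exports, might look like this:

```python
import pandas as pd

cities = ["asheville", "austin", "denver", "los_angeles", "new_york_city"]

frames = []
for city in cities:
    df = pd.read_csv(f"{city}_listings_clean.csv")  # assumed file naming
    df["city"] = city.replace("_", " ").title()
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# Counts of listings per city and room type, ready to graph as a bar chart.
summary = (
    combined.groupby(["city", "room_type"])
    .size()
    .reset_index(name="listing_count")
)
summary.to_csv("listings_by_city_and_room_type.csv", index=False)
```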
The listings data from InsideAirbnb contains detailed descriptions of every home. We collected the description of every home in each of the cities we investigated (Asheville, Austin, Denver, Los Angeles, New York City) and ran all of the descriptions through WordCloud, an online word and tag cloud generator. In their article, Presner and Shepard explicitly state how important it is not to leave out any information present in the sources. This is why we were careful to extract every data point from the sources, so that no human bias would enter into how we display our findings: we simply display the raw data from the sources, so there are no discrepancies when the visualizations are created. Before generating the clouds, a good deal of cleaning was needed to remove words that were not relevant. WordCloud then arranged the information so that the most common words appear in the largest font and the least common words in the smallest. After generating a word cloud for every city, we used Canva to place them on an outline of the United States relative to their locations.
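We used an online generator, but the same idea can be sketched with the open-source Python wordcloud package; the extra stop-word list and file names below are assumptions standing in for the manual cleaning described above.

```python
from wordcloud import WordCloud, STOPWORDS
import pandas as pd

# Words we treat as noise in listing descriptions (illustrative list).
extra_stopwords = {"apartment", "home", "stay", "guest", "guests", "br"}
stopwords = STOPWORDS.union(extra_stopwords)

# Assumed cleaned listings file with a "description" column.
listings = pd.read_csv("asheville_listings_clean.csv")
text = " ".join(listings["description"].dropna().astype(str))

# Most frequent words are drawn largest, least frequent smallest.
cloud = WordCloud(
    width=1200,
    height=800,
    background_color="white",
    stopwords=stopwords,
).generate(text)

cloud.to_file("asheville_wordcloud.png")
```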
Figma was used to create medium-fidelity wireframes that helped us determine how our research would be organized on the page, as well as providing a starting point for the site's CSS.
We presented our research as a website built from custom HTML and CSS. Reading Chapter 1 of Kate Turabian's A Manual for Writers of Research Papers, Theses, and Dissertations helped us recognize that the premise of research is to present findings in a way that is accessible to everyone: whether or not readers have background knowledge of the topic, they should be able to understand the work fully and not feel excluded while reading. The website was built in the open-source repl.it coding environment and hosted on GitHub Pages. We based our data visualization choices on Nathan Yau's Chapter 5, "Visualizing with Clarity," because it emphasizes legibility and felt like the best way to connect with readers and help them comprehend our data. We went with a design that is easy to navigate, and for the most part we paired black text with a white background for readability. The blue shade we used as our theme throughout the website has a WCAG Double-A compliant contrast ratio against the background, to be inclusive of visually impaired audiences. Our embedded visualizations come from Tableau, and users can hover over elements within them to see the numbers in more detail.
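The Double-A threshold is a concrete calculation: WCAG defines a contrast ratio from the relative luminance of the two colors, and normal-size text needs a ratio of at least 4.5:1. The sketch below checks an assumed blue (#1B4F8A is a placeholder, not our exact theme color) against a white background.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1:1 to 21:1."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Placeholder theme blue vs. white background; AA for normal text requires >= 4.5.
ratio = contrast_ratio("#1B4F8A", "#FFFFFF")
print(f"{ratio:.2f}:1 -> {'passes' if ratio >= 4.5 else 'fails'} WCAG AA")
```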