The last decade has seen a huge increase in the size of the web, which now exceeds 55 billion indexed pages [1]. At the same time, internet users are projected to reach 5.3 billion (66% of the world's population) by 2023 [2]. This boom has created a huge pool of data that researchers can use to identify trends and patterns worldwide. But each website carries information beyond its content: the language (or languages) in which that content is written, the country in which the site is registered, and parameters relating to its aesthetics. The color palette, the images, the typography, and the placement and number of the various elements all constitute valuable data for the researcher, as conclusions drawn from a very large sample have a very high probability of being accurate.

In addition, online repositories (such as the Wayback Machine [3]) maintain a digital archive of approximately 549 billion web pages. More importantly, these archives preserve instances of each page as it was at various timestamps in the past. Anyone can visit the repository and view the form and content of a website from over a decade ago. These instances may have some restrictions due to technical limitations, but they generally give a very accurate representation of the state the website was in during that period. The researcher is thus offered a unique opportunity to collect data from the past and study trends and patterns over time on a global scale.
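As a concrete illustration, the Internet Archive exposes two public endpoints for exactly this kind of retrieval: the Availability API, which returns the snapshot closest to a given date, and the CDX API, which lists captures over a time range. The sketch below queries both; "example.com" and the date values are placeholders, and error handling is kept minimal.

```python
# Minimal sketch: retrieving historical snapshots from the Wayback Machine.
# Both endpoints are documented Internet Archive APIs; the example URL and
# timestamps are placeholders chosen for illustration.
import json
import urllib.request

def closest_snapshot(url: str, timestamp: str) -> str | None:
    """Return the snapshot URL closest to a YYYYMMDD timestamp,
    via the Availability API."""
    query = f"https://archive.org/wayback/available?url={url}&timestamp={timestamp}"
    with urllib.request.urlopen(query) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

def yearly_snapshots(url: str, start: int, end: int) -> list[tuple[str, str]]:
    """List roughly one capture per year via the CDX API,
    as (timestamp, original URL) pairs."""
    query = (
        "https://web.archive.org/cdx/search/cdx"
        f"?url={url}&output=json&fl=timestamp,original"
        f"&from={start}&to={end}&collapse=timestamp:4"  # collapse on year digits
    )
    with urllib.request.urlopen(query) as resp:
        rows = json.load(resp)
    return [tuple(row) for row in rows[1:]]  # first row is the field header

if __name__ == "__main__":
    print(closest_snapshot("example.com", "20100101"))
    for ts, original in yearly_snapshots("example.com", 2005, 2015):
        print(ts, original)
```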
The fact that websites are increasingly a field of artistic expression makes them an ideal source of information related to aesthetics. Combined with the modern technological tools available to the researcher, this information can provide significant insights. The fact, for example, that the aesthetics of past websites reflected the artistic expression of the developers of each era gives us a very good picture of the trends that prevailed at the time. The collection of such a large volume of data (big data) could not, of course, be carried out by humans, but the power of modern computer systems and artificial intelligence enables us to gather and process very large amounts of data from many different sources. The fact that data extracted from websites is already in digital form is a significant advantage for its collection: its very nature makes it ideal for harvesting by computer systems through properly designed algorithms. In the present study we evaluate online repositories that provide website archival information, as well as the various tools and application programming interfaces they offer to the researcher. In addition, we analyse the various types of data that can be extracted by parsing this information with computer algorithms and propose ways in which the researcher can utilize them.
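As an illustration of this kind of parsing, the following sketch extracts a few aesthetics-related features from a fetched snapshot: the declared content language, the number of images, and any inline CSS color values. It uses only the Python standard library; the feature set and function names are illustrative choices, not a fixed API.

```python
# Minimal sketch: extracting simple aesthetic features from an archived page.
# The features chosen here (language, image count, inline colors) are
# examples of the data types discussed above, not an exhaustive list.
import re
import urllib.request
from html.parser import HTMLParser

COLOR_RE = re.compile(r"#[0-9a-fA-F]{3,8}\b|rgb\([^)]*\)")

class FeatureParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.lang: str | None = None
        self.image_count = 0
        self.colors: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.lang = attrs["lang"]          # declared content language
        elif tag == "img":
            self.image_count += 1              # rough density of imagery
        if attrs.get("style"):
            self.colors += COLOR_RE.findall(attrs["style"])

def extract_features(snapshot_url: str) -> dict:
    """Fetch a snapshot and return a small dictionary of page features."""
    with urllib.request.urlopen(snapshot_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = FeatureParser()
    parser.feed(html)
    return {"lang": parser.lang,
            "images": parser.image_count,
            "inline_colors": parser.colors}
```

One practical caveat: Wayback Machine snapshot pages include the archive's own navigation markup, so for measurements like these it is preferable to request the raw capture by appending the "id_" modifier to the timestamp in the snapshot URL.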
References:
1. World Wide Web Size, https://www.worldwidewebsize.com/
2. Cisco Annual Internet Report (2018-2023) White Paper, https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html
3. Internet Archive Wayback Machine, https://archive.org/web/