Working with data and data platforms - For beginners and advanced data analysts
Discover how computers and data platforms can make working with data more effective. Our series offers beginners and advanced data analysts exciting insights - from data acquisition to the final visualization.
Polyteia is well aware of the complexity that the world of data holds and how challenging it can be to get started and to continuously develop your own data skills. With our new series, we offer insights on two levels: for beginners and for experienced data analysts. For beginners, the aim is to provide practical access to the world of data. Advanced data analysts are given the opportunity to expand their knowledge. The overall aim is to strengthen and expand data skills in the public sector.
The first article in our new series provides an insight into the different types of data and how they create a data set as a structured collection within a database. In this edition, the first section explains to beginners how the use of computers makes working with data, and everyday work in general, more effective. You need no prior knowledge for this section. If you already have experience working with data, you can skip straight to the second section, where you will gain a more detailed insight into how to use data platforms and what processes data goes through inside a data platform.
For beginners: Data collection with data platforms
The public sector works with a large amount of data on a daily basis. Imagine you had to compile a list of all citizens within your local community. The list is to include their name, address, gender, nationality, date of birth and, where applicable, date of death. Without a digital process this would be a huge amount of work. The digital collection of data not only reduces paper consumption and the corresponding storage in files and cabinets, but also makes it possible to make changes more quickly and to search for specific information. In an Excel spreadsheet, for example, you can use the filter function to display all female residents of a local community or to see how many people have moved to your city, without having to spend hours going through files.
But even software such as Excel quickly reaches its limits when it comes to large amounts of data or automatic data updates. That’s what data platforms are for. They are specially designed software solutions or systems that enable the storage, management, processing and analysis of large amounts of data. They collect data from various sources, such as tables in databases, text documents or even audio files, and generate real-time analyses, reports and visualizations from them.
The data goes through several steps for this purpose. First, it must be collected and stored in the data platform. This can be done, for example, by connecting directly to databases, by manually uploading tables or files, or by filling out data entry forms, known as data input masks. Before the data platform can analyze the data, the data must be cleansed and merged. This ensures that the data is complete and presented in a standardized way, and it removes incorrect content such as a stray space. Once storage, management and processing are completed, visualizations and the desired reports can be created. The data platform is therefore a software solution that supports all steps from data collection to storage and analysis.
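If you are curious what cleansing and merging can look like in practice, the following is a minimal sketch in Python. The records, the field names and the helper functions cleanse and merge are invented for illustration. A real data platform performs these steps with its own tooling.

```python
# Hypothetical records from two sources; all names and fields are invented.
registry = [
    {"id": 1, "name": " Anna Schmidt", "city": "Berlin "},
    {"id": 2, "name": "Jonas Weber ", "city": "berlin"},
]
nationalities = {1: "German", 2: "Austrian"}


def cleanse(record):
    """Remove stray spaces and standardize the capitalization of the city."""
    return {
        "id": record["id"],
        "name": record["name"].strip(),
        "city": record["city"].strip().title(),
    }


def merge(records, extra):
    """Attach the nationality from the second source by matching on id."""
    return [{**r, "nationality": extra.get(r["id"])} for r in records]


cleaned = merge([cleanse(r) for r in registry], nationalities)
```

After this step, both records carry the same spelling of the city and each person has a nationality attached, so a filter on the city finds both of them.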
For data analysts: The data value chain
Data platforms are specially developed software solutions or systems that enable the storage, management, processing and analysis of large amounts of data. They serve as a tool for generating real-time analyses, reports and visualizations. Their technological infrastructure supports the entire lifecycle of data - from the collection of information to storage and analysis.
Within a data platform, various processes transform raw data into insights. The so-called "data value chain" comprises the collection, storage, transformation and visualization of data and the use of the resulting insights. The data platform ensures that the individual processes run smoothly and work together.
Data platforms enable the integration of different data types and formats from external and internal sources. The data platform is filled either automatically through source systems and interfaces or manually through traditional forms, often called data input masks. Structured data such as tables from databases and unstructured data such as text documents or audio files are collected and stored in the platform's digital infrastructure, the data warehouse, which is specially designed to organize and prepare the data for analysis. An ETL (Extract, Transform, Load) tool extracts data from various sources, transforms it and loads it into the data warehouse.
In order to use the data effectively for analysis and visualization, it must be cleansed, filtered, merged and aggregated in advance. During data preparation, for example, invalid characters must be removed and incorrect data types converted. Structured Query Language (SQL), a query language for relational databases, can then be used to write complex queries that extract, transform and analyze data from different tables. In this way, you can calculate the age of every single member of your local community with a single query.
Online Analytical Processing (OLAP) technology enables the analysis of data in multidimensional databases. Data is aggregated, filtered and visualized in various ways to identify trends and patterns.
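The age calculation mentioned above can be sketched as follows. This is a minimal example, assuming an in-memory SQLite database and dates of birth stored as ISO strings (YYYY-MM-DD). The table citizens, its columns and the function ages_on are hypothetical, and the date functions of other database systems differ from SQLite's strftime.

```python
import sqlite3


def ages_on(reference_date, citizens):
    """Return {name: age in completed years} as of reference_date.

    reference_date and each date of birth are ISO strings ('YYYY-MM-DD').
    citizens is a list of (name, date_of_birth) tuples (hypothetical data).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citizens (name TEXT, date_of_birth TEXT)")
    conn.executemany("INSERT INTO citizens VALUES (?, ?)", citizens)

    # Difference of the years, minus 1 if the birthday has not yet
    # occurred in the reference year (the comparison yields 0 or 1).
    query = """
        SELECT name,
               CAST(strftime('%Y', :ref) AS INTEGER)
             - CAST(strftime('%Y', date_of_birth) AS INTEGER)
             - (strftime('%m-%d', :ref) < strftime('%m-%d', date_of_birth)) AS age
        FROM citizens
    """
    rows = conn.execute(query, {"ref": reference_date}).fetchall()
    conn.close()
    return dict(rows)


ages = ages_on("2024-06-15", [("Anna", "1990-06-15"), ("Jonas", "1985-12-01")])
```

A single SELECT statement computes the age for every row in the table. Passing the reference date as a parameter, rather than using the current date, keeps the result reproducible.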
Visualizing data helps you to identify patterns that remain undetected in long tables due to the cognitive limitations of our brain. In addition to the distinction between different types of data, the two categories of dimensions and metrics need to be considered in visualizations. Metrics are measured or counted values such as quantities, percentages or financial amounts. Dimensions, on the other hand, are attributes that describe the records. The age of a person is therefore assigned to the dimension category, even though it is a number. The two categories help you to select and structure charts. Data can be displayed as charts such as bar charts, histograms and scatter plots or, in data platforms, as dashboards. In a bar chart of the current population by district, the X-axis usually represents the dimension (the district), while the Y-axis represents the metric (the number of residents). Histograms, on the other hand, visualize the distribution of a dimension, such as age, across value ranges. Real-time analyses and data visualizations can be used to make decisions on acute problems and to define preventive measures.
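To make the roles of dimension and metric concrete, here is a small sketch that computes the numbers behind the two chart types. The resident records and the functions population_by_district and age_histogram are assumptions for illustration. A charting tool or dashboard would draw the results.

```python
from collections import Counter

# Hypothetical resident records; districts and ages are invented.
residents = [
    {"district": "North", "age": 34},
    {"district": "North", "age": 71},
    {"district": "South", "age": 8},
    {"district": "South", "age": 38},
    {"district": "South", "age": 45},
]


def population_by_district(rows):
    """Bar chart data: district is the dimension, the count is the metric."""
    return Counter(row["district"] for row in rows)


def age_histogram(rows, bin_width=20):
    """Histogram data: count residents per age range of bin_width years.

    Each key is the lower bound of a range, e.g. 20 stands for ages 20 to 39.
    """
    counts = Counter((row["age"] // bin_width) * bin_width for row in rows)
    return dict(sorted(counts.items()))


bar_data = population_by_district(residents)
hist_data = age_histogram(residents)
```

In the bar chart data, each district becomes one bar whose height is the number of residents. In the histogram data, the ages are grouped into ranges first, which is what makes the distribution visible.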
Develop your data skills further
If you would like to further develop your data skills, we invite you to take a look at our free Data Academy learning platform. It supports administrative staff at all levels of knowledge in developing and expanding their own data skills. The Data Academy's interactive online courses cover a wide range of topics, including data visualization, data transformation, data platforms, data governance, artificial intelligence and much more. You can register quickly and easily for Polyteia's free Data Academy learning platform via this link.