Evaluating Methods Of Data Access
Technology growth has led to new online access points for data, making digital analysis more robust and accessible.
There are 3 primary ways that analysts will access the data that we use:
- Bulk downloads
  - Bulk download is a controlled way for data owners to offer data access
  - Data tables are accessed through a GUI, built by the data owner, that directs query parameters
  - Accessible data is typically stripped of sensitive information
  - Download limits and user-based access can be enforced by data owners
  - Data access can be simplified, making data more broadly attainable
  - Example: accessing US Census data through a site called FactFinder/IMDb
  - Kaggle is a service that provides a list of datasets, each with a dedicated access link that we can use
- APIs
  - APIs provide instant, automatic access to data
  - Typically, an API is called through a URL, much the way your browser requests a web page
  - API operation is a machine-to-machine process of data exchange
  - Calling an API requires an understanding of a scripting language and of the set of commands, in the data host's vernacular, required to access data
  - API keys and access tokens are used to manage access
  - Instead of calling an API yourself, you can use sites that are user-friendly front ends to APIs, e.g. Tweet Binder
- Web scraping
  - Web scraping grabs HTML-formatted data off the Web
  - Data are intended for presentation, not necessarily download
  - Unlike APIs and bulk download sites, data exchange is not facilitated by the data host
  - Data can often be governed by copyrights and/or require special permissions
  - Access to data ranges from simple (highlight, right-click, save as) to complicated (Python scripts)
  - Not used much anymore because so much data is readily and easily available through organizations or other data owners, via bulk downloads or APIs
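
To make the contrast between the three methods concrete, here is a minimal Python sketch of what the analyst ends up parsing in each case. Everything in it is an assumption made for illustration: the sample records, the `api.example.org` address, the `key` parameter name, and all function names are hypothetical and do not come from any real data host. It also assumes that a bulk download arrives as CSV and that an API responds with JSON, which is common but not universal.

```python
import csv
import io
import json
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical sample data: the same two records as each method might deliver them.
BULK_CSV = "state,population\nOhio,100\nUtah,50\n"
API_JSON = '[{"state": "Ohio", "population": 100}, {"state": "Utah", "population": 50}]'
PAGE_HTML = """
<table>
  <tr><th>state</th><th>population</th></tr>
  <tr><td>Ohio</td><td>100</td></tr>
  <tr><td>Utah</td><td>50</td></tr>
</table>
"""


def from_bulk_download(text):
    """Bulk download: the data owner hands over a ready-made file (CSV assumed)."""
    return [
        {"state": row["state"], "population": int(row["population"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def build_api_url(base_url, params, api_key):
    """API: a request is a URL plus query parameters; the key identifies the caller."""
    return base_url + "?" + urlencode({**params, "key": api_key})


def from_api_response(text):
    """API: the response is structured for machines (JSON assumed)."""
    return [
        {"state": item["state"], "population": int(item["population"])}
        for item in json.loads(text)
    ]


class TableScraper(HTMLParser):
    """Web scraping: collect cell text from HTML that was written for display."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new table row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def from_scraped_page(html):
    """Web scraping: the analyst, not the data host, recovers the table structure."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    header, *body = scraper.rows          # first row holds the column names
    records = (dict(zip(header, cells)) for cells in body)
    return [
        {"state": rec["state"], "population": int(rec["population"])}
        for rec in records
    ]
```

All three `from_...` functions return the same list of records for the sample inputs, and `build_api_url("https://api.example.org/v1/data", {"get": "population"}, "DEMO_KEY")` shows how a key travels with each request. The scraping path needs the most code because the page was laid out for readers rather than for download.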