Support Document
0 mins to read
Data Science Package User Guide
Learn how to use the Data Science Package (DSP) to gain insight into fleet maintenance, routes, locations, and transactions using aggregated telematics data. The DSP uses large volumes of aggregated data to reveal patterns and trends and leverage external data to provide better insight into your fleet.
A User Guide for Geotab Customers
June 2020
Deprecated Jan 2024 DO NOT USE
When you purchase the Data Science Package (DSP), you not only gain access to large volumes of aggregated telematics data, but also to your own fleet data from the past year. All data in the Data Science Package is refreshed daily.
Access to GO device-connected datasets, such as GPS, accelerometer, trip history, and fault and diagnostic codes, is available 24 hours after registration. The DSP can also be used to gain quick insight into fleet maintenance, routes, locations, transactions and other external sources of information like weather and traffic.
Datasets are stored in a ready-made data lake in Google BigQuery, where they can be combined and analyzed using standard Structured Query Language (SQL) to reveal patterns and trends. Datasets are refreshed daily with raw, aggregated data from the Google Cloud, providing opportunities to query data from other lines of business.
✱ NOTE: More information about the DSP can be found here. More information about Google BigQuery can be found here, or see below for other helpful links.
Successful application of the Data Science Package requires a designated individual or team with expertise in data science or data analytics. Proficiency in SQL is also necessary for handling large datasets. If these resources are not available to you, you may use an external data analytics company; however, you do so at your own risk as analysts will have access to your raw telematics data.
! IMPORTANT: Geotab is not responsible for creating or generating your SQL queries in Google BigQuery. It is your responsibility to ensure that adequate expertise is available to perform these tasks.
To effectively analyze your data using the Data Science Package, we recommend that fleets:
If you are an Administrator, you can download the Data Science Package Add-in directly from the Marketplace in MyGeotab. To start using the Add-in, you must enable the DSP in MyGeotab. Once the Add-in is installed, you, as the Administrator, can grant access to the DSP to other users in the database.
✱ NOTE: The Data Science Package is a paid solution. You must be an Administrator in MyGeotab in order to access the Data Science Package. For more information, contact your Geotab Reseller or Partner Account Manager.
The data available through the DSP is refreshed nightly. The data logged by the device is available in the SDK as soon as the device communicates with the database, while data in the DSP is not available until the day after it is logged (DSP data is updated after midnight, ET time zone).
The DSP contains 15 datasets.
The DSP contains data from 15 different datasets, with more planned for the future (see image below), while SDK data is structured in a more granular fashion. The entire breakdown can be found here.
The DSP makes it easier to combine elements.
Less effort is required to match and combine data elements using the DSP than it is using the SDK. For example, using the DSP to match fault data with corresponding GPS location is more straightforward than it is using the SDK.
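The fault-to-location match described above can be sketched as a single SQL join. This is a minimal illustration only: it runs the query locally with Python's built-in `sqlite3` module as a stand-in for Google BigQuery, and the table names (`gps`, `faults`), column names, and sample rows are hypothetical and do not reflect the actual DSP schema.

```python
import sqlite3

# Hypothetical tables and columns for illustration; NOT the real DSP schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gps (device_id TEXT, logged_at TEXT, latitude REAL, longitude REAL);
    CREATE TABLE faults (device_id TEXT, logged_at TEXT, fault_code TEXT);
    INSERT INTO gps VALUES
        ('G9-001', '2020-06-01 08:00:00', 43.51, -79.68),
        ('G9-001', '2020-06-01 08:05:00', 43.52, -79.66),
        ('G9-002', '2020-06-01 08:02:00', 40.71, -74.00);
    INSERT INTO faults VALUES
        ('G9-001', '2020-06-01 08:06:30', 'P0300'),
        ('G9-002', '2020-06-01 08:03:10', 'P0171');
""")

# For each fault, pick the most recent GPS record from the same device
# that was logged at or before the time of the fault.
QUERY = """
    SELECT f.device_id, f.fault_code, g.latitude, g.longitude
    FROM faults AS f
    JOIN gps AS g
      ON g.device_id = f.device_id
     AND g.logged_at = (
         SELECT MAX(g2.logged_at)
         FROM gps AS g2
         WHERE g2.device_id = f.device_id
           AND g2.logged_at <= f.logged_at
     )
    ORDER BY f.device_id
"""
fault_locations = conn.execute(QUERY).fetchall()
for row in fault_locations:
    print(row)
```

Against the real project, the same idea would be written in BigQuery standard SQL using the dataset and column names shown in your own DSP project.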
DSP data is already visually structured.
DSP data is already structured in table format, with options to export directly to formats built into Google BigQuery (see image below), while the SDK requires coding in the corresponding programming language to export and structure its data.
The DSP is accessed using SQL in Google BigQuery
The DSP utilizes the SQL programming language, while the SDK utilizes programming languages such as JavaScript and .NET.
External datasets can be added into the DSP project
DSP users can add external datasets such as sales transactions, weather data, store locations, etc. directly to the Google BigQuery project. To load external data into the project using the SDK, you can develop a new unsigned Add-in.
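Once an external dataset sits alongside the telematics data, combining the two is an ordinary join. The sketch below assumes a hypothetical `weather` table loaded by the user and a hypothetical `trips` table; neither name nor any column comes from the DSP documentation, and Python's `sqlite3` module is used only so the example runs locally without a BigQuery project.

```python
import sqlite3

# Hypothetical schema for illustration; NOT the real DSP tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (device_id TEXT, trip_date TEXT, distance_km REAL);
    CREATE TABLE weather (obs_date TEXT, weather_type TEXT);  -- external data
    INSERT INTO trips VALUES
        ('G9-001', '2020-06-01', 120.0),
        ('G9-002', '2020-06-01', 80.0),
        ('G9-001', '2020-06-02', 60.0);
    INSERT INTO weather VALUES
        ('2020-06-01', 'clear'),
        ('2020-06-02', 'rain');
""")

# Join trips to the external weather table by date, then summarize per weather type.
QUERY = """
    SELECT w.weather_type, COUNT(*) AS trip_count, SUM(t.distance_km) AS total_km
    FROM trips AS t
    JOIN weather AS w ON w.obs_date = t.trip_date
    GROUP BY w.weather_type
    ORDER BY w.weather_type
"""
distance_by_weather = conn.execute(QUERY).fetchall()
for row in distance_by_weather:
    print(row)
```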
DSP data is stored on EU servers
DSP data is stored entirely on servers in Europe (EU), while SDK data is stored on servers located in the USA, EU or Asia (depending on where the database is stored). More information can be found here.
The DSP makes data visualization much easier
The DSP easily connects telematics data with data visualization tools (both commercial and open source) such as re:Dash and Google Data Studio, while the SDK requires more integration effort.
✱ NOTE: Data visualization tools are licensed separately by their respective vendors.
The DSP provides immediate access to data
The DSP can deliver one year of historical data, while the SDK can deliver any amount of data as long as it is in the MyGeotab database.
The DSP processes data rather than editing the database
The DSP is not used to add, edit or delete elements of the database. The DSP is used to model data into meaningful information outside the MyGeotab database. To change database elements, you must use either the MyGeotab portal or the SDK.
The DSP is not affected by data size
For the DSP, there is no limit on data size; however, Geotab will only initiate a purge when it is required to preserve the integrity, reliability and availability of the SaaS. If a purge is performed by Geotab, Geotab will still retain a minimum of 365 days of data prior to the purge date, and will provide advance notice to the database owners. If data is to be purged and you wish to retain your data for longer than a year, you are encouraged to retrieve the desired data using one of the API tools provided by Geotab.
The DSP is not affected by the Purge feature
Since the purge period includes one year of existing data only (i.e. past data), data purges do not affect your DSP project.
The DSP processes queries quickly and efficiently, with no impact on performance
Large data requests to the DSP do not affect database performance. Since the DSP resides in a data lake, large amounts of data can be pulled at once. However, large numbers of SDK requests do impact the database, so they must be optimized to run efficiently and scheduled during off-peak hours to minimize pressure on the system.
The DSP does not limit access to data based on Group permissions defined in MyGeotab
DSP users have access to all datasets, while SDK users can be restricted from accessing datasets using the Groups and Clearances features in MyGeotab.
The DSP username must be an email address from a Google-hosted account
DSP users must use a valid Google email address for their username, while SDK users can make a request using a non-email username.
DSP users do not have to be MyGeotab users
DSP users do not have to be MyGeotab users in order to use the DSP. A MyGeotab Administrator can add users to the DSP as long as they have a Google-hosted email account. SDK users must have a MyGeotab account.
The DSP is a paid solution
There is a per-database fee to use the DSP, while there is no fee for accessing data using the SDK. For more information about DSP pricing, please contact your Geotab Reseller or Partner Account Manager.
Feature | DSP | SDK |
Data Refresh | Nightly | Real time |
Datasets | 15 datasets | Multiple elements (link) |
Data Combination | Easier to interlink data | Interlinking requires hard-coding |
Data Use | Google BigQuery (SQL) | Programming languages (e.g. JavaScript) |
External Data | Easily added via Google BigQuery | Requires custom Add-in from scratch |
Data Residency | Stored on EU servers | Location dependent |
Data Visualization | Easier and more direct | Requires hard-coding |
Starting Data | One year of existing data | Accessible via the database |
Database Performance | Not affected | Can be affected |
Purge | Not affected | Data deleted when purge completes |
Database Editing | Unavailable | Available |
Data Access | All or nothing | Based on Groups and/or Clearances |
Username | Google-hosted email (e.g. Gmail) | Any account in database |
Cost | Paid solution | Free |
Q1: Does the database have performance issues?
A: Yes = Use DSP
Q2: Are you more familiar with SQL than programming languages?
A: Yes = Use DSP
Q3: Is your fleet over 1500 vehicles?
A: Yes = Use DSP
Q4: Are there restrictions on which email to use (the DSP only accepts Google-hosted email)?
A: No = Use DSP
Q5: Do you prefer easier data visualization?
A: Yes = Use DSP
Q6: Are you interested in raw data?
A: Yes = Use DSP
Q7: Do you want to bring in external data easily?
A: Yes = Use DSP
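The checklist above can be expressed as a small lookup that reports which of your answers point toward the DSP. This is only a sketch: the key names are hypothetical labels for the seven questions, and the guide does not say how the answers should be weighed against each other, so the function simply lists the matching questions rather than making a recommendation.

```python
# Answer that the checklist associates with the DSP, keyed by question.
# Key names are hypothetical labels for Q1 to Q7.
DSP_ANSWERS = {
    "database_has_performance_issues": True,   # Q1
    "more_familiar_with_sql": True,            # Q2
    "fleet_over_1500_vehicles": True,          # Q3
    "email_restrictions": False,               # Q4 (No = Use DSP)
    "prefers_data_visualization": True,        # Q5
    "interested_in_raw_data": True,            # Q6
    "wants_external_data": True,               # Q7
}


def dsp_signals(answers):
    """Return the questions whose answer matches the one the checklist maps to the DSP.

    Unanswered questions are ignored.
    """
    return [key for key, favours_dsp in DSP_ANSWERS.items()
            if answers.get(key) is favours_dsp]


example = {
    "more_familiar_with_sql": True,
    "email_restrictions": True,      # restricted email: does not point to the DSP
    "wants_external_data": True,
}
signals = dsp_signals(example)
print(signals)
```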
BigQuery is Google’s fully-managed, enterprise data solution that allows users to store and query massive datasets in seconds. Datasets in BigQuery are accessible through the Web, a command-line tool, or by using the BigQuery REST API.
For more details on how to use Google BigQuery, please see the links below:
Intro to Google BigQuery (GBQ)
Q: Is the Data Science Package a product I need to pay for?
A: Yes. There is a monthly fee for each database that has the Data Science Package service enabled.
Q: Do I need to know anything about data science or data analytics to use the Data Science Package?
A: Yes. The Data Science Package is a service specifically tailored to customers that have a Data Analyst or Data Scientist on staff. Customers can also use the package if they have access to a data science resource who they trust to work with their raw telematics data.
Q: What programming languages and skill sets do I need to be able to use the Data Science Package?
A: Since all data is analyzed in Google BigQuery, you must, at the very least, know how to use standard Structured Query Language (SQL) to query data in Google BigQuery. Please see Google’s documentation on other supported client libraries.
Q: Where is the data stored in the Google Cloud Platform?
A: All data for the Data Science Package is stored in the EU in the Google Cloud Platform. There is no option at this point in time to select a specific residency for the Data Science Package data.
Q: If I have third party devices, will the Data Science Package still work for me?
A: No, the Data Science Package only supports data from Geotab GO devices and does not collect any data from third party or OEM devices.
Q: Does the Data Science Package only keep 365 days' worth of data or is it cumulative?
A: The data in the Data Science Package is cumulative. Geotab will start users with 365 days of data, then continue to add data as new days pass. Scheduled processes run nightly to migrate daily data to your Data Science Package project.
Q: If I disable the Data Science Package service, do I still have access to my project?
A: No, as soon as you disable the Data Science Package service you will lose access to the data.
Q: If I have multiple MyGeotab databases, can I set up the Data Science Package to push the data from each database into one Google BigQuery Project?
A: At this time, the Data Science Package is a per-database service; therefore, each MyGeotab database requires a separate project in Google BigQuery. However, if a user has access to multiple projects in Google BigQuery, they can query those projects and their respective datasets as if they were one project.
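Querying several projects together usually comes down to a `UNION ALL` over the same table in each project. The sketch below emulates two projects locally with Python's `sqlite3` module by attaching a second in-memory database; the `trips` table, its columns, and the project labels are hypothetical. In BigQuery, each table would instead be referenced by its fully qualified `project.dataset.table` name.

```python
import sqlite3

# Two in-memory databases stand in for two separate BigQuery projects.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS project_b")
conn.executescript("""
    -- Hypothetical table for illustration; NOT the real DSP schema.
    CREATE TABLE main.trips (device_id TEXT, distance_km REAL);
    CREATE TABLE project_b.trips (device_id TEXT, distance_km REAL);
    INSERT INTO main.trips VALUES ('A-001', 100.0), ('A-002', 50.0);
    INSERT INTO project_b.trips VALUES ('B-001', 70.0);
""")

# Stack the rows from both sources, tagging each row with its origin,
# then summarize as if the data lived in a single project.
QUERY = """
    SELECT source, COUNT(*) AS trip_count, SUM(distance_km) AS total_km
    FROM (
        SELECT 'project_a' AS source, distance_km FROM main.trips
        UNION ALL
        SELECT 'project_b' AS source, distance_km FROM project_b.trips
    )
    GROUP BY source
    ORDER BY source
"""
combined = conn.execute(QUERY).fetchall()
for row in combined:
    print(row)
```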