Support Document


A User Guide for Geotab Customers

June 2020

Deprecated Jan 2024 DO NOT USE

What is the Data Science Package?

When you purchase the Data Science Package (DSP), you not only gain access to large volumes of aggregated telematics data, but also to your own fleet data from the past year. All data in the Data Science Package is refreshed daily.

Access to GO device-connected datasets, such as GPS, accelerometer, trip history, and fault and diagnostic codes, is available 24 hours after registration. The DSP can also be used to gain quick insight into fleet maintenance, routes, locations, transactions and other external sources of information like weather and traffic.

Datasets are stored in a ready-made data lake in Google BigQuery, where they can be combined and analyzed using standard Structured Query Language (SQL) to reveal patterns and trends. Datasets are refreshed daily with raw, aggregated data from Google Cloud, providing opportunities to query data from other lines of business.

NOTE: More information about the DSP can be found here. More information about Google BigQuery can be found here, or see below for other helpful links.

Who is eligible for the Data Science Package?

Successful application of the Data Science Package requires a designated individual or team with expertise in data science or data analytics. Proficiency in SQL is also necessary for handling large datasets. If these resources are not available to you, you may use an external data analytics company; however, you do so at your own risk as analysts will have access to your raw telematics data.

! IMPORTANT: Geotab is not responsible for creating or generating your SQL queries in Google BigQuery. It is your responsibility to ensure that adequate expertise is available to perform these tasks.

Recommendations

To effectively analyze your data using the Data Science Package, we recommend that fleets:

  1. Have at least 500 vehicles; and
  2. Have at least one year of vehicle tracking with active Geotab GO devices.

Where can I find the Data Science Package?

If you are an Administrator, you can download the Data Science Package Add-in directly from the Marketplace in MyGeotab. To start using the Add-in, you must enable the DSP in MyGeotab. Once the Add-in is installed, you, as the Administrator, can grant other users in the database access to the DSP.


NOTE: The Data Science Package is a paid solution. You must be an Administrator in MyGeotab in order to access the Data Science Package. For more information, contact your Geotab Reseller or Partner Account Manager.


Download the Add-In

  1. Log in to MyGeotab.
  2. Navigate to Marketplace using the left sidebar.
  3. In Marketplace, search the main menu for the Data Science Package.
  4. Click the Order Now button.
  5. Review the User License and click I Agree.
  6. Proceed below to enable the Add-In.

Enable the Add-In

  1. From the MyGeotab main menu, navigate to Administration > Data Science Package.
  2. Read the terms and conditions and click OK.
  3. On the page that follows, click the Enable button.
  4. The system uploads data from the past year, starting from the previous day, into Google BigQuery.
  5. Fleet data from the previous day is also uploaded for trial purposes upon first login to BigQuery.
  6. The status of the request is displayed in red. It may take several minutes to initialize in BigQuery.


  7. Once the BigQuery environment is created, a link to the project displays.
  8. Click the link to access your Google BigQuery project and fleet data from the previous day.
  9. Within 24 hours, data from the previous year will be uploaded, and after that, data from the previous day is uploaded once daily.


  10. Administrators can now add other users to the Google BigQuery project. Enter the user’s email address in the Add a new user: field and click Add. Only Google-hosted email accounts (e.g. Gmail) can be added and used in Google BigQuery.

What’s the Difference between the DSP and the SDK?

  1. The DSP facilitates longer-term analysis rather than real-time reporting.

The data available through the DSP is refreshed nightly.

Data logged by the device is available in the SDK as soon as the device communicates with the database, while data in the DSP is not available until the day after it is logged (DSP data is updated after midnight, Eastern Time).

The DSP contains 15 datasets.

The DSP contains data from 15 different datasets, with more planned for the future (see image below), while SDK data is structured in a more granular fashion. The entire breakdown can be found here.

document Image

The DSP makes it easier to combine elements.

Less effort is required to match and combine data elements using the DSP than using the SDK. For example, using the DSP to match fault data with the corresponding GPS location is more straightforward than it is using the SDK.
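
As a rough illustration of this kind of match, the sketch below pairs each fault with the GPS record logged closest in time for the same device. It runs against an in-memory SQLite database as a stand-in for BigQuery. The table names (`fault`, `gps`), the column names, and the sample rows are hypothetical and do not reflect the actual DSP schema, and the query would need to be adapted to BigQuery Standard SQL before use.

```python
import sqlite3

# In-memory SQLite stands in for a BigQuery project here.
# Table and column names are hypothetical, not the real DSP schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fault (device_id TEXT, logged_at INTEGER, code TEXT);
    CREATE TABLE gps   (device_id TEXT, logged_at INTEGER, lat REAL, lon REAL);
    INSERT INTO fault VALUES ('G9A', 1000, 'P0300'), ('G9B', 2000, 'P0420');
    INSERT INTO gps VALUES
        ('G9A',  990, 43.45, -79.68),
        ('G9A', 1100, 43.46, -79.70),
        ('G9B', 1900, 51.51,  -0.13);
""")

# For each fault, keep the GPS row of the same device whose timestamp
# is nearest to the fault timestamp.
NEAREST_GPS_SQL = """
    SELECT f.device_id, f.code, g.lat, g.lon
    FROM fault AS f
    JOIN gps AS g ON g.device_id = f.device_id
    WHERE ABS(g.logged_at - f.logged_at) = (
        SELECT MIN(ABS(g2.logged_at - f.logged_at))
        FROM gps AS g2
        WHERE g2.device_id = f.device_id
    )
    ORDER BY f.device_id
"""

fault_locations = conn.execute(NEAREST_GPS_SQL).fetchall()
for row in fault_locations:
    print(row)
```

Because both tables live in the same project, the match is a single join on the device identifier and needs no export or client-side merging.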

DSP data is already visually structured.

DSP data is already structured in table format, with options to export directly to formats built into Google BigQuery (see image below), while the SDK requires coding in the corresponding programming language to export and structure its data.

document Image

The DSP is accessed using SQL in Google BigQuery.

The DSP uses SQL, while the SDK uses programming languages such as JavaScript and .NET.

External datasets can be added into the DSP project.

DSP users can add external datasets such as sales transactions, weather data, store locations, etc. directly to the Google BigQuery project. To load external data into the project using the SDK, you can develop a new unsigned Add-in.

DSP data is stored on EU servers.

DSP data is stored entirely on servers in Europe (EU), while SDK data is stored on servers located in the USA, EU or Asia (depending on where the database is stored). More information can be found here.

The DSP makes data visualization much easier.

The DSP easily connects telematics data with data visualization tools (both commercial and open source) such as re:Dash and Google Data Studio, while the SDK requires more integration effort.

NOTE: Data visualization tools are licensed separately by their respective vendors.

The DSP provides immediate access to data.

The DSP can deliver one year of historical data, while the SDK can deliver any amount of data as long as it is in the MyGeotab database.

  2. DSP data comes directly from the Gateway server, so it contains no database-specific information.

The DSP processes data rather than editing the database.

The DSP is not used to add, edit or delete elements of the database. The DSP is used to model data into meaningful information outside the MyGeotab database. To change database elements, you must use either the MyGeotab portal or the SDK.

The DSP is not affected by data size.

For the DSP, there is no limit on data size. For the MyGeotab database, however, Geotab will initiate a purge only when it is required to preserve the integrity, reliability and availability of the SaaS. If a purge is performed by Geotab, Geotab will still retain a minimum of 365 days of data prior to the purge date, and will provide advance notice to the database owners. If data is to be purged and you wish to retain your data for longer than a year, you are encouraged to retrieve the desired data using one of the API tools provided by Geotab.

The DSP is not affected by the Purge feature.

Since the purge period includes one year of existing data only (i.e. past data), data purges do not affect your DSP project.

The DSP processes queries quickly and efficiently, with no impact on performance.

Large data requests to the DSP do not affect database performance. Since the DSP resides in a data lake, large amounts of data can be pulled at once. However, large numbers of SDK requests do impact the database, so they must be optimized to run efficiently and scheduled during off-peak hours to minimize pressure on the system.

The DSP does not limit access to data based on Group permissions defined in MyGeotab.

DSP users have access to all datasets, while SDK users can be restricted from accessing datasets using the Groups and Clearances features in MyGeotab.

  3. The DSP relies on Google Cloud Infrastructure.

The DSP username must be an email address from a Google-hosted account.

DSP users must use a valid Google email address for their username, while SDK users can make a request using a non-email username.

DSP users do not have to be MyGeotab users.

DSP users do not have to be MyGeotab users in order to use the DSP. A MyGeotab Administrator can add users to the DSP as long as they have a Google-hosted email account. SDK users must have a MyGeotab account.

The DSP is a paid solution.

There is a per-database fee to use the DSP, while there is no fee for accessing data using the SDK. For more information about DSP pricing, please contact your Geotab Reseller or Partner Account Manager.

DSP/SDK Comparison Matrix

                     | DSP                              | SDK
---------------------+----------------------------------+-----------------------------------------
Data Refresh         | Nightly                          | Real-time
Datasets             | 15 datasets                      | Multiple elements (link)
Data Combination     | Easier to interlink data         | Interlinking requires hard-coding
Data Use             | Google BigQuery (SQL)            | Programming languages (e.g. JavaScript)
External Data        | Easily added via Google BigQuery | Requires custom Add-in from scratch
Data Residency       | Stored on EU servers             | Location dependent
Data Visualization   | Easier and more direct           | Requires hard-coding
Starting Data        | One year of existing data        | Accessible via the database
Database Performance | Not affected                     | Can be affected
Purge                | Not affected                     | Data deleted when purge completes
Database Editing     | Unavailable                      | Available
Data Access          | All or nothing                   | Based on Groups and/or Clearances
Username             | Google-hosted email (e.g. Gmail) | Any account in database
Cost                 | Paid solution                    | Free

DSP Decision Tree

Q1: Does the database have performance issues?

A: Yes = Use DSP

Q2: Are you more familiar with SQL than programming languages?

A: Yes = Use DSP

Q3: Is your fleet over 1500 vehicles?

A: Yes = Use DSP

Q4: Are there restrictions on which email to use (the DSP only accepts Google-hosted email)?

A: No = Use DSP

Q5: Do you want easier data visualization?

A: Yes = Use DSP

Q6: Are you interested in raw data?

A: Yes = Use DSP

Q7: Do you want to bring in external data easily?

A: Yes = Use DSP
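
The questions above can be sketched as a small checklist function. This is only an illustrative encoding: the `FleetProfile` fields and the `reasons_to_use_dsp` helper are hypothetical names, Q5 is read here as a preference for easier data visualization (an assumption), and the document does not say how many "Use DSP" answers should settle the decision.

```python
from dataclasses import dataclass


@dataclass
class FleetProfile:
    """Hypothetical answers to the seven decision-tree questions."""
    has_database_performance_issues: bool  # Q1
    prefers_sql: bool                      # Q2
    vehicle_count: int                     # Q3
    has_email_restrictions: bool           # Q4
    wants_easier_visualization: bool       # Q5 (interpretation, see lead-in)
    wants_raw_data: bool                   # Q6
    wants_external_data: bool              # Q7


def reasons_to_use_dsp(profile):
    """Return the question labels whose answer points to the DSP."""
    checks = [
        ("Q1", profile.has_database_performance_issues),
        ("Q2", profile.prefers_sql),
        ("Q3", profile.vehicle_count > 1500),
        # Q4 is the only question where "No" points to the DSP.
        ("Q4", not profile.has_email_restrictions),
        ("Q5", profile.wants_easier_visualization),
        ("Q6", profile.wants_raw_data),
        ("Q7", profile.wants_external_data),
    ]
    return [label for label, points_to_dsp in checks if points_to_dsp]


example = FleetProfile(
    has_database_performance_issues=True,
    prefers_sql=True,
    vehicle_count=2000,
    has_email_restrictions=False,
    wants_easier_visualization=True,
    wants_raw_data=False,
    wants_external_data=True,
)
print(reasons_to_use_dsp(example))
```

Returning the list of matching questions, rather than a single yes or no, leaves the final weighting to the reader.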

About Google BigQuery

BigQuery is Google’s fully-managed, enterprise data solution that allows users to store and query massive datasets in seconds. Datasets in BigQuery are accessible through the Web, a command-line tool, or by using the BigQuery REST API.


For more details on how to use Google BigQuery, please see the links below:

Intro to Google BigQuery (GBQ)

GBQ WebUI
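
For readers curious about the REST route mentioned above, here is a minimal sketch that builds (but does not send) a query request for the BigQuery `jobs.query` endpoint using only the Python standard library. The project ID, the SQL text, and the access token are placeholders. A real call needs a valid OAuth 2.0 access token and would be sent with `urllib.request.urlopen`; in practice, one of Google's client libraries is likely the more convenient choice.

```python
import json
import urllib.request


def build_query_request(project_id, sql, access_token):
    """Build a POST request for BigQuery's jobs.query REST method."""
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/queries"
    )
    # useLegacySql=False selects Standard SQL, which the DSP guide refers to.
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only; nothing is sent over the network.
request = build_query_request("my-dsp-project", "SELECT 1", "ACCESS_TOKEN")
print(request.get_method(), request.full_url)
```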

Other FAQs

Q: Is the Data Science Package a product I need to pay for?

A: Yes. There is a monthly fee for each database that has the Data Science Package service enabled.

Q: Do I need to know anything about data science or data analytics to use the Data Science Package?

A: Yes. The Data Science Package is a service specifically tailored to customers that have a Data Analyst or Data Scientist on staff. Customers can also use the package if they have access to a data science resource who they trust to work with their raw telematics data.

Q: What programming languages and skill sets do I need to be able to use the Data Science Package?

A: Since all data is analyzed in Google BigQuery, you must, at the very least, know how to use standard Structured Query Language (SQL) in order to query data in Google BigQuery. Please see Google’s documentation on other supported client libraries.

Q: Where is the data stored in the Google Cloud Platform?

A: All data for the Data Science Package is stored in the EU in the Google Cloud Platform. There is no option at this point in time to select a specific residency for the Data Science Package data.

Q: If I have third-party devices, will the Data Science Package still work for me?

A: No, the Data Science Package only supports data from Geotab GO devices and does not collect any data from third-party or OEM devices.

Q: Does the Data Science Package only keep 365 days’ worth of data, or is it cumulative?

A: The data in the Data Science Package is cumulative. Geotab will start users with 365 days of data, then continue to add data as new days pass. Scheduled processes run nightly to migrate daily data to your Data Science Package project.
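
To make the cumulative behaviour concrete, the sketch below estimates which calendar days a DSP project should cover on a given date. It assumes the initial load spans the 365 days ending the day before the service was enabled and that each nightly run adds the previous day. It ignores the up-to-24-hour delay of the initial load, and the function name `dsp_date_range` is hypothetical.

```python
from datetime import date, timedelta


def dsp_date_range(enabled_on, today):
    """Return (first_day, last_day) of data expected in the DSP project.

    Assumes 365 days of history ending the day before `enabled_on`,
    plus one extra day added by each nightly refresh since then.
    """
    first_day = enabled_on - timedelta(days=365)
    last_day = today - timedelta(days=1)
    return first_day, last_day


first_day, last_day = dsp_date_range(date(2020, 6, 1), date(2020, 7, 1))
days_available = (last_day - first_day).days + 1
print(first_day, last_day, days_available)
```

The number of available days grows by one with every nightly refresh and never shrinks while the service stays enabled.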

Q: If I disable the Data Science Package service, do I still have access to my project?

A: No, as soon as you disable the Data Science Package service you will lose access to the data.

Q: If I have multiple MyGeotab databases, can I set up the Data Science Package to push the data from each database into one Google BigQuery Project?

A: At this time, the Data Science Package is a per-database service; therefore, each MyGeotab database requires a separate project in Google BigQuery. However, if a user has access to multiple projects in Google BigQuery, they can query those projects and their respective datasets as if they were one project.
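
One way to query several projects together is to reference each table by its fully qualified `project.dataset.table` name and combine the results with `UNION ALL`. The helper below only builds the SQL text. The project IDs, dataset, table, and column names are hypothetical, and the identifiers are inserted without escaping, so it should only be used with trusted values.

```python
def union_across_projects(project_ids, dataset, table, columns):
    """Build one Standard SQL query reading the same table from several projects."""
    column_list = ", ".join(columns)
    selects = [
        # Backticks quote the fully qualified table name in BigQuery Standard SQL.
        f"SELECT '{project_id}' AS source_project, {column_list} "
        f"FROM `{project_id}.{dataset}.{table}`"
        for project_id in project_ids
    ]
    return "\nUNION ALL\n".join(selects)


# Hypothetical project, dataset, table and column names for illustration.
sql = union_across_projects(
    ["fleet-east", "fleet-west"], "Trips", "Trip", ["device_id", "distance_km"]
)
print(sql)
```

The extra `source_project` column keeps track of which database each row came from after the results are combined.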
