Chattermill strives to uphold our mission to take customer centricity to the next level by providing the best experience to our customers. To ensure the best experience possible, it is important that data and file formatting are kept consistent at all times to avoid data flow interruptions or inconsistencies.
We have included the most common principles for working with data at Chattermill to ensure a robust and accurate data pipeline and reliable insights.
Example CSV Sheet
For a hands-on example of how to format a spreadsheet, please visit this Google Sheet: https://docs.google.com/spreadsheets/d/1682BmBHrnvKnM1EdlPH8lkziysBpn9awYt0MhFUS8A0/edit?ts=5f3fabdf#gid=0
Data CSV Uploading Instructions
Data for a single dataset should be sent in a single CSV
To provide timely insights, we ingest one CSV per dataset so that the upload process can be automated
Please join any data sources or tables on your side and send us a single CSV containing the combined data
Please ensure that individual files are no larger than 300MB
Kindly do not upload zipped files to your AWS S3 bucket, as Chattermill ingests files directly from S3
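The file-level rules above (CSV only, not zipped, no larger than 300MB) can be checked before a file is placed in the bucket. The following is a minimal sketch, not part of Chattermill's tooling: the function name check_upload_file is hypothetical, and it assumes that 1MB means 1024 * 1024 bytes.

```python
import os
import zipfile

# Assumption: "300MB" is read as 300 * 1024 * 1024 bytes.
MAX_BYTES = 300 * 1024 * 1024


def check_upload_file(path):
    """Return a list of problems found with a file before upload (empty if none)."""
    problems = []
    if not path.lower().endswith(".csv"):
        problems.append("file does not have a .csv extension")
    if zipfile.is_zipfile(path):
        problems.append("file is a zip archive")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 300MB")
    return problems
```

An empty list means the file passed these three checks; it does not confirm that the contents are valid CSV.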
Ensure all uploads are in CSV format and are UTF-8 encoded
Character encoding tells the computer how to translate 1s and 0s into human-readable characters. UTF-8 is the most widely used encoding for translating bits (1s and 0s) into readable characters
To check for incorrect character encoding, please open the CSV in a text editor (such as Notepad or Sublime Text), and check that special characters such as ', &, é and Ü are displayed correctly
If a CSV is sent with incorrect encoding, some of the characters may not be displayed correctly. This makes the text difficult to read and can affect our ability to detect the topics and sentiment of your comments
A common cause of incorrect encoding is a user opening a CSV in a tool such as Microsoft Excel, making changes, and overwriting the original file
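As a complement to the visual check described above, a short script can confirm that every byte in a file decodes as UTF-8. This is an illustrative sketch only; find_invalid_utf8 is a hypothetical helper name. Note that it cannot detect text that was already garbled and then re-saved as valid UTF-8, so the manual check of special characters is still worthwhile.

```python
def find_invalid_utf8(path):
    """Return (line_number, error message) for the first line that is not
    valid UTF-8, or None if the whole file decodes cleanly."""
    with open(path, "rb") as f:
        # Splitting on newline bytes is safe: UTF-8 multi-byte sequences
        # never contain the newline byte.
        for line_number, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError as err:
                return line_number, str(err)
    return None
```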
Consistent, descriptive file names 📝
All file names should include:
Company Name
Data Source
Data Type
Timestamp / Date / Month
Correct example: chattermill_survey_nps_2019_11_01.csv
Incorrect example: file_291283.csv
Please maintain a consistent naming convention for all file uploads. If a file name is changed it could result in a missed upload.
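One way to keep file names consistent is to check them against a pattern before uploading. The sketch below assumes a convention modelled on the correct example above: lowercase company, data source, and data type, each a single word, followed by a year, a month, and an optional day, all separated by underscores. The exact pattern and the name is_valid_file_name are assumptions for illustration; adapt them to the convention you have agreed.

```python
import re

# Assumed convention: company_source_type_YYYY_MM[_DD].csv
FILE_NAME_PATTERN = re.compile(
    r"[a-z0-9]+_[a-z0-9]+_[a-z0-9]+_\d{4}_\d{2}(?:_\d{2})?\.csv"
)


def is_valid_file_name(name):
    """Return True if the file name follows the assumed naming convention."""
    return FILE_NAME_PATTERN.fullmatch(name) is not None
```

With this pattern, chattermill_survey_nps_2019_11_01.csv is accepted and file_291283.csv is rejected.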
Headers should contain only alphanumeric characters and be unique, descriptive, and consistent across uploads
Each header must be distinct from all other headers in the upload; otherwise, the data will not be uploaded
Please ensure headers are descriptive so it is clear how the data should be displayed in the app. If possible, please send a codebook including descriptions of each header
Headers should be descriptive and written in snake_case, using only letters, digits, and underscores between words:
Correct example: product_purchase_count
Incorrect example: product purchase count
Please do not include apostrophes in the header row
Our CSV imports are based on mapping header strings to certain fields which you can see in the app. If a column header changes, we will not be able to upload the data in that column
Correct headers:
Incorrect headers:
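The header rules in this section (unique, snake_case, and unchanged between uploads) lend themselves to an automated check. The following is a rough sketch under the assumption that snake_case means lowercase letters and digits joined by single underscores; check_headers and changed_headers are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: snake_case = lowercase words of letters/digits joined by "_".
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def check_headers(headers):
    """Return a list of problems with a header row (empty if none)."""
    problems = []
    for name, count in Counter(headers).items():
        if count > 1:
            problems.append(f"duplicate header: {name!r}")
    for name in headers:
        if not SNAKE_CASE.fullmatch(name):
            problems.append(f"not snake_case: {name!r}")
    return problems


def changed_headers(previous, current):
    """Return (missing, added) headers compared with the previous upload."""
    missing = sorted(set(previous) - set(current))
    added = sorted(set(current) - set(previous))
    return missing, added
```

Running changed_headers against the header row of the last accepted file is a simple way to catch renamed columns before they cause a missed upload.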
All responses must have a Unique Response ID
Please clearly identify the Unique Response ID field
This would preferably be the index/primary key for each response from your database or unique ID stored by your survey provider
This Response ID must be unique across the entire history of the dataset including previous files
Please check for any duplicates of this Response ID within your files before sending to us
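Checking for duplicate Response IDs, both within a file and against previously sent files, can be scripted. This is a minimal sketch that assumes the ID column is named response_id (a hypothetical header; substitute your own) and that all files are UTF-8 encoded CSVs with a header row.

```python
import csv
from collections import Counter


def duplicate_response_ids(paths, id_column="response_id"):
    """Return the sorted list of IDs that appear more than once across all files."""
    counts = Counter()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                counts[row[id_column]] += 1
    return sorted(rid for rid, n in counts.items() if n > 1)
```

Passing the new file together with earlier files for the same dataset covers the requirement that IDs stay unique across the dataset's history.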
Date formatting must be consistent 📅
All dates within a CSV must use the same date format. This format should remain the same for all subsequent CSVs
Ideally, all dates should be in YYYY-MM-DD hh:mm:ss format, using the 24-hour clock
If you are unable to send in the above format, please let us know so we can discuss the most appropriate alternative. Likewise, please notify us in advance if you are planning to make any changes to the date format
If the date format were to change, the date field may be stored incorrectly, or the response may not be uploaded at all
Correct format: 2019-10-11 13:14:15
Incorrect format: 10/11/19 01:14:15 PM
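Date values can be validated against the preferred YYYY-MM-DD hh:mm:ss format before sending. The sketch below uses Python's standard datetime module; is_valid_date is a hypothetical helper name. Because strptime also accepts values without zero padding (such as 2019-1-1 1:2:3), the function formats the parsed value back to text and compares it with the input to enforce the exact layout.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD hh:mm:ss, 24-hour clock


def is_valid_date(value):
    """Return True only if value is exactly in YYYY-MM-DD hh:mm:ss form."""
    try:
        parsed = datetime.strptime(value, DATE_FORMAT)
    except ValueError:
        return False
    # Round trip rejects inputs that parse but are not zero padded.
    return parsed.strftime(DATE_FORMAT) == value
```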
Comment columns are clearly identified 💬
Please clearly identify all comment fields you wish to include and the question they are related to
Correct formatting:
Where multiple comment columns exist in your data, please identify all question/comment pairs you wish to include
Score columns are clearly identified
Scores must be in a numeric format
Where multiple scores exist in your data, please identify all score/comment pairs you wish to include
Correct formatting:
Incorrect formatting:
Scores are required for all NPS and CSAT responses. We also recommend providing scores for other data types where relevant (e.g. an app review may come with a score from 1-5), to make full use of the platform
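To confirm that a score column is numeric, each value can be tested for whether it parses as a finite number. This is a small sketch with hypothetical helper names; it treats empty values as non-numeric, which is appropriate for NPS and CSAT data where scores are required, but may be too strict for data types where a score is optional.

```python
import math


def is_numeric_score(value):
    """Return True if value parses as a finite number."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def non_numeric_scores(scores):
    """Return (row_number, value) for each non-numeric score; rows count from 1."""
    return [(i, s) for i, s in enumerate(scores, start=1) if not is_numeric_score(s)]
```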
In general, once a format for CSV uploads has been established, all future CSV files should follow it to ensure continuity in the data pipeline.
The comments/responses you wish to analyse using Chattermill should contain raw text only
Please remove any noise (e.g. HTML tags, email subject lines) from the comments/responses and only send the raw text you wish to use for insight analysis.
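Removing HTML tags from comments can be done with Python's standard library. The following is a minimal sketch, not a complete cleaning pipeline: strip_html is a hypothetical helper, it keeps any text found inside script or style elements, and it does not handle other kinds of noise such as email subject lines.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of a document and ignores the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(comment):
    """Return the comment with HTML tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(comment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())
```

For example, strip_html("<p>Great <b>service</b> &amp; fast delivery</p>") returns "Great service & fast delivery".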
If you have any questions, please get in touch at [email protected]