Voice Recorder (VOR)

This repository contains the code and instructions for setting up a simple voice recording system using Twilio. The intention is to make it easy to record different people talking so that training data for voice biometrics can be created.

As outlined in the architecture diagram (arch.png), it consists of the following parts:

vorgen: a command-line program (configured by a JSON file) that generates a Twilio Studio project, setting up the appropriate system to ask questions in random order and capture the output.
vorserve: a simple server that listens for requests from Twilio Studio and saves the data to S3. The server implements a hashing method that lets it store the data either with a link back to the caller's phone number or fully anonymized (more details below).

15 min guide to set up new recorder

Setting up a new recorder based on this project is simple. This guide assumes that you have an AWS account to run it in, a Twilio account, and a domain name with configurable DNS.

This is a simple guide to get it running fast. Depending on use case and required security, a different architecture may be advisable.

1. Set up AWS environment

Create a new EC2 instance (micro is sufficient; run it on Ubuntu).
Edit the security group so that inbound requests are allowed on port 22 (from your machine) and on ports 80 and 443.
While the instance is spinning up, create an S3 bucket (enable default encryption and block public access).
Create an IAM user with read and write access to the bucket, and note down the credentials.

2. Configure DNS

Configure your DNS so the domain points to the public IP of the EC2 instance.

3. Set up the server

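Install Go 1.12 or later (https://github.com/golang/go/wiki/Ubuntu) and remember to add it to your PATH. Note that with Go 1.18 or later, go get no longer installs binaries; use go install github.com/newtechlab/vor/vorserve@latest instead of the go get command below.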
go get github.com/newtechlab/vor/vorserve
env AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_DEFAULT_REGION=... vorserve -data s3:BUCKET-NAME-HERE -http :5000 -salt YOURSALTHERE
Install nginx, set it up as a reverse proxy to the server you started on port 5000, and request a certificate using Let's Encrypt (https://medium.com/@mightywomble/how-to-set-up-nginx-reverse-proxy-with-lets-encrypt-8ef3fd6b79e5).

4. Test access

Test that you can access the server on https://yourdomain/ with a POST request - you should get a 400 back.
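
The check above can also be scripted. The following is a minimal sketch in Go, not part of the repository: it sends an empty POST to a URL given on the command line and reports whether the status is 400. The form-encoded content type and the names checkEndpoint and isExpectedStatus are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
)

// isExpectedStatus reports whether code is the 400 Bad Request that the
// guide expects for a POST without a valid payload.
func isExpectedStatus(code int) bool {
	return code == http.StatusBadRequest
}

// checkEndpoint sends an empty form-encoded POST to url and returns the
// HTTP status code of the response. The content type is an assumption.
func checkEndpoint(url string) (int, error) {
	resp, err := http.Post(url, "application/x-www-form-urlencoded", strings.NewReader(""))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Do nothing unless a URL is given, e.g. https://yourdomain/
	if len(os.Args) < 2 {
		fmt.Println("usage: checkvor https://yourdomain/")
		return
	}
	code, err := checkEndpoint(os.Args[1])
	if err != nil {
		fmt.Println("request failed:", err)
		os.Exit(1)
	}
	if !isExpectedStatus(code) {
		fmt.Printf("got status %d, expected 400\n", code)
		os.Exit(1)
	}
	fmt.Println("server reachable: got 400 as expected")
}
```

A TLS or connection error here usually points at the DNS, security group, or reverse proxy setup rather than at vorserve itself.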

5. Generate Twilio project JSON

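Install Go 1.12 or later if it is not already installed on this machine. (With Go 1.18 or later, use go install github.com/newtechlab/vor/vorgen@latest instead of the go get command below.)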
go get github.com/newtechlab/vor/vorgen
vorgen -dump > config.json
Edit config.json to match your preferences (see more details below). In particular, update the URL to match your domain (use https for privacy!).
vorgen > project.json

6. Create Twilio project (READ carefully!)

Log in to Twilio.
Go to Studio and create a new Flow. Select "Import from JSON".
Copy the contents of project.json and paste them here.
Click Next.
Wait. Wait. Depending on the configuration it can take a long time to load the editor; if you have long conversations and many variations, many nodes are created.
BUG in Twilio: after import the flow is not saved correctly, so it must be re-published in order to work. To make re-publishing possible you must change something, so in the editor:
Select one box.
Drag it to change its position a little bit.
The "Publish" button at the top is now enabled. Click it.
Request a new phone number in Twilio.
Configure the phone number to invoke the Studio Flow just created on incoming phone calls.

7. Test

Everything should now be connected. Call the phone number and check that the sound file is stored in the S3 bucket.

Explanation of vorserve's privacy model

Twilio will call vorserve with a request containing URLs to the recordings as well as the phone number that made the call. All this data is sent as part of the request body. (Make sure you register an https URL and that any reverse proxy does not log the request body.)

vorserve stores the recordings in an S3 bucket, with names that make it possible to identify calls made from the same number. However, depending on the use case (consider e.g. GDPR), it may or may not be desirable to be able to link a saved file back to a particular phone number.

In order to support both cases the following logic is applied when naming the files:

cryptographicHash(phonenumber + SALT)

Thus, if SALT is known, it is possible (given a known phone number) to identify all recordings made from that phone number. If SALT is not known, it should be infeasible for anyone to match a recording to a phone number (thus the data is fully anonymized, provided there is no PII in the recorded audio itself).
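
The naming rule can be sketched in Go as follows. This is an illustration, not vorserve's implementation: the README does not say which hash function is used, so SHA-256 with hex encoding is an assumption, and callerKey is a hypothetical name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// callerKey returns hex(SHA-256(phoneNumber + salt)).
// SHA-256 is assumed here; the actual hash in vorserve may differ.
func callerKey(phoneNumber, salt string) string {
	sum := sha256.Sum256([]byte(phoneNumber + salt))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := callerKey("+15550100", "example-salt")
	b := callerKey("+15550100", "example-salt")
	c := callerKey("+15550100", "other-salt")
	fmt.Println(a == b) // true: same number and salt give the same key
	fmt.Println(a == c) // false: a different salt gives an unrelated key
}
```

Anyone holding the salt can recompute the key for a known phone number and look up the matching recordings, which is exactly the linkability described above.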

If vorserve is started without the -salt flag, it generates a unique random SALT on startup that is never saved, making the recordings anonymous. Because the SALT is regenerated on every start, recordings from the same number can then only be grouped within a single server run. If a specific SALT is given (using -salt), it is used, and it is then possible to identify a recording given a phone number.
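
A minimal Go sketch of this startup decision, under stated assumptions: the 32-byte length, the hex encoding, and the name resolveSalt are illustrative choices, not taken from vorserve.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// resolveSalt returns the configured salt if one was given. Otherwise it
// generates a fresh random salt that exists only in memory.
func resolveSalt(configured string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	buf := make([]byte, 32) // 32 random bytes; the length is an assumption
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	fixed, _ := resolveSalt("YOURSALTHERE")
	random, err := resolveSalt("")
	if err != nil {
		fmt.Println("could not generate salt:", err)
		return
	}
	fmt.Println(fixed == "YOURSALTHERE") // true
	fmt.Println(len(random))             // 64
}
```

Using crypto/rand rather than math/rand matters in this sketch: the anonymity argument only holds if the generated salt cannot be guessed.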

Further description of vorgen configuration

The vorgen config is a JSON file. To understand what the different configuration fields mean and imply, see the source file vorgen/config/config.go in the repository.

Please note that the Twilio Studio editor performs poorly with many nodes, and vorgen has not been optimized to reduce the number of nodes. In particular, the number of nodes scales with NumberOfVariations * number of questions.
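
As a rough planning aid, the scaling can be written out in Go. This is only a proportional estimate based on the statement above; the exact node count that vorgen produces is not specified here, and estimateNodes is a hypothetical helper.

```go
package main

import "fmt"

// estimateNodes returns variations * questions as a rough indication of
// how the flow size grows. It is not an exact node count.
func estimateNodes(variations, questions int) int {
	return variations * questions
}

func main() {
	// 5 variations of a 20-question conversation.
	fmt.Println(estimateNodes(5, 20)) // 100
}
```

If the editor becomes too slow to load, reducing either factor in config.json shrinks the flow proportionally.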

TODO:

If the user hangs up instead of waiting for the flow to finish, the recording will not be saved. This should be an easy fix, but I do not have the time right now (it is probably important to fix, since I anticipate users will get bored, especially if a long conversation is desired).
