This article will be a little different from previous posts. Having only recently started exploring AWS machine learning, I am still in the early stages of studying these services. So for this article, I wanted to share what I have learned so far in the form of a possible use case for machine learning: automated UI testing.
Let’s suppose we have a web application that displays a listing of search results, perhaps a search engine or some kind of eCommerce website. We want to ensure the listings display correctly, so we have humans perform UI testing. Can we train machines to do this work for us?
In https://gitlab.com/aw5academy/docker/mock-search-webapp I have created a mock web application which displays random text in a search listings view. Running the buildandrun.sh script runs it in Docker, and we can view it at http://localhost:8080.
Additionally, we can generate a random error by visiting http://localhost:8080?bad=true.
The most difficult part of building a machine learning model appears to be collecting the right training data. Our training data will consist of screenshots of the web page: the “good” images are captured when the application is working as expected, and the “bad” images are captured when there is some error in the display of the application.
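SageMaker’s built-in Image Classification algorithm can consume labeled images described in a tab-separated .lst file (index, class label, relative path). As a minimal sketch, assuming the screenshots are saved under hypothetical good/ and bad/ directories, the listing could be built like this:

```python
# Build a SageMaker-style .lst listing for the screenshots.
# Format (tab-separated): image_index \t class_label \t relative_path
# The good/ and bad/ directory layout here is an assumption, not the
# layout used in the linked repository.
def build_lst(entries):
    """entries: list of (relative_path, label) tuples; label 0=good, 1=bad."""
    lines = []
    for idx, (path, label) in enumerate(entries):
        lines.append(f"{idx}\t{label}\t{path}")
    return "\n".join(lines)

entries = [("good/shot-000.png", 0), ("bad/shot-000.png", 1)]
listing = build_lst(entries)
print(listing)
```

Writing one such file for the training channel and another for validation keeps the good/bad labeling explicit for the algorithm.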
We need a good variety of both the “good” and the “bad”. In https://gitlab.com/aw5academy/sagemaker/mock-search-webapp-train we can execute the run.sh script, which generates 100 random good images and 100 random bad images. These images are generated using PhantomJS, a headless browser.
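The capture loop can be driven from a simple job list: plain URLs for the good screenshots, and the ?bad=true variant for the bad ones. This is a sketch of that idea; the output file names are hypothetical, and the actual run.sh in the repository may work differently:

```python
# Generate (url, output_path) pairs for a headless-browser screenshot loop.
# Each pair could be passed to PhantomJS, e.g.:
#   phantomjs capture.js <url> <output_path>   (capture.js is hypothetical)
BASE_URL = "http://localhost:8080"

def capture_jobs(n_good=100, n_bad=100):
    jobs = []
    for i in range(n_good):
        jobs.append((BASE_URL, f"good/shot-{i:03d}.png"))
    for i in range(n_bad):
        jobs.append((f"{BASE_URL}?bad=true", f"bad/shot-{i:03d}.png"))
    return jobs

jobs = capture_jobs()
```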
We can then expand our training data by applying random orientation changes, contrast changes, etc. This increases the number of images in our training set.
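One way to organize this augmentation is to sample a small set of random transform parameters per source image, then apply them with an image library. The parameter ranges below are illustrative assumptions, not values from the repository:

```python
import random

# Sample an augmentation "plan": for each source screenshot, generate a few
# randomized variants (small rotation and mild contrast shift here).
def augmentation_plan(paths, copies=3, seed=42):
    rng = random.Random(seed)
    plan = []
    for path in paths:
        for _ in range(copies):
            plan.append({
                "source": path,
                "rotate_deg": rng.uniform(-5.0, 5.0),  # small orientation change
                "contrast": rng.uniform(0.8, 1.2),     # mild contrast change
            })
    return plan

plan = augmentation_plan(["good/shot-000.png"], copies=3)
```

Each plan entry would then be rendered to a new image file, tripling the data set in this example.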
Once the training data is created, the training job will start. We can view metrics from the job as it is working.
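For reference, a two-class job with the built-in Image Classification algorithm needs a handful of hyperparameters. The values below are a plausible illustration for this good/bad problem, not the settings used in the linked repository:

```python
# Illustrative hyperparameters for SageMaker's built-in Image Classification
# algorithm (values are assumptions for this two-class experiment).
hyperparameters = {
    "num_classes": "2",              # good vs. bad
    "num_training_samples": "200",   # 100 good + 100 bad, before augmentation
    "image_shape": "3,224,224",      # channels,height,width
    "epochs": "10",
    "learning_rate": "0.001",
    "mini_batch_size": "16",
}
```

These would be passed to the training job alongside the S3 locations of the train and validation channels.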
You can see the training accuracy improving over time.
Now that we have our model trained, we can test how good it is by deploying it to a SageMaker Model Endpoint. Once deployed, we can test it with invoke-endpoint. We provide a screenshot image to this API call and the result returned to us will be two values: the probability of the image being “good” and the probability of it being “bad”.
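When requesting a JSON response, the endpoint returns an array of per-class probabilities, which we can map back to labels. The class ordering below (good first) is an assumption; it depends on how the labels were indexed at training time:

```python
import json

# Parse an invoke-endpoint response body such as "[0.92, 0.08]" and pick the
# most likely class. Label order is assumed to match training-time indices.
def classify(body, labels=("good", "bad")):
    probs = json.loads(body)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

label, confidence = classify("[0.92, 0.08]")
```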
In https://gitlab.com/aw5academy/sagemaker/mock-search-webapp-train-infer we have a run.sh script which calls the invoke-endpoint API and provides it with screenshots which the model has never seen before. You can view these at http://localhost:8080?test=0 through http://localhost:8080?test=9. Even values of the “test” query parameter produce “good” images, while odd values produce “bad” images.
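The even/odd convention makes it easy to compute the expected label for each test page up front, which is handy when scoring the model’s answers:

```python
# The mock app's test pages: ?test=0..9, where even values render a "good"
# page and odd values render a "bad" one.
def expected_label(test_value):
    return "good" if test_value % 2 == 0 else "bad"

urls = [f"http://localhost:8080?test={i}" for i in range(10)]
expected = [expected_label(i) for i in range(10)]
```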
When we execute the script we see:
A partial success! The model did well for some tests and not so well for others.
Some thoughts and conclusions I have made after completing this experiment:
- The algorithm used in this model was Image Classification. I am not sure this was the best choice. Most of the “good” images are very similar, probably too similar. We might need a different approach which, rather than classifying the image, detects abnormalities.
- As mentioned earlier, gathering the training data is the difficult part. It is possible that this mock application is not capable of producing enough variation. A real world application may produce better results. Additionally, actual errors observed in the past could be used to train the model.
- Even with the less-than-great results from this experiment, this solution could be used in a CI/CD pipeline. The sample errors I generated were sometimes very subtle, such as text being off by a few pixels. The model could be retrained to detect only very obvious errors. Then, an application’s build pipeline could run very quick sanity tests to catch obvious UI errors.
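Such a pipeline step could be a simple confidence gate: fail the build only when the model is highly confident the page is broken, so the subtle (and unreliable) cases never block a release. A minimal sketch, with a hypothetical threshold:

```python
# Hypothetical CI/CD sanity gate. bad_probability comes from the model's
# invoke-endpoint response; the 0.9 threshold is an illustrative choice.
def ui_sanity_gate(bad_probability, threshold=0.9):
    """Return True (build passes) unless the model is confident the UI is broken."""
    return bad_probability < threshold
```

A pipeline stage would take a screenshot of the freshly deployed build, classify it, and call this gate before promoting the artifact.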