tl;dr
Lambda has no /dev/shm, so pass the --disable-dev-shm-usage option to Chrome.

Error on AWS Lambda

Selenium worked in the local lambci/lambda:build-python3.6 image,
but it did not work on AWS Lambda.

The code and the error message were as follows:

from selenium import webdriver

driver_path = './bin/chromedriver-linux'

options = webdriver.ChromeOptions()
options.binary_location = './bin/headless-chromium-linux'
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--single-process')

driver = webdriver.Chrome(driver_path, chrome_options=options)
Message: unknown error: Chrome failed to start: exited abnormally
(chrome not reachable)
(The process started from chrome location ./bin/headless-chromium-linux is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.42.591071 (0b695ff80972cc1a65a5cd643186d2ae582cd4ac),platform=Linux 4.14.67-66.56.amzn1.x86_64 x86_64)
: WebDriverException
Traceback (most recent call last):
File "/var/task/handler.py", line 11, in handler
data = _download()
File "/var/task/handler.py", line 55, in _download
driver = webdriver.Chrome(driver_path, chrome_options=options)
File "/var/task/selenium/webdriver/chrome/webdriver.py", line 81, in __init__
desired_capabilities=desired_capabilities)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 157, in __init__
self.start_session(capabilities, browser_profile)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 252, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/var/task/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/var/task/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally
(chrome not reachable)
(The process started from chrome location ./bin/headless-chromium-linux is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
(Driver info: chromedriver=2.42.591071 (0b695ff80972cc1a65a5cd643186d2ae582cd4ac),platform=Linux 4.14.67-66.56.amzn1.x86_64 x86_64)

So I tried running the bare headless-chromium binary on AWS Lambda to debug it.

from subprocess import check_call

print(check_call(["ls", "-l", './bin']))

print(check_call([
    './bin/headless-chromium-linux',
    '--headless',
    '--no-sandbox',
    '--disable-gpu',
    '--dump-dom',
    'https://api.ipify.org?format=json',
    ]))

That produced the following error messages:

[0927/064610.911069:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.
[0927/064612.311328:ERROR:platform_shared_memory_region_posix.cc(222)] Creating shared memory in /dev/shm/.org.chromium.Chromium.JwBSnH failed: No such file or directory (2)
[0927/064612.311375:ERROR:platform_shared_memory_region_posix.cc(225)] Unable to access(W_OK|X_OK) /dev/shm: No such file or directory (2)
[0927/064612.311390:FATAL:platform_shared_memory_region_posix.cc(227)] This is frequently caused by incorrect permissions on /dev/shm. Try 'sudo chmod 1777 /dev/shm' to fix.
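
The debugging step above can also be written so that Chrome's log output is captured and filtered instead of just printed. This is only a rough sketch: `run_and_capture` and `fatal_lines` are hypothetical helper names, not part of the original code, and the 60-second timeout is an arbitrary assumption.

```python
import subprocess


def run_and_capture(cmd, timeout=60):
    """Run cmd and return (exit_code, stdout, stderr) without raising on a non-zero exit."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (available on Python 3.6)
        timeout=timeout,          # assumed limit, adjust to your function timeout
    )
    return result.returncode, result.stdout, result.stderr


def fatal_lines(stderr_text):
    """Keep only the Chrome log lines tagged ERROR or FATAL."""
    return [
        line for line in stderr_text.splitlines()
        if ':ERROR:' in line or ':FATAL:' in line
    ]


# Example call inside the Lambda handler (paths as used in this post):
# code, out, err = run_and_capture([
#     './bin/headless-chromium-linux', '--headless', '--no-sandbox',
#     '--disable-gpu', '--dump-dom', 'https://api.ipify.org?format=json',
# ])
# print(code, fatal_lines(err))
```

Unlike check_call, this does not raise when Chrome exits abnormally, so the log lines are still available to print.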

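The log shows that /dev/shm does not exist on Lambda. After I added --disable-dev-shm-usage, which makes Chrome create its shared memory files in a temporary directory instead, it finally worked.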

from selenium import webdriver

driver_path = './bin/chromedriver-linux'

options = webdriver.ChromeOptions()
options.binary_location = './bin/headless-chromium-linux'
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--single-process')

options.add_argument('--disable-dev-shm-usage')  # add this option!
driver = webdriver.Chrome(driver_path, chrome_options=options)
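
If you would rather add the flag only where it is needed, the check that Chrome reports in its log (write and execute access on /dev/shm) can be approximated in Python. This is a minimal sketch, not what the code above does: `dev_shm_usable` and `shm_arguments` are hypothetical names, and a /dev/shm that exists can still be too small, so passing the flag unconditionally remains the simpler choice.

```python
import os


def dev_shm_usable(path='/dev/shm'):
    """True if path is a directory this process can write to and traverse."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def shm_arguments(path='/dev/shm'):
    """Return the extra Chrome arguments needed when shared memory is unavailable."""
    return [] if dev_shm_usable(path) else ['--disable-dev-shm-usage']


# Example, given an existing ChromeOptions object named options:
# for argument in shm_arguments():
#     options.add_argument(argument)
```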

Quick Setup Instructions

Download binaries

# $ cd /path/to/your_serverless_dir

$ mkdir -p bin/

# download chromedriver
$ curl -SL https://chromedriver.storage.googleapis.com/2.42/chromedriver_linux64.zip > chromedriver.zip
$ unzip chromedriver.zip
$ mv chromedriver ./bin/chromedriver-linux

# download headless-chromium
$ curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-55/stable-headless-chromium-amazonlinux-2017-03.zip > headless-chromium.zip
$ unzip headless-chromium.zip
$ mv headless-chromium ./bin/headless-chromium-linux

# clean
$ rm headless-chromium.zip chromedriver.zip
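
Before deploying, it can help to confirm that both files ended up under bin/ with the names the Python code expects and that they are executable. The following is a small sketch under that assumption; `missing_binaries` is a hypothetical helper, not part of the original setup.

```python
import os


def missing_binaries(bin_dir='./bin',
                     names=('chromedriver-linux', 'headless-chromium-linux')):
    """Return the expected binaries that are absent or not executable."""
    problems = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append(name)
    return problems


# Example: an empty list means both binaries are in place.
# print(missing_binaries())
```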

Python Code

Write simple scraping code that supports both OSX and Linux, and save it as handler.py.

from sys import platform
from selenium import webdriver

def handler(event, context):

    options = webdriver.ChromeOptions()
    if platform == 'darwin':
        # download OSX binary from https://chromedriver.storage.googleapis.com/index.html?path=2.42/
        driver_path = './bin/chromedriver-darwin'
        # If you use Chrome Canary, set binary_location.
        # options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
    elif platform == 'linux':
        driver_path = './bin/chromedriver-linux'
        options.binary_location = './bin/headless-chromium-linux'
        options.add_argument('--disable-dev-shm-usage')

    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--single-process')
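    driver = webdriver.Chrome(driver_path, chrome_options=options)
    try:
        driver.get('https://api.ipify.org?format=json')
        print(driver.page_source)
    finally:
        # always shut Chrome down so a reused Lambda container does not leak processes
        driver.quit()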


if __name__ == '__main__':
    handler(None, None)
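
The per-platform branching in the handler can also be pulled into a pure function, which makes it testable without starting Chrome. This is just one possible arrangement: `chrome_settings` is a hypothetical name, and raising ValueError for other platforms is an assumption added here (the handler above leaves driver_path undefined in that case).

```python
def chrome_settings(platform_name):
    """Return (driver_path, binary_location, arguments) for a sys.platform value."""
    arguments = ['--headless', '--no-sandbox', '--single-process']
    if platform_name == 'darwin':
        # binary_location stays None: the locally installed Chrome is used
        return './bin/chromedriver-darwin', None, arguments
    if platform_name.startswith('linux'):
        return (
            './bin/chromedriver-linux',
            './bin/headless-chromium-linux',
            arguments + ['--disable-dev-shm-usage'],
        )
    raise ValueError('unsupported platform: ' + platform_name)


# Example, inside handler():
# driver_path, binary_location, arguments = chrome_settings(platform)
# if binary_location:
#     options.binary_location = binary_location
# for argument in arguments:
#     options.add_argument(argument)
```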

Docker

Create a Dockerfile like the one below:

FROM lambci/lambda:build-python3.6

COPY requirements.txt .

RUN python -m pip install \
  --trusted-host pypi.org \
  --trusted-host files.pythonhosted.org \
  -r requirements.txt

ADD . .

Run Locally

# run code on OSX
$ python handler.py
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<your ip addr>"}</pre></body></html>

# run code on Docker
$ docker build . -t serverless-selenium-python
$ docker run serverless-selenium-python python handler.py
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<your ip addr>"}</pre></body></html>

Deploy code to AWS Lambda

  • serverless.yml
service: sls-selenium-python

provider:
  name: aws
  region: ap-northeast-1
  runtime: python3.6

functions:
  test-selenium:
    handler: handler.handler
    memorySize: 256
    timeout: 120

custom:
  pythonRequirements:
    dockerizePip: true

plugins:
  - serverless-python-requirements
  • requirements.txt
selenium==3.14.1

# set up the serverless command...
# $ npm install -g serverless
# $ npm install --save serverless-python-requirements

$ sls deploy -v

# wait for the deployment to finish...


$ sls invoke -f test-selenium

$ sls logs -f test-selenium
START RequestId: 917df9f0-c229-1100-a000-753d2bd38861 Version: $LATEST
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{"ip":"<ip addr>"}</pre></body></html>
END RequestId: 917df9f0-c229-1100-a000-753d2bd38861
REPORT RequestId: 917df9f0-c229-11e8-a098-653d2bd3886e  Duration: 7636.25 ms    Billed Duration: 7700 ms    Memory Size: 256 MB Max Memory Used: 168 MB
