
[AWS] Using Chardet in AWS Lambda

Work · admin · 3401 views · 0 comments

I am using AWS Lambda to import data from txt files.

This works for most files, but fails for some of them with:

'utf-8' codec can't decode byte 0x80 in position 22006: invalid start byte

After checking the character encoding of the failing files, I found that they are “ANSI” encoded (on Windows this usually means a legacy code page such as Windows-1252), whereas my Lambda Python code assumes they are UTF-8:

  txtContent = obj['Body'].read().decode('utf-8')
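The failure can be reproduced without S3. This is a minimal sketch, assuming the “ANSI” files are Windows-1252 (`cp1252` in Python); the sample bytes are made up for illustration and are not from the real files.

```python
# Hypothetical sample: 0x80 is the euro sign in Windows-1252,
# but it is not a valid start byte in UTF-8.
raw = b"price: \x80 100"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    error_message = str(exc)  # same kind of message as the Lambda error

# Decoding with the assumed legacy code page succeeds.
text = raw.decode("cp1252")

print(utf8_ok)
print(text)
```

If the real files use a different code page, `cp1252` would decode them without error but may produce the wrong characters, which is why detection is worth trying.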

How to detect the file encoding

https://stackoverflow.com/questions/436220/determine-the-encoding-of-text-in-python

Please be aware that one line there says: Correctly detecting the encoding all times is impossible!

However, we can still try the Python module “chardet” to guess the character encoding.

This is the document: https://chardet.readthedocs.io/en/latest/usage.html
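As a rough sketch of how detection could replace the hard-coded `decode('utf-8')`: the helper below tries UTF-8 first and only then asks `chardet.detect`, which returns a dict with `encoding` and `confidence` keys. The function name `decode_bytes` and the `min_confidence` threshold are assumptions for illustration, not part of chardet.

```python
try:
    import chardet  # third-party; must be bundled with the Lambda package
except ImportError:
    chardet = None  # lets this sketch load where chardet is not installed


def decode_bytes(raw, default="utf-8", min_confidence=0.5):
    """Decode bytes as UTF-8 if possible, else fall back to chardet's guess."""
    try:
        return raw.decode(default)
    except UnicodeDecodeError:
        pass

    # chardet.detect returns e.g. {'encoding': ..., 'confidence': ..., ...};
    # 'encoding' may be None when no guess could be made.
    guess = chardet.detect(raw) if chardet else {"encoding": None, "confidence": 0.0}
    encoding = guess.get("encoding")
    if encoding and guess.get("confidence", 0.0) >= min_confidence:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            pass  # the guess was wrong or names a codec Python lacks

    # Last resort: keep the import running and mark undecodable bytes.
    return raw.decode(default, errors="replace")
```

In the Lambda code this would be used as `txtContent = decode_bytes(obj['Body'].read())`. Since detection is only a guess, it may be worth logging the detected encoding for files that fail the UTF-8 attempt.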

To use it in AWS Lambda, we have to install the module locally and upload it together with the function code.

Here is how to make it work:

1) Install chardet
mkdir for-chardet
pip install chardet -t ./for-chardet/

2) Zip the files (the chardet package must sit at the root of the zip)
cd for-chardet
zip -r ../lambda-4-chardet.zip ./chardet*
cd ..
zip lambda-4-chardet.zip lambda_function.py

3) Upload the zip file to the AWS Lambda function.
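The packaging in step 2 can also be scripted in Python, which helps where the `zip` command is not available. This is only a sketch using the standard `zipfile` module; the function name `build_lambda_zip` is hypothetical, and the paths in the usage note simply mirror the ones used in the steps above.

```python
import zipfile
from pathlib import Path


def build_lambda_zip(package_dir, handler_file, zip_path):
    """Zip everything under package_dir at the archive root, plus the handler."""
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to package_dir so that "chardet/"
                # ends up at the top level, where Lambda can import it.
                zf.write(path, path.relative_to(package_dir).as_posix())
        # The handler module also goes at the archive root.
        zf.write(handler_file, Path(handler_file).name)
    return zip_path
```

For example, `build_lambda_zip("for-chardet", "lambda_function.py", "lambda-4-chardet.zip")` would produce an archive with the same name as the one built in step 2. Unlike the `./chardet*` pattern, this includes every file pip placed in the directory.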

For reference, here is the list of standard encodings supported by Python:
https://docs.python.org/3/library/codecs.html#standard-encodings

Please credit the source when reposting: Linc Hu » [AWS] Using Chardet in AWS Lambda
