Title: | Read Password-Protected Excel Files |
---|---|
Description: | Uses the Python library msoffcrypto-tool to decrypt password-protected Excel files, which can then be read into R using readxl, readxlsb, or other packages. |
Authors: | Patrick Barks [aut, cre] |
Maintainer: | Patrick Barks <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2024-10-24 16:07:37 UTC |
Source: | https://github.com/epicentre-msf/rpxl |
Uses the Python library msoffcrypto-tool to decrypt password-protected Excel files (either .xlsx, .xlsm, or .xlsb), which can then be read in with e.g. the readxl or readxlsb R packages.
decrypt_wb(path, password, path_out = tempfile(fileext = get_ext(path)))
decrypt_wb(path, password, path_out = tempfile(fileext = get_ext(path)))
path |
Path to an Excel workbook file (either .xlsx, .xlsm, or .xlsb) |
password |
Workbook password |
path_out |
Output path for decrypted workbook |
Path to the decrypted workbook file (same as argument path_out
)
path_xlsx <- system.file("extdata", "xltest.xlsx", package = "rpxl") decrypt_wb(path_xlsx, password = "1234")
path_xlsx <- system.file("extdata", "xltest.xlsx", package = "rpxl") decrypt_wb(path_xlsx, password = "1234")
Install Python dependencies for the rpxl package
install_rpxl( envname = "r-rpxl", new_env = identical(envname, "r-rpxl"), method = "auto", ... )
install_rpxl( envname = "r-rpxl", new_env = identical(envname, "r-rpxl"), method = "auto", ... )
envname |
Name of Python environment to install within. Defaults to "r-rpxl". |
new_env |
Logical indicating whether to remove an existing virtual
environment, if it has the name as |
method |
Installation method. Defaults to "auto" to automatically find a method that will work in the local environment. |
... |
Additional arguments passed to reticulate::py_install |
A wrapper to readxlsb::read_xlsb, with an initial call to decrypt_wb to decrypt the password-protected workbook
rp_xlsb( path, password, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "", trim_ws = TRUE, skip = 0, ... )
rp_xlsb( path, password, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "", trim_ws = TRUE, skip = 0, ... )
path |
Path to the xlsb workbook |
password |
Workbook password |
sheet |
Name or index of the sheet to read. If the sheet name is specified as part of the range, this parameter is ignored |
range |
A named range or a string representing an excel range (of the form Sheet!A1:D10) or a cellranger object |
col_names |
TRUE uses the first row as the column name, FALSE sets names to column.#, or a character vector |
col_types |
NULL to imply type from spreadsheet or one of ignore/logical/numeric/date/string per column |
na |
Single string or array of strings to interpret as missing |
trim_ws |
Trim whitespace from strings |
skip |
Number of rows to skip before reading data |
... |
Additional options. Pass debug = TRUE to return xlsb environment |
A data frame
path_xlsb <- system.file("extdata", "xltest.xlsb", package = "rpxl") rp_xlsb(path_xlsb, password = "1234", sheet = 1)
path_xlsb <- system.file("extdata", "xltest.xlsb", package = "rpxl") rp_xlsb(path_xlsb, password = "1234", sheet = 1)
A wrapper to readxl::read_excel with an initial call to decrypt_wb to decrypt the password-protected workbook
rp_xlsx( path, password, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = readxl_progress(), .name_repair = "unique" )
rp_xlsx( path, password, sheet = NULL, range = NULL, col_names = TRUE, col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = readxl_progress(), .name_repair = "unique" )
path |
Path to the xls/xlsx file. |
password |
Workbook password |
sheet |
Sheet to read. Either a string (the name of a sheet), or an
integer (the position of the sheet). Ignored if the sheet is specified via
|
range |
A cell range to read from, as described in cell-specification.
Includes typical Excel ranges like "B3:D87", possibly including the sheet
name like "Budget!B2:G14", and more. Interpreted strictly, even if the
range forces the inclusion of leading or trailing empty rows or columns.
Takes precedence over |
col_names |
|
col_types |
Either |
na |
Character vector of strings to interpret as missing values. By default, readxl treats blank cells as missing data. |
trim_ws |
Should leading and trailing whitespace be trimmed? |
skip |
Minimum number of rows to skip before reading anything, be it
column names or data. Leading empty rows are automatically skipped, so this
is a lower bound. Ignored if |
n_max |
Maximum number of data rows to read. Trailing empty rows are
automatically skipped, so this is an upper bound on the number of rows in
the returned tibble. Ignored if |
guess_max |
Maximum number of data rows to use for guessing column types. |
progress |
Display a progress spinner? By default, the spinner appears
only in an interactive session, outside the context of knitting a document,
and when the call is likely to run for several seconds or more. See
|
.name_repair |
Handling of column names. Passed along to
|
A tibble
-style data frame
path_xlsx <- system.file("extdata", "xltest.xlsx", package = "rpxl") rp_xlsx(path_xlsx, password = "1234")
path_xlsx <- system.file("extdata", "xltest.xlsx", package = "rpxl") rp_xlsx(path_xlsx, password = "1234")