Scaling
Menu system on the left is missing the numbering (8)
Planning to Scale
Something I am missing here is the notion of Workbench being a collection of IDEs where code should ideally be developed but never really run at full scale. When it comes to scaling, we should probably focus on scaling for more concurrent users only and have a separate section on scaling individual codes.
For running at full scale, we should probably point to continued use of databases or, more importantly, to Workbench Jobs for this purpose, or even to running the code completely non-interactively outside of Workbench on a scalable compute infrastructure (e.g. HPC cluster, Dask, ...)
using large data sets
At some point we should also mention that I/O performance can be a major cause of performance degradation...
if users are not explicitly parallelizing their code then you can estimate that your environment will need one core per concurrent session.
unless the Python or R package automatically uses a parallel computing framework (e.g. DT thread ....)
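A small worked example might make this rule of thumb easier to apply. The sketch below is only an illustration: the function name `estimate_cores` and the `threads_per_session` parameter are hypothetical, and multiplying by a per-session thread count for implicitly parallel packages is my assumption, not something the course text states.

```python
def estimate_cores(concurrent_sessions: int, threads_per_session: int = 1) -> int:
    """Rough core estimate: one core per concurrent session by default.

    threads_per_session is a hypothetical knob for packages that
    parallelize on their own; leave it at 1 for plain serial code.
    """
    if concurrent_sessions < 0 or threads_per_session < 1:
        raise ValueError("sessions must be >= 0 and threads_per_session >= 1")
    return concurrent_sessions * threads_per_session


if __name__ == "__main__":
    # 25 users running serial code
    print(estimate_cores(25))
    # 25 users whose packages each use 4 threads (assumed value)
    print(estimate_cores(25, threads_per_session=4))
```

This ignores memory and I/O entirely, so it is a starting point for a conversation rather than a sizing formula.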
RAM
Maybe replace with "memory"?
Change default install to public package manager link change pip installs to come from rspm For R: Add system package, check libpaths, As a user add a different version of that package, load package, check version, repeat similar for python Install a package that we know needs a sys dep added, identify issue and resolve it Start from public package manager look at sys dependencies of packages, install deps, install package (include binary examples here)
I think it would be good to have these exercises at least partially repeated in the RSPM module as well
Workbench allows you to serve multiple version of R and Python. When you need a new version of R or Python you add the new version and leave existing versions in place.
Typically this is not possible with the R and Python versions provided by the OS. In order to have multiple versions of R and Python installed concurrently you will either build them on your own (possibly using environment modules integrated in EasyBuild or Spack) or use Posit provided R and Python software builds.
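It might help to give admins a quick way to see which versions are installed side by side. The sketch below assumes each version lives in its own directory under a common root such as `/opt/R/<version>` or `/opt/python/<version>`; those paths and the helper name `installed_versions` are assumptions for illustration and should be adjusted to the actual layout.

```python
from pathlib import Path


def installed_versions(root: str) -> list[str]:
    """Return the names of the per-version directories found under root.

    Assumes one subdirectory per installed version; plain files are
    ignored. A missing root yields an empty list instead of an error.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    # Plain string sort; good enough for a quick overview.
    return sorted(p.name for p in base.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Assumed install roots; change these to match the server.
    for root in ("/opt/R", "/opt/python"):
        print(root, installed_versions(root))
```

The sort is lexicographic, so "4.10.0" would be listed before "4.2.3"; that is fine for a sanity check but not for picking the newest version.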
A user wants a new versions of R or Python because a new packages requires it A user discovered a package that solves an important business problem and requires an older version of R or Python
I think we can condense those two bullets by saying "A user needs a different version of R or Python because a package he would like to use requires it"
Also, I would add "A user cannot install a new R package due to missing system dependencies"
User Account Management
This section could benefit from significant tuning or possibly a rewrite.
While it is neat for the admin training, how many of our customers use local user accounts (i.e. created via useradd) in our products? I feel that most are going directly towards LDAP or AD...
(including their home directory).
Add something like "The admins need to decide what to do with the data of locked users as part of their data management / backup policy".
written to
replace with "stored on" ?
Posit Workbench Architecture
The diagram is OK, but I wonder if we really should highlight SQLite here? Also, "OS File System" looks a bit unwieldy. I am sure we are not referring to an open source file system, right ;)
I wonder if we could simplify it by specifying
* Workbench Web UI (rserver process)
* Database to store metadata
* various session types (jupyter, vscode, ...) (rsession process)
* R and Python installation
* maybe mention the workspace process
OS
Let's remove "OS" here.
posit Package Manager
We are not talking about binary packages for all major Linux distros here at all. It would be worthwhile to highlight this here, as I sense it is a good selling point for PPM.
share internal code
Not sure if this is more misleading than helpful. Maybe rephrase to "Host and manage external and internal R & Python packages in a controlled manner"
Posit Team includes all of posit’s professional products, including: posit Workbench posit Connect posit Package Manager
It would be worthwhile to have a high-level overview of how these three products interact with each other. Maybe use a simplified version of the diagram that CS/Sales use for their introductory calls with new customers, which highlights the data science journey through the product stack (e.g. Workbench for the IDE, Connect for hosting artefacts/outputs, and Package Manager as more of an infrastructure/file share kind of service).
To download python, go to https://www.python.org/ and then click on the link for either Mac, Windows, or Linux depending on your computer
Maybe a link to our own opinionated Python binaries would be good to have here?
To download R, go to https://cloud.r-project.org and then click on the link for either Mac, Windows, or Linux depending on your compute
Maybe a link to our own opinionated R binaries would be good to have here?
Together, posit’s open-source software and commercial software form a virtuous cycle: The adoption of open-source data science software at scale in organizations creates demand for posit’s commercial software; and the revenue from commercial software, in turn, enables deeper investment in the open-source software that benefits everyone.
Maybe we should call this out a bit more by saying that the value-add of our commercial solutions is to enhance productivity and efficiency, while all the basic functionality is provided as open source at no charge (e.g. RStudio Server, Shiny, markdown, ...)
We do th
Not sure if we want to switch to "We" here.