this post was submitted on 18 Jul 2023
10 points (100.0% liked)
R Programming
Please use this as a forum to discuss R, and learn more about it. If you have any questions about how to do specific things in R, this is the place to ask.
Getting Started
You can download R here.
You can download RStudio here. RStudio, which is supported by Posit PBC, is a powerful and well-developed IDE for R. Other development environment options include the Emacs add-on Emacs Speaks Statistics (ESS) and VSCode.
Other Communities
Other communities that may be of interest across the fediverse:
- https://lemmy.ml/c/rstats
- https://lemmy.ml/c/dataisbeautiful
- https://lemmy.world/c/dataisbeautiful
- https://code4lib.net/c/datascience
- https://discuss.tchncs.de/c/data_engineering
Please send @a_statistician a message to recommend additional communities to add to this list.
Learning resources:
- R for Data Science - a good introductory book for learning R. Start here if you're overwhelmed.
- Big Book of R - a collection of more than 500 online books/tutorials covering various aspects of R. Some links are to paid books with previews, but most are to free online textbooks.
you are viewing a single comment's thread
As far as I know, yeah, R always works with copy-on-modify. Some libraries, as you mention (`data.table`), have objects/classes that avoid this, but I'm not aware of any of them working with arrays (more than 2D). Maybe `parquet` or `arrow` have something like this?
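A minimal sketch of the difference, assuming `data.table` is installed (`tracemem()` needs an R build with memory profiling, which the standard CRAN builds have):

```r
library(data.table)

df <- data.frame(x = 1:5)
tracemem(df)       # prints a message whenever R copies df
df$x <- df$x * 2   # copy-on-modify: base R duplicates df here

dt <- data.table(x = 1:5)
addr <- address(dt)
dt[, x := x * 2]              # := modifies the column by reference, no copy
identical(addr, address(dt))  # TRUE: still the same object in memory
```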
Thank you for the suggestion! Worth looking at `parquet` and `arrow` indeed.

+1 for `parquet` and `arrow`. If you're pushing memory limits, it's better to just treat it as a completely out-of-memory problem. If you can split the data into multiple parquet files with hive-style or directory partitioning, it will be more efficient. You don't want the parquet files too small, though (I've heard people say 1 GB per file is ideal; colleagues at work like 512 MB per file, but that's on an AWS setup). The bonus is that once you've learned the packages, the same approach works for all out-of-memory big datasets.
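For reference, a rough sketch of that workflow with the `arrow` package; `big_df`, the `data/sales` path, and the `year`/`month`/`amount` columns are made-up placeholders:

```r
library(arrow)
library(dplyr)

# Write a data frame out as a hive-style partitioned parquet dataset,
# one directory per value of the partition column.
write_dataset(big_df, "data/sales", format = "parquet",
              partitioning = "year")
# Newer arrow versions also expose max_rows_per_file,
# which helps control individual file sizes.

# Reopen it lazily: nothing is read into memory yet.
ds <- open_dataset("data/sales")

# dplyr verbs are pushed down to arrow, so only the needed
# partitions/columns are read; collect() materializes the result.
result <- ds |>
  filter(year == 2022) |>
  group_by(month) |>
  summarise(total = sum(amount)) |>
  collect()
```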