anaconda-enterprise-cli --help.
Failed/Forbidden error, preventing you from downloading the file.
Workaround
Open the project in Jupyter Notebook or another supported browser, such as Firefox or Safari, and download the file.
cspice and spiceypy packages mirrored from conda-forge include incompatible metadata, which causes a channeldata.json build failure and makes the entire channel inaccessible.
Workaround
Remove these packages from the AE channel, or update your conda-forge mirror to pull in the latest packages.
.condarc
file to your project using the anaconda-enterprise-cli spark-config
command, it may get overwritten with the default config options when you deploy the project.
Workaround
Place the .condarc file in a directory other than your home directory (/opt/continuum/.condarc).
Note that the conda config settings are loaded from all of the files on the conda config search path. The config settings are merged together, with keys from higher-priority files taking precedence over keys from lower-priority files. If you need extra settings, start by adding them to a lower-priority file and see if that works for you.
For more information on how directory locations are prioritized, see this blog post.
Starting in Anaconda Enterprise 5.3.1, you can also set global config variables via a config map, as an alternative to using the AE CLI.
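To check which files on the conda config search path are actually contributing settings, you can ask conda directly; for example:
conda config --show-sources    # lists each config file found and the settings it defines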
anaconda-enterprise-cli spark-config command to connect to a remote Hadoop Spark cluster from within a project, the output says you need to specify the namespace by including -n anaconda-enterprise.
Workaround
You must omit -n anaconda-enterprise from the command, as AE is installed in the default namespace.
running, and the container isn’t ready.
Workaround
Create a project first. The environment creation process will continue and successfully complete after a few minutes.
max_user_watches may be insufficient and can be increased to improve cluster longevity.
Workaround
Run the following command on each node in the cluster to help it remain active:
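For example, a sysctl invocation along these lines raises the limit; the value shown is only an illustration, so choose one appropriate for your nodes:
sudo sysctl -w fs.inotify.max_user_watches=1048576
# to persist the setting across reboots (the file name is arbitrary):
echo 'fs.inotify.max_user_watches=1048576' | sudo tee /etc/sysctl.d/10-ae-inotify.conf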
go-oidc
library can get stuck in a sync loop. This will affect all connectors.
Workaround
On a single-node cluster, you’ll need to do the following to shut down gravity:
Run systemctl list-units | grep gravity.
You will see output like this:
Shut down the teleport service:
Shut down the planet-master service:
Shut down the gravity-site pods:
nodeAffinity
setting reverts to the default value, thus allowing CPU sessions and deployments to run on GPU nodes.
Workaround
If you had commented out the nodeAffinity
section of the Config map in your previous installation, you’ll need to do so again after completing the upgrade process. See Setting resource limits for more information.
After executing sudo gravity enter, you can check /var/log/messages to troubleshoot a failed installation or these types of errors.
After executing sudo gravity enter, you can run journalctl to look at logs to troubleshoot a failed installation or these types of errors:
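For example, the journalctl invocation might look like the following; this is only a sketch, using the example unit name from the next sentence:
journalctl -u gravity-23423lkqjfefqpfh2.service    # show the logs for that gravity service unit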
Replace gravity-23423lkqjfefqpfh2.service with the name of your gravity service.
You may see messages in /var/log/messages related to errors such as “etcd cluster is misconfigured” and “etcd has no leader” from one of the installation jobs, particularly gravity-site. This usually indicates that etcd needs more compute power, needs more space, or is on a slow disk.
Anaconda Enterprise is very sensitive to disk latency, so we usually recommend
using a better disk for /var/lib/gravity
on target machines and/or putting
etcd
data on a separate disk. For example, you can mount etcd
under
/var/lib/gravity/planet/etcd
on the hosts.
After a failed installation, you can uninstall Anaconda Enterprise and start over with a fresh installation.
Failed on pulling gravitational/rbac
If the node refuses to install and fails on pulling gravitational/rbac, create a new TMPDIR directory before installing and give user 1000 write access to it.
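For example (a sketch only; the directory path is an arbitrary choice):
sudo mkdir -p /opt/ae-tmp        # any location with enough free space will do
sudo chown 1000 /opt/ae-tmp      # give user 1000 write access
export TMPDIR=/opt/ae-tmp        # point the installer at the new directory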
“Cannot continue” error during install
This bug is caused by a previous failure of a kernel module check or other preflight check, followed by an attempt to reinstall.
Stop the install, make sure the preflight check failure is resolved, and then restart the install.
Problems during post-install or post-upgrade steps
Post-install and post-upgrade steps run as Kubernetes jobs. When they finish running,
the pods used to run them are not removed. These and other stopped pods can be found using:
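For example, a command along these lines (assuming kubectl access inside the gravity environment) lists all pods, including the finished job pods:
kubectl get pods --all-namespaces    # completed job pods show a Completed status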
Pod | Issues in this step |
---|---|
ae-wagonwheel | post-install UI |
install | installation step |
postupdate | post-update steps |
Edit wagon.yaml, replacing image: ae-wagonwheel:5.X.X with image: leader.telekube.local:5000/ae-wagonwheel:5.X.X.
Then recreate the ae-wagonwheel deployment using the updated YAML file:
sudo gravity enter and run:
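A minimal sketch of that command, assuming kubectl is available inside the gravity environment and wagon.yaml is the file edited above:
kubectl replace --force -f wagon.yaml    # deletes the old ae-wagonwheel deployment and recreates it from the edited file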
This recreates ae-wagonwheel, the post-install configuration UI. To make this visible to the outside world, run:
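A sketch of one way to expose it, assuming kubectl access inside the gravity environment and using post-install as the service name to match the text below:
kubectl expose deployment ae-wagonwheel --type=NodePort --name=post-install
# add --port=<container port> if the deployment does not declare one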
This creates a service named post-install.
To find out which port it is listening under, run:
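For example, assuming the post-install service created above:
kubectl get service post-install    # the PORT(S) column shows the node port assigned to the service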
Go to http://<your domain>:<this port> to access the post-install UI.
Run gravity status and verify that all kernel parameters are set correctly. If the Status for a particular parameter is degraded, follow the instructions here to reset the kernel parameter.
500 Internal Server Error message.
Workaround
Add the user as a collaborator to the project, have them stop their notebook session, then remove them as a collaborator. For more information, see how to share a project.
To prevent collaborators from seeing this error, ask them to close their running session before you remove them from the project.
Affected versions
5.2.x
Edit the anaconda-enterprise-ap-auth deployment spec by running the following command in a terminal:
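A sketch of one plausible form of that command, assuming kubectl access from the terminal:
kubectl edit deployment anaconda-enterprise-ap-auth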
Then update JAVA_OPTS (example below):
Use the %sh interpreter from within a Zeppelin notebook to work with files via bash commands, or use the Settings tab to change the default editor to Jupyter Notebooks or JupyterLab and use the file browser or terminal.
Affected versions
5.2.2
anaconda-project
call will uninstall the upgraded package.
Workaround
When updating a package dependency, remove the anaconda metapackage from the list of dependencies and, at the same time, add the new version of the dependency that you want to update.
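For example, with a placeholder package (pandas=1.2 stands in for whatever dependency you are updating):
anaconda-project remove-packages anaconda      # drop the anaconda metapackage
anaconda-project add-packages pandas=1.2       # add the specific version you want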
Affected versions
5.1.0, 5.1.1, 5.1.2, 5.1.3
/opt/anaconda
can cause disk pressure errors, which may result in the following:
Find your new disk’s name by running fdisk -l. Our example disk’s name is /dev/nvme1n1. In the rest of the commands on this page, replace /dev/nvme1n1 with your disk’s name.
fdisk /dev/nvme1n1
To create a new partition, at the first prompt press n
and then the return key.
Accept all default settings.
To write the changes, press w
and then the return key. This will take a few minutes.
Run fdisk -l again to find the new partition.
Our example partition’s name is /dev/nvme1n1p1. In the rest of the commands on this page, replace /dev/nvme1n1p1 with your partition’s name.
mkfs /dev/nvme1n1p1
Create a temporary mount point to stage the contents of /opt/anaconda: mkdir /opt/aetmp
Mount the new partition at /opt/aetmp: mount /dev/nvme1n1p1 /opt/aetmp
systemctl list-units | grep gravity
You will see output like this:
Shut down the teleport service: systemctl stop gravity__gravitational.io__teleport__2.3.5.service
Shut down the planet-master service: systemctl stop gravity__gravitational.io__planet-master__0.1.87-1714.service
Copy the contents of /opt/anaconda to /opt/aetmp: rsync -vpoa /opt/anaconda/* /opt/aetmp
Set up the /opt/anaconda mount point by adding this line to your file systems table at /etc/fstab:
/dev/nvme1n1p1<tab>/opt/anaconda<tab>ext4<tab>defaults<tab>0<space>0
Move the existing /opt/anaconda out of the way to /opt/anaconda-old: mv /opt/anaconda /opt/anaconda-old
If you’re certain the rsync was successful, you may instead delete /opt/anaconda: rm -r /opt/anaconda
Unmount the /opt/aetmp mount point: umount /opt/aetmp
Recreate the /opt/anaconda directory: mkdir /opt/anaconda
Mount all file systems listed in fstab: mount -a
kubelet
startup parameter and causes the backup to fail.
To check your eviction policy, run the following commands on the master node:
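A sketch of one way to inspect it; the kubelet runs inside the gravity environment, so enter it first:
sudo gravity enter
ps aux | grep kubelet    # look for the --eviction-hard and --eviction-soft flags on the kubelet command line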
To run kubectl and other commands within Anaconda Enterprise, use the command: sudo gravity enter
/opt/anaconda/ -> AE environment: /opt/anaconda/
/var/lib/gravity/planet/share -> AE environment: /ext/share
[LDAP: error code 12 - Unavailable Critical Extension]; remaining name 'dc=acme, dc=com'
This error can occur when pagination is turned on. Pagination is a server-side extension and is not supported by some LDAP servers, notably the Sun Directory server.
Session startup errors
If you need to troubleshoot session startup, you can use a terminal to view the
session startup logs. When session startup begins, the output of the anaconda-project prepare command is written to /opt/continuum/preparing, and when the command completes the log is moved to /opt/continuum/prepare.log.
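For example, from a terminal inside the running session you can watch these files directly:
tail -f /opt/continuum/preparing    # while session startup is still in progress
cat /opt/continuum/prepare.log      # after anaconda-project prepare has finished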